[PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests

Andrey Ryabinin posted 1 patch 1 year ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230410083509.3311-1-arbn@yandex-team.com
Maintainers: Raphael Norwitz <raphael.norwitz@nutanix.com>, "Michael S. Tsirkin" <mst@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
hw/block/vhost-user-blk.c | 4 ----
1 file changed, 4 deletions(-)
[PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Andrey Ryabinin 1 year ago
Some guests hang on boot when using the vhost-user-blk-pci device,
but boot normally when using the virtio-blk device. The problem occurs
because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
be false in vhost_user_blk_handle_output() and preventing the device from
starting.

Fix this by removing the check for vdev->start_on_kick to ensure
that the device starts after the kick. This aligns the behavior of
'vhost-user-blk-pci' device with 'virtio-blk' as it does the similar
thing in its virtio_blk_handle_output() function.

Fixes: 110b9463d5c8 ("vhost-user-blk: start vhost when guest kicks")
Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
---
 hw/block/vhost-user-blk.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index aff4d2b8cbd..448ead448f3 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     Error *local_err = NULL;
     int i, ret;
 
-    if (!vdev->start_on_kick) {
-        return;
-    }
-
     if (!s->connected) {
         return;
     }
-- 
2.39.2
Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Andrey Ryabinin 1 year ago

On 4/10/23 10:35, Andrey Ryabinin wrote:
> Some guests hang on boot when using the vhost-user-blk-pci device,
> but boot normally when using the virtio-blk device. The problem occurs
> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> be false in vhost_user_blk_handle_output() and preventing the device from
> starting.
> 
> Fix this by removing the check for vdev->start_on_kick to ensure
> that the device starts after the kick. This aligns the behavior of
> 'vhost-user-blk-pci' device with 'virtio-blk' as it does the similar
> thing in its virtio_blk_handle_output() function.
> 
> Fixes: 110b9463d5c8 ("vhost-user-blk: start vhost when guest kicks")
> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> ---
>  hw/block/vhost-user-blk.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index aff4d2b8cbd..448ead448f3 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>      Error *local_err = NULL;
>      int i, ret;
>  
> -    if (!vdev->start_on_kick) {
> -        return;
> -    }
> -
>      if (!s->connected) {
>          return;
>      }


After looking a bit closer to this ->start_on_kick thing ( commit badaf79cfdbd ("virtio: Introduce started flag to VirtioDevice")
and follow ups) I'm starting to think that removing it entirely would be the right thing to do here.
The whole reason for it was to add special case for !VIRTIO_F_VERSION_1 guests.
If we making start on kick thing for misbehaving VIRTIO_F_VERSION_1 guests too, than the flag is no longer required,
so we can do following:

---
 hw/block/vhost-user-blk.c  |  4 ----
 hw/virtio/virtio-qmp.c     |  2 +-
 hw/virtio/virtio.c         | 21 ++-------------------
 include/hw/virtio/virtio.h |  5 -----
 4 files changed, 3 insertions(+), 29 deletions(-)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index aff4d2b8cbd..448ead448f3 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
     Error *local_err = NULL;
     int i, ret;
 
-    if (!vdev->start_on_kick) {
-        return;
-    }
-
     if (!s->connected) {
         return;
     }
diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
index e4d4bece2d7..4865819cd2f 100644
--- a/hw/virtio/virtio-qmp.c
+++ b/hw/virtio/virtio-qmp.c
@@ -773,7 +773,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
     status->disabled = vdev->disabled;
     status->use_started = vdev->use_started;
     status->started = vdev->started;
-    status->start_on_kick = vdev->start_on_kick;
+    status->start_on_kick = true;
     status->disable_legacy_check = vdev->disable_legacy_check;
     status->bus_name = g_strdup(vdev->bus_name);
     status->use_guest_notifier_mask = vdev->use_guest_notifier_mask;
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f35178f5fcd..218584eae85 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2126,7 +2126,6 @@ void virtio_reset(void *opaque)
         k->reset(vdev);
     }
 
-    vdev->start_on_kick = false;
     vdev->started = false;
     vdev->broken = false;
     vdev->guest_features = 0;
@@ -2248,9 +2247,7 @@ static void virtio_queue_notify_vq(VirtQueue *vq)
         trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
         vq->handle_output(vdev, vq);
 
-        if (unlikely(vdev->start_on_kick)) {
-            virtio_set_started(vdev, true);
-        }
+        virtio_set_started(vdev, true);
     }
 }
 
@@ -2268,9 +2265,7 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
     } else if (vq->handle_output) {
         vq->handle_output(vdev, vq);
 
-        if (unlikely(vdev->start_on_kick)) {
-            virtio_set_started(vdev, true);
-        }
+        virtio_set_started(vdev, true);
     }
 }
 
@@ -2881,12 +2876,6 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
             }
         }
     }
-    if (!ret) {
-        if (!virtio_device_started(vdev, vdev->status) &&
-            !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
-            vdev->start_on_kick = true;
-        }
-    }
     return ret;
 }
 
@@ -3039,11 +3028,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
         }
     }
 
-    if (!virtio_device_started(vdev, vdev->status) &&
-        !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
-        vdev->start_on_kick = true;
-    }
-
     RCU_READ_LOCK_GUARD();
     for (i = 0; i < num; i++) {
         if (vdev->vq[i].vring.desc) {
@@ -3162,7 +3146,6 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
             g_malloc0(sizeof(*vdev->vector_queues) * nvectors);
     }
 
-    vdev->start_on_kick = false;
     vdev->started = false;
     vdev->vhost_started = false;
     vdev->device_id = device_id;
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 77c6c55929f..5742876b4fa 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -144,7 +144,6 @@ struct VirtIODevice
      */
     bool use_started;
     bool started;
-    bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
     bool disable_legacy_check;
     bool vhost_started;
     VMChangeStateEntry *vmstate;
@@ -460,10 +459,6 @@ static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status
 
 static inline void virtio_set_started(VirtIODevice *vdev, bool started)
 {
-    if (started) {
-        vdev->start_on_kick = false;
-    }
-
     if (vdev->use_started) {
         vdev->started = started;
     }
-- 
2.39.2
Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Raphael Norwitz 1 year ago
Hey Andrey - apologies for the late reply here.

It sounds like you are dealing with a buggy guest, rather than a QEMU issue.

> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin <arbn@yandex-team.com> wrote:
> 
> 
> 
> On 4/10/23 10:35, Andrey Ryabinin wrote:
>> Some guests hang on boot when using the vhost-user-blk-pci device,
>> but boot normally when using the virtio-blk device. The problem occurs
>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to

Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the device before setting DRIVER_OK.”

Therefore what you are describing is buggy guest behavior. Sounds like the driver should be made to either
- not advertise VIRTIO_F_VERSION_1
- not kick before setting VIRTIO_CONFIG_S_DRIVER_OK

If anything, the virtio-blk virtio_blk_handle_output() function should probably check start_on_kick?

>> be false in vhost_user_blk_handle_output() and preventing the device from
>> starting.
>> 
>> Fix this by removing the check for vdev->start_on_kick to ensure
>> that the device starts after the kick. This aligns the behavior of
>> 'vhost-user-blk-pci' device with 'virtio-blk' as it does the similar
>> thing in its virtio_blk_handle_output() function.
>> 
>> Fixes: 110b9463d5c8 ("vhost-user-blk: start vhost when guest kicks")
>> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
>> ---
>> hw/block/vhost-user-blk.c | 4 ----
>> 1 file changed, 4 deletions(-)
>> 
>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
>> index aff4d2b8cbd..448ead448f3 100644
>> --- a/hw/block/vhost-user-blk.c
>> +++ b/hw/block/vhost-user-blk.c
>> @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>>     Error *local_err = NULL;
>>     int i, ret;
>> 
>> -    if (!vdev->start_on_kick) {
>> -        return;
>> -    }
>> -
>>     if (!s->connected) {
>>         return;
>>     }
> 
> 
> After looking a bit closer to this ->start_on_kick thing ( commit badaf79cfdbd ("virtio: Introduce started flag to VirtioDevice")
> and follow ups) I'm starting to think that removing it entirely would be the right thing to do here.
> The whole reason for it was to add special case for !VIRTIO_F_VERSION_1 guests.

The virtio 1.0 spec section 2.1.2 explicitly says: "The device MUST NOT consume buffers or notify the driver before DRIVER_OK.”

Your change here would make QEMU violate this condition. I don’t know what the danger is but I assume that wording is there for a reason.

Unless MST or Cornellia (CCed) say otherwise I don’t think this is the correct approach.

> If we making start on kick thing for misbehaving VIRTIO_F_VERSION_1 guests too, than the flag is no longer required,
> so we can do following:
> 
> ---
> hw/block/vhost-user-blk.c  |  4 ----
> hw/virtio/virtio-qmp.c     |  2 +-
> hw/virtio/virtio.c         | 21 ++-------------------
> include/hw/virtio/virtio.h |  5 -----
> 4 files changed, 3 insertions(+), 29 deletions(-)
> 
> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> index aff4d2b8cbd..448ead448f3 100644
> --- a/hw/block/vhost-user-blk.c
> +++ b/hw/block/vhost-user-blk.c
> @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
>     Error *local_err = NULL;
>     int i, ret;
> 
> -    if (!vdev->start_on_kick) {
> -        return;
> -    }
> -
>     if (!s->connected) {
>         return;
>     }
> diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
> index e4d4bece2d7..4865819cd2f 100644
> --- a/hw/virtio/virtio-qmp.c
> +++ b/hw/virtio/virtio-qmp.c
> @@ -773,7 +773,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
>     status->disabled = vdev->disabled;
>     status->use_started = vdev->use_started;
>     status->started = vdev->started;
> -    status->start_on_kick = vdev->start_on_kick;
> +    status->start_on_kick = true;
>     status->disable_legacy_check = vdev->disable_legacy_check;
>     status->bus_name = g_strdup(vdev->bus_name);
>     status->use_guest_notifier_mask = vdev->use_guest_notifier_mask;
> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> index f35178f5fcd..218584eae85 100644
> --- a/hw/virtio/virtio.c
> +++ b/hw/virtio/virtio.c
> @@ -2126,7 +2126,6 @@ void virtio_reset(void *opaque)
>         k->reset(vdev);
>     }
> 
> -    vdev->start_on_kick = false;
>     vdev->started = false;
>     vdev->broken = false;
>     vdev->guest_features = 0;
> @@ -2248,9 +2247,7 @@ static void virtio_queue_notify_vq(VirtQueue *vq)
>         trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
>         vq->handle_output(vdev, vq);
> 
> -        if (unlikely(vdev->start_on_kick)) {
> -            virtio_set_started(vdev, true);
> -        }
> +        virtio_set_started(vdev, true);
>     }
> }
> 
> @@ -2268,9 +2265,7 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
>     } else if (vq->handle_output) {
>         vq->handle_output(vdev, vq);
> 
> -        if (unlikely(vdev->start_on_kick)) {
> -            virtio_set_started(vdev, true);
> -        }
> +        virtio_set_started(vdev, true);
>     }
> }
> 
> @@ -2881,12 +2876,6 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
>             }
>         }
>     }
> -    if (!ret) {
> -        if (!virtio_device_started(vdev, vdev->status) &&
> -            !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> -            vdev->start_on_kick = true;
> -        }
> -    }
>     return ret;
> }
> 
> @@ -3039,11 +3028,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
>         }
>     }
> 
> -    if (!virtio_device_started(vdev, vdev->status) &&
> -        !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> -        vdev->start_on_kick = true;
> -    }
> -
>     RCU_READ_LOCK_GUARD();
>     for (i = 0; i < num; i++) {
>         if (vdev->vq[i].vring.desc) {
> @@ -3162,7 +3146,6 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
>             g_malloc0(sizeof(*vdev->vector_queues) * nvectors);
>     }
> 
> -    vdev->start_on_kick = false;
>     vdev->started = false;
>     vdev->vhost_started = false;
>     vdev->device_id = device_id;
> diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> index 77c6c55929f..5742876b4fa 100644
> --- a/include/hw/virtio/virtio.h
> +++ b/include/hw/virtio/virtio.h
> @@ -144,7 +144,6 @@ struct VirtIODevice
>      */
>     bool use_started;
>     bool started;
> -    bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
>     bool disable_legacy_check;
>     bool vhost_started;
>     VMChangeStateEntry *vmstate;
> @@ -460,10 +459,6 @@ static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status
> 
> static inline void virtio_set_started(VirtIODevice *vdev, bool started)
> {
> -    if (started) {
> -        vdev->start_on_kick = false;
> -    }
> -
>     if (vdev->use_started) {
>         vdev->started = started;
>     }
> -- 
> 2.39.2

Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Andrey Ryabinin 1 year ago
On 4/18/23 07:13, Raphael Norwitz wrote:
> Hey Andrey - apologies for the late reply here.
> 
> It sounds like you are dealing with a buggy guest, rather than a QEMU issue.

No arguing here, the guest is buggy.
However, the issue with QEMU is that virtio-blk tolerate such buggy guest
while vhost-user-blk is not.
We've been using virtio-blk in our cloud for a while and recently started switching to vhost-user-blk
which led us to discover this problem.

>> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin <arbn@yandex-team.com> wrote:
>>
>>
>>
>> On 4/10/23 10:35, Andrey Ryabinin wrote:
>>> Some guests hang on boot when using the vhost-user-blk-pci device,
>>> but boot normally when using the virtio-blk device. The problem occurs
>>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
>>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> 
> Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the device before setting DRIVER_OK.”
> 
> Therefore what you are describing is buggy guest behavior. Sounds like the driver should be made to either
> - not advertise VIRTIO_F_VERSION_1
> - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
> 
> If anything, the virtio-blk virtio_blk_handle_output() function should probably check start_on_kick?
> 

Ideally this should have been done from the start. But if we do it now we'll just break these guests.


Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Michael S. Tsirkin 1 year ago
On Tue, Apr 18, 2023 at 06:37:04PM +0200, Andrey Ryabinin wrote:
> On 4/18/23 07:13, Raphael Norwitz wrote:
> > Hey Andrey - apologies for the late reply here.
> > 
> > It sounds like you are dealing with a buggy guest, rather than a QEMU issue.
> 
> No arguing here, the guest is buggy.
> However, the issue with QEMU is that virtio-blk tolerate such buggy guest
> while vhost-user-blk is not.
> We've been using virtio-blk in our cloud for a while and recently started switching to vhost-user-blk
> which led us to discover this problem.
> 
> >> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin <arbn@yandex-team.com> wrote:
> >>
> >>
> >>
> >> On 4/10/23 10:35, Andrey Ryabinin wrote:
> >>> Some guests hang on boot when using the vhost-user-blk-pci device,
> >>> but boot normally when using the virtio-blk device. The problem occurs
> >>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
> >>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> > 
> > Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the device before setting DRIVER_OK.”
> > 
> > Therefore what you are describing is buggy guest behavior. Sounds like the driver should be made to either
> > - not advertise VIRTIO_F_VERSION_1
> > - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
> > 
> > If anything, the virtio-blk virtio_blk_handle_output() function should probably check start_on_kick?
> > 
> 
> Ideally this should have been done from the start. But if we do it now we'll just break these guests.

The problem with hacks like this is the problem proliferates.  What are
those guests and how hard are they to fix?

-- 
MST


Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Michael S. Tsirkin 1 year ago
On Tue, Apr 18, 2023 at 05:13:11AM +0000, Raphael Norwitz wrote:
> Hey Andrey - apologies for the late reply here.
> 
> It sounds like you are dealing with a buggy guest, rather than a QEMU issue.
> 
> > On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin <arbn@yandex-team.com> wrote:
> > 
> > 
> > 
> > On 4/10/23 10:35, Andrey Ryabinin wrote:
> >> Some guests hang on boot when using the vhost-user-blk-pci device,
> >> but boot normally when using the virtio-blk device. The problem occurs
> >> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
> >> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> 
> Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the device before setting DRIVER_OK.”
> 
> Therefore what you are describing is buggy guest behavior. Sounds like the driver should be made to either
> - not advertise VIRTIO_F_VERSION_1
> - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
> 
> If anything, the virtio-blk virtio_blk_handle_output() function should probably check start_on_kick?

Question is, how easy is this guest to fix.



> >> be false in vhost_user_blk_handle_output() and preventing the device from
> >> starting.
> >> 
> >> Fix this by removing the check for vdev->start_on_kick to ensure
> >> that the device starts after the kick. This aligns the behavior of
> >> 'vhost-user-blk-pci' device with 'virtio-blk' as it does the similar
> >> thing in its virtio_blk_handle_output() function.
> >> 
> >> Fixes: 110b9463d5c8 ("vhost-user-blk: start vhost when guest kicks")
> >> Signed-off-by: Andrey Ryabinin <arbn@yandex-team.com>
> >> ---
> >> hw/block/vhost-user-blk.c | 4 ----
> >> 1 file changed, 4 deletions(-)
> >> 
> >> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> >> index aff4d2b8cbd..448ead448f3 100644
> >> --- a/hw/block/vhost-user-blk.c
> >> +++ b/hw/block/vhost-user-blk.c
> >> @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >>     Error *local_err = NULL;
> >>     int i, ret;
> >> 
> >> -    if (!vdev->start_on_kick) {
> >> -        return;
> >> -    }
> >> -
> >>     if (!s->connected) {
> >>         return;
> >>     }
> > 
> > 
> > After looking a bit closer to this ->start_on_kick thing ( commit badaf79cfdbd ("virtio: Introduce started flag to VirtioDevice")
> > and follow ups) I'm starting to think that removing it entirely would be the right thing to do here.
> > The whole reason for it was to add special case for !VIRTIO_F_VERSION_1 guests.
> 
> The virtio 1.0 spec section 2.1.2 explicitly says: "The device MUST NOT consume buffers or notify the driver before DRIVER_OK.”
> 
> Your change here would make QEMU violate this condition. I don’t know what the danger is but I assume that wording is there for a reason.
> 
> Unless MST or Cornellia (CCed) say otherwise I don’t think this is the correct approach.

If we propagate HV workarounds then guests do not get fixed.


> > If we making start on kick thing for misbehaving VIRTIO_F_VERSION_1 guests too, than the flag is no longer required,
> > so we can do following:
> > 
> > ---
> > hw/block/vhost-user-blk.c  |  4 ----
> > hw/virtio/virtio-qmp.c     |  2 +-
> > hw/virtio/virtio.c         | 21 ++-------------------
> > include/hw/virtio/virtio.h |  5 -----
> > 4 files changed, 3 insertions(+), 29 deletions(-)
> > 
> > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > index aff4d2b8cbd..448ead448f3 100644
> > --- a/hw/block/vhost-user-blk.c
> > +++ b/hw/block/vhost-user-blk.c
> > @@ -279,10 +279,6 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> >     Error *local_err = NULL;
> >     int i, ret;
> > 
> > -    if (!vdev->start_on_kick) {
> > -        return;
> > -    }
> > -
> >     if (!s->connected) {
> >         return;
> >     }
> > diff --git a/hw/virtio/virtio-qmp.c b/hw/virtio/virtio-qmp.c
> > index e4d4bece2d7..4865819cd2f 100644
> > --- a/hw/virtio/virtio-qmp.c
> > +++ b/hw/virtio/virtio-qmp.c
> > @@ -773,7 +773,7 @@ VirtioStatus *qmp_x_query_virtio_status(const char *path, Error **errp)
> >     status->disabled = vdev->disabled;
> >     status->use_started = vdev->use_started;
> >     status->started = vdev->started;
> > -    status->start_on_kick = vdev->start_on_kick;
> > +    status->start_on_kick = true;
> >     status->disable_legacy_check = vdev->disable_legacy_check;
> >     status->bus_name = g_strdup(vdev->bus_name);
> >     status->use_guest_notifier_mask = vdev->use_guest_notifier_mask;
> > diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
> > index f35178f5fcd..218584eae85 100644
> > --- a/hw/virtio/virtio.c
> > +++ b/hw/virtio/virtio.c
> > @@ -2126,7 +2126,6 @@ void virtio_reset(void *opaque)
> >         k->reset(vdev);
> >     }
> > 
> > -    vdev->start_on_kick = false;
> >     vdev->started = false;
> >     vdev->broken = false;
> >     vdev->guest_features = 0;
> > @@ -2248,9 +2247,7 @@ static void virtio_queue_notify_vq(VirtQueue *vq)
> >         trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
> >         vq->handle_output(vdev, vq);
> > 
> > -        if (unlikely(vdev->start_on_kick)) {
> > -            virtio_set_started(vdev, true);
> > -        }
> > +        virtio_set_started(vdev, true);
> >     }
> > }
> > 
> > @@ -2268,9 +2265,7 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
> >     } else if (vq->handle_output) {
> >         vq->handle_output(vdev, vq);
> > 
> > -        if (unlikely(vdev->start_on_kick)) {
> > -            virtio_set_started(vdev, true);
> > -        }
> > +        virtio_set_started(vdev, true);
> >     }
> > }
> > 
> > @@ -2881,12 +2876,6 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
> >             }
> >         }
> >     }
> > -    if (!ret) {
> > -        if (!virtio_device_started(vdev, vdev->status) &&
> > -            !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > -            vdev->start_on_kick = true;
> > -        }
> > -    }
> >     return ret;
> > }
> > 
> > @@ -3039,11 +3028,6 @@ int virtio_load(VirtIODevice *vdev, QEMUFile *f, int version_id)
> >         }
> >     }
> > 
> > -    if (!virtio_device_started(vdev, vdev->status) &&
> > -        !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> > -        vdev->start_on_kick = true;
> > -    }
> > -
> >     RCU_READ_LOCK_GUARD();
> >     for (i = 0; i < num; i++) {
> >         if (vdev->vq[i].vring.desc) {
> > @@ -3162,7 +3146,6 @@ void virtio_init(VirtIODevice *vdev, uint16_t device_id, size_t config_size)
> >             g_malloc0(sizeof(*vdev->vector_queues) * nvectors);
> >     }
> > 
> > -    vdev->start_on_kick = false;
> >     vdev->started = false;
> >     vdev->vhost_started = false;
> >     vdev->device_id = device_id;
> > diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
> > index 77c6c55929f..5742876b4fa 100644
> > --- a/include/hw/virtio/virtio.h
> > +++ b/include/hw/virtio/virtio.h
> > @@ -144,7 +144,6 @@ struct VirtIODevice
> >      */
> >     bool use_started;
> >     bool started;
> > -    bool start_on_kick; /* when virtio 1.0 feature has not been negotiated */
> >     bool disable_legacy_check;
> >     bool vhost_started;
> >     VMChangeStateEntry *vmstate;
> > @@ -460,10 +459,6 @@ static inline bool virtio_device_should_start(VirtIODevice *vdev, uint8_t status
> > 
> > static inline void virtio_set_started(VirtIODevice *vdev, bool started)
> > {
> > -    if (started) {
> > -        vdev->start_on_kick = false;
> > -    }
> > -
> >     if (vdev->use_started) {
> >         vdev->started = started;
> >     }
> > -- 
> > 2.39.2
> 


Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Andrey Ryabinin 1 year ago

On 4/18/23 08:17, Michael S. Tsirkin wrote:
> On Tue, Apr 18, 2023 at 05:13:11AM +0000, Raphael Norwitz wrote:
>> Hey Andrey - apologies for the late reply here.
>>
>> It sounds like you are dealing with a buggy guest, rather than a QEMU issue.
>>
>>> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin <arbn@yandex-team.com> wrote:
>>>
>>>
>>>
>>> On 4/10/23 10:35, Andrey Ryabinin wrote:
>>>> Some guests hang on boot when using the vhost-user-blk-pci device,
>>>> but boot normally when using the virtio-blk device. The problem occurs
>>>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
>>>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
>>
>> Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the device before setting DRIVER_OK.”
>>
>> Therefore what you are describing is buggy guest behavior. Sounds like the driver should be made to either
>> - not advertise VIRTIO_F_VERSION_1
>> - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
>>
>> If anything, the virtio-blk virtio_blk_handle_output() function should probably check start_on_kick?
> 
> Question is, how easy is this guest to fix.
> 

I wouldn't count on that.

In this case the guest is Foritgate firewall, apparently from this guys https://www.fortinet.com/
It seems that the kernel they use claims itself as 3.2.16 Linux kernel, however it looks like
it's not vanilla kernel, but modified with some backports. I'm guessing that they backported
the patches introducing VIRTIO_F_VERSTION_1, but they didn't add this patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a11370e5e6c26566904bb7f08281093a3002ff2

I've tried to look the sources of the kernel they use but failed to find any.
Found only some news about gpl voilation from 2005 )


Re: [PATCH] block/vhost-user-blk: Fix hang on boot for some odd guests
Posted by Michael S. Tsirkin 1 year ago
On Tue, Apr 18, 2023 at 07:20:29PM +0200, Andrey Ryabinin wrote:
> 
> 
> On 4/18/23 08:17, Michael S. Tsirkin wrote:
> > On Tue, Apr 18, 2023 at 05:13:11AM +0000, Raphael Norwitz wrote:
> >> Hey Andrey - apologies for the late reply here.
> >>
> >> It sounds like you are dealing with a buggy guest, rather than a QEMU issue.
> >>
> >>> On Apr 10, 2023, at 11:39 AM, Andrey Ryabinin <arbn@yandex-team.com> wrote:
> >>>
> >>>
> >>>
> >>> On 4/10/23 10:35, Andrey Ryabinin wrote:
> >>>> Some guests hang on boot when using the vhost-user-blk-pci device,
> >>>> but boot normally when using the virtio-blk device. The problem occurs
> >>>> because the guest advertises VIRTIO_F_VERSION_1 but kicks the virtqueue
> >>>> before setting VIRTIO_CONFIG_S_DRIVER_OK, causing vdev->start_on_kick to
> >>
> >> Virtio 1.1 Section 3.1.1, says during setup “[t]he driver MUST NOT notify the device before setting DRIVER_OK.”
> >>
> >> Therefore what you are describing is buggy guest behavior. Sounds like the driver should be made to either
> >> - not advertise VIRTIO_F_VERSION_1
> >> - not kick before setting VIRTIO_CONFIG_S_DRIVER_OK
> >>
> >> If anything, the virtio-blk virtio_blk_handle_output() function should probably check start_on_kick?
> > 
> > Question is, how easy is this guest to fix.
> > 
> 
> I wouldn't count on that.
> 
> In this case the guest is Foritgate firewall, apparently from this guys https://www.fortinet.com/
> It seems that the kernel they use claims itself as 3.2.16 Linux kernel, however it looks like
> it's not vanilla kernel, but modified with some backports. I'm guessing that they backported
> the patches introducing VIRTIO_F_VERSTION_1, but they didn't add this patch https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a11370e5e6c26566904bb7f08281093a3002ff2
> 
> I've tried to look the sources of the kernel they use but failed to find any.
> Found only some news about gpl voilation from 2005 )

All I remember is this is a can of worms, the change seems risky
frankly. If you like, I'm open to an unstable property
(starting with "x-") that forces start on kick.
Will be handy for debugging too.


-- 
MST