[PATCH v1 0/2] virtio: fix for assertion failure: virtio_net_get_subqueue(nc)->async_tx.elem failed

Xuan Zhuo posted 2 patches 1 year, 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20230129025150.119972-1-xuanzhuo@linux.alibaba.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>
There is a newer version of this series
hw/net/virtio-net.c        |  3 ++-
hw/virtio/virtio.c         | 15 +++++++++++++++
include/hw/virtio/virtio.h |  1 +
3 files changed, 18 insertions(+), 1 deletion(-)
[PATCH v1 0/2] virtio: fix for assertion failure: virtio_net_get_subqueue(nc)->async_tx.elem failed
Posted by Xuan Zhuo 1 year, 3 months ago
In the current design, we stop the device from operating on the vring
during per-queue reset by resetting the structure VirtQueue.

But before the reset operation, when recycling some resources, we should
stop referencing new vring resources.

This bug is caused by this reason.

    https://gitlab.com/qemu-project/qemu/-/issues/1451

Before we reset the structure, we called the ->queue_reset callback to let the
device reclaim resources. Here virtio-net tries to release the packets sent
asynchronously, but during this process virtio_net_flush_tx() will be called,
and new data will be sent again. This leads to asserted.

     assert(!virtio_net_get_subqueue(nc)->async_tx.elem);

This patch set introduce new item "reset" into struct VirtQueue, then device can
know this virtqueue is per-queue reset state.

v1:
    1. rename "reset" to disabled_by_reset
    2. add api: virtio_queue_reset_state()

Xuan Zhuo (2):
  virtio: struct VirtQueue introduce reset
  virtio-net: virtio_net_flush_tx() check for per-queue reset

 hw/net/virtio-net.c        |  3 ++-
 hw/virtio/virtio.c         | 15 +++++++++++++++
 include/hw/virtio/virtio.h |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)

--
2.32.0.3.g01195cf9f
Re: [PATCH v1 0/2] virtio: fix for assertion failure: virtio_net_get_subqueue(nc)->async_tx.elem failed
Posted by Michael S. Tsirkin 1 year, 3 months ago
On Sun, Jan 29, 2023 at 10:51:48AM +0800, Xuan Zhuo wrote:
> In the current design, we stop the device from operating on the vring
> during per-queue reset by resetting the structure VirtQueue.
> 
> But before the reset operation, when recycling some resources, we should
> stop referencing new vring resources.
> 
> This bug is caused by this reason.
> 
>     https://gitlab.com/qemu-project/qemu/-/issues/1451
> 
> Before we reset the structure, we called the ->queue_reset callback to let the
> device reclaim resources. Here virtio-net tries to release the packets sent
> asynchronously, but during this process virtio_net_flush_tx() will be called,
> and new data will be sent again. This leads to asserted.
> 
>      assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> 
> This patch set introduce new item "reset" into struct VirtQueue, then device can
> know this virtqueue is per-queue reset state.

Better but I still don't exacctly understand what this state means.
Sent some questions on the patches themselves.
Thanks!


> v1:
>     1. rename "reset" to disabled_by_reset
>     2. add api: virtio_queue_reset_state()
> 
> Xuan Zhuo (2):
>   virtio: struct VirtQueue introduce reset
>   virtio-net: virtio_net_flush_tx() check for per-queue reset
> 
>  hw/net/virtio-net.c        |  3 ++-
>  hw/virtio/virtio.c         | 15 +++++++++++++++
>  include/hw/virtio/virtio.h |  1 +
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> --
> 2.32.0.3.g01195cf9f