[PATCH v3 0/2] vhost-user: Support vhost_dev_start

Yajun Wu posted 2 patches 1 year, 7 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20221017064452.1226514-1-yajunw@nvidia.com
Maintainers: Raphael Norwitz <raphael.norwitz@nutanix.com>, "Michael S. Tsirkin" <mst@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Jason Wang <jasowang@redhat.com>
There is a newer version of this series
[PATCH v3 0/2] vhost-user: Support vhost_dev_start
Posted by Yajun Wu 1 year, 7 months ago
The motivation for adding vhost-user vhost_dev_start support is to
speed up backend configuration and reduce VM downtime during live
migration.

Today, VQ configuration is issued one queue at a time. For virtio-net
with multi-queue support, the backend needs to update RSS (Receive
Side Scaling) on every rx queue enable, and each RSS update is
time-consuming (typically around 7 ms).
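With tens of rx queues this incremental cost alone adds up to
hundreds of milliseconds of downtime (for example, 32 rx queues x 7 ms
is roughly 224 ms).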

Implement the status protocol feature and messages already defined in
the vhost-user specification [1]:
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for device
start, and with 0 (reset) for device stop.
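As an illustration, the whole message is a 12-byte header plus a u64
payload (a minimal sketch, not code from this series; the request code
and message layout follow the spec, while the helper name and the
pre-connected socket fd are assumptions):

/* Minimal sketch: send VHOST_USER_SET_STATUS over an already-connected
 * vhost-user socket. Request code and message layout follow the spec;
 * send_set_status() itself is an illustrative helper, not QEMU code. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define VHOST_USER_SET_STATUS     39    /* request code from the spec */
#define VHOST_USER_VERSION        0x1   /* flags: protocol version 1 */
#define VIRTIO_CONFIG_S_DRIVER_OK 4

struct vhost_user_hdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;                      /* payload bytes after header */
};

static int send_set_status(int sock_fd, uint64_t status)
{
    struct vhost_user_hdr hdr = {
        .request = VHOST_USER_SET_STATUS,
        .flags   = VHOST_USER_VERSION,
        .size    = sizeof(uint64_t),
    };
    uint8_t buf[sizeof(hdr) + sizeof(uint64_t)];
    ssize_t n;

    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), &status, sizeof(status));
    n = write(sock_fd, buf, sizeof(buf));
    return n == (ssize_t)sizeof(buf) ? 0 : -1;
}

/* device start: send_set_status(fd, VIRTIO_CONFIG_S_DRIVER_OK);
 * device stop:  send_set_status(fd, 0); */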

On reception of DRIVER_OK, the backend can apply the needed settings
once (instead of incrementally) and can also enable its queues in
parallel.
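
For example, a backend's status handler could look roughly like this
(purely illustrative: every name here is an assumption, and pthreads
merely stands in for whatever worker model the backend uses):

/* Illustrative backend sketch: on DRIVER_OK, program RSS once instead
 * of once per rx queue enable, and bring the queues up in parallel. */
#include <pthread.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4

struct queue { int idx; };

static void *enable_queue(void *arg)
{
    struct queue *q = arg;
    /* ... program the datapath for queue q->idx ... */
    (void)q;
    return NULL;
}

static void program_rss_once(void)
{
    /* ... the single ~7 ms RSS update ... */
}

static void on_set_status(uint64_t status, struct queue *qs, int n)
{
    if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
        pthread_t tid[n];

        program_rss_once();                  /* once, not per queue */
        for (int i = 0; i < n; i++) {
            pthread_create(&tid[i], NULL, enable_queue, &qs[i]);
        }
        for (int i = 0; i < n; i++) {
            pthread_join(tid[i], NULL);
        }
    }
}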

This improves QEMU's live migration downtime with a vhost-user
backend by a large margin: with 64 VQs, downtime drops from 800 ms to
250 ms.

Another change is to move the device start routines to after all the
necessary device and VQ configuration has finished, further aligning
with the virtio specification's "device initialization sequence".
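
In outline, the reordered start path finishes every configuration
step before announcing DRIVER_OK (the helper names below are
assumptions, not the actual QEMU functions):

/* Sketch of the intended ordering, per the virtio "device
 * initialization sequence": configure everything, start last. */
#include <stdint.h>

#define VIRTIO_CONFIG_S_ACKNOWLEDGE 1
#define VIRTIO_CONFIG_S_DRIVER      2
#define VIRTIO_CONFIG_S_DRIVER_OK   4

int negotiate_features(void);   /* ends with FEATURES_OK */
int set_mem_table(void);
int configure_vring(int idx);   /* addr/base/kick/call for one VQ */
int set_status(uint8_t status);

int start_device(int nvqs)
{
    int r;

    if ((r = negotiate_features()) < 0) {
        return r;
    }
    if ((r = set_mem_table()) < 0) {
        return r;
    }
    for (int i = 0; i < nvqs; i++) {
        if ((r = configure_vring(i)) < 0) {
            return r;
        }
    }
    /* Only now, with all VQs fully configured, start the device. */
    return set_status(VIRTIO_CONFIG_S_ACKNOWLEDGE |
                      VIRTIO_CONFIG_S_DRIVER |
                      VIRTIO_CONFIG_S_DRIVER_OK);
}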

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction

v3:
- rebase

v2:
- also set the VIRTIO_CONFIG_S_FEATURES_OK status bit
- avoid re-sending status bits that are already set (see the sketch
  after this list)
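
The second v2 item is a read-modify-write of the device status; a
minimal sketch (the helper names are assumptions wrapping the
GET_STATUS/SET_STATUS messages):

#include <stdint.h>

int get_status(uint8_t *out);   /* wraps VHOST_USER_GET_STATUS */
int set_status(uint8_t status); /* wraps VHOST_USER_SET_STATUS */

int add_status_bits(uint8_t bits)
{
    uint8_t cur;
    int r = get_status(&cur);

    if (r < 0) {
        return r;
    }
    if ((cur | bits) == cur) {
        return 0;               /* already set: skip the message */
    }
    return set_status(cur | bits);
}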

Yajun Wu (2):
  vhost: Change the sequence of device start
  vhost-user: Support vhost_dev_start

 hw/block/vhost-user-blk.c | 18 ++++++----
 hw/net/vhost_net.c        | 12 +++----
 hw/virtio/vhost-user.c    | 74 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 90 insertions(+), 14 deletions(-)

-- 
2.27.0
Re: [PATCH v3 0/2] vhost-user: Support vhost_dev_start
Posted by Michael S. Tsirkin 1 year, 6 months ago
On Mon, Oct 17, 2022 at 02:44:50PM +0800, Yajun Wu wrote:
> The motivation for adding vhost-user vhost_dev_start support is to
> speed up backend configuration and reduce VM downtime during live
> migration.
> 
> Today, VQ configuration is issued one queue at a time. For virtio-net
> with multi-queue support, the backend needs to update RSS (Receive
> Side Scaling) on every rx queue enable, and each RSS update is
> time-consuming (typically around 7 ms).
> 
> Implement the status protocol feature and messages already defined in
> the vhost-user specification [1]:
> (a) VHOST_USER_PROTOCOL_F_STATUS
> (b) VHOST_USER_SET_STATUS
> (c) VHOST_USER_GET_STATUS
> 
> Send VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for device
> start, and with 0 (reset) for device stop.
> 
> On reception of DRIVER_OK, the backend can apply the needed settings
> once (instead of incrementally) and can also enable its queues in
> parallel.
> 
> This improves QEMU's live migration downtime with a vhost-user
> backend by a large margin: with 64 VQs, downtime drops from 800 ms to
> 250 ms.
> 
> Another change is to move the device start routines to after all the
> necessary device and VQ configuration has finished, further aligning
> with the virtio specification's "device initialization sequence".
> 
> [1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction


This patchset seems to trip up ubsan.
https://gitlab.com/mitsirkin/qemu/-/pipelines/684763327/failures


specifically:

passes before this patchset:
https://gitlab.com/mitsirkin/qemu/-/jobs/3269302594

fails with this patchset:
https://gitlab.com/mitsirkin/qemu/-/pipelines/684763327/failures

(there's one patch on top but it seems unrelated)


Seems hard to debug: I only reproduced it under GitLab, though Stefan
reports reproducing it locally.
I'm thinking of dropping this patchset for now and deferring it to the
next release - thoughts?



> v3:
> - rebase
> 
> v2:
> - also set the VIRTIO_CONFIG_S_FEATURES_OK status bit
> - avoid re-sending status bits that are already set
> 
> Yajun Wu (2):
>   vhost: Change the sequence of device start
>   vhost-user: Support vhost_dev_start
> 
>  hw/block/vhost-user-blk.c | 18 ++++++----
>  hw/net/vhost_net.c        | 12 +++----
>  hw/virtio/vhost-user.c    | 74 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 90 insertions(+), 14 deletions(-)
> 
> -- 
> 2.27.0
Re: [PATCH v3 0/2] vhost-user: Support vhost_dev_start
Posted by Michael S. Tsirkin 1 year, 6 months ago
On Fri, Nov 04, 2022 at 03:11:44AM -0400, Michael S. Tsirkin wrote:
> On Mon, Oct 17, 2022 at 02:44:50PM +0800, Yajun Wu wrote:
> > The motivation for adding vhost-user vhost_dev_start support is to
> > speed up backend configuration and reduce VM downtime during live
> > migration.
> > 
> > Today, VQ configuration is issued one queue at a time. For virtio-net
> > with multi-queue support, the backend needs to update RSS (Receive
> > Side Scaling) on every rx queue enable, and each RSS update is
> > time-consuming (typically around 7 ms).
> > 
> > Implement the status protocol feature and messages already defined in
> > the vhost-user specification [1]:
> > (a) VHOST_USER_PROTOCOL_F_STATUS
> > (b) VHOST_USER_SET_STATUS
> > (c) VHOST_USER_GET_STATUS
> > 
> > Send VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for device
> > start, and with 0 (reset) for device stop.
> > 
> > On reception of DRIVER_OK, the backend can apply the needed settings
> > once (instead of incrementally) and can also enable its queues in
> > parallel.
> > 
> > This improves QEMU's live migration downtime with a vhost-user
> > backend by a large margin: with 64 VQs, downtime drops from 800 ms to
> > 250 ms.
> > 
> > Another change is to move the device start routines to after all the
> > necessary device and VQ configuration has finished, further aligning
> > with the virtio specification's "device initialization sequence".
> > 
> > [1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#introduction
> 
> 
> This patchset seems to trip up ubsan.
> https://gitlab.com/mitsirkin/qemu/-/pipelines/684763327/failures
> 
> 
> specifically:
> 
> passes before this patchset:
> https://gitlab.com/mitsirkin/qemu/-/jobs/3269302594
> 
> fails with this patchset:
> https://gitlab.com/mitsirkin/qemu/-/pipelines/684763327/failures
> 
> (there's one patch on top but it seems unrelated)
> 

And this is with just this patchset:
https://gitlab.com/mitsirkin/qemu/-/pipelines/685419082/failures
in fact it's just the 1st patch:
https://gitlab.com/mitsirkin/qemu/-/pipelines/685417959/failures

log here:
https://gitlab.com/mitsirkin/qemu/-/jobs/3273601845

qemu-system-arm: Failed to write msg. Wrote -1 instead of 20.
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
qemu-system-arm: Failed to set msg fds.
qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
qemu-system-arm: -chardev socket,id=chr-reconnect,path=/tmp/vhost-test-U5G0U1/reconnect.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-U5G0U1/reconnect.sock,server=on
qemu-system-arm: Failed to write msg. Wrote -1 instead of 20.
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
qemu-system-arm: Failed to set msg fds.
qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
qemu-system-arm: -chardev socket,id=chr-connect-fail,path=/tmp/vhost-test-CN7ZU1/connect-fail.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-CN7ZU1/connect-fail.sock,server=on
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: Failed to read msg header. Read 0 instead of 12. Original request 1.
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: vhost_backend_init failed: Protocol error
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: failed to init vhost_net for queue 0
qemu-system-arm: -netdev vhost-user,id=hs0,chardev=chr-connect-fail,vhostforce=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-CN7ZU1/connect-fail.sock,server=on
qemu-system-arm: Failed to write msg. Wrote -1 instead of 20.
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
qemu-system-arm: Failed to set msg fds.
qemu-system-arm: vhost VQ 1 ring restore failed: -22: Invalid argument (22)
qemu-system-arm: -chardev socket,id=chr-flags-mismatch,path=/tmp/vhost-test-4KRYU1/flags-mismatch.sock,server=on: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhost-test-4KRYU1/flags-mismatch.sock,server=on
qemu-system-arm: Failed to write msg. Wrote -1 instead of 52.
qemu-system-arm: vhost_set_mem_table failed: Invalid argument (22)
qemu-system-arm: Failed to set msg fds.
qemu-system-arm: vhost VQ 0 ring restore failed: -22: Invalid argument (22)
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==8621==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000000 (pc 0x55c91aca0620 bp 0x000000000000 sp 0x7ffd150b8600 T8621)
==8621==The signal is caused by a READ memory access.
==8621==Hint: address points to the zero page.
    #0 0x55c91aca0620 in ldl_he_p /builds/mitsirkin/qemu/include/qemu/bswap.h:301:5
    #1 0x55c91aca0620 in ldn_he_p /builds/mitsirkin/qemu/include/qemu/bswap.h:440:1
    #2 0x55c91aca0620 in flatview_write_continue /builds/mitsirkin/qemu/build/../softmmu/physmem.c:2824:19
    #3 0x55c91ac9da91 in flatview_write /builds/mitsirkin/qemu/build/../softmmu/physmem.c:2867:12
    #4 0x55c91ac9da91 in address_space_write /builds/mitsirkin/qemu/build/../softmmu/physmem.c:2963:18
    #5 0x55c91ac9e857 in address_space_unmap /builds/mitsirkin/qemu/build/../softmmu/physmem.c:3306:9
    #6 0x55c91ac461ec in vhost_memory_unmap /builds/mitsirkin/qemu/build/../hw/virtio/vhost.c:342:9
    #7 0x55c91ac461ec in vhost_virtqueue_stop /builds/mitsirkin/qemu/build/../hw/virtio/vhost.c:1242:5
    #8 0x55c91ac46484 in vhost_dev_stop /builds/mitsirkin/qemu/build/../hw/virtio/vhost.c:1882:9
    #9 0x55c91a664cc4 in vhost_net_stop_one /builds/mitsirkin/qemu/build/../hw/net/vhost_net.c:329:5
    #10 0x55c91a6646a6 in vhost_net_start /builds/mitsirkin/qemu/build/../hw/net/vhost_net.c:402:13
    #11 0x55c91abdfda6 in virtio_net_vhost_status /builds/mitsirkin/qemu/build/../hw/net/virtio-net.c:297:13
    #12 0x55c91abdfda6 in virtio_net_set_status /builds/mitsirkin/qemu/build/../hw/net/virtio-net.c:378:5
    #13 0x55c91ac322a9 in virtio_set_status /builds/mitsirkin/qemu/build/../hw/virtio/virtio.c:2442:9
    #14 0x55c91a7f69a0 in virtio_mmio_write /builds/mitsirkin/qemu/build/../hw/virtio/virtio-mmio.c:428:9
    #15 0x55c91ac882b6 in memory_region_write_accessor /builds/mitsirkin/qemu/build/../softmmu/memory.c:492:5
    #16 0x55c91ac8809a in access_with_adjusted_size /builds/mitsirkin/qemu/build/../softmmu/memory.c:554:18
    #17 0x55c91ac87e3d in memory_region_dispatch_write /builds/mitsirkin/qemu/build/../softmmu/memory.c
    #18 0x55c91aca0661 in flatview_write_continue /builds/mitsirkin/qemu/build/../softmmu/physmem.c:2825:23
    #19 0x55c91ac9da91 in flatview_write /builds/mitsirkin/qemu/build/../softmmu/physmem.c:2867:12
    #20 0x55c91ac9da91 in address_space_write /builds/mitsirkin/qemu/build/../softmmu/physmem.c:2963:18
    #21 0x55c91aca4766 in qtest_process_command /builds/mitsirkin/qemu/build/../softmmu/qtest.c
    #22 0x55c91aca3bfd in qtest_process_inbuf /builds/mitsirkin/qemu/build/../softmmu/qtest.c:796:9
    #23 0x55c91aedd672 in tcp_chr_read /builds/mitsirkin/qemu/build/../chardev/char-socket.c:508:13
    #24 0x7f199db120ae in g_main_context_dispatch (/lib64/libglib-2.0.so.0+0x550ae)
    #25 0x55c91afcfd2c in glib_pollfds_poll /builds/mitsirkin/qemu/build/../util/main-loop.c:297:9
    #26 0x55c91afcfd2c in os_host_main_loop_wait /builds/mitsirkin/qemu/build/../util/main-loop.c:320:5
    #27 0x55c91afcfd2c in main_loop_wait /builds/mitsirkin/qemu/build/../util/main-loop.c:596:11
    #28 0x55c91a826d36 in qemu_main_loop /builds/mitsirkin/qemu/build/../softmmu/runstate.c:739:9
    #29 0x55c91a3df4e5 in qemu_default_main /builds/mitsirkin/qemu/build/../softmmu/main.c:37:14
    #30 0x7f199b85aeaf in __libc_start_call_main (/lib64/libc.so.6+0x3feaf)
    #31 0x7f199b85af5f in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x3ff5f)
    #32 0x55c91a3b6084 in _start (/builds/mitsirkin/qemu/build/qemu-system-arm+0xc16084)
UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV /builds/mitsirkin/qemu/include/qemu/bswap.h:301:5 in ldl_he_p
==8621==ABORTING
Broken pipe
../tests/qtest/libqtest.c:179: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0)
**
ERROR:../tests/qtest/qos-test.c:191:subprocess_run_one_test: child process (/arm/virt/virtio-mmio/virtio-bus/virtio-net-device/virtio-net/virtio-net-tests/vhost-user/flags-mismatch/subprocess [8612]) failed unexpectedly
(test program exited with status code -6)

> Seems hard to debug: I only reproduced it under GitLab, though Stefan
> reports reproducing it locally.
> I'm thinking of dropping this patchset for now and deferring it to the
> next release - thoughts?
> 
> 
> 
> > v3:
> > - rebase
> > 
> > v2:
> > - also set the VIRTIO_CONFIG_S_FEATURES_OK status bit
> > - avoid re-sending status bits that are already set
> > 
> > Yajun Wu (2):
> >   vhost: Change the sequence of device start
> >   vhost-user: Support vhost_dev_start
> > 
> >  hw/block/vhost-user-blk.c | 18 ++++++----
> >  hw/net/vhost_net.c        | 12 +++----
> >  hw/virtio/vhost-user.c    | 74 ++++++++++++++++++++++++++++++++++++++-
> >  3 files changed, 90 insertions(+), 14 deletions(-)
> > 
> > -- 
> > 2.27.0