[PATCH v1 0/7] vhost-user-blk: fix the migration issue and enhance qtests

Dima Stepanov posted 7 patches 3 years, 9 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/cover.1596536559.git.dimastep@yandex-team.ru
Maintainers: Laurent Vivier <lvivier@redhat.com>, Raphael Norwitz <raphael.norwitz@nutanix.com>, Max Reitz <mreitz@redhat.com>, Thomas Huth <thuth@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
hw/block/vhost-user-blk.c          |  13 +-
hw/virtio/vhost.c                  |  39 ++++-
include/hw/virtio/vhost-user-blk.h |   1 +
tests/qtest/libqos/virtio-blk.c    |  14 ++
tests/qtest/vhost-user-test.c      | 291 +++++++++++++++++++++++++++++++------
5 files changed, 311 insertions(+), 47 deletions(-)
[PATCH v1 0/7] vhost-user-blk: fix the migration issue and enhance qtests
Posted by Dima Stepanov 3 years, 9 months ago
Reference e-mail threads:
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg01509.html
  - https://lists.gnu.org/archive/html/qemu-devel/2020-05/msg05241.html

If a vhost-user daemon is used as the backend for a vhost device, we have
to allow for a disconnect at any moment. This raised a general question:
should a disconnect during migration be treated as an error or as an
acceptable state for vhost-user devices?
I think a disconnect event for vhost-user devices should not break the
migration process, because:
  - the device will be in the stopped state, so it will not change
    during migration
  - if a reconnect happens, the migration log will be reinitialized as
    part of the reconnect/init process:
    #0  vhost_log_global_start (listener=0x563989cf7be0)
    at hw/virtio/vhost.c:920
    #1  0x000056398603d8bc in listener_add_address_space (listener=0x563989cf7be0,
        as=0x563986ea4340 <address_space_memory>)
    at softmmu/memory.c:2664
    #2  0x000056398603dd30 in memory_listener_register (listener=0x563989cf7be0,
        as=0x563986ea4340 <address_space_memory>)
    at softmmu/memory.c:2740
    #3  0x0000563985fd6956 in vhost_dev_init (hdev=0x563989cf7bd8,
        opaque=0x563989cf7e30, backend_type=VHOST_BACKEND_TYPE_USER,
        busyloop_timeout=0)
    at hw/virtio/vhost.c:1385
    #4  0x0000563985f7d0b8 in vhost_user_blk_connect (dev=0x563989cf7990)
    at hw/block/vhost-user-blk.c:315
    #5  0x0000563985f7d3f6 in vhost_user_blk_event (opaque=0x563989cf7990,
        event=CHR_EVENT_OPENED)
    at hw/block/vhost-user-blk.c:379
The first patch in the patchset fixes this issue by setting the vhost device to the
stopped state in the disconnect handler and checking that state in the
vhost_migration_log() routine before returning from the function.
The qtest framework was updated to test vhost-user-blk functionality. The
vhost-user-blk/vhost-user-blk-tests/migrate_reconnect test was added to reproduce
the original issue.
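
The recheck described for the first patch can be illustrated with a small
standalone C model. This is only a sketch under stated assumptions: every
model_* name, the fault knob and the struct layout are hypothetical and are
not QEMU code, and the actual patch may be structured differently. It shows
a logging switch that fails in the backend, then rechecks the device state
and treats the failure as harmless if a disconnect stopped the device in
the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical test knob: how the next backend call should misbehave. */
enum model_fault {
    MODEL_FAULT_NONE,        /* backend call succeeds */
    MODEL_FAULT_DISCONNECT,  /* daemon goes away during the call */
    MODEL_FAULT_ERROR        /* call fails, daemon still connected */
};

/* Hypothetical stand-in for the generic device state (not struct vhost_dev). */
struct model_vhost_dev {
    bool started;            /* device is running */
    bool log_enabled;        /* dirty logging currently requested */
    enum model_fault fault;
};

/* Models the disconnect handler: it moves the device to the stopped state. */
static void model_disconnect(struct model_vhost_dev *dev)
{
    assert(dev != NULL);
    dev->started = false;
}

/* Models a backend request that may fail, possibly because of a disconnect. */
static int model_backend_set_log(struct model_vhost_dev *dev, bool enable)
{
    (void)enable;
    switch (dev->fault) {
    case MODEL_FAULT_DISCONNECT:
        model_disconnect(dev);
        return -1;
    case MODEL_FAULT_ERROR:
        return -1;
    default:
        return 0;
    }
}

/* Models the migration log switch with a recheck of the device state. */
static int model_migration_log(struct model_vhost_dev *dev, bool enable)
{
    assert(dev != NULL);
    if (enable == dev->log_enabled) {
        return 0;
    }
    if (!dev->started) {
        /* Stopped device: just remember the setting for a later start. */
        dev->log_enabled = enable;
        return 0;
    }
    int r = model_backend_set_log(dev, enable);
    if (r < 0 && dev->started) {
        /* Still started, so this is a genuine backend error. */
        return r;
    }
    /* Success, or the device was stopped by a disconnect: not an error. */
    dev->log_enabled = enable;
    return 0;
}
```

In this model a failure caused by a disconnect returns 0 and leaves the
device stopped, while a failure on a still-started device is reported to
the caller.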

Dima Stepanov (7):
  vhost: recheck dev state in the vhost_migration_log routine
  vhost: check queue state in the vhost_dev_set_log routine
  tests/qtest/vhost-user-test: prepare the tests for adding new dev
    class
  tests/qtest/libqos/virtio-blk: add support for vhost-user-blk
  tests/qtest/vhost-user-test: add support for the vhost-user-blk device
  tests/qtest/vhost-user-test: add migrate_reconnect test
  tests/qtest/vhost-user-test: enable the reconnect tests

 hw/block/vhost-user-blk.c          |  13 +-
 hw/virtio/vhost.c                  |  39 ++++-
 include/hw/virtio/vhost-user-blk.h |   1 +
 tests/qtest/libqos/virtio-blk.c    |  14 ++
 tests/qtest/vhost-user-test.c      | 291 +++++++++++++++++++++++++++++++------
 5 files changed, 311 insertions(+), 47 deletions(-)

-- 
2.7.4


Re: [PATCH v1 0/7] vhost-user-blk: fix the migration issue and enhance qtests
Posted by Michael S. Tsirkin 3 years, 9 months ago
On Tue, Aug 04, 2020 at 01:36:45PM +0300, Dima Stepanov wrote:
> The first patch in the patchset fixes this issue by setting the vhost device to the
> stopped state in the disconnect handler and checking that state in the
> vhost_migration_log() routine before returning from the function.

So I'm a bit confused. Isn't the connected state sufficient for this?
If not, adding some code comments explaining when each flag is
set would be helpful.
Thanks!



Re: [PATCH v1 0/7] vhost-user-blk: fix the migration issue and enhance qtests
Posted by Dima Stepanov 3 years, 9 months ago
On Tue, Aug 04, 2020 at 10:19:17AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 04, 2020 at 01:36:45PM +0300, Dima Stepanov wrote:
> > The first patch in the patchset fixes this issue by setting the vhost device to the
> > stopped state in the disconnect handler and checking that state in the
> > vhost_migration_log() routine before returning from the function.
> 
> So I'm a bit confused. Isn't the connected state sufficient for this?
> If not, adding some code comments explaining when each flag is
> set would be helpful.
> Thanks!
Well, not really. The "connected" field is used internally as a flag
in the _connect/_disconnect routines. Because the disconnect routine now
runs from a oneshot_bh, we can't really rely on it here. Also, in general the
vhost_log_global_start() routine doesn't know anything about the device
type (in this case vhost-user), so it would not be correct to use this
variable there. What I want to express is that the vhost-user-blk code should
move the device to the stopped state, rather than have the general vhost
code check the connection status. That is why I update the general
(struct vhost_dev)->started field to reflect the stopped state. But yes, it is
a good idea to update the comments in include/hw/virtio/vhost-user-blk.h.
Will do it in v2.
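
The layering argument above can be pictured with a small standalone C
model. It is a sketch of one reading of the explanation, not QEMU code:
all sk_* names and the disconnect_pending field are hypothetical, and the
assumption that the close event clears the generic flag immediately while
"connected" is cleared only in the deferred step is an interpretation of
the text, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Generic layer: hypothetical stand-in that knows nothing about vhost-user. */
struct sk_vhost_dev {
    bool started;
};

/* Device layer: hypothetical stand-in for a vhost-user-blk device. */
struct sk_vhost_user_blk {
    struct sk_vhost_dev dev;
    bool connected;           /* private to the connect/disconnect routines */
    bool disconnect_pending;  /* models a scheduled one-shot bottom half */
};

/* Close event: mark the generic device stopped at once, defer the cleanup. */
static void sk_blk_on_close(struct sk_vhost_user_blk *s)
{
    assert(s != NULL);
    s->dev.started = false;
    s->disconnect_pending = true;
}

/* Deferred cleanup: only here does the private "connected" flag change. */
static void sk_blk_run_deferred(struct sk_vhost_user_blk *s)
{
    assert(s != NULL);
    if (s->disconnect_pending) {
        s->disconnect_pending = false;
        s->connected = false;
    }
}

/* Generic code receives only the generic struct, so it can test "started". */
static bool sk_generic_may_use_backend(const struct sk_vhost_dev *dev)
{
    assert(dev != NULL);
    return dev->started;
}
```

Between the close event and the deferred cleanup, "connected" is still true
in this model, while the generic "started" flag already tells the generic
code to leave the backend alone.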


Re: [PATCH v1 0/7] vhost-user-blk: fix the migration issue and enhance qtests
Posted by Michael S. Tsirkin 3 years, 8 months ago
On Tue, Aug 04, 2020 at 01:36:45PM +0300, Dima Stepanov wrote:
> The first patch in the patchset fixes this issue by setting the vhost device to the
> stopped state in the disconnect handler and checking that state in the
> vhost_migration_log() routine before returning from the function.
> The qtest framework was updated to test vhost-user-blk functionality. The
> vhost-user-blk/vhost-user-blk-tests/migrate_reconnect test was added to reproduce
> the original issue.


Raphael, any input on this?



Re: [PATCH v1 0/7] vhost-user-blk: fix the migration issue and enhance qtests
Posted by Raphael Norwitz 3 years, 8 months ago
On Thu, Aug 27, 2020 at 8:17 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Aug 04, 2020 at 01:36:45PM +0300, Dima Stepanov wrote:
> > The first patch in the patchset fixes this issue by setting the vhost device to the
> > stopped state in the disconnect handler and checking that state in the
> > vhost_migration_log() routine before returning from the function.
> > The qtest framework was updated to test vhost-user-blk functionality. The
> > vhost-user-blk/vhost-user-blk-tests/migrate_reconnect test was added to reproduce
> > the original issue.
>
>
> Raphael any input on this?

Just posted comments on the vhost/vhost-user-blk side. Will look at
the test code next.
