v4:
While testing inflight migration, I noticed a problem: GET_VRING_BASE
is still needed during migration so that the back-end stops dirtying
pages and synchronizes the `last_avail` counter with QEMU. As a result,
after migration the in-flight I/O requests look as if they were
resubmitted on the destination VM.
However, with the new logic we no longer need to wait for in-flight
requests to complete at the GET_VRING_BASE message. So support a new
parameter `should_drain` in GET_VRING_BASE that allows the back-end to
stop vrings immediately, without waiting for in-flight I/O requests to
complete.
Also:
- update docs/interop/vhost-user.rst
- refactor vhost-user-blk.c: `should_drain` is now based on the
  device parameter `inflight-migration`
v3:
- use pre_load_errp instead of pre_load in vhost.c
- change vhost-user-blk property to
"skip-get-vring-base-inflight-migration"
- refactor vhost-user-blk.c by moving vhost_user_blk_inflight_needed() higher
v2:
- rewrite migration using VMSD instead of qemufile API
- add vhost-user-blk parameter instead of migration capability
I don't know whether VMSD is used cleanly in the migration
implementation, so comments are welcome.
Based on Vladimir's work:
[PATCH v2 00/25] vhost-user-blk: live-backend local migration
which was based on:
- [PATCH v4 0/7] chardev: postpone connect
(which in turn is based on [PATCH 0/2] remove deprecated 'reconnect' options)
- [PATCH v3 00/23] vhost refactoring and fixes
- [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler
Based-on: <20250924133309.334631-1-vsementsov@yandex-team.ru>
Based-on: <20251015212051.1156334-1-vsementsov@yandex-team.ru>
Based-on: <20251015145808.1112843-1-vsementsov@yandex-team.ru>
Based-on: <20251015132136.1083972-15-vsementsov@yandex-team.ru>
Based-on: <20251016114104.1384675-1-vsementsov@yandex-team.ru>
---
Hi!
During inter-host migration, waiting for disk requests to be drained
in the vhost-user backend can incur significant downtime.
This can be avoided if QEMU migrates the inflight region in vhost-user-blk.
Thus, during the QEMU migration, the vhost-user backend can cancel all
in-flight requests, and after migration they will be executed on the
other host.
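For illustration only, a minimal sketch of how the inflight region
might be described with a VMSD. It assumes the new
VMSTATE_VBUFFER_UINT64 macro takes the same arguments as the existing
VMSTATE_VBUFFER_UINT32 (field, state, version, test, size-field) and
uses the field names of struct vhost_inflight; the actual patches may
look different.

    /*
     * Illustrative sketch only, not the actual patch: a VMSD for the
     * vhost inflight region. VMSTATE_VBUFFER_UINT64 is assumed to
     * mirror VMSTATE_VBUFFER_UINT32, reading the buffer length from a
     * uint64_t field.
     */
    #include "qemu/osdep.h"
    #include "migration/vmstate.h"
    #include "hw/virtio/vhost.h"

    static const VMStateDescription vmstate_vhost_inflight_sketch = {
        .name = "vhost-inflight-sketch",
        .version_id = 1,
        .minimum_version_id = 1,
        .fields = (const VMStateField[]) {
            VMSTATE_UINT64(size, struct vhost_inflight),
            VMSTATE_UINT64(offset, struct vhost_inflight),
            VMSTATE_UINT16(queue_size, struct vhost_inflight),
            /* inner buffer: 'size' bytes behind the 'addr' pointer */
            VMSTATE_VBUFFER_UINT64(addr, struct vhost_inflight, 1, NULL, size),
            VMSTATE_END_OF_LIST()
        },
    };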
At first, I tried to implement migration for all vhost-user devices that
support inflight at once, but this would require a lot of changes both
in vhost-user-blk (to move the code into the base class) and in the
vhost-user-base base class (inflight implementation and rework, plus a
large refactor).
Therefore, for now I decided to leave that idea for later and implement
migration of the inflight region for vhost-user-blk first.
Alexandr Moshkov (5):
vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE
vhost-user: introduce should_drain on GET_VRING_BASE
vmstate: introduce VMSTATE_VBUFFER_UINT64
vhost: add vmstate for inflight region with inner buffer
vhost-user-blk: support inter-host inflight migration
backends/cryptodev-vhost.c | 2 +-
backends/vhost-user.c | 2 +-
docs/interop/vhost-user.rst | 8 +++-
hw/block/vhost-user-blk.c | 28 ++++++++++++-
hw/net/vhost_net.c | 9 ++--
hw/scsi/vhost-scsi-common.c | 2 +-
hw/virtio/vdpa-dev.c | 2 +-
hw/virtio/vhost-user-base.c | 2 +-
hw/virtio/vhost-user-fs.c | 2 +-
hw/virtio/vhost-user-scmi.c | 2 +-
hw/virtio/vhost-vsock-common.c | 2 +-
hw/virtio/vhost.c | 66 ++++++++++++++++++++++++++----
include/hw/virtio/vhost-user-blk.h | 1 +
include/hw/virtio/vhost.h | 13 +++++-
include/migration/vmstate.h | 10 +++++
15 files changed, 125 insertions(+), 26 deletions(-)
--
2.34.1
On Mon, Dec 29, 2025 at 03:21:03PM +0500, Alexandr Moshkov wrote:
> During inter-host migration, waiting for disk requests to be drained
> in the vhost-user backend can incur significant downtime.
>
> This can be avoided if QEMU migrates the inflight region in vhost-user-blk.
> Thus, during the QEMU migration, the vhost-user backend can cancel all
> in-flight requests, and after migration they will be executed on the
> other host.

I'm surprised by this statement because cancellation requires
communication with the disk. If in-flight requests are slow to drain,
then I would expect cancellation to be slow too. What kind of storage
are you using?

> At first, I tried to implement migration for all vhost-user devices
> that support inflight at once, but this would require a lot of changes
> both in vhost-user-blk (to move the code into the base class) and in
> the vhost-user-base base class (inflight implementation and rework,
> plus a large refactor).
>
> Therefore, for now I decided to leave that idea for later and implement
> migration of the inflight region for vhost-user-blk first.

Sounds okay to me.

I'm not sure about the change to GET_VRING_BASE. A new parameter is
added without a feature bit, so there is no way to detect this feature
at runtime. Maybe a VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT
feature bit should be added?

Once a feature bit exists, it may not even be necessary to add the
parameter to GET_VRING_BASE:

When VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is zero,
GET_VRING_BASE drains in-flight I/O before completing. When
VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is one, the backend may
leave requests in-flight (but host I/O requests must be cancelled in
order to comply with the "Suspended device state" semantics) when
GET_VRING_BASE completes.

What do you think?
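For illustration, a standalone toy sketch (invented names, not QEMU or
libvhost-user code) of the semantics proposed above: with the suggested
feature bit negotiated, the back-end may answer GET_VRING_BASE without
draining; without it, it keeps the legacy drain-first behaviour.

    /* Toy model only: illustrates the proposed semantics, not a real
     * vhost-user back-end. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct toy_vring {
        uint16_t last_avail_idx; /* index reported in the GET_VRING_BASE reply */
        unsigned in_flight;      /* requests submitted to storage, not completed */
    };

    /* Returns the vring base; drains only when the feature bit is absent. */
    static uint16_t toy_get_vring_base(struct toy_vring *vq, bool inflight_feature)
    {
        if (!inflight_feature) {
            /* Legacy behaviour: wait for every in-flight request. */
            while (vq->in_flight > 0) {
                vq->in_flight--; /* stand-in for real completion handling */
            }
        }
        /* With the feature bit, requests stay recorded in the shared
         * inflight region and are resubmitted on the destination. */
        return vq->last_avail_idx;
    }

    int main(void)
    {
        struct toy_vring vq = { .last_avail_idx = 42, .in_flight = 3 };
        unsigned base = toy_get_vring_base(&vq, true);
        printf("base=%u, still in flight=%u\n", base, vq.in_flight);
        return 0;
    }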
Hi! Thanks for the reply!

On 1/7/26 00:04, Stefan Hajnoczi wrote:
>> This can be avoided if QEMU migrates the inflight region in vhost-user-blk.
>> Thus, during the QEMU migration, the vhost-user backend can cancel all
>> in-flight requests, and after migration they will be executed on the
>> other host.
>
> I'm surprised by this statement because cancellation requires
> communication with the disk. If in-flight requests are slow to drain,
> then I would expect cancellation to be slow too. What kind of storage
> are you using?

I probably chose the wrong word "cancel" to describe the patch.

We implemented this logic as follows: the vhost-user server does not
wait for in-flight requests to complete, but marks them all with a
specific CANCEL status - in reality, no request is explicitly canceled.
Then the server immediately proceeds to complete the connection with the
VM. Thus, all in-flight requests migrate to the destination VM.

So, this is more about ignoring in-flight requests than actually
canceling them.

In our case, this logic can lead to in-flight requests being sent to the
backend twice, but this will not result in double execution of the
requests on our disks.

> I'm not sure about the change to GET_VRING_BASE. A new parameter is
> added without a feature bit, so there is no way to detect this feature
> at runtime. Maybe a VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT
> feature bit should be added?
>
> Once a feature bit exists, it may not even be necessary to add the
> parameter to GET_VRING_BASE:
>
> When VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is zero,
> GET_VRING_BASE drains in-flight I/O before completing. When
> VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is one, the backend may
> leave requests in-flight (but host I/O requests must be cancelled in
> order to comply with the "Suspended device state" semantics) when
> GET_VRING_BASE completes.
>
> What do you think?

This solution looks much better! Thanks, I'll fix it.
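For illustration, a toy sketch (invented layout, not the real inflight
region format from docs/interop/vhost-user.rst) of what the destination
back-end does with the migrated inflight table: entries still marked in
flight are simply resubmitted to storage.

    /* Toy model only: the real layout is defined in vhost-user.rst. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct toy_inflight_entry {
        uint16_t desc_head; /* head descriptor index of the request */
        bool in_flight;     /* true if the source never completed it */
    };

    /* Resubmit every entry the source left in flight; returns the count. */
    static unsigned toy_resubmit_inflight(struct toy_inflight_entry *tbl,
                                          unsigned n)
    {
        unsigned resubmitted = 0;

        for (unsigned i = 0; i < n; i++) {
            if (tbl[i].in_flight) {
                /* A real back-end would rebuild the request from the
                 * vring descriptor chain and send it to storage again. */
                printf("resubmitting request with head %u\n",
                       (unsigned)tbl[i].desc_head);
                resubmitted++;
            }
        }
        return resubmitted;
    }

    int main(void)
    {
        struct toy_inflight_entry tbl[] = {
            { .desc_head = 3, .in_flight = true },
            { .desc_head = 7, .in_flight = false },
            { .desc_head = 9, .in_flight = true },
        };
        printf("%u requests resubmitted\n", toy_resubmit_inflight(tbl, 3));
        return 0;
    }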