[PATCH v4 0/5] support inflight migration

Posted by Alexandr Moshkov 1 month, 1 week ago
v4:
While testing inflight migration, I noticed a problem: GET_VRING_BASE is
needed during migration so that the back-end stops dirtying pages and
synchronizes the `last_avail` counter with QEMU. As a result, after
migration the in-flight I/O requests look as if they were resubmitted on the destination VM.

However, with the new logic we no longer need to wait for in-flight
requests to complete at the GET_VRING_BASE message. So support a new
parameter, `should_drain`, in GET_VRING_BASE to let the back-end stop vrings
immediately, without waiting for in-flight I/O requests to complete.
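
A minimal sketch of the intended back-end behaviour (the wire encoding of
`should_drain` and all type/helper names below are illustrative
assumptions, not taken verbatim from the patches):

static void backend_handle_get_vring_base(Dev *dev, unsigned int idx,
                                          bool should_drain)
{
    if (should_drain) {
        /* Old behaviour: wait until every in-flight request completes. */
        drain_inflight_requests(dev, idx);
    }
    /* With should_drain == false: stop the ring immediately; requests
     * that are still in flight stay recorded in the inflight region and
     * are resumed on the destination after migration. */
    stop_vring(dev, idx);
    reply_with_last_avail(dev, idx);
}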

Also:
- update docs/interop/vhost-user.rst
- refactor vhost-user-blk.c: `should_drain` is now based on the
  device parameter `inflight-migration`

v3:
- use pre_load_errp instead of pre_load in vhost.c
- change vhost-user-blk property to
  "skip-get-vring-base-inflight-migration"
- refactor vhost-user-blk.c by moving vhost_user_blk_inflight_needed() higher

v2:
- rewrite migration using VMSD instead of qemufile API
- add vhost-user-blk parameter instead of migration capability

I'm not sure whether VMSD is used cleanly in the migration
implementation, so comments are welcome.

Based on Vladimir's work:
[PATCH v2 00/25] vhost-user-blk: live-backend local migration
  which was based on:
    - [PATCH v4 0/7] chardev: postpone connect
      (which in turn is based on [PATCH 0/2] remove deprecated 'reconnect' options)
    - [PATCH v3 00/23] vhost refactoring and fixes
    - [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler

Based-on: <20250924133309.334631-1-vsementsov@yandex-team.ru>
Based-on: <20251015212051.1156334-1-vsementsov@yandex-team.ru>
Based-on: <20251015145808.1112843-1-vsementsov@yandex-team.ru>
Based-on: <20251015132136.1083972-15-vsementsov@yandex-team.ru>
Based-on: <20251016114104.1384675-1-vsementsov@yandex-team.ru>

---

Hi!

During inter-host migration, waiting for disk requests to be drained
in the vhost-user backend can incur significant downtime.

This can be avoided if QEMU migrates the inflight region of vhost-user-blk.
Then, during migration, the vhost-user backend can cancel all in-flight
requests, and after migration they are executed on the destination host.
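
As a rough sketch of what the vmstate from patches 3-4 could look like
(field names follow struct vhost_inflight; the exact description in
hw/virtio/vhost.c may differ):

static const VMStateDescription vmstate_vhost_inflight = {
    .name = "vhost-inflight",
    .version_id = 1,
    .fields = (const VMStateField[]) {
        /* The buffer size is sent first so that the destination can
         * allocate the inflight region before loading its contents. */
        VMSTATE_UINT64(size, struct vhost_inflight),
        VMSTATE_UINT16(queue_size, struct vhost_inflight),
        /* New macro from patch 3: a variable-sized buffer whose length
         * lives in a uint64_t field of the same structure. */
        VMSTATE_VBUFFER_UINT64(addr, struct vhost_inflight, 1, NULL, size),
        VMSTATE_END_OF_LIST()
    },
};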

At first, I tried to implement migration for all vhost-user devices that
support inflight at once, but this would require many changes both in
vhost-user-blk (to move it into the base class) and in the vhost-user-base
class (an inflight implementation and rework, plus a large refactor).

Therefore, for now I decided to leave this idea for later and implement
migration of the inflight region for vhost-user-blk first.

Alexandr Moshkov (5):
  vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE
  vhost-user: introduce should_drain on GET_VRING_BASE
  vmstate: introduce VMSTATE_VBUFFER_UINT64
  vhost: add vmstate for inflight region with inner buffer
  vhost-user-blk: support inter-host inflight migration

 backends/cryptodev-vhost.c         |  2 +-
 backends/vhost-user.c              |  2 +-
 docs/interop/vhost-user.rst        |  8 +++-
 hw/block/vhost-user-blk.c          | 28 ++++++++++++-
 hw/net/vhost_net.c                 |  9 ++--
 hw/scsi/vhost-scsi-common.c        |  2 +-
 hw/virtio/vdpa-dev.c               |  2 +-
 hw/virtio/vhost-user-base.c        |  2 +-
 hw/virtio/vhost-user-fs.c          |  2 +-
 hw/virtio/vhost-user-scmi.c        |  2 +-
 hw/virtio/vhost-vsock-common.c     |  2 +-
 hw/virtio/vhost.c                  | 66 ++++++++++++++++++++++++++----
 include/hw/virtio/vhost-user-blk.h |  1 +
 include/hw/virtio/vhost.h          | 13 +++++-
 include/migration/vmstate.h        | 10 +++++
 15 files changed, 125 insertions(+), 26 deletions(-)

-- 
2.34.1
Re: [PATCH v4 0/5] support inflight migration
Posted by Stefan Hajnoczi 1 month ago
On Mon, Dec 29, 2025 at 03:21:03PM +0500, Alexandr Moshkov wrote:
> v4:
> While testing inflight migration, I noticed a problem: GET_VRING_BASE is
> needed during migration so that the back-end stops dirtying pages and
> synchronizes the `last_avail` counter with QEMU. As a result, after
> migration the in-flight I/O requests look as if they were resubmitted on the destination VM.
> 
> However, with the new logic we no longer need to wait for in-flight
> requests to complete at the GET_VRING_BASE message. So support a new
> parameter, `should_drain`, in GET_VRING_BASE to let the back-end stop vrings
> immediately, without waiting for in-flight I/O requests to complete.
> 
> Also:
> - update docs/interop/vhost-user.rst
> - refactor vhost-user-blk.c: `should_drain` is now based on the
>   device parameter `inflight-migration`
> 
> v3:
> - use pre_load_errp instead of pre_load in vhost.c
> - change vhost-user-blk property to
>   "skip-get-vring-base-inflight-migration"
> - refactor vhost-user-blk.c by moving vhost_user_blk_inflight_needed() higher
> 
> v2:
> - rewrite migration using VMSD instead of qemufile API
> - add vhost-user-blk parameter instead of migration capability
> 
> I'm not sure whether VMSD is used cleanly in the migration
> implementation, so comments are welcome.
> 
> Based on Vladimir's work:
> [PATCH v2 00/25] vhost-user-blk: live-backend local migration
>   which was based on:
>     - [PATCH v4 0/7] chardev: postpone connect
>       (which in turn is based on [PATCH 0/2] remove deprecated 'reconnect' options)
>     - [PATCH v3 00/23] vhost refactoring and fixes
>     - [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler
> 
> Based-on: <20250924133309.334631-1-vsementsov@yandex-team.ru>
> Based-on: <20251015212051.1156334-1-vsementsov@yandex-team.ru>
> Based-on: <20251015145808.1112843-1-vsementsov@yandex-team.ru>
> Based-on: <20251015132136.1083972-15-vsementsov@yandex-team.ru>
> Based-on: <20251016114104.1384675-1-vsementsov@yandex-team.ru>
> 
> ---
> 
> Hi!
> 
> During inter-host migration, waiting for disk requests to be drained
> in the vhost-user backend can incur significant downtime.
> 
> This can be avoided if QEMU migrates the inflight region of vhost-user-blk.
> Then, during migration, the vhost-user backend can cancel all in-flight
> requests, and after migration they are executed on the destination host.

I'm surprised by this statement because cancellation requires
communication with the disk. If in-flight requests are slow to drain,
then I would expect cancellation to be slow too. What kind of storage
are you using?

> 
> At first, I tried to implement migration for all vhost-user devices that
> support inflight at once, but this would require many changes both in
> vhost-user-blk (to move it into the base class) and in the vhost-user-base
> class (an inflight implementation and rework, plus a large refactor).
> 
> Therefore, for now I decided to leave this idea for later and implement
> migration of the inflight region for vhost-user-blk first.

Sounds okay to me.

I'm not sure about the change to GET_VRING_BASE. A new parameter is
added without a feature bit, so there is no way to detect this feature
at runtime. Maybe a VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT
feature bit should be added?

Once a feature bit exists, it may not even be necessary to add the
parameter to GET_VRING_BASE:

When VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is zero,
GET_VRING_BASE drains in-flight I/O before completing. When
VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is one, the backend may
leave requests in-flight (but host I/O requests must be cancelled in
order to comply with the "Suspended device state" semantics) when
GET_VRING_BASE completes.
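
As a sketch, with a made-up bit number and helper names:

#define VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT 18 /* hypothetical */

static void backend_get_vring_base(Backend *b, unsigned int idx)
{
    uint64_t f = 1ULL << VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT;

    if (!(b->negotiated_protocol_features & f)) {
        /* Bit not negotiated: drain in-flight I/O before replying. */
        drain_inflight(b, idx);
    } else {
        /* Bit negotiated: requests may stay in flight, but host I/O
         * must be cancelled to honour "Suspended device state". */
        cancel_host_io(b, idx);
    }
    reply_with_last_avail(b, idx);
}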

What do you think?

> 
> Alexandr Moshkov (5):
>   vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE
>   vhost-user: introduce should_drain on GET_VRING_BASE
>   vmstate: introduce VMSTATE_VBUFFER_UINT64
>   vhost: add vmstate for inflight region with inner buffer
>   vhost-user-blk: support inter-host inflight migration
> 
>  backends/cryptodev-vhost.c         |  2 +-
>  backends/vhost-user.c              |  2 +-
>  docs/interop/vhost-user.rst        |  8 +++-
>  hw/block/vhost-user-blk.c          | 28 ++++++++++++-
>  hw/net/vhost_net.c                 |  9 ++--
>  hw/scsi/vhost-scsi-common.c        |  2 +-
>  hw/virtio/vdpa-dev.c               |  2 +-
>  hw/virtio/vhost-user-base.c        |  2 +-
>  hw/virtio/vhost-user-fs.c          |  2 +-
>  hw/virtio/vhost-user-scmi.c        |  2 +-
>  hw/virtio/vhost-vsock-common.c     |  2 +-
>  hw/virtio/vhost.c                  | 66 ++++++++++++++++++++++++++----
>  include/hw/virtio/vhost-user-blk.h |  1 +
>  include/hw/virtio/vhost.h          | 13 +++++-
>  include/migration/vmstate.h        | 10 +++++
>  15 files changed, 125 insertions(+), 26 deletions(-)
> 
> -- 
> 2.34.1
> 
Re: [PATCH v4 0/5] support inflight migration
Posted by Alexandr Moshkov 4 weeks ago
Hi! Thanks for the reply!

On 1/7/26 00:04, Stefan Hajnoczi wrote:
> On Mon, Dec 29, 2025 at 03:21:03PM +0500, Alexandr Moshkov wrote:
>> v4:
>> While testing inflight migration, I noticed a problem: GET_VRING_BASE is
>> needed during migration so that the back-end stops dirtying pages and
>> synchronizes the `last_avail` counter with QEMU. As a result, after
>> migration the in-flight I/O requests look as if they were resubmitted on the destination VM.
>>
>> However, with the new logic we no longer need to wait for in-flight
>> requests to complete at the GET_VRING_BASE message. So support a new
>> parameter, `should_drain`, in GET_VRING_BASE to let the back-end stop vrings
>> immediately, without waiting for in-flight I/O requests to complete.
>>
>> Also:
>> - update docs/interop/vhost-user.rst
>> - refactor vhost-user-blk.c: `should_drain` is now based on the
>>    device parameter `inflight-migration`
>>
>> v3:
>> - use pre_load_errp instead of pre_load in vhost.c
>> - change vhost-user-blk property to
>>    "skip-get-vring-base-inflight-migration"
>> - refactor vhost-user-blk.c by moving vhost_user_blk_inflight_needed() higher
>>
>> v2:
>> - rewrite migration using VMSD instead of qemufile API
>> - add vhost-user-blk parameter instead of migration capability
>>
>> I'm not sure whether VMSD is used cleanly in the migration
>> implementation, so comments are welcome.
>>
>> Based on Vladimir's work:
>> [PATCH v2 00/25] vhost-user-blk: live-backend local migration
>>    which was based on:
>>      - [PATCH v4 0/7] chardev: postpone connect
>>        (which in turn is based on [PATCH 0/2] remove deprecated 'reconnect' options)
>>      - [PATCH v3 00/23] vhost refactoring and fixes
>>      - [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler
>>
>> Based-on: <20250924133309.334631-1-vsementsov@yandex-team.ru>
>> Based-on: <20251015212051.1156334-1-vsementsov@yandex-team.ru>
>> Based-on: <20251015145808.1112843-1-vsementsov@yandex-team.ru>
>> Based-on: <20251015132136.1083972-15-vsementsov@yandex-team.ru>
>> Based-on: <20251016114104.1384675-1-vsementsov@yandex-team.ru>
>>
>> ---
>>
>> Hi!
>>
>> During inter-host migration, waiting for disk requests to be drained
>> in the vhost-user backend can incur significant downtime.
>>
>> This can be avoided if QEMU migrates the inflight region of vhost-user-blk.
>> Then, during migration, the vhost-user backend can cancel all in-flight
>> requests, and after migration they are executed on the destination host.
> I'm surprised by this statement because cancellation requires
> communication with the disk. If in-flight requests are slow to drain,
> then I would expect cancellation to be slow too. What kind of storage
> are you using?

I probably chose the wrong word "cancel" to describe the patch.
We implemented this logic as follows: the vhost-user server does not wait
for in-flight requests to complete, but marks them all with a specific
status, CANCEL; in reality, no request is explicitly canceled. Then the
server immediately proceeds to complete the connection with the VM.
Thus, all in-flight requests migrate to the destination VM.

So, this is more about ignoring in-flight requests than actually canceling
them. In our case, this logic can lead to in-flight requests being sent to
the backend twice, but this will not result in double execution of the
requests on our disks.
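
Roughly, in pseudo-C (our server-side types and names, shown only to
illustrate the idea):

enum ReqStatus { REQ_RUNNING, REQ_CANCELLED };

/* Called on GET_VRING_BASE instead of draining: requests that are still
 * in flight are only marked; nothing is sent to the disk. */
static void mark_inflight_as_cancelled(Server *srv)
{
    for (size_t i = 0; i < srv->queue_size; i++) {
        if (srv->inflight[i].in_use) {
            srv->inflight[i].status = REQ_CANCELLED;
        }
    }
    /* The server replies to GET_VRING_BASE immediately; the marked
     * requests travel to the destination inside the inflight region. */
}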

>> At first, I tried to implement migration for all vhost-user devices that
>> support inflight at once, but this would require many changes both in
>> vhost-user-blk (to move it into the base class) and in the vhost-user-base
>> class (an inflight implementation and rework, plus a large refactor).
>>
>> Therefore, for now I decided to leave this idea for later and implement
>> migration of the inflight region for vhost-user-blk first.
> Sounds okay to me.
>
> I'm not sure about the change to GET_VRING_BASE. A new parameter is
> added without a feature bit, so there is no way to detect this feature
> at runtime. Maybe a VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT
> feature bit should be added?
>
> Once a feature bit exists, it may not even be necessary to add the
> parameter to GET_VRING_BASE:
>
> When VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is zero,
> GET_VRING_BASE drains in-flight I/O before completing. When
> VHOST_USER_PROTOCOL_F_GET_VRING_BASE_INFLIGHT is one, the backend may
> leave requests in-flight (but host I/O requests must be cancelled in
> order to comply with the "Suspended device state" semantics) when
> GET_VRING_BASE completes.
>
> What do you think?
This solution looks much better! Thanks, I'll fix it.
>
>> Alexandr Moshkov (5):
>>    vhost-user.rst: specify vhost-user back-end action on GET_VRING_BASE
>>    vhost-user: introduce should_drain on GET_VRING_BASE
>>    vmstate: introduce VMSTATE_VBUFFER_UINT64
>>    vhost: add vmstate for inflight region with inner buffer
>>    vhost-user-blk: support inter-host inflight migration
>>
>>   backends/cryptodev-vhost.c         |  2 +-
>>   backends/vhost-user.c              |  2 +-
>>   docs/interop/vhost-user.rst        |  8 +++-
>>   hw/block/vhost-user-blk.c          | 28 ++++++++++++-
>>   hw/net/vhost_net.c                 |  9 ++--
>>   hw/scsi/vhost-scsi-common.c        |  2 +-
>>   hw/virtio/vdpa-dev.c               |  2 +-
>>   hw/virtio/vhost-user-base.c        |  2 +-
>>   hw/virtio/vhost-user-fs.c          |  2 +-
>>   hw/virtio/vhost-user-scmi.c        |  2 +-
>>   hw/virtio/vhost-vsock-common.c     |  2 +-
>>   hw/virtio/vhost.c                  | 66 ++++++++++++++++++++++++++----
>>   include/hw/virtio/vhost-user-blk.h |  1 +
>>   include/hw/virtio/vhost.h          | 13 +++++-
>>   include/migration/vmstate.h        | 10 +++++
>>   15 files changed, 125 insertions(+), 26 deletions(-)
>>
>> -- 
>> 2.34.1
>>