[PATCH v3 0/8] Live update: tap and vhost

Ben Chaney posted 8 patches 1 week, 3 days ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20251203-cpr-tap-v3-0-3c12e0a61f8e@akamai.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>, Stefano Garzarella <sgarzare@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, "Daniel P. Berrangé" <berrange@redhat.com>, Stefan Weil <sw@weilnetz.de>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
[PATCH v3 0/8] Live update: tap and vhost
Posted by Ben Chaney 1 week, 3 days ago
Changes since v2:
- I have taken over this patch set since Steve retired
- Added comments to explain the order of events
- Removed a redundant reversion to clean up git history
- Included virtio and stub fixes

Tap and vhost devices can be preserved during cpr-transfer using
traditional live migration methods, wherein the management layer
creates new interfaces for the target and fiddles with 'ip link'
to deactivate the old interface and activate the new.

However, CPR can simply send the file descriptors to new QEMU,
with no special management actions required.  The user enables
this behavior by specifying '-netdev tap,cpr=on'.  The default
is cpr=off.
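
For readers unfamiliar with the mechanism, the following is a minimal sketch of passing an open file descriptor between two endpoints over a Unix socket (SCM_RIGHTS ancillary data). It is not code from this series: the function names, the "tap0" label, and the use of a pipe as a stand-in for a tap device fd are all assumptions made for illustration, and it needs Python 3.9 or later on a Unix host.

```python
import os
import socket


def send_fd(sock, fd, name):
    """Send one open fd plus a label over a Unix socket (hypothetical helper)."""
    # socket.send_fds attaches the fd as SCM_RIGHTS ancillary data.
    socket.send_fds(sock, [name.encode()], [fd])


def recv_fd(sock):
    """Receive the label and a new fd referring to the same open file."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return msg.decode(), fds[0]


# A socketpair stands in for the channel between old and new QEMU.
old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# A pipe stands in for a tap device fd (assumption: no real tap is opened).
read_end, write_end = os.pipe()

send_fd(old_side, read_end, "tap0")
name, received_fd = recv_fd(new_side)

# The sender can close its copy; the receiver's fd keeps the file open.
os.close(read_end)
os.write(write_end, b"packet")
data = os.read(received_fd, 6)

os.close(write_end)
os.close(received_fd)
old_side.close()
new_side.close()
```

The receiver gets its own descriptor for the same underlying open file, which is why the interface does not have to be recreated on the receiving side.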

Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
Steve Sistare (8):
      migration: stop vm earlier for cpr
      migration: cpr setup notifier
      vhost: reset vhost devices for cpr
      cpr: delete all fds
      tap: common return label
      tap: cpr support
      tap: postload fix for cpr
      tap: cpr fixes

 hw/net/virtio-net.c               |  26 +++++++
 hw/vfio/device.c                  |   2 +-
 hw/virtio/vhost-backend.c         |   6 ++
 hw/virtio/vhost.c                 |  32 +++++++++
 include/hw/virtio/vhost-backend.h |   1 +
 include/hw/virtio/vhost.h         |   1 +
 include/migration/cpr.h           |   3 +-
 include/net/tap.h                 |   1 +
 io/channel-socket.c               |   4 +-
 migration/cpr.c                   |  24 +++++--
 migration/migration.c             |  69 ++++++++++++++----
 net/tap-win32.c                   |   5 ++
 net/tap.c                         | 147 +++++++++++++++++++++++++++++---------
 qapi/net.json                     |   5 +-
 stubs/cpr.c                       |   8 +++
 stubs/meson.build                 |   1 +
 16 files changed, 279 insertions(+), 56 deletions(-)
---
base-commit: 9febfa94b69b7146582c48a868bd2330ac45037f
change-id: 20251203-cpr-tap-04fd811ace03

Best regards,
-- 
Ben Chaney <bchaney@akamai.com>
Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Cédric Le Goater 5 days, 16 hours ago
Hello,

Ben, Mark,

Since Steve retired, we have generic names under the "CheckPoint and
Restart (CPR)" entry in MAINTAINERS. Would you be willing to step forward
as Reviewers/Maintainers ?

Also, do you have a gitlab account so we can copy you on any reported
issues [1] ?

Thanks,

C.

[1] https://gitlab.com/qemu-project/qemu/-/issues/3235



On 12/3/25 19:51, Ben Chaney wrote:
> Changes since v2
> - I have taken over this patch set since Steve retired
> - Added comments to explain the order of events
> - Remove redundant reversion to cleanup git history
> - Inclusion of virtio and stub fixes
> 
> Tap and vhost devices can be preserved during cpr-transfer using
> traditional live migration methods, wherein the management layer
> creates new interfaces for the target and fiddles with 'ip link'
> to deactivate the old interface and activate the new.
> 
> However, CPR can simply send the file descriptors to new QEMU,
> with no special management actions required.  The user enables
> this behavior by specifying '-netdev tap,cpr=on'.  The default
> is cpr=off.
> 
> Signed-off-by: Ben Chaney <bchaney@akamai.com>
> ---
> Steve Sistare (8):
>        migration: stop vm earlier for cpr
>        migration: cpr setup notifier
>        vhost: reset vhost devices for cpr
>        cpr: delete all fds
>        tap: common return label
>        tap: cpr support
>        tap: postload fix for cpr
>        tap: cpr fixes
> 
>   hw/net/virtio-net.c               |  26 +++++++
>   hw/vfio/device.c                  |   2 +-
>   hw/virtio/vhost-backend.c         |   6 ++
>   hw/virtio/vhost.c                 |  32 +++++++++
>   include/hw/virtio/vhost-backend.h |   1 +
>   include/hw/virtio/vhost.h         |   1 +
>   include/migration/cpr.h           |   3 +-
>   include/net/tap.h                 |   1 +
>   io/channel-socket.c               |   4 +-
>   migration/cpr.c                   |  24 +++++--
>   migration/migration.c             |  69 ++++++++++++++----
>   net/tap-win32.c                   |   5 ++
>   net/tap.c                         | 147 +++++++++++++++++++++++++++++---------
>   qapi/net.json                     |   5 +-
>   stubs/cpr.c                       |   8 +++
>   stubs/meson.build                 |   1 +
>   16 files changed, 279 insertions(+), 56 deletions(-)
> ---
> base-commit: 9febfa94b69b7146582c48a868bd2330ac45037f
> change-id: 20251203-cpr-tap-04fd811ace03
> 
> Best regards,
Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Chaney, Ben 4 days, 7 hours ago
> Since Steve retired, we have generic names under the "CheckPoint and
> Restart (CPR)" entry in MAINTAINERS. Would you be willing to step forward
> as Reviewers/Maintainers ?


> Also, do you have a gitlab account so we can copy you on any reported
> issues [1] ?

I sent a patch adding Mark and me as reviewers for CPR.

My gitlab username is benchaney

Thanks,
        Ben


Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Mark Kanda 5 days, 12 hours ago
On 12/8/25 4:08 AM, Cédric Le Goater wrote:
> Hello,
>
> Ben, Mark,
>
> Since Steve retired, we have generic names under the "CheckPoint and
> Restart (CPR)" entry in MAINTAINERS. Would you be willing to step forward
> as Reviewers/Maintainers ?
>
You can add me as a Reviewer.

Thanks/regards,
-Mark

> Also, do you have a gitlab account so we can copy you on any reported
> issues [1] ?
>
> Thanks,
>
> C.
>
> [1] https://gitlab.com/qemu-project/qemu/-/issues/3235
>
>
>
> On 12/3/25 19:51, Ben Chaney wrote:
>> Changes since v2
>> - I have taken over this patch set since Steve retired
>> - Added comments to explain the order of events
>> - Remove redundant reversion to cleanup git history
>> - Inclusion of virtio and stub fixes
>>
>> Tap and vhost devices can be preserved during cpr-transfer using
>> traditional live migration methods, wherein the management layer
>> creates new interfaces for the target and fiddles with 'ip link'
>> to deactivate the old interface and activate the new.
>>
>> However, CPR can simply send the file descriptors to new QEMU,
>> with no special management actions required.  The user enables
>> this behavior by specifying '-netdev tap,cpr=on'.  The default
>> is cpr=off.
>>
>> Signed-off-by: Ben Chaney <bchaney@akamai.com>
>> ---
>> Steve Sistare (8):
>>        migration: stop vm earlier for cpr
>>        migration: cpr setup notifier
>>        vhost: reset vhost devices for cpr
>>        cpr: delete all fds
>>        tap: common return label
>>        tap: cpr support
>>        tap: postload fix for cpr
>>        tap: cpr fixes
>>
>>   hw/net/virtio-net.c               |  26 +++++++
>>   hw/vfio/device.c                  |   2 +-
>>   hw/virtio/vhost-backend.c         |   6 ++
>>   hw/virtio/vhost.c                 |  32 +++++++++
>>   include/hw/virtio/vhost-backend.h |   1 +
>>   include/hw/virtio/vhost.h         |   1 +
>>   include/migration/cpr.h           |   3 +-
>>   include/net/tap.h                 |   1 +
>>   io/channel-socket.c               |   4 +-
>>   migration/cpr.c                   |  24 +++++--
>>   migration/migration.c             |  69 ++++++++++++++----
>>   net/tap-win32.c                   |   5 ++
>>   net/tap.c                         | 147 +++++++++++++++++++++++++++++---------
>>   qapi/net.json                     |   5 +-
>>   stubs/cpr.c                       |   8 +++
>>   stubs/meson.build                 |   1 +
>>   16 files changed, 279 insertions(+), 56 deletions(-)
>> ---
>> base-commit: 9febfa94b69b7146582c48a868bd2330ac45037f
>> change-id: 20251203-cpr-tap-04fd811ace03
>>
>> Best regards,
>
>
>
Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Cédric Le Goater 5 days, 11 hours ago
On 12/8/25 15:22, Mark Kanda wrote:
> On 12/8/25 4:08 AM, Cédric Le Goater wrote:
>> Hello,
>>
>> Ben, Mark,
>>
>> Since Steve retired, we have generic names under the "CheckPoint and
>> Restart (CPR)" entry in MAINTAINERS. Would you be willing to step forward
>> as Reviewers/Maintainers ?
>>
> You can add me as a Reviewer.

You should send a patch, such as:

  https://gitlab.com/legoater/qemu/-/commits/9942c711835f

Thanks,

C.


Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Vladimir Sementsov-Ogievskiy 1 week, 2 days ago
On 03.12.25 21:51, Ben Chaney wrote:
> Changes since v2
> - I have taken over this patch set since Steve retired
> - Added comments to explain the order of events
> - Remove redundant reversion to cleanup git history
> - Inclusion of virtio and stub fixes
> 
> Tap and vhost devices can be preserved during cpr-transfer using
> traditional live migration methods, wherein the management layer
> creates new interfaces for the target and fiddles with 'ip link'
> to deactivate the old interface and activate the new.
> 
> However, CPR can simply send the file descriptors to new QEMU,
> with no special management actions required.  The user enables
> this behavior by specifying '-netdev tap,cpr=on'.  The default
> is cpr=off.
> 
> Signed-off-by: Ben Chaney<bchaney@akamai.com>


Hi!

Hmm, note that I have an alternative in-flight series,

[PATCH v9 0/8] virtio-net: live-TAP local migration
https://patchew.org/QEMU/20251030203116.870742-1-vsementsov@yandex-team.ru/

, which brings the same thing: migrating the TAP device by passing FDs
through the migration channel. The benefit is that it doesn't
require an additional migration channel.

So, it may be used as part of CPR migration, or with
usual migration without CPR.

-- 
Best regards,
Vladimir
Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Chaney, Ben 5 days, 5 hours ago

On 12/4/25, 7:53 AM, "Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru> wrote:

> [PATCH v9 0/8] virtio-net: live-TAP local migration
> https://patchew.org/QEMU/20251030203116.870742-1-vsementsov@yandex-team.ru/


> , which brings the same thing: migrating the TAP device by passing FDs
> through the migration channel. The benefit is that it doesn't
> require an additional migration channel.

Hi Vladimir,
        Thanks for sending this. I tried testing your patch
set and got the following errors from QEMU:

2025-12-08T20:44:31.251153Z qemu-system-x86_64: 8 != 101
2025-12-08T20:44:31.251199Z qemu-system-x86_64: Failed to load element of type uint16 equal for max_queue_pairs: -22
2025-12-08T20:44:31.251492Z qemu-system-x86_64: warning: qemu_fclose: received fd 141 was never claimed
2025-12-08T20:44:31.251497Z qemu-system-x86_64: warning: qemu_fclose: received fd 142 was never claimed
2025-12-08T20:44:31.251501Z qemu-system-x86_64: warning: qemu_fclose: received fd 143 was never claimed
2025-12-08T20:44:31.251524Z qemu-system-x86_64: load of migration failed: Invalid argument: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net': Failed to load element of type virtio for virtio: -22

Re: [PATCH v3 0/8] Live update: tap and vhost
Posted by Vladimir Sementsov-Ogievskiy 4 days, 18 hours ago
On 09.12.25 00:03, Chaney, Ben wrote:
> 
> 
> On 12/4/25, 7:53 AM, "Vladimir Sementsov-Ogievskiy" <vsementsov@yandex-team.ru> wrote:
> 
>> [PATCH v9 0/8] virtio-net: live-TAP local migration
>> https://patchew.org/QEMU/20251030203116.870742-1-vsementsov@yandex-team.ru/
> 
> 
>> , which brings the same thing: migrating the TAP device by passing FDs
>> through the migration channel. The benefit is that it doesn't
>> require an additional migration channel.
> 
> Hi Vladimir,
>          Thanks for sending this. I tried testing your patch
> set and I got the following errors from qemu
> 
> 2025-12-08T20:44:31.251153Z qemu-system-x86_64: 8 != 101
> 2025-12-08T20:44:31.251199Z qemu-system-x86_64: Failed to load element of type uint16 equal for max_queue_pairs: -22
> 2025-12-08T20:44:31.251492Z qemu-system-x86_64: warning: qemu_fclose: received fd 141 was never claimed
> 2025-12-08T20:44:31.251497Z qemu-system-x86_64: warning: qemu_fclose: received fd 142 was never claimed
> 2025-12-08T20:44:31.251501Z qemu-system-x86_64: warning: qemu_fclose: received fd 143 was never claimed
> 2025-12-08T20:44:31.251524Z qemu-system-x86_64: load of migration failed: Invalid argument: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-net': Failed to load element of type virtio for virtio: -22
> 

Thanks for testing!

Hmm. Migration errors have never been friendly enough to make clear what's going wrong. Could you describe exactly how you are testing it? Also, are you using exactly my commit efb5b1a9aa839619db5 (pushed to https://gitlab.com/vsementsov/qemu.git, tag up-tap-fd-migration-v9)?

Also, to check that the series basically works, you may run the included test (under sudo):

    export QEMU_TEST_QEMU_BINARY=$PWD/build/qemu-system-x86_64
    export PYTHONPATH=python:tests/functional
    python3 tests/functional/x86_64/test_tap_migration.py

-- 
Best regards,
Vladimir