[v4] Live update: tap and vhost

[PATCH v4 0/8] Live update: tap and vhost

Posted by Ben Chaney 1 week, 2 days ago

Tap and vhost devices can be preserved during cpr-transfer using
traditional live migration methods, wherein the management layer
creates new interfaces for the target and fiddles with 'ip link'
to deactivate the old interface and activate the new.

However, CPR can simply send the file descriptors to new QEMU,
with no special management actions required.  The user enables
this behavior by specifying '-netdev tap,cpr=on'.  The default
is cpr=off.

Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
Changes in v4:
- change the name of cpr_get_fd_param as it is no longer used
  exclusively during cpr transfer
- clarify documentation
- Do not require fd=-1 if fds will be provided by cpr
- Do not interleave tap and vhost fds
- Do not check cpr state in qio_channel_handle_fds
- Link to v3: https://lore.kernel.org/qemu-devel/20251203-cpr-tap-v3-0-3c12e0a61f8e@akamai.com

---
Ben Chaney (2):
      tap: cpr support
      tap: cpr fixes

Steve Sistare (6):
      migration: stop vm earlier for cpr
      migration: cpr setup notifier
      vhost: reset vhost devices for cpr
      cpr: delete all fds
      tap: common return label
      tap: postload fix for cpr

 hw/net/virtio-net.c               |  26 +++++++
 hw/vfio/device.c                  |   2 +-
 hw/virtio/vhost-backend.c         |   6 ++
 hw/virtio/vhost.c                 |  32 +++++++++
 include/hw/virtio/vhost-backend.h |   1 +
 include/hw/virtio/vhost.h         |   1 +
 include/migration/cpr.h           |   5 +-
 include/net/tap.h                 |   1 +
 migration/cpr.c                   |  32 ++++++---
 migration/migration.c             |  69 ++++++++++++++----
 net/tap-win32.c                   |   5 ++
 net/tap.c                         | 148 +++++++++++++++++++++++++++++---------
 qapi/net.json                     |   6 +-
 stubs/cpr.c                       |   8 +++
 stubs/meson.build                 |   1 +
 15 files changed, 283 insertions(+), 60 deletions(-)
---
base-commit: 2339d0a1cfac6ecc667e6e062a593865c1541c35
change-id: 20251203-cpr-tap-04fd811ace03

Best regards,
-- 
Ben Chaney <bchaney@akamai.com>

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Vladimir Sementsov-Ogievskiy 1 week, 1 day ago

On 28.01.26 23:39, Ben Chaney wrote:
> Tap and vhost devices can be preserved during cpr-transfer using
> traditional live migration methods, wherein the management layer
> creates new interfaces for the target and fiddles with 'ip link'
> to deactivate the old interface and activate the new.
> 
> However, CPR can simply send the file descriptors to new QEMU,
> with no special management actions required.  The user enables
> this behavior by specifying '-netdev tap,cpr=on'.  The default
> is cpr=off.
> 
> Signed-off-by: Ben Chaney <bchaney@akamai.com>
Hi!

I'd like to note again that I'm working on an alternative solution for live-updating
virtio-net+TAP, passing FDs through unix domain socket, which:

1. Doesn't require second migration channel
2. Doesn't use CPR: the whole TAP state, including negotiated parameters
and opened FD are natively passed as usual migration state structure.
(look here: https://lore.kernel.org/qemu-devel/20251030203116.870742-7-vsementsov@yandex-team.ru/ )
3. Still should be compatible with CPR, and may be used in context of CPR-update
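
For readers who have not used it, the underlying mechanism for passing an open FD over a unix domain socket is an SCM_RIGHTS control message. The following is only a minimal sketch of that kernel interface, not code from either series; send_fd() and recv_fd() are hypothetical helper names chosen for illustration.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one open fd over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* one data byte travels with the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces correct alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    /* The kernel installed a new descriptor referring to the same open file. */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file description as the sender's, which is why the device backend can keep running across the handover.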

The latest version was

[PATCH v9 0/8] virtio-net: live-TAP local migration
https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/

and I plan to post v10 soon.

-- 
Best regards,
Vladimir

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Peter Xu 4 days, 13 hours ago

On Thu, Jan 29, 2026 at 04:58:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 28.01.26 23:39, Ben Chaney wrote:
> > Tap and vhost devices can be preserved during cpr-transfer using
> > traditional live migration methods, wherein the management layer
> > creates new interfaces for the target and fiddles with 'ip link'
> > to deactivate the old interface and activate the new.
> > 
> > However, CPR can simply send the file descriptors to new QEMU,
> > with no special management actions required.  The user enables
> > this behavior by specifying '-netdev tap,cpr=on'.  The default
> > is cpr=off.
> > 
> > Signed-off-by: Ben Chaney <bchaney@akamai.com>
> Hi!
> 
> I'd like to note again that I'm working on an alternative solution for live-updating
> virtio-net+TAP, passing FDs through unix domain socket, which:
> 
> 1. Doesn't require second migration channel
> 2. Doesn't use CPR: the whole TAP state, including negotiated parameters
> and opened FD are natively passed as usual migration state structure.
> (look here: https://lore.kernel.org/qemu-devel/20251030203116.870742-7-vsementsov@yandex-team.ru/ )
> 3. Still should be compatible with CPR, and may be used in context of CPR-update
> 
> The latest version was
> 
> [PATCH v9 0/8] virtio-net: live-TAP local migration
> https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
> 
> and I plan to post v10 soon.

Yes, thanks for re-raising this.  If we have similar features being
proposed, we should always discuss whether we should stick with one of them
if that'll work for all.

IIUC Vladimir's solution looks indeed superior in that it has less
constraints, and also works for CPR mode.

Thanks,

-- 
Peter Xu

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Chaney, Ben 4 days, 12 hours ago

On 2/2/26, 9:07 AM, "Peter Xu" <peterx@redhat.com> wrote:

> > The latest version was
> >
> > [PATCH v9 0/8] virtio-net: live-TAP local migration
> > https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
> >
> > and I plan to post v10 soon.

> Yes, thanks for re-raising this. If we have similar features being
> proposed, we should always discuss whether we should stick with one of them
> if that'll work for all.

> IIUC Vladimir's solution looks indeed superior in that it has less
> constraints, and also works for CPR mode.

This was previously discussed here: https://lore.kernel.org/all/ef7fd47a-f7c0-4bca-823c-07005c5f1959@yandex-team.ru/

My impression from that discussion is that

1. Vladimir's solution has some extra complexity
2. We are trying to standardize cpr as the primary method for local migration,
so the benefit of supporting non-cpr local transfers is slightly double edged

That said, both solutions seem valid and Vladimir is doing good work in this area,
so I'd be happy with either outcome in this case.

Thanks,
     Ben

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Vladimir Sementsov-Ogievskiy 3 days, 17 hours ago

On 02.02.26 18:42, Chaney, Ben wrote:
> 
> 
> On 2/2/26, 9:07 AM, "Peter Xu" <peterx@redhat.com> wrote:
> 
>>> The latest version was
>>>
>>> [PATCH v9 0/8] virtio-net: live-TAP local migration
>>> https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
>>>
>>> and I plan to post v10 soon.
> 
> 
>> Yes, thanks for re-raising this. If we have similar features being
>> proposed, we should always discuss whether we should stick with one of them
>> if that'll work for all.
> 
> 
>> IIUC Vladimir's solution looks indeed superior in that it has less
>> constraints, and also works for CPR mode.
> 
> 
> This was previously discussed here: https://lore.kernel.org/all/ef7fd47a-f7c0-4bca-823c-07005c5f1959@yandex-team.ru/
> 
> My impression from that discussion is that
> 
> 1. Vladimir's solution has some extra complexity
> 2. We are trying to standardize cpr as the primary method for local migration,

I believe, that we may do local migration of devices with FDs natively,
through one migration channel, without CPR.

In my opinion, CPR breaks migration architecture, creating additional
state, which owns mixed pieces of different devices (and sometimes,
not only FDs, I heard).

Instead we can keep device state all in device state description,
including FDs if needed.

Also, second migration channel, and the fact that on target we can't
access QMP until we say "migrate" on source seems to me an unnecessary
load on the user and management software, we can avoid this.

Next, as I understand, the only point, why we use CPR for devices, is
avoiding rework of initialization code of some devices, which wants to
have FDs at early stage. But that approach can't be applied
everywhere. An example is vhost-user-blk: you have to rework
initialization code anyway, as if you simply pass FDs to the target in
CPR state, when source is still running, target will simply break the
source, touching the FDs. And, if we can't touch FDs until source stop
- it's actually a usual migration, and we can pass FDs through main
migration channel, doing necessary things in pre-save and post-load,
as usual.

Hmm, looking at patch 01 here, I understand, that virtio-net/TAP does
suffer from same problem? That we actually must not use passed FDs on
target, when source is still running? But stopping source earlier
means increase freeze-time. I think, if we can avoid it (and we can)
we should avoid it.

> so the benefit of supporting non-cpr local transfers is slightly double edged
> 

So, I think, if we plan that there would be more and more devices,
supporting FDs local migration, and we have any chance of fitting them
into the "old" migration architecture (without CPR), we should try it.

--

Hm, I don't have a full picture of CPR, it's not only device migration,
but also some other things? Interesting, how much feasible is to move
all these things into main migration channel. That's the question I
can't answer now. But even if keep CPR for some non-device things, it
seems still good to keep the whole state description for a device in
one place - in device code, like it was historically.

-- 
Best regards,
Vladimir

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Peter Xu 3 days, 8 hours ago

On Tue, Feb 03, 2026 at 12:57:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 02.02.26 18:42, Chaney, Ben wrote:
> > 
> > 
> > On 2/2/26, 9:07 AM, "Peter Xu" <peterx@redhat.com> wrote:
> > 
> > > > The latest version was
> > > > 
> > > > [PATCH v9 0/8] virtio-net: live-TAP local migration
> > > > https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
> > > > 
> > > > and I plan to post v10 soon.
> > 
> > 
> > > Yes, thanks for re-raising this. If we have similar features being
> > > proposed, we should always discuss whether we should stick with one of them
> > > if that'll work for all.
> > 
> > 
> > > IIUC Vladimir's solution looks indeed superior in that it has less
> > > constraints, and also works for CPR mode.
> > 
> > 
> > This was previously discussed here: https://lore.kernel.org/all/ef7fd47a-f7c0-4bca-823c-07005c5f1959@yandex-team.ru/
> > 
> > My impression from that discussion is that
> > 
> > 1. Vladimir's solution has some extra complexity
> > 2. We are trying to standardize cpr as the primary method for local migration,
> 
> I believe, that we may do local migration of devices with FDs natively,
> through one migration channel, without CPR.
> 
> In my opinion, CPR breaks migration architecture, creating additional
> state, which owns mixed pieces of different devices (and sometimes,
> not only FDs, I heard).
> 
> Instead we can keep device state all in device state description,
> including FDs if needed.
> 
> Also, second migration channel, and the fact that on target we can't
> access QMP until we say "migrate" on source seems to me an unnecessary
> load on the user and management software, we can avoid this.
> 
> Next, as I understand, the only point, why we use CPR for devices, is
> avoiding rework of initialization code of some devices, which wants to
> have FDs at early stage. But that approach can't be applied
> everywhere. An example is vhost-user-blk: you have to rework
> initialization code anyway, as if you simply pass FDs to the target in
> CPR state, when source is still running, target will simply break the
> source, touching the FDs. And, if we can't touch FDs until source stop
> - it's actually a usual migration, and we can pass FDs through main
> migration channel, doing necessary things in pre-save and post-load,
> as usual.
> 
> Hmm, looking at patch 01 here, I understand, that virtio-net/TAP does
> suffer from same problem? That we actually must not use passed FDs on
> target, when source is still running? But stopping source earlier
> means increase freeze-time. I think, if we can avoid it (and we can)
> we should avoid it.
> 
> > so the benefit of supporting non-cpr local transfers is slightly double edged
> > 
> 
> So, I think, if we plan that there would be more and more devices,
> supporting FDs local migration, and we have any chance of fitting them
> into the "old" migration architecture (without CPR), we should try it.

Well explained, thank you Vladimir.  I wish some day we can move all at
least cpr-transfer users to local-migration and deprecate CPR if ever
possible.  The uncertainty to me is cpr-exec, but I really don't know how
much mgmt is adopting cpr-exec..  cpr-reboot also looks pretty special and
may not be relevant.

The core idea (originated from Steve..) is really about fd sharing, and
it's great if we can do it in a cleaner way.

> 
> --
> 
> Hm, I don't have a full picture of CPR, it's not only device migration,
> but also some other things? Interesting, how much feasible is to move
> all these things into main migration channel. That's the question I
> can't answer now. But even if keep CPR for some non-device things, it
> seems still good to keep the whole state description for a device in
> one place - in device code, like it was historically.

My understanding is there're some special mgmt (Oracle's?) that may depend
on cpr-exec; I'm not sure how far that went in any downstream deployment.

That should be able to reuse mgmt channels too (relevant to chardev fd
sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
to a new QEMU after migration.  Said that, I always assumed re-connect is
fine, and most mgmt supports live migration so the mgmt should have that
infrastructure there already.  Maybe Ben would know better.

Thanks,

-- 
Peter Xu

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Chaney, Ben 3 days, 8 hours ago

> Well explained, thank you Vladimir. I wish some day we can move all at
> least cpr-transfer users to local-migration and deprecate CPR if ever
> possible. The uncertainty to me is cpr-exec, but I really don't know how
> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
> may not be relevant.


> The core idea (originated from Steve..) is really about fd sharing, and
> it's great if we can do it in a cleaner way.

Thanks for the clarification. If that is the case then probably Vladimir's
solution is preferable. I sent some comments on the prerequisite
refactoring patch set. I'll try to review the main set soon.


> My understanding is there're some special mgmt (Oracle's?) that may depend
> on cpr-exec; I'm not sure how far that went in any downstream deployment.


> That should be able to reuse mgmt channels too (relevant to chardev fd
> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
> to a new QEMU after migration. Said that, I always assumed re-connect is
> fine, and most mgmt supports live migration so the mgmt should have that
> infrastructure there already. Maybe Ben would know better.

I'm missing a lot of context here as I only became more involved in this project
recently. @Mark Kanda Can you provide any context about Steve's work before
he retired, or Oracle's usage of these features?

Thanks,
        Ben

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Mark Kanda 3 days, 7 hours ago

On 2/3/26 1:46 PM, Chaney, Ben wrote:
>> Well explained, thank you Vladimir. I wish some day we can move all at
>> least cpr-transfer users to local-migration and deprecate CPR if ever
>> possible. The uncertainty to me is cpr-exec, but I really don't know how
>> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
>> may not be relevant.
>> The core idea (originated from Steve..) is really about fd sharing, and
>> it's great if we can do it in a cleaner way.
> Thanks for the clarification. If that is the case then probably Vladimir's
> solution is preferable. I sent some comments on the prerequisite
> refactoring patch set. I'll try to review the main set soon.
>
>> My understanding is there're some special mgmt (Oracle's?) that may depend
>> on cpr-exec; I'm not sure how far that went in any downstream deployment.
>> That should be able to reuse mgmt channels too (relevant to chardev fd
>> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
>> to a new QEMU after migration. Said that, I always assumed re-connect is
>> fine, and most mgmt supports live migration so the mgmt should have that
>> infrastructure there already. Maybe Ben would know better.
>
> I'm missing a lot of context here as I only became more involved in this project
> recently. @Mark Kanda Can you provide any context about Steve's work before
> he retired, or Oracle's usage of these features?

We (Oracle) have an internal VM manager which relies on cpr-exec, and would
like to continue supporting it.

Thanks/regards,
-Mark

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Peter Xu 3 days, 7 hours ago

On Tue, Feb 03, 2026 at 02:04:21PM -0600, Mark Kanda wrote:
> On 2/3/26 1:46 PM, Chaney, Ben wrote:
> >> Well explained, thank you Vladimir. I wish some day we can move all at
> >> least cpr-transfer users to local-migration and deprecate CPR if ever
> >> possible. The uncertainty to me is cpr-exec, but I really don't know how
> >> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
> >> may not be relevant.
> >> The core idea (originated from Steve..) is really about fd sharing, and
> >> it's great if we can do it in a cleaner way.
> > Thanks for the clarification. If that is the case then probably Vladimir's
> > solution is preferable. I sent some comments on the prerequisite
> > refactoring patch set. I'll try to review the main set soon.
> >
> >> My understanding is there're some special mgmt (Oracle's?) that may depend
> >> on cpr-exec; I'm not sure how far that went in any downstream deployment.
> >> That should be able to reuse mgmt channels too (relevant to chardev fd
> >> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
> >> to a new QEMU after migration. Said that, I always assumed re-connect is
> >> fine, and most mgmt supports live migration so the mgmt should have that
> >> infrastructure there already. Maybe Ben would know better.
> >
> > I'm missing a lot of context here as I only became more involved in this project
> > recently. @Mark Kanda Can you provide any context about Steve's work before
> > he retired, or Oracle's usage of these features?
> 
> We (Oracle) have an internal VM manager which relies on cpr-exec, and would
> like to continue supporting it.

IIUC that means QEMU upstream needs to start merging two solutions for not
only migration but also vhost, and maybe more in the future.  Or we reject
Vladimir's work, but frankly I think that's indeed superior and less hacky
when without the need of the 2nd channel.

Can we try to reduce the duplicated logics between the two solutions?

For example, is it possible to rebase this series on top of Vladimir's
work, reusing logics as much as possible so that those FDs can also be
preserved via execve() and reused in the after-exec world (likely reused
not during fd open, but instead loadvm)?

Thanks,

-- 
Peter Xu

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Vladimir Sementsov-Ogievskiy 2 days, 19 hours ago

On 03.02.26 23:47, Peter Xu wrote:
> On Tue, Feb 03, 2026 at 02:04:21PM -0600, Mark Kanda wrote:
>> On 2/3/26 1:46 PM, Chaney, Ben wrote:
>>>> Well explained, thank you Vladimir. I wish some day we can move all at
>>>> least cpr-transfer users to local-migration and deprecate CPR if ever
>>>> possible. The uncertainty to me is cpr-exec, but I really don't know how
>>>> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
>>>> may not be relevant.
>>>> The core idea (originated from Steve..) is really about fd sharing, and
>>>> it's great if we can do it in a cleaner way.
>>> Thanks for the clarification. If that is the case then probably Vladimir's
>>> solution is preferable. I sent some comments on the prerequisite
>>> refactoring patch set. I'll try to review the main set soon.
>>>
>>>> My understanding is there're some special mgmt (Oracle's?) that may depend
>>>> on cpr-exec; I'm not sure how far that went in any downstream deployment.
>>>> That should be able to reuse mgmt channels too (relevant to chardev fd
>>>> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
>>>> to a new QEMU after migration. Said that, I always assumed re-connect is
>>>> fine, and most mgmt supports live migration so the mgmt should have that
>>>> infrastructure there already. Maybe Ben would know better.
>>>
>>> I'm missing a lot of context here as I only became more involved in this project
>>> recently. @Mark Kanda Can you provide any context about Steve's work before
>>> he retired, or Oracle's usage of these features?
>>
>> We (Oracle) have an internal VM manager which relies on cpr-exec, and would
>> like to continue supporting it.
> 
> IIUC that means QEMU upstream needs to start merging two solutions for not
> only migration but also vhost, and maybe more in the future.  Or we reject
> Vladimir's work, but frankly I think that's indeed superior and less hacky
> when without the need of the 2nd channel.
> 
> Can we try to reduce the duplicated logics between the two solutions?
> 
> For example, is it possible to rebase this series on top of Vladimir's
> work, reusing logics as much as possible so that those FDs can also be
> preserved via execve() and reused in the after-exec world (likely reused
> not during fd open, but instead loadvm)?
> 

I think, rebasing is not necessary, as local-fd migration (this series)
should work together with CPR, and even with CPR-exec. So Oracle can use cpr-exec,
and simply enable backend-transfer for virtio-net/tap, and it should work, I
remember we discussed this with Steve. I haven't tested it yet, but I'll try to
add a test for such setup.

-- 
Best regards,
Vladimir

Re: [PATCH v4 0/8] Live update: tap and vhost

Posted by Peter Xu 2 days, 11 hours ago

On Wed, Feb 04, 2026 at 10:56:58AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> I think, rebasing is not necessary, as local-fd migration (this series)
> should work together with CPR, and even with CPR-exec. So Oracle can use cpr-exec,
> and simply enable backend-transfer for virtio-net/tap, and it should work, I
> remember we discussed this with Steve. I haven't tested it yet, but I'll try to
> add a test for such setup.

IIUC it should work for cpr-transfer, but likely not cpr-exec.  cpr-exec
requires removal of FD_CLOEXEC for fds to be persisted.  See cpr_exec_cb()
where it invokes cpr_exec_preserve_fds() (only on top of cpr saved fds).
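
To illustrate the FD_CLOEXEC point, here is a minimal sketch of what clearing the flag looks like at the syscall level. It is not the body of cpr_exec_preserve_fds(); fd_preserve_across_exec() and fd_is_cloexec() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: clear FD_CLOEXEC so fd stays open across execve(). */
static int fd_preserve_across_exec(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}

/* Hypothetical helper: 1 if execve() would close fd, 0 if not, -1 on error. */
static int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}
```

Any fd that is not put through a step like this before execve() is closed by the kernel, so an fd carried only in ordinary migration state would not survive into the new process image.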

-- 
Peter Xu