During switchover there is a period during which both source and
destination side VMs are paused. During this period, all network packets
are still routed to the source side, but it will never process them.
Once the destination resumes, it is not aware of these packets and they
are lost. This can cause packet loss in unreliable protocols and
extended delays due to retransmission in reliable protocols.

This series resolves this problem by caching packets received once the
source VM pauses and then passing and injecting them on the destination
side. This feature is implemented in the last patch. The caching and
injecting is implemented using the network filter interface and should
work with any backend with vhost=off, but only the TAP network backend
was explicitly tested.

This series also introduces an RP_VM_STARTED message on the return-path
channel, which is used to correctly calculate downtime for both precopy
and postcopy, and also as a trigger for netpass to forward packets to
the destination. With more data sent through the migration channel after
the destination VM starts, using RP_SHUT wouldn't be accurate anymore,
and in postcopy the downtime calculation was always incorrect.

As netpass requires the return-path capability, its capability is also
off by default, but I am open to discussion about making it on by
default as long as return-path is enabled (i.e. enabling return-path
would also enable netpass unless it is explicitly disabled).

Juraj Marcin (4):
  migration/qemu-file: Add ability to clear error
  migration: Introduce VM_STARTED return-path message
  migration: Convert VMSD early_setup into VMStateSavePhase enum
  migration: Pass network packets received during switchover to dest VM

 hw/core/machine.c           |   4 +-
 hw/virtio/virtio-mem.c      |   2 +-
 include/migration/vmstate.h |  33 +++--
 include/net/net.h           |   5 +
 migration/meson.build       |   1 +
 migration/migration.c       |  83 +++++++++++-
 migration/migration.h       |  11 ++
 migration/netpass.c         | 246 ++++++++++++++++++++++++++++++++++++
 migration/netpass.h         |  14 ++
 migration/options.c         |  29 +++++
 migration/options.h         |   2 +
 migration/qemu-file.c       |   6 +
 migration/qemu-file.h       |   1 +
 migration/savevm.c          |  44 ++++++-
 migration/savevm.h          |   2 +
 migration/trace-events      |   9 ++
 net/net.c                   |  11 ++
 net/tap.c                   |  11 +-
 qapi/migration.json         |   7 +-
 19 files changed, 501 insertions(+), 20 deletions(-)
 create mode 100644 migration/netpass.c
 create mode 100644 migration/netpass.h

-- 
2.52.0
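[Editor's note: for reference, the capabilities involved can be toggled from the
monitor roughly as sketched below. This is an assumption for illustration only:
"netpass" as the capability name is guessed from the series' naming, while
return-path is the existing prerequisite capability.]

on the source monitor, before issuing migrate:

(qemu) migrate_set_capability return-path on
(qemu) migrate_set_capability netpass on

or the equivalent via the QMP migrate-set-capabilities command.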
[Cc'ing Laurent and David]

On Tue, 27 Jan 2026 15:03:06 +0100
Juraj Marcin <jmarcin@redhat.com> wrote:

> During switchover there is a period during which both source and
> destination side VMs are paused. During this period, all network packets
> are still routed to the source side, but it will never process them.
> Once the destination resumes, it is not aware of these packets and they
> are lost. This can cause packet loss in unreliable protocols and
> extended delays due to retransmission in reliable protocols.
>
> This series resolves this problem by caching packets received once the
> source VM pauses and then passing and injecting them on the destination
> side. This feature is implemented in the last patch. The caching and
> injecting is implemented using network filter interface and should work
> with any backend with vhost=off, but only TAP network backend was
> explicitly tested.

I haven't had a chance to try this change with passt(1) yet (the
backend can be enabled using "-net passt" or by starting it
separately).

Given that passt implements migration on its own (in deeper detail in
some sense, as TCP connections are preserved if IP addresses match), I
wonder if this might affect or break it somehow.

Did you perhaps have some thoughts about that already?

For context, we didn't really write comprehensive documentation about
it yet, but:

- KubeVirt's enhancement repository has a detailed description at:
  https://github.com/kubevirt/enhancements/blob/main/veps/sig-network/passt/passt-migration-proposal.md#live-migration-with-passt

- the QEMU-facing details are outlined in:
  https://archives.passt.top/passt-dev/20241219111400.2352110-1-lvivier@redhat.com/

- usage of TCP_REPAIR is briefly described in passt-repair(1)

--
Stefano
On 1/27/26 19:21, Stefano Brivio wrote:
> [Cc'ing Laurent and David]
>
> On Tue, 27 Jan 2026 15:03:06 +0100
> Juraj Marcin <jmarcin@redhat.com> wrote:
>
>> During switchover there is a period during which both source and
>> destination side VMs are paused. During this period, all network packets
>> are still routed to the source side, but it will never process them.
>> Once the destination resumes, it is not aware of these packets and they
>> are lost. This can cause packet loss in unreliable protocols and
>> extended delays due to retransmission in reliable protocols.
>>
>> This series resolves this problem by caching packets received once the
>> source VM pauses and then passing and injecting them on the destination
>> side. This feature is implemented in the last patch. The caching and
>> injecting is implemented using network filter interface and should work
>> with any backend with vhost=off, but only TAP network backend was
>> explicitly tested.
>
> I haven't had a chance to try this change with passt(1) yet (the
> backend can be enabled using "-net passt" or by starting it
> separately).
>
> Given that passt implements migration on its own (in deeper detail in
> some sense, as TCP connections are preserved if IP addresses match), I
> wonder if it this might affect or break it somehow.
>
passt implements migration only with the vhost-user backend ("-netdev vhost-user"), which is
not supported by netpass. None of the vhost-* backends can be supported, because netpass cannot
catch packets on the virtio queues.

passt with "-netdev stream" doesn't implement migration, but QEMU can be migrated with it
and all the connections are lost. So netpass will forward packets for connections that
will be broken.

"-netdev passt" is only a wrapper on top of "-netdev stream" and "-netdev
vhost-user" that starts the passt backend by itself (rather than expecting it to have been
started by the user).
Thanks,
Laurent
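[Editor's note: for readers less familiar with the setups Laurent refers to, the three
invocations look roughly like the sketch below. Paths and sizes are placeholders, and the
vhost-user=on option of "-netdev passt" is quoted from memory, so check your QEMU's
-netdev help before relying on it.]

passt started separately, socket ("-netdev stream") backend, no passt migration support:

$ passt --socket /tmp/passt_1.socket
$ qemu-system-x86_64 ... -netdev stream,id=net0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device virtio-net-pci,netdev=net0

passt started separately, vhost-user backend (the one passt migration supports, but which
netpass cannot cover), which needs guest memory shared with passt:

$ passt --vhost-user --socket /tmp/passt_1.socket
$ qemu-system-x86_64 ... -object memory-backend-memfd,id=memfd0,share=on,size=4G -numa node,memdev=memfd0 -chardev socket,id=chr0,path=/tmp/passt_1.socket -netdev vhost-user,id=net0,chardev=chr0 -device virtio-net-pci,netdev=net0

wrapper netdev, where QEMU starts passt itself:

$ qemu-system-x86_64 ... -netdev passt,id=net0,vhost-user=on -device virtio-net-pci,netdev=net0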
On Tue, 3 Feb 2026 13:03:26 +0100
Laurent Vivier <lvivier@redhat.com> wrote:
> On 1/27/26 19:21, Stefano Brivio wrote:
> > [Cc'ing Laurent and David]
> >
> > On Tue, 27 Jan 2026 15:03:06 +0100
> > Juraj Marcin <jmarcin@redhat.com> wrote:
> >
> >> During switchover there is a period during which both source and
> >> destination side VMs are paused. During this period, all network packets
> >> are still routed to the source side, but it will never process them.
> >> Once the destination resumes, it is not aware of these packets and they
> >> are lost. This can cause packet loss in unreliable protocols and
> >> extended delays due to retransmission in reliable protocols.
> >>
> >> This series resolves this problem by caching packets received once the
> >> source VM pauses and then passing and injecting them on the destination
> >> side. This feature is implemented in the last patch. The caching and
> >> injecting is implemented using network filter interface and should work
> >> with any backend with vhost=off, but only TAP network backend was
> >> explicitly tested.
> >
> > I haven't had a chance to try this change with passt(1) yet (the
> > backend can be enabled using "-net passt" or by starting it
> > separately).
> >
> > Given that passt implements migration on its own (in deeper detail in
> > some sense, as TCP connections are preserved if IP addresses match), I
> > wonder if it this might affect or break it somehow.
>
> passt implements migration only with the vhost-user backend ("-netdev vhost-user") that is
> not supported by netpass. All the vhost-* cannot be supported because netpass cannot catch
> packets on the virtio queues.
Thanks for having a look! On this point... right, hence my question in:
https://lore.kernel.org/qemu-devel/20260131032700.12f27487@elisabeth/
that is, is there a plan to add vhost support *for netpass*, eventually?
It looks like yes:
https://lore.kernel.org/qemu-devel/CACLfguUZpT-3sj4C8G8e+LB5GHpBfE_HKLOhyZ9qYR8bgkTOCw@mail.gmail.com/
but I'm not sure I got it right (Cindy? Jason?).
> passt with "-netdev stream" doesn't implement migration, but QEMU can be migrated with it
> and all the connections are lost. So netpass will forward packets for connections that
> will be broken.
Realistically, I don't think anybody will ever try to migrate VMs using
-netdev stream with passt, so I guess we don't really have to care
about this (it might help with some protocols, probably make UDP usage
a bit worse, waste a bit of bandwidth with TCP... but that's it).
The only existing (known) user of passt's migration feature is
KubeVirt, which switched to passt's vhost-user interface entirely.
> "-netdev passt" is only some kind of wrapper on top of "-netdev stream" and "-netdev
> vhost-user" that starts the passt backend by itself (rather than expecting it has been
> started by the user).
--
Stefano
Hi Stefano,

On 2026-01-27 19:21, Stefano Brivio wrote:
> [Cc'ing Laurent and David]
>
> On Tue, 27 Jan 2026 15:03:06 +0100
> Juraj Marcin <jmarcin@redhat.com> wrote:
>
> > During switchover there is a period during which both source and
> > destination side VMs are paused. During this period, all network packets
> > are still routed to the source side, but it will never process them.
> > Once the destination resumes, it is not aware of these packets and they
> > are lost. This can cause packet loss in unreliable protocols and
> > extended delays due to retransmission in reliable protocols.
> >
> > This series resolves this problem by caching packets received once the
> > source VM pauses and then passing and injecting them on the destination
> > side. This feature is implemented in the last patch. The caching and
> > injecting is implemented using network filter interface and should work
> > with any backend with vhost=off, but only TAP network backend was
> > explicitly tested.
>
> I haven't had a chance to try this change with passt(1) yet (the
> backend can be enabled using "-net passt" or by starting it
> separately).
>
> Given that passt implements migration on its own (in deeper detail in
> some sense, as TCP connections are preserved if IP addresses match), I
> wonder if it this might affect or break it somehow.
>
> Did you perhaps have some thoughts about that already?

I'm aware of passt migrating its state and passt-repair, but I also
haven't tested it as I couldn't get passt-repair to work. Does it also
handle other protocols, or just preserve TCP connections?

The main focus of this feature is protocols that cannot handle packet
loss on their own, in environments where the IP address is preserved
(and thus also TCP connections). So, mainly tap/bridge, with the idea
that other network backends could also benefit from it. However, if it
causes problems with other backends, I could limit it just to tap.

> For context, we didn't really write comprehensive documentation about
> it yet, but:
>
> - KubeVirt's enhancement repository has a detailed description at:
>   https://github.com/kubevirt/enhancements/blob/main/veps/sig-network/passt/passt-migration-proposal.md#live-migration-with-passt
>
> - the QEMU-facing details are outlined in:
>   https://archives.passt.top/passt-dev/20241219111400.2352110-1-lvivier@redhat.com/
>
> - usage of TCP_REPAIR is briefly described in passt-repair(1)
>
> --
> Stefano
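[Editor's note: the network filter interface mentioned above is the same one used by QEMU's
existing filter objects, which attach to a netdev roughly as shown below. filter-dump is
used here purely to illustrate where such a filter sits; the netpass filter from this series
is presumably set up internally by the migration code rather than configured on the command
line.]

$ qemu-system-x86_64 ... -netdev tap,id=net0,script=no -device virtio-net-pci,netdev=net0 -object filter-dump,id=dump0,netdev=net0,file=/tmp/net0.pcap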
On Wed, 28 Jan 2026 14:06:11 +0100
Juraj Marcin <jmarcin@redhat.com> wrote:
> Hi Stefano,
>
> On 2026-01-27 19:21, Stefano Brivio wrote:
> > [Cc'ing Laurent and David]
> >
> > On Tue, 27 Jan 2026 15:03:06 +0100
> > Juraj Marcin <jmarcin@redhat.com> wrote:
> >
> > > During switchover there is a period during which both source and
> > > destination side VMs are paused. During this period, all network packets
> > > are still routed to the source side, but it will never process them.
> > > Once the destination resumes, it is not aware of these packets and they
> > > are lost. This can cause packet loss in unreliable protocols and
> > > extended delays due to retransmission in reliable protocols.
> > >
> > > This series resolves this problem by caching packets received once the
> > > source VM pauses and then passing and injecting them on the destination
> > > side. This feature is implemented in the last patch. The caching and
> > > injecting is implemented using network filter interface and should work
> > > with any backend with vhost=off, but only TAP network backend was
> > > explicitly tested.
> >
> > I haven't had a chance to try this change with passt(1) yet (the
> > backend can be enabled using "-net passt" or by starting it
> > separately).
> >
> > Given that passt implements migration on its own (in deeper detail in
> > some sense, as TCP connections are preserved if IP addresses match), I
> > wonder if it this might affect or break it somehow.
> >
> > Did you perhaps have some thoughts about that already?
>
> I'm aware of passt migrating its state and passt-repair, but I also
> haven't tested it as I couldn't get passt-repair to work.
Oops. Let me know if you're hitting any specific error I could look
into.
I plan anyway to try out your changes but I might need a couple of days
before I find the time.
> Does it also handle other protocols, or just preserves TCP connections?
Layer-4-wise, we have an internal representation of UDP "flows"
(observed flows of packets for which we preserve the same source port
mapping, with timeouts) and we had a vague idea of migrating those as
well, but it's debatable whether there's any benefit from it.
At Layer 2 and 3, we migrate IP and MAC addresses we observed from the
guest:
https://passt.top/passt/tree/migrate.c?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n31
so that we have ARP and NDP resolution, as well as any NAT
mapping working right away as needed.
For completeness, this is the TCP context we migrate instead:
https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n108
https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n154
> The main focus of this feature are protocols that cannot handle packet
> loss on their own in environments where IP address is preserved (and
> thus also TCP connections).
Well, strictly speaking, TCP handles packet loss, that's actually the
main reason behind it. I guess this is to improve throughput and avoid
latency spikes or retransmissions that could be avoided?
> So, mainly tap/bridge, with the idea that
> other network backends could also benefit from it. However, if it causes
> problems with other backends, I could limit it just to tap.
I couldn't quite figure out yet if it's beneficial, useless, or
harmless for passt. With passt, what happens without your
implementation is:
1. guest pauses
2. the source instance of passt starts migrating, meaning that sockets
are frozen one by one, their receiving and sending queues dumped
3. pending queues are sent to the target instance of passt, which opens
sockets and refills queues as needed
4. target guest resumes and will get any traffic that was received by
the source instance of passt between 1. and 2.
Right now there's still a Linux kernel issue we observed (see also
https://pad.passt.top/p/TcpRepairTodo, that's line 4 there) which might
cause segments to be received (and acknowledged!) on sockets of the
source instance of passt for a small time period *after* we freeze them
with TCP_REPAIR (that is, TCP_REPAIR doesn't really freeze the queue).
I'm currently working on a proper fix for that. Until then, point 2.
above isn't entirely accurate (but it only happens if you hammer it
with traffic generators, it's not really visible otherwise).
With your implementation, I guess:
1. guest pauses
2. the source instance of passt starts migrating, meaning that sockets
are frozen one by one, their receiving and sending queues dumped
2a. any data received by QEMU after 1. will be stored and forwarded to
the target later. But passt at this point prevents the guest from
getting any data, so there should be no data involved
3. pending queues are sent to the target instance of passt, which opens
sockets and refills queues as needed
3a. the target guest gets the data from 2a. As long as there's no data
(as I'm assuming), there should be no change. If there's data coming
in at this point, we risk that sequences don't match anymore? I'm not
sure
4. target guest resumes and will *also* get any traffic that was received
by the source instance of passt between 1. and 2.
So if my assumption from 2a. above holds, it should be useless, but
harmless.
Would your implementation help with the kernel glitch we're currently
observing? I don't think so, because your implementation would only play
a role between passt and QEMU, and we don't have issues there.
Well, it would be good to try things out. Other than that, unless I'm
missing something, your implementation should probably be skipped for
passt for simplicity, and also to avoid negatively affecting downtime.
Note that you can also use passt without "-net passt" (that's actually
quite recent) but with a tap back-end. Migration is only supported with
vhost-user enabled though, and as far as I understand your implementation
is disabled in that case?
--
Stefano
Hi Stefano,
thanks for the answer!
On 2026-01-28 18:27, Stefano Brivio wrote:
> On Wed, 28 Jan 2026 14:06:11 +0100
> Juraj Marcin <jmarcin@redhat.com> wrote:
>
> > Hi Stefano,
> >
> > On 2026-01-27 19:21, Stefano Brivio wrote:
> > > [Cc'ing Laurent and David]
> > >
> > > On Tue, 27 Jan 2026 15:03:06 +0100
> > > Juraj Marcin <jmarcin@redhat.com> wrote:
> > >
> > > > During switchover there is a period during which both source and
> > > > destination side VMs are paused. During this period, all network packets
> > > > are still routed to the source side, but it will never process them.
> > > > Once the destination resumes, it is not aware of these packets and they
> > > > are lost. This can cause packet loss in unreliable protocols and
> > > > extended delays due to retransmission in reliable protocols.
> > > >
> > > > This series resolves this problem by caching packets received once the
> > > > source VM pauses and then passing and injecting them on the destination
> > > > side. This feature is implemented in the last patch. The caching and
> > > > injecting is implemented using network filter interface and should work
> > > > with any backend with vhost=off, but only TAP network backend was
> > > > explicitly tested.
> > >
> > > I haven't had a chance to try this change with passt(1) yet (the
> > > backend can be enabled using "-net passt" or by starting it
> > > separately).
> > >
> > > Given that passt implements migration on its own (in deeper detail in
> > > some sense, as TCP connections are preserved if IP addresses match), I
> > > wonder if it this might affect or break it somehow.
> > >
> > > Did you perhaps have some thoughts about that already?
> >
> > I'm aware of passt migrating its state and passt-repair, but I also
> > haven't tested it as I couldn't get passt-repair to work.
>
> Oops. Let me know if you're hitting any specific error I could look
> into.
I tried it using this documentation [1] I found earlier; however, it
wouldn't work when migrating on the same host, which I expected it to
support. The destination passt process fails to get the port the
outside TCP server is communicating with, and I still see the connection
as established with the source passt process. This is the specific
error message from the destination passt process:
Flow 0 (TCP connection): Failed to connect migrated socket: Cannot assign requested address
[1]: https://www.qemu.org/docs/master/system/devices/net.html#example-of-migration-of-a-guest-on-the-same-host
>
> I plan anyway to try out your changes but I might need a couple of days
> before I find the time.
>
> > Does it also handle other protocols, or just preserves TCP connections?
>
> Layer-4-wise, we have an internal representation of UDP "flows"
> (observed flows of packets for which we preserve the same source port
> mapping, with timeouts) and we had a vague idea of migrating those as
> well, but it's debatable where there's any benefit from it.
>
> At Layer 2 and 3, we migrate IP and MAC addresses we observed from the
> guest:
>
> https://passt.top/passt/tree/migrate.c?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n31
>
> so that we have ARP and NDP resolution, as well as any NAT
> mapping working right away as needed.
>
> For completeness, this is the TCP context we migrate instead:
>
> https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n108
> https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n154
>
> > The main focus of this feature are protocols that cannot handle packet
> > loss on their own in environments where IP address is preserved (and
> > thus also TCP connections).
>
> Well, strictly speaking, TCP handles packet loss, that's actually the
> main reason behind it. I guess this is to improve throughput and avoid
> latency spikes or retransmissions that could be avoided?
Sorry, I actually meant that all connections are preserved. The main
goal is to prevent losses with protocols other than TCP when possible,
which was requested by our Solution Architects. Possible improved TCP
throughput due to avoided retransmissions is just a side effect of that.
>
> > So, mainly tap/bridge, with the idea that
> > other network backends could also benefit from it. However, if it causes
> > problems with other backends, I could limit it just to tap.
>
> I couldn't quite figure out yet if it's beneficial, useless, or
> harmless for passt. With passt, what happens without your
> implementation is:
>
> 1. guest pauses
>
> 2. the source instance of passt starts migrating, meaning that sockets
> are frozen one by one, their receiving and sending queues dumped
>
> 3. pending queues are sent to the target instance of passt, which opens
> sockets as refills queues as needed
>
> 4. target guest resumes and will get any traffic that was received by
> the source instance of passt between 1. and 2.
>
> Right now there's still a Linux kernel issue we observed (see also
> https://pad.passt.top/p/TcpRepairTodo, that's line 4 there) which might
> cause segments to be received (and acknowledged!) on sockets of the
> source instance of passt for a small time period *after* we freeze them
> with TCP_REPAIR (that is, TCP_REPAIR doesn't really freeze the queue).
>
> I'm currently working on a proper fix for that. Until then, point 2.
> above isn't entirely accurate (but it only happens if you hammer it
> with traffic generators, it's not really visible otherwise).
>
> With your implementation, I guess:
>
> 1. guest pauses
>
> 2. the source instance of passt starts migrating, meaning that sockets
> are frozen one by one, their receiving and sending queues dumped
>
> 2a. any data received by QEMU after 1. will be stored and forwarded to
> the target later. But passt at this point prevents the guest from
> getting any data, so there should be no data involved
>
> 3. pending queues are sent to the target instance of passt, which opens
> sockets as refills queues as needed
>
> 3a. the target guest gets the data from 2a. As long as there's no data
> (as I'm assuming), there should be no change. If there's data coming
> in at this point, we risk that sequences don't match anymore? I'm not
> sure
>
> 4. target guest resumes and will *also* get any traffic that was received
> by the source instance of passt between 1. and 2.
>
> So if my assumption from 2a. above holds, it should be useless, but
> harmless.
>
> Would your implementation help with the kernel glitch we're currently
> observing? I don't think so, because your implementation would only play
> a role between passt and QEMU, and we don't have issues there.
>
> Well, it would be good to try things out. Other than that, unless I'm
> missing something, your implementation should probably be skipped for
> passt for simplicity, and also to avoid negatively affecting downtime.
I agree with skipping passt in such a case, although I haven't perceived
any effect on downtime. Cached network packets are sent after the
destination resumes, so that the network knows about the new location of
the VM and the source shouldn't receive any more packets intended for it.
>
> Note that you can also use passt without "-net passt" (that's actually
> quite recent) but with a tap back-end. Migration is only supported with
> vhost-user enabled though, and as far as I understand your implementation
> is disabled in that case?
As of now it is disabled in that case as network filters don't support
vhost.
>
> --
> Stefano
--
Juraj Marcin
On Fri, 30 Jan 2026 15:40:01 +0100
Juraj Marcin <jmarcin@redhat.com> wrote:
> Hi Stefano,
>
> thanks for the answer!
>
> On 2026-01-28 18:27, Stefano Brivio wrote:
> > On Wed, 28 Jan 2026 14:06:11 +0100
> > Juraj Marcin <jmarcin@redhat.com> wrote:
> >
> > > Hi Stefano,
> > >
> > > On 2026-01-27 19:21, Stefano Brivio wrote:
> > > > [Cc'ing Laurent and David]
> > > >
> > > > On Tue, 27 Jan 2026 15:03:06 +0100
> > > > Juraj Marcin <jmarcin@redhat.com> wrote:
> > > >
> > > > > During switchover there is a period during which both source and
> > > > > destination side VMs are paused. During this period, all network packets
> > > > > are still routed to the source side, but it will never process them.
> > > > > Once the destination resumes, it is not aware of these packets and they
> > > > > are lost. This can cause packet loss in unreliable protocols and
> > > > > extended delays due to retransmission in reliable protocols.
> > > > >
> > > > > This series resolves this problem by caching packets received once the
> > > > > source VM pauses and then passing and injecting them on the destination
> > > > > side. This feature is implemented in the last patch. The caching and
> > > > > injecting is implemented using network filter interface and should work
> > > > > with any backend with vhost=off, but only TAP network backend was
> > > > > explicitly tested.
> > > >
> > > > I haven't had a chance to try this change with passt(1) yet (the
> > > > backend can be enabled using "-net passt" or by starting it
> > > > separately).
> > > >
> > > > Given that passt implements migration on its own (in deeper detail in
> > > > some sense, as TCP connections are preserved if IP addresses match), I
> > > > wonder if it this might affect or break it somehow.
> > > >
> > > > Did you perhaps have some thoughts about that already?
> > >
> > > I'm aware of passt migrating its state and passt-repair, but I also
> > > haven't tested it as I couldn't get passt-repair to work.
> >
> > Oops. Let me know if you're hitting any specific error I could look
> > into.
>
> I tried it using this documentation [1] I found earlier, however, it
> wouldn't work when migrating on the same host as I expected from it. The
> destination passt process fails to get the port the outside TCP server
> is communicating with and I see the connection still as established with
> the source passt process. This is the specific error message from the
> destination passt process:
>
> Flow 0 (TCP connection): Failed to connect migrated socket: Cannot assign requested address
>
> [1]: https://www.qemu.org/docs/master/system/devices/net.html#example-of-migration-of-a-guest-on-the-same-host
Ouch, I see.
Laurent wrote this part of the documentation showing the QEMU-related bits
of the migration workflow, but we should have updated it with an
example with real TCP flows, because in that case you can't have the
two instances of QEMU and passt running in the same namespace of the
same machine: ports and addresses will conflict.
There are two alternatives to test migration of actual flows.
1. two namespaces, same machine, with one instance of passt and one
instance of QEMU in each.
Testing a connection from guest to host can be done with a simple
client/server pair, whereas, the other way around, you need some
form of proxying (see the 'bidirectional' example below).
This is what we do in passt's upstream tests (for a sample run, see
https://passt.top/#continuous-integration, skip to 'migrate/basic'
using the links on the bottom). The setup function is here:
https://passt.top/passt/tree/test/lib/setup?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n308
and these are the test directives themselves:
https://passt.top/passt/tree/test/migrate/basic
https://passt.top/passt/tree/test/migrate/bidirectional
...if you want to try and run these tests, see
https://passt.top/passt/tree/test/README.md. The test suite has quite
a few dependencies and it might take a bit of effort to run the
whole thing, but if you 'make assets' under test/ and then select
one single test instead, with './run migrate/basic', it should be
practical.
I can write up "stand-alone" instructions based on that if needed.
2. two virtual machines, bridged (no need for root if you detach a
network namespace on the host), migrating nested guests. Assuming
three terminals (host, source, target), and a libvirt domain named
"alpine" *inside* L1 guests (no need for libvirt, it just makes the
write-up a bit more terse):
---
[host]
$ unshare -rUn
# echo $$ # let's call this TARGET_PID
# ip link set dev lo up
[source]
$ nsenter --preserve-credentials -U -n -t $TARGET_PID
# qemu-system-x86_64 -machine accel=kvm -cpu host ... -nographic -serial mon:stdio -nodefaults -m 4G -netdev tap,id=n,script=no -device virtio-net,netdev=n
...in the guest, once it starts:
# service NetworkManager stop # if you have it
# ip link set dev eth0 up
# ip addr add dev eth0 10.0.0.1/24
# ip route add default dev eth0
# ip link set dev eth0 addr 52:54:00:12:34:57
[target]
$ nsenter --preserve-credentials -U -n -t $TARGET_PID
# qemu-system-x86_64 -machine accel=kvm -cpu host ... -nographic -serial mon:stdio -nodefaults -m 4G -netdev tap,id=n,script=no -device virtio-net,netdev=n
...in the guest, once it starts:
# service NetworkManager stop # if you have it
# ip link set dev eth0 up
# ip addr add dev eth0 10.0.0.2/24
# ip route add default dev eth0
[host]
# ip link set dev tap0 up
# ip link set dev tap1 up
# ip link add dev br0 type bridge
# ip link set dev tap0 master br0
# ip link set dev tap1 master br0
# ip addr add dev br0 10.0.0.3/24
check that we can reach the target
# ping 10.0.0.2
start the test server
# ip addr add dev br0 172.16.0.3/24
# nc -l -p 8080
[*both* source and target]
# ip addr add dev eth0 172.16.0.1/25 # make sure passt picks this address for the guests, as it's more specific than a 10.0.0.0/24, it's /25
# ip route add default via 172.16.0.100 # and add a default route just so that the guest has one, but we don't need this
[source (use another terminal, or run passt-repair in background)]
# mkdir /run/user/1001/libvirt/qemu/run/passt/
# passt-repair /run/user/1001/libvirt/qemu/run/passt/
[source]
$ virsh start --console alpine
...in the guest, once it starts:
$ nc 172.16.0.3 8080
start typing, before migration
[source (use another terminal, or escape console while keeping nc running)]
$ virsh migrate --verbose --p2p --live --unsafe alpine --tunneled qemu+ssh://10.0.0.2/session
---
if you reverse the direction of the connection, you'll need two
bridges, one for migration data and one for the test connection
itself.
Otherwise, the kernel on the source L1 guest will manage to send a
RST to your client (on L0) as soon as the connection continues on
the target, because ACK segments from the target will reach the
source (they're bridged), but the source has no open socket at this
point.
With two bridges, you can "unplug" the source target (test
connection / tap interface only) before migrating. As an alternative,
you could drop RST segments using nftables.
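[Editor's note: a minimal sketch of that nftables alternative, run in the source L1 guest
right before migrating, could look like the commands below. Table and chain names are made
up for the example; this drops all outgoing RSTs, so it is only meant for the duration of
the test.]

# nft add table inet mig
# nft 'add chain inet mig out { type filter hook output priority 0 ; }'
# nft add rule inet mig out 'tcp flags & rst == rst' drop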
I can clean up my notes for these additional steps if anybody is
interested. Eventually, I guess, it should all become part of QEMU's
documentation.
> > I plan anyway to try out your changes but I might need a couple of days
> > before I find the time.
> >
> > > Does it also handle other protocols, or just preserves TCP connections?
> >
> > Layer-4-wise, we have an internal representation of UDP "flows"
> > (observed flows of packets for which we preserve the same source port
> > mapping, with timeouts) and we had a vague idea of migrating those as
> > well, but it's debatable where there's any benefit from it.
> >
> > At Layer 2 and 3, we migrate IP and MAC addresses we observed from the
> > guest:
> >
> > https://passt.top/passt/tree/migrate.c?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n31
> >
> > so that we have ARP and NDP resolution, as well as any NAT
> > mapping working right away as needed.
> >
> > For completeness, this is the TCP context we migrate instead:
> >
> > https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n108
> > https://passt.top/passt/tree/tcp_conn.h?id=e3f70c05bad90368a1a89bf31a9015125232b9ae#n154
> >
> > > The main focus of this feature are protocols that cannot handle packet
> > > loss on their own in environments where IP address is preserved (and
> > > thus also TCP connections).
> >
> > Well, strictly speaking, TCP handles packet loss, that's actually the
> > main reason behind it. I guess this is to improve throughput and avoid
> > latency spikes or retransmissions that could be avoided?
>
> Sorry, I actually meant that all connections are preserved. The main
> goal is to prevent losses with protocols other than TCP when possible,
> which was requested by our Solution Architects. Possible improved TCP
> throughput due to avoided retransmissions is just a side effect of that.
Interesting... you mean UDP? Or non-IP protocols?
For some typical UDP applications (realtime audio/video streams) I
generally expect delayed datagrams (more of them) to be worse than some
lost datagrams (fewer of them). This is part of the reason why I didn't
particularly care about that in passt. Well, as long as there's a way
to disable this mechanism, one could tune the configuration to their
needs.
> > > So, mainly tap/bridge, with the idea that
> > > other network backends could also benefit from it. However, if it causes
> > > problems with other backends, I could limit it just to tap.
> >
> > I couldn't quite figure out yet if it's beneficial, useless, or
> > harmless for passt. With passt, what happens without your
> > implementation is:
> >
> > 1. guest pauses
> >
> > 2. the source instance of passt starts migrating, meaning that sockets
> > are frozen one by one, their receiving and sending queues dumped
> >
> > 3. pending queues are sent to the target instance of passt, which opens
> > sockets as refills queues as needed
> >
> > 4. target guest resumes and will get any traffic that was received by
> > the source instance of passt between 1. and 2.
> >
> > Right now there's still a Linux kernel issue we observed (see also
> > https://pad.passt.top/p/TcpRepairTodo, that's line 4 there) which might
> > cause segments to be received (and acknowledged!) on sockets of the
> > source instance of passt for a small time period *after* we freeze them
> > with TCP_REPAIR (that is, TCP_REPAIR doesn't really freeze the queue).
> >
> > I'm currently working on a proper fix for that. Until then, point 2.
> > above isn't entirely accurate (but it only happens if you hammer it
> > with traffic generators, it's not really visible otherwise).
> >
> > With your implementation, I guess:
> >
> > 1. guest pauses
> >
> > 2. the source instance of passt starts migrating, meaning that sockets
> > are frozen one by one, their receiving and sending queues dumped
> >
> > 2a. any data received by QEMU after 1. will be stored and forwarded to
> > the target later. But passt at this point prevents the guest from
> > getting any data, so there should be no data involved
> >
> > 3. pending queues are sent to the target instance of passt, which opens
> > sockets as refills queues as needed
> >
> > 3a. the target guest gets the data from 2a. As long as there's no data
> > (as I'm assuming), there should be no change. If there's data coming
> > in at this point, we risk that sequences don't match anymore? I'm not
> > sure
> >
> > 4. target guest resumes and will *also* get any traffic that was received
> > by the source instance of passt between 1. and 2.
> >
> > So if my assumption from 2a. above holds, it should be useless, but
> > harmless.
> >
> > Would your implementation help with the kernel glitch we're currently
> > observing? I don't think so, because your implementation would only play
> > a role between passt and QEMU, and we don't have issues there.
> >
> > Well, it would be good to try things out. Other than that, unless I'm
> > missing something, your implementation should probably be skipped for
> > passt for simplicity, and also to avoid negatively affecting downtime.
>
> I agree with skipping passt in such case, although, I haven't perceived
> any effect on downtime. Cached network packets are sent after the
> destination resumes, so that the network knows about new location of the
> VM and the source shouldn't receive any more packets intended for it.
>
> > Note that you can also use passt without "-net passt" (that's actually
> > quite recent) but with a tap back-end. Migration is only supported with
> > vhost-user enabled though, and as far as I understand your implementation
> > is disabled in that case?
>
> As of now it is disabled in that case as network filters don't support
> vhost.
Is that something you plan to fix / change in the future, though? In
that case, I would try to check how this works with passt in a bit more
detail (now or later).
--
Stefano
Hi Stefano,

On 2026-01-31 03:27, Stefano Brivio wrote:
> On Fri, 30 Jan 2026 15:40:01 +0100
> Juraj Marcin <jmarcin@redhat.com> wrote:
>
> [...]
>
> There are two alternatives to test migration of actual flows.
>
> 1. two namespaces, same machine, with one instance of passt and one
>    instance of QEMU in each.
>
>    [...]
>
> 2. two virtual machines, bridged (no need for root if you detach a
>    network namespace on the host), migrating nested guests.
>
>    [...]
>
> I can clean up my notes for these additional steps if anybody is
> interested. Eventually, I guess, it should all become part of QEMU's
> documentation.

Thank you very much, I will try that.

> [...]
>
> > Sorry, I actually meant that all connections are preserved. The main
> > goal is to prevent losses with protocols other than TCP when possible,
> > which was requested by our Solution Architects. Possible improved TCP
> > throughput due to avoided retransmissions is just a side effect of that.
>
> Interesting... you mean UDP? Or non-IP protocols?
>
> For some typical UDP applications (realtime audio/video streams) I
> generally expect delayed datagrams (more of them) to be worse than some
> lost datagrams (fewer of them). This is part of the reason why I didn't
> particularly care about that in passt. Well, as long as there's a way
> to disable this mechanism, one could tune the configuration to their
> needs.

The original request from Solution Architects was demonstrated using
ICMP Echo Requests, but it should work for any protocol. While, yes, it
will add a certain delay to the packets, depending on the user's use
case it might be better than losing them altogether. Users can decide
and configure it to their needs. However, in case of strict realtime
workloads, migration might be off the table altogether, as the
switchover during which all CPUs are paused is unavoidable.

> [...]
>
> > As of now it is disabled in that case as network filters don't support
> > vhost.
>
> Is that something you plan to fix / change in the future, though? In
> that case, I would try to check how this works with passt in a bit more
> detail (now or later).

Yes, we are planning to implement this feature with vhost as well.

> --
> Stefano
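[Editor's note: the ICMP demonstration mentioned above can be reproduced with a plain ping
from another host towards the guest while it migrates; the address below is a placeholder.]

$ ping -D -i 0.2 10.0.0.1

Without the feature, a few icmp_seq values go missing around the switchover; with the
netpass capability enabled, the corresponding replies should arrive late instead of being
lost entirely.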