This is v1; however, it is not 10.2 material. The earliest release I see
fit would still be 11.0+, even if everything goes extremely smoothly.
The removal of the RFC tag only means I'm more confident this can land
without breaking something too easily, as I smoke-tested it a bit more
across archs this time. AFAIU the best (and possibly only..) way to prove
it solid is to merge it, likely in the early phase of a dev cycle.
The plan is that we'll also try to cover more device setups soon, before
it can land.
Background
==========
Nowadays, live migration heavily depends on threads: most of the major
features used in live migration (multifd, postcopy, mapped-ram, vfio,
etc.) work with threads internally.
Still, from time to time we'll see coroutines floating around the
migration context. The major one is precopy's loadvm, which internally
runs in a coroutine. It remains a critical path that every live migration
depends on.
A mixture of coroutines and threads is prone to issues. For examples, see
commit e65cec5e5d ("migration/ram: Yield periodically to the main loop")
or commit 7afbdada7e ("migration/postcopy: ensure preempt channel is
ready before loading states").
Loadvm has been a coroutine since this work (thanks to Fabiano, the
archeologist, for digging up the link):
https://lists.gnu.org/archive/html/qemu-devel/2012-08/msg01136.html
Overview
========
This series tries to move migration further into the thread-based model
by allowing the loadvm process to happen in a thread, rather than in a
coroutine on the main thread.
Luckily, since the qio channel code is already prepared for both cases,
the IO paths should all be fine.
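
As an illustration, below is a simplified sketch of the dual-mode wait
pattern that migration's qemu-file code already uses when a read would
block (the real code differs in details): running in a coroutine it
yields back to the main loop, while running in a thread it blocks on the
channel, so the same IO path serves both models.

#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "io/channel.h"

/* Simplified sketch of the dual-mode wait pattern; the real code in
 * migration/qemu-file.c differs in details. */
static ssize_t read_with_wait(QIOChannel *ioc, const struct iovec *iov)
{
    ssize_t len;

    do {
        len = qio_channel_readv(ioc, iov, 1, NULL);
        if (len == QIO_CHANNEL_ERR_BLOCK) {
            if (qemu_in_coroutine()) {
                /* Coroutine: yield to the main loop until readable */
                qio_channel_yield(ioc, G_IO_IN);
            } else {
                /* Thread: block on the channel until readable */
                qio_channel_wait(ioc, G_IO_IN);
            }
        }
    } while (len == QIO_CHANNEL_ERR_BLOCK);

    return len;
}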
Note that loadvm for postcopy already happens in a separate ram load
thread. However, RAM is the simple case here: even though it has its own
challenges (around atomically updating the pgtables), that complexity
lies in the kernel.
For precopy, loadvm has quite a few operations that need the BQL. The
problem is that we can't take the BQL for the whole loadvm process,
because that would block the main thread from executing (e.g. QMP would
hang). The finer grained we can make the BQL sections, the better. This
series so far chooses somewhere in the middle, taking the BQL mainly in
these two places (a sketch follows the list):
- CPU synchronizations
- Device START/FULL sections
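
To illustrate the shape, here is a hypothetical sketch using plain
bql_lock()/bql_unlock(); the helper names are invented, and the series
actually wraps such sections in the WITH_BQL_HELD() macro it introduces:

#include "qemu/osdep.h"
#include "qemu/main-loop.h"    /* bql_lock() / bql_unlock() */

/*
 * Hypothetical sketch only: the helper names are invented, and the
 * real call sites (and the WITH_BQL_HELD() wrapper) live in the
 * patches themselves.
 */
static int incoming_load_one(QEMUFile *f, MigrationIncomingState *mis)
{
    int ret;

    /* Most of the loadvm stream is parsed with the BQL released... */
    ret = load_section_header(f);            /* hypothetical helper */
    if (ret < 0) {
        return ret;
    }

    /* ...but CPU sync and device START/FULL sections take the BQL. */
    bql_lock();
    ret = load_device_full_section(f, mis);  /* hypothetical helper */
    bql_unlock();

    return ret;
}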
With this series applied, most of the remaining loadvm path will run
without the BQL. There is a more detailed discussion / TODO in the commit
message of patch "migration: Thread-ify precopy vmstate load process"
explaining how to further split the BQL critical sections.
After that, the only leftover pieces in migration/ that still use a
coroutine are the snapshot save/load/delete jobs.
Tests
=====
Default CI passes.
RDMA unit tests pass as usual. I also tried cancellation / failure tests
over RDMA channels, making sure nothing gets stuck.
I also roughly measured how long it takes to run the whole 80+ migration
qtest suite, and saw no measurable difference before / after this series.
I didn't test COLO; I wanted to, but the doc example didn't work.
Risks
=====
This series has the risk of breaking things. I would be surprised if it
didn't..
The current way of taking the BQL during FULL section loads may cause
issues: when the IOs are unstable, we could be waiting for IO (in the new
migration incoming thread) with the BQL held. This is a low possibility,
though; it only happens when the network halts while flushing the device
states. It is still possible, however. One solution is to further break
down the BQL critical sections into smaller ones, as mentioned in the
TODO.
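
One conceivable shape for such a breakdown, sketched below purely as an
assumption (it is not what this series implements, and it presumes the
payload size is known up front), is to buffer a section off the wire with
the BQL released and replay it from memory with the BQL held, similar to
the buffer channel COLO already uses:

#include "qemu/osdep.h"
#include "qemu/main-loop.h"
#include "io/channel-buffer.h"
#include "migration/qemu-file.h"

/*
 * Hypothetical sketch: pull the section payload into a memory buffer
 * without the BQL, then load it from memory with the BQL held, so the
 * critical section can no longer block on an unstable network.
 */
static int load_buffered_section(QEMUFile *f, MigrationIncomingState *mis,
                                 size_t size)
{
    QIOChannelBuffer *bioc = qio_channel_buffer_new(size);
    QEMUFile *fb = qemu_file_new_input(QIO_CHANNEL(bioc));
    int ret;

    /* BQL released: waiting for IO here cannot hang the main thread */
    qemu_get_buffer(f, bioc->data, size);
    bioc->usage = size;

    bql_lock();
    /* Memory-backed QEMUFile: loading cannot wait on the network */
    ret = qemu_loadvm_state_main(fb, mis);
    bql_unlock();

    qemu_fclose(fb);
    object_unref(OBJECT(bioc));
    return ret;
}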
Anything is more than welcome: suggestions, questions, objections, tests..
TODO
====
- Finer grained BQL breakdown
Peter Xu (13):
io: Add qio_channel_wait_cond() helper
migration: Properly wait on G_IO_IN when peeking messages
migration/rdma: Fix wrong context in qio_channel_rdma_shutdown()
migration/rdma: Allow qemu_rdma_wait_comp_channel work with thread
migration/rdma: Change io_create_watch() to return immediately
migration: Introduce WITH_BQL_HELD() / WITH_BQL_RELEASED()
migration: Pass in bql_held information from qemu_loadvm_state()
migration: Thread-ify precopy vmstate load process
migration/rdma: Remove coroutine path in qemu_rdma_wait_comp_channel
migration/postcopy: Remove workaround on wait preempt channel
migration/ram: Remove workaround on ram yield during load
migration: Allow blocking mode for incoming live migration
migration/vfio: Drop BQL dependency for loadvm SWITCHOVER_START
include/io/channel.h | 15 +++
include/migration/colo.h | 6 +-
migration/migration.h | 109 +++++++++++++++++--
migration/savevm.h | 4 +-
hw/vfio/migration-multifd.c | 3 -
io/channel.c | 21 ++--
migration/channel.c | 7 +-
migration/colo-stubs.c | 2 +-
migration/colo.c | 26 ++---
migration/migration.c | 81 ++++++++------
migration/qemu-file.c | 6 +-
migration/ram.c | 13 +--
migration/rdma.c | 204 ++++++++----------------------------
migration/savevm.c | 98 +++++++++--------
migration/trace-events | 4 +-
15 files changed, 291 insertions(+), 308 deletions(-)
--
2.50.1
On Wed, Oct 22, 2025 at 03:25:59PM -0400, Peter Xu wrote:
> This is v1; however, it is not 10.2 material. The earliest release I see
> fit would still be 11.0+, even if everything goes extremely smoothly.
>
> The removal of the RFC tag only means I'm more confident this can land
> without breaking something too easily, as I smoke-tested it a bit more
> across archs this time. AFAIU the best (and possibly only..) way to prove
> it solid is to merge it, likely in the early phase of a dev cycle.
>
> The plan is that we'll also try to cover more device setups soon, before
> it can land.
I forgot to attach a changelog, sorry, here it is:
rfc->v1:
- Collected tags
- Tried to split the major patch 5 into smaller ones [Dave]
  - One with the WITH_BQL_*() macros, rewritten to allow internal
    "return" or nesting [Vladimir] (see the sketch after this changelog)
  - One patch, "migration: Pass in bql_held information from
    qemu_loadvm_state()", which itself contains no functional change.
    However, it should contain all the changes relevant to the BQL, so
    it may help review.
  - One patch containing the rest that needs to stay in a single patch.
- Refined the commit message of patch "migration/rdma: Change
  io_create_watch() to return immediately" [Fabiano]
- Added a prerequisite patch to rework migration_channel_read_peek(),
  removing the sleep functions [Fabiano]
- Added patch "io: Add qio_channel_wait_cond() helper"
- Added patch "migration: Properly wait on G_IO_IN when peeking messages"
- Squashed patch "migration/rdma: Remove rdma_cm_poll_handler" into patch
  "migration: Thread-ify precopy vmstate load process", as the patches
  can be racy when split.
- Moved the vfio patch to the end; instead of adding dead code, it now
  directly removes the bql lock/unlock, making sure that API is invoked
  without the BQL instead.
- Added one patch, "migration: Allow blocking mode for incoming live
  migration", so that after the threadify work the incoming main channel
  is switched to blocking mode. The reasoning is in the commit message.
The changelog may not be complete, so there can be small touch-ups here
and there that are not mentioned.
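
For reference, here is a hypothetical sketch of how a scoped-BQL macro
can tolerate an internal "return" and nesting, in the spirit of QEMU's
existing WITH_QEMU_LOCK_GUARD(); the macro the series actually adds may
be built quite differently:

#include "qemu/osdep.h"
#include "qemu/main-loop.h"   /* bql_lock(), bql_unlock(), bql_locked() */

/* Hypothetical sketch only. The cleanup attribute releases the lock on
 * every exit path, including an early "return"; taking the lock only
 * when not already held makes nesting harmless. */
typedef struct BqlGuard { bool taken; } BqlGuard;

static inline BqlGuard bql_guard_enter(void)
{
    BqlGuard g = { .taken = !bql_locked() };
    if (g.taken) {
        bql_lock();
    }
    return g;
}

static inline void bql_guard_exit(BqlGuard *g)
{
    if (g->taken) {
        bql_unlock();
    }
}

#define WITH_BQL_HELD()                                                 \
    for (BqlGuard _g __attribute__((cleanup(bql_guard_exit))) =         \
             bql_guard_enter(), *_once = &_g;                           \
         _once; _once = NULL)

Used as WITH_BQL_HELD() { ... }, the body runs exactly once with the BQL
held, and the cleanup attribute guarantees the unlock even when the body
returns early.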
--
Peter Xu
On Wed, 22 Oct 2025 15:25:59 -0400
Peter Xu <peterx@redhat.com> wrote:
> This is v1; however, it is not 10.2 material. The earliest release I see
> fit would still be 11.0+, even if everything goes extremely smoothly.
>
> The removal of the RFC tag only means I'm more confident this can land
> without breaking something too easily, as I smoke-tested it a bit more
> across archs this time. AFAIU the best (and possibly only..) way to prove
> it solid is to merge it, likely in the early phase of a dev cycle.
>
> The plan is that we'll also try to cover more device setups soon, before
> it can land.
>
> [...]
>
> Tests
> =====
>
> Default CI passes.
>
> RDMA unit tests pass as usual. I also tried cancellation / failure tests
> over RDMA channels, making sure nothing gets stuck.
>
> I also roughly measured how long it takes to run the whole 80+ migration
> qtest suite, and saw no measurable difference before / after this series.
>
> I didn't test COLO; I wanted to, but the doc example didn't work.
>
> [...]
>
Works well in my COLO testing. For the whole series:
Tested-by: Lukas Straub <lukasstraub2@web.de>
On Sat, Jan 17, 2026 at 03:00:37PM +0100, Lukas Straub wrote:
> On Wed, 22 Oct 2025 15:25:59 -0400
> Peter Xu <peterx@redhat.com> wrote:
>
> > [...]
>
> Works well in my COLO testing. For the whole series:
>
> Tested-by: Lukas Straub <lukasstraub2@web.de>
Thanks for the testing.
Instead of applying it all over: the major change for COLO is patch 8, so
I'll move the tag over there if there are no objections.
--
Peter Xu