qapi/migration.json | 7 +- migration/migration.h | 1 + migration/savevm.h | 6 +- migration/migration.c | 209 +++++++++++++++++++++++------------- migration/savevm.c | 116 ++++++++------------ migration/vmstate.c | 6 +- tests/qtest/libqos/libqos.c | 3 +- migration/trace-events | 2 +- tests/qemu-iotests/194.out | 1 + tests/qemu-iotests/203.out | 1 + tests/qemu-iotests/234.out | 2 + tests/qemu-iotests/262.out | 1 + tests/qemu-iotests/280.out | 1 + 13 files changed, 200 insertions(+), 156 deletions(-)
CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692 (note: warning is present on rust stuff, but shouldn't be relevant) This series refactors the migration switchover path quite a bit. I started this work initially to measure the JSON writer overhead, but then I decided to cleanup the switchover path in general when I am at it altogether, as I wanted to do this for a long time. A few major things I tried to do: - About the JSON writer Currently, precopy migration always dumps a chunk of data called VM description (QEMU_VM_VMDESCRIPTION) for debugging purpose. That is a JSON blob explaining all the vmstates dumped in the migration stream. QEMU has a machine property suppress-vmdesc deciding whether migration will have that JSON chunk included. Postcopy does not have such JSON dump because postcopy is live session and it can't normally be debugged from stream level (e.g. as a streamed file). A tiny problem is we don't yet have a clue on how much cpu cycles we need to construct and dump these JSONs even if they're only for debugging, and even if suppress-vmdesc=on QEMU will still try to construct these JSONs (e.g. also for postcopy). This series has a few patches just to make sure the JSON blob won't be constructed if not needed (either postcopy, or suppress-vmdesc=on). I tried to measure the downtime diff with/without these changes, the time QEMU takes to construct / dump the JSON blob is still not measurable. So I suppose unconditionally having this is ok. Said that, let's still have these changes around so we avoid JSON operations if not needed. - DEVICE migration state QEMU has a very special DEVICE migration state, that only happens with precopy, and only when pause-before-switchover capability is enabled. Due to that specialty we can't merge precopy and postcopy code on switchover starts, because the state machine will be different. However after I checked the history and also with libvirt developers, this seems unnecessary. So I had one patch making DEVICE state to be the "switchover" phase for precopy/postcopy unconditionally. That will make the state machine much easier for both modes, meanwhile nothing is expected to break with it (but please still shoot if anyone knows / suspect something will, or could, break..). - General cleanups and fixes Most of the rest changes are random cleanups and fixes in the switchover path. E.g., postcopy_start() has some code that isn't easy to read due to some special flags here and there, mostly around the two calls of qemu_savevm_state_complete_precopy(). This series will remove most of those special treatments here and there. We could have done something twice in the past in postcopy switchover (e.g. I believe we sync CPU twice.. but only happens with postcopy), now they should all be sorted out. And quite some other things hopefully can be separately discussed and justified in each patch. After these cleanups, we will be able to have an unified entrance for precopy/postcopy on switchover. Initially I thought this could optimize the downtime slightly, but after some tests, it turns out there's no measureable difference, at least in my current setup... So let's take this as a cleanup series at least for now, and I hope they would still make some sense. Comments welcomed. Thanks, Peter Xu (16): migration: Remove postcopy implications in should_send_vmdesc() migration: Do not construct JSON description if suppressed migration: Optimize postcopy on downtime by avoiding JSON writer migration: Avoid two src-downtime-end tracepoints for postcopy migration: Drop inactivate_disk param in qemu_savevm_state_complete* migration: Synchronize all CPU states only for non-iterable dump migration: Adjust postcopy bandwidth during switchover migration: Adjust locking in migration_maybe_pause() migration: Drop cached migration state in migration_maybe_pause() migration: Take BQL slightly longer in postcopy_start() migration: Notify COMPLETE once for postcopy migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy migration: Cleanup qemu_savevm_state_complete_precopy() migration: Always set DEVICE state migration: Merge precopy/postcopy on switchover start migration: Trivial cleanup on JSON writer of vmstate_save() qapi/migration.json | 7 +- migration/migration.h | 1 + migration/savevm.h | 6 +- migration/migration.c | 209 +++++++++++++++++++++++------------- migration/savevm.c | 116 ++++++++------------ migration/vmstate.c | 6 +- tests/qtest/libqos/libqos.c | 3 +- migration/trace-events | 2 +- tests/qemu-iotests/194.out | 1 + tests/qemu-iotests/203.out | 1 + tests/qemu-iotests/234.out | 2 + tests/qemu-iotests/262.out | 1 + tests/qemu-iotests/280.out | 1 + 13 files changed, 200 insertions(+), 156 deletions(-) -- 2.47.0
On 2025-01-14 18:07, Peter Xu wrote: > CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692 > (note: warning is present on rust stuff, but shouldn't be relevant) > > This series refactors the migration switchover path quite a bit. I started > this work initially to measure the JSON writer overhead, but then I decided > to cleanup the switchover path in general when I am at it altogether, as I > wanted to do this for a long time. > > A few major things I tried to do: > > - About the JSON writer > > Currently, precopy migration always dumps a chunk of data called VM > description (QEMU_VM_VMDESCRIPTION) for debugging purpose. That is a > JSON blob explaining all the vmstates dumped in the migration stream. > QEMU has a machine property suppress-vmdesc deciding whether migration > will have that JSON chunk included. > > Postcopy does not have such JSON dump because postcopy is live session > and it can't normally be debugged from stream level (e.g. as a streamed > file). > > A tiny problem is we don't yet have a clue on how much cpu cycles we > need to construct and dump these JSONs even if they're only for > debugging, and even if suppress-vmdesc=on QEMU will still try to > construct these JSONs (e.g. also for postcopy). > > This series has a few patches just to make sure the JSON blob won't be > constructed if not needed (either postcopy, or suppress-vmdesc=on). I > tried to measure the downtime diff with/without these changes, the time > QEMU takes to construct / dump the JSON blob is still not measurable. > So I suppose unconditionally having this is ok. Said that, let's still > have these changes around so we avoid JSON operations if not needed. > > - DEVICE migration state > > QEMU has a very special DEVICE migration state, that only happens with > precopy, and only when pause-before-switchover capability is enabled. > Due to that specialty we can't merge precopy and postcopy code on > switchover starts, because the state machine will be different. > > However after I checked the history and also with libvirt developers, > this seems unnecessary. So I had one patch making DEVICE state to be > the "switchover" phase for precopy/postcopy unconditionally. That will > make the state machine much easier for both modes, meanwhile nothing is > expected to break with it (but please still shoot if anyone knows / > suspect something will, or could, break..). > > - General cleanups and fixes > > Most of the rest changes are random cleanups and fixes in the > switchover path. > > E.g., postcopy_start() has some code that isn't easy to read due to > some special flags here and there, mostly around the two calls of > qemu_savevm_state_complete_precopy(). This series will remove most of > those special treatments here and there. > > We could have done something twice in the past in postcopy switchover > (e.g. I believe we sync CPU twice.. but only happens with postcopy), > now they should all be sorted out. > > And quite some other things hopefully can be separately discussed and > justified in each patch. After these cleanups, we will be able to have > an unified entrance for precopy/postcopy on switchover. > > Initially I thought this could optimize the downtime slightly, but after > some tests, it turns out there's no measureable difference, at least in my > current setup... So let's take this as a cleanup series at least for now, > and I hope they would still make some sense. Comments welcomed. > > Thanks, > > Peter Xu (16): > migration: Remove postcopy implications in should_send_vmdesc() > migration: Do not construct JSON description if suppressed > migration: Optimize postcopy on downtime by avoiding JSON writer > migration: Avoid two src-downtime-end tracepoints for postcopy > migration: Drop inactivate_disk param in qemu_savevm_state_complete* > migration: Synchronize all CPU states only for non-iterable dump > migration: Adjust postcopy bandwidth during switchover > migration: Adjust locking in migration_maybe_pause() > migration: Drop cached migration state in migration_maybe_pause() > migration: Take BQL slightly longer in postcopy_start() > migration: Notify COMPLETE once for postcopy > migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy > migration: Cleanup qemu_savevm_state_complete_precopy() > migration: Always set DEVICE state > migration: Merge precopy/postcopy on switchover start > migration: Trivial cleanup on JSON writer of vmstate_save() > > qapi/migration.json | 7 +- > migration/migration.h | 1 + > migration/savevm.h | 6 +- > migration/migration.c | 209 +++++++++++++++++++++++------------- > migration/savevm.c | 116 ++++++++------------ > migration/vmstate.c | 6 +- > tests/qtest/libqos/libqos.c | 3 +- > migration/trace-events | 2 +- > tests/qemu-iotests/194.out | 1 + > tests/qemu-iotests/203.out | 1 + > tests/qemu-iotests/234.out | 2 + > tests/qemu-iotests/262.out | 1 + > tests/qemu-iotests/280.out | 1 + > 13 files changed, 200 insertions(+), 156 deletions(-) > > -- > 2.47.0 > All patches look good to me, nice refactor! Reviewed-by: Juraj Marcin <jmarcin@redhat.com> -- Juraj Marcin
On Tue, Jan 14, 2025 at 18:07:30 -0500, Peter Xu wrote: > CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692 > (note: warning is present on rust stuff, but shouldn't be relevant) > > This series refactors the migration switchover path quite a bit. I started > this work initially to measure the JSON writer overhead, but then I decided > to cleanup the switchover path in general when I am at it altogether, as I > wanted to do this for a long time. > > A few major things I tried to do: > ... > - DEVICE migration state > > QEMU has a very special DEVICE migration state, that only happens with > precopy, and only when pause-before-switchover capability is enabled. > Due to that specialty we can't merge precopy and postcopy code on > switchover starts, because the state machine will be different. > > However after I checked the history and also with libvirt developers, > this seems unnecessary. So I had one patch making DEVICE state to be > the "switchover" phase for precopy/postcopy unconditionally. That will > make the state machine much easier for both modes, meanwhile nothing is > expected to break with it (but please still shoot if anyone knows / > suspect something will, or could, break..). No problem from libvirt side... Tested-by: Jiri Denemark <jdenemar@redhat.com>
On Wed, Jan 15, 2025 at 10:12:45AM +0100, Jiri Denemark wrote: > On Tue, Jan 14, 2025 at 18:07:30 -0500, Peter Xu wrote: > > CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692 > > (note: warning is present on rust stuff, but shouldn't be relevant) > > > > This series refactors the migration switchover path quite a bit. I started > > this work initially to measure the JSON writer overhead, but then I decided > > to cleanup the switchover path in general when I am at it altogether, as I > > wanted to do this for a long time. > > > > A few major things I tried to do: > > > ... > > - DEVICE migration state > > > > QEMU has a very special DEVICE migration state, that only happens with > > precopy, and only when pause-before-switchover capability is enabled. > > Due to that specialty we can't merge precopy and postcopy code on > > switchover starts, because the state machine will be different. > > > > However after I checked the history and also with libvirt developers, > > this seems unnecessary. So I had one patch making DEVICE state to be > > the "switchover" phase for precopy/postcopy unconditionally. That will > > make the state machine much easier for both modes, meanwhile nothing is > > expected to break with it (but please still shoot if anyone knows / > > suspect something will, or could, break..). > > No problem from libvirt side... > > Tested-by: Jiri Denemark <jdenemar@redhat.com> This is definitely reassuring.. thanks a lot, Jiri! -- Peter Xu
Peter Xu <peterx@redhat.com> writes: > CI: https://gitlab.com/peterx/qemu/-/pipelines/1625266692 > (note: warning is present on rust stuff, but shouldn't be relevant) > > This series refactors the migration switchover path quite a bit. I started > this work initially to measure the JSON writer overhead, but then I decided > to cleanup the switchover path in general when I am at it altogether, as I > wanted to do this for a long time. > > A few major things I tried to do: > > - About the JSON writer > > Currently, precopy migration always dumps a chunk of data called VM > description (QEMU_VM_VMDESCRIPTION) for debugging purpose. That is a > JSON blob explaining all the vmstates dumped in the migration stream. > QEMU has a machine property suppress-vmdesc deciding whether migration > will have that JSON chunk included. > > Postcopy does not have such JSON dump because postcopy is live session > and it can't normally be debugged from stream level (e.g. as a streamed > file). > > A tiny problem is we don't yet have a clue on how much cpu cycles we > need to construct and dump these JSONs even if they're only for > debugging, and even if suppress-vmdesc=on QEMU will still try to > construct these JSONs (e.g. also for postcopy). > > This series has a few patches just to make sure the JSON blob won't be > constructed if not needed (either postcopy, or suppress-vmdesc=on). I > tried to measure the downtime diff with/without these changes, the time > QEMU takes to construct / dump the JSON blob is still not measurable. > So I suppose unconditionally having this is ok. Said that, let's still > have these changes around so we avoid JSON operations if not needed. > > - DEVICE migration state > > QEMU has a very special DEVICE migration state, that only happens with > precopy, and only when pause-before-switchover capability is enabled. > Due to that specialty we can't merge precopy and postcopy code on > switchover starts, because the state machine will be different. > > However after I checked the history and also with libvirt developers, > this seems unnecessary. So I had one patch making DEVICE state to be > the "switchover" phase for precopy/postcopy unconditionally. That will > make the state machine much easier for both modes, meanwhile nothing is > expected to break with it (but please still shoot if anyone knows / > suspect something will, or could, break..). > > - General cleanups and fixes > > Most of the rest changes are random cleanups and fixes in the > switchover path. > > E.g., postcopy_start() has some code that isn't easy to read due to > some special flags here and there, mostly around the two calls of > qemu_savevm_state_complete_precopy(). This series will remove most of > those special treatments here and there. > > We could have done something twice in the past in postcopy switchover > (e.g. I believe we sync CPU twice.. but only happens with postcopy), > now they should all be sorted out. > > And quite some other things hopefully can be separately discussed and > justified in each patch. After these cleanups, we will be able to have > an unified entrance for precopy/postcopy on switchover. > > Initially I thought this could optimize the downtime slightly, but after > some tests, it turns out there's no measureable difference, at least in my > current setup... So let's take this as a cleanup series at least for now, > and I hope they would still make some sense. Comments welcomed. > > Thanks, > > Peter Xu (16): > migration: Remove postcopy implications in should_send_vmdesc() > migration: Do not construct JSON description if suppressed > migration: Optimize postcopy on downtime by avoiding JSON writer > migration: Avoid two src-downtime-end tracepoints for postcopy > migration: Drop inactivate_disk param in qemu_savevm_state_complete* > migration: Synchronize all CPU states only for non-iterable dump > migration: Adjust postcopy bandwidth during switchover > migration: Adjust locking in migration_maybe_pause() > migration: Drop cached migration state in migration_maybe_pause() > migration: Take BQL slightly longer in postcopy_start() > migration: Notify COMPLETE once for postcopy > migration: Unwrap qemu_savevm_state_complete_precopy() in postcopy > migration: Cleanup qemu_savevm_state_complete_precopy() > migration: Always set DEVICE state > migration: Merge precopy/postcopy on switchover start > migration: Trivial cleanup on JSON writer of vmstate_save() > > qapi/migration.json | 7 +- > migration/migration.h | 1 + > migration/savevm.h | 6 +- > migration/migration.c | 209 +++++++++++++++++++++++------------- > migration/savevm.c | 116 ++++++++------------ > migration/vmstate.c | 6 +- > tests/qtest/libqos/libqos.c | 3 +- > migration/trace-events | 2 +- > tests/qemu-iotests/194.out | 1 + > tests/qemu-iotests/203.out | 1 + > tests/qemu-iotests/234.out | 2 + > tests/qemu-iotests/262.out | 1 + > tests/qemu-iotests/280.out | 1 + > 13 files changed, 200 insertions(+), 156 deletions(-) Queued, thanks!
© 2016 - 2025 Red Hat, Inc.