[PATCH RFC 00/11] migration/block: disk activation rewrite

Peter Xu posted 11 patches 5 months ago
There is a newer version of this series
migration/migration.h           |  33 ++++-
tests/qtest/migration-helpers.h |   2 +
migration/migration.c           | 177 +++++++++++-----------
migration/savevm.c              |  32 ++--
tests/qtest/migration-helpers.c |   8 +
tests/qtest/migration-test.c    | 252 +++++++++++++++++++++++++-------
6 files changed, 344 insertions(+), 160 deletions(-)
[PATCH RFC 00/11] migration/block: disk activation rewrite
Posted by Peter Xu 5 months ago
I started looking at this problem as a whole when reviewing Fabiano's
series, especially the patch (for a QEMU crash [1]):

https://lore.kernel.org/r/20241125144612.16194-5-farosas@suse.de

The proposed patch could work, but it's unwanted to add such side effect to
migration.  So I start to think about whether we can provide a cleaner
approach, at least remove that "we must active the disk for migration"
dependency, because migration really don't need the disks to be active..

It's also a pure wish that, if bdrv_inactivate_all() could be benign to be
called even if all disks are already inactive.  Then problem also gone.
After all, similar call on bdrv_activate_all() upon all-active disks is all
fine.  I hope that wish could still be fair.

And when I was looking at that, I found more things spread all over the
place on disk activation.  I decided to clean all of them up, while
hopefully fixing the QEMU crash [1] too.

So this is what I came up with as of today.  Marking RFC as of now, just to
collect some feedbacks first.  At least I'd like to go with one more patch
to deprecate late-block-active - not deprecating its function, but make it
always happen (which is the default as of now for Libvirt), which should
hopefully be migration-ABI-safe.

With the help of Fabiano's test cases, I at least am sure this series works
for the ping pong migrations, and all existing qtests.

Let me know, thanks.

[1] https://gitlab.com/qemu-project/qemu/-/issues/2395

Fabiano Rosas (4):
  tests/qtest/migration: Move more code under only_target
  tests/qtest/migration: Don't use hardcoded strings for -serial
  tests/qtest/migration: Support cleaning up only one side of migration
  tests/qtest/migration: Test successive migrations

Peter Xu (7):
  migration: Add helper to get target runstate
  migration/block: Make late-block-active the default
  migration/block: Apply late-block-active behavior to postcopy
  migration/block: Fix possible race with block_inactive
  migration/block: Merge block reactivations for fail/cancel
  migration/block: Extend the migration_block_* API to dest side
  migration/block: Apply the migration_block_* API to postcopy

 migration/migration.h           |  33 ++++-
 tests/qtest/migration-helpers.h |   2 +
 migration/migration.c           | 177 +++++++++++-----------
 migration/savevm.c              |  32 ++--
 tests/qtest/migration-helpers.c |   8 +
 tests/qtest/migration-test.c    | 252 +++++++++++++++++++++++++-------
 6 files changed, 344 insertions(+), 160 deletions(-)

-- 
2.47.0
Re: [PATCH RFC 00/11] migration/block: disk activation rewrite
Posted by Peter Xu 5 months ago
On Tue, Dec 03, 2024 at 07:51:27PM -0500, Peter Xu wrote:
>   migration/block: Merge block reactivations for fail/cancel
>   migration/block: Extend the migration_block_* API to dest side
>   migration/block: Apply the migration_block_* API to postcopy

I just noticed these three patches cannot be separate, because right after
we introduce a flag to cache disk activation status, it will need to apply
to all the code or inconsistency can happen if someone applies one of the
patches, for example.

I also overlooked that we must take qmp_cont() into the picture too, so
that will also need to use the API so the flag will be consistent.

So in the next version I'll squash the three patches into one, and also use
the new API in qmp_cont(), so the status flag should always be consistent.

Again, to block layer developers: please help if any of you know how to
make bdrv_inactivate_all() safe to be called on top of inactivated disks,
or any reasoning on why it mustn't.  It could be very helpful.

-- 
Peter Xu