Peter Xu <peterx@redhat.com> writes:
> Both dump-guest-memory and live migration have VM state cached internally.
> Allowing them to happen together means the VM state can be messed up. Simply
> block live migration while dump-guest-memory is running.
>
> One trivial thing to mention is that we should still allow dump-guest-memory even if
> -only-migratable is specified, because that flag is mainly meant to guarantee
> that devices which block migration aren't added by accident. Dump guest memory
> is not like that: it only blocks migration for the few seconds it spends dumping.

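A minimal C sketch of the approach described in the quoted patch, under stated assumptions: the registry, the `only_migratable` and `migration_active` flags, and every `model_*` function are hypothetical stand-ins, not QEMU's migration blocker API. A blocker marked `exempt` skips the `-only-migratable` check, but any blocker still prevents a migration from starting, and none can be added once a migration is already running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; these names are NOT QEMU's real API. */
static bool only_migratable;   /* models the -only-migratable option */
static bool migration_active;  /* a live migration is in progress */
static unsigned nr_blockers;   /* number of registered blockers */

/* Register a blocker.  `exempt` skips the -only-migratable check, which
 * is meant to catch devices that can never migrate. */
static int model_add_blocker(bool exempt)
{
    if (migration_active) {
        return -1;   /* too late: migration already caches VM state */
    }
    if (only_migratable && !exempt) {
        return -1;
    }
    nr_blockers++;
    return 0;
}

static void model_del_blocker(void)
{
    assert(nr_blockers > 0);
    nr_blockers--;
}

/* dump-guest-memory: block migration only for the duration of the dump. */
static int model_dump_guest_memory(void)
{
    if (model_add_blocker(true) < 0) {
        return -1;   /* refuse to dump while a migration is running */
    }
    /* ... walk guest RAM and write the dump file ... */
    model_del_blocker();
    return 0;
}

/* Starting a migration fails while any blocker is registered. */
static int model_migrate_start(void)
{
    if (nr_blockers > 0) {
        return -1;
    }
    migration_active = true;
    return 0;
}
```

In this model a dump succeeds even with `only_migratable` set, while an ordinary (non-exempt) blocker is still refused.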
I recently ran into a similarly unusual use of migration blockers:
Subject: -only-migrate and the two different uses of migration blockers
(was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
Date: Mon, 19 Jul 2021 13:00:20 +0200
Message-ID: <87sg0amuuz.fsf_-_@dusky.pond.sub.org>
We appear to use migration blockers in two ways:
(1) Prevent migration for an indefinite time, typically due to use of
some feature that isn't compatible with migration.
(2) Delay migration for a short time.

Option -only-migratable is designed for (1). It interferes with (2).

Example for (1): device "x-pci-proxy-dev" doesn't support migration. It
adds a migration blocker on realize and deletes it on unrealize. With
-only-migratable, device realize fails. Works as designed.

Example for (2): spapr_mce_req_event() makes an effort to prevent
migration from degrading the reporting of FWNMIs. It adds a migration
blocker when it receives one and deletes it when it's done handling it.
This is a best effort: if migration is already in progress by the time
the FWNMI is received, we simply carry on, and that's okay. However,
option -only-migratable sabotages the best effort entirely, because
adding the blocker always fails under that option.
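To make the two uses concrete, here is a toy model in C. It is a sketch under assumptions: the registry, the `only_migratable` flag and every `model_*` function are hypothetical stand-ins, not QEMU's migration blocker API. Use (1) treats a refused blocker as a realize failure; use (2) ignores the refusal, so with `-only-migratable` the event is simply never protected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified blocker registry; NOT QEMU's real API. */
#define MAX_BLOCKERS 8

static bool only_migratable;                /* models -only-migratable */
static const char *blockers[MAX_BLOCKERS];  /* registered reasons */
static size_t nr_blockers;

static const char device_reason[] = "device does not support migration";
static const char event_reason[] = "event handling in progress";

/* Register a blocker; refused outright when only_migratable is set. */
static int model_add_blocker(const char *reason)
{
    if (only_migratable || nr_blockers == MAX_BLOCKERS) {
        return -1;
    }
    blockers[nr_blockers++] = reason;
    return 0;
}

/* Unregister a blocker that was previously added. */
static void model_del_blocker(const char *reason)
{
    for (size_t i = 0; i < nr_blockers; i++) {
        if (blockers[i] == reason) {
            blockers[i] = blockers[--nr_blockers];  /* order is irrelevant */
            return;
        }
    }
    assert(!"deleting a blocker that was never added");
}

static bool model_migration_allowed(void)
{
    return nr_blockers == 0;
}

/* Use (1): an unmigratable device blocks migration for its lifetime,
 * so a refused blocker is an error and realize fails. */
static int model_device_realize(void)
{
    return model_add_blocker(device_reason);
}

static void model_device_unrealize(void)
{
    model_del_blocker(device_reason);
}

/* Use (2): a best-effort delay around one event.  A refused blocker is
 * ignored.  Returns true if migration was held off during handling. */
static bool model_handle_event(void)
{
    bool held = model_add_blocker(event_reason) == 0;

    /* ... handle the event here ... */

    if (held) {
        model_del_blocker(event_reason);
    }
    return held;
}
```

With `only_migratable` set, `model_device_realize()` returns an error, which is what (1) wants, while `model_handle_event()` still runs but returns `false`: the event was handled without holding off migration at all.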
While this isn't exactly terrible, it may be a weakness in our thinking
and our infrastructure. I'm bringing it up so the people in charge are
aware :)
https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html
Downthread there, Dave Gilbert opined:

    It almost feels like they need a way to temporarily hold off
    'completion' of migration, i.e. the phase where we stop the CPU and
    write the device data; mind you, you'd also probably want it to stop
    cold-migrates/snapshots?
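One possible reading of that suggestion, sketched in C under loudly stated assumptions: a counted "completion hold" (every name below is hypothetical, not an existing QEMU interface) that lets a migration start and iterate, but defers the stop-and-copy phase until every hold is released. Snapshots could consult the same counter. Since a hold never refuses a migration, there is nothing for `-only-migratable` to reject.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "completion hold" sketch; NOT an existing QEMU API. */
typedef enum {
    MIG_IDLE,        /* no migration running */
    MIG_ITERATING,   /* RAM is copied while the guest keeps running */
    MIG_COMPLETING,  /* CPUs stopped, device state being written */
    MIG_DONE,
} MigPhase;

static MigPhase phase = MIG_IDLE;
static unsigned completion_holds;

/* Best effort: a hold cannot be taken once completion has begun. */
static bool completion_hold_take(void)
{
    if (phase == MIG_COMPLETING) {
        return false;
    }
    completion_holds++;
    return true;
}

static void completion_hold_release(void)
{
    assert(completion_holds > 0);
    completion_holds--;
}

static void migration_start(void)
{
    phase = MIG_ITERATING;
}

/* Called once pre-copy has converged.  While holds exist, keep iterating
 * and retry later; otherwise enter the stop-and-copy phase. */
static bool migration_try_complete(void)
{
    if (phase != MIG_ITERATING || completion_holds > 0) {
        return false;
    }
    phase = MIG_COMPLETING;   /* ... stop CPUs, save device state ... */
    return true;
}

static void migration_finish(void)
{
    assert(phase == MIG_COMPLETING);
    phase = MIG_DONE;
}

/* Snapshots (and cold migration) could check the same counter. */
static bool snapshot_may_start(void)
{
    return completion_holds == 0;
}
```

Keeping the hold best effort (refused once completion has begun) mirrors the FWNMI case, where carrying on is acceptable.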