Devices may opt in to migration FAILED notifiers, which are invoked when
migration fails. Currently, the notifications happen in
migration_cleanup(). That is normally fine, but may not be ideal if there
is an ordering dependency between the fallback and the VM start.

This patch moves the FAILED notification earlier, so that if the failure
happens during switchover, the notification is delivered before the VM
restarts.

After walking through all existing FAILED notifier users, I concluded that
this is also the cleaner approach, at least from a design point of view.

We have these notifier users, where the first two do not need to trap
FAILED (an illustrative handler sketch follows the table):

|----------------------------+-------------------------------------+---------------------|
| device | handler | events needed |
|----------------------------+-------------------------------------+---------------------|
| gicv3 | kvm_arm_gicv3_notifier | DONE |
| vfio_iommufd / vfio_legacy | vfio_cpr_reboot_notifier | SETUP |
| cpr-exec | cpr_exec_notifier | FAILED, DONE |
| virtio-net | virtio_net_migration_state_notifier | SETUP, FAILED |
| vfio | vfio_migration_state_notifier | FAILED |
| vdpa | vdpa_net_migration_state_notifier | SETUP, FAILED |
| spice [*] | migration_state_notifier | SETUP, FAILED, DONE |
|----------------------------+-------------------------------------+---------------------|
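To make "trap FAILED" concrete, here is a minimal sketch of a device-side
handler. The mydev_* names are hypothetical, and it assumes the
NotifierWithReturn-based notifier API (migration_add_notifier() delivering
a MigrationEvent) that the handlers above use:

static NotifierWithReturn mydev_migration_state;

static int mydev_migration_notifier(NotifierWithReturn *notifier,
                                    MigrationEvent *e, Error **errp)
{
    switch (e->type) {
    case MIG_EVENT_PRECOPY_SETUP:
        /* Prepare the device before migration starts. */
        break;
    case MIG_EVENT_PRECOPY_FAILED:
        /*
         * Fallback path. With this patch, a switchover failure
         * delivers this event before vm_start(), i.e. before the
         * vCPUs resume.
         */
        break;
    case MIG_EVENT_PRECOPY_DONE:
        /* Migration succeeded; drop migration-only state. */
        break;
    default:
        break;
    }
    return 0;
}

static void mydev_init_migration(void)
{
    migration_add_notifier(&mydev_migration_state,
                           mydev_migration_notifier);
}

A handler can set errp and return non-zero to fail migration, but IIUC
that is only honored for SETUP; as in the hunks below, FAILED and DONE are
invoked with a NULL errp and the result is ignored.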
For cpr-exec, the handler cleans up some cpr-exec specific fds and
environment variables. This should be fine either way, as long as it
happens before migration_cleanup().

For virtio-net, we need to re-plug the primary device back into the guest
in failover mode. Likely benign.

VFIO needs to restart the device on FAILED. IIUC it should do that before
vm_start(): if the VFIO device was put into the STOP state due to
migration, we should logically make it running again before the vCPUs run.

VDPA disables SVQ when migration has FAILED. Likely benign too, but it
looks better if we can do it before resuming the vCPUs.

For spice, we rely on "spice_server_migrate_end(false)" to retake
ownership. Benign, but it looks more reasonable for the spice client to do
this before the VM runs again.

Note that this change may introduce slightly more downtime if the
migration fails exactly at the switchover phase. But that is very rare,
and even if it happens, none of the handlers above is expected to incur a
long delay, only a short one that will likely be buried in the total
downtime of the failed migration.

Cc: Cédric Le Goater <clg@redhat.com>
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 91775f8472..1d9a2fc068 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1481,7 +1481,6 @@ static void migration_cleanup_json_writer(MigrationState *s)
 
 static void migration_cleanup(MigrationState *s)
 {
-    MigrationEventType type;
     QEMUFile *tmp = NULL;
 
     trace_migration_cleanup();
@@ -1535,9 +1534,15 @@ static void migration_cleanup(MigrationState *s)
         /* It is used on info migrate. We can't free it */
         error_report_err(error_copy(s->error));
     }
-    type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
-                                     MIG_EVENT_PRECOPY_DONE;
-    migration_call_notifiers(s, type, NULL);
+
+    /*
+     * FAILED notification should have already happened. Notify DONE if
+     * migration completed successfully.
+     */
+    if (!migration_has_failed(s)) {
+        migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, NULL);
+    }
+
     yank_unregister_instance(MIGRATION_YANK_INSTANCE);
 }
 
@@ -3589,6 +3594,13 @@ static void migration_iteration_finish(MigrationState *s)
         error_free(local_err);
         break;
     }
+
+    /*
+     * Notify FAILED before starting VM, so that devices can invoke
+     * necessary fallbacks before vCPUs run again.
+     */
+    migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
+
     if (runstate_is_live(s->vm_old_state)) {
         if (!runstate_check(RUN_STATE_SHUTDOWN)) {
             vm_start();
--
2.50.1

Peter Xu <peterx@redhat.com> writes:

> Devices may opt-in migration FAILED notifiers to be invoked when migration
> fails. Currently, the notifications happen in migration_cleanup(). It is
> normally fine, but maybe not ideal if there's dependency of the fallback
> v.s. VM starts.
>
> This patch moves the FAILED notification earlier, so that if the failure
> happened during switchover, it'll notify before VM restart.
>

The change to FAILED in patch 2 should come to this patch to avoid
having a window where the notification only happens at the end.

> After walking over all existing FAILED notifier users, I got the conclusion
> that this should also be a cleaner approach at least from design POV.
>
> We have these notifier users, where the first two do not need to trap
> FAILED:
>
> |----------------------------+-------------------------------------+---------------------|
> | device | handler | events needed |
> |----------------------------+-------------------------------------+---------------------|
> | gicv3 | kvm_arm_gicv3_notifier | DONE |
> | vfio_iommufd / vfio_legacy | vfio_cpr_reboot_notifier | SETUP |
> | cpr-exec | cpr_exec_notifier | FAILED, DONE |
> | virtio-net | virtio_net_migration_state_notifier | SETUP, FAILED |
> | vfio | vfio_migration_state_notifier | FAILED |
> | vdpa | vdpa_net_migration_state_notifier | SETUP, FAILED |
> | spice [*] | migration_state_notifier | SETUP, FAILED, DONE |
> |----------------------------+-------------------------------------+---------------------|
>
> For cpr-exec, it tries to cleanup some cpr-exec specific fd or env
> variables. This should be fine either way, as long as before
> migration_cleanup().
>
> For virtio-net, we need to re-plug the primary device back to guest in the
> failover mode. Likely benign.
>
> VFIO needs to re-start the device if FAILED. IIUC it should do it before
> vm_start(), if the VFIO device can be put into a STOPed state due to
> migration, we should logically make it running again before vCPUs run.
>
> VDPA will disable SVQ when migration is FAILED. Likely benign too, but
> looks better if we can do it before resuming vCPUs.
>
> For spice, we should rely on "spice_server_migrate_end(false)" to retake
> the ownership. Benign, but looks more reasonable if the spice client does
> it before VM runs again.
>
> Note that this change may introduce slightly more downtime, if the
> migration failed exactly at the switchover phase. But that's very rare,
> and even if it happens, none of above expects a long delay, but a short
> one, likely will be buried in the total downtime even if failed.
>
> Cc: Cédric Le Goater <clg@redhat.com>
> Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> migration/migration.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 91775f8472..1d9a2fc068 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1481,7 +1481,6 @@ static void migration_cleanup_json_writer(MigrationState *s)
>
> static void migration_cleanup(MigrationState *s)
> {
> - MigrationEventType type;
> QEMUFile *tmp = NULL;
>
> trace_migration_cleanup();
> @@ -1535,9 +1534,15 @@ static void migration_cleanup(MigrationState *s)
> /* It is used on info migrate. We can't free it */
> error_report_err(error_copy(s->error));
> }
> - type = migration_has_failed(s) ? MIG_EVENT_PRECOPY_FAILED :
> - MIG_EVENT_PRECOPY_DONE;
> - migration_call_notifiers(s, type, NULL);
> +
> + /*
> + * FAILED notification should have already happened. Notify DONE if
> + * migration completed successfully.
> + */
> + if (!migration_has_failed(s)) {
> + migration_call_notifiers(s, MIG_EVENT_PRECOPY_DONE, NULL);
> + }
> +
> yank_unregister_instance(MIGRATION_YANK_INSTANCE);
> }
>
> @@ -3589,6 +3594,13 @@ static void migration_iteration_finish(MigrationState *s)
> error_free(local_err);
> break;
> }
> +
> + /*
> + * Notify FAILED before starting VM, so that devices can invoke
> + * necessary fallbacks before vCPUs run again.
> + */
> + migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
> +
> if (runstate_is_live(s->vm_old_state)) {
> if (!runstate_check(RUN_STATE_SHUTDOWN)) {
> vm_start();

On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > Devices may opt-in migration FAILED notifiers to be invoked when migration
> > fails. Currently, the notifications happen in migration_cleanup(). It is
> > normally fine, but maybe not ideal if there's dependency of the fallback
> > v.s. VM starts.
> >
> > This patch moves the FAILED notification earlier, so that if the failure
> > happened during switchover, it'll notify before VM restart.
> >
>
> The change to FAILED in patch 2 should come to this patch to avoid
> having a window where the notification only happens at the end.

Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
the end for precopy. It's the same for postcopy.

For a failed postcopy we have following behavior:

Before patch 2
==============

- notify FAILED (during switchover)
- vm_start()
- notify FAILED (during migration_cleanup)

After patch 2
=============

- vm_start()
- notify FAILED (during migration_cleanup)

So patch 2 fixes the duplicate issue, and only fixes that.

After patch 3
=============

- notify FAILED (during migration_iteration_finish)
- vm_start()

Patch 3 changes the place of FAILED notification so that it happens always
before vm_start(), for both precopy and postcopy.

--
Peter Xu

Peter Xu <peterx@redhat.com> writes:

> On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > Devices may opt-in migration FAILED notifiers to be invoked when migration
>> > fails. Currently, the notifications happen in migration_cleanup(). It is
>> > normally fine, but maybe not ideal if there's dependency of the fallback
>> > v.s. VM starts.
>> >
>> > This patch moves the FAILED notification earlier, so that if the failure
>> > happened during switchover, it'll notify before VM restart.
>> >
>>
>> The change to FAILED in patch 2 should come to this patch to avoid
>> having a window where the notification only happens at the end.
>
> Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
> the end for precopy. It's the same for postcopy.
>

Sorry, I meant: s/at the end/after vm_start/.

> For a failed postcopy we have following behavior:
>
> Before patch 2
> ==============
>
> - notify FAILED (during switchover)
> - vm_start()
> - notify FAILED (during migration_cleanup)
>
> After patch 2
> =============
>
> - vm_start()
> - notify FAILED (during migration_cleanup)
>
> So patch 2 fixes the duplicate issue, and only fixes that.
>
> After patch 3
> =============
>
> - notify FAILED (during migration_iteration_finish)
> - vm_start()
>
> Patch 3 changes the place of FAILED notification so that it happens always
> before vm_start(), for both precopy and postcopy.

Right, my point is that with patch 3 we're establishing that the correct
place to notify is before vm_start().

But after patch 2, *if* any driver actually depends on being informed of
failure *before* starting the VM, that will not happen. I think both
changes could be made at once so that this intermediate state never
exists.

On Fri, Jan 23, 2026 at 02:36:28PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
>
> > On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >>
> >> > Devices may opt-in migration FAILED notifiers to be invoked when migration
> >> > fails. Currently, the notifications happen in migration_cleanup(). It is
> >> > normally fine, but maybe not ideal if there's dependency of the fallback
> >> > v.s. VM starts.
> >> >
> >> > This patch moves the FAILED notification earlier, so that if the failure
> >> > happened during switchover, it'll notify before VM restart.
> >> >
> >>
> >> The change to FAILED in patch 2 should come to this patch to avoid
> >> having a window where the notification only happens at the end.
> >
> > Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
> > the end for precopy. It's the same for postcopy.
> >
>
> Sorry, I meant: s/at the end/after vm_start/.
>
> > For a failed postcopy we have following behavior:
> >
> > Before patch 2
> > ==============
> >
> > - notify FAILED (during switchover)
> > - vm_start()
> > - notify FAILED (during migration_cleanup)
> >
> > After patch 2
> > =============
> >
> > - vm_start()
> > - notify FAILED (during migration_cleanup)
> >
> > So patch 2 fixes the duplicate issue, and only fixes that.
> >
> > After patch 3
> > =============
> >
> > - notify FAILED (during migration_iteration_finish)
> > - vm_start()
> >
> > Patch 3 changes the place of FAILED notification so that it happens always
> > before vm_start(), for both precopy and postcopy.
>
> Right, my point is that with patch 3 we're establishing that the correct
> place to notify is before vm_start().

Yep. It's likely not strictly a correctness issue in terms of the current
notifiers, but since Stefan may have yet another use case that requires a
notifier to run before vm_start(), it makes more sense for us to move it,
IMHO.

> But after patch 2, *if* any driver actually depends on being informed of
> failure *before* starting the VM, that will not happen. I think both
> changes could be made at once so that this intermediate state never
> exists.

I see what you meant. I think there should be no such user.

It's because we always notify FAILED at migration_cleanup() for precopy,
and even for postcopy before the cpr-exec work (before QEMU 9.0).

That behavior of "notify FAILED before vm_start() for postcopy" is very
specific and was only added in commit 4af667f87c ("migration: notifier
error checking"). IOW, before QEMU 9.0, for both precopy and postcopy we
always notify FAILED in migration_cleanup(), never before vm_start().

I mentioned this in the commit log of the previous patch too, where I bet
the additional FAILED notification added in 4af667f87c for the postcopy
path was an accident (meant to pair with the "reused DONE"; however, it
turns out we likely shouldn't do either of them..). So I don't expect
anything to depend on that behavior, which exists only for postcopy.

The benefit of splitting this patch from the previous one is that the
previous one is a "fix" for the duplicated notifications, so if we need a
backport it can be done without this one. That said, I don't think anyone
should need it.. It should also make each commit slightly easier to
follow, because they're fundamentally two changes.

Let me know what you think after reading my explanations above. I prefer
the split as-is, but I can still squash it to close the trivially small
window that you described. I'll make sure that, if merged, the commit
message contains separate discussions of the two problems.

--
Peter Xu

Peter Xu <peterx@redhat.com> writes:

> On Fri, Jan 23, 2026 at 02:36:28PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>>
>> > On Fri, Jan 23, 2026 at 09:59:35AM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >>
>> >> > Devices may opt-in migration FAILED notifiers to be invoked when migration
>> >> > fails. Currently, the notifications happen in migration_cleanup(). It is
>> >> > normally fine, but maybe not ideal if there's dependency of the fallback
>> >> > v.s. VM starts.
>> >> >
>> >> > This patch moves the FAILED notification earlier, so that if the failure
>> >> > happened during switchover, it'll notify before VM restart.
>> >> >
>> >>
>> >> The change to FAILED in patch 2 should come to this patch to avoid
>> >> having a window where the notification only happens at the end.
>> >
>> > Hmm.. Isn't that expected? Even after patch 2, we still notify FAILED at
>> > the end for precopy. It's the same for postcopy.
>> >
>>
>> Sorry, I meant: s/at the end/after vm_start/.
>>
>> > For a failed postcopy we have following behavior:
>> >
>> > Before patch 2
>> > ==============
>> >
>> > - notify FAILED (during switchover)
>> > - vm_start()
>> > - notify FAILED (during migration_cleanup)
>> >
>> > After patch 2
>> > =============
>> >
>> > - vm_start()
>> > - notify FAILED (during migration_cleanup)
>> >
>> > So patch 2 fixes the duplicate issue, and only fixes that.
>> >
>> > After patch 3
>> > =============
>> >
>> > - notify FAILED (during migration_iteration_finish)
>> > - vm_start()
>> >
>> > Patch 3 changes the place of FAILED notification so that it happens always
>> > before vm_start(), for both precopy and postcopy.
>>
>> Right, my point is that with patch 3 we're establishing that the correct
>> place to notify is before vm_start().
>
> Yep. It's likely not strictly a correctness issue in terms of the current
> notifiers, but since Stefan may have yet another use case that requires a
> notifier to run before vm_start(), it makes more sense for us to move it,
> IMHO.
>
>> But after patch 2, *if* any driver actually depends on being informed of
>> failure *before* starting the VM, that will not happen. I think both
>> changes could be made at once so that this intermediate state never
>> exists.
>
> I see what you meant. I think there should be no such user.
>
> It's because we always notify FAILED at migration_cleanup() for precopy,
> and even for postcopy before the cpr-exec work (before QEMU 9.0).
>
> That behavior of "notify FAILED before vm_start() for postcopy" is very
> specific and was only added in commit 4af667f87c ("migration: notifier
> error checking"). IOW, before QEMU 9.0, for both precopy and postcopy we
> always notify FAILED in migration_cleanup(), never before vm_start().
>
> I mentioned this in the commit log of the previous patch too, where I bet
> the additional FAILED notification added in 4af667f87c for the postcopy
> path was an accident (meant to pair with the "reused DONE"; however, it
> turns out we likely shouldn't do either of them..). So I don't expect
> anything to depend on that behavior, which exists only for postcopy.
>
> The benefit of splitting this patch from the previous one is that the
> previous one is a "fix" for the duplicated notifications, so if we need a
> backport it can be done without this one. That said, I don't think anyone
> should need it.. It should also make each commit slightly easier to
> follow, because they're fundamentally two changes.
>
> Let me know what you think after reading my explanations above. I prefer
> the split as-is, but I can still squash it to close the trivially small
> window that you described. I'll make sure that, if merged, the commit
> message contains separate discussions of the two problems.

Ok, fair points.

Reviewed-by: Fabiano Rosas <farosas@suse.de>