[PATCH v2 0/5] migration: Notifier fixes for 11.0

Peter Xu posted 5 patches 1 week, 4 days ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20260126213614.3815900-1-peterx@redhat.com
Maintainers: Peter Maydell <peter.maydell@linaro.org>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Mark Kanda <mark.kanda@oracle.com>, Ben Chaney <bchaney@akamai.com>, Stefano Garzarella <sgarzare@redhat.com>, "Marc-André Lureau" <marcandre.lureau@redhat.com>
Posted by Peter Xu 1 week, 4 days ago
CI: https://gitlab.com/peterx/qemu/-/pipelines/2287309287

v2:
- Collected r-bs / a-bs
- Patch 2: update comment for possible sequence of notifies [Fabiano]

v1: https://lore.kernel.org/r/20260122230331.3543312-1-peterx@redhat.com

Two major goals for this small series:

- Fix a postcopy issue where the DONE and FAILED notifiers are invoked twice

- Invoke the FAILED notifier before vm_start() if the failure happens
  during switchover (where the VM is stopped first)
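
As a rough illustration of the first goal (not code from this series), a
terminal event can be made to fire at most once with a simple guard.  All
names here (ToyNotifierState, toy_notify_once, TOY_EVENT_*) are hypothetical,
and the actual patch may well fix the double notification differently.  The
toy also suppresses any second terminal event, so it does not model the
cpr-exec corner case discussed later in this thread, where FAILED can follow
DONE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for illustration only; this is not QEMU's API. */
typedef enum {
    TOY_EVENT_SETUP,
    TOY_EVENT_DONE,
    TOY_EVENT_FAILED,
    TOY_EVENT__MAX,
} ToyEventType;

typedef struct {
    bool terminal_sent;             /* set once DONE or FAILED was delivered */
    int delivered[TOY_EVENT__MAX];  /* delivery count per event type */
} ToyNotifierState;

/* Deliver an event; returns true only if notifiers were actually invoked. */
static bool toy_notify_once(ToyNotifierState *s, ToyEventType type)
{
    bool terminal = (type == TOY_EVENT_DONE || type == TOY_EVENT_FAILED);

    assert(s != NULL && type < TOY_EVENT__MAX);

    if (terminal && s->terminal_sent) {
        return false;               /* a second DONE/FAILED is suppressed */
    }
    if (terminal) {
        s->terminal_sent = true;
    }
    s->delivered[type]++;
    return true;
}
```

With this guard, a second call with TOY_EVENT_FAILED (for example from a
later cleanup path) returns false and leaves the counters untouched.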

The 2nd goal is needed by Stefan's ongoing work on block persistent
reservations, which requires a fallback on the source to run before
vm_start().  Instead of introducing another event such as
FAILED_BEFORE_START, this patchset makes FAILED itself work for that.
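
The intended ordering on the source can be sketched with a small standalone
toy model.  Everything below (SketchSource, sketch_begin_switchover,
sketch_fail_switchover, the STEP_* values) is made up for illustration and
is not taken from migration.c; it only records the order of steps so that
"FAILED before vm_start()" can be checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical; this is a toy model, not QEMU code. */
typedef enum {
    STEP_VM_STOP,
    STEP_NOTIFY_FAILED,
    STEP_VM_START,
} SketchStep;

#define SKETCH_MAX_STEPS 8

typedef struct {
    bool vm_running;
    SketchStep steps[SKETCH_MAX_STEPS];  /* order in which things happened */
    size_t nsteps;
} SketchSource;

static void sketch_record(SketchSource *src, SketchStep step)
{
    assert(src->nsteps < SKETCH_MAX_STEPS);
    src->steps[src->nsteps++] = step;
}

/* Switchover begins by stopping the VM on the source. */
static void sketch_begin_switchover(SketchSource *src)
{
    src->vm_running = false;
    sketch_record(src, STEP_VM_STOP);
}

/* Failure during switchover: notify FAILED first, then resume the VM. */
static void sketch_fail_switchover(SketchSource *src)
{
    /* Listeners would run their fallback here, with the VM still stopped. */
    assert(!src->vm_running);
    sketch_record(src, STEP_NOTIFY_FAILED);

    src->vm_running = true;
    sketch_record(src, STEP_VM_START);
}
```

Calling sketch_begin_switchover() and then sketch_fail_switchover() on a
running source records the steps in the order stop, FAILED, start, which is
the window a listener needing a pre-start fallback relies on.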

Patch 1 adds a tracepoint for me to verify this fix.

Patches 2-3 are the real changes for the two goals above.

Patches 4-5 are some cleanups along the same context that we can do, hence
attached at the end.

More details are in the individual commit logs.  Comments welcome, thanks.

Peter Xu (5):
  migration: Add a tracepoint for invoking migration notifiers
  migration: Fix double notification of DONE/FAIL for postcopy
  migration: Notify migration FAILED before starting VM
  migration: Drop explicit block activation in postcopy fail path
  migration: Rename MIG_EVENT_PRECOPY_* to MIG_EVENT_*

 include/migration/misc.h | 20 ++++++++++++--------
 hw/intc/arm_gicv3_kvm.c  |  2 +-
 hw/net/virtio-net.c      |  4 ++--
 hw/vfio/cpr-legacy.c     |  2 +-
 hw/vfio/cpr.c            |  8 ++++----
 hw/vfio/migration.c      |  4 ++--
 migration/cpr-exec.c     |  6 +++---
 migration/migration.c    | 29 ++++++++++++++++++++---------
 net/vhost-vdpa.c         |  4 ++--
 ui/spice-core.c          |  7 ++++---
 migration/trace-events   |  1 +
 11 files changed, 52 insertions(+), 35 deletions(-)

-- 
2.50.1
Re: [PATCH v2 0/5] migration: Notifier fixes for 11.0
Posted by Fabiano Rosas 1 week, 4 days ago
Queued, let me know if you want to change patch 2.
Re: [PATCH v2 0/5] migration: Notifier fixes for 11.0
Posted by Peter Xu 1 week, 4 days ago
On Mon, Jan 26, 2026 at 07:04:38PM -0300, Fabiano Rosas wrote:
> Queued, let me know if you want to change patch 2.

Thanks. I have no strong preference.

Considering that cpr-exec's execve() is not designed to fail at all, at
least in production, and that FAILED after DONE is utterly confusing, maybe
we don't need to mention it in a doc everyone would read?  I'd actually
rather hope nobody notices that it's even possible.

I believe that code is there only because, when Steve worked on it, he
wanted to catch cases where he typed something wrong by accident while
testing cpr-exec.  It's a very special feature that mustn't be used
incorrectly, or VM data can definitely get lost.  From that POV, maybe some
day we can replace that whole blob, including the notify, with a
g_assert_not_reached().

-- 
Peter Xu