[PATCH] qemu: fix switchover-ack regression for old qemu

Jon Kohler posted 1 patch 2 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20240627181158.2369752-1-jon@nutanix.com
src/qemu/qemu_migration_params.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
[PATCH] qemu: fix switchover-ack regression for old qemu
Posted by Jon Kohler 2 months, 3 weeks ago
When enabling switchover-ack on qemu from libvirt, the .party value
was set to both source and target; however, qemuMigrationParamsCheck()
only takes that into account to validate that the remote side of the
migration supports the flag if it is marked optional or auto/always on.

In the case of switchover-ack, when enabled on only the dst and not
the src, the migration will fail if the src qemu does not support
switchover-ack, as the dst qemu will issue a switchover-ack msg:
qemu/migration/savevm.c ->
  loadvm_process_command ->
    migrate_send_rp_switchover_ack(mis) ->
      migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL)

Since the src qemu doesn't understand messages with header_type ==
MIG_RP_MSG_SWITCHOVER_ACK, qemu will kill the migration with error:
  qemu-kvm: RP: Received invalid message 0x0007 length 0x0000
  qemu-kvm: Unable to write to socket: Bad file descriptor

Looking at the original commit [1] for optional migration capabilities,
it seems that the spirit of optional handling was to enhance a given
existing capability where possible. Given that switchover-ack
exclusively depends on return-path, adding it as optional to that cap
feels right.

[1] 61e34b08568 ("qemu: Add support for optional migration capabilities")

Fixes: 1cc7737f69e ("qemu: add support for qemu switchover-ack")

Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Avihai Horon <avihaih@nvidia.com>
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: YangHang Liu <yanghliu@redhat.com>
---
 src/qemu/qemu_migration_params.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index 9593b6ba65..98822012cc 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -210,17 +210,13 @@ static const qemuMigrationParamsFlagMapItem qemuMigrationParamsFlagMap[] = {
     {.match = QEMU_MIGRATION_FLAG_FORBIDDEN,
      .flag = VIR_MIGRATE_TUNNELLED,
      .cap = QEMU_MIGRATION_CAP_RETURN_PATH,
+     .optional = QEMU_MIGRATION_CAP_SWITCHOVER_ACK,
      .party = QEMU_MIGRATION_SOURCE | QEMU_MIGRATION_DESTINATION},
 
     {.match = QEMU_MIGRATION_FLAG_REQUIRED,
      .flag = VIR_MIGRATE_ZEROCOPY,
      .cap = QEMU_MIGRATION_CAP_ZERO_COPY_SEND,
      .party = QEMU_MIGRATION_SOURCE},
-
-    {.match = QEMU_MIGRATION_FLAG_FORBIDDEN,
-     .flag = VIR_MIGRATE_TUNNELLED,
-     .cap = QEMU_MIGRATION_CAP_SWITCHOVER_ACK,
-     .party = QEMU_MIGRATION_SOURCE | QEMU_MIGRATION_DESTINATION},
 };
 
 /* Translation from VIR_MIGRATE_PARAM_* typed parameters to
-- 
2.43.0
Re: [PATCH] qemu: fix switchover-ack regression for old qemu
Posted by Jiri Denemark 2 months, 3 weeks ago
On Thu, Jun 27, 2024 at 11:11:56 -0700, Jon Kohler wrote:
> When enabling switchover-ack on qemu from libvirt, the .party value
> was set to both source and target; however, qemuMigrationParamsCheck()
> only takes that into account to validate that the remote side of the
> migration supports the flag if it is marked optional or auto/always on.
> 
> In the case of switchover-ack, when enabled on only the dst and not
> the src, the migration will fail if the src qemu does not support
> switchover-ack, as the dst qemu will issue a switchover-ack msg:
> qemu/migration/savevm.c ->
>   loadvm_process_command ->
>     migrate_send_rp_switchover_ack(mis) ->
>       migrate_send_rp_message(mis, MIG_RP_MSG_SWITCHOVER_ACK, 0, NULL)
> 
> Since the src qemu doesn't understand messages with header_type ==
> MIG_RP_MSG_SWITCHOVER_ACK, qemu will kill the migration with error:
>   qemu-kvm: RP: Received invalid message 0x0007 length 0x0000
>   qemu-kvm: Unable to write to socket: Bad file descriptor
> 
> Looking at the original commit [1] for optional migration capabilities,
> it seems that the spirit of optional handling was to enhance a given
> existing capability where possible. Given that switchover-ack
> exclusively depends on return-path, adding it as optional to that cap
> feels right.

Oops, sorry for not spotting this in my review and thanks for fixing it.

I guess we should add a possibility to check QEMU support for
capabilities in case they don't depend on another capability and they
are supposed to be enabled automatically. But that's a separate thing
which may possibly become useful in the future.

Reviewed-by: Jiri Denemark <jdenemar@redhat.com>