[v7] virtio-net: live-TAP local migration

[PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Vladimir Sementsov-Ogievskiy 4 months ago

To migrate a virtio-net TAP device backend (including open fds) locally,
the user simply sets the migration parameter

   backend-transfer = ["virtio-net-tap"]

Why not a simple boolean? To simplify migration to future versions,
when more devices support backend-transfer migration.

Alternatively, we could add a per-device option to disable
backend-transfer migration, but a list is still preferable:

1. It is more convenient to set the same capabilities/parameters on both
source and target QEMU than to care about each device.

2. It preserves the design rule that machine type + device options +
migration capabilities and parameters fully define the resulting
migration stream. We would break that rule if we later added
backend-transfer support to more devices under the same
backend-transfer=true parameter.

This commit only adds the interface; the implementation will come in a
later commit. That is why we add a temporary not-implemented error in
migrate_params_check().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 migration/options.c | 39 +++++++++++++++++++++++++++++++++++++++
 migration/options.h |  2 ++
 qapi/migration.json | 42 ++++++++++++++++++++++++++++++++++++------
 3 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 5183112775..76709af3ab 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qapi/util.h"
 #include "exec/target_page.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/error.h"
@@ -262,6 +263,20 @@ bool migrate_mapped_ram(void)
     return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
 }
 
+bool migrate_virtio_net_tap(void)
+{
+    MigrationState *s = migrate_get_current();
+    BackendTransferList *el = s->parameters.backend_transfer;
+
+    for ( ; el; el = el->next) {
+        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 bool migrate_ignore_shared(void)
 {
     MigrationState *s = migrate_get_current();
@@ -963,6 +978,12 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->cpr_exec_command = QAPI_CLONE(strList,
                                           s->parameters.cpr_exec_command);
 
+    if (s->parameters.backend_transfer) {
+        params->has_backend_transfer = true;
+        params->backend_transfer = QAPI_CLONE(BackendTransferList,
+                                              s->parameters.backend_transfer);
+    }
+
     return params;
 }
 
@@ -997,6 +1018,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_zero_page_detection = true;
     params->has_direct_io = true;
     params->has_cpr_exec_command = true;
+    params->has_backend_transfer = true;
 }
 
 /*
@@ -1183,6 +1205,12 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
         return false;
     }
 
+    /* TODO: implement backend-transfer and remove this check */
+    if (params->has_backend_transfer) {
+        error_setg(errp, "Not implemented");
+        return false;
+    }
+
     return true;
 }
 
@@ -1305,6 +1333,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_cpr_exec_command) {
         dest->cpr_exec_command = params->cpr_exec_command;
     }
+
+    if (params->has_backend_transfer) {
+        dest->backend_transfer = params->backend_transfer;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1443,6 +1475,13 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         s->parameters.cpr_exec_command =
             QAPI_CLONE(strList, params->cpr_exec_command);
     }
+
+    if (params->has_backend_transfer) {
+        qapi_free_BackendTransferList(s->parameters.backend_transfer);
+
+        s->parameters.backend_transfer = QAPI_CLONE(BackendTransferList,
+                                                    params->backend_transfer);
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 82d839709e..55c0345433 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
 uint64_t migrate_xbzrle_cache_size(void);
 ZeroPageDetection migrate_zero_page_detection(void);
 
+bool migrate_virtio_net_tap(void);
+
 /* parameters helpers */
 
 bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index be0f3fcc12..1bfe7df191 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -770,6 +770,19 @@
       '*transform': 'BitmapMigrationBitmapAliasTransform'
   } }
 
+##
+# @BackendTransfer:
+#
+# @virtio-net-tap: Enable backend-transfer migration for
+#     virtio-net/tap. When enabled, TAP fds and all related state are
+#     passed to the destination in the migration channel (which must
+#     be a UNIX domain socket).
+#
+# Since: 10.2
+##
+{ 'enum': 'BackendTransfer',
+  'data': [ 'virtio-net-tap' ] }
+
 ##
 # @BitmapMigrationNodeAlias:
 #
@@ -951,9 +964,13 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: List of targets for backend-transfer migration.
+#     See description in `BackendTransfer`.  Default is no
+#     backend-transfer migration (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # Since: 2.4
@@ -978,7 +995,8 @@
            'mode',
            'zero-page-detection',
            'direct-io',
-           'cpr-exec-command'] }
+           'cpr-exec-command',
+           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
 
 ##
 # @MigrateSetParameters:
@@ -1137,9 +1155,13 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: List of targets for backend-transfer migration.
+#     See description in `BackendTransfer`.  Default is no
+#     backend-transfer migration (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # TODO: either fuse back into `MigrationParameters`, or make
@@ -1179,7 +1201,9 @@
             '*mode': 'MigMode',
             '*zero-page-detection': 'ZeroPageDetection',
             '*direct-io': 'bool',
-            '*cpr-exec-command': [ 'str' ]} }
+            '*cpr-exec-command': [ 'str' ],
+            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
+                                   'features': [ 'unstable' ] } } }
 
 ##
 # @migrate-set-parameters:
@@ -1352,9 +1376,13 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: List of targets for backend-transfer migration.
+#     See description in `BackendTransfer`.  Default is no
+#     backend-transfer migration (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # Since: 2.4
@@ -1391,7 +1419,9 @@
             '*mode': 'MigMode',
             '*zero-page-detection': 'ZeroPageDetection',
             '*direct-io': 'bool',
-            '*cpr-exec-command': [ 'str' ]} }
+            '*cpr-exec-command': [ 'str' ],
+            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
+                                   'features': [ 'unstable' ] } } }
 
 ##
 # @query-migrate-parameters:
-- 
2.48.1

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Peter Xu 3 months, 4 weeks ago

On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> To migrate virtio-net TAP device backend (including open fds) locally,
> user should simply set migration parameter
> 
>    backend-transfer = ["virtio-net-tap"]
> 
> Why not simple boolean? To simplify migration to further versions,
> when more devices will support backend-transfer migration.
> 
> Alternatively, we may add per-device option to disable backend-transfer
> migration, but still:
> 
> 1. It's more comfortable to set same capabilities/parameters on both
> source and target QEMU, than care about each device.

But it loses per-device control, right?  Say, we can have two devices, and
the admin can decide if only one of the devices will enable this feature.

> 
> 2. To not break the design, that machine-type + device options +
> migration capabilities and parameters are fully define the resulting
> migration stream. We'll break this if add in future more
> backend-transfer support in devices under same backend-transfer=true
> parameter.

Could you elaborate?

I thought last time we discussed, we planned to have both the global knob
and a per-device flag, then the feature is enabled only if both flags are
set.

If these parameters are all set the same on src/dst, would it also not
break the design when new devices start to support it (and the new device
will need to introduce its own per-device flags)?

> 
> The commit only brings the interface, the realization will come in later
> commit. That's why we add a temporary not-implemented error in
> migrate_params_check().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  migration/options.c | 39 +++++++++++++++++++++++++++++++++++++++
>  migration/options.h |  2 ++
>  qapi/migration.json | 42 ++++++++++++++++++++++++++++++++++++------
>  3 files changed, 77 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/options.c b/migration/options.c
> index 5183112775..76709af3ab 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -13,6 +13,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/error-report.h"
> +#include "qapi/util.h"
>  #include "exec/target_page.h"
>  #include "qapi/clone-visitor.h"
>  #include "qapi/error.h"
> @@ -262,6 +263,20 @@ bool migrate_mapped_ram(void)
>      return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
>  }
>  
> +bool migrate_virtio_net_tap(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +    BackendTransferList *el = s->parameters.backend_transfer;
> +
> +    for ( ; el; el = el->next) {
> +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {

So this is also something I want to avoid.  The hope is we don't
necessarily need to invent new device names into qapi/migration.json.
OTOH, we can export a helper in migration/misc.h so that devices can query
whether the global feature is enabled or not, using that to AND the
per-device flag.
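
A minimal sketch of the "global knob AND per-device flag" idea described
above. All names here (global_backend_transfer,
migrate_backend_transfer_enabled(), DemoNetDevice) are hypothetical and
are not existing QEMU symbols; the stubs only stand in for the migration
core and a device model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global migration knob. */
static bool global_backend_transfer;

/* Hypothetical helper that migration/misc.h could export to devices. */
static bool migrate_backend_transfer_enabled(void)
{
    return global_backend_transfer;
}

/* Hypothetical device model carrying its own opt-in flag. */
typedef struct DemoNetDevice {
    bool backend_transfer; /* per-device property */
} DemoNetDevice;

/* The backend is transferred only if both the global knob and the
 * per-device flag are set. */
static bool demo_net_use_backend_transfer(const DemoNetDevice *dev)
{
    assert(dev != NULL);
    return migrate_backend_transfer_enabled() && dev->backend_transfer;
}
```

With this shape the migration code needs no device names; each device
decides for itself by combining the two flags.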

Thanks,

> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  bool migrate_ignore_shared(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -963,6 +978,12 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->cpr_exec_command = QAPI_CLONE(strList,
>                                            s->parameters.cpr_exec_command);
>  
> +    if (s->parameters.backend_transfer) {
> +        params->has_backend_transfer = true;
> +        params->backend_transfer = QAPI_CLONE(BackendTransferList,
> +                                              s->parameters.backend_transfer);
> +    }
> +
>      return params;
>  }
>  
> @@ -997,6 +1018,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_zero_page_detection = true;
>      params->has_direct_io = true;
>      params->has_cpr_exec_command = true;
> +    params->has_backend_transfer = true;
>  }
>  
>  /*
> @@ -1183,6 +1205,12 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>          return false;
>      }
>  
> +    /* TODO: implement backend-transfer and remove this check */
> +    if (params->has_backend_transfer) {
> +        error_setg(errp, "Not implemented");
> +        return false;
> +    }
> +
>      return true;
>  }
>  
> @@ -1305,6 +1333,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_cpr_exec_command) {
>          dest->cpr_exec_command = params->cpr_exec_command;
>      }
> +
> +    if (params->has_backend_transfer) {
> +        dest->backend_transfer = params->backend_transfer;
> +    }
>  }
>  
>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1443,6 +1475,13 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>          s->parameters.cpr_exec_command =
>              QAPI_CLONE(strList, params->cpr_exec_command);
>      }
> +
> +    if (params->has_backend_transfer) {
> +        qapi_free_BackendTransferList(s->parameters.backend_transfer);
> +
> +        s->parameters.backend_transfer = QAPI_CLONE(BackendTransferList,
> +                                                    params->backend_transfer);
> +    }
>  }
>  
>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> diff --git a/migration/options.h b/migration/options.h
> index 82d839709e..55c0345433 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
>  uint64_t migrate_xbzrle_cache_size(void);
>  ZeroPageDetection migrate_zero_page_detection(void);
>  
> +bool migrate_virtio_net_tap(void);
> +
>  /* parameters helpers */
>  
>  bool migrate_params_check(MigrationParameters *params, Error **errp);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..1bfe7df191 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -770,6 +770,19 @@
>        '*transform': 'BitmapMigrationBitmapAliasTransform'
>    } }
>  
> +##
> +# @BackendTransfer:
> +#
> +# @virtio-net-tap: Enable backend-transfer migration for
> +#     virtio-net/tap. When enabled, TAP fds and all related state are
> +#     passed to the destination in the migration channel (which must
> +#     be a UNIX domain socket).
> +#
> +# Since: 10.2
> +##
> +{ 'enum': 'BackendTransfer',
> +  'data': [ 'virtio-net-tap' ] }
> +
>  ##
>  # @BitmapMigrationNodeAlias:
>  #
> @@ -951,9 +964,13 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: List of targets for backend-transfer migration.
> +#     See description in `BackendTransfer`.  Default is no
> +#     backend-transfer migration (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -978,7 +995,8 @@
>             'mode',
>             'zero-page-detection',
>             'direct-io',
> -           'cpr-exec-command'] }
> +           'cpr-exec-command',
> +           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -1137,9 +1155,13 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: List of targets for backend-transfer migration.
> +#     See description in `BackendTransfer`.  Default is no
> +#     backend-transfer migration (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # TODO: either fuse back into `MigrationParameters`, or make
> @@ -1179,7 +1201,9 @@
>              '*mode': 'MigMode',
>              '*zero-page-detection': 'ZeroPageDetection',
>              '*direct-io': 'bool',
> -            '*cpr-exec-command': [ 'str' ]} }
> +            '*cpr-exec-command': [ 'str' ],
> +            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
> +                                   'features': [ 'unstable' ] } } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1352,9 +1376,13 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: List of targets for backend-transfer migration.
> +#     See description in `BackendTransfer`.  Default is no
> +#     backend-transfer migration (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -1391,7 +1419,9 @@
>              '*mode': 'MigMode',
>              '*zero-page-detection': 'ZeroPageDetection',
>              '*direct-io': 'bool',
> -            '*cpr-exec-command': [ 'str' ]} }
> +            '*cpr-exec-command': [ 'str' ],
> +            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
> +                                   'features': [ 'unstable' ] } } }
>  
>  ##
>  # @query-migrate-parameters:
> -- 
> 2.48.1
> 

-- 
Peter Xu

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Vladimir Sementsov-Ogievskiy 3 months, 4 weeks ago

On 14.10.25 19:33, Peter Xu wrote:
> On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> To migrate virtio-net TAP device backend (including open fds) locally,
>> user should simply set migration parameter
>>
>>     backend-transfer = ["virtio-net-tap"]
>>
>> Why not simple boolean? To simplify migration to further versions,
>> when more devices will support backend-transfer migration.
>>
>> Alternatively, we may add per-device option to disable backend-transfer
>> migration, but still:
>>
>> 1. It's more comfortable to set same capabilities/parameters on both
>> source and target QEMU, than care about each device.
> 
> But it loses per-device control, right?  Say, we can have two devices, and
> the admin can decide if only one of the devices will enable this feature.
> 

Right. But, in short:

1. I'm not sure that such granularity is necessary.

2. It may be implemented later, on top of the feature.

>>
>> 2. To not break the design, that machine-type + device options +
>> migration capabilities and parameters are fully define the resulting
>> migration stream. We'll break this if add in future more
>> backend-transfer support in devices under same backend-transfer=true
>> parameter.
> 
> Could you elaborate?
> 
> I thought last time we discussed, we planned to have both the global knob
> and a per-device flag, then the feature is enabled only if both flags are
> set.

Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html

Still, at that point I also needed the local-incoming=true target option,
so I treated all the parameters as "I can't make the feature without extra
per-device options, so here they are".

A day later, after a motivating comment from Markus (accidentally in v2),
I found and suggested a way:

https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html

And the further versions v4-v7 were the realization of that idea. Still,
the main benefit is the possibility to get rid of the per-device
local-incoming=true options for the target; it is not about the kind of
per-device "capability" flag we are discussing now.

Ah, and here I said [1]:

> 1. global fds-passing migration capability, to enable/disable the whole feature
> 
> 2. per-device fds-passing option, on by default for all supporting devices,
> to be able to disable backing migration for some devices. (we discussed it
> here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
> Still, normally these options are always on by default.
> And more over, I can postpone their implementation to separate series, to
> reduce discussion field, and to check that everything may work without
> additional user input.

And then I went this way, postponing the implementation of per-device
options.

Then, while developing a similar migration for vhost-user-blk, I found
that I can't use one boolean capability for such features; the reason is
in the commit message we are discussing now.

Then the current design came in v5 (v4 was skipped), and I even got an
approval from Fabiano :)

https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html

> 
> If these parameters are all set the same on src/dst, would it also not
> break the design when new devices start to support it (and the new device
> will need to introduce its own per-device flags)?

Yes, right.

I missed that, while "postponing (probably forever)" the implementation of
per-device options, I started to implement another way to solve the same
problem (switching from one boolean capability to a backend-transfer
list).

In other words, if we implement per-device options at some point, they
will partly overlap in functionality with the current complex migration
parameter.

-

But still, I think that the parameter backend-transfer = [list of targets]
is better than a per-device option. With per-device options we'll have to
care about them forever. I can't imagine a way to make them TRUE by
default.

Using the machine type to set the option to TRUE by default in a new MT,
and to FALSE in all previous ones, doesn't make real sense: we never
migrate to another MT, but we can migrate from a QEMU without support for
virtio-net backend transfer to a QEMU with such support. And on the target
QEMU we'll want to enable virtio-net backend-transfer for further
migrations.

So I think modifying machine types is the wrong idea here. We therefore
have to keep the new options FALSE by default, and the management tool has
to take care to set them appropriately.

-

Let's look at it from the POV of a management tool.

With the complex parameter (the list of backend-transfer targets suggested
in this series), what should we do?

1. With introspection, get the backend-transfer targets supported by the
    source and target QEMUs
2. Compute the intersection, call it X
3. Set the same backend-transfer=X on source and target
4. Start a migration
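
Step 2 of the list-based flow can be sketched as below. This is only an
illustration under assumptions: intersect_targets() is a hypothetical
helper, and the target lists are taken to be plain string arrays already
obtained by introspection; any target name other than "virtio-net-tap"
would be hypothetical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy into 'out' every target name present in both 'src' and 'dst'.
 * Returns the number of names written (never more than 'out_cap').
 * The result order follows 'src'.
 */
static size_t intersect_targets(const char *const *src, size_t n_src,
                                const char *const *dst, size_t n_dst,
                                const char **out, size_t out_cap)
{
    size_t n_out = 0;

    assert(out != NULL || out_cap == 0);

    for (size_t i = 0; i < n_src; i++) {
        for (size_t j = 0; j < n_dst; j++) {
            if (strcmp(src[i], dst[j]) == 0) {
                if (n_out < out_cap) {
                    out[n_out++] = src[i];
                }
                break; /* each source name is reported at most once */
            }
        }
    }

    return n_out;
}
```

The resulting list is the X that would then be set identically as
backend-transfer on both source and target.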

But with per-device parameters it becomes a lot more complicated and
error-prone:

1. Somehow understand (how?) which devices support backend-transfer on
    source and target
2. Compute the intersection
3. Set all the backend-transfer options on both VMs correspondingly,
    doing an individual qom-set for each device
4. Start a migration

-

In short:

1. Per device is too fine a granularity, making management more complex.

2. Per feature is what we need. And it's a normal use of migration
capabilities: we implement a new migration feature and add a new
capability. The only new bit in this series is that "we are going to"
implement similar capabilities later, and it seems good to organize them
all into a list rather than make separate booleans.

> 
>>
>> The commit only brings the interface, the realization will come in later
>> commit. That's why we add a temporary not-implemented error in
>> migrate_params_check().
>>

[..]

>>   
>> +bool migrate_virtio_net_tap(void)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +    BackendTransferList *el = s->parameters.backend_transfer;
>> +
>> +    for ( ; el; el = el->next) {
>> +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
> 
> So this is also something I want to avoid.  The hope is we don't
> necessarily need to invent new device names into qapi/migration.json.
> OTOH, we can export a helper in migration/misc.h so that devices can query
> wehther the global feature is enabled or not, using that to AND the
> per-device flag.
> 

Understood. But I can't imagine how to keep management simple with
per-device options.

-

What do you think?

-- 
Best regards,
Vladimir

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Peter Xu 3 months, 4 weeks ago

On Tue, Oct 14, 2025 at 10:31:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 14.10.25 19:33, Peter Xu wrote:
> > On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > To migrate virtio-net TAP device backend (including open fds) locally,
> > > user should simply set migration parameter
> > > 
> > >     backend-transfer = ["virtio-net-tap"]
> > > 
> > > Why not simple boolean? To simplify migration to further versions,
> > > when more devices will support backend-transfer migration.
> > > 
> > > Alternatively, we may add per-device option to disable backend-transfer
> > > migration, but still:
> > > 
> > > 1. It's more comfortable to set same capabilities/parameters on both
> > > source and target QEMU, than care about each device.
> > 
> > But it loses per-device control, right?  Say, we can have two devices, and
> > the admin can decide if only one of the devices will enable this feature.
> > 
> 
> Right. But, in short:
> 
> 1. I'm not sure, that such granularity is necessary.
> 
> 2. It may implemented later, on top of the feature.

I confess that's not a good example, but my point was that having two
layers of settings is a straightforward idea, and it provides full
flexibility.

> 
> > > 
> > > 2. To not break the design, that machine-type + device options +
> > > migration capabilities and parameters are fully define the resulting
> > > migration stream. We'll break this if add in future more
> > > backend-transfer support in devices under same backend-transfer=true
> > > parameter.
> > 
> > Could you elaborate?
> > 
> > I thought last time we discussed, we planned to have both the global knob
> > and a per-device flag, then the feature is enabled only if both flags are
> > set.
> 
> Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html
> 
> Still at this point, I also needed local-incoming=true target option, so I
> considered all the parameters like "I can't make feature without extra
> per-device options, so here they are".
> 
> A day later, after motivating comment from Markus (accidentally in v2),
> I found and suggested the way:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html
> 
> And further versions v4-v7 were the realization of the idea. Still, main
> benefit is possibility to get rid of per-device local-incoming=true
> options for target, not about a kind of per-device "capability" flag we
> discuss now.
> 
> A, and here I said [1]:
> 
> > 1. global fds-passing migration capability, to enable/disable the whole feature
> > 
> > 2. per-device fds-passing option, on by default for all supporting
> > devices, to be
> > able to disable backing migration for some devices. (we discussed it
> > here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
> > Still, normally these options are always on by default.
> > And more over, I can postpone their implementation to separate series,
> > to reduce discussion field, and to check that everything may work
> > without additional user input.
> 
> And then, went this way, postponing realization of per-device options..

Postponing the per-device flag might still break different backends if you
specify the list with virtio-net-pci.

But only now did I notice that you were using "virtio-net-tap" instead of
"virtio-net-pci".

Ouch.. I think that's even more complicated. :(

Here I think the problem is that introducing some arbitrary strings into
the migration QAPI to represent combinations of "virtio frontend F1" and
"virtio backend B1" doesn't sound like the right thing to do.  Migration
ideally should have zero knowledge of the device topology, types of
devices, frontends or backends.  "virtio-*" as a string should not appear
in migration/ or qapi/migration.json at all.

> 
> And then, developing similar migration for vhost-user-blk, found
> that I can't use on boolean capability for such features, the reason
> in commit message, which we discuss now.

Why isn't a bool enough?  Could you share a link to that discussion?

> 
> Than, current design came in v5 (v4 was skipped).. And I even got an
> approval from Fabiano :)
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html
> 
> > 
> > If these parameters are all set the same on src/dst, would it also not
> > break the design when new devices start to support it (and the new device
> > will need to introduce its own per-device flags)?
> 
> Yes, right.
> 
> I missed, that, "postponing (probably forever)" per-device options
> realization, I started to implement another way to solve the same
> problem (switching from one boolean capability to a backend-transfer
> list).
> 
> In other words, if at some point implement per-device options, that will
> partly intersect by functionality with current complex migration
> parameter..
> 
> -
> 
> But still, I think, that parameter backend-transfer = [list of targets]
> is better than per-device option. With per-device options we'll have to
> care about them forever. I can't imagine a way to make them TRUE by
> default.
> 
> Using machine type, to set option to TRUE by default in new MT, and to
> false in all previous ones doesn't make real sense: we never migrate on
> another MT, but we do can migrate from QEMU without support for
> virtio-net backend transfer to the QEMU with such support. And on target
> QEMU we'll want to enable virtio-net backend-transfer for further
> migrations..

So this is likely why you changed your mind.  I think machine properties
definitely make sense.

We set it OFF on old machine types because there the src QEMU _may_ not
support this feature.  We set it ON on new machine types because a QEMU
that declares the new machine type is guaranteed to support the feature.

We can still manually set the per-device properties iff the admin is sure
that both sides of "old" QEMUs support this feature.  However, machine
properties have worked like that for many years, and I believe that's how
it works: by always being on the safe side.
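
The machine-type defaulting described above can be sketched as a small
lookup. This is an illustration only: DemoCompatProp, demo_compat and the
"demo-*" machine names are hypothetical, and the sketch does not use
QEMU's actual compat-property tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compat entry: a machine type that overrides the default. */
typedef struct DemoCompatProp {
    const char *machine;   /* versioned machine type name */
    bool backend_transfer; /* value pinned for that machine type */
} DemoCompatProp;

/* Older machine types pin the property OFF; the names are illustrative. */
static const DemoCompatProp demo_compat[] = {
    { "demo-10.1", false },
    { "demo-10.0", false },
};

/* The default is ON; a machine type listed in the compat table keeps
 * the value pinned there instead. */
static bool demo_default_backend_transfer(const char *machine)
{
    assert(machine != NULL);

    for (size_t i = 0; i < sizeof(demo_compat) / sizeof(demo_compat[0]); i++) {
        if (strcmp(demo_compat[i].machine, machine) == 0) {
            return demo_compat[i].backend_transfer;
        }
    }

    return true;
}
```

An admin who knows both QEMUs support the feature could still override
the pinned value per device, as noted above.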

> 
> So, I think, modifying machine types is wrong idea here. So, we have to
> keep new options FALSE by default, and management tool have to care to
> set them appropriately.
> 
> -
> 
> Let's look from the POV of management tool.
> 
> With complex parameter (list of backend-transfer targets, suggested with
> this series), what should we do?
> 
> 1. With introspection, get backend-transfer targets supported by source
>    and target QEMUs
> 2. Get and intersection, assume X
> 3. Set same backend-transfer=X on source and target
> 4. Start a migration
> 
> But with per-device parameters it becomes a lot more complicated and
> error prone
> 
> 1. Somehow understand (how?), which devices support backend-transfer on
>    source and target
> 2. Get an intersection
> 3. Set all the backend-transfer options on both vms correspondingly,
>    doing personal qom-set for each device
> 4. Start a migration
> 
> -
> 
> In short:
> 
> 1. per device - is too high granularity, making management more complex

If we follow the machine property way of doing this (which I believe we
have used for years), then mgmt doesn't need any change except to properly
enable fd-passing in the migration caps/params when it's a local
migration.  That's all.  It doesn't need to know anything about "which
device(s) support fd-passing", because they'll all be auto-set by the
machine types.

> 
> 2. per feature - is what we need. And it's a normal use for migration
> capabilities: we implement a new migration feature, and add new
> capability. The only new bit with this series is that "we are going to"
> implement similar capabilities later, and seems good to organize them
> all into a list, rather than make separate booleans.
> 
> 
> > 
> > > 
> > > The commit only brings the interface, the realization will come in later
> > > commit. That's why we add a temporary not-implemented error in
> > > migrate_params_check().
> > > 
> 
> [..]
> 
> > > +bool migrate_virtio_net_tap(void)
> > > +{
> > > +    MigrationState *s = migrate_get_current();
> > > +    BackendTransferList *el = s->parameters.backend_transfer;
> > > +
> > > +    for ( ; el; el = el->next) {
> > > +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
> > 
> > So this is also something I want to avoid.  The hope is we don't
> > necessarily need to invent new device names into qapi/migration.json.
> > OTOH, we can export a helper in migration/misc.h so that devices can query
> > whether the global feature is enabled or not, using that to AND the
> > per-device flag.
> > 
> 
> Understand. But I can't imagine how to keep management simple with per-device
> options..
> 
> -
> 
> What do you think?

I feel like you want to enable this feature _while_ using an old machine
type.  Is that what you're looking for?  Can you simply urge the users to
move to new machine types when looking for new features?  I believe that's
what we do..

MT properties have worked like that for a long time.  What you are asking
is fair, but if so I'd still like to double check with you that this is
your real purpose (enabling this feature on NEW qemus but OLD machine
types, all automatically).

Thanks,

-- 
Peter Xu

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Vladimir Sementsov-Ogievskiy 3 months, 4 weeks ago

On 14.10.25 23:25, Peter Xu wrote:
> On Tue, Oct 14, 2025 at 10:31:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 14.10.25 19:33, Peter Xu wrote:
>>> On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> To migrate virtio-net TAP device backend (including open fds) locally,
>>>> user should simply set migration parameter
>>>>
>>>>      backend-transfer = ["virtio-net-tap"]
>>>>
>>>> Why not simple boolean? To simplify migration to further versions,
>>>> when more devices will support backend-transfer migration.
>>>>
>>>> Alternatively, we may add per-device option to disable backend-transfer
>>>> migration, but still:
>>>>
>>>> 1. It's more comfortable to set same capabilities/parameters on both
>>>> source and target QEMU, than care about each device.
>>>
>>> But it loses per-device control, right?  Say, we can have two devices, and
>>> the admin can decide if only one of the devices will enable this feature.
>>>
>>
>> Right. But, in short:
>>
>> 1. I'm not sure, that such granularity is necessary.
>>
>> 2. It may implemented later, on top of the feature.
> 
> I confess that's not a good example, but my point was that it was
> straightforward idea to have two layers of settings, meanwhile it provides
> full flexibility.
> 
>>
>>>>
>>>> 2. To not break the design, that machine-type + device options +
>>>> migration capabilities and parameters are fully define the resulting
>>>> migration stream. We'll break this if add in future more
>>>> backend-transfer support in devices under same backend-transfer=true
>>>> parameter.
>>>
>>> Could you elaborate?
>>>
>>> I thought last time we discussed, we planned to have both the global knob
>>> and a per-device flag, then the feature is enabled only if both flags are
>>> set.
>>
>> Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html
>>
>> Still at this point, I also needed local-incoming=true target option, so I
>> considered all the parameters like "I can't make feature without extra
>> per-device options, so here they are".
>>
>> A day later, after motivating comment from Markus (accidentally in v2),
>> I found and suggested the way:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html
>>
>> And further versions v4-v7 were the realization of the idea. Still, main
>> benefit is possibility to get rid of per-device local-incoming=true
>> options for target, not about a kind of per-device "capability" flag we
>> discuss now.
>>
>> A, and here I said [1]:
>>
>>> 1. global fds-passing migration capability, to enable/disable the whole feature
>>>
>>> 2. per-device fds-passing option, on by default for all supporting
>>> devices, to be
>>> able to disable backing migration for some devices. (we discussed it
>>> here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
>>> Still, normally these options are always on by default.
>>> And more over, I can postpone their implementation to separate series,
>>> to reduce discussion field, and to check that everything may work
>>> without additional user input.
>>
>> And then, went this way, postponing realization of per-device options..
> 
> Postponing the per-device flag might still break different backends if you
> specify the list with virtio-net-pci.
> 
> But only until now, I noticed you were using "virtio-net-tap" instead of
> "virtio-net-pci".
> 
> Ouch.. I think that's even more complicated. :(
> 
> Here I think the problem is, introducing some arbitrary strings into
> migration QAPI to represent some combinations of "virtio frontend F1" and
> "virtio backend B1" doesn't sound the right thing to do.  Migration ideally
> should have zero knowledge of the device topology, types of devices,
> frontends or backends.  "virtio-*" as a string should not appear in
> migration/ or qapi/migration.json at all..
> 
>>
>> And then, developing similar migration for vhost-user-blk, found
>> that I can't use on boolean capability for such features, the reason
>> in commit message, which we discuss now.
> 
> Why a bool isn't enough?  Could you share a link to that discussion?
> 
>>
>> Than, current design came in v5 (v4 was skipped).. And I even got an
>> approval from Fabiano :)
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html
>>
>>>
>>> If these parameters are all set the same on src/dst, would it also not
>>> break the design when new devices start to support it (and the new device
>>> will need to introduce its own per-device flags)?
>>
>> Yes, right.
>>
>> I missed, that, "postponing (probably forever)" per-device options
>> realization, I started to implement another way to solve the same
>> problem (switching from one boolean capability to a backend-transfer
>> list).
>>
>> In other words, if at some point implement per-device options, that will
>> partly intersect by functionality with current complex migration
>> parameter..
>>
>> -
>>
>> But still, I think, that parameter backend-transfer = [list of targets]
>> is better than per-device option. With per-device options we'll have to
>> care about them forever. I can't imagine a way to make them TRUE by
>> default.
>>
>> Using machine type, to set option to TRUE by default in new MT, and to
>> false in all previous ones doesn't make real sense: we never migrate on
>> another MT, but we do can migrate from QEMU without support for
>> virtio-net backend transfer to the QEMU with such support. And on target
>> QEMU we'll want to enable virtio-net backend-transfer for further
>> migrations..
> 
> So this is likely why you changed your mind.  I think machine properties
> definitely make sense.
> 
> We set it OFF on old machines because when on old machines the src QEMU
> _may_ not support this feature.  We set it ON on new machines because when
> the QEMU has the new machine declared anyway, it is guaranteed to support
> the feature.
> 
> We can still manually set the per-device properties iff the admin is sure
> that both sides of "old" QEMUs support this feature.  However machine
> properties worked like that for many years and I believe that's how it
> works, by being always on the safe side.
> 
>>
>> So, I think, modifying machine types is wrong idea here. So, we have to
>> keep new options FALSE by default, and management tool have to care to
>> set them appropriately.
>>
>> -
>>
>> Let's look from the POV of management tool.
>>
>> With complex parameter (list of backend-transfer targets, suggested with
>> this series), what should we do?
>>
>> 1. With introspection, get backend-transfer targets supported by source
>>     and target QEMUs
>> 2. Get and intersection, assume X
>> 3. Set same backend-transfer=X on source and target
>> 4. Start a migration
>>
>> But with per-device parameters it becomes a lot more complicated and
>> error prone
>>
>> 1. Somehow understand (how?), which devices support backend-transfer on
>>     source and target
>> 2. Get an intersection
>> 3. Set all the backend-transfer options on both vms correspondingly,
>>     doing personal qom-set for each device
>> 4. Start a migration
>>
>> -
>>
>> In short:
>>
>> 1. per device - is too high granularity, making management more complex
> 
> If we follow the machine property way of doing this (which I believe we
> used for years), then mgmt doesn't need any change except properly enable
> fd-passing in migration cap/params when it's a local migration.  That's
> all.  It doesn't need to know anything about "which device(s) supports
> fd-passing", because they'll all be auto-set by the machine types.
> 
>>
>> 2. per feature - is what we need. And it's a normal use for migration
>> capabilities: we implement a new migration feature, and add new
>> capability. The only new bit with this series is that "we are going to"
>> implement similar capabilities later, and seems good to organize them
>> all into a list, rather than make separate booleans.
>>
>>
>>>
>>>>
>>>> The commit only brings the interface, the realization will come in later
>>>> commit. That's why we add a temporary not-implemented error in
>>>> migrate_params_check().
>>>>
>>
>> [..]
>>
>>>> +bool migrate_virtio_net_tap(void)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +    BackendTransferList *el = s->parameters.backend_transfer;
>>>> +
>>>> +    for ( ; el; el = el->next) {
>>>> +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
>>>
>>> So this is also something I want to avoid.  The hope is we don't
>>> necessarily need to invent new device names into qapi/migration.json.
>>> OTOH, we can export a helper in migration/misc.h so that devices can query
>>> whether the global feature is enabled or not, using that to AND the
>>> per-device flag.
>>>
>>
>> Understand. But I can't imagine how to keep management simple with per-device
>> options..
>>
>> -
>>
>> What do you think?
> 
> I feel like you wanted to enable this feature _while_ using an old machine
> type.

Exactly

> Is that what you're looking for?  Can you simply urge the users to
> move to new machine types when looking for new features?  I believe that's
> what we do..
> 
> MT properties were working like that for a long time.  What you were asking
> is fair, but if so I'd still like to double check with you on that's your
> real purpose (enabling this feature on NEW qemus but OLD machine types, all
> automatically).
> 

You made me think.

On the one hand, you are right, I agree with all arguments about migration
being separate from virtio device types, their backends and frontends.

And yes, if we refuse the idea of enabling the feature in old machine types
automatically, everything fits into the existing paradigm.

On the other hand, there is our downstream practice in the cloud. We
introduce new machine types _very_ seldom. Almost always, new features
developed or backported to our downstream don't require a new machine type.
In such a situation, taking a feature which theoretically (and with a
simpler API!) may be done without introducing a new MT, but creating it by
introducing a new MT, and so postponing the moment when we start to use it
widely until most of the existing VMs die or restart naturally (for sure,
we'll not ask users to restart them, it would be too expensive; not to
mention whether a restart is a safe way to change MT, or whether we'd
better recreate the VM), seems very strange to me. (too long sentence
detector blinking).

So, finally, it's OK for me to switch to per-device properties. Then, in
downstream I may implement corresponding capabilities to simplify management.
That's rather simple.

-

Interesting: could the migration "return path" somehow be used to get
information from the target about whether it supports backend transfer for
a particular device?

So that we simply enable the backend-transfer=true parameter on both
source and target. Then the source somehow finds out through the return
path whether the target supports backend-transfer for each device, and
decides what to do? Or is that too complicated?

-- 
Best regards,
Vladimir

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Peter Xu 3 months, 3 weeks ago

On Wed, Oct 15, 2025 at 12:46:26AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 14.10.25 23:25, Peter Xu wrote:
> > On Tue, Oct 14, 2025 at 10:31:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 14.10.25 19:33, Peter Xu wrote:
> > > > On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > To migrate virtio-net TAP device backend (including open fds) locally,
> > > > > user should simply set migration parameter
> > > > > 
> > > > >      backend-transfer = ["virtio-net-tap"]
> > > > > 
> > > > > Why not simple boolean? To simplify migration to further versions,
> > > > > when more devices will support backend-transfer migration.
> > > > > 
> > > > > Alternatively, we may add per-device option to disable backend-transfer
> > > > > migration, but still:
> > > > > 
> > > > > 1. It's more comfortable to set same capabilities/parameters on both
> > > > > source and target QEMU, than care about each device.
> > > > 
> > > > But it loses per-device control, right?  Say, we can have two devices, and
> > > > the admin can decide if only one of the devices will enable this feature.
> > > > 
> > > 
> > > Right. But, in short:
> > > 
> > > 1. I'm not sure, that such granularity is necessary.
> > > 
> > > 2. It may implemented later, on top of the feature.
> > 
> > I confess that's not a good example, but my point was that it was
> > straightforward idea to have two layers of settings, meanwhile it provides
> > full flexibility.
> > 
> > > 
> > > > > 
> > > > > 2. To not break the design, that machine-type + device options +
> > > > > migration capabilities and parameters are fully define the resulting
> > > > > migration stream. We'll break this if add in future more
> > > > > backend-transfer support in devices under same backend-transfer=true
> > > > > parameter.
> > > > 
> > > > Could you elaborate?
> > > > 
> > > > I thought last time we discussed, we planned to have both the global knob
> > > > and a per-device flag, then the feature is enabled only if both flags are
> > > > set.
> > > 
> > > Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html
> > > 
> > > Still at this point, I also needed local-incoming=true target option, so I
> > > considered all the parameters like "I can't make feature without extra
> > > per-device options, so here they are".
> > > 
> > > A day later, after motivating comment from Markus (accidentally in v2),
> > > I found and suggested the way:
> > > 
> > > https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html
> > > 
> > > And further versions v4-v7 were the realization of the idea. Still, main
> > > benefit is possibility to get rid of per-device local-incoming=true
> > > options for target, not about a kind of per-device "capability" flag we
> > > discuss now.
> > > 
> > > A, and here I said [1]:
> > > 
> > > > 1. global fds-passing migration capability, to enable/disable the whole feature
> > > > 
> > > > 2. per-device fds-passing option, on by default for all supporting
> > > > devices, to be
> > > > able to disable backing migration for some devices. (we discussed it
> > > > here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
> > > > Still, normally these options are always on by default.
> > > > And more over, I can postpone their implementation to separate series,
> > > > to reduce discussion field, and to check that everything may work
> > > > without additional user input.
> > > 
> > > And then, went this way, postponing realization of per-device options..
> > 
> > Postponing the per-device flag might still break different backends if you
> > specify the list with virtio-net-pci.
> > 
> > But only until now, I noticed you were using "virtio-net-tap" instead of
> > "virtio-net-pci".
> > 
> > Ouch.. I think that's even more complicated. :(
> > 
> > Here I think the problem is, introducing some arbitrary strings into
> > migration QAPI to represent some combinations of "virtio frontend F1" and
> > "virtio backend B1" doesn't sound the right thing to do.  Migration ideally
> > should have zero knowledge of the device topology, types of devices,
> > frontends or backends.  "virtio-*" as a string should not appear in
> > migration/ or qapi/migration.json at all..
> > 
> > > 
> > > And then, developing similar migration for vhost-user-blk, found
> > > that I can't use on boolean capability for such features, the reason
> > > in commit message, which we discuss now.
> > 
> > Why a bool isn't enough?  Could you share a link to that discussion?
> > 
> > > 
> > > Than, current design came in v5 (v4 was skipped).. And I even got an
> > > approval from Fabiano :)
> > > 
> > > https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html
> > > 
> > > > 
> > > > If these parameters are all set the same on src/dst, would it also not
> > > > break the design when new devices start to support it (and the new device
> > > > will need to introduce its own per-device flags)?
> > > 
> > > Yes, right.
> > > 
> > > I missed, that, "postponing (probably forever)" per-device options
> > > realization, I started to implement another way to solve the same
> > > problem (switching from one boolean capability to a backend-transfer
> > > list).
> > > 
> > > In other words, if at some point implement per-device options, that will
> > > partly intersect by functionality with current complex migration
> > > parameter..
> > > 
> > > -
> > > 
> > > But still, I think, that parameter backend-transfer = [list of targets]
> > > is better than per-device option. With per-device options we'll have to
> > > care about them forever. I can't imagine a way to make them TRUE by
> > > default.
> > > 
> > > Using machine type, to set option to TRUE by default in new MT, and to
> > > false in all previous ones doesn't make real sense: we never migrate on
> > > another MT, but we do can migrate from QEMU without support for
> > > virtio-net backend transfer to the QEMU with such support. And on target
> > > QEMU we'll want to enable virtio-net backend-transfer for further
> > > migrations..
> > 
> > So this is likely why you changed your mind.  I think machine properties
> > definitely make sense.
> > 
> > We set it OFF on old machines because when on old machines the src QEMU
> > _may_ not support this feature.  We set it ON on new machines because when
> > the QEMU has the new machine declared anyway, it is guaranteed to support
> > the feature.
> > 
> > We can still manually set the per-device properties iff the admin is sure
> > that both sides of "old" QEMUs support this feature.  However machine
> > properties worked like that for many years and I believe that's how it
> > works, by being always on the safe side.
> > 
> > > 
> > > So, I think, modifying machine types is wrong idea here. So, we have to
> > > keep new options FALSE by default, and management tool have to care to
> > > set them appropriately.
> > > 
> > > -
> > > 
> > > Let's look from the POV of management tool.
> > > 
> > > With complex parameter (list of backend-transfer targets, suggested with
> > > this series), what should we do?
> > > 
> > > 1. With introspection, get backend-transfer targets supported by source
> > >     and target QEMUs
> > > 2. Get and intersection, assume X
> > > 3. Set same backend-transfer=X on source and target
> > > 4. Start a migration
> > > 
> > > But with per-device parameters it becomes a lot more complicated and
> > > error prone
> > > 
> > > 1. Somehow understand (how?), which devices support backend-transfer on
> > >     source and target
> > > 2. Get an intersection
> > > 3. Set all the backend-transfer options on both vms correspondingly,
> > >     doing personal qom-set for each device
> > > 4. Start a migration
> > > 
> > > -
> > > 
> > > In short:
> > > 
> > > 1. per device - is too high granularity, making management more complex
> > 
> > If we follow the machine property way of doing this (which I believe we
> > used for years), then mgmt doesn't need any change except properly enable
> > fd-passing in migration cap/params when it's a local migration.  That's
> > all.  It doesn't need to know anything about "which device(s) supports
> > fd-passing", because they'll all be auto-set by the machine types.
> > 
> > > 
> > > 2. per feature - is what we need. And it's a normal use for migration
> > > capabilities: we implement a new migration feature, and add new
> > > capability. The only new bit with this series is that "we are going to"
> > > implement similar capabilities later, and seems good to organize them
> > > all into a list, rather than make separate booleans.
> > > 
> > > 
> > > > 
> > > > > 
> > > > > The commit only brings the interface, the realization will come in later
> > > > > commit. That's why we add a temporary not-implemented error in
> > > > > migrate_params_check().
> > > > > 
> > > 
> > > [..]
> > > 
> > > > > +bool migrate_virtio_net_tap(void)
> > > > > +{
> > > > > +    MigrationState *s = migrate_get_current();
> > > > > +    BackendTransferList *el = s->parameters.backend_transfer;
> > > > > +
> > > > > +    for ( ; el; el = el->next) {
> > > > > +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
> > > > 
> > > > So this is also something I want to avoid.  The hope is we don't
> > > > necessarily need to invent new device names into qapi/migration.json.
> > > > OTOH, we can export a helper in migration/misc.h so that devices can query
> > > > whether the global feature is enabled or not, using that to AND the
> > > > per-device flag.
> > > > 
> > > 
> > > Understand. But I can't imagine how to keep management simple with per-device
> > > options..
> > > 
> > > -
> > > 
> > > What do you think?
> > 
> > I feel like you wanted to enable this feature _while_ using an old machine
> > type.
> 
> Exactly
> 
> > Is that what you're looking for?  Can you simply urge the users to
> > move to new machine types when looking for new features?  I believe that's
> > what we do..
> > 
> > MT properties were working like that for a long time.  What you were asking
> > is fair, but if so I'd still like to double check with you on that's your
> > real purpose (enabling this feature on NEW qemus but OLD machine types, all
> > automatically).
> > 
> 
> You made me think.
> 
> On the one hand, you are right, I agree with all arguments about migration
> being separate from virtio device types, their backends and frontends.
> 
> And yes, if refuse the idea of enabling the feature in old machine types
> automatically, everything fits into existing paradigm.
> 
> On the other hand is our downstream practice in the cloud. We introduce
> new machine types _very_ seldom. Almost always, new features developed
> or backported to our downstream doesn't require new machine type. In such
> situation, creating feature, which theoretically (and more simple in API!)
> may be done without introducing new MT, but creating it by introducing new
> MT, postponing the moment when we start to widely use it up to the moment when
> most of existing vms will die or restart naturally (as for sure, we'll not
> ask users to restart them, it would be too expensive (not saying about,
> is restart a safe way to change MT, or we'd better recreate a vm), seems
> very strange for me. (too long sentence detector blinking).

Yes, I agree once more that it's still a fair ask; it's just not the main
way we do it in QEMU upstream, otherwise there would be no point in
introducing versioned machine types (while we would still need things like
pc/q35 to identify the boards even with no versioning on each of them).

> 
> So, finally, it's OK for me to switch to per-device properties. Then, in
> downstream I may implement corresponding capabilities to simplify management.
> That's rather simple.

With per-device properties, maybe.. it's still feasible to qom-list the
devices on both src/dst to know whether both of them would support this,
then turn it on if qom-list can report the property on both sides.  I
didn't think deeper than that, though..
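
A rough sketch of that check from the management side, under the
assumption that mgmt has already collected (e.g. via qom-list) the ids of
the devices that report the property on each side. The demo_* names and
the string-array representation are hypothetical; no QMP call is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if `id` appears in `list`. */
static bool demo_contains(const char *const *list, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], id) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Collect the device ids that report the property on BOTH sides.
 * src/dst: ids of devices exposing the property on source/target.
 * Writes at most out_cap ids into `out`, in source order, and returns
 * how many were written.
 */
size_t demo_common_devices(const char *const *src, size_t n_src,
                           const char *const *dst, size_t n_dst,
                           const char **out, size_t out_cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && out != NULL);
    for (size_t i = 0; i < n_src && n < out_cap; i++) {
        if (demo_contains(dst, n_dst, src[i])) {
            out[n++] = src[i];
        }
    }
    return n;
}
```

Only the devices returned here would get the property turned on (on both
sides) before starting the migration.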

> 
> -
> 
> Interesting, could migration "return path" be somehow used to get information
> from target, does it support backend transfer for concrete device?
> 
> So that, we simply enable backend-transfer=true parameter both on
> source and target. Than, source somehow find out through return path,
> for the device, does target support backend-transfer for it, and decide,
> what to do? Or that's too complicated?

Fabiano is looking at something like that, we called it migration
handshake.

https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake

Fundamentally, one of its goals is that we can have bi-directional "talks"
between src/dst, before migration even starts, to synchronize on things
like this.  It's still likely not gonna happen this release.. though..  but
it's on the radar.  With that, dst also doesn't need to set migration
caps/params the same as src, because they'll talk things over.

> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Peter Xu

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 15.10.25 21:27, Peter Xu wrote:
>> Interesting, could migration "return path" be somehow used to get information
>> from target, does it support backend transfer for concrete device?
>>
>> So that, we simply enable backend-transfer=true parameter both on
>> source and target. Than, source somehow find out through return path,
>> for the device, does target support backend-transfer for it, and decide,
>> what to do? Or that's too complicated?
> Fabiano is looking at something like that, we called it migration
> handshake.
> 
> https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
> 
> Fundamentally one of its goal is that we can have bi-directional "talks"
> between src/dst, before migration ever started, to synchronize on things
> like this.  It's still likely not gonna happen this release.. though..  but
> it's on the radar.  With that, dst also doesn't need to set migration
> caps/params the same as src, because they'll talk things over.

Oh, that sounds cool, I've always dreamed of something like this.

Note for myself: look through the QEMU wiki, it may contain quite interesting things,
not only "QEMU Planning" and "Submit a Patch" :)

For live-update with backend transfer, we'll probably be able not only to
check the device tree, but also to recreate it automatically, using
information from the target.

> Allow QMP command "migrate[_incoming]" ..

Oh, I thought about this too.

-

Off topic:

Have you thought about moving to some context-free protocol for the
migration stream? The current protocol is tightly bound to the migration
state definitions in the code. This, for example, makes writing an external
tool to analyze the stream almost impossible. Also, any misconfiguration
leads to strange errors, when we interpret data wrongly on the target.

I imagine.. json? Or something like this.. So that we can always understand
the structure of an incoming object, even if we don't know what exactly we
are going to get. This also simplifies expanding the state in new versions:
we just add a new field to the migratable object, and can handle an absent
field in the incoming stream.
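
To illustrate what I mean by "context-free", a toy sketch (not the QEMU
stream format; the tag/length/value layout and all demo_* names are made
up): every record carries its own tag and length, so a reader can skip
records it doesn't know and keep defaults for fields that are absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field tags of a toy device state. */
enum { DEMO_TAG_MTU = 1, DEMO_TAG_QUEUES = 2 };

typedef struct {
    uint32_t mtu;
    uint32_t queues;
    bool has_queues;    /* set only if the stream carried the field */
} DemoNetState;

/*
 * Append one record: 1-byte tag, 1-byte length (4), little-endian value.
 * Returns the new offset, or 0 if the record does not fit.
 */
size_t demo_put_u32(uint8_t *buf, size_t cap, size_t off,
                    uint8_t tag, uint32_t v)
{
    if (off > cap || cap - off < 6) {
        return 0;
    }
    buf[off] = tag;
    buf[off + 1] = 4;
    for (int i = 0; i < 4; i++) {
        buf[off + 2 + i] = (uint8_t)(v >> (8 * i));
    }
    return off + 6;
}

static uint32_t demo_get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse a stream into *st. The caller pre-fills *st with defaults.
 * Unknown tags are skipped using their length; absent fields keep
 * their defaults. Returns false on a truncated or malformed stream.
 */
bool demo_parse(const uint8_t *buf, size_t len, DemoNetState *st)
{
    size_t off = 0;

    assert(buf != NULL && st != NULL);
    while (off < len) {
        uint8_t tag, n;

        if (len - off < 2) {
            return false;
        }
        tag = buf[off];
        n = buf[off + 1];
        off += 2;
        if (len - off < n) {
            return false;
        }
        if (tag == DEMO_TAG_MTU || tag == DEMO_TAG_QUEUES) {
            if (n != 4) {
                return false;
            }
            if (tag == DEMO_TAG_MTU) {
                st->mtu = demo_get_u32(buf + off);
            } else {
                st->queues = demo_get_u32(buf + off);
                st->has_queues = true;
            }
        }
        off += n;   /* known or not, the length tells us where to go next */
    }
    return true;
}
```

A newer source can then add a record with a new tag and an older target
still parses the rest; whether ignoring an unknown record is actually safe
for a given device is a separate question.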

-- 
Best regards,
Vladimir

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Peter Xu 3 months, 3 weeks ago

On Wed, Oct 15, 2025 at 11:17:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Off topic:
> 
> Didn't you think about moving to some context-free protocol for migration
> stream? Current protocol is hardly bound to migration states definitions
> in the code. This, for example, makes writing an external tool to analyze the
> stream almost impossible. As well, any misconfiguration leads to strange
> error, when we treat data wrongly on the target.
> 
> I imagine.. json? Or something like this.. So that we can always understand
> the structure of incoming object, even if we don't know, what exactly we
> are going to get. This also simplifies expanding the state in new versions:
> we just add a new field into migratable object, and can handle absent field
> in incoming stream.

Have you looked at the current encoded JSON dump within the migration
stream?  See should_send_vmdesc().

That looks like what you're describing, but definitely different in that it
should only be used for debugging purposes e.g. when a stream is dumped
into a file.  The JSON should also only appear on precopy as of now.

We might try to move it _before_ the real binary stream, or make the
stream itself JSON, but there'll be tricky things we need to think
about.

At least it would be problematic to dump it before the binary stream,
because there can be VMSD fields or subsections that have a test()
function and so appear only conditionally, depending on any possible
conditions (e.g. device register states).  If we try to dump it
beforehand, the device registers may change afterwards, and when we stop
the VM and dump the real binary stream the test() fn may return something
different, so the stream starts to mismatch with the JSON description.
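
A toy model of that race, just to show the shape of the problem. This is
not the VMState API: DemoField, its test() callback and the "ctrl"/"extra"
fields are invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state with one register that gates another field. */
typedef struct {
    uint32_t ctrl;
    uint32_t extra;
} DemoDevState;

/* A field is emitted only if it has no test() or test() returns true. */
typedef struct {
    const char *name;
    bool (*test)(const DemoDevState *s);
} DemoField;

static bool demo_extra_needed(const DemoDevState *s)
{
    return (s->ctrl & 1u) != 0;     /* depends on live register state */
}

static const DemoField demo_fields[] = {
    { "ctrl",  NULL },
    { "extra", demo_extra_needed },
};

/* Bitmask of the fields that would be emitted for state `s` right now. */
unsigned demo_layout(const DemoDevState *s)
{
    unsigned mask = 0;
    size_t n = sizeof(demo_fields) / sizeof(demo_fields[0]);

    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        if (demo_fields[i].test == NULL || demo_fields[i].test(s)) {
            mask |= 1u << i;
        }
    }
    return mask;
}
```

If demo_layout() is evaluated while the guest is still running (the
"description") and again after the guest has set bit 0 of ctrl and the VM
is stopped (the real stream), the two masks differ.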

Dumping the whole thing completely in JSON format is indeed another approach
that I am not aware of anyone having thought through further.  I believe
some of us (including myself) pictured what it could look like, but I am
not aware of anyone who went deeper than that.  Maybe it's because the
current methods work, not that well but okay, so no one has yet decided to
think it all through.  In short, simple machine types use VMSD versioning,
hence backward migration is not supported.  For enterprise use, machine type
properties are used, and there aren't a huge lot of them, so maybe it's not
as bothersome.

Thanks,

-- 
Peter Xu

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 19:25, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 11:17:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Off topic:
>>
>> Didn't you think about moving to some context-free protocol for migration
>> stream? Current protocol is hardly bound to migration states definitions
>> in the code. This, for example, makes writing an external tool to analyze the
>> stream almost impossible. As well, any misconfiguration leads to strange
>> error, when we treat data wrongly on the target.
>>
>> I imagine.. json? Or something like this.. So that we can always understand
>> the structure of incoming object, even if we don't know, what exactly we
>> are going to get. This also simplifies expanding the state in new versions:
>> we just add a new field into migratable object, and can handle absent field
>> in incoming stream.
> 
> Have you looked at the current encoded JSON dump within the migration
> stream?  See should_send_vmdesc().
> 
> That looks like what you're describing, but definitely different in that it
> should only be used for debugging purposes e.g. when a stream is dumped
> into a file.  The JSON should also only appear on precopy as of now.
> 
> We might try to move it _before_ the real binary stream, or making the
> stream itself to be JSON, but there'll be tricky things we need to think
> about.
> 
> At least it would be problematic to dump it before the binary stream,
> because there can be VMSD fields or subsections that have a test()
> function and so appear only conditionally, depending on arbitrary
> conditions (e.g. device register states).  If we try to dump it
> beforehand, device registers may change afterwards, and when we stop the
> VM and dump the real binary stream the test() fn may return something
> different, so the stream starts to mismatch the JSON description.
> 
> Dumping the whole thing completely in JSON format is indeed another approach

Yes I meant this. Or maybe some other external binary protocol like protobuf.

> that I am not aware of anyone having thought through.  I believe some of us
> (including myself) pictured what it could look like, but I am not aware
> anyone went deeper than that.  Maybe it's because the current methods work
> well enough, if not perfectly, that no one has yet decided to think it all
> through.  In short, simple machine types use VMSD versioning, hence backward
> migration is not supported.  For enterprise use, machine type properties
> are used, and there aren't that many of them, so maybe it's not as bothersome.
> 

yes. Too much work with little benefit..

another thought:

We have QAPI protocol, with quite good schema description, and we can add
new optional fields to structures, and backward compatibility works.

Maybe we can migrate QAPI-generated structures? Then we could describe
the state of devices in QAPI..

Just a note: having worked with QEMU's migration protocol and QAPI for years,
I can say that QAPI is a lot simpler for:
- implementing new features in backward compatible style
- maintaining downstream-only features

Still, QAPI is not good for passing big chunks of raw data, like memory pages.
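
To illustrate the "add an optional field and still parse old and new
streams" property discussed above, here is a minimal sketch of a
self-describing tag/length/value encoding. It is neither QEMU's migration
format nor QAPI; DevState, the TAG_* values, put_u32() and parse_state()
are all hypothetical names made up for the example, and values are stored
in host byte order for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field tags; not part of any real QEMU stream format. */
enum { TAG_MTU = 1, TAG_QUEUES = 2, TAG_NEW_FEATURE = 3 };

typedef struct {
    uint32_t mtu;
    uint32_t queues;    /* optional field: defaults to 1 when absent */
} DevState;

/*
 * Append one record (1-byte tag, 1-byte length, 4-byte value) at offset
 * pos in a buffer of capacity cap. Returns the offset after the record.
 */
static size_t put_u32(uint8_t *buf, size_t cap, size_t pos,
                      uint8_t tag, uint32_t val)
{
    assert(pos <= cap && cap - pos >= 2 + sizeof(val));
    buf[pos++] = tag;
    buf[pos++] = (uint8_t)sizeof(val);
    memcpy(buf + pos, &val, sizeof(val));   /* host byte order */
    return pos + sizeof(val);
}

/*
 * Parse a sequence of records. Unknown tags are skipped using their
 * length, and fields missing from the stream keep their defaults.
 * Returns 0 on success, -1 if a record is truncated.
 */
static int parse_state(const uint8_t *buf, size_t len, DevState *st)
{
    size_t pos = 0;

    st->mtu = 0;
    st->queues = 1;     /* default used when TAG_QUEUES is absent */

    while (pos < len) {
        uint8_t tag, flen;

        if (len - pos < 2) {
            return -1;  /* no room for the tag/length header */
        }
        tag = buf[pos];
        flen = buf[pos + 1];
        pos += 2;
        if (len - pos < flen) {
            return -1;  /* value is cut short */
        }
        if (tag == TAG_MTU && flen == sizeof(st->mtu)) {
            memcpy(&st->mtu, buf + pos, sizeof(st->mtu));
        } else if (tag == TAG_QUEUES && flen == sizeof(st->queues)) {
            memcpy(&st->queues, buf + pos, sizeof(st->queues));
        }
        /* Any other tag is skipped without understanding its content. */
        pos += flen;
    }
    return 0;
}
```

A reader built before TAG_NEW_FEATURE existed still accepts a stream that
contains it, and a stream without TAG_QUEUES yields queues == 1. As noted
above, this kind of per-field framing would not suit bulk data like memory
pages.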

-- 
Best regards,
Vladimir

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Vladimir Sementsov-Ogievskiy 3 months, 4 weeks ago

On 15.10.25 00:46, Vladimir Sementsov-Ogievskiy wrote:
>>
>> And then, while developing similar migration for vhost-user-blk, I found
>> that I can't use one boolean capability for such features; the reason
>> is in the commit message, which we are discussing now.
> 
> Why a bool isn't enough?  Could you share a link to that discussion?

I mean, one boolean is not enough for different devices, when not assisted
by per-device options. So I came to the idea of a "list of backend targets"
in a migration parameter.

It doesn't matter now, our discussion has already gone far ahead)

-- 
Best regards,
Vladimir

Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Wed, Oct 15, 2025 at 12:54:21AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 00:46, Vladimir Sementsov-Ogievskiy wrote:
> > > 
> > > And then, while developing similar migration for vhost-user-blk, I found
> > > that I can't use one boolean capability for such features; the reason
> > > is in the commit message, which we are discussing now.
> > 
> > Why a bool isn't enough?  Could you share a link to that discussion?
> 
> I mean, one boolean is not enough for different devices, when not assisted
> by per-device options. So I came to the idea of a "list of backend targets"
> in a migration parameter.

If we need to identify backends or frontends, surely we should be using
the "id" that the mgmt app used when creating the object, that gets set
in the QOM tree.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|