[v8] virtio-net: live-TAP local migration

[PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

This parameter enables the backend-transfer feature: all devices
that support it will migrate their backends (for example, a TAP
device, by passing its open file descriptor through the migration
channel).

Currently there are no such devices, so the new parameter is a no-op.

The next commit will add support for virtio-net, to migrate its
TAP backend.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 migration/options.c | 18 ++++++++++++++++++
 migration/options.h |  2 ++
 qapi/migration.json | 38 ++++++++++++++++++++++++++++++++------
 3 files changed, 52 insertions(+), 6 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 5183112775..a461b07b54 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -262,6 +262,12 @@ bool migrate_mapped_ram(void)
     return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
 }
 
+bool migrate_backend_transfer(void)
+{
+    MigrationState *s = migrate_get_current();
+    return s->parameters.backend_transfer;
+}
+
 bool migrate_ignore_shared(void)
 {
     MigrationState *s = migrate_get_current();
@@ -963,6 +969,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->cpr_exec_command = QAPI_CLONE(strList,
                                           s->parameters.cpr_exec_command);
 
+    params->has_backend_transfer = true;
+    params->backend_transfer = s->parameters.backend_transfer;
+
     return params;
 }
 
@@ -997,6 +1006,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_zero_page_detection = true;
     params->has_direct_io = true;
     params->has_cpr_exec_command = true;
+    params->has_backend_transfer = true;
 }
 
 /*
@@ -1305,6 +1315,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_cpr_exec_command) {
         dest->cpr_exec_command = params->cpr_exec_command;
     }
+
+    if (params->has_backend_transfer) {
+        dest->backend_transfer = params->backend_transfer;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1443,6 +1457,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         s->parameters.cpr_exec_command =
             QAPI_CLONE(strList, params->cpr_exec_command);
     }
+
+    if (params->has_backend_transfer) {
+        s->parameters.backend_transfer = params->backend_transfer;
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 82d839709e..755ba1c024 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
 uint64_t migrate_xbzrle_cache_size(void);
 ZeroPageDetection migrate_zero_page_detection(void);
 
+bool migrate_backend_transfer(void);
+
 /* parameters helpers */
 
 bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index be0f3fcc12..35601a1f87 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -951,9 +951,16 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: Enable backend-transfer feature for devices that
+#     supports it. In general that means that backend state and its
+#     file descriptors are passed to the destination in the migration
+#     channel (which must be a UNIX socket). Individual devices
+#     declare the support for backend-transfer by per-device
+#     backend-transfer option. (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # Since: 2.4
@@ -978,7 +985,8 @@
            'mode',
            'zero-page-detection',
            'direct-io',
-           'cpr-exec-command'] }
+           'cpr-exec-command',
+           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
 
 ##
 # @MigrateSetParameters:
@@ -1137,9 +1145,16 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: Enable backend-transfer feature for devices that
+#     supports it. In general that means that backend state and its
+#     file descriptors are passed to the destination in the migration
+#     channel (which must be a UNIX socket). Individual devices
+#     declare the support for backend-transfer by per-device
+#     backend-transfer option. (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # TODO: either fuse back into `MigrationParameters`, or make
@@ -1179,7 +1194,9 @@
             '*mode': 'MigMode',
             '*zero-page-detection': 'ZeroPageDetection',
             '*direct-io': 'bool',
-            '*cpr-exec-command': [ 'str' ]} }
+            '*cpr-exec-command': [ 'str' ],
+            '*backend-transfer': { 'type': 'bool',
+                                   'features': [ 'unstable' ] } } }
 
 ##
 # @migrate-set-parameters:
@@ -1352,9 +1369,16 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: Enable backend-transfer feature for devices that
+#     supports it. In general that means that backend state and its
+#     file descriptors are passed to the destination in the migration
+#     channel (which must be a UNIX socket). Individual devices
+#     declare the support for backend-transfer by per-device
+#     backend-transfer option. (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # Since: 2.4
@@ -1391,7 +1415,9 @@
             '*mode': 'MigMode',
             '*zero-page-detection': 'ZeroPageDetection',
             '*direct-io': 'bool',
-            '*cpr-exec-command': [ 'str' ]} }
+            '*cpr-exec-command': [ 'str' ],
+            '*backend-transfer': { 'type': 'bool',
+                                   'features': [ 'unstable' ] } } }
 
 ##
 # @query-migrate-parameters:
-- 
2.48.1

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Markus Armbruster 3 months, 3 weeks ago

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

> This parameter enables backend-transfer feature: all devices
> which support it will migrate their backends (for example a TAP
> device, by passing open file descriptor to migration channel).
>
> Currently no such devices, so the new parameter is a noop.
>
> Next commit will add support for virtio-net, to migrate its
> TAP backend.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..35601a1f87 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -951,9 +951,16 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: Enable backend-transfer feature for devices that

Either "Enable the backend transfer feature" or "Enable backend transfer"

> +#     supports it. In general that means that backend state and its

support

> +#     file descriptors are passed to the destination in the migraton
> +#     channel (which must be a UNIX socket). Individual devices
> +#     declare the support for backend-transfer by per-device
> +#     backend-transfer option. (Since 10.2)
> +#

I'm not sure I understand this.

What is a "per-device backend-transfer option"?  Is it a device
property?

If yes, I guess the device declares its capability to do this by having
this property.  Correct?

Does the property's value matter?  How?

>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -978,7 +985,8 @@
>             'mode',
>             'zero-page-detection',
>             'direct-io',
> -           'cpr-exec-command'] }
> +           'cpr-exec-command',
> +           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>  
>  ##
>  # @MigrateSetParameters:

[...]

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 13:56, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> 
>> This parameter enables backend-transfer feature: all devices
>> which support it will migrate their backends (for example a TAP
>> device, by passing open file descriptor to migration channel).
>>
>> Currently no such devices, so the new parameter is a noop.
>>
>> Next commit will add support for virtio-net, to migrate its
>> TAP backend.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> [...]
> 
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index be0f3fcc12..35601a1f87 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -951,9 +951,16 @@
>>   #     is @cpr-exec.  The first list element is the program's filename,
>>   #     the remainder its arguments.  (Since 10.2)
>>   #
>> +# @backend-transfer: Enable backend-transfer feature for devices that
> 
> Either "Enable the backend transfer feature" or "Enable backend transfer"

then, "Enable the backend-transfer feature"

> 
>> +#     supports it. In general that means that backend state and its
> 
> support
> 
>> +#     file descriptors are passed to the destination in the migraton
>> +#     channel (which must be a UNIX socket). Individual devices
>> +#     declare the support for backend-transfer by per-device
>> +#     backend-transfer option. (Since 10.2)
>> +#
> 
> I'm not sure I understand this.
> 
> What is a "per-device backend-transfer option"?  Is it a device
> property?
> 
> If yes, I guess the device declares its capability to do this by having
> this property.  Correct?

No, the user may set or unset this property to say whether the device
should participate in backend-transfer.

Still, as you can see in the parallel thread, Daniel has strong arguments
against such an API, so it seems it will change again in v9.

https://lore.kernel.org/qemu-devel/aPCtkB-GvFNuqlHn@redhat.com/

> 
> Does the property's value matter?  How?
> 
>>   # Features:
>>   #
>> -# @unstable: Members @x-checkpoint-delay and
>> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>>   #     @x-vcpu-dirty-limit-period are experimental.
>>   #
>>   # Since: 2.4
>> @@ -978,7 +985,8 @@
>>              'mode',
>>              'zero-page-detection',
>>              'direct-io',
>> -           'cpr-exec-command'] }
>> +           'cpr-exec-command',
>> +           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>>   
>>   ##
>>   # @MigrateSetParameters:
> 
> [...]
> 


-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> This parameter enables backend-transfer feature: all devices
> which support it will migrate their backends (for example a TAP
> device, by passing open file descriptor to migration channel).
> 
> Currently no such devices, so the new parameter is a noop.
> 
> Next commit will add support for virtio-net, to migrate its
> TAP backend.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  migration/options.c | 18 ++++++++++++++++++
>  migration/options.h |  2 ++
>  qapi/migration.json | 38 ++++++++++++++++++++++++++++++++------
>  3 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/options.c b/migration/options.c
> index 5183112775..a461b07b54 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -262,6 +262,12 @@ bool migrate_mapped_ram(void)
>      return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
>  }
>  
> +bool migrate_backend_transfer(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +    return s->parameters.backend_transfer;
> +}
> +
>  bool migrate_ignore_shared(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -963,6 +969,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->cpr_exec_command = QAPI_CLONE(strList,
>                                            s->parameters.cpr_exec_command);
>  
> +    params->has_backend_transfer = true;
> +    params->backend_transfer = s->parameters.backend_transfer;
> +
>      return params;
>  }
>  
> @@ -997,6 +1006,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_zero_page_detection = true;
>      params->has_direct_io = true;
>      params->has_cpr_exec_command = true;
> +    params->has_backend_transfer = true;
>  }
>  
>  /*
> @@ -1305,6 +1315,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_cpr_exec_command) {
>          dest->cpr_exec_command = params->cpr_exec_command;
>      }
> +
> +    if (params->has_backend_transfer) {
> +        dest->backend_transfer = params->backend_transfer;
> +    }
>  }
>  
>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1443,6 +1457,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>          s->parameters.cpr_exec_command =
>              QAPI_CLONE(strList, params->cpr_exec_command);
>      }
> +
> +    if (params->has_backend_transfer) {
> +        s->parameters.backend_transfer = params->backend_transfer;
> +    }
>  }
>  
>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> diff --git a/migration/options.h b/migration/options.h
> index 82d839709e..755ba1c024 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
>  uint64_t migrate_xbzrle_cache_size(void);
>  ZeroPageDetection migrate_zero_page_detection(void);
>  
> +bool migrate_backend_transfer(void);
> +
>  /* parameters helpers */
>  
>  bool migrate_params_check(MigrationParameters *params, Error **errp);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..35601a1f87 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -951,9 +951,16 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +#     supports it. In general that means that backend state and its
> +#     file descriptors are passed to the destination in the migraton
> +#     channel (which must be a UNIX socket). Individual devices
> +#     declare the support for backend-transfer by per-device
> +#     backend-transfer option. (Since 10.2)

Thanks.

I still prefer the name "fd-passing" or anything more explicit than
"backend-transfer". Maybe the current name is fine for TAP only because
TAP doesn't have its own VMSD to transfer?

Consider a backend device that already has VMSDs and is already migrated.
If it starts to allow fd-passing, this name stops being suitable there,
because the device was already "transferring the backend"; the only new
thing is the fd-passing.

Meanwhile, consider another example: what if a device is not a backend at
all (e.g. vfio?), has its own VMSD, and then wants to do fd-passing?

In general, I think "fd" is really a core concept of this whole thing.  To
complement that idea: IMHO this patch misses one important change, namely
that the migration framework should explicitly fail the migration if this
feature is enabled but the channel is not a UNIX socket (fd-passing
REQUIRES SCM_RIGHTS).  Would that look more reliable?  Otherwise, IIUC,
it'll throw weird errors when, e.g., we enable this feature and try to
migrate via TCP or to a file.
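
For reference, the SCM_RIGHTS requirement is why a TCP or file channel
cannot work: a descriptor travels as ancillary data, which only AF_UNIX
sockets carry.  A minimal standalone sketch in plain POSIX C is below.
send_fd() and recv_fd() are hypothetical helper names used only for
illustration; they are not QEMU functions, and this is not how the series
itself implements the transfer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Send one open file descriptor over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure.  (Illustrative helper, not QEMU API.)
 */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* one data byte carried alongside the ancillary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;  /* forces suitable alignment of buf */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    /* The descriptor goes into an SCM_RIGHTS control message. */
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Receive one file descriptor sent by send_fd().
 * Returns the new descriptor, or -1 if none arrived.
 */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }

    /* Accept only a single-descriptor SCM_RIGHTS message. */
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open
file as the sender's.  Over a TCP socket the kernel offers no equivalent
mechanism, hence the request for an early, explicit failure.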

> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -978,7 +985,8 @@
>             'mode',
>             'zero-page-detection',
>             'direct-io',
> -           'cpr-exec-command'] }
> +           'cpr-exec-command',
> +           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -1137,9 +1145,16 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +#     supports it. In general that means that backend state and its
> +#     file descriptors are passed to the destination in the migraton
> +#     channel (which must be a UNIX socket). Individual devices
> +#     declare the support for backend-transfer by per-device
> +#     backend-transfer option. (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # TODO: either fuse back into `MigrationParameters`, or make
> @@ -1179,7 +1194,9 @@
>              '*mode': 'MigMode',
>              '*zero-page-detection': 'ZeroPageDetection',
>              '*direct-io': 'bool',
> -            '*cpr-exec-command': [ 'str' ]} }
> +            '*cpr-exec-command': [ 'str' ],
> +            '*backend-transfer': { 'type': 'bool',
> +                                   'features': [ 'unstable' ] } } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1352,9 +1369,16 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +#     supports it. In general that means that backend state and its
> +#     file descriptors are passed to the destination in the migraton
> +#     channel (which must be a UNIX socket). Individual devices
> +#     declare the support for backend-transfer by per-device
> +#     backend-transfer option. (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -1391,7 +1415,9 @@
>              '*mode': 'MigMode',
>              '*zero-page-detection': 'ZeroPageDetection',
>              '*direct-io': 'bool',
> -            '*cpr-exec-command': [ 'str' ]} }
> +            '*cpr-exec-command': [ 'str' ],
> +            '*backend-transfer': { 'type': 'bool',
> +                                   'features': [ 'unstable' ] } } }
>  
>  ##
>  # @query-migrate-parameters:
> -- 
> 2.48.1
> 

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 15.10.25 21:19, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> This parameter enables backend-transfer feature: all devices
>> which support it will migrate their backends (for example a TAP
>> device, by passing open file descriptor to migration channel).
>>
>> Currently no such devices, so the new parameter is a noop.
>>
>> Next commit will add support for virtio-net, to migrate its
>> TAP backend.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---

[..]

>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -951,9 +951,16 @@
>>   #     is @cpr-exec.  The first list element is the program's filename,
>>   #     the remainder its arguments.  (Since 10.2)
>>   #
>> +# @backend-transfer: Enable backend-transfer feature for devices that
>> +#     supports it. In general that means that backend state and its
>> +#     file descriptors are passed to the destination in the migraton
>> +#     channel (which must be a UNIX socket). Individual devices
>> +#     declare the support for backend-transfer by per-device
>> +#     backend-transfer option. (Since 10.2)
> 
> Thanks.
> 
> I still prefer the name "fd-passing" or anything more explicit than
> "backend-transfer". Maybe the current name is fine for TAP, only because
> TAP doesn't have its own VMSD to transfer?
> 
> Consider a device that would be a backend that supports VMSDs already to be
> migrated, then if it starts to allow fd-passing, this name will stop being
> suitable there, because it used to "transfer backend" already, now it's
> just started to "fd-passing".
> 
> Meanwhile, consider another example - what if a device is not a backend at
> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?

Reasonable.

But consider also the discussion with Fabiano in v5, where he argues
against "fds" (reasonably, too):

https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/

(still, that was against my "fds" name for the parameter, which is
really too generic; "fd-passing" is not)

and the arguments for backend-transfer (so that it reads similarly to
cpr-transfer)

https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/


> 
> In general, I think "fd" is really a core concept of this whole thing.

I think we can call any external object linked by the fd a "backend".

Still, backend/frontend terminology is so misleading when applied to
complex systems (for me, at least) that I don't really like the "backend"
word here.

fd-passing is OK for me; I can resend with it if Fabiano's arguments
don't change your mind.

>  One
> thing to complement that idea is, IMHO this patch misses one important
> change, that migration framework should actually explicitly fail the
> migration if this feature is enabled but it's not a unix socket protocol
> (aka, fd-passing REQUIRES scm rights).  Would that look more reliable?
> Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
> feature and trying to migrate via either TCP or to a file..
> 

Right. I rely on the checks in the qemu_file_get_fd() / qemu_file_set_fd()
handlers.

But of course, an earlier clean failure of the qmp-migrate /
qmp-incoming-migrate commands would be nice; will do.

Like this, I think:

diff --git a/migration/migration.c b/migration/migration.c
index 6ed6a10f57..0c73332706 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
          return false;
      }

+    if (migrate_backend_transfer() &&
+        !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+          addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
+        error_setg(errp, "Migration requires a UNIX domain socket as transport, "
+                   "because backend-transfer is enabled");
+        return false;
+    }
+
      return true;
  }
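
The logic of that check can be modelled standalone as follows.  This is
only a sketch: XportKind, SockKind, ModelAddr and model_transport_ok() are
stand-in names invented for illustration and are not QEMU's real
MigrationAddress / SocketAddress definitions; only the condition and the
error text mirror the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in types (assumptions, not QEMU's definitions). */
typedef enum { XPORT_SOCKET, XPORT_EXEC, XPORT_FILE } XportKind;
typedef enum { SK_INET, SK_UNIX } SockKind;

typedef struct {
    XportKind transport;
    SockKind socket_type;  /* meaningful only when transport == XPORT_SOCKET */
} ModelAddr;

/*
 * Return true if the transport is acceptable.  When backend_transfer is
 * enabled, anything other than a UNIX domain socket is rejected and, if
 * errmsg is non-NULL, an error string is stored through it.
 */
static bool model_transport_ok(const ModelAddr *addr, bool backend_transfer,
                               const char **errmsg)
{
    assert(addr != NULL);

    if (backend_transfer &&
        !(addr->transport == XPORT_SOCKET && addr->socket_type == SK_UNIX)) {
        if (errmsg != NULL) {
            *errmsg = "Migration requires a UNIX domain socket as transport, "
                      "because backend-transfer is enabled";
        }
        return false;
    }
    return true;
}
```

With the parameter disabled every transport passes this check; with it
enabled only the UNIX socket case does.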





-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 21:19, Peter Xu wrote:
> > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > This parameter enables backend-transfer feature: all devices
> > > which support it will migrate their backends (for example a TAP
> > > device, by passing open file descriptor to migration channel).
> > > 
> > > Currently no such devices, so the new parameter is a noop.
> > > 
> > > Next commit will add support for virtio-net, to migrate its
> > > TAP backend.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > ---
> 
> [..]
> 
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -951,9 +951,16 @@
> > >   #     is @cpr-exec.  The first list element is the program's filename,
> > >   #     the remainder its arguments.  (Since 10.2)
> > >   #
> > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > +#     supports it. In general that means that backend state and its
> > > +#     file descriptors are passed to the destination in the migraton
> > > +#     channel (which must be a UNIX socket). Individual devices
> > > +#     declare the support for backend-transfer by per-device
> > > +#     backend-transfer option. (Since 10.2)
> > 
> > Thanks.
> > 
> > I still prefer the name "fd-passing" or anything more explicit than
> > "backend-transfer". Maybe the current name is fine for TAP, only because
> > TAP doesn't have its own VMSD to transfer?
> > 
> > Consider a device that would be a backend that supports VMSDs already to be
> > migrated, then if it starts to allow fd-passing, this name will stop being
> > suitable there, because it used to "transfer backend" already, now it's
> > just started to "fd-passing".
> > 
> > Meanwhile, consider another example - what if a device is not a backend at
> > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> 
> Reasonable.
> 
> But consider also the discussion with Fabiano in v5, where he argues against fds
> (reasonable too):
> 
> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> 
> (still, they were against my "fds" name for the parameter, which is
> really too generic, fd-passing is not)
> 
> and the arguments for backend-transfer (to read similar with cpr-transfer)
> 
> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> 
> 
> > 
> > In general, I think "fd" is really a core concept of this whole thing.
> 
> I think, we can call "backend" any external object, linked by the fd.
> 
> Still, backend/frontend terminology is so misleading, when applied to
> complex systems (for me, at least), that I don't really like "-backend"
> word here.
> 
> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> not change your mind.

Ah, I didn't notice the name had already been discussed.

I think it means you can vote for your own preference now because we have
one vote for each. :) Let's also see whether Fabiano will come up with
something better than both.

You explicitly mentioned the file descriptors in the QAPI doc; that's what
I would strongly request.  The other thing is the UNIX socket check; with
it added below, it all looks good now, thanks.  No strong feelings on the
names.

> 
> >  One
> > thing to complement that idea is, IMHO this patch misses one important
> > change, that migration framework should actually explicitly fail the
> > migration if this feature is enabled but it's not a unix socket protocol
> > (aka, fd-passing REQUIRES scm rights).  Would that look more reliable?
> > Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
> > feature and trying to migrate via either TCP or to a file..
> > 
> 
> Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
> handlers.
> 
> But of course, earlier clean failure of qmp-migrate / qmp-incoming-migate
> commands would be nice, will do.
> 
> Like this, I think:
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 6ed6a10f57..0c73332706 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>          return false;
>      }
> 
> +    if (migrate_backend_transfer() &&
> +        !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
> +          addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
> +        error_setg(errp, "Migration requires a UNIX domain socket as transport, "
> +                   "because backend-transfer is enabled");
> +        return false;
> +    }
> +
>      return true;
>  }
> 
> 
> 
> 
> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 15.10.25 23:07, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.10.25 21:19, Peter Xu wrote:
>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> This parameter enables backend-transfer feature: all devices
>>>> which support it will migrate their backends (for example a TAP
>>>> device, by passing open file descriptor to migration channel).
>>>>
>>>> Currently no such devices, so the new parameter is a noop.
>>>>
>>>> Next commit will add support for virtio-net, to migrate its
>>>> TAP backend.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>
>> [..]
>>
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -951,9 +951,16 @@
>>>>    #     is @cpr-exec.  The first list element is the program's filename,
>>>>    #     the remainder its arguments.  (Since 10.2)
>>>>    #
>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>> +#     supports it. In general that means that backend state and its
>>>> +#     file descriptors are passed to the destination in the migraton
>>>> +#     channel (which must be a UNIX socket). Individual devices
>>>> +#     declare the support for backend-transfer by per-device
>>>> +#     backend-transfer option. (Since 10.2)
>>>
>>> Thanks.
>>>
>>> I still prefer the name "fd-passing" or anything more explicit than
>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>> TAP doesn't have its own VMSD to transfer?
>>>
>>> Consider a device that would be a backend that supports VMSDs already to be
>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>> suitable there, because it used to "transfer backend" already, now it's
>>> just started to "fd-passing".
>>>
>>> Meanwhile, consider another example - what if a device is not a backend at
>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>
>> Reasonable.
>>
>> But consider also the discussion with Fabiano in v5, where he argues against fds
>> (reasonable too):
>>
>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>
>> (still, they were against my "fds" name for the parameter, which is
>> really too generic, fd-passing is not)
>>
>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>
>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>
>>
>>>
>>> In general, I think "fd" is really a core concept of this whole thing.
>>
>> I think, we can call "backend" any external object, linked by the fd.
>>
>> Still, backend/frontend terminology is so misleading, when applied to
>> complex systems (for me, at least), that I don't really like "-backend"
>> word here.
>>
>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>> not change your mind.
> 
> Ah, I didn't notice the name has been discussed.
> 
> I think it means you can vote for your own preference now because we have
> one vote for each. :) Let's also see whether Fabiano will come up with
> something better than both.
> 
> You mentioned explicitly the file descriptors in the qapi doc, that's what
> I would strongly request for.  The other thing is the unix socket check, it
> looks all good below now with it, thanks.  No strong feelings on the names.
> 

After a bit more thinking, I leaning towards keeping backend-transfer. I think
it's more meaningful for the user:

If we call it "fd-passing", a user may ask:

OK, what is it? Allow QEMU to pass some fds through the migration stream, if it
supports fds? Which fds? Why pass them? Finally, why can't QEMU just check
whether it is a UNIX socket, and pass any fds it wants if it is?

The logical question is: why not just drop the global capability, and only
check whether it is a UNIX socket? (OK, relying only on the socket type is
wrong anyway, as there may be some complex tunneling which includes UNIX
sockets but still can't pass fds; but I'm thinking about feature naming now.)

But we really want an explicit switch for the feature, as a QEMU update is
not the only case of local migration. Another case is changing the
backend. So the user's choices are:

1. Remote migration: we can't reuse backends (files, sockets, host devices), as
we are moving to another host. So, we don't enable "backend-transfer". We don't
transfer the backend; we have to initialize a new backend on the other host.

2. Local migration to update QEMU, with minimal freeze-time and minimal
extra actions: use "backend-transfer", exactly to keep the backends
(vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
as is.

3. Local migration, but we want to reconfigure some backend, or switch
to another backend. We disable "backend-transfer" for one device.

4. Some problem with "backend-transfer", maybe some bug. Disable the whole
backend-transfer feature, and do a normal local migration to a new version
with the bug fixed.

-

"backend-transfer" better reflects what the management layer should or
should not do with backends, depending on the migration type.

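As a rough illustration of the four cases above (not part of the patch), the
management-layer decision could be sketched in Python as below. The names
Scenario and devices_to_transfer are hypothetical, and "capable" stands for
whatever set of devices supports backend-transfer on a given setup.

```python
from enum import Enum, auto


class Scenario(Enum):
    """Hypothetical labels for the four cases listed above."""
    REMOTE = auto()             # case 1: migrate to another host
    LOCAL_UPDATE = auto()       # case 2: local migration to update QEMU
    LOCAL_RECONFIGURE = auto()  # case 3: local, but some backends change
    LOCAL_WORKAROUND = auto()   # case 4: backend-transfer suspected buggy


def devices_to_transfer(scenario, capable, reconfigured=frozenset()):
    """Return the device IDs whose backends should be kept and transferred.

    capable:      IDs of devices that support backend-transfer (assumption).
    reconfigured: IDs whose backend is being replaced or reconfigured.
    """
    if scenario is Scenario.LOCAL_UPDATE:
        # Keep every backend as is.
        return set(capable)
    if scenario is Scenario.LOCAL_RECONFIGURE:
        # Keep all backends except the ones being changed.
        return set(capable) - set(reconfigured)
    # REMOTE and LOCAL_WORKAROUND: new backends are set up for the target.
    return set()
```

Only case 3 needs per-device granularity; the other three are all-or-nothing.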
>>
>>>   One
>>> thing to complement that idea is, IMHO this patch misses one important
>>> change, that migration framework should actually explicitly fail the
>>> migration if this feature is enabled but it's not a unix socket protocol
>>> (aka, fd-passing REQUIRES scm rights).  Would that look more reliable?
>>> Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
>>> feature and trying to migrate via either TCP or to a file..
>>>
>>
>> Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
>> handlers.
>>
>> But of course, earlier clean failure of qmp-migrate / qmp-incoming-migate
>> commands would be nice, will do.
>>
>> Like this, I think:
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 6ed6a10f57..0c73332706 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>>           return false;
>>       }
>>
>> +    if (migrate_backend_transfer() &&
>> +        !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
>> +          addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
>> +        error_setg(errp, "Migration requires a UNIX domain socket as transport, "
>> +                   "because backend-transfer is enabled");
>> +        return false;
>> +    }
>> +
>>       return true;
>>   }
>>
>>
>>
>>
>>
>> -- 
>> Best regards,
>> Vladimir
>>
> 

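For readers unfamiliar with why the check quoted above insists on a UNIX
domain socket: an open file descriptor can only cross a process boundary as
SCM_RIGHTS ancillary data, which TCP and plain files cannot carry. Below is a
minimal Python sketch of that mechanism (Python 3.9+ on a Unix host is
assumed; pass_fd_over_unix_socket is a hypothetical name, and a pipe stands in
for a real backend fd such as a TAP device).

```python
import os
import socket


def pass_fd_over_unix_socket(payload: bytes) -> bytes:
    """Send a pipe's write end over a UNIX socket pair, then use the copy.

    payload should be short and non-empty (it must fit in the pipe buffer).
    """
    src, dst = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()
    try:
        # The fd travels as SCM_RIGHTS ancillary data next to one data byte.
        socket.send_fds(src, [b"x"], [write_end])
        _data, fds, _flags, _addr = socket.recv_fds(dst, 1, 1)
        received_fd = fds[0]

        # The "source" side drops its copy; the received fd stays usable.
        os.close(write_end)
        os.write(received_fd, payload)
        os.close(received_fd)

        return os.read(read_end, len(payload))
    finally:
        os.close(read_end)
        src.close()
        dst.close()
```

The receiver gets a new descriptor number that refers to the same open file,
which is what lets the destination keep using the source's backend.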

-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 23:07, Peter Xu wrote:
> > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 15.10.25 21:19, Peter Xu wrote:
> > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > This parameter enables backend-transfer feature: all devices
> > > > > which support it will migrate their backends (for example a TAP
> > > > > device, by passing open file descriptor to migration channel).
> > > > > 
> > > > > Currently no such devices, so the new parameter is a noop.
> > > > > 
> > > > > Next commit will add support for virtio-net, to migrate its
> > > > > TAP backend.
> > > > > 
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > ---
> > > 
> > > [..]
> > > 
> > > > > --- a/qapi/migration.json
> > > > > +++ b/qapi/migration.json
> > > > > @@ -951,9 +951,16 @@
> > > > >    #     is @cpr-exec.  The first list element is the program's filename,
> > > > >    #     the remainder its arguments.  (Since 10.2)
> > > > >    #
> > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > +#     supports it. In general that means that backend state and its
> > > > > +#     file descriptors are passed to the destination in the migraton
> > > > > +#     channel (which must be a UNIX socket). Individual devices
> > > > > +#     declare the support for backend-transfer by per-device
> > > > > +#     backend-transfer option. (Since 10.2)
> > > > 
> > > > Thanks.
> > > > 
> > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > TAP doesn't have its own VMSD to transfer?
> > > > 
> > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > suitable there, because it used to "transfer backend" already, now it's
> > > > just started to "fd-passing".
> > > > 
> > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > 
> > > Reasonable.
> > > 
> > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > (reasonable too):
> > > 
> > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > 
> > > (still, they were against my "fds" name for the parameter, which is
> > > really too generic, fd-passing is not)
> > > 
> > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > > 
> > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > 
> > > 
> > > > 
> > > > In general, I think "fd" is really a core concept of this whole thing.
> > > 
> > > I think, we can call "backend" any external object, linked by the fd.
> > > 
> > > Still, backend/frontend terminology is so misleading, when applied to
> > > complex systems (for me, at least), that I don't really like "-backend"
> > > word here.
> > > 
> > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > not change your mind.
> > 
> > Ah, I didn't notice the name has been discussed.
> > 
> > I think it means you can vote for your own preference now because we have
> > one vote for each. :) Let's also see whether Fabiano will come up with
> > something better than both.
> > 
> > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > I would strongly request for.  The other thing is the unix socket check, it
> > looks all good below now with it, thanks.  No strong feelings on the names.
> > 
> 
> After a bit more thinking, I leaning towards keeping backend-transfer. I think
> it's more meaningful for the user:
> 
> If we call it "fd-passing", user may ask:
> 
> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> is it unix socket or not, and pass any fds it wants if it is?
> 
> Logical question is, why not just drop the global capability, and check only
> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> as it may be some complex tunneling, which includes unix sockets, but still
> can't pass fds, but I think now about feature naming)
> 
> But we really want an explicit switch for the feature. As qemu-update is
> not the only case of local migration. The another case is changing the
> backend. So for the user's choice is:
> 
> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> we are moving to another host. So, we don't enable "backend-transfer". We don't
> transfer the backend, we have to initialize new backend on another host.
> 
> 2. Local migration to update QEMU, with minimal freeze-time and minimal
> extra actions: use "backend-transfer", exactly to keep the backends
> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> as is.
> 
> 3. Local migration, but we want to reconfigure some backend, or switch
> to another backend. We disable "backend-transfer" for one device.

This implies that you're changing 'backend-transfer' on the
device at the time of each migration.

This takes us back to the situation we've had historically where the
behaviour of migration depends on global properties the mgmt app has
set prior to the 'migrate' command being run. We've just tried to get
away from that model by passing everything as parameters to the
migrate command, so I'm loath to see us invent a new way to have
global state properties changing migration behaviour.

This 'backend-transfer' device property is not really a device property,
it is an indirect parameter to the 'migrate' command.

Ergo, if we need the ability to selectively migrate the backend state
of individual devices, then instead of a property on the device, we
should pass a list of device IDs as a parameter to the migrate
command in QMP.

> 
> 4. Some problem with "backend-transfer", may be some bug. Disable the whole
> beackend-transfer feature, and do normal local migration to a new version
> with bug fixed.
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 11:32, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.10.25 23:07, Peter Xu wrote:
>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>> which support it will migrate their backends (for example a TAP
>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>
>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>
>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>> TAP backend.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>> ---
>>>>
>>>> [..]
>>>>
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -951,9 +951,16 @@
>>>>>>     #     is @cpr-exec.  The first list element is the program's filename,
>>>>>>     #     the remainder its arguments.  (Since 10.2)
>>>>>>     #
>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>> +#     supports it. In general that means that backend state and its
>>>>>> +#     file descriptors are passed to the destination in the migraton
>>>>>> +#     channel (which must be a UNIX socket). Individual devices
>>>>>> +#     declare the support for backend-transfer by per-device
>>>>>> +#     backend-transfer option. (Since 10.2)
>>>>>
>>>>> Thanks.
>>>>>
>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>
>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>> just started to "fd-passing".
>>>>>
>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>
>>>> Reasonable.
>>>>
>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>> (reasonable too):
>>>>
>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>
>>>> (still, they were against my "fds" name for the parameter, which is
>>>> really too generic, fd-passing is not)
>>>>
>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>
>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>
>>>>
>>>>>
>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>
>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>
>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>> word here.
>>>>
>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>> not change your mind.
>>>
>>> Ah, I didn't notice the name has been discussed.
>>>
>>> I think it means you can vote for your own preference now because we have
>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>> something better than both.
>>>
>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>> I would strongly request for.  The other thing is the unix socket check, it
>>> looks all good below now with it, thanks.  No strong feelings on the names.
>>>
>>
>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>> it's more meaningful for the user:
>>
>> If we call it "fd-passing", user may ask:
>>
>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>> is it unix socket or not, and pass any fds it wants if it is?
>>
>> Logical question is, why not just drop the global capability, and check only
>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>> as it may be some complex tunneling, which includes unix sockets, but still
>> can't pass fds, but I think now about feature naming)
>>
>> But we really want an explicit switch for the feature. As qemu-update is
>> not the only case of local migration. The another case is changing the
>> backend. So for the user's choice is:
>>
>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>> transfer the backend, we have to initialize new backend on another host.
>>
>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>> extra actions: use "backend-transfer", exactly to keep the backends
>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>> as is.
>>
>> 3. Local migration, but we want to reconfigure some backend, or switch
>> to another backend. We disable "backend-transfer" for one device.
> 
> This implies that you're changing 'backend-transfer' against the
> device at time of each migration.
> 
> This takes us back to the situation we've had historically where the
> behaviour of migration depends on global properties the mgmt app has
> set prior to the 'migrate' command being run. We've just tried to get
> away from that model by passing everything as parameters to the
> migrate command, so I'm loathe to see us invent a new way to have
> global state properties changing migration behaviour.
> 
> This 'backend-transfer' device property is not really a device property,
> it is an indirect parameter to the 'migrate' command.
> 
> Ergo, if we need the ability to selectively migrate the backend state
> of individal devices, then instead of a property on the device, we
> should pass a list of device IDs as a parameter to the migrate
> command in QMP.

Understood.

So, it will look like this:

# @backend-transfer: List of device IDs or QOM paths to enable
#     backend-transfer for. In general that means that backend
#     states and their file descriptors are passed to the destination
#     in the migration channel (which must be a UNIX socket), and
#     the management tool doesn't have to configure new backends for
#     the target QEMU (like vhost-user server, or TAP device in the
#     kernel). Default is no backend-transfer migration. (Since 10.2)

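For illustration only, here is how a management tool might build the
corresponding QMP request, assuming the list ends up as a migration parameter
set through migrate-set-parameters, as the doc comment above suggests. The
helper name backend_transfer_command and the device IDs are hypothetical, and
the parameter itself is only a proposal in this thread.

```python
def backend_transfer_command(device_ids):
    """Build a QMP request enabling backend-transfer for the given devices.

    Assumption: 'backend-transfer' is a list-of-strings migration parameter,
    as proposed in this thread; it is not an established QEMU interface.
    """
    ids = list(device_ids)
    if not all(isinstance(i, str) and i for i in ids):
        raise ValueError("device IDs must be non-empty strings")
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"backend-transfer": ids},
    }
```

An empty list then corresponds to the documented default of no
backend-transfer migration.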

Peter, is this OK with you?


> 
>>
>> 4. Some problem with "backend-transfer", may be some bug. Disable the whole
>> beackend-transfer feature, and do normal local migration to a new version
>> with bug fixed.
>>
> 
> With regards,
> Daniel


-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> On 15.10.25 23:07, Peter Xu wrote:
>>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>>> which support it will migrate their backends (for example a TAP
>>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>>
>>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>>
>>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>>> TAP backend.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>>> ---
>>>>>
>>>>> [..]
>>>>>
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -951,9 +951,16 @@
>>>>>>>     #     is @cpr-exec.  The first list element is the program's filename,
>>>>>>>     #     the remainder its arguments.  (Since 10.2)
>>>>>>>     #
>>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>>> +#     supports it. In general that means that backend state and its
>>>>>>> +#     file descriptors are passed to the destination in the migraton
>>>>>>> +#     channel (which must be a UNIX socket). Individual devices
>>>>>>> +#     declare the support for backend-transfer by per-device
>>>>>>> +#     backend-transfer option. (Since 10.2)
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>>
>>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>>> just started to "fd-passing".
>>>>>>
>>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>>
>>>>> Reasonable.
>>>>>
>>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>>> (reasonable too):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>>
>>>>> (still, they were against my "fds" name for the parameter, which is
>>>>> really too generic, fd-passing is not)
>>>>>
>>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>>
>>>>>
>>>>>>
>>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>>
>>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>>
>>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>>> word here.
>>>>>
>>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>>> not change your mind.
>>>>
>>>> Ah, I didn't notice the name has been discussed.
>>>>
>>>> I think it means you can vote for your own preference now because we have
>>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>>> something better than both.
>>>>
>>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>>> I would strongly request for.  The other thing is the unix socket check, it
>>>> looks all good below now with it, thanks.  No strong feelings on the names.
>>>>
>>>
>>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>>> it's more meaningful for the user:
>>>
>>> If we call it "fd-passing", user may ask:
>>>
>>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>>> is it unix socket or not, and pass any fds it wants if it is?
>>>
>>> Logical question is, why not just drop the global capability, and check only
>>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>>> as it may be some complex tunneling, which includes unix sockets, but still
>>> can't pass fds, but I think now about feature naming)
>>>
>>> But we really want an explicit switch for the feature. As qemu-update is
>>> not the only case of local migration. The another case is changing the
>>> backend. So for the user's choice is:
>>>
>>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>>> transfer the backend, we have to initialize new backend on another host.
>>>
>>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>>> extra actions: use "backend-transfer", exactly to keep the backends
>>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>>> as is.
>>>
>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>> to another backend. We disable "backend-transfer" for one device.
>>
>> This implies that you're changing 'backend-transfer' against the
>> device at time of each migration.
>>
>> This takes us back to the situation we've had historically where the
>> behaviour of migration depends on global properties the mgmt app has
>> set prior to the 'migrate' command being run. We've just tried to get
>> away from that model by passing everything as parameters to the
>> migrate command, so I'm loathe to see us invent a new way to have
>> global state properties changing migration behaviour.
>>
>> This 'backend-transfer' device property is not really a device property,
>> it is an indirect parameter to the 'migrate' command.
>>
>> Ergo, if we need the ability to selectively migrate the backend state
>> of individal devices, then instead of a property on the device, we
>> should pass a list of device IDs as a parameter to the migrate
>> command in QMP.
> 
> Understand.
> 
> So, it will look like
> 
> # @backend-transfer: List of devices IDs or QOM paths, to enable
> #     backend-transfer for. In general that means that backend
> #     states and their file descriptors are passed to the destination
> #     in the migration channel (which must be a UNIX socket), and
> #     management tool doesn't have to configure new backends for
> #     target QEMU (like vhost-user server, or TAP device in the kernel).
> #     Default is no backend-transfer migration (Since 10.2)
> 


RFC diff to this series, to switch the API to a list of IDs:


diff --git a/hw/core/machine.c b/hw/core/machine.c
index a3d77f5604..681adbb7ac 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,7 +40,6 @@
  
  GlobalProperty hw_compat_10_1[] = {
      { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
-    { TYPE_VIRTIO_NET, "backend-transfer", "false" },
  };
  const size_t hw_compat_10_1_len = G_N_ELEMENTS(hw_compat_10_1);
  
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5f9711dee7..a895b26e5d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3638,7 +3638,7 @@ static bool virtio_net_is_tap_mig(void *opaque, int version_id)
  
      nc = qemu_get_queue(n->nic);
  
-    return migrate_backend_transfer() && n->backend_transfer && nc->peer &&
+    return migrate_backend_transfer(DEVICE(n)) && nc->peer &&
          nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
  }
  
@@ -4461,7 +4461,6 @@ static const Property virtio_net_properties[] = {
                                 host_features_ex,
                                 VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
                                 false),
-    DEFINE_PROP_BOOL("backend-transfer", VirtIONet, backend_transfer, true),
  };
  
  static void virtio_net_class_init(ObjectClass *klass, const void *data)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index a7bfb10dc7..0f3b7aa55e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1160,4 +1160,7 @@ typedef enum MachineInitPhase {
  bool phase_check(MachineInitPhase phase);
  void phase_advance(MachineInitPhase phase);
  
+bool migrate_backend_transfer(DeviceState *dev);
+bool migrate_backend_transfer_check_list(const strList *list, Error **errp);
+
  #endif
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index bf07f8a4cb..5b8ab7bda7 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -231,7 +231,6 @@ struct VirtIONet {
      struct EBPFRSSContext ebpf_rss;
      uint32_t nr_ebpf_rss_fds;
      char **ebpf_rss_fds;
-    bool backend_transfer;
  };
  
  size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 592b93021e..7f931bed17 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -152,4 +152,6 @@ bool multifd_device_state_save_thread_should_exit(void);
  void multifd_abort_device_state_save_threads(void);
  bool multifd_join_device_state_save_threads(void);
  
+const strList *migrate_backend_transfer_list(void);
+
  #endif
diff --git a/migration/options.c b/migration/options.c
index a461b07b54..1644728ed7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -13,6 +13,7 @@
  
  #include "qemu/osdep.h"
  #include "qemu/error-report.h"
+#include "qapi/util.h"
  #include "exec/target_page.h"
  #include "qapi/clone-visitor.h"
  #include "qapi/error.h"
@@ -24,6 +25,7 @@
  #include "migration/colo.h"
  #include "migration/cpr.h"
  #include "migration/misc.h"
+#include "migration/options.h"
  #include "migration.h"
  #include "migration-stats.h"
  #include "qemu-file.h"
@@ -262,7 +264,7 @@ bool migrate_mapped_ram(void)
      return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
  }
  
-bool migrate_backend_transfer(void)
+const strList *migrate_backend_transfer_list(void)
  {
      MigrationState *s = migrate_get_current();
      return s->parameters.backend_transfer;
@@ -969,8 +971,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
      params->cpr_exec_command = QAPI_CLONE(strList,
                                            s->parameters.cpr_exec_command);
  
-    params->has_backend_transfer = true;
-    params->backend_transfer = s->parameters.backend_transfer;
+    if (s->parameters.backend_transfer) {
+        params->has_backend_transfer = true;
+        params->backend_transfer = QAPI_CLONE(strList,
+                                              s->parameters.backend_transfer);
+    }
  
      return params;
  }
@@ -1193,6 +1198,11 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
          return false;
      }
  
+    if (params->has_backend_transfer &&
+        !migrate_backend_transfer_check_list(params->backend_transfer, errp)) {
+        return false;
+    }
+
      return true;
  }
  
@@ -1459,7 +1469,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
      }
  
      if (params->has_backend_transfer) {
-        s->parameters.backend_transfer = params->backend_transfer;
+        qapi_free_strList(s->parameters.backend_transfer);
+
+        s->parameters.backend_transfer = QAPI_CLONE(strList,
+                                                    params->backend_transfer);
      }
  }
  
diff --git a/migration/options.h b/migration/options.h
index 755ba1c024..82d839709e 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,8 +87,6 @@ const char *migrate_tls_hostname(void);
  uint64_t migrate_xbzrle_cache_size(void);
  ZeroPageDetection migrate_zero_page_detection(void);
  
-bool migrate_backend_transfer(void);
-
  /* parameters helpers */
  
  bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index 35601a1f87..9478c4ddab 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -951,12 +951,11 @@
  #     is @cpr-exec.  The first list element is the program's filename,
  #     the remainder its arguments.  (Since 10.2)
  #
-# @backend-transfer: Enable backend-transfer feature for devices that
-#     supports it. In general that means that backend state and its
-#     file descriptors are passed to the destination in the migraton
-#     channel (which must be a UNIX socket). Individual devices
-#     declare the support for backend-transfer by per-device
-#     backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+#     backend-transfer migration.  When enabled, device backends
+#     including opened fds will be passed to the destination in the
+#     migration channel (which must be a UNIX domain socket).  Default
+#     is no backend-transfer migration. (Since 10.2)
  #
  # Features:
  #
@@ -1145,12 +1144,11 @@
  #     is @cpr-exec.  The first list element is the program's filename,
  #     the remainder its arguments.  (Since 10.2)
  #
-# @backend-transfer: Enable backend-transfer feature for devices that
-#     supports it. In general that means that backend state and its
-#     file descriptors are passed to the destination in the migraton
-#     channel (which must be a UNIX socket). Individual devices
-#     declare the support for backend-transfer by per-device
-#     backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+#     backend-transfer migration.  When enabled, device backends
+#     including opened fds will be passed to the destination in the
+#     migration channel (which must be a UNIX domain socket).  Default
+#     is no backend-transfer migration. (Since 10.2)
  #
  # Features:
  #
@@ -1195,7 +1193,7 @@
              '*zero-page-detection': 'ZeroPageDetection',
              '*direct-io': 'bool',
              '*cpr-exec-command': [ 'str' ],
-            '*backend-transfer': { 'type': 'bool',
+            '*backend-transfer': { 'type': [ 'str' ],
                                     'features': [ 'unstable' ] } } }
  
  ##
@@ -1369,12 +1367,11 @@
  #     is @cpr-exec.  The first list element is the program's filename,
  #     the remainder its arguments.  (Since 10.2)
  #
-# @backend-transfer: Enable backend-transfer feature for devices that
-#     supports it. In general that means that backend state and its
-#     file descriptors are passed to the destination in the migraton
-#     channel (which must be a UNIX socket). Individual devices
-#     declare the support for backend-transfer by per-device
-#     backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+#     backend-transfer migration.  When enabled, device backends
+#     including opened fds will be passed to the destination in the
+#     migration channel (which must be a UNIX domain socket).  Default
+#     is no backend-transfer migration. (Since 10.2)
  #
  # Features:
  #
@@ -1416,7 +1413,7 @@
              '*zero-page-detection': 'ZeroPageDetection',
              '*direct-io': 'bool',
              '*cpr-exec-command': [ 'str' ],
-            '*backend-transfer': { 'type': 'bool',
+            '*backend-transfer': { 'type': [ 'str' ],
                                     'features': [ 'unstable' ] } } }
  
  ##
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 2ac92d0a07..b4a1a88992 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -939,6 +939,32 @@ void qmp_device_del(const char *id, Error **errp)
      }
  }
  
+bool migrate_backend_transfer(DeviceState *dev)
+{
+    const strList *el = migrate_backend_transfer_list();
+
+    for ( ; el; el = el->next) {
+        if (find_device_state(el->value, false, NULL) == dev) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
+bool migrate_backend_transfer_check_list(const strList *list, Error **errp)
+{
+    const strList *el = list;
+
+    for ( ; el; el = el->next) {
+        if (!find_device_state(el->value, false, errp)) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
  int qdev_sync_config(DeviceState *dev, Error **errp)
  {
      DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/tests/functional/test_x86_64_tap_migration.py b/tests/functional/test_x86_64_tap_migration.py
index 1f88ff174c..a324b0f374 100644
--- a/tests/functional/test_x86_64_tap_migration.py
+++ b/tests/functional/test_x86_64_tap_migration.py
@@ -254,17 +254,16 @@ def prepare_and_launch_vm(
          self.log.info(f"Launching {vm_s} VM")
          vm.launch()
  
-        self.set_migration_capabilities(vm, backend_transfer)
-
          if not backend_transfer:
              tap_name = TAP_ID2 if incoming else TAP_ID
          else:
              tap_name = TAP_ID
  
-        self.add_virtio_net(vm, vhost, tap_name, backend_transfer)
+        self.add_virtio_net(vm, vhost, tap_name)
+
+        self.set_migration_capabilities(vm, backend_transfer)
  
-    def add_virtio_net(self, vm, vhost: bool, tap_name: str,
-                       backend_transfer: bool):
+    def add_virtio_net(self, vm, vhost: bool, tap_name: str = "tap0"):
          netdev_params = {
              "id": "netdev.1",
              "vhost": vhost,
@@ -289,17 +288,19 @@ def add_virtio_net(self, vm, vhost: bool, tap_name: str,
              bus="pci.1",
              mac=GUEST_MAC,
              disable_legacy="off",
-            backend_transfer=backend_transfer,
          )
  
      def set_migration_capabilities(self, vm, backend_transfer=True):
-        vm.cmd("migrate-set-capabilities", { "capabilities": [
+        capabilities = [
              {"capability": "events", "state": True},
              {"capability": "x-ignore-shared", "state": True},
-        ]})
-        vm.cmd("migrate-set-parameters", {
-            "backend-transfer": backend_transfer
-        })
+        ]
+        vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
+        if backend_transfer:
+            vm.cmd(
+                "migrate-set-parameters",
+                {"backend-transfer": ["/machine/peripheral/vnet.1/virtio-backend"]},
+            )
  
      def setup_guest_network(self) -> None:
          exec_command_and_wait_for_pattern(self, "ip addr", "# ")



-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 23:26, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
>> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.10.25 23:07, Peter Xu wrote:

[..]

>>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>>> to another backend. We disable "backend-transfer" for one device.
>>>
>>> This implies that you're changing 'backend-transfer' against the
>>> device at time of each migration.
>>>
>>> This takes us back to the situation we've had historically where the
>>> behaviour of migration depends on global properties the mgmt app has
>>> set prior to the 'migrate' command being run. We've just tried to get
>>> away from that model by passing everything as parameters to the
>>> migrate command, so I'm loathe to see us invent a new way to have
>>> global state properties changing migration behaviour.
>>>
>>> This 'backend-transfer' device property is not really a device property,
>>> it is an indirect parameter to the 'migrate' command.
>>>
>>> Ergo, if we need the ability to selectively migrate the backend state
>>> of individal devices, then instead of a property on the device, we
>>> should pass a list of device IDs as a parameter to the migrate
>>> command in QMP.
>>
>> Understand.
>>
>> So, it will look like
>>
>> # @backend-transfer: List of devices IDs or QOM paths, to enable
>> #     backend-transfer for. In general that means that backend
>> #     states and their file descriptors are passed to the destination
>> #     in the migration channel (which must be a UNIX socket), and
>> #     management tool doesn't have to configure new backends for
>> #     target QEMU (like vhost-user server, or TAP device in the kernel).
>> #     Default is no backend-transfer migration (Since 10.2)
>>
> 
> 
> RFC diff to these series, to switch the API to list of IDs:
> 

[..]

> @@ -1193,6 +1198,11 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>           return false;
>       }
> 
> +    if (params->has_backend_transfer &&
> +        !migrate_backend_transfer_check_list(params->backend_transfer, errp)) {
> +        return false;
> +    }

This made me move the capabilities setup after the device add in the test. Not a problem.
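
To illustrate why the order matters: the check above rejects the whole list if any entry fails to resolve to an existing device, so the parameter can only be set once the device has been added. A toy model of that behaviour (ToyVM and its methods are hypothetical stand-ins, not QEMU or QMP code):

```python
class ToyVM:
    """Toy stand-in for a QEMU instance; not the real QMP API."""

    def __init__(self):
        self.devices = set()          # paths of devices added so far
        self.backend_transfer = []    # last accepted parameter value

    def device_add(self, path):
        self.devices.add(path)

    def set_backend_transfer(self, paths):
        # Same idea as migrate_backend_transfer_check_list(): every entry
        # must resolve to a known device, otherwise the call fails.
        for path in paths:
            if path not in self.devices:
                raise LookupError(f"device not found: {path}")
        self.backend_transfer = list(paths)


PATH = "/machine/peripheral/vnet.1/virtio-backend"

vm = ToyVM()
try:
    vm.set_backend_transfer([PATH])   # too early: device not added yet
    early_ok = True
except LookupError:
    early_ok = False

vm.device_add(PATH)
vm.set_backend_transfer([PATH])       # accepted once the device exists
```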

> +
>       return true;
>   }
> 

[..]

> -        vm.cmd("migrate-set-parameters", {
> -            "backend-transfer": backend_transfer
> -        })
> +        ]
> +        vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
> +        if backend_transfer:
> +            vm.cmd(
> +                "migrate-set-parameters",
> +                {"backend-transfer": ["/machine/peripheral/vnet.1/virtio-backend"]},

If I write just "vnet.1" it doesn't work, of course. Is there some way to get a pointer to
the proxy device from virtio-net.c? But maybe it's OK as is.
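
For reference, a minimal sketch of what the QMP command would look like on the wire with the list-based parameter from this RFC. The helper name backend_transfer_command is made up for illustration; the parameter itself is marked unstable in the RFC and may still change:

```python
import json


def backend_transfer_command(qom_paths):
    """Build a QMP migrate-set-parameters command for the RFC's list API."""
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"backend-transfer": list(qom_paths)},
    }


# The full QOM path of the virtio-backend child is used, as in the test;
# the plain device ID "vnet.1" does not work with this RFC.
cmd = backend_transfer_command(["/machine/peripheral/vnet.1/virtio-backend"])
wire = json.dumps(cmd)
```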

> +            )
> 
>       def setup_guest_network(self) -> None:
>           exec_command_and_wait_for_pattern(self, "ip addr", "# ")
> 
> 
> 


-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 15.10.25 23:07, Peter Xu wrote:
> > > > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > On 15.10.25 21:19, Peter Xu wrote:
> > > > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > This parameter enables backend-transfer feature: all devices
> > > > > > > which support it will migrate their backends (for example a TAP
> > > > > > > device, by passing open file descriptor to migration channel).
> > > > > > > 
> > > > > > > Currently no such devices, so the new parameter is a noop.
> > > > > > > 
> > > > > > > Next commit will add support for virtio-net, to migrate its
> > > > > > > TAP backend.
> > > > > > > 
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > > > ---
> > > > > 
> > > > > [..]
> > > > > 
> > > > > > > --- a/qapi/migration.json
> > > > > > > +++ b/qapi/migration.json
> > > > > > > @@ -951,9 +951,16 @@
> > > > > > >     #     is @cpr-exec.  The first list element is the program's filename,
> > > > > > >     #     the remainder its arguments.  (Since 10.2)
> > > > > > >     #
> > > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > > +#     supports it. In general that means that backend state and its
> > > > > > > +#     file descriptors are passed to the destination in the migraton
> > > > > > > +#     channel (which must be a UNIX socket). Individual devices
> > > > > > > +#     declare the support for backend-transfer by per-device
> > > > > > > +#     backend-transfer option. (Since 10.2)
> > > > > > 
> > > > > > Thanks.
> > > > > > 
> > > > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > > > TAP doesn't have its own VMSD to transfer?
> > > > > > 
> > > > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > > > suitable there, because it used to "transfer backend" already, now it's
> > > > > > just started to "fd-passing".
> > > > > > 
> > > > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > > > 
> > > > > Reasonable.
> > > > > 
> > > > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > > > (reasonable too):
> > > > > 
> > > > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > > > 
> > > > > (still, they were against my "fds" name for the parameter, which is
> > > > > really too generic, fd-passing is not)
> > > > > 
> > > > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > > > > 
> > > > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > > > 
> > > > > 
> > > > > > 
> > > > > > In general, I think "fd" is really a core concept of this whole thing.
> > > > > 
> > > > > I think, we can call "backend" any external object, linked by the fd.
> > > > > 
> > > > > Still, backend/frontend terminology is so misleading, when applied to
> > > > > complex systems (for me, at least), that I don't really like "-backend"
> > > > > word here.
> > > > > 
> > > > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > > > not change your mind.
> > > > 
> > > > Ah, I didn't notice the name has been discussed.
> > > > 
> > > > I think it means you can vote for your own preference now because we have
> > > > one vote for each. :) Let's also see whether Fabiano will come up with
> > > > something better than both.
> > > > 
> > > > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > > > I would strongly request for.  The other thing is the unix socket check, it
> > > > looks all good below now with it, thanks.  No strong feelings on the names.
> > > > 
> > > 
> > > After a bit more thinking, I leaning towards keeping backend-transfer. I think
> > > it's more meaningful for the user:
> > > 
> > > If we call it "fd-passing", user may ask:
> > > 
> > > Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> > > supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> > > is it unix socket or not, and pass any fds it wants if it is?
> > > 
> > > Logical question is, why not just drop the global capability, and check only
> > > is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> > > as it may be some complex tunneling, which includes unix sockets, but still
> > > can't pass fds, but I think now about feature naming)
> > > 
> > > But we really want an explicit switch for the feature. As qemu-update is
> > > not the only case of local migration. The another case is changing the
> > > backend. So for the user's choice is:
> > > 
> > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > transfer the backend, we have to initialize new backend on another host.
> > > 
> > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > extra actions: use "backend-transfer", exactly to keep the backends
> > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > as is.
> > > 
> > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > to another backend. We disable "backend-transfer" for one device.
> > 
> > This implies that you're changing 'backend-transfer' against the
> > device at time of each migration.
> > 
> > This takes us back to the situation we've had historically where the
> > behaviour of migration depends on global properties the mgmt app has
> > set prior to the 'migrate' command being run. We've just tried to get
> > away from that model by passing everything as parameters to the
> > migrate command, so I'm loathe to see us invent a new way to have
> > global state properties changing migration behaviour.
> > 
> > This 'backend-transfer' device property is not really a device property,
> > it is an indirect parameter to the 'migrate' command.

I was not seeing it like that.

I was treating the per-device parameter as a flag showing whether the device
is capable of passing over FDs, which is more like a device attribute.

Those things (after set by machine type) should never change, and the only
thing to be changed is the global "backend-transfer" boolean that can be
set in the "migrate" QMP command, and should be decided by the admin when
one wants to initiate the migration process.

> > 
> > Ergo, if we need the ability to selectively migrate the backend state
> > of individal devices, then instead of a property on the device, we
> > should pass a list of device IDs as a parameter to the migrate
> > command in QMP.

I doubt whether we would really need that in reality.

Likely the admin should only worry about whether to set the global
"backend-transfer"; the admin may not even need to know which devices, and
how many devices, will benefit from this feature being enabled.

It just says, "we're doing local migration and via unix sockets, so
whatever devices can try to reuse their backends if possible".

Thanks,

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > transfer the backend, we have to initialize new backend on another host.
> > > > 
> > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > as is.
> > > > 
> > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > to another backend. We disable "backend-transfer" for one device.
> > > 
> > > This implies that you're changing 'backend-transfer' against the
> > > device at time of each migration.
> > > 
> > > This takes us back to the situation we've had historically where the
> > > behaviour of migration depends on global properties the mgmt app has
> > > set prior to the 'migrate' command being run. We've just tried to get
> > > away from that model by passing everything as parameters to the
> > > migrate command, so I'm loathe to see us invent a new way to have
> > > global state properties changing migration behaviour.
> > > 
> > > This 'backend-transfer' device property is not really a device property,
> > > it is an indirect parameter to the 'migrate' command.
> 
> I was not seeing it like that.
> 
> I was treating per-device parameter to be a flag showing whether the device
> is capable of passing over FDs, which is more like a device attribute.
> 
> Those things (after set by machine type) should never change, and the only
> thing to be changed is the global "backend-transfer" boolean that can be
> set in the "migrate" QMP command, and should be decided by the admin when
> one wants to initiate the migration process.
> 
> > > 
> > > Ergo, if we need the ability to selectively migrate the backend state
> > > of individal devices, then instead of a property on the device, we
> > > should pass a list of device IDs as a parameter to the migrate
> > > command in QMP.
> 
> I doubt whether we would really need that in reality.
> 
> Likely the admin should only worry about whether setting the global
> "backend-transfer", the admin may not even need to know which device, and
> how many devices, will be beneficial to this feature enabled.
> 
> It just says, "we're doing local migration and via unix sockets, so
> whatever devices can try to reuse their backends if possible".

An individual device can only use backend transfer if both the old and
new QEMU agree that it can be done. At the time we start the origin
QEMU we know which set of devices are capable of doing an outgoing
backend transfer, but we don't know what set of devices are capable
of doing an incoming backend transfer.

If we don't have a per-device toggle at time of migration, then we
have to assume that the target QEMU can always support at least the
same set of incoming backends as the src QEMU outgoing backend. This
feels like a potentially risky assumption.

Another scenario is where you are doing a localhost migration as a
mechanism to let you change a device backend. In that case you'll
want to do a backend transfer of all devices, except the one that
you want to change.
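
As a rough sketch of what the mgmt app would compute under that model (the function name and the device IDs are made up for illustration): transfer only the backends that both sides can handle, minus anything being deliberately changed.

```python
def select_backend_transfer(src_capable, dst_capable, exclude=()):
    """Devices to list for backend transfer in one migration.

    src_capable: devices the source QEMU can transfer out
    dst_capable: devices the target QEMU can accept
    exclude:     devices whose backend is being replaced on purpose
    """
    dst = set(dst_capable)
    skipped = set(exclude)
    return [dev for dev in src_capable if dev in dst and dev not in skipped]


# Hypothetical IDs: the target cannot accept disk0, and net1 is being changed.
chosen = select_backend_transfer(
    src_capable=["net0", "net1", "disk0"],
    dst_capable=["net0", "net1"],
    exclude=["net1"],
)
```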

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > 
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > as is.
> > > > > 
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > > 
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > > 
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > > 
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> > 
> > I was not seeing it like that.
> > 
> > I was treating per-device parameter to be a flag showing whether the device
> > is capable of passing over FDs, which is more like a device attribute.
> > 
> > Those things (after set by machine type) should never change, and the only
> > thing to be changed is the global "backend-transfer" boolean that can be
> > set in the "migrate" QMP command, and should be decided by the admin when
> > one wants to initiate the migration process.
> > 
> > > > 
> > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > of individal devices, then instead of a property on the device, we
> > > > should pass a list of device IDs as a parameter to the migrate
> > > > command in QMP.
> > 
> > I doubt whether we would really need that in reality.
> > 
> > Likely the admin should only worry about whether setting the global
> > "backend-transfer", the admin may not even need to know which device, and
> > how many devices, will be beneficial to this feature enabled.
> > 
> > It just says, "we're doing local migration and via unix sockets, so
> > whatever devices can try to reuse their backends if possible".
> 
> An individual device can only use backend transfer if both the old and
> new QEMU agree that it can be done. At the time we start the origin
> QEMU we know which set of devices are capable of doing an outgoing
> backend transfer, but we don't know what set of devices are capable
> of doing an incoming backend transfer.
> 
> If we don't have a per-device toggle at time of migration, then we
> have to assume that the target QEMU can always support at least the
> same set of incoming backends as the src QEMU outgoing backend. This
> feels like a potentially risky assumption.

When using machine properties, these things should already be set by the
machine types.

E.g. if this is a new QEMU with an old machine type, we should have this
per-device property set to OFF forever when booting the VM, and should keep
it like that after any rounds of migrations.  Because any VM using the old
machine type _might_ be migrated back to an older QEMU that won't support
it.  So IIUC that strictly follows how we use versioned machine types.

What Vladimir mentioned previously would be something very special, but
indeed when there's no machine type versioning we may need to toggle this
before each migration.  However since upstream is following the machine
type properties way of doing this since N years ago, do we need to worry
about that?

> 
> Another scenario is where you are doing a localhost migration as a
> mechanism to let you change a device backend. In that case you'll
> want to do a backend transfer of all devices, except the one that
> you want to change.

Right, this might be a real need if it exists.  That said, it's so special
that I'm not sure whether the admin can easily migrate with the global
backend-transfer set to OFF in this rare case.

In general, I would prefer to avoid introducing any form of device list
into the migration system if at all possible.  I agree that if we must
introduce one, it should at least be a list of IDs rather than an ad hoc
array of strings.  However, I still want to see whether we can completely
avoid it.

Thanks,

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 03:29:27PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > > 
> > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > as is.
> > > > > > 
> > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > > 
> > > > > This implies that you're changing 'backend-transfer' against the
> > > > > device at time of each migration.
> > > > > 
> > > > > This takes us back to the situation we've had historically where the
> > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > away from that model by passing everything as parameters to the
> > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > global state properties changing migration behaviour.
> > > > > 
> > > > > This 'backend-transfer' device property is not really a device property,
> > > > > it is an indirect parameter to the 'migrate' command.
> > > 
> > > I was not seeing it like that.
> > > 
> > > I was treating per-device parameter to be a flag showing whether the device
> > > is capable of passing over FDs, which is more like a device attribute.
> > > 
> > > Those things (after set by machine type) should never change, and the only
> > > thing to be changed is the global "backend-transfer" boolean that can be
> > > set in the "migrate" QMP command, and should be decided by the admin when
> > > one wants to initiate the migration process.
> > > 
> > > > > 
> > > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > > of individal devices, then instead of a property on the device, we
> > > > > should pass a list of device IDs as a parameter to the migrate
> > > > > command in QMP.
> > > 
> > > I doubt whether we would really need that in reality.
> > > 
> > > Likely the admin should only worry about whether setting the global
> > > "backend-transfer", the admin may not even need to know which device, and
> > > how many devices, will be beneficial to this feature enabled.
> > > 
> > > It just says, "we're doing local migration and via unix sockets, so
> > > whatever devices can try to reuse their backends if possible".
> > 
> > An individual device can only use backend transfer if both the old and
> > new QEMU agree that it can be done. At the time we start the origin
> > QEMU we know which set of devices are capable of doing an outgoing
> > backend transfer, but we don't know what set of devices are capable
> > of doing an incoming backend transfer.
> > 
> > If we don't have a per-device toggle at time of migration, then we
> > have to assume that the target QEMU can always support at least the
> > same set of incoming backends as the src QEMU outgoing backend. This
> > feels like a potentially risky assumption.
> 
> When using machine properties, these things should already be set by the
> machine types.

Errm, machine types apply to devices, but this is about transferring
backends which are outside the scope of machine types. 

> E.g. if this is a new QEMU with an old machine type, we should have this
> per-device property set to OFF forever when booting the VM, and should keep
> it like that after any rounds of migrations.  Because any VM using the old
> machine type _might_ be migrated back to an older QEMU that won't support
> it.  So IIUC that strictly follows how we use versioned machine types.

That makes no conceptual sense. Whether or not a particular backend
can be transferred is determined by the choice of backend and its
configuration. A "backend-transfer" property against the device
frontend cannot be set from the machine type definition, as the
machine type has no knowledge of what backend configuration will
be used.

> In general, I would prefer avoiding to introduce any form of list of
> devices into the migration system if ever possible.  I agree if we must
> introduce that it should at least be a list of IDs rather than adhoc array
> of strings.  However I still want to see whether we can completely avoid
> it.

Yes, anything in the migrate API would have to directly correspond
to an ID of a device frontend or backend.

With regards,
Daniel

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> Errm, machine types apply to devices, but this is about transferring
> backends which are outside the scope of machine types. 

Ah.. I didn't notice that net backends are not inherited by default from
qdev, hence not applicable to machine type properties.

Is it possible we enable it somehow, so that backends can have compat
properties similarly to frontends?

If we go with a list of devices in the migration parameters, to me it'll
only be a way to work around the lack of such a capability in net backends.
Meanwhile, the admin will need to manage the list of devices even if the
admin doesn't really need to, IMHO.

Thanks,

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 04:28:10PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > Errm, machine types apply to devices, but this is about transferring
> > backends which are outside the scope of machine types. 
> 
> Ah.. I didn't notice that net backends are not inherited by default from
> qdev, hence not applicable to machine type properties.
> 
> Is it possible we enable it somehow, so that backends can have compat
> properties similarly to frontends?

That is a technical limitation, but the problem here is bigger than
just the lack of qdev. It is a conceptual one - where a device is
implemented, its behaviour is determined exclusively by the QEMU
code. There are some rare exceptions, like host PCI device assignment
where functionality is partly in the host hardware, or external
device backends where impl is offloaded to an external process, but
most pure QEMU impls are able to be made always migratable and compat
can be easily ensured long term via machine types props.

With backends, a lot of behaviour is offloaded to either the host
OS, or to external libraries or services. Certain narrow configs
may be able to transfer state, but there will always be configs
where state transfer is impossible. There can be no coarse rule
that a backend is migratable or not - it will usually be highly
dependent on the particular configuration choices of the backend
in use.  Machine types props can't magically make all backend
config scenarios migratable. We need to be able to interrogate
backends at the time migration is required.

> If we go with a list of devices in the migration parameters, to me it'll
> only be a way to workaround the missing of such capability of net backends.
> Meanwhile, the admin will need to manage the list of devices even if the
> admin doesn't really needed to, IMHO.

We shouldn't need to list devices in every scenario. We need to focus on
the internal API design. We need to have suitable APIs exposed by backends
to allow us to query migratability and process vmstate; a mere property
'backend-transfer' is insufficient, whether set by QEMU code or set by
the mgmt app.

If we have proper APIs, each device should be able to query whether its
backend can be transferred, and so "do the right thing" if backend
transfer is requested by migration. The ability to list devices in the
migrate command is only needed to be able to exclude some backends if
the purpose of migration is to change a backend.
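
A toy model of that shape, purely to make the idea concrete (ToyBackend, can_transfer and plan_transfer are hypothetical names, not existing QEMU interfaces): the backend answers the query at migration time, and the list in the migrate command only ever removes candidates.

```python
class ToyBackend:
    """Toy backend that can be asked whether its state is transferable."""

    def __init__(self, name, transferable):
        self.name = name
        self._transferable = transferable

    def can_transfer(self):
        # In a real backend this would depend on its configuration.
        return self._transferable


def plan_transfer(backends, requested, exclude=()):
    """Map backend name -> whether it is transferred in this migration."""
    skipped = set(exclude)
    return {
        b.name: requested and b.name not in skipped and b.can_transfer()
        for b in backends
    }


backends = [
    ToyBackend("tap0", True),
    ToyBackend("user0", False),   # config that cannot transfer state
    ToyBackend("tap1", True),     # being replaced, so excluded below
]
plan = plan_transfer(backends, requested=True, exclude=["tap1"])
```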

With regards,
Daniel

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Fri, Oct 17, 2025 at 09:10:38AM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 04:28:10PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > > Errm, machine types apply to devices, but this is about transferring
> > > backends which are outside the scope of machine types. 
> > 
> > Ah.. I didn't notice that net backends are not inherited by default from
> > qdev, hence not applicable to machine type properties.
> > 
> > Is it possible we enable it somehow, so that backends can have compat
> > properties similarly to frontends?
> 
> That is a technical limitation, but the problem here is bigger than
> just the lack of qdev. It is a conceptual one - where a device is
> implemented, its behaviour is determined exclusively by the QEMU
> code. There are some rare exceptions, like host PCI device assignment
> where functionality is partly in the host hardware, or external
> device backends where impl is offloaded to an external process, but
> most pure QEMU impls are able to be made always migratable and compat
> can be easily ensured long term via machine types props.
> 
> With backends, alot of behaviour is offloaded to either the host
> OS, or to external libraries or services. Certain narrow configs
> may be able to transfer state, but there will always be configs
> were state transfer is impossible. There can be no coarse rule
> that a backend is migratable or not - it will usually be highly
> dependent on the particular configuration choices of the backend
> in use.  Machine types props can't magically make all backend
> config scenarios migratable. We need to be able to interrogate
> backends at the time migration is required.

I believe we have similar things already, like USO, which relies on the
kernel feature set that QEMU runs on.  What we do right now, afaiu, is we
make it a per-device property ON/OFF.  Then when unknown remote information
is required, we make it ON/OFF/AUTO.  When it's AUTO, it may prefer ON and
probe the kernel, dynamically decide the value on realize.

I didn't check the code if it's explicitly done like that, but I think
that's doable at least when a backend relies on such remote information.
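
As a rough illustration of the ON/OFF/AUTO pattern described above, here
is a small sketch. The enum and function names are made up for this
example (QEMU has its own on/off/auto type, which is not used here), and
host_supports stands in for whatever probe of the kernel would be done
at realize time.

```c
#include <stdbool.h>

/* Hypothetical tri-state, for illustration only. */
typedef enum {
    FEATURE_OFF,
    FEATURE_ON,
    FEATURE_AUTO,
} FeatureSetting;

/*
 * Resolve the requested setting against what the host can do.
 * Returns 0 and stores the effective value in *enabled, or -1 when
 * ON was requested explicitly but the host lacks support.
 */
static int resolve_feature(FeatureSetting requested, bool host_supports,
                           bool *enabled)
{
    switch (requested) {
    case FEATURE_OFF:
        *enabled = false;
        return 0;
    case FEATURE_ON:
        if (!host_supports) {
            return -1;      /* fail instead of silently degrading */
        }
        *enabled = true;
        return 0;
    case FEATURE_AUTO:
        *enabled = host_supports;   /* prefer ON, fall back to OFF */
        return 0;
    }
    return -1;
}
```

With AUTO the effective value depends on the host, which is exactly why
two QEMUs with identical command lines may end up with different results.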

> 
> > If we go with a list of devices in the migration parameters, to me it'll
> > only be a way to workaround the missing of such capability of net backends.
> > Meanwhile, the admin will need to manage the list of devices even if the
> > admin doesn't really needed to, IMHO.
> 
> We shouldn't need to list devices in every scenario. We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
> 
> If we have proper APIs each device should be able to query whether its
> backend can be transferred, and so "do the right thing" if backend
> transfer is requested by migration. The ability to list devices in the
> migrate command is only needed to be able to exclude some backends if
> the purpose of migration is to change a backend

IIUC, it is a proposal to use an exclude-list, which should in most cases be
empty.

Yes, I agree it's at least better than querying all the devices and having
mgmt specify each backend to enable backend-transfer.

However IIUC it also means the query API will be internal, so that
migration will need to be able to query that from the device.

Then we have a similar issue with what happens if we migrate from a new QEMU
to an old QEMU: the new QEMU (when the migration module queries TAP) reports
per-device ON, however it won't actually work because the dest QEMU is OFF.
IOW, we're still missing the functionality that we leverage from machine
type properties..

Or if we make the query visible to QMP / mgmt, then it'll at least
need to be an include-list, not an exclude-list.

Then, we're literally bypassing the machine type versioning mechanism,
offloading all these to mgmt.

It should work, I agree. But it also means we're reinventing the
wheel of what machine type properties were designed for... because if we
expose all these caps on all devices (as long as they're mutable after device
realize), we do not need machine type properties anymore.  They're
fundamentally solving the same problem, IMHO: providing a working value
for migration no matter what the dest QEMU binary is.

Thanks,

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 17.10.25 11:10, Daniel P. Berrangé wrote:
>> If we go with a list of devices in the migration parameters, to me it'll
>> only be a way to workaround the missing of such capability of net backends.
>> Meanwhile, the admin will need to manage the list of devices even if the
>> admin doesn't really needed to, IMHO.
> We shouldn't need to list devices in every scenario. We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.

I now imagine the following:

I already need an additional .pre_incoming migration handler for the feature,
see patch

  [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler

I can add a boolean backend_transfer parameter to that handler, so that it
informs the device that it should get the backend state from the migration
stream. That's also a good point to fail if the device doesn't support
backend transfer in its current configuration.

If so, it seems logical to add a symmetrical .pre_outgoing() vmsd handler
with the same backend_transfer parameter, to inform source devices (or get
errors from them).

Alternatively, add a separate VMSD handler .supports_backend_transfer(),
called at the start of incoming and outgoing migrations to check the
specified list of IDs; we could also call it on migrate-set-parameters
to get an earlier failure. Devices would still call some
migrate_backend_transfer(dev) to find out whether they should do
backend-transfer or not (as in the diff which I sent yesterday
in this thread).
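
A small sketch of the first variant (paired pre_incoming / pre_outgoing
hooks taking a backend_transfer flag). The struct, the hook signatures and
the error reporting via a const char pointer are all assumptions made for
illustration; they do not reflect the actual handler from patch 14/19.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state, for illustration only. */
typedef struct DeviceSketch {
    const char *id;
    bool backend_transferable;  /* result of the device's own backend check */
    bool load_backend_state;    /* set on incoming, consumed by load code */
    bool save_backend_state;    /* set on outgoing, consumed by save code */
} DeviceSketch;

/* Shared check: fail early if transfer is requested but unsupported. */
static int sketch_check(const DeviceSketch *dev, bool backend_transfer,
                        const char **err)
{
    if (backend_transfer && !dev->backend_transferable) {
        if (err) {
            *err = "backend transfer not supported in this configuration";
        }
        return -1;
    }
    return 0;
}

/* Destination side: told whether backend state will be in the stream. */
static int sketch_pre_incoming(DeviceSketch *dev, bool backend_transfer,
                               const char **err)
{
    if (sketch_check(dev, backend_transfer, err) < 0) {
        return -1;
    }
    dev->load_backend_state = backend_transfer;
    return 0;
}

/* Source side: symmetrical hook with the same parameter. */
static int sketch_pre_outgoing(DeviceSketch *dev, bool backend_transfer,
                               const char **err)
{
    if (sketch_check(dev, backend_transfer, err) < 0) {
        return -1;
    }
    dev->save_backend_state = backend_transfer;
    return 0;
}
```

Both hooks fail before any state is sent or loaded, which is the property
the paragraph above is after.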

-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 17.10.25 11:10, Daniel P. Berrangé wrote:
>> Meanwhile, the admin will need to manage the list of devices even if the
>> admin doesn't really needed to, IMHO.
> We shouldn't need to list devices in every scenario. 

Do you mean we could make it a union,

    backend-transfer = true | false | [list of IDs]

where true means: enable backend-transfer for all supporting devices?
So that normally we wouldn't list all devices, but just set it to true?

But this way, migration will fail if the target version doesn't support
backend-transfer for some of the devices in use, or supports it for some
other device where the source lacks support. So that's a way to create a
situation where two QEMUs with the same device options, same machine
types, same configurations and same migration parameters / capabilities
define incompatible migration states..
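
To make the concern concrete, here is a sketch of such a union and of how
"true" resolves per device. The type and function names are hypothetical,
and device_supports stands for whatever the running QEMU version reports
for that device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of: true | false | [list of IDs] */
typedef enum { BT_FALSE, BT_TRUE, BT_LIST } BackendTransferKind;

typedef struct {
    BackendTransferKind kind;
    const char *const *ids;     /* only used for BT_LIST */
    size_t n_ids;
} BackendTransferSetting;

/*
 * Decide whether a device takes part in backend transfer.
 * With BT_TRUE the answer depends on this QEMU's own support; with
 * BT_LIST it depends only on the list (unsupported IDs are assumed
 * to be rejected elsewhere).
 */
static bool transfer_enabled_for(const BackendTransferSetting *s,
                                 const char *id, bool device_supports)
{
    switch (s->kind) {
    case BT_FALSE:
        return false;
    case BT_TRUE:
        return device_supports;
    case BT_LIST:
        for (size_t i = 0; i < s->n_ids; i++) {
            if (strcmp(s->ids[i], id) == 0) {
                return true;
            }
        }
        return false;
    }
    return false;
}
```

With BT_TRUE, a source where the device lacks support and a target where
it has support compute different answers from the same parameter value;
with BT_LIST both sides agree by construction.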

> We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
> 
> If we have proper APIs each device should be able to query whether its
> backend can be transferred, and so "do the right thing" if backend
> transfer is requested by migration. The ability to list devices in the
> migrate command is only needed to be able to exclude some backends if
> the purpose of migration is to change a backend

-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Fri, Oct 17, 2025 at 11:26:59AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 17.10.25 11:10, Daniel P. Berrangé wrote:
> > > Meanwhile, the admin will need to manage the list of devices even if the
> > > admin doesn't really needed to, IMHO.
> > We shouldn't need to list devices in every scenario.
> 
> Do you mean, we may make union,
> 
>    backend-transfer = true | false | [list of IDs]
> 
> Where true means, enable backend-transfer for all supporting devices?
> So that normally, we'll not list all devices, but just set it to true?

Well I was thinking separate parameters

   backend-transfer: bool
   backend-transfer-devices: [str]   (optional list of IDs)

but it amounts to the same thing

> But this way, migration will fail, if target version doesn't support
> backend-transfer for some of used devices, or support for some
> another, where source lack the support. So that's a way to create a
> situation, where two QEMUs, with same device options, same machine
> types, same configurations and same migration parameters / capabilities
> define incompatible migration states..

It is worse - the backend on both sides may support transfer,
but may nonetheless be incompatible due to changed configuration,
so this needs mgmt app input too.

The challenge we have is that whether or not a backend supports
transfer requires fairly detailed knowledge of QEMU and the specific
configuration of the backend. It is pretty undesirable for mgmt
apps to have to hold that knowledge, as the matrix of possibilities
is quite large and liable to change over time.

If we consider 'backend transfer' to be a performance optimization,
then really we want QEMU to "do the right thing" as much as is
possible.

Source and dst QEMUs don't have a bi-directional channel though,
so they can't negotiate the common subset of backends they both
support - it'll need help from the mgmt app.

One possibility is a new QMP command "query-migratable-backends"
which lists all device IDs, whose current backend configuration
is reporting the ability to transfer state. The mgmt app could
run that on both sides of the migration, take the intersection
of the two lists, and then further subtract any devices where
it has deliberately changed the backend configuration on the dst.

If we had that, then we could always pass the ID list to the
migrate command, while also avoiding hardcoding knowledge of
QEMU backend impl details - it would largely "just work".
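
The mgmt-app side of that flow can be sketched as follows. The function
names are hypothetical; the two input lists stand for the results of
running the proposed "query-migratable-backends" command on source and
destination, and the third for the devices whose backend the mgmt app
has deliberately changed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Linear search; ID lists are expected to be short. */
static bool id_list_contains(const char *const *ids, size_t n,
                             const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(ids[i], id) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Compute (src INTERSECT dst) MINUS changed.
 * 'out' must have room for n_src entries; returns the number written.
 * The order of 'src' is preserved.
 */
static size_t pick_transfer_ids(const char *const *src, size_t n_src,
                                const char *const *dst, size_t n_dst,
                                const char *const *changed, size_t n_changed,
                                const char **out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_src; i++) {
        if (id_list_contains(dst, n_dst, src[i]) &&
            !id_list_contains(changed, n_changed, src[i])) {
            out[n_out++] = src[i];
        }
    }
    return n_out;
}
```

The mgmt app never needs to interpret the IDs; it only does set
arithmetic on opaque strings and passes the result to the migrate command.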

> > We need to focus on
> > the internal API design. We need to have suitable APIs exposed by backends
> > to allow us to query migratability and process vmstate a mere property
> > 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> > the mgmt app.
> > 
> > If we have proper APIs each device should be able to query whether its
> > backend can be transferred, and so "do the right thing" if backend
> > transfer is requested by migration. The ability to list devices in the
> > migrate command is only needed to be able to exclude some backends if
> > the purpose of migration is to change a backend

With regards,
Daniel

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 17.10.25 11:50, Daniel P. Berrangé wrote:
> On Fri, Oct 17, 2025 at 11:26:59AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 17.10.25 11:10, Daniel P. Berrangé wrote:
>>>> Meanwhile, the admin will need to manage the list of devices even if the
>>>> admin doesn't really needed to, IMHO.
>>> We shouldn't need to list devices in every scenario.
>>
>> Do you mean, we may make union,
>>
>>     backend-transfer = true | false | [list of IDs]
>>
>> Where true means, enable backend-transfer for all supporting devices?
>> So that normally, we'll not list all devices, but just set it to true?
> 
> Well I was thinking separate parameters
> 
>     backend-transfer: bool
>     backend-transfer-devices: [str]   (optional list of IDs)
> 
> but it amounts to the same thing
> 
>> But this way, migration will fail, if target version doesn't support
>> backend-transfer for some of used devices, or support for some
>> another, where source lack the support. So that's a way to create a
>> situation, where two QEMUs, with same device options, same machine
>> types, same configurations and same migration parameters / capabilities
>> define incompatible migration states..
> 
> It is worse - the backend on both sides may support transfer,
> but may none the less be incompatible due to changed configuration,
> so this needs mgmt app input too.
> 
> The challenge we have is that whether or not a backend supports
> transfer requires fairly detailed know of QEMU and the specific
> configuration of the backend. It is pretty undesirable for mgmt
> apps to have to that knowledge, as the matrix of possibilities
> is quite large and liable to change over time.
> 
> If we consider 'backend transfer' to be a performance optimization,
> then really we want QEMU to "do the right thing" as much as is
> possible.
> 
> Source and dst QEMUs don't have a bi-directional channel though,
> so they can't negotiate the common subset of backends they both
> support - it'll need help from the mgmt app.

As I heard from Peter, there are future plans to create such a channel:
https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake

> 
> One possibility is a new QMP command "query-migratable-backends"
> which lists all device IDs, whose current backend configuration
> is reporting the ability to transfer state. The mgmt app could
> run that on both sides of the migration, take the intersection
> of the two lists, and then further subtract any devices where
> it has delibrately changed the backend configuration on the dst.
> 
> If we had that, then we could always pass the ID list to the
> migrate command, while also avoiding hardcoding knowledge of
> QEMU backend impl details - it would largely "just work".


Yes, "query + get intersection + set the list" works well for me.
That's abstract enough; the management app doesn't even need to care
what these IDs are.

And if the migration handshake is realized, that (like many other
parameters) may be simplified. We may finally have

    backend-transfer = "off" | "auto" | [list of IDs]

where "auto" means exactly: negotiate with the target the maximal set
of devices for which we can do backend-transfer.

> 
>>> We need to focus on
>>> the internal API design. We need to have suitable APIs exposed by backends
>>> to allow us to query migratability and process vmstate a mere property
>>> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
>>> the mgmt app.
>>>
>>> If we have proper APIs each device should be able to query whether its
>>> backend can be transferred, and so "do the right thing" if backend
>>> transfer is requested by migration. The ability to list devices in the
>>> migrate command is only needed to be able to exclude some backends if
>>> the purpose of migration is to change a backend
> 
> With regards,
> Daniel


-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 23:28, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
>> Errm, machine types apply to devices, but this is about transferring
>> backends which are outside the scope of machine types.
> 
> Ah.. I didn't notice that net backends are not inherited by default from
> qdev, hence not applicable to machine type properties.
> 
> Is it possible we enable it somehow, so that backends can have compat
> properties similarly to frontends?

But that would mean that we can't reconfigure a backend during live migration.

In my understanding, machine type properties are visible to the guest,
and that's why we can't change them for a running vm, even during live
migration.

Bringing in another type of property, which we _can_ change for a
running vm (even if changing it is not very comfortable for the admin),
would be like tying our own hands.

And yes, there is a way to change any property with qom-set. But it
lies outside the paradigm of machine types, and normally we can't change
most properties in flight.

Or in other words: if we _can_ get by with only migration parameters,
that actually shows that what we are talking about is definitely a
property of migration, not a property of the device.

And a final note: if we can use one mechanism instead of two,
it makes the architecture twice as simple. Trying to get by with _only_
device properties would mean running a bunch of qom-set commands before
every migration (as we have to distinguish local and remote migrations
anyway), which looks bad. On the other hand, getting by with _only_ a
migration parameter is feasible and looks better.

And a very final note: with a global parameter + per-device parameters,
the global parameter actually becomes a workaround for the fact that we
don't want to run a bunch of qom-set commands. So the global parameter is
an additional API to hide the inconvenience of the main API.

> 
> If we go with a list of devices in the migration parameters, to me it'll
> only be a way to workaround the missing of such capability of net backends.
> Meanwhile, the admin will need to manage the list of devices even if the
> admin doesn't really needed to, IMHO.
> 
> Thanks,
> 

-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Fri, Oct 17, 2025 at 09:51:26AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 23:28, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > > Errm, machine types apply to devices, but this is about transferring
> > > backends which are outside the scope of machine types.
> > 
> > Ah.. I didn't notice that net backends are not inherited by default from
> > qdev, hence not applicable to machine type properties.
> > 
> > Is it possible we enable it somehow, so that backends can have compat
> > properties similarly to frontends?
> 
> But that would mean, that we can't reconfigure a backend during live migration.
> 
> In my understanding, machine type properties are visible to the guest,
> and that's why we can't change them for running vm, even during live
> migration.

IIUC machine type properties may or may not be visible to the guest.  It
should depend on whether it is relevant to a guest-visible behavior.  Here
a flag showing "whether TAP, as a backend, can migrate" shouldn't be
exposed to guest.

I was indeed expecting that one will need to qom-set it for each device if
you want to get rid of versioned machine types.  It's not as ideal an
interface as what Dan was looking for, but it should still work so far, and
I think it might still be fair if it's only needed without machine type
versioning.

> 
> Bringing here another type of properties, which we _can_ change for
> running vm (even if changing is not very comfortable for admin), will
> be like tying ourselves hands.
> 
> And yes, there is a way to change any properties by qom-set. But it
> lays out of paradigm of machine types, and normally we can't change
> most of properties in flight.
> 
> 
> Or in other words: if we _can_ go on only with migration parameters,
> that actually shows, that what we are talking about is definitely
> property of migration, not property of device.
> 
> 
> And final note: if we can use one mechanism instead of two mechanisms,
> it makes the architecture twice simpler. Trying to go on with _only_
> device properties would mean run a bench of qom-set commands before
> every migration (as we have to distinguish local and remote migrations
> anyway), that looks bad. On the other hand, go on with _only_ migration
> parameter is feasible and looks better.
> 
> 
> And very final note: making global parameter + per-device parameters,
> actually, global parameter become a workaround to the fact that we
> don't want run a bench of qom-set commands. So, global parameter is
> an additional API to hide inconvenience of the main API.

IMHO it's not a workaround.  To me, it's a better way of abstraction,
because the migration side provides the capability of passing FDs, and
whatever is generic about that should be attached to the global knob.
Migration shouldn't care about behavior or attributes of a specific device.
Listing the devices in any way in migration's QAPI is a workaround instead.

But I agree I do not know whether it's easy to have net backends support
machine types properties.  I think it still makes sense logically that a
net backend is a TYPE_DEVICE, even if it's a backend device which is not
directly visible to the guest.

Thanks,

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > 
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > as is.
> > > > > 
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > > 
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > > 
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > > 
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> > 
> > I was not seeing it like that.
> > 
> > I was treating per-device parameter to be a flag showing whether the device
> > is capable of passing over FDs, which is more like a device attribute.

Whether a backend is technically capable of transfer shouldn't require a
user specified property - there should be an internal API to query whether
the current backend configuration is transferable or not, based on the
code implementation. Allowing a mgmt app to specify this can only lead
to mistakes, because they don't know the internal constraints of the
implementation.

The mgmt app should only be concerned with whether they want to transfer
a backend or not, which is a time-of-use decision rather than a launch time
decision.


With regards,
Daniel

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Peter Xu 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 08:19:37PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > > 
> > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > as is.
> > > > > > 
> > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > > 
> > > > > This implies that you're changing 'backend-transfer' against the
> > > > > device at time of each migration.
> > > > > 
> > > > > This takes us back to the situation we've had historically where the
> > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > away from that model by passing everything as parameters to the
> > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > global state properties changing migration behaviour.
> > > > > 
> > > > > This 'backend-transfer' device property is not really a device property,
> > > > > it is an indirect parameter to the 'migrate' command.
> > > 
> > > I was not seeing it like that.
> > > 
> > > I was treating per-device parameter to be a flag showing whether the device
> > > is capable of passing over FDs, which is more like a device attribute.
> 
> Whether a backend is technically capable of transfer shouldn't require a
> user specified property - there should be an internal API to query whether
> the current backend configuration is transferrable or not, based on the
> code implementation. Allowing a mgmt app to specify this can only lead
> to mistakes, because they don't know the internal constraints of the
> implementation.
> 
> The mgmt app should only be concerned with whether they want to transfer
> a backend or not which is a time-of-use decision rather than launch time
> decision.

IMHO the per-device property, when available, should always mean it fully
supports the feature when it is turned ON.

I also think the above statement matches exactly how I see it..  I never
expected mgmt to toggle the per-device properties, as I just left similar
statements in another reply.

That's also why I think the global backend-transfer should be the only
thing exposed to mgmt.  So even if the device properties would exist, they
should only be used in compat properties for the upstream QEMUs.

They're still needed, and will be helpful when other devices introduce
similar concepts to support fd passover; then on some machine types, when
the global feature is enabled, QEMU will automatically do fd-pass for some
devices and not for others, based on the machine type.
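
A sketch of that model: one global knob set by mgmt, plus a per-device
default that only machine type compat tables override. The table
contents, the driver name and the "defaults to on for the newest machine
type" rule are assumptions made for this example, not existing QEMU
compat entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical compat entry: driver name -> per-device default. */
typedef struct {
    const char *driver;
    bool backend_transfer;
} CompatEntry;

/* Hypothetical table for an older machine type: feature forced off. */
static const CompatEntry old_machine_compat[] = {
    { "virtio-net-sketch", false },
};

/* Per-device default: on, unless the machine type's table overrides it. */
static bool per_device_default(const CompatEntry *compat, size_t n,
                               const char *driver)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(compat[i].driver, driver) == 0) {
            return compat[i].backend_transfer;
        }
    }
    return true;
}

/* A device transfers its backend only if both levels agree. */
static bool device_transfers_backend(bool global_knob, bool per_device)
{
    return global_knob && per_device;
}
```

Under this model mgmt only ever flips global_knob; which devices actually
pass fds is then a function of the machine type.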

Thanks,

-- 
Peter Xu

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 03:39:03PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:19:37PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > > > 
> > > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > > as is.
> > > > > > > 
> > > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > > > 
> > > > > > This implies that you're changing 'backend-transfer' against the
> > > > > > device at time of each migration.
> > > > > > 
> > > > > > This takes us back to the situation we've had historically where the
> > > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > > away from that model by passing everything as parameters to the
> > > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > > global state properties changing migration behaviour.
> > > > > > 
> > > > > > This 'backend-transfer' device property is not really a device property,
> > > > > > it is an indirect parameter to the 'migrate' command.
> > > > 
> > > > I was not seeing it like that.
> > > > 
> > > > I was treating per-device parameter to be a flag showing whether the device
> > > > is capable of passing over FDs, which is more like a device attribute.
> > 
> > Whether a backend is technically capable of transfer shouldn't require a
> > user specified property - there should be an internal API to query whether
> > the current backend configuration is transferrable or not, based on the
> > code implementation. Allowing a mgmt app to specify this can only lead
> > to mistakes, because they don't know the internal constraints of the
> > implementation.
> > 
> > The mgmt app should only be concerned with whether they want to transfer
> > a backend or not which is a time-of-use decision rather than launch time
> > decision.
> 
> IMHO the per-device property, when available, should always mean it fully
> support the feature, when it is turned ON.

That can't be expressed in a property in the device.

Consider the virtio-net device.  The backend transfer is only
possible if the virtio-net is associated with a netdev using
the vhost-user backend, and the vhost-user backend must be
using a chardev with a socket backend, and the socket backend
must not have TLS or websockets enabled.

Migratability of the backend requires an API against the
NetClientInfo object, which will in turn require calling
out to an API against the Chardev object.
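
The chain of conditions in this example can be sketched as nested
checks. The structs below are simplified stand-ins invented for
illustration; they are not the real NetClientInfo or Chardev types, and
the conditions are just the ones listed in the paragraph above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the chardev and netdev layers. */
typedef enum { CHR_KIND_SOCKET, CHR_KIND_OTHER } ChardevKindSketch;

typedef struct {
    ChardevKindSketch kind;
    bool tls;
    bool websocket;
} ChardevSketch;

typedef enum { NET_KIND_VHOST_USER, NET_KIND_OTHER } NetdevKindSketch;

typedef struct {
    NetdevKindSketch kind;
    const ChardevSketch *chr;   /* may be NULL */
} NetdevSketch;

/* Innermost layer: only a plain socket chardev qualifies. */
static bool chardev_can_transfer(const ChardevSketch *chr)
{
    return chr->kind == CHR_KIND_SOCKET && !chr->tls && !chr->websocket;
}

/* Middle layer: the netdev delegates to its chardev. */
static bool netdev_can_transfer(const NetdevSketch *nd)
{
    if (nd->kind != NET_KIND_VHOST_USER || nd->chr == NULL) {
        return false;
    }
    return chardev_can_transfer(nd->chr);
}

/* Outermost layer: the device asks its netdev, if it has one. */
static bool virtio_net_sketch_can_transfer(const NetdevSketch *nd)
{
    return nd != NULL && netdev_can_transfer(nd);
}
```

No single boolean on the device can capture this, since the answer
changes with each layer's configuration.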


With regards,
Daniel

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Vladimir Sementsov-Ogievskiy 3 months, 3 weeks ago

On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> On 15.10.25 23:07, Peter Xu wrote:
>>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>>> which support it will migrate their backends (for example a TAP
>>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>>
>>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>>
>>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>>> TAP backend.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>>> ---
>>>>>
>>>>> [..]
>>>>>
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -951,9 +951,16 @@
>>>>>>>     #     is @cpr-exec.  The first list element is the program's filename,
>>>>>>>     #     the remainder its arguments.  (Since 10.2)
>>>>>>>     #
>>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>>> +#     supports it. In general that means that backend state and its
>>>>>>> +#     file descriptors are passed to the destination in the migraton
>>>>>>> +#     channel (which must be a UNIX socket). Individual devices
>>>>>>> +#     declare the support for backend-transfer by per-device
>>>>>>> +#     backend-transfer option. (Since 10.2)
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>>
>>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>>> just started to "fd-passing".
>>>>>>
>>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>>
>>>>> Reasonable.
>>>>>
>>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>>> (reasonable too):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>>
>>>>> (still, they were against my "fds" name for the parameter, which is
>>>>> really too generic, fd-passing is not)
>>>>>
>>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>>
>>>>>
>>>>>>
>>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>>
>>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>>
>>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>>> word here.
>>>>>
>>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>>> not change your mind.
>>>>
>>>> Ah, I didn't notice the name has been discussed.
>>>>
>>>> I think it means you can vote for your own preference now because we have
>>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>>> something better than both.
>>>>
>>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>>> I would strongly request for.  The other thing is the unix socket check, it
>>>> looks all good below now with it, thanks.  No strong feelings on the names.
>>>>
>>>
>>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>>> it's more meaningful for the user:
>>>
>>> If we call it "fd-passing", user may ask:
>>>
>>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>>> is it unix socket or not, and pass any fds it wants if it is?
>>>
>>> Logical question is, why not just drop the global capability, and check only
>>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>>> as it may be some complex tunneling, which includes unix sockets, but still
>>> can't pass fds, but I think now about feature naming)
>>>
>>> But we really want an explicit switch for the feature. As qemu-update is
>>> not the only case of local migration. The another case is changing the
>>> backend. So for the user's choice is:
>>>
>>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>>> transfer the backend, we have to initialize new backend on another host.
>>>
>>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>>> extra actions: use "backend-transfer", exactly to keep the backends
>>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>>> as is.
>>>
>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>> to another backend. We disable "backend-transfer" for one device.
>>
>> This implies that you're changing 'backend-transfer' against the
>> device at time of each migration.
>>
>> This takes us back to the situation we've had historically where the
>> behaviour of migration depends on global properties the mgmt app has
>> set prior to the 'migrate' command being run. We've just tried to get
>> away from that model by passing everything as parameters to the
>> migrate command, so I'm loathe to see us invent a new way to have
>> global state properties changing migration behaviour.
>>
>> This 'backend-transfer' device property is not really a device property,
>> it is an indirect parameter to the 'migrate' command.
>>
>> Ergo, if we need the ability to selectively migrate the backend state
>> of individual devices, then instead of a property on the device, we
>> should pass a list of device IDs as a parameter to the migrate
>> command in QMP.
> 
> Understand.
> 
> So, it will look like
> 
> # @backend-transfer: List of devices IDs or QOM paths, to enable
> #     backend-transfer for. In general that means that backend
> #     states and their file descriptors are passed to the destination
> #     in the migration channel (which must be a UNIX socket), and
> #     management tool doesn't have to configure new backends for
> #     target QEMU (like vhost-user server, or TAP device in the kernel).
> #     Default is no backend-transfer migration (Since 10.2)
> 
> 
> Peter, is it OK for you?
> 
> 

Or, maybe, we can just continue with two simple experimental boolean parameters:

@backend-transfer-vhost-user-blk

and

@backend-transfer-virtio-net-tap


and not bother implementing a good, final, complex API while it's unstable anyway?
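
For comparison, the list-based variant quoted above would need little
more than a membership check per device. A minimal stand-alone sketch,
assuming the parameter arrives as a singly linked list of ID strings;
SketchStrList and sketch_backend_transfer_enabled() are hypothetical
names for illustration, not existing QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a list of device IDs from the parameter. */
typedef struct SketchStrList {
    struct SketchStrList *next;
    const char *value;
} SketchStrList;

/*
 * Return true if dev_id is named in the list. An empty (NULL) list
 * selects nothing, matching "Default is no backend-transfer migration".
 */
static bool sketch_backend_transfer_enabled(const SketchStrList *ids,
                                            const char *dev_id)
{
    assert(dev_id != NULL);
    for (const SketchStrList *e = ids; e != NULL; e = e->next) {
        if (strcmp(e->value, dev_id) == 0) {
            return true;
        }
    }
    return false;
}
```

With this shape the migration code has no per-device-type parameter,
only an ID lookup, whereas the two booleans would each name a device
type.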



-- 
Best regards,
Vladimir

Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter

Posted by Daniel P. Berrangé 3 months, 3 weeks ago

On Thu, Oct 16, 2025 at 01:38:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 15.10.25 23:07, Peter Xu wrote:
> > > > > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > On 15.10.25 21:19, Peter Xu wrote:
> > > > > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > > This parameter enables backend-transfer feature: all devices
> > > > > > > > which support it will migrate their backends (for example a TAP
> > > > > > > > device, by passing open file descriptor to migration channel).
> > > > > > > > 
> > > > > > > > Currently no such devices, so the new parameter is a noop.
> > > > > > > > 
> > > > > > > > Next commit will add support for virtio-net, to migrate its
> > > > > > > > TAP backend.
> > > > > > > > 
> > > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > > > > ---
> > > > > > 
> > > > > > [..]
> > > > > > 
> > > > > > > > --- a/qapi/migration.json
> > > > > > > > +++ b/qapi/migration.json
> > > > > > > > @@ -951,9 +951,16 @@
> > > > > > > >     #     is @cpr-exec.  The first list element is the program's filename,
> > > > > > > >     #     the remainder its arguments.  (Since 10.2)
> > > > > > > >     #
> > > > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > > > +#     supports it. In general that means that backend state and its
> > > > > > > > +#     file descriptors are passed to the destination in the migraton
> > > > > > > > +#     channel (which must be a UNIX socket). Individual devices
> > > > > > > > +#     declare the support for backend-transfer by per-device
> > > > > > > > +#     backend-transfer option. (Since 10.2)
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > 
> > > > > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > > > > TAP doesn't have its own VMSD to transfer?
> > > > > > > 
> > > > > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > > > > suitable there, because it used to "transfer backend" already, now it's
> > > > > > > just started to "fd-passing".
> > > > > > > 
> > > > > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > > > > 
> > > > > > Reasonable.
> > > > > > 
> > > > > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > > > > (reasonable too):
> > > > > > 
> > > > > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > > > > 
> > > > > > (still, they were against my "fds" name for the parameter, which is
> > > > > > really too generic, fd-passing is not)
> > > > > > 
> > > > > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > > > > > 
> > > > > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > In general, I think "fd" is really a core concept of this whole thing.
> > > > > > 
> > > > > > I think, we can call "backend" any external object, linked by the fd.
> > > > > > 
> > > > > > Still, backend/frontend terminology is so misleading, when applied to
> > > > > > complex systems (for me, at least), that I don't really like "-backend"
> > > > > > word here.
> > > > > > 
> > > > > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > > > > not change your mind.
> > > > > 
> > > > > Ah, I didn't notice the name has been discussed.
> > > > > 
> > > > > I think it means you can vote for your own preference now because we have
> > > > > one vote for each. :) Let's also see whether Fabiano will come up with
> > > > > something better than both.
> > > > > 
> > > > > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > > > > I would strongly request for.  The other thing is the unix socket check, it
> > > > > looks all good below now with it, thanks.  No strong feelings on the names.
> > > > > 
> > > > 
> > > > After a bit more thinking, I leaning towards keeping backend-transfer. I think
> > > > it's more meaningful for the user:
> > > > 
> > > > If we call it "fd-passing", user may ask:
> > > > 
> > > > Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> > > > supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> > > > is it unix socket or not, and pass any fds it wants if it is?
> > > > 
> > > > Logical question is, why not just drop the global capability, and check only
> > > > is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> > > > as it may be some complex tunneling, which includes unix sockets, but still
> > > > can't pass fds, but I think now about feature naming)
> > > > 
> > > > But we really want an explicit switch for the feature. As qemu-update is
> > > > not the only case of local migration. The another case is changing the
> > > > backend. So for the user's choice is:
> > > > 
> > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > transfer the backend, we have to initialize new backend on another host.
> > > > 
> > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > as is.
> > > > 
> > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > to another backend. We disable "backend-transfer" for one device.
> > > 
> > > This implies that you're changing 'backend-transfer' against the
> > > device at time of each migration.
> > > 
> > > This takes us back to the situation we've had historically where the
> > > behaviour of migration depends on global properties the mgmt app has
> > > set prior to the 'migrate' command being run. We've just tried to get
> > > away from that model by passing everything as parameters to the
> > > migrate command, so I'm loathe to see us invent a new way to have
> > > global state properties changing migration behaviour.
> > > 
> > > This 'backend-transfer' device property is not really a device property,
> > > it is an indirect parameter to the 'migrate' command.
> > > 
> > > Ergo, if we need the ability to selectively migrate the backend state
> > > of individual devices, then instead of a property on the device, we
> > > should pass a list of device IDs as a parameter to the migrate
> > > command in QMP.
> > 
> > Understand.
> > 
> > So, it will look like
> > 
> > # @backend-transfer: List of devices IDs or QOM paths, to enable
> > #     backend-transfer for. In general that means that backend
> > #     states and their file descriptors are passed to the destination
> > #     in the migration channel (which must be a UNIX socket), and
> > #     management tool doesn't have to configure new backends for
> > #     target QEMU (like vhost-user server, or TAP device in the kernel).
> > #     Default is no backend-transfer migration (Since 10.2)
> > 
> > 
> > Peter, is it OK for you?
> 
> Or, may be, we just can continue with two simple experimental boolean parameters:
> 
> @backend-transfer-vhost-user-blk
> 
> and
> 
> @backend-transfer-virtio-net-tap
> 
> 
> and not care to implement good-final-complex-API, while it's unstable anyway?

Even if declared unstable, that still has a negative impact on the internal
code structure because it's putting special cases for certain device types
into the migration framework and the device code, with no time limit on how
long this technical debt will last.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|