This parameter enables the backend-transfer feature: all devices
that support it will migrate their backends (for example, a TAP
device, by passing its open file descriptor through the migration channel).
Currently there are no such devices, so the new parameter is a no-op.
The next commit will add support for virtio-net, to migrate its
TAP backend.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
migration/options.c | 18 ++++++++++++++++++
migration/options.h | 2 ++
qapi/migration.json | 38 ++++++++++++++++++++++++++++++++------
3 files changed, 52 insertions(+), 6 deletions(-)
diff --git a/migration/options.c b/migration/options.c
index 5183112775..a461b07b54 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -262,6 +262,12 @@ bool migrate_mapped_ram(void)
return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
}
+bool migrate_backend_transfer(void)
+{
+ MigrationState *s = migrate_get_current();
+ return s->parameters.backend_transfer;
+}
+
bool migrate_ignore_shared(void)
{
MigrationState *s = migrate_get_current();
@@ -963,6 +969,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
params->cpr_exec_command = QAPI_CLONE(strList,
s->parameters.cpr_exec_command);
+ params->has_backend_transfer = true;
+ params->backend_transfer = s->parameters.backend_transfer;
+
return params;
}
@@ -997,6 +1006,7 @@ void migrate_params_init(MigrationParameters *params)
params->has_zero_page_detection = true;
params->has_direct_io = true;
params->has_cpr_exec_command = true;
+ params->has_backend_transfer = true;
}
/*
@@ -1305,6 +1315,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
if (params->has_cpr_exec_command) {
dest->cpr_exec_command = params->cpr_exec_command;
}
+
+ if (params->has_backend_transfer) {
+ dest->backend_transfer = params->backend_transfer;
+ }
}
static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1443,6 +1457,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
s->parameters.cpr_exec_command =
QAPI_CLONE(strList, params->cpr_exec_command);
}
+
+ if (params->has_backend_transfer) {
+ s->parameters.backend_transfer = params->backend_transfer;
+ }
}
void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 82d839709e..755ba1c024 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
uint64_t migrate_xbzrle_cache_size(void);
ZeroPageDetection migrate_zero_page_detection(void);
+bool migrate_backend_transfer(void);
+
/* parameters helpers */
bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index be0f3fcc12..35601a1f87 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -951,9 +951,16 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
+# @backend-transfer: Enable backend-transfer feature for devices that
+# supports it. In general that means that backend state and its
+# file descriptors are passed to the destination in the migraton
+# channel (which must be a UNIX socket). Individual devices
+# declare the support for backend-transfer by per-device
+# backend-transfer option. (Since 10.2)
+#
# Features:
#
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
# @x-vcpu-dirty-limit-period are experimental.
#
# Since: 2.4
@@ -978,7 +985,8 @@
'mode',
'zero-page-detection',
'direct-io',
- 'cpr-exec-command'] }
+ 'cpr-exec-command',
+ { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
##
# @MigrateSetParameters:
@@ -1137,9 +1145,16 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
+# @backend-transfer: Enable backend-transfer feature for devices that
+# supports it. In general that means that backend state and its
+# file descriptors are passed to the destination in the migraton
+# channel (which must be a UNIX socket). Individual devices
+# declare the support for backend-transfer by per-device
+# backend-transfer option. (Since 10.2)
+#
# Features:
#
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
# @x-vcpu-dirty-limit-period are experimental.
#
# TODO: either fuse back into `MigrationParameters`, or make
@@ -1179,7 +1194,9 @@
'*mode': 'MigMode',
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
- '*cpr-exec-command': [ 'str' ]} }
+ '*cpr-exec-command': [ 'str' ],
+ '*backend-transfer': { 'type': 'bool',
+ 'features': [ 'unstable' ] } } }
##
# @migrate-set-parameters:
@@ -1352,9 +1369,16 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
+# @backend-transfer: Enable backend-transfer feature for devices that
+# supports it. In general that means that backend state and its
+# file descriptors are passed to the destination in the migraton
+# channel (which must be a UNIX socket). Individual devices
+# declare the support for backend-transfer by per-device
+# backend-transfer option. (Since 10.2)
+#
# Features:
#
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
# @x-vcpu-dirty-limit-period are experimental.
#
# Since: 2.4
@@ -1391,7 +1415,9 @@
'*mode': 'MigMode',
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
- '*cpr-exec-command': [ 'str' ]} }
+ '*cpr-exec-command': [ 'str' ],
+ '*backend-transfer': { 'type': 'bool',
+ 'features': [ 'unstable' ] } } }
##
# @query-migrate-parameters:
--
2.48.1
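
For readers unfamiliar with the has_* convention the patch follows in
migrate_params_test_apply() and migrate_params_apply(), here is a minimal
standalone sketch. The struct and function names below are hypothetical
miniatures, not the QAPI-generated QEMU types; the only point is that an
omitted optional member leaves the stored value untouched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of a "set parameters" request: an optional
 * member comes with a has_* flag saying whether the caller supplied it. */
typedef struct {
    bool has_backend_transfer;
    bool backend_transfer;
} SetParams;

/* Hypothetical miniature of the stored migration parameters. */
typedef struct {
    bool backend_transfer;
} StoredParams;

/* Copy the member only when it was supplied, mirroring the patch. */
static void params_apply(const SetParams *req, StoredParams *stored)
{
    if (req->has_backend_transfer) {
        stored->backend_transfer = req->backend_transfer;
    }
}

/* Accessor in the style of migrate_backend_transfer(). */
static bool backend_transfer_enabled(const StoredParams *stored)
{
    return stored->backend_transfer;
}

/* Usage: enabling sticks, and a later request that omits the member
 * does not reset it. */
static void params_demo(void)
{
    StoredParams stored = { .backend_transfer = false };
    SetParams enable = { .has_backend_transfer = true,
                         .backend_transfer = true };
    SetParams omitted = { .has_backend_transfer = false,
                          .backend_transfer = false };

    params_apply(&enable, &stored);
    assert(backend_transfer_enabled(&stored));
    params_apply(&omitted, &stored);
    assert(backend_transfer_enabled(&stored));
}
```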
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> This parameter enables backend-transfer feature: all devices
> which support it will migrate their backends (for example a TAP
> device, by passing open file descriptor to migration channel).
>
> Currently no such devices, so the new parameter is a noop.
>
> Next commit will add support for virtio-net, to migrate its
> TAP backend.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..35601a1f87 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -951,9 +951,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
Either "Enable the backend transfer feature" or "Enable backend transfer"
> +# supports it. In general that means that backend state and its
support
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
> +#
I'm not sure I understand this.
What is a "per-device backend-transfer option"? Is it a device
property?
If yes, I guess the device declares its capability to do this by having
this property. Correct?
Does the property's value matter? How?
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # Since: 2.4
> @@ -978,7 +985,8 @@
> 'mode',
> 'zero-page-detection',
> 'direct-io',
> - 'cpr-exec-command'] }
> + 'cpr-exec-command',
> + { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>
> ##
> # @MigrateSetParameters:
[...]
On 16.10.25 13:56, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
>
>> This parameter enables backend-transfer feature: all devices
>> which support it will migrate their backends (for example a TAP
>> device, by passing open file descriptor to migration channel).
>>
>> Currently no such devices, so the new parameter is a noop.
>>
>> Next commit will add support for virtio-net, to migrate its
>> TAP backend.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>
> [...]
>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index be0f3fcc12..35601a1f87 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -951,9 +951,16 @@
>> # is @cpr-exec. The first list element is the program's filename,
>> # the remainder its arguments. (Since 10.2)
>> #
>> +# @backend-transfer: Enable backend-transfer feature for devices that
>
> Either "Enable the backend transfer feature" or "Enable backend transfer"
then, "Enable the backend-transfer feature"
>
>> +# supports it. In general that means that backend state and its
>
> support
>
>> +# file descriptors are passed to the destination in the migraton
>> +# channel (which must be a UNIX socket). Individual devices
>> +# declare the support for backend-transfer by per-device
>> +# backend-transfer option. (Since 10.2)
>> +#
>
> I'm not sure I understand this.
>
> What is a "per-device backend-transfer option"? Is it a device
> property?
>
> If yes, I guess the device declares its capability to do this by having
> this property. Correct?
No, the user may set or unset this property to say whether the device
should participate in backend-transfer.
Still, as you can see in the parallel thread, Daniel has strong arguments
against such an API, so it seems it will change again in v9.
https://lore.kernel.org/qemu-devel/aPCtkB-GvFNuqlHn@redhat.com/
>
> Does the property's value matter? How?
>
>> # Features:
>> #
>> -# @unstable: Members @x-checkpoint-delay and
>> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>> # @x-vcpu-dirty-limit-period are experimental.
>> #
>> # Since: 2.4
>> @@ -978,7 +985,8 @@
>> 'mode',
>> 'zero-page-detection',
>> 'direct-io',
>> - 'cpr-exec-command'] }
>> + 'cpr-exec-command',
>> + { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>>
>> ##
>> # @MigrateSetParameters:
>
> [...]
>
--
Best regards,
Vladimir
On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> This parameter enables backend-transfer feature: all devices
> which support it will migrate their backends (for example a TAP
> device, by passing open file descriptor to migration channel).
>
> Currently no such devices, so the new parameter is a noop.
>
> Next commit will add support for virtio-net, to migrate its
> TAP backend.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> migration/options.c | 18 ++++++++++++++++++
> migration/options.h | 2 ++
> qapi/migration.json | 38 ++++++++++++++++++++++++++++++++------
> 3 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/migration/options.c b/migration/options.c
> index 5183112775..a461b07b54 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -262,6 +262,12 @@ bool migrate_mapped_ram(void)
> return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
> }
>
> +bool migrate_backend_transfer(void)
> +{
> + MigrationState *s = migrate_get_current();
> + return s->parameters.backend_transfer;
> +}
> +
> bool migrate_ignore_shared(void)
> {
> MigrationState *s = migrate_get_current();
> @@ -963,6 +969,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> params->cpr_exec_command = QAPI_CLONE(strList,
> s->parameters.cpr_exec_command);
>
> + params->has_backend_transfer = true;
> + params->backend_transfer = s->parameters.backend_transfer;
> +
> return params;
> }
>
> @@ -997,6 +1006,7 @@ void migrate_params_init(MigrationParameters *params)
> params->has_zero_page_detection = true;
> params->has_direct_io = true;
> params->has_cpr_exec_command = true;
> + params->has_backend_transfer = true;
> }
>
> /*
> @@ -1305,6 +1315,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> if (params->has_cpr_exec_command) {
> dest->cpr_exec_command = params->cpr_exec_command;
> }
> +
> + if (params->has_backend_transfer) {
> + dest->backend_transfer = params->backend_transfer;
> + }
> }
>
> static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1443,6 +1457,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> s->parameters.cpr_exec_command =
> QAPI_CLONE(strList, params->cpr_exec_command);
> }
> +
> + if (params->has_backend_transfer) {
> + s->parameters.backend_transfer = params->backend_transfer;
> + }
> }
>
> void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> diff --git a/migration/options.h b/migration/options.h
> index 82d839709e..755ba1c024 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
> uint64_t migrate_xbzrle_cache_size(void);
> ZeroPageDetection migrate_zero_page_detection(void);
>
> +bool migrate_backend_transfer(void);
> +
> /* parameters helpers */
>
> bool migrate_params_check(MigrationParameters *params, Error **errp);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..35601a1f87 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -951,9 +951,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +# supports it. In general that means that backend state and its
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
Thanks.
I still prefer the name "fd-passing" or anything more explicit than
"backend-transfer". Maybe the current name is fine for TAP, only because
TAP doesn't have its own VMSD to transfer?
Consider a device that is a backend and already has VMSDs to
migrate. If it starts to allow fd-passing, this name will stop being
suitable there, because it already used to "transfer the backend"; now it
has merely started "fd-passing".
Meanwhile, consider another example: what if a device is not a backend at
all (e.g. vfio?), has its own VMSD, and then wants to do fd-passing?
In general, I think "fd" is really a core concept of this whole thing. To
complement that idea, IMHO this patch misses one important
change: the migration framework should explicitly fail the
migration if this feature is enabled but the transport is not a UNIX socket
(i.e., fd-passing REQUIRES SCM_RIGHTS). Would that look more reliable?
Otherwise, IIUC, it will throw weird errors when, e.g., we enable this
feature and try to migrate via TCP or to a file.
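
To illustrate why a UNIX socket is required: file descriptors travel as
SCM_RIGHTS ancillary data, which only AF_UNIX sockets carry. Below is a
minimal POSIX sketch, independent of QEMU; the helper names send_fd(),
recv_fd() and fd_passing_roundtrip() are made up for this example and are
not QEMU functions.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} FdCtrl;

/* Send fd over a UNIX socket as SCM_RIGHTS ancillary data; 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';    /* one byte of ordinary data carries the fd along */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    FdCtrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    FdCtrl ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Round trip: pass the read end of a pipe across a socketpair, then read
 * through the received descriptor. Returns 0 on success. */
static int fd_passing_roundtrip(void)
{
    int sv[2], pfd[2], got;
    char c = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0) {
        return -1;
    }
    if (send_fd(sv[0], pfd[0]) < 0 || (got = recv_fd(sv[1])) < 0) {
        return -1;
    }
    assert(got != pfd[0]);  /* new descriptor number, same open pipe */
    if (write(pfd[1], "x", 1) != 1 || read(got, &c, 1) != 1) {
        return -1;
    }
    close(got);
    close(pfd[0]);
    close(pfd[1]);
    close(sv[0]);
    close(sv[1]);
    return c == 'x' ? 0 : -1;
}
```

The same sendmsg() call on a TCP socket has no way to carry the
descriptor, which is why an early transport check gives a cleaner error
than a failure deep in the stream code.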
> +#
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # Since: 2.4
> @@ -978,7 +985,8 @@
> 'mode',
> 'zero-page-detection',
> 'direct-io',
> - 'cpr-exec-command'] }
> + 'cpr-exec-command',
> + { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>
> ##
> # @MigrateSetParameters:
> @@ -1137,9 +1145,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +# supports it. In general that means that backend state and its
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
> +#
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # TODO: either fuse back into `MigrationParameters`, or make
> @@ -1179,7 +1194,9 @@
> '*mode': 'MigMode',
> '*zero-page-detection': 'ZeroPageDetection',
> '*direct-io': 'bool',
> - '*cpr-exec-command': [ 'str' ]} }
> + '*cpr-exec-command': [ 'str' ],
> + '*backend-transfer': { 'type': 'bool',
> + 'features': [ 'unstable' ] } } }
>
> ##
> # @migrate-set-parameters:
> @@ -1352,9 +1369,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +# supports it. In general that means that backend state and its
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
> +#
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # Since: 2.4
> @@ -1391,7 +1415,9 @@
> '*mode': 'MigMode',
> '*zero-page-detection': 'ZeroPageDetection',
> '*direct-io': 'bool',
> - '*cpr-exec-command': [ 'str' ]} }
> + '*cpr-exec-command': [ 'str' ],
> + '*backend-transfer': { 'type': 'bool',
> + 'features': [ 'unstable' ] } } }
>
> ##
> # @query-migrate-parameters:
> --
> 2.48.1
>
--
Peter Xu
On 15.10.25 21:19, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> This parameter enables backend-transfer feature: all devices
>> which support it will migrate their backends (for example a TAP
>> device, by passing open file descriptor to migration channel).
>>
>> Currently no such devices, so the new parameter is a noop.
>>
>> Next commit will add support for virtio-net, to migrate its
>> TAP backend.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
[..]
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -951,9 +951,16 @@
>> # is @cpr-exec. The first list element is the program's filename,
>> # the remainder its arguments. (Since 10.2)
>> #
>> +# @backend-transfer: Enable backend-transfer feature for devices that
>> +# supports it. In general that means that backend state and its
>> +# file descriptors are passed to the destination in the migraton
>> +# channel (which must be a UNIX socket). Individual devices
>> +# declare the support for backend-transfer by per-device
>> +# backend-transfer option. (Since 10.2)
>
> Thanks.
>
> I still prefer the name "fd-passing" or anything more explicit than
> "backend-transfer". Maybe the current name is fine for TAP, only because
> TAP doesn't have its own VMSD to transfer?
>
> Consider a device that would be a backend that supports VMSDs already to be
> migrated, then if it starts to allow fd-passing, this name will stop being
> suitable there, because it used to "transfer backend" already, now it's
> just started to "fd-passing".
>
> Meanwhile, consider another example - what if a device is not a backend at
> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
Reasonable.
But also consider the discussion with Fabiano in v5, where he argues against "fds"
(reasonably, too):
https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
(Still, they were against my "fds" name for the parameter, which is
really too generic; "fd-passing" is not.)
And the arguments for backend-transfer (to read similarly to cpr-transfer):
https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>
> In general, I think "fd" is really a core concept of this whole thing.
I think we can call any external object linked by the fd a "backend".
Still, backend/frontend terminology is so misleading when applied to
complex systems (for me, at least) that I don't really like the "-backend"
word here.
fd-passing is OK for me; I can resend with it if Fabiano's arguments
don't change your mind.
> One
> thing to complement that idea is, IMHO this patch misses one important
> change, that migration framework should actually explicitly fail the
> migration if this feature is enabled but it's not a unix socket protocol
> (aka, fd-passing REQUIRES scm rights). Would that look more reliable?
> Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
> feature and trying to migrate via either TCP or to a file..
>
Right. I rely on the checks in the qemu_file_get_fd() / qemu_file_set_fd()
handlers.
But of course, an earlier clean failure of the qmp-migrate / qmp-incoming-migrate
commands would be nice; will do.
Like this, I think:
diff --git a/migration/migration.c b/migration/migration.c
index 6ed6a10f57..0c73332706 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
return false;
}
+ if (migrate_backend_transfer() &&
+ !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+ addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
+ error_setg(errp, "Migration requires a UNIX domain socket as transport, "
+ "because backend-transfer is enabled");
+ return false;
+ }
+
return true;
}
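
For illustration only, the same check can be exercised outside QEMU with
simplified stand-in types. Transport, SockKind, Addr and transport_ok()
below are hypothetical and only mimic the shape of the condition in the
diff above; they are not the real MigrationAddress definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transport and socket address types. */
typedef enum { TRANSPORT_SOCKET, TRANSPORT_EXEC, TRANSPORT_FILE } Transport;
typedef enum { SOCK_KIND_INET, SOCK_KIND_UNIX, SOCK_KIND_VSOCK } SockKind;

typedef struct {
    Transport transport;
    SockKind socket_kind;   /* meaningful only for TRANSPORT_SOCKET */
} Addr;

/* Same condition as in the diff: with backend-transfer enabled, anything
 * other than a UNIX domain socket is rejected up front. */
static bool transport_ok(bool backend_transfer, const Addr *addr,
                         const char **err)
{
    *err = NULL;
    if (backend_transfer &&
        !(addr->transport == TRANSPORT_SOCKET &&
          addr->socket_kind == SOCK_KIND_UNIX)) {
        *err = "Migration requires a UNIX domain socket as transport, "
               "because backend-transfer is enabled";
        return false;
    }
    return true;
}

/* Usage: TCP is fine without the feature and rejected with it. */
static void transport_check_demo(void)
{
    const char *err;
    Addr tcp = { TRANSPORT_SOCKET, SOCK_KIND_INET };
    Addr unix_sock = { TRANSPORT_SOCKET, SOCK_KIND_UNIX };

    assert(transport_ok(false, &tcp, &err) && err == NULL);
    assert(!transport_ok(true, &tcp, &err) && err != NULL);
    assert(transport_ok(true, &unix_sock, &err) && err == NULL);
}
```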
--
Best regards,
Vladimir
On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 21:19, Peter Xu wrote:
> > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > This parameter enables backend-transfer feature: all devices
> > > which support it will migrate their backends (for example a TAP
> > > device, by passing open file descriptor to migration channel).
> > >
> > > Currently no such devices, so the new parameter is a noop.
> > >
> > > Next commit will add support for virtio-net, to migrate its
> > > TAP backend.
> > >
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > ---
>
> [..]
>
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -951,9 +951,16 @@
> > > # is @cpr-exec. The first list element is the program's filename,
> > > # the remainder its arguments. (Since 10.2)
> > > #
> > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > +# supports it. In general that means that backend state and its
> > > +# file descriptors are passed to the destination in the migraton
> > > +# channel (which must be a UNIX socket). Individual devices
> > > +# declare the support for backend-transfer by per-device
> > > +# backend-transfer option. (Since 10.2)
> >
> > Thanks.
> >
> > I still prefer the name "fd-passing" or anything more explicit than
> > "backend-transfer". Maybe the current name is fine for TAP, only because
> > TAP doesn't have its own VMSD to transfer?
> >
> > Consider a device that would be a backend that supports VMSDs already to be
> > migrated, then if it starts to allow fd-passing, this name will stop being
> > suitable there, because it used to "transfer backend" already, now it's
> > just started to "fd-passing".
> >
> > Meanwhile, consider another example - what if a device is not a backend at
> > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>
> Reasonable.
>
> But consider also the discussion with Fabiano in v5, where he argues against fds
> (reasonable too):
>
> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>
> (still, they were against my "fds" name for the parameter, which is
> really too generic, fd-passing is not)
>
> and the arguments for backend-transfer (to read similar with cpr-transfer)
>
> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>
>
> >
> > In general, I think "fd" is really a core concept of this whole thing.
>
> I think, we can call "backend" any external object, linked by the fd.
>
> Still, backend/frontend terminology is so misleading, when applied to
> complex systems (for me, at least), that I don't really like "-backend"
> word here.
>
> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> not change your mind.
Ah, I didn't notice the name has been discussed.
I think it means you can vote for your own preference now because we have
one vote for each. :) Let's also see whether Fabiano will come up with
something better than both.
You explicitly mentioned the file descriptors in the QAPI doc; that's what
I would strongly request. The other thing is the UNIX socket check; with it,
everything below looks good now, thanks. No strong feelings on the names.
>
> > One
> > thing to complement that idea is, IMHO this patch misses one important
> > change, that migration framework should actually explicitly fail the
> > migration if this feature is enabled but it's not a unix socket protocol
> > (aka, fd-passing REQUIRES scm rights). Would that look more reliable?
> > Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
> > feature and trying to migrate via either TCP or to a file..
> >
>
> Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
> handlers.
>
> But of course, earlier clean failure of qmp-migrate / qmp-incoming-migate
> commands would be nice, will do.
>
> Like this, I think:
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 6ed6a10f57..0c73332706 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
> return false;
> }
>
> + if (migrate_backend_transfer() &&
> + !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
> + addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
> + error_setg(errp, "Migration requires a UNIX domain socket as transport, "
> + "because backend-transfer is enabled");
> + return false;
> + }
> +
> return true;
> }
>
>
>
>
>
> --
> Best regards,
> Vladimir
>
--
Peter Xu
On 15.10.25 23:07, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.10.25 21:19, Peter Xu wrote:
>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> This parameter enables backend-transfer feature: all devices
>>>> which support it will migrate their backends (for example a TAP
>>>> device, by passing open file descriptor to migration channel).
>>>>
>>>> Currently no such devices, so the new parameter is a noop.
>>>>
>>>> Next commit will add support for virtio-net, to migrate its
>>>> TAP backend.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>
>> [..]
>>
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -951,9 +951,16 @@
>>>> # is @cpr-exec. The first list element is the program's filename,
>>>> # the remainder its arguments. (Since 10.2)
>>>> #
>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>> +# supports it. In general that means that backend state and its
>>>> +# file descriptors are passed to the destination in the migraton
>>>> +# channel (which must be a UNIX socket). Individual devices
>>>> +# declare the support for backend-transfer by per-device
>>>> +# backend-transfer option. (Since 10.2)
>>>
>>> Thanks.
>>>
>>> I still prefer the name "fd-passing" or anything more explicit than
>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>> TAP doesn't have its own VMSD to transfer?
>>>
>>> Consider a device that would be a backend that supports VMSDs already to be
>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>> suitable there, because it used to "transfer backend" already, now it's
>>> just started to "fd-passing".
>>>
>>> Meanwhile, consider another example - what if a device is not a backend at
>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>
>> Reasonable.
>>
>> But consider also the discussion with Fabiano in v5, where he argues against fds
>> (reasonable too):
>>
>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>
>> (still, they were against my "fds" name for the parameter, which is
>> really too generic, fd-passing is not)
>>
>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>
>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>
>>
>>>
>>> In general, I think "fd" is really a core concept of this whole thing.
>>
>> I think, we can call "backend" any external object, linked by the fd.
>>
>> Still, backend/frontend terminology is so misleading, when applied to
>> complex systems (for me, at least), that I don't really like "-backend"
>> word here.
>>
>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>> not change your mind.
>
> Ah, I didn't notice the name has been discussed.
>
> I think it means you can vote for your own preference now because we have
> one vote for each. :) Let's also see whether Fabiano will come up with
> something better than both.
>
> You mentioned explicitly the file descriptors in the qapi doc, that's what
> I would strongly request for. The other thing is the unix socket check, it
> looks all good below now with it, thanks. No strong feelings on the names.
>
After a bit more thinking, I'm leaning towards keeping backend-transfer. I think
it's more meaningful for the user:
If we call it "fd-passing", the user may ask:
OK, what is it? Does it allow QEMU to pass some fds through the migration stream, if it
supports fds? Which fds? Why pass them? Finally, why can't QEMU just check
whether it is a UNIX socket, and pass any fds it wants if it is?
A logical question is: why not just drop the global capability and check only
whether it is a UNIX socket? (OK, relying only on the socket type is wrong anyway,
as there may be some complex tunneling that includes UNIX sockets but still
can't pass fds; but here I'm thinking about feature naming.)
But we really want an explicit switch for the feature, as a QEMU update is
not the only case of local migration. Another case is changing the
backend. So the user's choices are:
1. Remote migration: we can't reuse backends (files, sockets, host devices), as
we are moving to another host. So we don't enable "backend-transfer". We don't
transfer the backend; we have to initialize a new backend on the other host.
2. Local migration to update QEMU, with minimal freeze time and minimal
extra actions: use "backend-transfer", exactly to keep the backends
(vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc.)
as is.
3. Local migration, but we want to reconfigure some backend, or switch
to another backend. We disable "backend-transfer" for that one device.
4. Some problem with "backend-transfer", maybe some bug. Disable the whole
backend-transfer feature, and do a normal local migration to a new version
with the bug fixed.
"backend-transfer" better reflects what the management layer should or
should not do with backends, depending on the migration type.
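
The four cases can be written down as a small decision table. This is
only a sketch of the management-layer logic: MigScenario, TransferPlan
and both functions are hypothetical names, and the rule that a device
hands over its backend only when both the global parameter and its own
option are on is an assumption about the v8 semantics.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    MIG_REMOTE,             /* 1: moving to another host */
    MIG_LOCAL_UPDATE,       /* 2: local QEMU update, keep the backends */
    MIG_LOCAL_RECONFIGURE,  /* 3: local, this device's backend changes */
    MIG_LOCAL_FALLBACK      /* 4: local, feature disabled to avoid a bug */
} MigScenario;

typedef struct {
    bool global_backend_transfer;   /* the migration parameter */
    bool device_opt_in;             /* per-device option of one device */
} TransferPlan;

/* Settings the management layer would choose for one device of interest. */
static TransferPlan plan_for(MigScenario s)
{
    switch (s) {
    case MIG_LOCAL_UPDATE:
        return (TransferPlan){ true, true };
    case MIG_LOCAL_RECONFIGURE:
        return (TransferPlan){ true, false };
    case MIG_REMOTE:
    case MIG_LOCAL_FALLBACK:
    default:
        return (TransferPlan){ false, false };
    }
}

/* Assumption: the backend is handed over only if both switches are on. */
static bool device_transfers_backend(TransferPlan p)
{
    return p.global_backend_transfer && p.device_opt_in;
}

/* Usage: only the plain local update keeps this device's backend. */
static void plan_demo(void)
{
    assert(device_transfers_backend(plan_for(MIG_LOCAL_UPDATE)));
    assert(!device_transfers_backend(plan_for(MIG_LOCAL_RECONFIGURE)));
    assert(!device_transfers_backend(plan_for(MIG_REMOTE)));
}
```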
>>
>>> One
>>> thing to complement that idea is, IMHO this patch misses one important
>>> change, that migration framework should actually explicitly fail the
>>> migration if this feature is enabled but it's not a unix socket protocol
>>> (aka, fd-passing REQUIRES scm rights). Would that look more reliable?
>>> Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
>>> feature and trying to migrate via either TCP or to a file..
>>>
>>
>> Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
>> handlers.
>>
>> But of course, earlier clean failure of qmp-migrate / qmp-incoming-migate
>> commands would be nice, will do.
>>
>> Like this, I think:
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 6ed6a10f57..0c73332706 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>> return false;
>> }
>>
>> + if (migrate_backend_transfer() &&
>> + !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
>> + addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
>> + error_setg(errp, "Migration requires a UNIX domain socket as transport, "
>> + "because backend-transfer is enabled");
>> + return false;
>> + }
>> +
>> return true;
>> }
>>
>>
>>
>>
>>
>> --
>> Best regards,
>> Vladimir
>>
>
--
Best regards,
Vladimir
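
The early transport check from the diff above can be modelled outside QEMU
as follows. This is only a sketch: it assumes the address is a dict shaped
like a QMP MigrationAddress (a "transport" key and, for sockets, a "type"
key), and is_unix_socket() / check_transport() are hypothetical helper
names, not QEMU functions.

```python
def is_unix_socket(addr):
    """True if a MigrationAddress-like dict describes a UNIX domain socket."""
    return addr.get("transport") == "socket" and addr.get("type") == "unix"


def check_transport(addr, backend_transfer):
    """Raise ValueError when backend-transfer is enabled on an unsuitable channel."""
    if backend_transfer and not is_unix_socket(addr):
        raise ValueError(
            "Migration requires a UNIX domain socket as transport, "
            "because backend-transfer is enabled"
        )
```

As noted in the thread, the socket type is a necessary condition only: a
tunnel that includes UNIX sockets may still be unable to pass fds, so the
handlers that actually send and receive fds still have to report failures.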
On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 23:07, Peter Xu wrote:
> > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 15.10.25 21:19, Peter Xu wrote:
> > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > This parameter enables backend-transfer feature: all devices
> > > > > which support it will migrate their backends (for example a TAP
> > > > > device, by passing open file descriptor to migration channel).
> > > > >
> > > > > Currently no such devices, so the new parameter is a noop.
> > > > >
> > > > > Next commit will add support for virtio-net, to migrate its
> > > > > TAP backend.
> > > > >
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > ---
> > >
> > > [..]
> > >
> > > > > --- a/qapi/migration.json
> > > > > +++ b/qapi/migration.json
> > > > > @@ -951,9 +951,16 @@
> > > > > # is @cpr-exec. The first list element is the program's filename,
> > > > > # the remainder its arguments. (Since 10.2)
> > > > > #
> > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > +# supports it. In general that means that backend state and its
> > > > > +# file descriptors are passed to the destination in the migraton
> > > > > +# channel (which must be a UNIX socket). Individual devices
> > > > > +# declare the support for backend-transfer by per-device
> > > > > +# backend-transfer option. (Since 10.2)
> > > >
> > > > Thanks.
> > > >
> > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > TAP doesn't have its own VMSD to transfer?
> > > >
> > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > suitable there, because it used to "transfer backend" already, now it's
> > > > just started to "fd-passing".
> > > >
> > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > >
> > > Reasonable.
> > >
> > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > (reasonable too):
> > >
> > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > >
> > > (still, they were against my "fds" name for the parameter, which is
> > > really too generic, fd-passing is not)
> > >
> > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > >
> > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > >
> > >
> > > >
> > > > In general, I think "fd" is really a core concept of this whole thing.
> > >
> > > I think, we can call "backend" any external object, linked by the fd.
> > >
> > > Still, backend/frontend terminology is so misleading, when applied to
> > > complex systems (for me, at least), that I don't really like "-backend"
> > > word here.
> > >
> > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > not change your mind.
> >
> > Ah, I didn't notice the name has been discussed.
> >
> > I think it means you can vote for your own preference now because we have
> > one vote for each. :) Let's also see whether Fabiano will come up with
> > something better than both.
> >
> > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > I would strongly request for. The other thing is the unix socket check, it
> > looks all good below now with it, thanks. No strong feelings on the names.
> >
>
> After a bit more thinking, I leaning towards keeping backend-transfer. I think
> it's more meaningful for the user:
>
> If we call it "fd-passing", user may ask:
>
> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> is it unix socket or not, and pass any fds it wants if it is?
>
> Logical question is, why not just drop the global capability, and check only
> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> as it may be some complex tunneling, which includes unix sockets, but still
> can't pass fds, but I think now about feature naming)
>
> But we really want an explicit switch for the feature. As qemu-update is
> not the only case of local migration. The another case is changing the
> backend. So for the user's choice is:
>
> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> we are moving to another host. So, we don't enable "backend-transfer". We don't
> transfer the backend, we have to initialize new backend on another host.
>
> 2. Local migration to update QEMU, with minimal freeze-time and minimal
> extra actions: use "backend-transfer", exactly to keep the backends
> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> as is.
>
> 3. Local migration, but we want to reconfigure some backend, or switch
> to another backend. We disable "backend-transfer" for one device.

This implies that you're changing 'backend-transfer' against the
device at time of each migration.

This takes us back to the situation we've had historically where the
behaviour of migration depends on global properties the mgmt app has
set prior to the 'migrate' command being run. We've just tried to get
away from that model by passing everything as parameters to the
migrate command, so I'm loath to see us invent a new way to have
global state properties changing migration behaviour.

This 'backend-transfer' device property is not really a device property,
it is an indirect parameter to the 'migrate' command.

Ergo, if we need the ability to selectively migrate the backend state
of individual devices, then instead of a property on the device, we
should pass a list of device IDs as a parameter to the migrate
command in QMP.

>
> 4. Some problem with "backend-transfer", may be some bug. Disable the whole
> beackend-transfer feature, and do normal local migration to a new version
> with bug fixed.
>

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 16.10.25 11:32, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.10.25 23:07, Peter Xu wrote:
>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>> which support it will migrate their backends (for example a TAP
>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>
>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>
>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>> TAP backend.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>> ---
>>>>
>>>> [..]
>>>>
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -951,9 +951,16 @@
>>>>>> # is @cpr-exec. The first list element is the program's filename,
>>>>>> # the remainder its arguments. (Since 10.2)
>>>>>> #
>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>> +# supports it. In general that means that backend state and its
>>>>>> +# file descriptors are passed to the destination in the migraton
>>>>>> +# channel (which must be a UNIX socket). Individual devices
>>>>>> +# declare the support for backend-transfer by per-device
>>>>>> +# backend-transfer option. (Since 10.2)
>>>>>
>>>>> Thanks.
>>>>>
>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>
>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>> just started to "fd-passing".
>>>>>
>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>
>>>> Reasonable.
>>>>
>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>> (reasonable too):
>>>>
>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>
>>>> (still, they were against my "fds" name for the parameter, which is
>>>> really too generic, fd-passing is not)
>>>>
>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>
>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>
>>>>
>>>>>
>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>
>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>
>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>> word here.
>>>>
>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>> not change your mind.
>>>
>>> Ah, I didn't notice the name has been discussed.
>>>
>>> I think it means you can vote for your own preference now because we have
>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>> something better than both.
>>>
>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>> I would strongly request for. The other thing is the unix socket check, it
>>> looks all good below now with it, thanks. No strong feelings on the names.
>>>
>>
>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>> it's more meaningful for the user:
>>
>> If we call it "fd-passing", user may ask:
>>
>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>> is it unix socket or not, and pass any fds it wants if it is?
>>
>> Logical question is, why not just drop the global capability, and check only
>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>> as it may be some complex tunneling, which includes unix sockets, but still
>> can't pass fds, but I think now about feature naming)
>>
>> But we really want an explicit switch for the feature. As qemu-update is
>> not the only case of local migration. The another case is changing the
>> backend. So for the user's choice is:
>>
>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>> transfer the backend, we have to initialize new backend on another host.
>>
>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>> extra actions: use "backend-transfer", exactly to keep the backends
>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>> as is.
>>
>> 3. Local migration, but we want to reconfigure some backend, or switch
>> to another backend. We disable "backend-transfer" for one device.
>
> This implies that you're changing 'backend-transfer' against the
> device at time of each migration.
>
> This takes us back to the situation we've had historically where the
> behaviour of migration depends on global properties the mgmt app has
> set prior to the 'migrate' command being run. We've just tried to get
> away from that model by passing everything as parameters to the
> migrate command, so I'm loathe to see us invent a new way to have
> global state properties changing migration behaviour.
>
> This 'backend-transfer' device property is not really a device property,
> it is an indirect parameter to the 'migrate' command.
>
> Ergo, if we need the ability to selectively migrate the backend state
> of individal devices, then instead of a property on the device, we
> should pass a list of device IDs as a parameter to the migrate
> command in QMP.

Understand.

So, it will look like

# @backend-transfer: List of devices IDs or QOM paths, to enable
# backend-transfer for. In general that means that backend
# states and their file descriptors are passed to the destination
# in the migration channel (which must be a UNIX socket), and
# management tool doesn't have to configure new backends for
# target QEMU (like vhost-user server, or TAP device in the kernel).
# Default is no backend-transfer migration (Since 10.2)

Peter, is it OK for you?

>
>>
>> 4. Some problem with "backend-transfer", may be some bug. Disable the whole
>> beackend-transfer feature, and do normal local migration to a new version
>> with bug fixed.
>>
>
> With regards,
> Daniel

--
Best regards,
Vladimir
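
For illustration, a list-valued parameter as described in the proposed doc
comment would be set over QMP with a command like the one built below. This
is a sketch under assumptions: the parameter is still only a proposal in
this thread, backend_transfer_command() is a hypothetical helper, and
"net0" is a placeholder device ID that does not come from the series.

```python
import json


def backend_transfer_command(devices):
    """Build a QMP migrate-set-parameters command carrying a device list."""
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"backend-transfer": list(devices)},
    }


# "net0" is a placeholder device ID, not taken from the series.
print(json.dumps(backend_transfer_command(["net0"]), indent=2))
```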
On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> On 15.10.25 23:07, Peter Xu wrote:
>>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>>> which support it will migrate their backends (for example a TAP
>>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>>
>>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>>
>>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>>> TAP backend.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>>> ---
>>>>>
>>>>> [..]
>>>>>
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -951,9 +951,16 @@
>>>>>>> # is @cpr-exec. The first list element is the program's filename,
>>>>>>> # the remainder its arguments. (Since 10.2)
>>>>>>> #
>>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>>> +# supports it. In general that means that backend state and its
>>>>>>> +# file descriptors are passed to the destination in the migraton
>>>>>>> +# channel (which must be a UNIX socket). Individual devices
>>>>>>> +# declare the support for backend-transfer by per-device
>>>>>>> +# backend-transfer option. (Since 10.2)
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>>
>>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>>> just started to "fd-passing".
>>>>>>
>>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>>
>>>>> Reasonable.
>>>>>
>>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>>> (reasonable too):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>>
>>>>> (still, they were against my "fds" name for the parameter, which is
>>>>> really too generic, fd-passing is not)
>>>>>
>>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>>
>>>>>
>>>>>>
>>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>>
>>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>>
>>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>>> word here.
>>>>>
>>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>>> not change your mind.
>>>>
>>>> Ah, I didn't notice the name has been discussed.
>>>>
>>>> I think it means you can vote for your own preference now because we have
>>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>>> something better than both.
>>>>
>>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>>> I would strongly request for. The other thing is the unix socket check, it
>>>> looks all good below now with it, thanks. No strong feelings on the names.
>>>>
>>>
>>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>>> it's more meaningful for the user:
>>>
>>> If we call it "fd-passing", user may ask:
>>>
>>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>>> is it unix socket or not, and pass any fds it wants if it is?
>>>
>>> Logical question is, why not just drop the global capability, and check only
>>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>>> as it may be some complex tunneling, which includes unix sockets, but still
>>> can't pass fds, but I think now about feature naming)
>>>
>>> But we really want an explicit switch for the feature. As qemu-update is
>>> not the only case of local migration. The another case is changing the
>>> backend. So for the user's choice is:
>>>
>>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>>> transfer the backend, we have to initialize new backend on another host.
>>>
>>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>>> extra actions: use "backend-transfer", exactly to keep the backends
>>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>>> as is.
>>>
>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>> to another backend. We disable "backend-transfer" for one device.
>>
>> This implies that you're changing 'backend-transfer' against the
>> device at time of each migration.
>>
>> This takes us back to the situation we've had historically where the
>> behaviour of migration depends on global properties the mgmt app has
>> set prior to the 'migrate' command being run. We've just tried to get
>> away from that model by passing everything as parameters to the
>> migrate command, so I'm loathe to see us invent a new way to have
>> global state properties changing migration behaviour.
>>
>> This 'backend-transfer' device property is not really a device property,
>> it is an indirect parameter to the 'migrate' command.
>>
>> Ergo, if we need the ability to selectively migrate the backend state
>> of individal devices, then instead of a property on the device, we
>> should pass a list of device IDs as a parameter to the migrate
>> command in QMP.
>
> Understand.
>
> So, it will look like
>
> # @backend-transfer: List of devices IDs or QOM paths, to enable
> # backend-transfer for. In general that means that backend
> # states and their file descriptors are passed to the destination
> # in the migration channel (which must be a UNIX socket), and
> # management tool doesn't have to configure new backends for
> # target QEMU (like vhost-user server, or TAP device in the kernel).
> # Default is no backend-transfer migration (Since 10.2)
>
RFC diff on top of this series, switching the API to a list of IDs:
diff --git a/hw/core/machine.c b/hw/core/machine.c
index a3d77f5604..681adbb7ac 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,7 +40,6 @@
GlobalProperty hw_compat_10_1[] = {
{ TYPE_ACPI_GED, "x-has-hest-addr", "false" },
- { TYPE_VIRTIO_NET, "backend-transfer", "false" },
};
const size_t hw_compat_10_1_len = G_N_ELEMENTS(hw_compat_10_1);
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5f9711dee7..a895b26e5d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3638,7 +3638,7 @@ static bool virtio_net_is_tap_mig(void *opaque, int version_id)
nc = qemu_get_queue(n->nic);
- return migrate_backend_transfer() && n->backend_transfer && nc->peer &&
+ return migrate_backend_transfer(DEVICE(n)) && nc->peer &&
nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
}
@@ -4461,7 +4461,6 @@ static const Property virtio_net_properties[] = {
host_features_ex,
VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
false),
- DEFINE_PROP_BOOL("backend-transfer", VirtIONet, backend_transfer, true),
};
static void virtio_net_class_init(ObjectClass *klass, const void *data)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index a7bfb10dc7..0f3b7aa55e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1160,4 +1160,7 @@ typedef enum MachineInitPhase {
bool phase_check(MachineInitPhase phase);
void phase_advance(MachineInitPhase phase);
+bool migrate_backend_transfer(DeviceState *dev);
+bool migrate_backend_transfer_check_list(const strList *list, Error **errp);
+
#endif
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index bf07f8a4cb..5b8ab7bda7 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -231,7 +231,6 @@ struct VirtIONet {
struct EBPFRSSContext ebpf_rss;
uint32_t nr_ebpf_rss_fds;
char **ebpf_rss_fds;
- bool backend_transfer;
};
size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 592b93021e..7f931bed17 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -152,4 +152,6 @@ bool multifd_device_state_save_thread_should_exit(void);
void multifd_abort_device_state_save_threads(void);
bool multifd_join_device_state_save_threads(void);
+const strList *migrate_backend_transfer_list(void);
+
#endif
diff --git a/migration/options.c b/migration/options.c
index a461b07b54..1644728ed7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -13,6 +13,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "qapi/util.h"
#include "exec/target_page.h"
#include "qapi/clone-visitor.h"
#include "qapi/error.h"
@@ -24,6 +25,7 @@
#include "migration/colo.h"
#include "migration/cpr.h"
#include "migration/misc.h"
+#include "migration/options.h"
#include "migration.h"
#include "migration-stats.h"
#include "qemu-file.h"
@@ -262,7 +264,7 @@ bool migrate_mapped_ram(void)
return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
}
-bool migrate_backend_transfer(void)
+const strList *migrate_backend_transfer_list(void)
{
MigrationState *s = migrate_get_current();
return s->parameters.backend_transfer;
@@ -969,8 +971,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
params->cpr_exec_command = QAPI_CLONE(strList,
s->parameters.cpr_exec_command);
- params->has_backend_transfer = true;
- params->backend_transfer = s->parameters.backend_transfer;
+ if (s->parameters.backend_transfer) {
+ params->has_backend_transfer = true;
+ params->backend_transfer = QAPI_CLONE(strList,
+ s->parameters.backend_transfer);
+ }
return params;
}
@@ -1193,6 +1198,11 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
return false;
}
+ if (params->has_backend_transfer &&
+ !migrate_backend_transfer_check_list(params->backend_transfer, errp)) {
+ return false;
+ }
+
return true;
}
@@ -1459,7 +1469,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
}
if (params->has_backend_transfer) {
- s->parameters.backend_transfer = params->backend_transfer;
+ qapi_free_strList(s->parameters.backend_transfer);
+
+ s->parameters.backend_transfer = QAPI_CLONE(strList,
+ params->backend_transfer);
}
}
diff --git a/migration/options.h b/migration/options.h
index 755ba1c024..82d839709e 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,8 +87,6 @@ const char *migrate_tls_hostname(void);
uint64_t migrate_xbzrle_cache_size(void);
ZeroPageDetection migrate_zero_page_detection(void);
-bool migrate_backend_transfer(void);
-
/* parameters helpers */
bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index 35601a1f87..9478c4ddab 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -951,12 +951,11 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
-# @backend-transfer: Enable backend-transfer feature for devices that
-# supports it. In general that means that backend state and its
-# file descriptors are passed to the destination in the migraton
-# channel (which must be a UNIX socket). Individual devices
-# declare the support for backend-transfer by per-device
-# backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+# backend-transfer migration. When enabled, device backends
+# including opened fds will be passed to the destination in the
+# migration channel (which must be a UNIX domain socket). Default
+# is no backend-transfer migration. (Since 10.2)
#
# Features:
#
@@ -1145,12 +1144,11 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
-# @backend-transfer: Enable backend-transfer feature for devices that
-# supports it. In general that means that backend state and its
-# file descriptors are passed to the destination in the migraton
-# channel (which must be a UNIX socket). Individual devices
-# declare the support for backend-transfer by per-device
-# backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+# backend-transfer migration. When enabled, device backends
+# including opened fds will be passed to the destination in the
+# migration channel (which must be a UNIX domain socket). Default
+# is no backend-transfer migration. (Since 10.2)
#
# Features:
#
@@ -1195,7 +1193,7 @@
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
'*cpr-exec-command': [ 'str' ],
- '*backend-transfer': { 'type': 'bool',
+ '*backend-transfer': { 'type': [ 'str' ],
'features': [ 'unstable' ] } } }
##
@@ -1369,12 +1367,11 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
-# @backend-transfer: Enable backend-transfer feature for devices that
-# supports it. In general that means that backend state and its
-# file descriptors are passed to the destination in the migraton
-# channel (which must be a UNIX socket). Individual devices
-# declare the support for backend-transfer by per-device
-# backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+# backend-transfer migration. When enabled, device backends
+# including opened fds will be passed to the destination in the
+# migration channel (which must be a UNIX domain socket). Default
+# is no backend-transfer migration. (Since 10.2)
#
# Features:
#
@@ -1416,7 +1413,7 @@
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
'*cpr-exec-command': [ 'str' ],
- '*backend-transfer': { 'type': 'bool',
+ '*backend-transfer': { 'type': [ 'str' ],
'features': [ 'unstable' ] } } }
##
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 2ac92d0a07..b4a1a88992 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -939,6 +939,32 @@ void qmp_device_del(const char *id, Error **errp)
}
}
+bool migrate_backend_transfer(DeviceState *dev)
+{
+ const strList *el = migrate_backend_transfer_list();
+
+ for ( ; el; el = el->next) {
+ if (find_device_state(el->value, false, NULL) == dev) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
+bool migrate_backend_transfer_check_list(const strList *list, Error **errp)
+{
+ const strList *el = list;
+
+ for ( ; el; el = el->next) {
+ if (!find_device_state(el->value, false, errp)) {
+ return false;
+ }
+ }
+
+ return true;
+}
+
int qdev_sync_config(DeviceState *dev, Error **errp)
{
DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/tests/functional/test_x86_64_tap_migration.py b/tests/functional/test_x86_64_tap_migration.py
index 1f88ff174c..a324b0f374 100644
--- a/tests/functional/test_x86_64_tap_migration.py
+++ b/tests/functional/test_x86_64_tap_migration.py
@@ -254,17 +254,16 @@ def prepare_and_launch_vm(
self.log.info(f"Launching {vm_s} VM")
vm.launch()
- self.set_migration_capabilities(vm, backend_transfer)
-
if not backend_transfer:
tap_name = TAP_ID2 if incoming else TAP_ID
else:
tap_name = TAP_ID
- self.add_virtio_net(vm, vhost, tap_name, backend_transfer)
+ self.add_virtio_net(vm, vhost, tap_name)
+
+ self.set_migration_capabilities(vm, backend_transfer)
- def add_virtio_net(self, vm, vhost: bool, tap_name: str,
- backend_transfer: bool):
+ def add_virtio_net(self, vm, vhost: bool, tap_name: str = "tap0"):
netdev_params = {
"id": "netdev.1",
"vhost": vhost,
@@ -289,17 +288,19 @@ def add_virtio_net(self, vm, vhost: bool, tap_name: str,
bus="pci.1",
mac=GUEST_MAC,
disable_legacy="off",
- backend_transfer=backend_transfer,
)
def set_migration_capabilities(self, vm, backend_transfer=True):
- vm.cmd("migrate-set-capabilities", { "capabilities": [
+ capabilities = [
{"capability": "events", "state": True},
{"capability": "x-ignore-shared", "state": True},
- ]})
- vm.cmd("migrate-set-parameters", {
- "backend-transfer": backend_transfer
- })
+ ]
+ vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
+ if backend_transfer:
+ vm.cmd(
+ "migrate-set-parameters",
+ {"backend-transfer": ["/machine/peripheral/vnet.1/virtio-backend"]},
+ )
def setup_guest_network(self) -> None:
exec_command_and_wait_for_pattern(self, "ip addr", "# ")
--
Best regards,
Vladimir
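
The two helpers added to system/qdev-monitor.c in the RFC diff can be
modelled in a few lines of Python. This is a toy sketch, not QEMU code: the
Registry class stands in for QEMU's device lookup, and all names here are
hypothetical.

```python
class Registry:
    """Toy stand-in for QEMU's device lookup; maps names to device objects."""

    def __init__(self):
        self._devices = {}

    def add(self, name, dev):
        self._devices[name] = dev

    def find(self, name):
        return self._devices.get(name)


def check_list(registry, names):
    """Reject the parameter if any entry does not resolve to a device."""
    for name in names:
        if registry.find(name) is None:
            raise LookupError(f"device '{name}' not found")


def backend_transfer_enabled(registry, names, dev):
    """True if some entry in the list resolves to exactly this device."""
    return any(registry.find(name) is dev for name in names)
```

Because check_list() runs when the parameter is set, every listed device
must already exist at that point, which is consistent with the test in the
diff now adding the virtio-net device before setting the parameters.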
On 16.10.25 23:26, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
>> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.10.25 23:07, Peter Xu wrote:
[..]
>>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>>> to another backend. We disable "backend-transfer" for one device.
>>>
>>> This implies that you're changing 'backend-transfer' against the
>>> device at time of each migration.
>>>
>>> This takes us back to the situation we've had historically where the
>>> behaviour of migration depends on global properties the mgmt app has
>>> set prior to the 'migrate' command being run. We've just tried to get
>>> away from that model by passing everything as parameters to the
>>> migrate command, so I'm loathe to see us invent a new way to have
>>> global state properties changing migration behaviour.
>>>
>>> This 'backend-transfer' device property is not really a device property,
>>> it is an indirect parameter to the 'migrate' command.
>>>
>>> Ergo, if we need the ability to selectively migrate the backend state
>>> of individal devices, then instead of a property on the device, we
>>> should pass a list of device IDs as a parameter to the migrate
>>> command in QMP.
>>
>> Understand.
>>
>> So, it will look like
>>
>> # @backend-transfer: List of devices IDs or QOM paths, to enable
>> # backend-transfer for. In general that means that backend
>> # states and their file descriptors are passed to the destination
>> # in the migration channel (which must be a UNIX socket), and
>> # management tool doesn't have to configure new backends for
>> # target QEMU (like vhost-user server, or TAP device in the kernel).
>> # Default is no backend-transfer migration (Since 10.2)
>>
>
>
> RFC diff to these series, to switch the API to list of IDs:
>
[..]
> @@ -1193,6 +1198,11 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
> return false;
> }
>
> + if (params->has_backend_transfer &&
> + !migrate_backend_transfer_check_list(params->backend_transfer, errp)) {
> + return false;
> + }
This made me move the capabilities setup after device-add in the test. Not a problem.
> +
> return true;
> }
>
[..]
> - vm.cmd("migrate-set-parameters", {
> - "backend-transfer": backend_transfer
> - })
> + ]
> + vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
> + if backend_transfer:
> + vm.cmd(
> + "migrate-set-parameters",
> + {"backend-transfer": ["/machine/peripheral/vnet.1/virtio-backend"]},
If I write just "vnet.1", it doesn't work, of course. Is there some way to get a pointer to
the proxy device from virtio-net.c? But maybe it's OK as is.
> + )
>
> def setup_guest_network(self) -> None:
> exec_command_and_wait_for_pattern(self, "ip addr", "# ")
>
>
>
--
Best regards,
Vladimir
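
For reference, the list-based form of the parameter from the RFC diff above could be driven from a management script roughly as follows. This is only a sketch: `backend_transfer_args` is a hypothetical helper, the absolute-path check is an assumption based on the note that a bare "vnet.1" does not work, and the list-valued "backend-transfer" parameter exists only in this RFC.

```python
import json


def backend_transfer_args(qom_paths):
    """Build arguments for migrate-set-parameters (RFC list-of-paths form).

    Hypothetical helper. It rejects non-absolute paths early, on the
    assumption that only full QOM paths are accepted by the RFC code.
    """
    for path in qom_paths:
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute QOM path, got {path!r}")
    return {"backend-transfer": list(qom_paths)}


args = backend_transfer_args(["/machine/peripheral/vnet.1/virtio-backend"])
# The QMP command a management tool would send under the RFC API.
command = {"execute": "migrate-set-parameters", "arguments": args}
print(json.dumps(command))
```

In the functional test this corresponds to the `vm.cmd("migrate-set-parameters", ...)` call in the diff.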
On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 15.10.25 23:07, Peter Xu wrote:
> > > > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > On 15.10.25 21:19, Peter Xu wrote:
> > > > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > This parameter enables backend-transfer feature: all devices
> > > > > > > which support it will migrate their backends (for example a TAP
> > > > > > > device, by passing open file descriptor to migration channel).
> > > > > > >
> > > > > > > Currently no such devices, so the new parameter is a noop.
> > > > > > >
> > > > > > > Next commit will add support for virtio-net, to migrate its
> > > > > > > TAP backend.
> > > > > > >
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > > > ---
> > > > >
> > > > > [..]
> > > > >
> > > > > > > --- a/qapi/migration.json
> > > > > > > +++ b/qapi/migration.json
> > > > > > > @@ -951,9 +951,16 @@
> > > > > > > # is @cpr-exec. The first list element is the program's filename,
> > > > > > > # the remainder its arguments. (Since 10.2)
> > > > > > > #
> > > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > > +# supports it. In general that means that backend state and its
> > > > > > > +# file descriptors are passed to the destination in the migraton
> > > > > > > +# channel (which must be a UNIX socket). Individual devices
> > > > > > > +# declare the support for backend-transfer by per-device
> > > > > > > +# backend-transfer option. (Since 10.2)
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > > > TAP doesn't have its own VMSD to transfer?
> > > > > >
> > > > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > > > suitable there, because it used to "transfer backend" already, now it's
> > > > > > just started to "fd-passing".
> > > > > >
> > > > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > > >
> > > > > Reasonable.
> > > > >
> > > > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > > > (reasonable too):
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > > >
> > > > > (still, they were against my "fds" name for the parameter, which is
> > > > > really too generic, fd-passing is not)
> > > > >
> > > > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > > >
> > > > > >
> > > > > > In general, I think "fd" is really a core concept of this whole thing.
> > > > >
> > > > > I think, we can call "backend" any external object, linked by the fd.
> > > > >
> > > > > Still, backend/frontend terminology is so misleading, when applied to
> > > > > complex systems (for me, at least), that I don't really like "-backend"
> > > > > word here.
> > > > >
> > > > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > > > not change your mind.
> > > >
> > > > Ah, I didn't notice the name has been discussed.
> > > >
> > > > I think it means you can vote for your own preference now because we have
> > > > one vote for each. :) Let's also see whether Fabiano will come up with
> > > > something better than both.
> > > >
> > > > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > > > I would strongly request for. The other thing is the unix socket check, it
> > > > looks all good below now with it, thanks. No strong feelings on the names.
> > > >
> > >
> > > After a bit more thinking, I leaning towards keeping backend-transfer. I think
> > > it's more meaningful for the user:
> > >
> > > If we call it "fd-passing", user may ask:
> > >
> > > Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> > > supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> > > is it unix socket or not, and pass any fds it wants if it is?
> > >
> > > Logical question is, why not just drop the global capability, and check only
> > > is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> > > as it may be some complex tunneling, which includes unix sockets, but still
> > > can't pass fds, but I think now about feature naming)
> > >
> > > But we really want an explicit switch for the feature. As qemu-update is
> > > not the only case of local migration. The another case is changing the
> > > backend. So for the user's choice is:
> > >
> > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > transfer the backend, we have to initialize new backend on another host.
> > >
> > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > extra actions: use "backend-transfer", exactly to keep the backends
> > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > as is.
> > >
> > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > to another backend. We disable "backend-transfer" for one device.
> >
> > This implies that you're changing 'backend-transfer' against the
> > device at time of each migration.
> >
> > This takes us back to the situation we've had historically where the
> > behaviour of migration depends on global properties the mgmt app has
> > set prior to the 'migrate' command being run. We've just tried to get
> > away from that model by passing everything as parameters to the
> > migrate command, so I'm loathe to see us invent a new way to have
> > global state properties changing migration behaviour.
> >
> > This 'backend-transfer' device property is not really a device property,
> > it is an indirect parameter to the 'migrate' command.

I was not seeing it like that.

I was treating per-device parameter to be a flag showing whether the device
is capable of passing over FDs, which is more like a device attribute.

Those things (after set by machine type) should never change, and the only
thing to be changed is the global "backend-transfer" boolean that can be
set in the "migrate" QMP command, and should be decided by the admin when
one wants to initiate the migration process.

> >
> > Ergo, if we need the ability to selectively migrate the backend state
> > of individal devices, then instead of a property on the device, we
> > should pass a list of device IDs as a parameter to the migrate
> > command in QMP.

I doubt whether we would really need that in reality.

Likely the admin should only worry about whether setting the global
"backend-transfer", the admin may not even need to know which device, and
how many devices, will be beneficial to this feature enabled.

It just says, "we're doing local migration and via unix sockets, so
whatever devices can try to reuse their backends if possible".

Thanks,

--
Peter Xu
On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > transfer the backend, we have to initialize new backend on another host.
> > > >
> > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > as is.
> > > >
> > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > to another backend. We disable "backend-transfer" for one device.
> > >
> > > This implies that you're changing 'backend-transfer' against the
> > > device at time of each migration.
> > >
> > > This takes us back to the situation we've had historically where the
> > > behaviour of migration depends on global properties the mgmt app has
> > > set prior to the 'migrate' command being run. We've just tried to get
> > > away from that model by passing everything as parameters to the
> > > migrate command, so I'm loathe to see us invent a new way to have
> > > global state properties changing migration behaviour.
> > >
> > > This 'backend-transfer' device property is not really a device property,
> > > it is an indirect parameter to the 'migrate' command.
>
> I was not seeing it like that.
>
> I was treating per-device parameter to be a flag showing whether the device
> is capable of passing over FDs, which is more like a device attribute.
>
> Those things (after set by machine type) should never change, and the only
> thing to be changed is the global "backend-transfer" boolean that can be
> set in the "migrate" QMP command, and should be decided by the admin when
> one wants to initiate the migration process.
>
> > >
> > > Ergo, if we need the ability to selectively migrate the backend state
> > > of individal devices, then instead of a property on the device, we
> > > should pass a list of device IDs as a parameter to the migrate
> > > command in QMP.
>
> I doubt whether we would really need that in reality.
>
> Likely the admin should only worry about whether setting the global
> "backend-transfer", the admin may not even need to know which device, and
> how many devices, will be beneficial to this feature enabled.
>
> It just says, "we're doing local migration and via unix sockets, so
> whatever devices can try to reuse their backends if possible".

An individual device can only use backend transfer if both the old and
new QEMU agree that it can be done. At the time we start the origin
QEMU we know which set of devices are capable of doing an outgoing
backend transfer, but we don't know what set of devices are capable
of doing an incoming backend transfer.

If we don't have a per-device toggle at time of migration, then we
have to assume that the target QEMU can always support at least the
same set of incoming backends as the src QEMU outgoing backend. This
feels like a potentially risky assumption.

Another scenario is where you are doing a localhost migration as a
mechanism to let you change a device backend. In that case you'll
want to do a backend transfer of all devices, except the one that
you want to change.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
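
The two constraints in the message above (both sides must agree, and one backend may be deliberately left out) can be modelled as plain set arithmetic. This is a sketch for illustration only: `transferable_backends` and the device IDs are hypothetical, and nothing here is existing QEMU code.

```python
def transferable_backends(src_capable, dst_capable, exclude=()):
    """Return device IDs whose backend can be transferred.

    A backend qualifies only if the source can send it AND the target can
    receive it; IDs in `exclude` are dropped (e.g. a backend being replaced).
    """
    return (set(src_capable) & set(dst_capable)) - set(exclude)


# Hypothetical device IDs: the target cannot receive "vfio0", and the
# admin wants to replace the backend of "net1" during this migration.
result = transferable_backends(
    src_capable={"net0", "net1", "vfio0"},
    dst_capable={"net0", "net1"},
    exclude=["net1"],
)
print(sorted(result))
```

Note that the source QEMU alone only knows `src_capable`; the other two inputs have to come from outside, which is the point being made about needing a per-device toggle at migration time.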
On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > transfer the backend, we have to initialize new backend on another host.
> > > > >
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > as is.
> > > > >
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > >
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > >
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > >
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> >
> > I was not seeing it like that.
> >
> > I was treating per-device parameter to be a flag showing whether the device
> > is capable of passing over FDs, which is more like a device attribute.
> >
> > Those things (after set by machine type) should never change, and the only
> > thing to be changed is the global "backend-transfer" boolean that can be
> > set in the "migrate" QMP command, and should be decided by the admin when
> > one wants to initiate the migration process.
> >
> > > >
> > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > of individal devices, then instead of a property on the device, we
> > > > should pass a list of device IDs as a parameter to the migrate
> > > > command in QMP.
> >
> > I doubt whether we would really need that in reality.
> >
> > Likely the admin should only worry about whether setting the global
> > "backend-transfer", the admin may not even need to know which device, and
> > how many devices, will be beneficial to this feature enabled.
> >
> > It just says, "we're doing local migration and via unix sockets, so
> > whatever devices can try to reuse their backends if possible".
>
> An individual device can only use backend transfer if both the old and
> new QEMU agree that it can be done. At the time we start the origin
> QEMU we know which set of devices are capable of doing an outgoing
> backend transfer, but we don't know what set of devices are capable
> of doing an incoming backend transfer.
>
> If we don't have a per-device toggle at time of migration, then we
> have to assume that the target QEMU can always support at least the
> same set of incoming backends as the src QEMU outgoing backend. This
> feels like a potentially risky assumption.

When using machine properties, these things should already be set by the
machine types.

E.g. if this is a new QEMU with an old machine type, we should have this
per-device property set to OFF forever when booting the VM, and should keep
it like that after any rounds of migrations. Because any VM using the old
machine type _might_ be migrated back to an older QEMU that won't support
it. So IIUC that strictly follows how we use versioned machine types.

What Vladimir mentioned previously would be something very special, but
indeed when there's no machine type versioning we may need to toggle this
before each migration. However since upstream is following the machine
type properties way of doing this since N years ago, do we need to worry
about that?

>
> Another scenario is where you are doing a localhost migration as a
> mechanism to let you change a device backend. In that case you'll
> want to do a backend transfer of all devices, except the one that
> you want to change.

Right, this might be a real need if it exists. Said that, it's so special
that I'm not sure whether the admin can easily migrate with global
backend-transfer to OFF in this rare case.

In general, I would prefer avoiding to introduce any form of list of
devices into the migration system if ever possible. I agree if we must
introduce that it should at least be a list of IDs rather than adhoc array
of strings. However I still want to see whether we can completely avoid
it.

Thanks,

--
Peter Xu
On Thu, Oct 16, 2025 at 03:29:27PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > >
> > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > as is.
> > > > > >
> > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > >
> > > > > This implies that you're changing 'backend-transfer' against the
> > > > > device at time of each migration.
> > > > >
> > > > > This takes us back to the situation we've had historically where the
> > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > away from that model by passing everything as parameters to the
> > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > global state properties changing migration behaviour.
> > > > >
> > > > > This 'backend-transfer' device property is not really a device property,
> > > > > it is an indirect parameter to the 'migrate' command.
> > >
> > > I was not seeing it like that.
> > >
> > > I was treating per-device parameter to be a flag showing whether the device
> > > is capable of passing over FDs, which is more like a device attribute.
> > >
> > > Those things (after set by machine type) should never change, and the only
> > > thing to be changed is the global "backend-transfer" boolean that can be
> > > set in the "migrate" QMP command, and should be decided by the admin when
> > > one wants to initiate the migration process.
> > >
> > > > >
> > > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > > of individal devices, then instead of a property on the device, we
> > > > > should pass a list of device IDs as a parameter to the migrate
> > > > > command in QMP.
> > >
> > > I doubt whether we would really need that in reality.
> > >
> > > Likely the admin should only worry about whether setting the global
> > > "backend-transfer", the admin may not even need to know which device, and
> > > how many devices, will be beneficial to this feature enabled.
> > >
> > > It just says, "we're doing local migration and via unix sockets, so
> > > whatever devices can try to reuse their backends if possible".
> >
> > An individual device can only use backend transfer if both the old and
> > new QEMU agree that it can be done. At the time we start the origin
> > QEMU we know which set of devices are capable of doing an outgoing
> > backend transfer, but we don't know what set of devices are capable
> > of doing an incoming backend transfer.
> >
> > If we don't have a per-device toggle at time of migration, then we
> > have to assume that the target QEMU can always support at least the
> > same set of incoming backends as the src QEMU outgoing backend. This
> > feels like a potentially risky assumption.
>
> When using machine properties, these things should already be set by the
> machine types.

Errm, machine types apply to devices, but this is about transferring
backends which are outside the scope of machine types.

> E.g. if this is a new QEMU with an old machine type, we should have this
> per-device property set to OFF forever when booting the VM, and should keep
> it like that after any rounds of migrations. Because any VM using the old
> machine type _might_ be migrated back to an older QEMU that won't support
> it. So IIUC that strictly follows how we use versioned machine types.

That makes no conceptual sense. Whether or not a particular backend
can be transferred is determined by the choice of backend and its
configuration. A "backend-transfer" property against the device
frontend cannot be set from the machine type definition, as the
machine type has no knowledge of what backend configuration will
be used.

> In general, I would prefer avoiding to introduce any form of list of
> devices into the migration system if ever possible. I agree if we must
> introduce that it should at least be a list of IDs rather than adhoc array
> of strings. However I still want to see whether we can completely avoid
> it.

Yes, anything in the migrate API would have to directly correspond
to an ID of a device frontend or backend.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> Errm, machine types apply to devices, but this is about transferring
> backends which are outside the scope of machine types.

Ah.. I didn't notice that net backends are not inherited by default from
qdev, hence not applicable to machine type properties.

Is it possible we enable it somehow, so that backends can have compat
properties similarly to frontends?

If we go with a list of devices in the migration parameters, to me it'll
only be a way to workaround the missing of such capability of net backends.
Meanwhile, the admin will need to manage the list of devices even if the
admin doesn't really needed to, IMHO.

Thanks,

--
Peter Xu
On Thu, Oct 16, 2025 at 04:28:10PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > Errm, machine types apply to devices, but this is about transferring
> > backends which are outside the scope of machine types.
>
> Ah.. I didn't notice that net backends are not inherited by default from
> qdev, hence not applicable to machine type properties.
>
> Is it possible we enable it somehow, so that backends can have compat
> properties similarly to frontends?

That is a technical limitation, but the problem here is bigger than
just the lack of qdev. It is a conceptual one - where a device is
implemented, its behaviour is determined exclusively by the QEMU
code. There are some rare exceptions, like host PCI device assignment
where functionality is partly in the host hardware, or external
device backends where impl is offloaded to an external process, but
most pure QEMU impls are able to be made always migratable and compat
can be easily ensured long term via machine types props.

With backends, alot of behaviour is offloaded to either the host
OS, or to external libraries or services. Certain narrow configs
may be able to transfer state, but there will always be configs
were state transfer is impossible. There can be no coarse rule
that a backend is migratable or not - it will usually be highly
dependent on the particular configuration choices of the backend
in use. Machine types props can't magically make all backend
config scenarios migratable. We need to be able to interrogate
backends at the time migration is required.

> If we go with a list of devices in the migration parameters, to me it'll
> only be a way to workaround the missing of such capability of net backends.
> Meanwhile, the admin will need to manage the list of devices even if the
> admin doesn't really needed to, IMHO.

We shouldn't need to list devices in every scenario. We need to focus on
the internal API design. We need to have suitable APIs exposed by backends
to allow us to query migratability and process vmstate a mere property
'backend-transfer' is insufficient, whether set by QEMU code, or set by
the mgmt app.

If we have proper APIs each device should be able to query whether its
backend can be transferred, and so "do the right thing" if backend
transfer is requested by migration. The ability to list devices in the
migrate command is only needed to be able to exclude some backends if
the purpose of migration is to change a backend

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Fri, Oct 17, 2025 at 09:10:38AM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 04:28:10PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > > Errm, machine types apply to devices, but this is about transferring
> > > backends which are outside the scope of machine types.
> >
> > Ah.. I didn't notice that net backends are not inherited by default from
> > qdev, hence not applicable to machine type properties.
> >
> > Is it possible we enable it somehow, so that backends can have compat
> > properties similarly to frontends?
>
> That is a technical limitation, but the problem here is bigger than
> just the lack of qdev. It is a conceptual one - where a device is
> implemented, its behaviour is determined exclusively by the QEMU
> code. There are some rare exceptions, like host PCI device assignment
> where functionality is partly in the host hardware, or external
> device backends where impl is offloaded to an external process, but
> most pure QEMU impls are able to be made always migratable and compat
> can be easily ensured long term via machine types props.
>
> With backends, alot of behaviour is offloaded to either the host
> OS, or to external libraries or services. Certain narrow configs
> may be able to transfer state, but there will always be configs
> were state transfer is impossible. There can be no coarse rule
> that a backend is migratable or not - it will usually be highly
> dependent on the particular configuration choices of the backend
> in use. Machine types props can't magically make all backend
> config scenarios migratable. We need to be able to interrogate
> backends at the time migration is required.

I believe we have similar things already, like USO, which relies on the
kernel feature set that QEMU runs on.

What we do right now, afaiu, is we make it a per-device property ON/OFF.
Then when unknown remote information is required, we make it ON/OFF/AUTO.
When it's AUTO, it may prefer ON and probe the kernel, dynamically decide
the value on realize. I didn't check the code if it's explicitly done like
that, but I think that's doable at least when a backend relies on such
remote information.

>
> > If we go with a list of devices in the migration parameters, to me it'll
> > only be a way to workaround the missing of such capability of net backends.
> > Meanwhile, the admin will need to manage the list of devices even if the
> > admin doesn't really needed to, IMHO.
>
> We shouldn't need to list devices in every scenario. We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
>
> If we have proper APIs each device should be able to query whether its
> backend can be transferred, and so "do the right thing" if backend
> transfer is requested by migration. The ability to list devices in the
> migrate command is only needed to be able to exclude some backends if
> the purpose of migration is to change a backend

IIUC, it is a proposal of using exclude-list, which should in most cases be
empty. Yes, I agree it's at least better than query all the devices and
having mgmt specify each backend to enable backend-transfer.

However IIUC it also means the query API will be internal, so that
migration will need to be able to query that from device. Then we have
similar issue on what happens if we migrate from a new QEMU to an old QEMU,
that new QEMU (when migration module queries TAP) reports per-device ON,
however it won't actually work because dest QEMU is OFF. IOW, we're still
missing the functionality that we leverage from machine type properties..

Or if we make the query to be visible to QMP / mgmt, then it'll at least
need to be a include-list, not exclude-list. Then, we're literally
bypassing the machine type versioning mechanism, offloading all these to
mgmt.

It should work, which I agree. But it also means we're reinventing the
wheel of what machine type properties were designed for... because if we
expose all these caps on all devices (as long as mutable after device
realize), we do not need machine type properties anymore. They're
fundamentally solving the same problem, IMHO, on providing a working value
for migration no matter what the dest QEMU binary is.

Thanks,

--
Peter Xu
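
The ON/OFF/AUTO idea described above can be illustrated with a small model. This is a sketch under stated assumptions, not QEMU code: the `OnOffAuto` enum merely mirrors the three values named in the message, `resolve_at_realize` is a hypothetical function, and `host_supports` stands in for whatever probe of the kernel a real device would do.

```python
from enum import Enum


class OnOffAuto(Enum):
    ON = "on"
    OFF = "off"
    AUTO = "auto"


def resolve_at_realize(prop, host_supports):
    """Turn a tri-state property into a final boolean at realize time.

    AUTO prefers ON but falls back to OFF when the host probe fails;
    an explicit ON that the host cannot honour is an error.
    """
    if prop is OnOffAuto.AUTO:
        return host_supports
    if prop is OnOffAuto.ON and not host_supports:
        raise RuntimeError("feature requested but not supported by host")
    return prop is OnOffAuto.ON


print(resolve_at_realize(OnOffAuto.AUTO, host_supports=True))
print(resolve_at_realize(OnOffAuto.OFF, host_supports=True))
```

The model only resolves the value on one side. It does not address the new-to-old migration case raised above, where the source resolves to ON but the destination cannot accept it.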
On 17.10.25 11:10, Daniel P. Berrangé wrote:
>> If we go with a list of devices in the migration parameters, to me it'll
>> only be a way to workaround the missing of such capability of net backends.
>> Meanwhile, the admin will need to manage the list of devices even if the
>> admin doesn't really needed to, IMHO.
> We shouldn't need to list devices in every scenario. We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.

I now imagine the following:

I already need an additional .pre_incoming migration handler for the
feature, see patch
[PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler .

I can add a boolean backend_transfer parameter to that handler, so that
it informs the device, that it should get the backend state from the
migration stream. And that's a good point to fail, if device doesn't
support backend transfer in current configuration.

If so, it seems logical to add symmetrical .pre_outgoing() vmsd handler,
with same backend_transfer parameter, to inform source devices (or get
errors from them).

Or, otherwise, make a separate VMSD handler .supports_backend_transfer(),
which should be called at start of incoming and outgoing migrations to
check the specified list of IDs, as well as we can also call it on
migrate-set-parameters, to get an earlier failure. And keep the devices to
call some migrate_backend_transfer(dev), to understand, should they do
backend-transfer or not (like in a diff, which I've sent yesterday in this
thread).

--
Best regards,
Vladimir
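
The second variant above (a separate `.supports_backend_transfer()` handler checked against the list of IDs) might look like the following model. It is an illustration only: the real handlers would be C callbacks in the VMSD, and `Device` and `check_backend_transfer_list` are hypothetical names chosen for this sketch.

```python
class Device:
    """Toy stand-in for a device with a VMSD-level capability handler."""

    def __init__(self, dev_id, can_transfer):
        self.dev_id = dev_id
        self._can_transfer = can_transfer

    def supports_backend_transfer(self):
        # A real device would inspect its current backend configuration.
        return self._can_transfer


def check_backend_transfer_list(devices, ids):
    """Validate a list of IDs early, e.g. at migrate-set-parameters time.

    Returns a list of error strings; an empty list means the check passed.
    """
    by_id = {dev.dev_id: dev for dev in devices}
    errors = []
    for dev_id in ids:
        dev = by_id.get(dev_id)
        if dev is None:
            errors.append(f"{dev_id}: no such device")
        elif not dev.supports_backend_transfer():
            errors.append(f"{dev_id}: backend-transfer not supported")
    return errors


devices = [Device("vnet.0", True), Device("vnet.1", False)]
print(check_backend_transfer_list(devices, ["vnet.0", "vnet.1", "vnet.2"]))
```

Running the same check at the start of both incoming and outgoing migration, as suggested, would let either side reject a list the other accepted.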
On 17.10.25 11:10, Daniel P. Berrangé wrote:
>> Meanwhile, the admin will need to manage the list of devices even if the
>> admin doesn't really needed to, IMHO.
> We shouldn't need to list devices in every scenario.
Do you mean, we may make union,
backend-transfer = true | false | [list of IDs]
Where true means, enable backend-transfer for all supporting devices?
So that normally, we'll not list all devices, but just set it to true?
But this way, migration will fail if the target version doesn't support
backend-transfer for some of the devices in use, or supports it for some
other device where the source lacks the support. So that's a way to create
a situation where two QEMUs, with the same device options, same machine
types, same configurations and same migration parameters / capabilities,
define incompatible migration states.
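To make the union concrete, here is a minimal sketch of how a
true | false | [list of IDs] value could be evaluated per device. All names
(BackendTransferMode, BackendTransferParam, backend_transfer_wanted) are
hypothetical, not existing QEMU API, and the 'supported' argument stands in
for whatever internal check a device would provide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of: backend-transfer = true | false | [IDs] */
typedef enum {
    BACKEND_TRANSFER_OFF,  /* false */
    BACKEND_TRANSFER_ALL,  /* true: every device that supports it */
    BACKEND_TRANSFER_LIST, /* explicit list of device IDs */
} BackendTransferMode;

typedef struct {
    BackendTransferMode mode;
    const char *const *ids; /* used only for BACKEND_TRANSFER_LIST */
    size_t n_ids;
} BackendTransferParam;

/*
 * Decide whether the device with the given ID should transfer its backend.
 * 'supported' is what this particular QEMU binary reports for the device.
 */
bool backend_transfer_wanted(const BackendTransferParam *p,
                             const char *id, bool supported)
{
    assert(p && id);

    switch (p->mode) {
    case BACKEND_TRANSFER_OFF:
        return false;
    case BACKEND_TRANSFER_ALL:
        /* The answer depends on the binary, not only on the parameter. */
        return supported;
    case BACKEND_TRANSFER_LIST:
        for (size_t i = 0; i < p->n_ids; i++) {
            if (strcmp(p->ids[i], id) == 0) {
                /* Explicitly requested; the caller fails if unsupported. */
                return true;
            }
        }
        return false;
    }
    return false;
}
```

With the "all" mode, source and target can compute different answers for
the same device when their binaries differ, which is exactly the
incompatibility described above; with an explicit list, both sides get the
same answer or fail loudly.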
> We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
>
> If we have proper APIs each device should be able to query whether its
> backend can be transferred, and so "do the right thing" if backend
> transfer is requested by migration. The ability to list devices in the
> migrate command is only needed to be able to exclude some backends if
> the purpose of migration is to change a backend
--
Best regards,
Vladimir
On Fri, Oct 17, 2025 at 11:26:59AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 17.10.25 11:10, Daniel P. Berrangé wrote:
> > > Meanwhile, the admin will need to manage the list of devices even if the
> > > admin doesn't really needed to, IMHO.
> > We shouldn't need to list devices in every scenario.
>
> Do you mean, we may make union,
>
> backend-transfer = true | false | [list of IDs]
>
> Where true means, enable backend-transfer for all supporting devices?
> So that normally, we'll not list all devices, but just set it to true?

Well I was thinking separate parameters

  backend-transfer: bool
  backend-transfer-devices: [str] (optional list of IDs)

but it amounts to the same thing

> But this way, migration will fail, if target version doesn't support
> backend-transfer for some of used devices, or support for some
> another, where source lack the support. So that's a way to create a
> situation, where two QEMUs, with same device options, same machine
> types, same configurations and same migration parameters / capabilities
> define incompatible migration states..

It is worse - the backend on both sides may support transfer,
but may none the less be incompatible due to changed configuration,
so this needs mgmt app input too.

The challenge we have is that whether or not a backend supports
transfer requires fairly detailed knowledge of QEMU and the specific
configuration of the backend. It is pretty undesirable for mgmt
apps to have that knowledge, as the matrix of possibilities
is quite large and liable to change over time.

If we consider 'backend transfer' to be a performance optimization,
then really we want QEMU to "do the right thing" as much as is
possible.

Source and dst QEMUs don't have a bi-directional channel though,
so they can't negotiate the common subset of backends they both
support - it'll need help from the mgmt app.

One possibility is a new QMP command "query-migratable-backends"
which lists all device IDs, whose current backend configuration
is reporting the ability to transfer state. The mgmt app could
run that on both sides of the migration, take the intersection
of the two lists, and then further subtract any devices where
it has deliberately changed the backend configuration on the dst.

If we had that, then we could always pass the ID list to the
migrate command, while also avoiding hardcoding knowledge of
QEMU backend impl details - it would largely "just work".

> > We need to focus on
> > the internal API design. We need to have suitable APIs exposed by backends
> > to allow us to query migratability and process vmstate a mere property
> > 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> > the mgmt app.
> >
> > If we have proper APIs each device should be able to query whether its
> > backend can be transferred, and so "do the right thing" if backend
> > transfer is requested by migration. The ability to list devices in the
> > migrate command is only needed to be able to exclude some backends if
> > the purpose of migration is to change a backend

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On 17.10.25 11:50, Daniel P. Berrangé wrote:
> On Fri, Oct 17, 2025 at 11:26:59AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 17.10.25 11:10, Daniel P. Berrangé wrote:
>>>> Meanwhile, the admin will need to manage the list of devices even if the
>>>> admin doesn't really needed to, IMHO.
>>> We shouldn't need to list devices in every scenario.
>>
>> Do you mean, we may make union,
>>
>> backend-transfer = true | false | [list of IDs]
>>
>> Where true means, enable backend-transfer for all supporting devices?
>> So that normally, we'll not list all devices, but just set it to true?
>
> Well I was thinking separate parameters
>
> backend-transfer: bool
> backend-transfer-devices: [str] (optional list of IDs)
>
> but it amounts to the same thing
>
>> But this way, migration will fail, if target version doesn't support
>> backend-transfer for some of used devices, or support for some
>> another, where source lack the support. So that's a way to create a
>> situation, where two QEMUs, with same device options, same machine
>> types, same configurations and same migration parameters / capabilities
>> define incompatible migration states..
>
> It is worse - the backend on both sides may support transfer,
> but may none the less be incompatible due to changed configuration,
> so this needs mgmt app input too.
>
> The challenge we have is that whether or not a backend supports
> transfer requires fairly detailed know of QEMU and the specific
> configuration of the backend. It is pretty undesirable for mgmt
> apps to have to that knowledge, as the matrix of possibilities
> is quite large and liable to change over time.
>
> If we consider 'backend transfer' to be a performance optimization,
> then really we want QEMU to "do the right thing" as much as is
> possible.
>
> Source and dst QEMUs don't have a bi-directional channel though,
> so they can't negotiate the common subset of backends they both
> support - it'll need help from the mgmt app.
As I heard from Peter, there are future plans to create such a channel:
https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
>
> One possibility is a new QMP command "query-migratable-backends"
> which lists all device IDs, whose current backend configuration
> is reporting the ability to transfer state. The mgmt app could
> run that on both sides of the migration, take the intersection
> of the two lists, and then further subtract any devices where
> it has delibrately changed the backend configuration on the dst.
>
> If we had that, then we could always pass the ID list to the
> migrate command, while also avoiding hardcoding knowledge of
> QEMU backend impl details - it would largely "just work".
Yes, "query + get intersection + set the list" works well for me.
It's abstract enough: the management app should not even care
what these IDs are.
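A minimal sketch of that management-side computation, assuming the two ID
lists have already been fetched from source and target (the proposed
"query-migratable-backends" command does not exist yet, and the function
names below are hypothetical). The IDs are treated as opaque strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if 'id' is present in 'list' (IDs are opaque strings). */
bool id_in_list(const char *id, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i], id) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Compute (src intersect dst) minus excluded.
 * src/dst:  IDs each side reports as able to transfer their backend.
 * excluded: IDs whose backend was deliberately changed on the target.
 * out:      caller-provided array with room for n_src entries.
 * Returns the number of IDs stored in 'out', in source order.
 */
size_t backend_transfer_ids(const char *const *src, size_t n_src,
                            const char *const *dst, size_t n_dst,
                            const char *const *excluded, size_t n_excl,
                            const char **out)
{
    size_t n_out = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n_src; i++) {
        if (id_in_list(src[i], dst, n_dst) &&
            !id_in_list(src[i], excluded, n_excl)) {
            out[n_out++] = src[i];
        }
    }
    return n_out;
}
```

The resulting list is what would be passed to the migrate command; the
quadratic lookup is fine for the handful of devices a VM normally has.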
And if the migration handshake is realized, that (like many other
parameters) may be simplified. We may finally have

backend-transfer = "off" | "auto" | [list of IDs]

where "auto" means exactly: negotiate with the target the maximal set
of devices for which we can do backend-transfer.
>
>>> We need to focus on
>>> the internal API design. We need to have suitable APIs exposed by backends
>>> to allow us to query migratability and process vmstate a mere property
>>> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
>>> the mgmt app.
>>>
>>> If we have proper APIs each device should be able to query whether its
>>> backend can be transferred, and so "do the right thing" if backend
>>> transfer is requested by migration. The ability to list devices in the
>>> migrate command is only needed to be able to exclude some backends if
>>> the purpose of migration is to change a backend
>
> With regards,
> Daniel
--
Best regards,
Vladimir
On 16.10.25 23:28, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
>> Errm, machine types apply to devices, but this is about transferring
>> backends which are outside the scope of machine types.
>
> Ah.. I didn't notice that net backends are not inherited by default from
> qdev, hence not applicable to machine type properties.
>
> Is it possible we enable it somehow, so that backends can have compat
> properties similarly to frontends?

But that would mean that we can't reconfigure a backend during live migration.

In my understanding, machine type properties are visible to the guest,
and that's why we can't change them for a running vm, even during live
migration.

Bringing in another type of properties, which we _can_ change for
a running vm (even if changing them is not very comfortable for the admin),
would be like tying our own hands.

And yes, there is a way to change any property by qom-set. But it
lies outside the paradigm of machine types, and normally we can't change
most properties in flight.


Or in other words: if we _can_ get by with only migration parameters,
that actually shows that what we are talking about is definitely a
property of migration, not a property of a device.


And a final note: if we can use one mechanism instead of two mechanisms,
it makes the architecture twice as simple. Trying to get by with _only_
device properties would mean running a bunch of qom-set commands before
every migration (as we have to distinguish local and remote migrations
anyway), which looks bad. On the other hand, getting by with _only_ a
migration parameter is feasible and looks better.


And a very final note: with a global parameter + per-device parameters,
the global parameter actually becomes a workaround for the fact that we
don't want to run a bunch of qom-set commands. So the global parameter is
an additional API to hide the inconvenience of the main API.

>
> If we go with a list of devices in the migration parameters, to me it'll
> only be a way to workaround the missing of such capability of net backends.
> Meanwhile, the admin will need to manage the list of devices even if the
> admin doesn't really needed to, IMHO.
>
> Thanks,
>

--
Best regards,
Vladimir
On Fri, Oct 17, 2025 at 09:51:26AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 23:28, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > > Errm, machine types apply to devices, but this is about transferring
> > > backends which are outside the scope of machine types.
> >
> > Ah.. I didn't notice that net backends are not inherited by default from
> > qdev, hence not applicable to machine type properties.
> >
> > Is it possible we enable it somehow, so that backends can have compat
> > properties similarly to frontends?
>
> But that would mean, that we can't reconfigure a backend during live migration.
>
> In my understanding, machine type properties are visible to the guest,
> and that's why we can't change them for running vm, even during live
> migration.

IIUC machine type properties may or may not be visible to the guest. It
should depend on whether it is relevant to a guest-visible behavior. Here
a flag showing "whether TAP, as a backend, can migrate" shouldn't be
exposed to guest.

I was indeed expecting that one will need to qom-set it for each device if
you want to get rid of versioned machine types. It's not ideal interfacing
as what Dan was looking for, but it should still work so far, and I think
it might still be fair if it's only needed without machine type
versionings.

>
> Bringing here another type of properties, which we _can_ change for
> running vm (even if changing is not very comfortable for admin), will
> be like tying ourselves hands.
>
> And yes, there is a way to change any properties by qom-set. But it
> lays out of paradigm of machine types, and normally we can't change
> most of properties in flight.
>
>
> Or in other words: if we _can_ go on only with migration parameters,
> that actually shows, that what we are talking about is definitely
> property of migration, not property of device.
>
>
> And final note: if we can use one mechanism instead of two mechanisms,
> it makes the architecture twice simpler. Trying to go on with _only_
> device properties would mean run a bench of qom-set commands before
> every migration (as we have to distinguish local and remote migrations
> anyway), that looks bad. On the other hand, go on with _only_ migration
> parameter is feasible and looks better.
>
>
> And very final note: making global parameter + per-device parameters,
> actually, global parameter become a workaround to the fact that we
> don't want run a bench of qom-set commands. So, global parameter is
> an additional API to hide inconvenience of the main API.

IMHO it's not a workaround. To me, it's a better way of abstraction,
because the migration side provides the capability of passing FDs, and
whatever is generic about that should be attached to the global knob.
Migration shouldn't care about behavior or attributes of a specific device.
Listing the devices in any way in migration's QAPI is a workaround instead.

But I agree I do not know whether it's easy to have net backends support
machine types properties. I think it still makes sense logically that a
net backend is a TYPE_DEVICE, even if it's a backend device which is not
directly visible to the guest.

Thanks,

--
Peter Xu
On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > transfer the backend, we have to initialize new backend on another host.
> > > > >
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > as is.
> > > > >
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > >
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > >
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > >
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> >
> > I was not seeing it like that.
> >
> > I was treating per-device parameter to be a flag showing whether the device
> > is capable of passing over FDs, which is more like a device attribute.

Whether a backend is technically capable of transfer shouldn't require a
user specified property - there should be an internal API to query whether
the current backend configuration is transferrable or not, based on the
code implementation. Allowing a mgmt app to specify this can only lead
to mistakes, because they don't know the internal constraints of the
implementation.

The mgmt app should only be concerned with whether they want to transfer
a backend or not which is a time-of-use decision rather than launch time
decision.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Oct 16, 2025 at 08:19:37PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > >
> > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > as is.
> > > > > >
> > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > >
> > > > > This implies that you're changing 'backend-transfer' against the
> > > > > device at time of each migration.
> > > > >
> > > > > This takes us back to the situation we've had historically where the
> > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > away from that model by passing everything as parameters to the
> > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > global state properties changing migration behaviour.
> > > > >
> > > > > This 'backend-transfer' device property is not really a device property,
> > > > > it is an indirect parameter to the 'migrate' command.
> > >
> > > I was not seeing it like that.
> > >
> > > I was treating per-device parameter to be a flag showing whether the device
> > > is capable of passing over FDs, which is more like a device attribute.
>
> Whether a backend is technically capable of transfer shouldn't require a
> user specified property - there should be an internal API to query whether
> the current backend configuration is transferrable or not, based on the
> code implementation. Allowing a mgmt app to specify this can only lead
> to mistakes, because they don't know the internal constraints of the
> implementation.
>
> The mgmt app should only be concerned with whether they want to transfer
> a backend or not which is a time-of-use decision rather than launch time
> decision.

IMHO the per-device property, when available, should always mean it fully
support the feature, when it is turned ON.

I also think above statement matches exactly how I see it.. I never
expected mgmt to toggle the per-device properties, as I just left similar
statements in another reply. That's also why I think the global
backend-transfer should be the only thing exposed to mgmt.

So even if the device properties would exist, they should only be used in
compat properties for the upstream QEMUs. They're still needed, and be
helpful when other devices introduce some similar concepts to support fd
passover, then on some machine types when the global feature enabled, QEMU
will automatically do fd-pass for some devices and some not, based on the
machine type.

Thanks,

--
Peter Xu
On Thu, Oct 16, 2025 at 03:39:03PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:19:37PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > > >
> > > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > > as is.
> > > > > > >
> > > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > > >
> > > > > > This implies that you're changing 'backend-transfer' against the
> > > > > > device at time of each migration.
> > > > > >
> > > > > > This takes us back to the situation we've had historically where the
> > > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > > away from that model by passing everything as parameters to the
> > > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > > global state properties changing migration behaviour.
> > > > > >
> > > > > > This 'backend-transfer' device property is not really a device property,
> > > > > > it is an indirect parameter to the 'migrate' command.
> > > >
> > > > I was not seeing it like that.
> > > >
> > > > I was treating per-device parameter to be a flag showing whether the device
> > > > is capable of passing over FDs, which is more like a device attribute.
> >
> > Whether a backend is technically capable of transfer shouldn't require a
> > user specified property - there should be an internal API to query whether
> > the current backend configuration is transferrable or not, based on the
> > code implementation. Allowing a mgmt app to specify this can only lead
> > to mistakes, because they don't know the internal constraints of the
> > implementation.
> >
> > The mgmt app should only be concerned with whether they want to transfer
> > a backend or not which is a time-of-use decision rather than launch time
> > decision.
>
> IMHO the per-device property, when available, should always mean it fully
> support the feature, when it is turned ON.

That can't be expressed in a property in the device.

Consider the virtio-net device. The backend transfer is only possible if
the virtio-net is associated with a netdev using the vhost-user backend,
and the vhost-user backend must be using a chardev with a socket backend,
and the socket backend must not have TLS or websockets enabled.

Migratability of the backend requires an API against the NetClientInfo
object, which will in turn require calling out to an API against the
Chardev object.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
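The chain of conditions in the virtio-net example above can be sketched as
a layered query, where the net client delegates to its chardev. This is an
illustration only: the Sketch* types and functions are hypothetical
stand-ins, not the real NetClientInfo or Chardev APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a chardev and its relevant configuration. */
typedef enum { SKETCH_CHR_SOCKET, SKETCH_CHR_OTHER } SketchChardevKind;

typedef struct {
    SketchChardevKind kind;
    bool tls;       /* TLS enabled on the socket */
    bool websocket; /* websocket framing enabled on the socket */
} SketchChardev;

/* Hypothetical stand-in for a net client (netdev backend). */
typedef enum { SKETCH_NET_VHOST_USER, SKETCH_NET_OTHER } SketchNetKind;

typedef struct {
    SketchNetKind kind;
    const SketchChardev *chr; /* may be NULL if no chardev is attached */
} SketchNetClient;

/* Chardev layer: only a plain socket without TLS/websockets qualifies. */
bool sketch_chardev_can_transfer(const SketchChardev *chr)
{
    return chr != NULL && chr->kind == SKETCH_CHR_SOCKET &&
           !chr->tls && !chr->websocket;
}

/* Net layer: requires vhost-user and defers to the chardev layer. */
bool sketch_net_can_transfer(const SketchNetClient *nc)
{
    assert(nc != NULL);
    return nc->kind == SKETCH_NET_VHOST_USER &&
           sketch_chardev_can_transfer(nc->chr);
}
```

Because the answer depends on the whole configuration chain, it is computed
on demand rather than stored as a static per-device flag.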
On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> On 15.10.25 23:07, Peter Xu wrote:
>>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>>> which support it will migrate their backends (for example a TAP
>>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>>
>>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>>
>>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>>> TAP backend.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>>> ---
>>>>>
>>>>> [..]
>>>>>
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -951,9 +951,16 @@
>>>>>>> # is @cpr-exec. The first list element is the program's filename,
>>>>>>> # the remainder its arguments. (Since 10.2)
>>>>>>> #
>>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>>> +# supports it. In general that means that backend state and its
>>>>>>> +# file descriptors are passed to the destination in the migraton
>>>>>>> +# channel (which must be a UNIX socket). Individual devices
>>>>>>> +# declare the support for backend-transfer by per-device
>>>>>>> +# backend-transfer option. (Since 10.2)
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>>
>>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>>> just started to "fd-passing".
>>>>>>
>>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>>
>>>>> Reasonable.
>>>>>
>>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>>> (reasonable too):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>>
>>>>> (still, they were against my "fds" name for the parameter, which is
>>>>> really too generic, fd-passing is not)
>>>>>
>>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>>
>>>>>
>>>>>>
>>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>>
>>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>>
>>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>>> word here.
>>>>>
>>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>>> not change your mind.
>>>>
>>>> Ah, I didn't notice the name has been discussed.
>>>>
>>>> I think it means you can vote for your own preference now because we have
>>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>>> something better than both.
>>>>
>>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>>> I would strongly request for. The other thing is the unix socket check, it
>>>> looks all good below now with it, thanks. No strong feelings on the names.
>>>>
>>>
>>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>>> it's more meaningful for the user:
>>>
>>> If we call it "fd-passing", user may ask:
>>>
>>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>>> is it unix socket or not, and pass any fds it wants if it is?
>>>
>>> Logical question is, why not just drop the global capability, and check only
>>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>>> as it may be some complex tunneling, which includes unix sockets, but still
>>> can't pass fds, but I think now about feature naming)
>>>
>>> But we really want an explicit switch for the feature. As qemu-update is
>>> not the only case of local migration. The another case is changing the
>>> backend. So for the user's choice is:
>>>
>>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>>> transfer the backend, we have to initialize new backend on another host.
>>>
>>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>>> extra actions: use "backend-transfer", exactly to keep the backends
>>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>>> as is.
>>>
>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>> to another backend. We disable "backend-transfer" for one device.
>>
>> This implies that you're changing 'backend-transfer' against the
>> device at time of each migration.
>>
>> This takes us back to the situation we've had historically where the
>> behaviour of migration depends on global properties the mgmt app has
>> set prior to the 'migrate' command being run. We've just tried to get
>> away from that model by passing everything as parameters to the
>> migrate command, so I'm loathe to see us invent a new way to have
>> global state properties changing migration behaviour.
>>
>> This 'backend-transfer' device property is not really a device property,
>> it is an indirect parameter to the 'migrate' command.
>>
>> Ergo, if we need the ability to selectively migrate the backend state
>> of individal devices, then instead of a property on the device, we
>> should pass a list of device IDs as a parameter to the migrate
>> command in QMP.
>
> Understand.
>
> So, it will look like
>
> # @backend-transfer: List of devices IDs or QOM paths, to enable
> # backend-transfer for. In general that means that backend
> # states and their file descriptors are passed to the destination
> # in the migration channel (which must be a UNIX socket), and
> # management tool doesn't have to configure new backends for
> # target QEMU (like vhost-user server, or TAP device in the kernel).
> # Default is no backend-transfer migration (Since 10.2)
>
>
> Peter, is it OK for you?
>
>

Or, maybe, we can just continue with two simple experimental boolean
parameters, @backend-transfer-vhost-user-blk and
@backend-transfer-virtio-net-tap, and not bother implementing a good final
complex API while it's unstable anyway?

--
Best regards,
Vladimir
On Thu, Oct 16, 2025 at 01:38:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 15.10.25 23:07, Peter Xu wrote:
> > > > > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > On 15.10.25 21:19, Peter Xu wrote:
> > > > > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > > This parameter enables backend-transfer feature: all devices
> > > > > > > > which support it will migrate their backends (for example a TAP
> > > > > > > > device, by passing open file descriptor to migration channel).
> > > > > > > >
> > > > > > > > Currently no such devices, so the new parameter is a noop.
> > > > > > > >
> > > > > > > > Next commit will add support for virtio-net, to migrate its
> > > > > > > > TAP backend.
> > > > > > > >
> > > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > > > > ---
> > > > > >
> > > > > > [..]
> > > > > >
> > > > > > > > --- a/qapi/migration.json
> > > > > > > > +++ b/qapi/migration.json
> > > > > > > > @@ -951,9 +951,16 @@
> > > > > > > >  #     is @cpr-exec. The first list element is the program's filename,
> > > > > > > >  #     the remainder its arguments. (Since 10.2)
> > > > > > > >  #
> > > > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > > > +#     supports it. In general that means that backend state and its
> > > > > > > > +#     file descriptors are passed to the destination in the migraton
> > > > > > > > +#     channel (which must be a UNIX socket). Individual devices
> > > > > > > > +#     declare the support for backend-transfer by per-device
> > > > > > > > +#     backend-transfer option. (Since 10.2)
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > > > > TAP doesn't have its own VMSD to transfer?
> > > > > > >
> > > > > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > > > > suitable there, because it used to "transfer backend" already, now it's
> > > > > > > just started to "fd-passing".
> > > > > > >
> > > > > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > > > >
> > > > > > Reasonable.
> > > > > >
> > > > > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > > > > (reasonable too):
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > > > >
> > > > > > (still, they were against my "fds" name for the parameter, which is
> > > > > > really too generic, fd-passing is not)
> > > > > >
> > > > > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > In general, I think "fd" is really a core concept of this whole thing.
> > > > > >
> > > > > > I think, we can call "backend" any external object, linked by the fd.
> > > > > >
> > > > > > Still, backend/frontend terminology is so misleading, when applied to
> > > > > > complex systems (for me, at least), that I don't really like "-backend"
> > > > > > word here.
> > > > > >
> > > > > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > > > > not change your mind.
> > > > >
> > > > > Ah, I didn't notice the name has been discussed.
> > > > >
> > > > > I think it means you can vote for your own preference now because we have
> > > > > one vote for each. :) Let's also see whether Fabiano will come up with
> > > > > something better than both.
> > > > >
> > > > > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > > > > I would strongly request for. The other thing is the unix socket check, it
> > > > > looks all good below now with it, thanks. No strong feelings on the names.
> > > > >
> > > >
> > > > After a bit more thinking, I leaning towards keeping backend-transfer. I think
> > > > it's more meaningful for the user:
> > > >
> > > > If we call it "fd-passing", user may ask:
> > > >
> > > > Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> > > > supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> > > > is it unix socket or not, and pass any fds it wants if it is?
> > > >
> > > > Logical question is, why not just drop the global capability, and check only
> > > > is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> > > > as it may be some complex tunneling, which includes unix sockets, but still
> > > > can't pass fds, but I think now about feature naming)
> > > >
> > > > But we really want an explicit switch for the feature. As qemu-update is
> > > > not the only case of local migration. The another case is changing the
> > > > backend. So for the user's choice is:
> > > >
> > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > transfer the backend, we have to initialize new backend on another host.
> > > >
> > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > as is.
> > > >
> > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > to another backend. We disable "backend-transfer" for one device.
> > >
> > > This implies that you're changing 'backend-transfer' against the
> > > device at time of each migration.
> > >
> > > This takes us back to the situation we've had historically where the
> > > behaviour of migration depends on global properties the mgmt app has
> > > set prior to the 'migrate' command being run. We've just tried to get
> > > away from that model by passing everything as parameters to the
> > > migrate command, so I'm loathe to see us invent a new way to have
> > > global state properties changing migration behaviour.
> > >
> > > This 'backend-transfer' device property is not really a device property,
> > > it is an indirect parameter to the 'migrate' command.
> > >
> > > Ergo, if we need the ability to selectively migrate the backend state
> > > of individal devices, then instead of a property on the device, we
> > > should pass a list of device IDs as a parameter to the migrate
> > > command in QMP.
> >
> > Understand.
> >
> > So, it will look like
> >
> > # @backend-transfer: List of devices IDs or QOM paths, to enable
> > #     backend-transfer for. In general that means that backend
> > #     states and their file descriptors are passed to the destination
> > #     in the migration channel (which must be a UNIX socket), and
> > #     management tool doesn't have to configure new backends for
> > #     target QEMU (like vhost-user server, or TAP device in the kernel).
> > #     Default is no backend-transfer migration (Since 10.2)
> >
> >
> > Peter, is it OK for you?
>
> Or, may be, we just can continue with two simple experimental boolean parameters:
>
> @backend-transfer-vhost-user-blk
>
> and
>
> @backend-transfer-virtio-net-tap
>
>
> and not care to implement good-final-complex-API, while it's unstable anyway?

Even if declared unstable, that still has a negative impact on the
internal code structure because it's putting special cases for certain
device types into the migration framework and the device code, with no
time limit on how long this technical debt will last.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
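
For illustration only, here is a minimal sketch of the list-based alternative raised in this thread, where the device IDs are passed as a parameter to the migrate command instead of per-device-type booleans. None of this is code from the series: StrNode, MigrateArgs and backend_transfer_enabled() are hypothetical names, and StrNode is only loosely modelled on a QAPI string list. The point it tries to show is that the check compares IDs only, so the migration code would carry no special case for any device type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string list (next pointer plus value). */
typedef struct StrNode {
    struct StrNode *next;
    const char *value;
} StrNode;

/* Hypothetical arguments passed along with one migrate command. */
typedef struct MigrateArgs {
    /* Device IDs whose backends should be transferred; NULL means none. */
    StrNode *backend_transfer;
} MigrateArgs;

/*
 * Generic check: returns true if dev_id appears in the list given for
 * this migration. Only IDs are compared, so nothing here depends on
 * the device type (virtio-net, vhost-user-blk, ...).
 */
bool backend_transfer_enabled(const MigrateArgs *args, const char *dev_id)
{
    assert(args != NULL && dev_id != NULL);

    for (const StrNode *n = args->backend_transfer; n != NULL; n = n->next) {
        if (strcmp(n->value, dev_id) == 0) {
            return true;
        }
    }
    return false;
}
```

Under these assumptions, a device would call backend_transfer_enabled() with its own ID when deciding whether to hand over its backend, and an empty list would give the "no backend-transfer" default from the proposed documentation.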