[RFC 1/6] migration: Add virtio-iterative capability

Jonah Palmer posted 6 patches 3 months, 3 weeks ago
[RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 3 months, 3 weeks ago
Adds a new migration capability 'virtio-iterative' that will allow
virtio devices, where supported, to iteratively migrate configuration
changes that occur during the migration process.

This capability is added to the validated capabilities list to ensure
both the source and destination support it before enabling.

The capability defaults to off to maintain backward compatibility.

To enable the capability via HMP:
(qemu) migrate_set_capability virtio-iterative on

To enable the capability via QMP:
{"execute": "migrate-set-capabilities", "arguments": {
     "capabilities": [
        { "capability": "virtio-iterative", "state": true }
     ]
  }
}

Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
---
 migration/savevm.c  | 1 +
 qapi/migration.json | 7 ++++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index bb04a4520d..40a2189866 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
     switch (capability) {
     case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
     case MIGRATION_CAPABILITY_MAPPED_RAM:
+    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
         return true;
     default:
         return false;
diff --git a/qapi/migration.json b/qapi/migration.json
index 4963f6ca12..8f042c3ba5 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -479,6 +479,11 @@
 #     each RAM page.  Requires a migration URI that supports seeking,
 #     such as a file.  (since 9.0)
 #
+# @virtio-iterative: Enable iterative migration for virtio devices, if
+#     the device supports it. When enabled, and where supported, virtio
+#     devices will track and migrate configuration changes that may
+#     occur during the migration process. (Since 10.1)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -498,7 +503,7 @@
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
            'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
-           'dirty-limit', 'mapped-ram'] }
+           'dirty-limit', 'mapped-ram', 'virtio-iterative'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.47.1
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Markus Armbruster 3 months, 1 week ago
I apologize for the lateness of my review.

Jonah Palmer <jonah.palmer@oracle.com> writes:

> Adds a new migration capability 'virtio-iterative' that will allow
> virtio devices, where supported, to iteratively migrate configuration
> changes that occur during the migration process.

Why is that desirable?

> This capability is added to the validated capabilities list to ensure
> both the source and destination support it before enabling.

What happens when only one side enables it?

> The capability defaults to off to maintain backward compatibility.
>
> To enable the capability via HMP:
> (qemu) migrate_set_capability virtio-iterative on
>
> To enable the capability via QMP:
> {"execute": "migrate-set-capabilities", "arguments": {
>      "capabilities": [
>         { "capability": "virtio-iterative", "state": true }
>      ]
>   }
> }
>
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  migration/savevm.c  | 1 +
>  qapi/migration.json | 7 ++++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/migration/savevm.c b/migration/savevm.c
> index bb04a4520d..40a2189866 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>      switch (capability) {
>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>      case MIGRATION_CAPABILITY_MAPPED_RAM:
> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>          return true;
>      default:
>          return false;
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 4963f6ca12..8f042c3ba5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -479,6 +479,11 @@
>  #     each RAM page.  Requires a migration URI that supports seeking,
>  #     such as a file.  (since 9.0)
>  #
> +# @virtio-iterative: Enable iterative migration for virtio devices, if
> +#     the device supports it. When enabled, and where supported, virtio
> +#     devices will track and migrate configuration changes that may
> +#     occur during the migration process. (Since 10.1)

When and why should the user enable this?

What exactly do you mean by "where supported"?

docs/devel/qapi-code-gen.rst:

    For legibility, wrap text paragraphs so every line is at most 70
    characters long.

    Separate sentences with two spaces.

> +#
>  # Features:
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -498,7 +503,7 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit', 'mapped-ram'] }
> +           'dirty-limit', 'mapped-ram', 'virtio-iterative'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 3 months ago

On 8/8/25 6:48 AM, Markus Armbruster wrote:
> I apologize for the lateness of my review.
> 
> Jonah Palmer <jonah.palmer@oracle.com> writes:
> 
>> Adds a new migration capability 'virtio-iterative' that will allow
>> virtio devices, where supported, to iteratively migrate configuration
>> changes that occur during the migration process.
> 
> Why is that desirable?
> 

To be frank, I wasn't sure if having a migration capability, or even 
having it toggleable at all, would be desirable or not. It appears 
though that this might be better off as a per-device feature set via
--device virtio-net-pci,iterative-mig=on,..., for example.
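
Just to illustrate the idea (this is not code from this series, and
the 'iterative_mig' field name is made up), such a knob would follow
the usual boolean device property pattern in hw/net/virtio-net.c:

    /* hypothetical per-device property instead of a migration
     * capability; default off to match the capability's current
     * default, toggled on the command line with
     * -device virtio-net-pci,iterative-mig=on,... */
    DEFINE_PROP_BOOL("iterative-mig", VirtIONet, iterative_mig, false),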

And by "iteratively migrate configuration changes" I meant more along 
the lines of the device's state as it continues running on the source. 
But perhaps actual configuration changes (e.g. changing the number of 
queue pairs) could also be supported mid-migration like this?

>> This capability is added to the validated capabilities list to ensure
>> both the source and destination support it before enabling.
> 
> What happens when only one side enables it?
> 

The migration stream breaks if only one side enables it.

This is poor wording on my part, my apologies. I don't think it's even 
possible to know the capabilities between the source & destination.

>> The capability defaults to off to maintain backward compatibility.
>>
>> To enable the capability via HMP:
>> (qemu) migrate_set_capability virtio-iterative on
>>
>> To enable the capability via QMP:
>> {"execute": "migrate-set-capabilities", "arguments": {
>>       "capabilities": [
>>          { "capability": "virtio-iterative", "state": true }
>>       ]
>>    }
>> }
>>
>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>> ---
>>   migration/savevm.c  | 1 +
>>   qapi/migration.json | 7 ++++++-
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index bb04a4520d..40a2189866 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>>       switch (capability) {
>>       case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>>       case MIGRATION_CAPABILITY_MAPPED_RAM:
>> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>>           return true;
>>       default:
>>           return false;
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 4963f6ca12..8f042c3ba5 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -479,6 +479,11 @@
>>   #     each RAM page.  Requires a migration URI that supports seeking,
>>   #     such as a file.  (since 9.0)
>>   #
>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>> +#     the device supports it. When enabled, and where supported, virtio
>> +#     devices will track and migrate configuration changes that may
>> +#     occur during the migration process. (Since 10.1)
> 
> When and why should the user enable this?
> 

Well if all goes according to plan, always (at least for virtio-net). 
This should improve the overall speed of live migration for a virtio-net 
device (and vhost-net/vhost-vdpa).

> What exactly do you mean by "where supported"?
> 

I meant if both the source's QEMU and the destination's QEMU support 
it, as well as other virtio devices in the future if they decide to 
implement iterative migration (e.g. a more general "enable iterative 
migration for virtio devices").

But I think for now this is better left as a virtio-net configuration 
rather than as a migration capability (e.g. --device 
virtio-net-pci,iterative-mig=on/off,...)

> docs/devel/qapi-code-gen.rst:
> 
>      For legibility, wrap text paragraphs so every line is at most 70
>      characters long.
> 
>      Separate sentences with two spaces.
> 

Ack - thank you.

>> +#
>>   # Features:
>>   #
>>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>> @@ -498,7 +503,7 @@
>>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>              'validate-uuid', 'background-snapshot',
>>              'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>> -           'dirty-limit', 'mapped-ram'] }
>> +           'dirty-limit', 'mapped-ram', 'virtio-iterative'] }
>>   
>>   ##
>>   # @MigrationCapabilityStatus:
>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Markus Armbruster 2 months, 3 weeks ago
Please excuse the delay, I was on vacation.

Jonah Palmer <jonah.palmer@oracle.com> writes:

> On 8/8/25 6:48 AM, Markus Armbruster wrote:
>> I apologize for the lateness of my review.

Late again: I was on vacation.

>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>> 
>>> Adds a new migration capability 'virtio-iterative' that will allow
>>> virtio devices, where supported, to iteratively migrate configuration
>>> changes that occur during the migration process.
>> 
>> Why is that desirable?
>
> To be frank, I wasn't sure if having a migration capability, or even 
> have it toggleable at all, would be desirable or not. It appears though 
> that this might be better off as a per-device feature set via
> --device virtio-net-pci,iterative-mig=on,..., for example.

See below.

> And by "iteratively migrate configuration changes" I meant more along 
> the lines of the device's state as it continues running on the source.

Isn't that what migration does always?

> But perhaps actual configuration changes (e.g. changing the number of 
> queue pairs) could also be supported mid-migration like this?

I don't know.

>>> This capability is added to the validated capabilities list to ensure
>>> both the source and destination support it before enabling.
>> 
>> What happens when only one side enables it?
>
> The migration stream breaks if only one side enables it.

How does it break?  Error message pointing out the misconfiguration?

> This is poor wording on my part, my apologies. I don't think it's even 
> possible to know the capabilities between the source & destination.
>
>>> The capability defaults to off to maintain backward compatibility.
>>>
>>> To enable the capability via HMP:
>>> (qemu) migrate_set_capability virtio-iterative on
>>>
>>> To enable the capability via QMP:
>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>       "capabilities": [
>>>          { "capability": "virtio-iterative", "state": true }
>>>       ]
>>>    }
>>> }
>>>
>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>> ---
>>>  migration/savevm.c  | 1 +
>>>  qapi/migration.json | 7 ++++++-
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>> index bb04a4520d..40a2189866 100644
>>> --- a/migration/savevm.c
>>> +++ b/migration/savevm.c
>>> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>>>      switch (capability) {
>>>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>>>      case MIGRATION_CAPABILITY_MAPPED_RAM:
>>> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>>>          return true;
>>>      default:
>>>          return false;
>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>> index 4963f6ca12..8f042c3ba5 100644
>>> --- a/qapi/migration.json
>>> +++ b/qapi/migration.json
>>> @@ -479,6 +479,11 @@
>>>  #     each RAM page.  Requires a migration URI that supports seeking,
>>>  #     such as a file.  (since 9.0)
>>>  #
>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>> +#     the device supports it. When enabled, and where supported, virtio
>>> +#     devices will track and migrate configuration changes that may
>>> +#     occur during the migration process. (Since 10.1)
>> 
>> When and why should the user enable this?
>
> Well if all goes according to plan, always (at least for virtio-net). 
> This should improve the overall speed of live migration for a virtio-net 
> device (and vhost-net/vhost-vdpa).

So the only use for "disabled" would be when migrating to or from an
older version of QEMU that doesn't support this.  Fair?

What's the default?

>> What exactly do you mean by "where supported"?
>
> I meant if both source's Qemu and destination's Qemu support it, as well 
> as for other virtio devices in the future if they decide to implement 
> iterative migration (e.g. a more general "enable iterative migration for 
> virtio devices").
>
> But I think for now this is better left as a virtio-net configuration 
> rather than as a migration capability (e.g. --device 
> virtio-net-pci,iterative-mig=on/off,...)

Makes sense to me (but I'm not a migration expert).

[...]
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 2 months, 3 weeks ago

On 8/25/25 8:44 AM, Markus Armbruster wrote:
> Please excuse the delay, I was on vacation.
> 
> Jonah Palmer <jonah.palmer@oracle.com> writes:
> 
>> On 8/8/25 6:48 AM, Markus Armbruster wrote:
>>> I apologize for the lateness of my review.
> 
> Late again: I was on vacation.
> 
>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>
>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>> virtio devices, where supported, to iteratively migrate configuration
>>>> changes that occur during the migration process.
>>>
>>> Why is that desirable?
>>
>> To be frank, I wasn't sure if having a migration capability, or even
>> have it toggleable at all, would be desirable or not. It appears though
>> that this might be better off as a per-device feature set via
>> --device virtio-net-pci,iterative-mig=on,..., for example.
> 
> See below.
> 
>> And by "iteratively migrate configuration changes" I meant more along
>> the lines of the device's state as it continues running on the source.
> 
> Isn't that what migration does always?
> 

Essentially yes, but today all of the state is only migrated at the end, 
once the source has been paused. So the final correct state is always 
sent to the destination.

If we're no longer waiting until the source has been paused and the 
initial state is sent early, then we need to make sure that any changes 
that happen are still communicated to the destination.

This RFC handles this by just re-sending the entire state again once the 
source has been paused. But of course this isn't optimal and I'm looking 
into how to better optimize this part.

>> But perhaps actual configuration changes (e.g. changing the number of
>> queue pairs) could also be supported mid-migration like this?
> 
> I don't know.
> 
>>>> This capability is added to the validated capabilities list to ensure
>>>> both the source and destination support it before enabling.
>>>
>>> What happens when only one side enables it?
>>
>> The migration stream breaks if only one side enables it.
> 
> How does it break?  Error message pointing out the misconfiguration?
> 

The destination VM is torn down and the source just reports that 
migration failed.

I don't believe the source/destination could be aware of the 
misconfiguration. IIUC the destination reads the migration stream and 
expects certain pieces of data in a certain order. If new data is added 
to the migration stream or the order has changed and the destination 
isn't expecting it, then the migration fails. It doesn't know exactly 
why, just that it read in data that it wasn't expecting.

>> This is poor wording on my part, my apologies. I don't think it's even
>> possible to know the capabilities between the source & destination.
>>
>>>> The capability defaults to off to maintain backward compatibility.
>>>>
>>>> To enable the capability via HMP:
>>>> (qemu) migrate_set_capability virtio-iterative on
>>>>
>>>> To enable the capability via QMP:
>>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>>        "capabilities": [
>>>>           { "capability": "virtio-iterative", "state": true }
>>>>        ]
>>>>     }
>>>> }
>>>>
>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>> ---
>>>>   migration/savevm.c  | 1 +
>>>>   qapi/migration.json | 7 ++++++-
>>>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>>> index bb04a4520d..40a2189866 100644
>>>> --- a/migration/savevm.c
>>>> +++ b/migration/savevm.c
>>>> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>>>>       switch (capability) {
>>>>       case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>>>>       case MIGRATION_CAPABILITY_MAPPED_RAM:
>>>> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>>>>           return true;
>>>>       default:
>>>>           return false;
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 4963f6ca12..8f042c3ba5 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -479,6 +479,11 @@
>>>>   #     each RAM page.  Requires a migration URI that supports seeking,
>>>>   #     such as a file.  (since 9.0)
>>>>   #
>>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>>> +#     the device supports it. When enabled, and where supported, virtio
>>>> +#     devices will track and migrate configuration changes that may
>>>> +#     occur during the migration process. (Since 10.1)
>>>
>>> When and why should the user enable this?
>>
>> Well if all goes according to plan, always (at least for virtio-net).
>> This should improve the overall speed of live migration for a virtio-net
>> device (and vhost-net/vhost-vdpa).
> 
> So the only use for "disabled" would be when migrating to or from an
> older version of QEMU that doesn't support this.  Fair?
> 

Correct.

> What's the default?
> 

Disabled.

>>> What exactly do you mean by "where supported"?
>>
>> I meant if both source's Qemu and destination's Qemu support it, as well
>> as for other virtio devices in the future if they decide to implement
>> iterative migration (e.g. a more general "enable iterative migration for
>> virtio devices").
>>
>> But I think for now this is better left as a virtio-net configuration
>> rather than as a migration capability (e.g. --device
>> virtio-net-pci,iterative-mig=on/off,...)
> 
> Makes sense to me (but I'm not a migration expert).
> 
> [...]
>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Markus Armbruster 2 months, 3 weeks ago
Jonah Palmer <jonah.palmer@oracle.com> writes:

> On 8/25/25 8:44 AM, Markus Armbruster wrote:

[...]

>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>> 
>>> On 8/8/25 6:48 AM, Markus Armbruster wrote:

[...]

>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>>> virtio devices, where supported, to iteratively migrate configuration
>>>>> changes that occur during the migration process.
>>>>
>>>> Why is that desirable?
>>>
>>> To be frank, I wasn't sure if having a migration capability, or even
>>> have it toggleable at all, would be desirable or not. It appears though
>>> that this might be better off as a per-device feature set via
>>> --device virtio-net-pci,iterative-mig=on,..., for example.
>> 
>> See below.
>> 
>>> And by "iteratively migrate configuration changes" I meant more along
>>> the lines of the device's state as it continues running on the source.
>> 
>> Isn't that what migration does always?
>
> Essentially yes, but today all of the state is only migrated at the end, once the source has been paused. So the final correct state is always sent to the destination.

As far as I understand (and ignoring lots of detail, including post
copy), we have three stages:

1. Source runs, migrate memory pages.  Pages that get dirtied after they
are migrated need to be migrated again.

2. Neither source nor destination runs, migrate remaining memory pages
and device state.

3. Destination starts to run.

If the duration of stage 2 (downtime) was of no concern, we'd switch to
it immediately, i.e. without migrating anything in stage 1.  This would
minimize I/O.

Of course, we actually care for limiting downtime.  We switch to stage 2
when "little enough" is left for stage two to migrate.

> If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen is still communicated to the destination.

So you're proposing to treat suitable parts of the device state more
like memory pages.  Correct?

Cover letter and commit message of PATCH 4 provide the motivation: you
observe a shorter downtime.  You speculate this is due to moving "heavy
allocations and page-fault latencies" from stage 2 to stage 1.  Correct?

Is there anything that makes virtio-net particularly suitable?

I think this patch's commit message should at least hint at the
motivation at a high level.  Details like measurements are best left to
PATCH 4.

> This RFC handles this by just re-sending the entire state again once the source has been paused. But of course this isn't optimal and I'm looking into how to better optimize this part.

How much is the entire state?

>>> But perhaps actual configuration changes (e.g. changing the number of
>>> queue pairs) could also be supported mid-migration like this?
>>
>> I don't know.
>> 
>>>>> This capability is added to the validated capabilities list to ensure
>>>>> both the source and destination support it before enabling.
>>>>
>>>> What happens when only one side enables it?
>>>
>>> The migration stream breaks if only one side enables it.
>>
>> How does it break?  Error message pointing out the misconfiguration?
>> 
>
> The destination VM is torn down and the source just reports that migration failed.

Exact same failure as for other misconfigurations, like missing a device
on the destination?

> I don't believe the source/destination could be aware of the misconfiguration. IIUC the destination reads the migration stream and expects certain pieces of data in a certain order. If new data is added to the migration stream or the order has changed and the destination isn't expecting it, then the migration fails. It doesn't know exactly why, just that it read-in data that it wasn't expecting.
>
>>> This is poor wording on my part, my apologies. I don't think it's even
>>> possible to know the capabilities between the source & destination.
>>>
>>>>> The capability defaults to off to maintain backward compatibility.
>>>>>
>>>>> To enable the capability via HMP:
>>>>> (qemu) migrate_set_capability virtio-iterative on
>>>>>
>>>>> To enable the capability via QMP:
>>>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>>>        "capabilities": [
>>>>>           { "capability": "virtio-iterative", "state": true }
>>>>>        ]
>>>>>     }
>>>>> }
>>>>>
>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>

[...]

>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>> index 4963f6ca12..8f042c3ba5 100644
>>>>> --- a/qapi/migration.json
>>>>> +++ b/qapi/migration.json
>>>>> @@ -479,6 +479,11 @@
>>>>>  #     each RAM page.  Requires a migration URI that supports seeking,
>>>>>  #     such as a file.  (since 9.0)
>>>>>  #
>>>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>>>> +#     the device supports it. When enabled, and where supported, virtio
>>>>> +#     devices will track and migrate configuration changes that may
>>>>> +#     occur during the migration process. (Since 10.1)
>>>>
>>>> When and why should the user enable this?
>>>
>>> Well if all goes according to plan, always (at least for virtio-net).
>>> This should improve the overall speed of live migration for a virtio-net
>>> device (and vhost-net/vhost-vdpa).
>> 
>> So the only use for "disabled" would be when migrating to or from an
>> older version of QEMU that doesn't support this.  Fair?
>
> Correct.
>
>> What's the default?
>
> Disabled.

Awkward for something that should always be enabled.  But see below.

Please document defaults in the doc comment.

>>>> What exactly do you mean by "where supported"?
>>>
>>> I meant if both source's Qemu and destination's Qemu support it, as well
>>> as for other virtio devices in the future if they decide to implement
>>> iterative migration (e.g. a more general "enable iterative migration for
>>> virtio devices").
>>>
>>> But I think for now this is better left as a virtio-net configuration
>>> rather than as a migration capability (e.g. --device
>>> virtio-net-pci,iterative-mig=on/off,...)
>> 
>> Makes sense to me (but I'm not a migration expert).

A device property's default can depend on the machine type via compat
properties.  This is normally used to restrict a guest-visible change to
newer machine types.  Here, it's not guest-visible.  But it can get you
this:

* Migrate new machine type from new QEMU to new QEMU (old QEMU doesn't
  have the machine type): iterative is enabled by default.  Good.  User
  can disable it on both ends to not get the improvement.  Enabling it
  on just one breaks migration.

  All other cases go away with time.

* Migrate old machine type from new QEMU to new QEMU: iterative is
  disabled by default, which is sad, but no worse than before.  User can
  enable it on both ends to get the improvement.  Enabling it on just
  one breaks migration.

* Migrate old machine type from new QEMU to old QEMU or vice versa:
  iterative is off by default.  Good.  Enabling it on the new one breaks
  migration.

* Migrate old machine type from old QEMU to old QEMU: iterative is off

I figure almost all users could simply ignore this configuration knob
then.
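
A rough sketch of what I mean, reusing the hypothetical
"iterative-mig" device property from above (array and entry purely
illustrative): the device would default the property to on, and a
compat property in hw/core/machine.c would pin it to off for machine
types that predate it, e.g.:

    GlobalProperty hw_compat_10_0[] = {
        /* ... existing entries ... */
        { "virtio-net-pci", "iterative-mig", "off" },
    };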

>> [...]
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 2 months, 2 weeks ago

On 8/26/25 2:11 AM, Markus Armbruster wrote:
> Jonah Palmer <jonah.palmer@oracle.com> writes:
> 
>> On 8/25/25 8:44 AM, Markus Armbruster wrote:
> 
> [...]
> 
>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>
>>>> On 8/8/25 6:48 AM, Markus Armbruster wrote:
> 
> [...]
> 
>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>>>> virtio devices, where supported, to iteratively migrate configuration
>>>>>> changes that occur during the migration process.
>>>>>
>>>>> Why is that desirable?
>>>>
>>>> To be frank, I wasn't sure if having a migration capability, or even
>>>> have it toggleable at all, would be desirable or not. It appears though
>>>> that this might be better off as a per-device feature set via
>>>> --device virtio-net-pci,iterative-mig=on,..., for example.
>>>
>>> See below.
>>>
>>>> And by "iteratively migrate configuration changes" I meant more along
>>>> the lines of the device's state as it continues running on the source.
>>>
>>> Isn't that what migration does always?
>>
>> Essentially yes, but today all of the state is only migrated at the end, once the source has been paused. So the final correct state is always sent to the destination.
> 
> As far as I understand (and ignoring lots of detail, including post
> copy), we have three stages:
> 
> 1. Source runs, migrate memory pages.  Pages that get dirtied after they
> are migrated need to be migrated again.
> 
> 2. Neither source or destination runs, migrate remaining memory pages
> and device state.
> 
> 3. Destination starts to run.
> 
> If the duration of stage 2 (downtime) was of no concern, we'd switch to
> it immediately, i.e. without migrating anything in stage 1.  This would
> minimize I/O.
> 
> Of course, we actually care for limiting downtime.  We switch to stage 2
> when "little enough" is left for stage two to migrate.
> 
>> If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen is still communicated to the destination.
> 
> So you're proposing to treat suitable parts of the device state more
> like memory pages.  Correct?
> 

Not in the sense of "something got dirtied so let's immediately re-send 
that" like we would with RAM. It's more along the lines of "something 
got dirtied so let's make sure that gets re-sent at the start of stage 2".

The entire state of a virtio-net device (even with vhost-net / 
vhost-vDPA) is <10KB I believe. I don't believe there's much to gain by 
"iteratively" re-sending changes for virtio-net. It should be suitable 
enough to just re-send whatever changed during stage 1 (after the 
initial state was sent) at the start of stage 2.

This is why I'm currently looking into a solution that uses VMSD's 
.early_setup flag (that Peter recommended) rather than implementing a 
suite of SaveVMHandlers hooks (like this RFC does). We don't need this 
iterative capability as much as we need to start migrating the state 
earlier (and doing corresponding config/prep work) during stage 1.
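
As a rough sketch of that direction (names and fields below are
placeholders, not the actual implementation), the idea is a separate
VMStateDescription whose contents get sent during stage 1 instead of
stage 2:

    static const VMStateDescription vmstate_virtio_net_early = {
        .name = "virtio-net/early",
        .version_id = 1,
        .minimum_version_id = 1,
        .early_setup = true,   /* migrate this section in stage 1 */
        .fields = (const VMStateField[]) {
            /* whatever state the destination needs for its early
             * config/prep work would be listed here */
            VMSTATE_END_OF_LIST()
        },
    };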

> Cover letter and commit message of PATCH 4 provide the motivation: you
> observe a shorter downtime.  You speculate this is due to moving "heavy
> allocations and page-fault latencies" from stage 2 to stage 1.  Correct?
> 

Correct. But again I'd like to stress that this is just one part in 
reducing downtime during stage 2. The biggest reductions will come from 
the config/prep work that we're trying to move from stage 2 to stage 1, 
especially when vhost-vDPA is involved. And we can only do this early 
work once we have the state, hence why we're sending it earlier.

> Is there anything that makes virtio-net particularly suitable?
> 

Yes, especially with vhost-vDPA and configuring VQs. See Eugenio's 
comment here 
https://lore.kernel.org/qemu-devel/CAJaqyWdUutZrAWKy9d=ip+h+y3BnptUrcL8Xj06XfizNxPtfpw@mail.gmail.com/.

> I think this patch's commit message should at least hint at the
> motivation at a high level.  Details like measurements are best left to
> PATCH 4.
> 

You're right, this was my bad for not framing this RFC more clearly and 
the true motivations behind it. I will certainly be more direct and 
descriptive in the next RFC for this effort.

>> This RFC handles this by just re-sending the entire state again once the source has been paused. But of course this isn't optimal and I'm looking into how to better optimize this part.
> 
> How much is the entire state?
> 

I'm not exactly sure how large it is but it should be <10KB even with 
vhost-vDPA. It could be slightly larger if we really up the number of 
queue pairs and/or have huge MAC/multicast lists.

>>>> But perhaps actual configuration changes (e.g. changing the number of
>>>> queue pairs) could also be supported mid-migration like this?
>>>
>>> I don't know.
>>>
>>>>>> This capability is added to the validated capabilities list to ensure
>>>>>> both the source and destination support it before enabling.
>>>>>
>>>>> What happens when only one side enables it?
>>>>
>>>> The migration stream breaks if only one side enables it.
>>>
>>> How does it break?  Error message pointing out the misconfiguration?
>>>
>>
>> The destination VM is torn down and the source just reports that migration failed.
> 
> Exact same failure as for other misconfigurations, like missing a device
> on the destination?
> 

I hesitate to say "exact" but for example, when missing a device on one 
side you might see something like below (I removed a serial device):

qemu-system-x86_64: Unknown ramblock "0000:00:03.0/virtio-net-pci.rom", 
cannot accept migration
qemu-system-x86_64: error while loading state for instance 0x0 of device 
'ram'
qemu-system-x86_64: load of migration failed: Invalid argument
...

The expected order gets messed up and eventually the wrong data will end 
up somewhere else. In this case it was the RAM.

>> I don't believe the source/destination could be aware of the misconfiguration. IIUC the destination reads the migration stream and expects certain pieces of data in a certain order. If new data is added to the migration stream or the order has changed and the destination isn't expecting it, then the migration fails. It doesn't know exactly why, just that it read-in data that it wasn't expecting.
>>
>>>> This is poor wording on my part, my apologies. I don't think it's even
>>>> possible to know the capabilities between the source & destination.
>>>>
>>>>>> The capability defaults to off to maintain backward compatibility.
>>>>>>
>>>>>> To enable the capability via HMP:
>>>>>> (qemu) migrate_set_capability virtio-iterative on
>>>>>>
>>>>>> To enable the capability via QMP:
>>>>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>>>>         "capabilities": [
>>>>>>            { "capability": "virtio-iterative", "state": true }
>>>>>>         ]
>>>>>>      }
>>>>>> }
>>>>>>
>>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> 
> [...]
> 
>>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>>> index 4963f6ca12..8f042c3ba5 100644
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -479,6 +479,11 @@
>>>>>>   #     each RAM page.  Requires a migration URI that supports seeking,
>>>>>>   #     such as a file.  (since 9.0)
>>>>>>   #
>>>>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>>>>> +#     the device supports it. When enabled, and where supported, virtio
>>>>>> +#     devices will track and migrate configuration changes that may
>>>>>> +#     occur during the migration process. (Since 10.1)
>>>>>
>>>>> When and why should the user enable this?
>>>>
>>>> Well if all goes according to plan, always (at least for virtio-net).
>>>> This should improve the overall speed of live migration for a virtio-net
>>>> device (and vhost-net/vhost-vdpa).
>>>
>>> So the only use for "disabled" would be when migrating to or from an
>>> older version of QEMU that doesn't support this.  Fair?
>>
>> Correct.
>>
>>> What's the default?
>>
>> Disabled.
> 
> Awkward for something that should always be enabled.  But see below.
> 
> Please document defaults in the doc comment.
> 

Ack.

>>>>> What exactly do you mean by "where supported"?
>>>>
>>>> I meant if both source's Qemu and destination's Qemu support it, as well
>>>> as for other virtio devices in the future if they decide to implement
>>>> iterative migration (e.g. a more general "enable iterative migration for
>>>> virtio devices").
>>>>
>>>> But I think for now this is better left as a virtio-net configuration
>>>> rather than as a migration capability (e.g. --device
>>>> virtio-net-pci,iterative-mig=on/off,...)
>>>
>>> Makes sense to me (but I'm not a migration expert).
> 
> A device property's default can depend on the machine type via compat
> properties.  This is normally used to restrict a guest-visible change to
> newer machine types.  Here, it's not guest-visible.  But it can get you
> this:
> 
> * Migrate new machine type from new QEMU to new QEMU (old QEMU doesn't
>    have the machine type): iterative is enabled by default.  Good.  User
>    can disable it on both ends to not get the improvement.  Enabling it
>    on just one breaks migration.
> 
>    All other cases go away with time.
> 
> * Migrate old machine type from new QEMU to new QEMU: iterative is
>    disabled by default, which is sad, but no worse than before.  User can
>    enable it on both ends to get the improvement.  Enabling it on just
>    one breaks migration.
> 
> * Migrate old machine type from new QEMU to old QEMU or vice versa:
>    iterative is off by default.  Good.  Enabling it on the new one breaks
>    migration.
> 
> * Migrate old machine type from old QEMU to old QEMU: iterative is off
> 
> I figure almost all users could simply ignore this configuration knob
> then.
> 

Oh, that's interesting. I wasn't aware of this. But couldn't this 
potentially cause some headaches and confusion when attempting to 
migrate between two VMs where one is using a machine type that does 
support it and the other isn't?

For example, the source and destination VMs both specify '-machine 
q35,...' and the q35 alias resolves to, say, pc-q35-10.1 for the 
source VM and pc-q35-10.0 for the destination VM. And say this property 
is supported on >= pc-q35-10.1.

IIUC, this would mean that iterative is enabled by default on the source 
VM but disabled by default on the destination VM.

Then a user attempts the migration, the migration fails, and then they'd 
have to try and figure out why it's failing.

Furthermore, since it's a device property that's essentially set at VM 
creation time, either the source would have to be reset and explicitly 
set this property to off, or the destination would have to be reset and 
use a newer (>= pc-q35-10.1) machine type, before starting it back up 
and performing the migration.

Am I understanding this correctly?

>>> [...]
>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Markus Armbruster 2 months, 2 weeks ago
Jonah Palmer <jonah.palmer@oracle.com> writes:

> On 8/26/25 2:11 AM, Markus Armbruster wrote:
>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>> 
>>> On 8/25/25 8:44 AM, Markus Armbruster wrote:
>> 
>> [...]
>> 
>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>
>>>>> On 8/8/25 6:48 AM, Markus Armbruster wrote:
>> 
>> [...]
>> 
>>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>>>>> virtio devices, where supported, to iteratively migrate configuration
>>>>>>> changes that occur during the migration process.
>>>>>>
>>>>>> Why is that desirable?
>>>>>
>>>>> To be frank, I wasn't sure if having a migration capability, or even
>>>>> have it toggleable at all, would be desirable or not. It appears though
>>>>> that this might be better off as a per-device feature set via
>>>>> --device virtio-net-pci,iterative-mig=on,..., for example.
>>>>
>>>> See below.
>>>>
>>>>> And by "iteratively migrate configuration changes" I meant more along
>>>>> the lines of the device's state as it continues running on the source.
>>>>
>>>> Isn't that what migration does always?
>>>
>>> Essentially yes, but today all of the state is only migrated at the end, once the source has been paused. So the final correct state is always sent to the destination.
>> 
>> As far as I understand (and ignoring lots of detail, including post
>> copy), we have three stages:
>> 
>> 1. Source runs, migrate memory pages.  Pages that get dirtied after they
>> are migrated need to be migrated again.
>> 
>> 2. Neither source or destination runs, migrate remaining memory pages
>> and device state.
>> 
>> 3. Destination starts to run.
>> 
>> If the duration of stage 2 (downtime) was of no concern, we'd switch to
>> it immediately, i.e. without migrating anything in stage 1.  This would
>> minimize I/O.
>> 
>> Of course, we actually care for limiting downtime.  We switch to stage 2
>> when "little enough" is left for stage two to migrate.
>> 
>>> If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen is still communicated to the destination.
>> 
>> So you're proposing to treat suitable parts of the device state more
>> like memory pages.  Correct?
>> 
>
> Not in the sense of "something got dirtied so let's immediately re-send 
> that" like we would with RAM. It's more along the lines of "something 
> got dirtied so let's make sure that gets re-sent at the start of stage 2".

Or is it "something might have dirtied, just resend in stage 2"?

> The entire state of a virtio-net device (even with vhost-net / 
> vhost-vDPA) is <10KB I believe. I don't believe there's much to gain by 
> "iteratively" re-sending changes for virtio-net. It should be suitable 
> enough to just re-send whatever changed during stage 1 (after the 
> initial state was sent) at the start of stage 2.

Got it.

> This is why I'm currently looking into a solution that uses VMSD's 
> .early_setup flag (that Peter recommended) rather than implementing a 
> suite of SaveVMHandlers hooks (like this RFC does). We don't need this 
> iterative capability as much as we need to start migrating the state 
> earlier (and doing corresponding config/prep work) during stage 1.
>
>> Cover letter and commit message of PATCH 4 provide the motivation: you
>> observe a shorter downtime.  You speculate this is due to moving "heavy
>> allocations and page-fault latencies" from stage 2 to stage 1.  Correct?
>
> Correct. But again I'd like to stress that this is just one part in 
> reducing downtime during stage 2. The biggest reductions will come from 
> the config/prep work that we're trying to move from stage 2 to stage 1, 
> especially when vhost-vDPA is involved. And we can only do this early 
> work once we have the state, hence why we're sending it earlier.

This is an important bit of detail I've been missing so far.  Easy
enough to fix in a future commit message and cover letter.

>> Is there anything that makes virtio-net particularly suitable?
>
> Yes, especially with vhost-vDPA and configuring VQs. See Eugenio's 
> comment here 
> https://lore.kernel.org/qemu-devel/CAJaqyWdUutZrAWKy9d=ip+h+y3BnptUrcL8Xj06XfizNxPtfpw@mail.gmail.com/.

Such prep work commonly depends only on device configuration, not state.
I'm curious: what state bits exactly does the prep work need?

Device configuration is available at the start of stage 1, state is
fully available only at the end of stage 2.

Your patches make *tentative* device state available in stage 1.
Tentative, because it may still change afterwards.

You use tentative state to do certain expensive work in stage 1 already,
in order to cut downtime in stage 2.

Fair?

Can state change in ways that invalidate this work?

If yes, how do you handle this?

If no, do you verify the "no change" design assumption holds?

>> I think this patch's commit message should at least hint at the
>> motivation at a high level.  Details like measurements are best left to
>> PATCH 4.
>
> You're right, this was my bad for not framing this RFC more clearly and 
> the true motivations behind it. I will certainly be more direct and 
> descriptive in the next RFC for this effort.
>
>>> This RFC handles this by just re-sending the entire state again once the source has been paused. But of course this isn't optimal and I'm looking into how to better optimize this part.
>> 
>> How much is the entire state?
>
> I'm not exactly sure how large it is but it should be <10KB even with 
> vhost-vDPA. It could be slightly larger if we really up the number of 
> queue pairs and/or have huge MAC/multicast lists.

No worries then.

>>>>> But perhaps actual configuration changes (e.g. changing the number of
>>>>> queue pairs) could also be supported mid-migration like this?
>>>>
>>>> I don't know.
>>>>
>>>>>>> This capability is added to the validated capabilities list to ensure
>>>>>>> both the source and destination support it before enabling.
>>>>>>
>>>>>> What happens when only one side enables it?
>>>>>
>>>>> The migration stream breaks if only one side enables it.
>>>>
>>>> How does it break?  Error message pointing out the misconfiguration?
>>>>
>>>
>>> The destination VM is torn down and the source just reports that migration failed.
>> 
>> Exact same failure as for other misconfigurations, like missing a device
>> on the destination?
>
> I hesitate to say "exact" but for example, when missing a device on one 
> side you might see something like below (I removed a serial device):
>
> qemu-system-x86_64: Unknown ramblock "0000:00:03.0/virtio-net-pci.rom", 
> cannot accept migration
> qemu-system-x86_64: error while loading state for instance 0x0 of device 
> 'ram'
> qemu-system-x86_64: load of migration failed: Invalid argument
> ...

Aside: ugly error cascade due to migration's well-known failure to
propagate errors up properly.

> The expected order gets messed up and eventually the wrong data will end 
> up somewhere else. In this case it was the RAM.

It's messy.  If we started on a green field today, we'd do better, I
hope.

What error message do you observe when only one side enables
@virtio-iterative?  Question is moot if you plan to switch to a
different interface.  Answer it for that interface in a commit message
then.

>>> I don't believe the source/destination could be aware of the misconfiguration. IIUC the destination reads the migration stream and expects certain pieces of data in a certain order. If new data is added to the migration stream or the order has changed and the destination isn't expecting it, then the migration fails. It doesn't know exactly why, just that it read-in data that it wasn't expecting.
>>>
>>>>> This is poor wording on my part, my apologies. I don't think it's even
>>>>> possible to know the capabilities between the source & destination.
>>>>>
>>>>>>> The capability defaults to off to maintain backward compatibility.
>>>>>>>
>>>>>>> To enable the capability via HMP:
>>>>>>> (qemu) migrate_set_capability virtio-iterative on
>>>>>>>
>>>>>>> To enable the capability via QMP:
>>>>>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>>>>>         "capabilities": [
>>>>>>>            { "capability": "virtio-iterative", "state": true }
>>>>>>>         ]
>>>>>>>      }
>>>>>>> }
>>>>>>>
>>>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>> 
>> [...]
>> 
>>>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>>>> index 4963f6ca12..8f042c3ba5 100644
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -479,6 +479,11 @@
>>>>>>>   #     each RAM page.  Requires a migration URI that supports seeking,
>>>>>>>   #     such as a file.  (since 9.0)
>>>>>>>   #
>>>>>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>>>>>> +#     the device supports it. When enabled, and where supported, virtio
>>>>>>> +#     devices will track and migrate configuration changes that may
>>>>>>> +#     occur during the migration process. (Since 10.1)
>>>>>>
>>>>>> When and why should the user enable this?
>>>>>
>>>>> Well if all goes according to plan, always (at least for virtio-net).
>>>>> This should improve the overall speed of live migration for a virtio-net
>>>>> device (and vhost-net/vhost-vdpa).
>>>>
>>>> So the only use for "disabled" would be when migrating to or from an
>>>> older version of QEMU that doesn't support this.  Fair?
>>>
>>> Correct.
>>>
>>>> What's the default?
>>>
>>> Disabled.
>> 
>> Awkward for something that should always be enabled.  But see below.
>> 
>> Please document defaults in the doc comment.
>
> Ack.
>
>>>>>> What exactly do you mean by "where supported"?
>>>>>
>>>>> I meant if both source's Qemu and destination's Qemu support it, as well
>>>>> as for other virtio devices in the future if they decide to implement
>>>>> iterative migration (e.g. a more general "enable iterative migration for
>>>>> virtio devices").
>>>>>
>>>>> But I think for now this is better left as a virtio-net configuration
>>>>> rather than as a migration capability (e.g. --device
>>>>> virtio-net-pci,iterative-mig=on/off,...)
>>>>
>>>> Makes sense to me (but I'm not a migration expert).
>> 
>> A device property's default can depend on the machine type via compat
>> properties.  This is normally used to restrict a guest-visible change to
>> newer machine types.  Here, it's not guest-visible.  But it can get you
>> this:
>> 
>> * Migrate new machine type from new QEMU to new QEMU (old QEMU doesn't
>>    have the machine type): iterative is enabled by default.  Good.  User
>>    can disable it on both ends to not get the improvement.  Enabling it
>>    on just one breaks migration.
>> 
>>    All other cases go away with time.
>> 
>> * Migrate old machine type from new QEMU to new QEMU: iterative is
>>    disabled by default, which is sad, but no worse than before.  User can
>>    enable it on both ends to get the improvement.  Enabling it on just
>>    one breaks migration.
>> 
>> * Migrate old machine type from new QEMU to old QEMU or vice versa:
>>    iterative is off by default.  Good.  Enabling it on the new one breaks
>>    migration.
>> 
>> * Migrate old machine type from old QEMU to old QEMU: iterative is off
>> 
>> I figure almost all users could simply ignore this configuration knob
>> then.
>
> Oh, that's interesting. I wasn't aware of this. But couldn't this 
> potentially cause some headaches and confusion when attempting to 
> migrate between 2 guests where one VM is using a machine type does 
> support it and the other isn't?
>
> For example, the source and destination VMs both specify '-machine 
> q35,...' and the q35 alias resolves into, say, pc-q35-10.1 for the 
> source VM and pc-q35-10.0 for the destination VM. And say this property 
> is supported on >= pc-q35-10.1.

In my understanding, migration requires identical machine types on both
ends, and all bets are off when they're different.

> IIUC, this would mean that iterative is enabled by default on the source 
> VM but disabled by default on the destination VM.
>
> Then a user attempts the migration, the migration fails, and then they'd 
> have to try and figure out why it's failing.

Migration failures due to mismatched configuration tend to be that way,
don't they?

> Furthermore, since it's a device property that's essentially set at VM 
> creation time, either the source would have to be reset and explicitly 
> set this property to off or the destination would have to be reset and 
> use a newer (>= pc-q35-10.1) machine type before starting it back up and 
> perform the migration.

You can use qom-set to change a device property after you created the
device.  It might even work.  However, qom-set is a deeply problematic
and seriously underdocumented interface.  Avoid.

But will you need to change it?

If you started the source with an explicit property value, start the
destination the same way.  Same as for any number of other configuration
knobs.

If you started the source with the default property value, start the
destination the same way.  Values will match as long as the machine type
matches, as it should.

> Am I understanding this correctly?
>
>>>> [...]
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 2 months, 2 weeks ago

On 8/27/25 2:37 AM, Markus Armbruster wrote:
> Jonah Palmer <jonah.palmer@oracle.com> writes:
> 
>> On 8/26/25 2:11 AM, Markus Armbruster wrote:
>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>
>>>> On 8/25/25 8:44 AM, Markus Armbruster wrote:
>>>
>>> [...]
>>>
>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>
>>>>>> On 8/8/25 6:48 AM, Markus Armbruster wrote:
>>>
>>> [...]
>>>
>>>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>>>>>> virtio devices, where supported, to iteratively migrate configuration
>>>>>>>> changes that occur during the migration process.
>>>>>>>
>>>>>>> Why is that desirable?
>>>>>>
>>>>>> To be frank, I wasn't sure if having a migration capability, or even
>>>>>> have it toggleable at all, would be desirable or not. It appears though
>>>>>> that this might be better off as a per-device feature set via
>>>>>> --device virtio-net-pci,iterative-mig=on,..., for example.
>>>>>
>>>>> See below.
>>>>>
>>>>>> And by "iteratively migrate configuration changes" I meant more along
>>>>>> the lines of the device's state as it continues running on the source.
>>>>>
>>>>> Isn't that what migration does always?
>>>>
>>>> Essentially yes, but today all of the state is only migrated at the end, once the source has been paused. So the final correct state is always sent to the destination.
>>>
>>> As far as I understand (and ignoring lots of detail, including post
>>> copy), we have three stages:
>>>
>>> 1. Source runs, migrate memory pages.  Pages that get dirtied after they
>>> are migrated need to be migrated again.
>>>
>>> 2. Neither source or destination runs, migrate remaining memory pages
>>> and device state.
>>>
>>> 3. Destination starts to run.
>>>
>>> If the duration of stage 2 (downtime) was of no concern, we'd switch to
>>> it immediately, i.e. without migrating anything in stage 1.  This would
>>> minimize I/O.
>>>
>>> Of course, we actually care for limiting downtime.  We switch to stage 2
>>> when "little enough" is left for stage two to migrate.
>>>
>>>> If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen is still communicated to the destination.
>>>
>>> So you're proposing to treat suitable parts of the device state more
>>> like memory pages.  Correct?
>>>
>>
>> Not in the sense of "something got dirtied so let's immediately re-send
>> that" like we would with RAM. It's more along the lines of "something
>> got dirtied so let's make sure that gets re-sent at the start of stage 2".
> 
> Or is it "something might have dirtied, just resend in stage 2"?
> 

Exactly. This is better wording since it doesn't necessarily have to be 
sent at the "start" of stage 2. Just at some point during it.

>> The entire state of a virtio-net device (even with vhost-net /
>> vhost-vDPA) is <10KB I believe. I don't believe there's much to gain by
>> "iteratively" re-sending changes for virtio-net. It should be suitable
>> enough to just re-send whatever changed during stage 1 (after the
>> initial state was sent) at the start of stage 2.
> 
> Got it.
> 
>> This is why I'm currently looking into a solution that uses VMSD's
>> .early_setup flag (that Peter recommended) rather than implementing a
>> suite of SaveVMHandlers hooks (like this RFC does). We don't need this
>> iterative capability as much as we need to start migrating the state
>> earlier (and doing corresponding config/prep work) during stage 1.
>>
>>> Cover letter and commit message of PATCH 4 provide the motivation: you
>>> observe a shorter downtime.  You speculate this is due to moving "heavy
>>> allocations and page-fault latencies" from stage 2 to stage 1.  Correct?
>>
>> Correct. But again I'd like to stress that this is just one part in
>> reducing downtime during stage 2. The biggest reductions will come from
>> the config/prep work that we're trying to move from stage 2 to stage 1,
>> especially when vhost-vDPA is involved. And we can only do this early
>> work once we have the state, hence why we're sending it earlier.
> 
> This is an important bit of detail I've been missing so far.  Easy
> enough to fix in a future commit message and cover letter.
> 

Ack.

>>> Is there anything that makes virtio-net particularly suitable?
>>
>> Yes, especially with vhost-vDPA and configuring VQs. See Eugenio's
>> comment here
>> https://lore.kernel.org/qemu-devel/CAJaqyWdUutZrAWKy9d=ip+h+y3BnptUrcL8Xj06XfizNxPtfpw@mail.gmail.com/.
> 
> Such prep work commonly depends only on device configuration, not state.
> I'm curious: what state bits exactly does the prep work need?
> 
> Device configuration is available at the start of stage 1, state is
> fully available only at the end of stage 2.
> 

We need, more or less, all of the state of the VirtIODevice itself as 
well as the VirtIONet-specific bits. Essentially, barring ring indices, 
we'd need whatever is required throughout most of the device's startup 
routine.

In this series, we get everything we need from the vmstate_save_state(f, 
&vmstate_virtio_net, ...) and vmstate_load_state(f, &vmstate_virtio_net, 
...) calls early during stage 1 (see patch 4/6).

Once we've gotten this data, we can start on the prep work that's 
normally done today during stage 2.
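
For illustration, a minimal sketch of what "send the full vmstate early"
can look like under the SaveVMHandlers route this RFC takes. The hook
name virtio_net_save_setup and its exact signature are placeholders and
may differ from what patch 4/6 actually does:

/* Sketch only: during stage 1 setup, ship a tentative snapshot of the
 * whole virtio-net vmstate so the destination can start prep work early. */
static int virtio_net_save_setup(QEMUFile *f, void *opaque, Error **errp)
{
    VirtIONet *n = opaque;

    /* Same vmstate_virtio_net used by the normal stage-2 save. */
    return vmstate_save_state(f, &vmstate_virtio_net, n, NULL);
}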

> Your patches make *tentative* device state available in stage 1.
> Tentative, because it may still change afterwards.
> 
> You use tentative state to do certain expensive work in stage 1 already,
> in order to cut downtime in stage 2.
> 
> Fair?

Correct.

> 
> Can state change in ways that invalidate this work?
> 

If, for some reason, the guest wanted to change everything during 
migration (specifically during stage 1), then it'd more or less negate 
the early prep work we'd've done. How impactful this is would depend on 
which route we go (see below). God forbid the guest just wait until 
migration is complete.

> If yes, how do you handle this?
> 

So it depends on the route this series goes. That is, whether we go the 
truly iterative SaveVMHandlers hooks route (which this series uses) or 
if we go the early_setup VMSD route (which Peter recommended).

---

If we go the truly iterative route, then technically we can still handle 
these changes during stage 1 and still keep the work out of stage 2.

However, given the nicheness of such a corner case (where things are 
being changed last minute during migration), handling these changes 
iteratively might be overdesign.

And we'd have to guard against the scenario where the guest acts 
maliciously by constantly changing things to prevent migration from 
continuing.

---

If we go the early_setup VMSD route, where we get one shot early to do 
stuff during stage 1 and one last shot to do things later during stage 
2, then the more that gets changed means the less beneficial this early 
work becomes. This is because any changes made during stage 1 could only 
be handled during stage 2, which is what this overall effort is trying 
to minimize.
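
For reference, a minimal sketch of the early_setup VMSD route, assuming
the existing VMStateDescription.early_setup flag; the section name and
the (empty) field list below are placeholders, not a real design:

static const VMStateDescription vmstate_virtio_net_early = {
    .name = "virtio-net-device/early",
    .version_id = 1,
    .minimum_version_id = 1,
    /* Handled early during stage 1 instead of during stage-2 downtime. */
    .early_setup = true,
    .fields = (const VMStateField[]) {
        /* tentative device state the destination uses for early prep work */
        VMSTATE_END_OF_LIST()
    },
};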

> If no, do you verify the "no change" design assumption holds?
> 

When it comes to early migration for devices, we can never support a "no 
change" design assumption. The difference in the design lies in how (and 
when) such changes are handled during migration.

>>> I think this patch's commit message should at least hint at the
>>> motivation at a high level.  Details like measurements are best left to
>>> PATCH 4.
>>
>> You're right, this was my bad for not framing this RFC more clearly and
>> the true motivations behind it. I will certainly be more direct and
>> descriptive in the next RFC for this effort.
>>
>>>> This RFC handles this by just re-sending the entire state again once the source has been paused. But of course this isn't optimal and I'm looking into how to better optimize this part.
>>>
>>> How much is the entire state?
>>
>> I'm not exactly sure how large it is but it should be <10KB even with
>> vhost-vDPA. It could be slightly larger if we really up the number of
>> queue pairs and/or have huge MAC/multicast lists.
> 
> No worries then.
> 
>>>>>> But perhaps actual configuration changes (e.g. changing the number of
>>>>>> queue pairs) could also be supported mid-migration like this?
>>>>>
>>>>> I don't know.
>>>>>
>>>>>>>> This capability is added to the validated capabilities list to ensure
>>>>>>>> both the source and destination support it before enabling.
>>>>>>>
>>>>>>> What happens when only one side enables it?
>>>>>>
>>>>>> The migration stream breaks if only one side enables it.
>>>>>
>>>>> How does it break?  Error message pointing out the misconfiguration?
>>>>>
>>>>
>>>> The destination VM is torn down and the source just reports that migration failed.
>>>
>>> Exact same failure as for other misconfigurations, like missing a device
>>> on the destination?
>>
>> I hesitate to say "exact" but for example, when missing a device on one
>> side you might see something like below (I removed a serial device):
>>
>> qemu-system-x86_64: Unknown ramblock "0000:00:03.0/virtio-net-pci.rom",
>> cannot accept migration
>> qemu-system-x86_64: error while loading state for instance 0x0 of device
>> 'ram'
>> qemu-system-x86_64: load of migration failed: Invalid argument
>> ...
> 
> Aside: ugly error cascade due to migration's well-known failure to
> propagate errors up properly.
> 
>> The expected order gets messed up and eventually the wrong data will end
>> up somewhere else. In this case it was the RAM.
> 
> It's messy.  If we started on a green field today, we'd do better, I
> hope.
> 
> What error message do you observe when only one side enables
> @virtio-iterative?  Question is moot if you plan to switch to a
> different interface.  Answer it for that interface in a commit message
> then.
> 

Will do.

>>>> I don't believe the source/destination could be aware of the misconfiguration. IIUC the destination reads the migration stream and expects certain pieces of data in a certain order. If new data is added to the migration stream or the order has changed and the destination isn't expecting it, then the migration fails. It doesn't know exactly why, just that it read-in data that it wasn't expecting.
>>>>
>>>>>> This is poor wording on my part, my apologies. I don't think it's even
>>>>>> possible to know the capabilities between the source & destination.
>>>>>>
>>>>>>>> The capability defaults to off to maintain backward compatibility.
>>>>>>>>
>>>>>>>> To enable the capability via HMP:
>>>>>>>> (qemu) migrate_set_capability virtio-iterative on
>>>>>>>>
>>>>>>>> To enable the capability via QMP:
>>>>>>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>>>>>>          "capabilities": [
>>>>>>>>             { "capability": "virtio-iterative", "state": true }
>>>>>>>>          ]
>>>>>>>>       }
>>>>>>>> }
>>>>>>>>
>>>>>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>
>>> [...]
>>>
>>>>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>>>>> index 4963f6ca12..8f042c3ba5 100644
>>>>>>>> --- a/qapi/migration.json
>>>>>>>> +++ b/qapi/migration.json
>>>>>>>> @@ -479,6 +479,11 @@
>>>>>>>>    #     each RAM page.  Requires a migration URI that supports seeking,
>>>>>>>>    #     such as a file.  (since 9.0)
>>>>>>>>    #
>>>>>>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>>>>>>> +#     the device supports it. When enabled, and where supported, virtio
>>>>>>>> +#     devices will track and migrate configuration changes that may
>>>>>>>> +#     occur during the migration process. (Since 10.1)
>>>>>>>
>>>>>>> When and why should the user enable this?
>>>>>>
>>>>>> Well if all goes according to plan, always (at least for virtio-net).
>>>>>> This should improve the overall speed of live migration for a virtio-net
>>>>>> device (and vhost-net/vhost-vdpa).
>>>>>
>>>>> So the only use for "disabled" would be when migrating to or from an
>>>>> older version of QEMU that doesn't support this.  Fair?
>>>>
>>>> Correct.
>>>>
>>>>> What's the default?
>>>>
>>>> Disabled.
>>>
>>> Awkward for something that should always be enabled.  But see below.
>>>
>>> Please document defaults in the doc comment.
>>
>> Ack.
>>
>>>>>>> What exactly do you mean by "where supported"?
>>>>>>
>>>>>> I meant if both source's Qemu and destination's Qemu support it, as well
>>>>>> as for other virtio devices in the future if they decide to implement
>>>>>> iterative migration (e.g. a more general "enable iterative migration for
>>>>>> virtio devices").
>>>>>>
>>>>>> But I think for now this is better left as a virtio-net configuration
>>>>>> rather than as a migration capability (e.g. --device
>>>>>> virtio-net-pci,iterative-mig=on/off,...)
>>>>>
>>>>> Makes sense to me (but I'm not a migration expert).
>>>
>>> A device property's default can depend on the machine type via compat
>>> properties.  This is normally used to restrict a guest-visible change to
>>> newer machine types.  Here, it's not guest-visible.  But it can get you
>>> this:
>>>
>>> * Migrate new machine type from new QEMU to new QEMU (old QEMU doesn't
>>>     have the machine type): iterative is enabled by default.  Good.  User
>>>     can disable it on both ends to not get the improvement.  Enabling it
>>>     on just one breaks migration.
>>>
>>>     All other cases go away with time.
>>>
>>> * Migrate old machine type from new QEMU to new QEMU: iterative is
>>>     disabled by default, which is sad, but no worse than before.  User can
>>>     enable it on both ends to get the improvement.  Enabling it on just
>>>     one breaks migration.
>>>
>>> * Migrate old machine type from new QEMU to old QEMU or vice versa:
>>>     iterative is off by default.  Good.  Enabling it on the new one breaks
>>>     migration.
>>>
>>> * Migrate old machine type from old QEMU to old QEMU: iterative is off
>>>
>>> I figure almost all users could simply ignore this configuration knob
>>> then.
>>
>> Oh, that's interesting. I wasn't aware of this. But couldn't this
>> potentially cause some headaches and confusion when attempting to
>> migrate between 2 guests where one VM is using a machine type that does
>> support it and the other isn't?
>>
>> For example, the source and destination VMs both specify '-machine
>> q35,...' and the q35 alias resolves into, say, pc-q35-10.1 for the
>> source VM and pc-q35-10.0 for the destination VM. And say this property
>> is supported on >= pc-q35-10.1.
> 
> In my understanding, migration requires identical machine types on both
> ends, and all bets are off when they're different.
> 

Ah, true.

>> IIUC, this would mean that iterative is enabled by default on the source
>> VM but disabled by default on the destination VM.
>>
>> Then a user attempts the migration, the migration fails, and then they'd
>> have to try and figure out why it's failing.
> 
> Migration failures due to mismatched configuration tend to be that way,
> don't they?
> 

Right.

So if we pin this feature to always be enabled for machine type, say, >= 
pc-q35-XX.X, then can we assume that both guests can actually support 
this feature?

In other words, conversely, is it possible in production that both 
guests use pc-q35-XX.X but one build supports this early migration 
feature and the other doesn't?

If we can assume that, then this would probably be the right approach 
for something like this.

>> Furthermore, since it's a device property that's essentially set at VM
>> creation time, either the source would have to be reset and explicitly
>> set this property to off or the destination would have to be reset and
>> use a newer (>= pc-q35-10.1) machine type before starting it back up and
>> perform the migration.
> 
> You can use qom-set to change a device property after you created the
> device.  It might even work.  However, qom-set is a deeply problematic
> and seriously underdocumented interface.  Avoid.
> 
> But will you need to change it?
> 
> If you started the source with an explicit property value, start the
> destination the same way.  Same as for any number of other configuration
> knobs.
> 
> If you started the source with the default property value, start the
> destination the same way.  Values will match as long as the machine type
> matches, as it should.
> 

Given that migration can only be done with matching machine types and if 
we can assume that guests using pc-q35-XX.X, for example, will always 
have this support, then my concerns about this are allayed.

>> Am I understanding this correctly?
>>
>>>>> [...]
>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Markus Armbruster 2 months, 2 weeks ago
Jonah Palmer <jonah.palmer@oracle.com> writes:

> On 8/27/25 2:37 AM, Markus Armbruster wrote:
>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>> 
>>> On 8/26/25 2:11 AM, Markus Armbruster wrote:
>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>
>>>>> On 8/25/25 8:44 AM, Markus Armbruster wrote:
>>>>
>>>> [...]
>>>>
>>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>>
>>>>>>> On 8/8/25 6:48 AM, Markus Armbruster wrote:
>>>>
>>>> [...]
>>>>
>>>>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>>>>>>> virtio devices, where supported, to iteratively migrate configuration
>>>>>>>>> changes that occur during the migration process.
>>>>>>>>
>>>>>>>> Why is that desirable?
>>>>>>>
>>>>>>> To be frank, I wasn't sure if having a migration capability, or even
>>>>>>> have it toggleable at all, would be desirable or not. It appears though
>>>>>>> that this might be better off as a per-device feature set via
>>>>>>> --device virtio-net-pci,iterative-mig=on,..., for example.
>>>>>>
>>>>>> See below.
>>>>>>
>>>>>>> And by "iteratively migrate configuration changes" I meant more along
>>>>>>> the lines of the device's state as it continues running on the source.
>>>>>>
>>>>>> Isn't that what migration does always?
>>>>>
>>>>> Essentially yes, but today all of the state is only migrated at the end, once the source has been paused. So the final correct state is always sent to the destination.
>>>>
>>>> As far as I understand (and ignoring lots of detail, including post
>>>> copy), we have three stages:
>>>>
>>>> 1. Source runs, migrate memory pages.  Pages that get dirtied after they
>>>> are migrated need to be migrated again.
>>>>
>>>> 2. Neither source or destination runs, migrate remaining memory pages
>>>> and device state.
>>>>
>>>> 3. Destination starts to run.
>>>>
>>>> If the duration of stage 2 (downtime) was of no concern, we'd switch to
>>>> it immediately, i.e. without migrating anything in stage 1.  This would
>>>> minimize I/O.
>>>>
>>>> Of course, we actually care for limiting downtime.  We switch to stage 2
>>>> when "little enough" is left for stage two to migrate.
>>>>
>>>>> If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen is still communicated to the destination.
>>>>
>>>> So you're proposing to treat suitable parts of the device state more
>>>> like memory pages.  Correct?
>>>>
>>>
>>> Not in the sense of "something got dirtied so let's immediately re-send
>>> that" like we would with RAM. It's more along the lines of "something
>>> got dirtied so let's make sure that gets re-sent at the start of stage 2".
>> 
>> Or is it "something might have dirtied, just resend in stage 2"?
>> 
>
> Exactly. This is better wording since it doesn't necessarily have to be 
> sent at the "start" of stage 2. Just at some point during it.

Got it.

>>> The entire state of a virtio-net device (even with vhost-net /
>>> vhost-vDPA) is <10KB I believe. I don't believe there's much to gain by
>>> "iteratively" re-sending changes for virtio-net. It should be suitable
>>> enough to just re-send whatever changed during stage 1 (after the
>>> initial state was sent) at the start of stage 2.
>> 
>> Got it.
>> 
>>> This is why I'm currently looking into a solution that uses VMSD's
>>> .early_setup flag (that Peter recommended) rather than implementing a
>>> suite of SaveVMHandlers hooks (like this RFC does). We don't need this
>>> iterative capability as much as we need to start migrating the state
>>> earlier (and doing corresponding config/prep work) during stage 1.
>>>
>>>> Cover letter and commit message of PATCH 4 provide the motivation: you
>>>> observe a shorter downtime.  You speculate this is due to moving "heavy
>>>> allocations and page-fault latencies" from stage 2 to stage 1.  Correct?
>>>
>>> Correct. But again I'd like to stress that this is just one part in
>>> reducing downtime during stage 2. The biggest reductions will come from
>>> the config/prep work that we're trying to move from stage 2 to stage 1,
>>> especially when vhost-vDPA is involved. And we can only do this early
>>> work once we have the state, hence why we're sending it earlier.
>> 
>> This is an important bit of detail I've been missing so far.  Easy
>> enough to fix in a future commit message and cover letter.
>> 
>
> Ack.
>
>>>> Is there anything that makes virtio-net particularly suitable?
>>>
>>> Yes, especially with vhost-vDPA and configuring VQs. See Eugenio's
>>> comment here
>>> https://lore.kernel.org/qemu-devel/CAJaqyWdUutZrAWKy9d=ip+h+y3BnptUrcL8Xj06XfizNxPtfpw@mail.gmail.com/
>> 
>> Such prep work commonly depends only on device configuration, not state.
>> I'm curious: what state bits exactly does the prep work need?
>> 
>> Device configuration is available at the start of stage 1, state is
>> fully available only at the end of stage 2.
>> 
>
> We pretty much need, more or less, all of the state of the VirtIODevice 
> itself as well as the bits of the VirtIONet device. Essentially, barring 
> ring indices, we'd need whatever is required throughout most of the 
> device's startup routine.
>
> In this series, we get everything we need from the vmstate_save_state(f, 
> &vmstate_virtio_net, ...) and vmstate_load_state(f, &vmstate_virtio_net, 
> ...) calls early during stage 1 (see patch 4/6).
>
> Once we've gotten this data, we can start on the prep work that's 
> normally done today during stage 2.

This is unusual.  I'd like to understand it better.

Non-migration startup:

1. We create the device.  This runs its .init().

2. We configure the device by setting device properties.

3. We realize the device.  This runs its .realize(), which initializes
the device state according to its configuration.

4. The guest interacts with the device.  Device state changes.

When is the expensive prep work we've been discussing done here?

>> Your patches make *tentative* device state available in stage 1.
>> Tentative, because it may still change afterwards.
>> 
>> You use tentative state to do certain expensive work in stage 1 already,
>> in order to cut downtime in stage 2.
>> 
>> Fair?
>
> Correct.

Got it.

>> Can state change in ways that invalidate this work?
>> 
>
> If, for some reason, the guest wanted to change everything during 
> migration (specifically during stage 1), then it'd more or less negate 
> the early prep work we'd've done. How impactful this is would depend on 
> which route we go (see below). God forbid the guest just wait until 
> migration is complete.

So the answer is yes.

>> If yes, how do you handle this?
>> 
>
> So it depends on the route this series goes. That is, whether we go the 
> truly iterative SaveVMHandlers hooks route (which this series uses) or 
> if we go the early_setup VMSD route (which Peter recommended).
>
> ---
>
> If we go the truly iterative route, then technically we can still handle 
> these changes during stage 1 and still keep the work out of stage 2.
>
> However, given the nicheness of such a corner case (where things are 
> being changed last minute during migration), handling these changes 
> iteratively might be overdesign.
>
> And we'd have to guard against the scenario where the guest acts 
> maliciously by constantly changing things to prevent migration from 
> continuing.

Yes.

> ---
>
> If we go the early_setup VMSD route, where we get one shot early to do 
> stuff during stage 1 and one last shot to do things later during stage 
> 2, then the more that gets changed means the less beneficial this early 
> work becomes. This is because any changes made during stage 1 could only 
> be handled during stage 2, which is what this overall effort is trying 
> to minimize.

Stupidest solution that could possibly work: if anything impacting the
prep work changed, redo it from scratch.

>> If no, do you verify the "no change" design assumption holds?
>> 
>
> When it comes to early migration for devices, we can never support a "no 
> change" design assumption. The difference in the design lies in how (and 
> when) such changes are handled during migration.

Just checking :)

[...]

>>>>>>> But I think for now this is better left as a virtio-net configuration
>>>>>>> rather than as a migration capability (e.g. --device
>>>>>>> virtio-net-pci,iterative-mig=on/off,...)
>>>>>>
>>>>>> Makes sense to me (but I'm not a migration expert).
>>>>
>>>> A device property's default can depend on the machine type via compat
>>>> properties.  This is normally used to restrict a guest-visible change to
>>>> newer machine types.  Here, it's not guest-visible.  But it can get you
>>>> this:
>>>>
>>>> * Migrate new machine type from new QEMU to new QEMU (old QEMU doesn't
>>>>     have the machine type): iterative is enabled by default.  Good.  User
>>>>     can disable it on both ends to not get the improvement.  Enabling it
>>>>     on just one breaks migration.
>>>>
>>>>     All other cases go away with time.
>>>>
>>>> * Migrate old machine type from new QEMU to new QEMU: iterative is
>>>>     disabled by default, which is sad, but no worse than before.  User can
>>>>     enable it on both ends to get the improvement.  Enabling it on just
>>>>     one breaks migration.
>>>>
>>>> * Migrate old machine type from new QEMU to old QEMU or vice versa:
>>>>     iterative is off by default.  Good.  Enabling it on the new one breaks
>>>>     migration.
>>>>
>>>> * Migrate old machine type from old QEMU to old QEMU: iterative is off
>>>>
>>>> I figure almost all users could simply ignore this configuration knob
>>>> then.
>>>
>>> Oh, that's interesting. I wasn't aware of this. But couldn't this
>>> potentially cause some headaches and confusion when attempting to
>>> migrate between 2 guests where one VM is using a machine type that does
>>> support it and the other isn't?
>>>
>>> For example, the source and destination VMs both specify '-machine
>>> q35,...' and the q35 alias resolves into, say, pc-q35-10.1 for the
>>> source VM and pc-q35-10.0 for the destination VM. And say this property
>>> is supported on >= pc-q35-10.1.
>> 
>> In my understanding, migration requires identical machine types on both
>> ends, and all bets are off when they're different.
>> 
>
> Ah, true.
>
>>> IIUC, this would mean that iterative is enabled by default on the source
>>> VM but disabled by default on the destination VM.
>>>
>>> Then a user attempts the migration, the migration fails, and then they'd
>>> have to try and figure out why it's failing.
>> 
>> Migration failures due to mismatched configuration tend to be that way,
>> don't they?
>> 
>
> Right.
>
> So if we pin this feature to always be enabled for machine type, say, >= 
> pc-q35-XX.X, then can we assume that both guests can actually support 
> this feature?
>
> In other words, conversely, is it possible in production that both 
> guests use pc-q35-XX.X but one build supports this early migration 
> feature and the other doesn't?

I'd call that a bug.

Here's how we commonly code property defaults depending on the machine
type.

The property defaults to the new default (here: feature enabled).

Machine types older than the current (unreleased) one use a compat
property to change it to the old default (here: feature disabled).  With
this value, the device must be compatible with its older versions in
prior releases of QEMU, both for guest and for migration.

Once you got that right, it's fairly unlikely to break accidentally.

The current machine type then defaults the feature to enabled in the
current and all future versions of QEMU.  The machine type doesn't exist
in older versions of QEMU.

Older machine types default it to disabled in the current and all future
versions of QEMU, which is compatible with older versions of QEMU.
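
To make the pattern concrete, a rough sketch assuming the knob ends up
as a virtio-net device property; the property name "x-early-migration"
and the hw_compat array chosen here are made up for illustration:

/* hw/net/virtio-net.c: new default is "enabled". */
DEFINE_PROP_BOOL("x-early-migration", VirtIONet, early_migration, true),

/* hw/core/machine.c: older machine types keep the old default. */
GlobalProperty hw_compat_10_0[] = {
    /* ... existing entries ... */
    { "virtio-net-pci", "x-early-migration", "off" },
};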

> If we can assume that, then this would probably be the right approach 
> for something like this.
>
>>> Furthermore, since it's a device property that's essentially set at VM
>>> creation time, either the source would have to be reset and explicitly
>>> set this property to off or the destination would have to be reset and
>>> use a newer (>= pc-q35-10.1) machine type before starting it back up and
>>> perform the migration.
>> 
>> You can use qom-set to change a device property after you created the
>> device.  It might even work.  However, qom-set is a deeply problematic
>> and seriously underdocumented interface.  Avoid.
>> 
>> But will you need to change it?
>> 
>> If you started the source with an explicit property value, start the
>> destination the same way.  Same as for any number of other configuration
>> knobs.
>> 
>> If you started the source with the default property value, start the
>> destination the same way.  Values will match as long as the machine type
>> matches, as it should.
>> 
>
> Given that migration can only be done with matching machine types and if 
> we can assume that guests using pc-q35-XX.X, for example, will always 
> have this support, then my concerns about this are allayed.

Glad I was able to assist here!

>>> Am I understanding this correctly?
>>>
>>>>>> [...]
>>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 2 months, 2 weeks ago

On 8/29/25 5:24 AM, Markus Armbruster wrote:
> Jonah Palmer <jonah.palmer@oracle.com> writes:
> 
>> On 8/27/25 2:37 AM, Markus Armbruster wrote:
>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>
>>>> On 8/26/25 2:11 AM, Markus Armbruster wrote:
>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>
>>>>>> On 8/25/25 8:44 AM, Markus Armbruster wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>>>
>>>>>>>> On 8/8/25 6:48 AM, Markus Armbruster wrote:
>>>>>
>>>>> [...]
>>>>>
>>>>>>>>> Jonah Palmer <jonah.palmer@oracle.com> writes:
>>>>>>>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>>>>>>>> virtio devices, where supported, to iteratively migrate configuration
>>>>>>>>>> changes that occur during the migration process.
>>>>>>>>>
>>>>>>>>> Why is that desirable?
>>>>>>>>
>>>>>>>> To be frank, I wasn't sure if having a migration capability, or even
>>>>>>>> have it toggleable at all, would be desirable or not. It appears though
>>>>>>>> that this might be better off as a per-device feature set via
>>>>>>>> --device virtio-net-pci,iterative-mig=on,..., for example.
>>>>>>>
>>>>>>> See below.
>>>>>>>
>>>>>>>> And by "iteratively migrate configuration changes" I meant more along
>>>>>>>> the lines of the device's state as it continues running on the source.
>>>>>>>
>>>>>>> Isn't that what migration does always?
>>>>>>
>>>>>> Essentially yes, but today all of the state is only migrated at the end, once the source has been paused. So the final correct state is always sent to the destination.
>>>>>
>>>>> As far as I understand (and ignoring lots of detail, including post
>>>>> copy), we have three stages:
>>>>>
>>>>> 1. Source runs, migrate memory pages.  Pages that get dirtied after they
>>>>> are migrated need to be migrated again.
>>>>>
>>>>> 2. Neither source or destination runs, migrate remaining memory pages
>>>>> and device state.
>>>>>
>>>>> 3. Destination starts to run.
>>>>>
>>>>> If the duration of stage 2 (downtime) was of no concern, we'd switch to
>>>>> it immediately, i.e. without migrating anything in stage 1.  This would
>>>>> minimize I/O.
>>>>>
>>>>> Of course, we actually care for limiting downtime.  We switch to stage 2
>>>>> when "little enough" is left for stage two to migrate.
>>>>>
>>>>>> If we're no longer waiting until the source has been paused and the initial state is sent early, then we need to make sure that any changes that happen is still communicated to the destination.
>>>>>
>>>>> So you're proposing to treat suitable parts of the device state more
>>>>> like memory pages.  Correct?
>>>>>
>>>>
>>>> Not in the sense of "something got dirtied so let's immediately re-send
>>>> that" like we would with RAM. It's more along the lines of "something
>>>> got dirtied so let's make sure that gets re-sent at the start of stage 2".
>>>
>>> Or is it "something might have dirtied, just resend in stage 2"?
>>>
>>
>> Exactly. This is better wording since it doesn't necessarily have to be
>> sent at the "start" of stage 2. Just at some point during it.
> 
> Got it.
> 
>>>> The entire state of a virtio-net device (even with vhost-net /
>>>> vhost-vDPA) is <10KB I believe. I don't believe there's much to gain by
>>>> "iteratively" re-sending changes for virtio-net. It should be suitable
>>>> enough to just re-send whatever changed during stage 1 (after the
>>>> initial state was sent) at the start of stage 2.
>>>
>>> Got it.
>>>
>>>> This is why I'm currently looking into a solution that uses VMSD's
>>>> .early_setup flag (that Peter recommended) rather than implementing a
>>>> suite of SaveVMHandlers hooks (like this RFC does). We don't need this
>>>> iterative capability as much as we need to start migrating the state
>>>> earlier (and doing corresponding config/prep work) during stage 1.
>>>>
>>>>> Cover letter and commit message of PATCH 4 provide the motivation: you
>>>>> observe a shorter downtime.  You speculate this is due to moving "heavy
>>>>> allocations and page-fault latencies" from stage 2 to stage 1.  Correct?
>>>>
>>>> Correct. But again I'd like to stress that this is just one part in
>>>> reducing downtime during stage 2. The biggest reductions will come from
>>>> the config/prep work that we're trying to move from stage 2 to stage 1,
>>>> especially when vhost-vDPA is involved. And we can only do this early
>>>> work once we have the state, hence why we're sending it earlier.
>>>
>>> This is an important bit of detail I've been missing so far.  Easy
>>> enough to fix in a future commit message and cover letter.
>>>
>>
>> Ack.
>>
>>>>> Is there anything that makes virtio-net particularly suitable?
>>>>
>>>> Yes, especially with vhost-vDPA and configuring VQs. See Eugenio's
>>>> comment here
>>>> https://lore.kernel.org/qemu-devel/CAJaqyWdUutZrAWKy9d=ip+h+y3BnptUrcL8Xj06XfizNxPtfpw@mail.gmail.com/
>>>
>>> Such prep work commonly depends only on device configuration, not state.
>>> I'm curious: what state bits exactly does the prep work need?
>>>
>>> Device configuration is available at the start of stage 1, state is
>>> fully available only at the end of stage 2.
>>>
>>
>> We pretty much need, more or less, all of the state of the VirtIODevice
>> itself as well as the bits of the VirtIONet device. Essentially, barring
>> ring indices, we'd need whatever is required throughout most of the
>> device's startup routine.
>>
>> In this series, we get everything we need from the vmstate_save_state(f,
>> &vmstate_virtio_net, ...) and vmstate_load_state(f, &vmstate_virtio_net,
>> ...) calls early during stage 1 (see patch 4/6).
>>
>> Once we've gotten this data, we can start on the prep work that's
>> normally done today during stage 2.
> 
> This is unusual.  I'd like to understand it better.
> 
> Non-migration startup:
> 
> 1. We create the device.  This runs its .init().
> 
> 2. We configure the device by setting device properties.
> 
> 3. We realize the device.  This runs its .realize(), which initializes
> the device state according to its configuration.
> 
> 4. The guest interacts with the device.  Device state changes.
> 
> When is the expensive prep work we've been discussing done here?
> 

During step 4. The expensive vhost bring-up (e.g. for vhost-vDPA) happens 
during vhost_dev_start(), when it has to send ioctls to configure the 
memory table, VQs, etc.

This prep work depends on negotiated features, device configuration 
(MAC, MTU, MQ), VQ layouts (vring addresses), memory table, etc. It 
doesn't require dynamic VQ state like ring indices.
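
For a sense of what that bring-up involves, a simplified sketch of the
per-device ioctl traffic vhost_dev_start() ends up generating (the
constants are from the vhost UAPI; the helper and flow are illustrative
only, not the actual QEMU code paths):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>

/* Illustrative only: none of this needs the dynamic ring indices, just
 * negotiated features, the memory table and the vring layout. */
static void sketch_vhost_bringup(int vhost_fd, uint64_t features,
                                 struct vhost_memory *mem, int nvqs)
{
    ioctl(vhost_fd, VHOST_SET_FEATURES, &features);  /* negotiated features */
    ioctl(vhost_fd, VHOST_SET_MEM_TABLE, mem);       /* guest memory table  */

    for (int i = 0; i < nvqs; i++) {
        struct vhost_vring_state num = { .index = i, .num = 256 };
        struct vhost_vring_addr addr = { .index = i /* + ring addresses */ };
        struct vhost_vring_file kick = { .index = i /* + eventfd */ };

        ioctl(vhost_fd, VHOST_SET_VRING_NUM,  &num);   /* ring size     */
        ioctl(vhost_fd, VHOST_SET_VRING_ADDR, &addr);  /* vring layout  */
        ioctl(vhost_fd, VHOST_SET_VRING_KICK, &kick);  /* notifications */
    }
}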

>>> Your patches make *tentative* device state available in stage 1.
>>> Tentative, because it may still change afterwards.
>>>
>>> You use tentative state to do certain expensive work in stage 1 already,
>>> in order to cut downtime in stage 2.
>>>
>>> Fair?
>>
>> Correct.
> 
> Got it.
> 
>>> Can state change in ways that invalidate this work?
>>>
>>
>> If, for some reason, the guest wanted to change everything during
>> migration (specifically during stage 1), then it'd more or less negate
>> the early prep work we'd've done. How impactful this is would depend on
>> which route we go (see below). God forbid the guest just wait until
>> migration is complete.
> 
> So the answer is yes.
> 
>>> If yes, how do you handle this?
>>>
>>
>> So it depends on the route this series goes. That is, whether we go the
>> truly iterative SaveVMHandlers hooks route (which this series uses) or
>> if we go the early_setup VMSD route (which Peter recommended).
>>
>> ---
>>
>> If we go the truly iterative route, then technically we can still handle
>> these changes during stage 1 and still keep the work out of stage 2.
>>
>> However, given the nicheness of such a corner case (where things are
>> being changed last minute during migration), handling these changes
>> iteratively might be overdesign.
>>
>> And we'd have to guard against the scenario where the guest acts
>> maliciously by constantly changing things to prevent migration from
>> continuing.
> 
> Yes.
> 
>> ---
>>
>> If we go the early_setup VMSD route, where we get one shot early to do
>> stuff during stage 1 and one last shot to do things later during stage
>> 2, then the more that gets changed means the less beneficial this early
>> work becomes. This is because any changes made during stage 1 could only
>> be handled during stage 2, which is what this overall effort is trying
>> to minimize.
> 
> Stupidest solution that could possibly work: if anything impacting the
> prep work changed, redo it from scratch.
> 
>>> If no, do you verify the "no change" design assumption holds?
>>>
>>
>> When it comes to early migration for devices, we can never support a "no
>> change" design assumption. The difference in the design lies in how (and
>> when) such changes are handled during migration.
> 
> Just checking :)
> 
> [...]
> 
>>>>>>>> But I think for now this is better left as a virtio-net configuration
>>>>>>>> rather than as a migration capability (e.g. --device
>>>>>>>> virtio-net-pci,iterative-mig=on/off,...)
>>>>>>>
>>>>>>> Makes sense to me (but I'm not a migration expert).
>>>>>
>>>>> A device property's default can depend on the machine type via compat
>>>>> properties.  This is normally used to restrict a guest-visible change to
>>>>> newer machine types.  Here, it's not guest-visible.  But it can get you
>>>>> this:
>>>>>
>>>>> * Migrate new machine type from new QEMU to new QEMU (old QEMU doesn't
>>>>>      have the machine type): iterative is enabled by default.  Good.  User
>>>>>      can disable it on both ends to not get the improvement.  Enabling it
>>>>>      on just one breaks migration.
>>>>>
>>>>>      All other cases go away with time.
>>>>>
>>>>> * Migrate old machine type from new QEMU to new QEMU: iterative is
>>>>>      disabled by default, which is sad, but no worse than before.  User can
>>>>>      enable it on both ends to get the improvement.  Enabling it on just
>>>>>      one breaks migration.
>>>>>
>>>>> * Migrate old machine type from new QEMU to old QEMU or vice versa:
>>>>>      iterative is off by default.  Good.  Enabling it on the new one breaks
>>>>>      migration.
>>>>>
>>>>> * Migrate old machine type from old QEMU to old QEMU: iterative is off
>>>>>
>>>>> I figure almost all users could simply ignore this configuration knob
>>>>> then.
>>>>
>>>> Oh, that's interesting. I wasn't aware of this. But couldn't this
>>>> potentially cause some headaches and confusion when attempting to
>>>> migrate between 2 guests where one VM is using a machine type that does
>>>> support it and the other isn't?
>>>>
>>>> For example, the source and destination VMs both specify '-machine
>>>> q35,...' and the q35 alias resolves into, say, pc-q35-10.1 for the
>>>> source VM and pc-q35-10.0 for the destination VM. And say this property
>>>> is supported on >= pc-q35-10.1.
>>>
>>> In my understanding, migration requires identical machine types on both
>>> ends, and all bets are off when they're different.
>>>
>>
>> Ah, true.
>>
>>>> IIUC, this would mean that iterative is enabled by default on the source
>>>> VM but disabled by default on the destination VM.
>>>>
>>>> Then a user attempts the migration, the migration fails, and then they'd
>>>> have to try and figure out why it's failing.
>>>
>>> Migration failures due to mismatched configuration tend to be that way,
>>> don't they?
>>>
>>
>> Right.
>>
>> So if we pin this feature to always be enabled for machine type, say, >=
>> pc-q35-XX.X, then can we assume that both guests can actually support
>> this feature?
>>
>> In other words, conversely, is it possible in production that both
>> guests use pc-q35-XX.X but one build supports this early migration
>> feature and the other doesn't?
> 
> I'd call that a bug.
> 
> Here's how we commonly code property defaults depending on the machine
> type.
> 
> The property defaults to the new default (here: feature enabled).
> 
> Machine types older than the current (unreleased) one use a compat
> property to change it to the old default (here: feature disabled).  With
> this value, the device must be compatible with its older versions in
> prior releases of QEMU, both for guest and for migration.
> 
> Once you got that right, it's fairly unlikely to break accidentally.
> 
> The current machine type then defaults the feature to enabled in the
> current and all future versions of QEMU.  The machine type doesn't exist
> in older versions of QEMU.
> 
> Older machine types default it to disabled in the current and all future
> versions of QEMU, which is compatible with older versions of QEMU.
> 

Got it. This is something I will look into then for this kind of 
implementation. Thank you!

>> If we can assume that, then this would probably be the right approach
>> for something like this.
>>
>>>> Furthermore, since it's a device property that's essentially set at VM
>>>> creation time, either the source would have to be reset and explicitly
>>>> set this property to off or the destination would have to be reset and
>>>> use a newer (>= pc-q35-10.1) machine type before starting it back up and
>>>> perform the migration.
>>>
>>> You can use qom-set to change a device property after you created the
>>> device.  It might even work.  However, qom-set is a deeply problematic
>>> and seriously underdocumented interface.  Avoid.
>>>
>>> But will you need to change it?
>>>
>>> If you started the source with an explicit property value, start the
>>> destination the same way.  Same as for any number of other configuration
>>> knobs.
>>>
>>> If you started the source with the default property value, start the
>>> destination the same way.  Values will match as long as the machine type
>>> matches, as it should.
>>>
>>
>> Given that migration can only be done with matching machine types and if
>> we can assume that guests using pc-q35-XX.X, for example, will always
>> have this support, then my concerns about this are allayed.
> 
> Glad I was able to assist here!
> 
>>>> Am I understanding this correctly?
>>>>
>>>>>>> [...]
>>>
>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Peter Xu 3 months, 1 week ago
On Tue, Jul 22, 2025 at 12:41:22PM +0000, Jonah Palmer wrote:
> Adds a new migration capability 'virtio-iterative' that will allow
> virtio devices, where supported, to iteratively migrate configuration
> changes that occur during the migration process.
> 
> This capability is added to the validated capabilities list to ensure
> both the source and destination support it before enabling.
> 
> The capability defaults to off to maintain backward compatibility.
> 
> To enable the capability via HMP:
> (qemu) migrate_set_capability virtio-iterative on
> 
> To enable the capability via QMP:
> {"execute": "migrate-set-capabilities", "arguments": {
>      "capabilities": [
>         { "capability": "virtio-iterative", "state": true }
>      ]
>   }
> }
> 
> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> ---
>  migration/savevm.c  | 1 +
>  qapi/migration.json | 7 ++++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index bb04a4520d..40a2189866 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>      switch (capability) {
>      case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>      case MIGRATION_CAPABILITY_MAPPED_RAM:
> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>          return true;
>      default:
>          return false;
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 4963f6ca12..8f042c3ba5 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -479,6 +479,11 @@
>  #     each RAM page.  Requires a migration URI that supports seeking,
>  #     such as a file.  (since 9.0)
>  #
> +# @virtio-iterative: Enable iterative migration for virtio devices, if
> +#     the device supports it. When enabled, and where supported, virtio
> +#     devices will track and migrate configuration changes that may
> +#     occur during the migration process. (Since 10.1)
> +#

Having a migration capability to enable iterative support for a specific
type of device sounds wrong.

If virtio will be able to support iterative saves, it could provide the
save_live_iterate() function.  Any explanation why it needs to be a
migration capability?
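
For context, a hypothetical sketch of what providing that hook would
mean: virtio-net registering its own live-save handlers.  The callback
names are placeholders and the exact SaveVMHandlers fields vary across
QEMU versions:

static const SaveVMHandlers savevm_virtio_net_handlers = {
    .save_setup                 = virtio_net_save_setup,
    .save_live_iterate          = virtio_net_save_live_iterate,
    .save_live_complete_precopy = virtio_net_save_complete,
    .load_setup                 = virtio_net_load_setup,
    .load_state                 = virtio_net_load_state,
};

/* e.g. from the device's realize path: */
register_savevm_live("virtio-net-iterative", VMSTATE_INSTANCE_ID_ANY, 1,
                     &savevm_virtio_net_handlers, vdev);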

>  # Features:
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -498,7 +503,7 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit', 'mapped-ram'] }
> +           'dirty-limit', 'mapped-ram', 'virtio-iterative'] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> -- 
> 2.47.1
> 

-- 
Peter Xu
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 3 months, 1 week ago

On 8/6/25 11:58 AM, Peter Xu wrote:
> On Tue, Jul 22, 2025 at 12:41:22PM +0000, Jonah Palmer wrote:
>> Adds a new migration capability 'virtio-iterative' that will allow
>> virtio devices, where supported, to iteratively migrate configuration
>> changes that occur during the migration process.
>>
>> This capability is added to the validated capabilities list to ensure
>> both the source and destination support it before enabling.
>>
>> The capability defaults to off to maintain backward compatibility.
>>
>> To enable the capability via HMP:
>> (qemu) migrate_set_capability virtio-iterative on
>>
>> To enable the capability via QMP:
>> {"execute": "migrate-set-capabilities", "arguments": {
>>       "capabilities": [
>>          { "capability": "virtio-iterative", "state": true }
>>       ]
>>    }
>> }
>>
>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>> ---
>>   migration/savevm.c  | 1 +
>>   qapi/migration.json | 7 ++++++-
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index bb04a4520d..40a2189866 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>>       switch (capability) {
>>       case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>>       case MIGRATION_CAPABILITY_MAPPED_RAM:
>> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>>           return true;
>>       default:
>>           return false;
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 4963f6ca12..8f042c3ba5 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -479,6 +479,11 @@
>>   #     each RAM page.  Requires a migration URI that supports seeking,
>>   #     such as a file.  (since 9.0)
>>   #
>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>> +#     the device supports it. When enabled, and where supported, virtio
>> +#     devices will track and migrate configuration changes that may
>> +#     occur during the migration process. (Since 10.1)
>> +#
> 
> Having a migration capability to enable iterative support for a specific
> type of device sounds wrong.
> 
> If virtio will be able to support iterative saves, it could provide the
> save_live_iterate() function.  Any explanation why it needs to be a
> migration capability?
> 

It certainly doesn't have to be a migration capability. Perhaps it's 
better as a per-device compatibility property? E.g.:

-device virtio-net-pci,x-iterative-migration=on,...

I was just thinking along the lines of not having this feature enabled 
by default for backwards-compatibility (and something to toggle to 
compare performance during development).

Totally open to suggestions though. I wasn't really sure how best a 
feature/capability like this should be introduced.

>>   # Features:
>>   #
>>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>> @@ -498,7 +503,7 @@
>>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>              'validate-uuid', 'background-snapshot',
>>              'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>> -           'dirty-limit', 'mapped-ram'] }
>> +           'dirty-limit', 'mapped-ram', 'virtio-iterative'] }
>>   
>>   ##
>>   # @MigrationCapabilityStatus:
>> -- 
>> 2.47.1
>>
>
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Peter Xu 3 months, 1 week ago
On Thu, Aug 07, 2025 at 08:50:38AM -0400, Jonah Palmer wrote:
> 
> 
> On 8/6/25 11:58 AM, Peter Xu wrote:
> > On Tue, Jul 22, 2025 at 12:41:22PM +0000, Jonah Palmer wrote:
> > > Adds a new migration capability 'virtio-iterative' that will allow
> > > virtio devices, where supported, to iteratively migrate configuration
> > > changes that occur during the migration process.
> > > 
> > > This capability is added to the validated capabilities list to ensure
> > > both the source and destination support it before enabling.
> > > 
> > > The capability defaults to off to maintain backward compatibility.
> > > 
> > > To enable the capability via HMP:
> > > (qemu) migrate_set_capability virtio-iterative on
> > > 
> > > To enable the capability via QMP:
> > > {"execute": "migrate-set-capabilities", "arguments": {
> > >       "capabilities": [
> > >          { "capability": "virtio-iterative", "state": true }
> > >       ]
> > >    }
> > > }
> > > 
> > > Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
> > > ---
> > >   migration/savevm.c  | 1 +
> > >   qapi/migration.json | 7 ++++++-
> > >   2 files changed, 7 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/migration/savevm.c b/migration/savevm.c
> > > index bb04a4520d..40a2189866 100644
> > > --- a/migration/savevm.c
> > > +++ b/migration/savevm.c
> > > @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
> > >       switch (capability) {
> > >       case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
> > >       case MIGRATION_CAPABILITY_MAPPED_RAM:
> > > +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
> > >           return true;
> > >       default:
> > >           return false;
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 4963f6ca12..8f042c3ba5 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -479,6 +479,11 @@
> > >   #     each RAM page.  Requires a migration URI that supports seeking,
> > >   #     such as a file.  (since 9.0)
> > >   #
> > > +# @virtio-iterative: Enable iterative migration for virtio devices, if
> > > +#     the device supports it. When enabled, and where supported, virtio
> > > +#     devices will track and migrate configuration changes that may
> > > +#     occur during the migration process. (Since 10.1)
> > > +#
> > 
> > Having a migration capability to enable iterative support for a specific
> > type of device sounds wrong.
> > 
> > If virtio will be able to support iterative saves, it could provide the
> > save_live_iterate() function.  Any explanation why it needs to be a
> > migration capability?
> > 
> 
> It certainly doesn't have to be a migration capability. Perhaps it's better
> as a per-device compatibility property? E.g.:
> 
> -device virtio-net-pci,x-iterative-migration=on,...
> 
> I was just thinking along the lines of not having this feature enabled by
> default for backwards-compatibility (and something to toggle to compare
> performance during development).
> 
> Totally open to suggestions though. I wasn't really sure how best a
> feature/capability like this should be introduced.

Yep, for an RFC this is fine; if there'll be a formal patch, please propose
it as a device property whenever needed, thanks.

-- 
Peter Xu
Re: [RFC 1/6] migration: Add virtio-iterative capability
Posted by Jonah Palmer 3 months, 1 week ago

On 8/7/25 9:13 AM, Peter Xu wrote:
> On Thu, Aug 07, 2025 at 08:50:38AM -0400, Jonah Palmer wrote:
>>
>>
>> On 8/6/25 11:58 AM, Peter Xu wrote:
>>> On Tue, Jul 22, 2025 at 12:41:22PM +0000, Jonah Palmer wrote:
>>>> Adds a new migration capability 'virtio-iterative' that will allow
>>>> virtio devices, where supported, to iteratively migrate configuration
>>>> changes that occur during the migration process.
>>>>
>>>> This capability is added to the validated capabilities list to ensure
>>>> both the source and destination support it before enabling.
>>>>
>>>> The capability defaults to off to maintain backward compatibility.
>>>>
>>>> To enable the capability via HMP:
>>>> (qemu) migrate_set_capability virtio-iterative on
>>>>
>>>> To enable the capability via QMP:
>>>> {"execute": "migrate-set-capabilities", "arguments": {
>>>>        "capabilities": [
>>>>           { "capability": "virtio-iterative", "state": true }
>>>>        ]
>>>>     }
>>>> }
>>>>
>>>> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com>
>>>> ---
>>>>    migration/savevm.c  | 1 +
>>>>    qapi/migration.json | 7 ++++++-
>>>>    2 files changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/migration/savevm.c b/migration/savevm.c
>>>> index bb04a4520d..40a2189866 100644
>>>> --- a/migration/savevm.c
>>>> +++ b/migration/savevm.c
>>>> @@ -279,6 +279,7 @@ static bool should_validate_capability(int capability)
>>>>        switch (capability) {
>>>>        case MIGRATION_CAPABILITY_X_IGNORE_SHARED:
>>>>        case MIGRATION_CAPABILITY_MAPPED_RAM:
>>>> +    case MIGRATION_CAPABILITY_VIRTIO_ITERATIVE:
>>>>            return true;
>>>>        default:
>>>>            return false;
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 4963f6ca12..8f042c3ba5 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -479,6 +479,11 @@
>>>>    #     each RAM page.  Requires a migration URI that supports seeking,
>>>>    #     such as a file.  (since 9.0)
>>>>    #
>>>> +# @virtio-iterative: Enable iterative migration for virtio devices, if
>>>> +#     the device supports it. When enabled, and where supported, virtio
>>>> +#     devices will track and migrate configuration changes that may
>>>> +#     occur during the migration process. (Since 10.1)
>>>> +#
>>>
>>> Having a migration capability to enable iterative support for a specific
>>> type of device sounds wrong.
>>>
>>> If virtio will be able to support iterative saves, it could provide the
>>> save_live_iterate() function.  Any explanation why it needs to be a
>>> migration capability?
>>>
>>
>> It certainly doesn't have to be a migration capability. Perhaps it's better
>> as a per-device compatibility property? E.g.:
>>
>> -device virtio-net-pci,x-iterative-migration=on,...
>>
>> I was just thinking along the lines of not having this feature enabled by
>> default for backwards-compatibility (and something to toggle to compare
>> performance during development).
>>
>> Totally open to suggestions though. I wasn't really sure how best a
>> feature/capability like this should be introduced.
> 
> Yep, for an RFC this is fine; if there'll be a formal patch, please propose
> it as a device property whenever needed, thanks.
> 

Gotcha, will do! Thanks for the suggestion :)

Jonah