From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
Update the VFIO documentation at docs/devel/migration describing the
changes brought by the multifd device state transfer.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
---
docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++----
1 file changed, 71 insertions(+), 9 deletions(-)
diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index c49482eab66d..d9b169d29921 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
VFIO_DEVICE_FEATURE_MIGRATION ioctl.
+Starting from QEMU version 10.0 there's a possibility to transfer VFIO device
+_STOP_COPY state via multifd channels. This helps reduce downtime - especially
+with multiple VFIO devices or with devices having a large migration state.
+As an additional benefit, setting the VFIO device to _STOP_COPY state and
+saving its config space is also parallelized (run in a separate thread) in
+such migration mode.
+
+The multifd VFIO device state transfer is controlled by the
+"x-migration-multifd-transfer" VFIO device property. This property defaults to
+AUTO, which means that VFIO device state transfer via multifd channels is
+attempted in configurations that otherwise support it.
+
+Since the target QEMU needs to load device state buffers in-order, it needs to
+queue incoming buffers until they can be loaded into the device.
+This means that a malicious QEMU source could theoretically cause the target
+QEMU to allocate unlimited amounts of memory for such buffers-in-flight.
+
+The "x-migration-max-queued-buffers" property allows capping the maximum count
+of these VFIO device state buffers queued at the destination.
+
+Because a malicious QEMU source causing OOM on the target is not expected to be
+a realistic threat in most VFIO live migration use cases, and because the right
+value depends on the particular setup, this queued buffers limit is disabled by
+default (by setting it to UINT64_MAX).
+
+Some host platforms (like ARM64) require that the VFIO device config is loaded
+only after all iterables have been loaded.
+Such interlocking is controlled by the "x-migration-load-config-after-iter"
+VFIO device property, which in its default setting (AUTO) does so only on
+platforms that actually require it.
+
When pre-copy is supported, it's possible to further reduce downtime by
enabling "switchover-ack" migration capability.
VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
@@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach as follows:
* A ``switchover_ack_needed`` function that checks if the VFIO device uses
"switchover-ack" migration capability when this capability is enabled.
-* A ``save_state`` function to save the device config space if it is present.
-
-* A ``save_live_complete_precopy`` function that sets the VFIO device in
- _STOP_COPY state and iteratively copies the data for the VFIO device until
- the vendor driver indicates that no data remains.
-
-* A ``load_state`` function that loads the config section and the data
- sections that are generated by the save functions above.
+* A ``switchover_start`` function that in the multifd mode starts a thread that
+ reassembles the multifd received data and loads it in-order into the device.
+ In the non-multifd mode this function is a NOP.
+
+* A ``save_state`` function to save the device config space if it is present
+ in the non-multifd mode.
+ In the multifd mode it just emits either a dummy EOS marker or an
+ "all iterables were loaded" flag for configurations that need to defer
+ loading the device config space until after the iterables.
+
+* A ``save_live_complete_precopy`` function that in the non-multifd mode sets
+ the VFIO device in _STOP_COPY state and iteratively copies the data for the
+ VFIO device until the vendor driver indicates that no data remains.
+ In the multifd mode it just emits a dummy EOS marker.
+
+* A ``save_live_complete_precopy_thread`` function that in the multifd mode
+ provides a thread handler performing the multifd device state transfer.
+ It sets the VFIO device to _STOP_COPY state, iteratively reads the data
+ from the VFIO device and queues it for multifd transmission until the vendor
+ driver indicates that no data remains.
+ After that, it saves the device config space and queues it for multifd
+ transfer too.
+ In the non-multifd mode this thread is a NOP.
+
+* A ``load_state`` function that loads the data sections that are generated
+ by the main migration channel save functions above.
+ In the non-multifd mode it also loads the config section, while in the
+ multifd mode it handles the optional "all iterables were loaded" flag if
+ it is in use.
+
+* A ``load_state_buffer`` function that loads the device state and the device
+ config that arrived via multifd channels.
+ It's used only in the multifd mode.
* ``cleanup`` functions for both save and load that perform any migration
related cleanup.
@@ -176,8 +232,11 @@ Live migration save path
Then the VFIO device is put in _STOP_COPY state
(FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
.save_live_complete_precopy() is called for each active device
- For the VFIO device, iterate in .save_live_complete_precopy() until
+ For the VFIO device: in the non-multifd mode iterate in
+ .save_live_complete_precopy() until
pending data is 0
+ In the multifd mode this iteration is done in
+ .save_live_complete_precopy_thread() instead.
|
(POSTMIGRATE, _COMPLETED, _STOP_COPY)
Migraton thread schedules cleanup bottom half and exits
@@ -194,6 +253,9 @@ Live migration resume path
(RESTORE_VM, _ACTIVE, _STOP)
|
For each device, .load_state() is called for that device section data
+ transmitted via the main migration channel.
+ For data transmitted via multifd channels .load_state_buffer() is called
+ instead.
(RESTORE_VM, _ACTIVE, _RESUMING)
|
At the end, .load_cleanup() is called for each device and vCPUs are started
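
For illustration only (not part of the diff above): one way the new property
could be exercised together with the existing multifd settings. The device
address, channel count and values shown are placeholders, not recommendations::

    # on both source and destination monitors
    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_parameter multifd-channels 8

    # VFIO device opting in explicitly (AUTO is the default)
    -device vfio-pci,host=0000:01:00.0,x-migration-multifd-transfer=on
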
On 2/19/25 21:34, Maciej S. Szmigiero wrote: > From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> > > Update the VFIO documentation at docs/devel/migration describing the > changes brought by the multifd device state transfer. > > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> > --- > docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++---- > 1 file changed, 71 insertions(+), 9 deletions(-) > > diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst > index c49482eab66d..d9b169d29921 100644 > --- a/docs/devel/migration/vfio.rst > +++ b/docs/devel/migration/vfio.rst > @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy > support by reporting the VFIO_MIGRATION_PRE_COPY flag in the > VFIO_DEVICE_FEATURE_MIGRATION ioctl. Please add a new "multifd" documentation subsection at the end of the file with this part : > +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device > +_STOP_COPY state via multifd channels. This helps reduce downtime - especially > +with multiple VFIO devices or with devices having a large migration state. > +As an additional benefit, setting the VFIO device to _STOP_COPY state and > +saving its config space is also parallelized (run in a separate thread) in > +such migration mode. > + > +The multifd VFIO device state transfer is controlled by > +"x-migration-multifd-transfer" VFIO device property. This property defaults to > +AUTO, which means that VFIO device state transfer via multifd channels is > +attempted in configurations that otherwise support it. > + I was expecting a much more detailed explanation on the design too : * in the cover letter * in the hw/vfio/migration-multifd.c * in some new file under docs/devel/migration/ This section : > +Since the target QEMU needs to load device state buffers in-order it needs to > +queue incoming buffers until they can be loaded into the device. > +This means that a malicious QEMU source could theoretically cause the target > +QEMU to allocate unlimited amounts of memory for such buffers-in-flight. > + > +The "x-migration-max-queued-buffers" property allows capping the maximum count > +of these VFIO device state buffers queued at the destination. > + > +Because a malicious QEMU source causing OOM on the target is not expected to be > +a realistic threat in most of VFIO live migration use cases and the right value > +depends on the particular setup by default this queued buffers limit is > +disabled by setting it to UINT64_MAX. should be in patch 34. It is not obvious it will be merged. This section : > +Some host platforms (like ARM64) require that VFIO device config is loaded only > +after all iterables were loaded. > +Such interlocking is controlled by "x-migration-load-config-after-iter" VFIO > +device property, which in its default setting (AUTO) does so only on platforms > +that actually require it. Should be in 35. Same reason. > When pre-copy is supported, it's possible to further reduce downtime by > enabling "switchover-ack" migration capability. > VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream > @@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach as follows: > * A ``switchover_ack_needed`` function that checks if the VFIO device uses > "switchover-ack" migration capability when this capability is enabled. > > -* A ``save_state`` function to save the device config space if it is present. 
> - > -* A ``save_live_complete_precopy`` function that sets the VFIO device in > - _STOP_COPY state and iteratively copies the data for the VFIO device until > - the vendor driver indicates that no data remains. > - > -* A ``load_state`` function that loads the config section and the data > - sections that are generated by the save functions above. > +* A ``switchover_start`` function that in the multifd mode starts a thread that > + reassembles the multifd received data and loads it in-order into the device. > + In the non-multifd mode this function is a NOP. > + > +* A ``save_state`` function to save the device config space if it is present > + in the non-multifd mode. > + In the multifd mode it just emits either a dummy EOS marker or > + "all iterables were loaded" flag for configurations that need to defer > + loading device config space after them. > + > +* A ``save_live_complete_precopy`` function that in the non-multifd mode sets > + the VFIO device in _STOP_COPY state and iteratively copies the data for the > + VFIO device until the vendor driver indicates that no data remains. > + In the multifd mode it just emits a dummy EOS marker. > + > +* A ``save_live_complete_precopy_thread`` function that in the multifd mode > + provides thread handler performing multifd device state transfer. > + It sets the VFIO device to _STOP_COPY state, iteratively reads the data > + from the VFIO device and queues it for multifd transmission until the vendor > + driver indicates that no data remains. > + After that, it saves the device config space and queues it for multifd > + transfer too. > + In the non-multifd mode this thread is a NOP. > + > +* A ``load_state`` function that loads the data sections that are generated > + by the main migration channel save functions above. > + In the non-multifd mode it also loads the config section, while in the > + multifd mode it handles the optional "all iterables were loaded" flag if > + it is in use. > + > +* A ``load_state_buffer`` function that loads the device state and the device > + config that arrived via multifd channels. > + It's used only in the multifd mode. Please move the documentation of the new migration handlers in the patch introducing them. Thanks, C. > > * ``cleanup`` functions for both save and load that perform any migration > related cleanup. > @@ -176,8 +232,11 @@ Live migration save path > Then the VFIO device is put in _STOP_COPY state > (FINISH_MIGRATE, _ACTIVE, _STOP_COPY) > .save_live_complete_precopy() is called for each active device > - For the VFIO device, iterate in .save_live_complete_precopy() until > + For the VFIO device: in the non-multifd mode iterate in > + .save_live_complete_precopy() until > pending data is 0 > + In the multifd mode this iteration is done in > + .save_live_complete_precopy_thread() instead. > | > (POSTMIGRATE, _COMPLETED, _STOP_COPY) > Migraton thread schedules cleanup bottom half and exits > @@ -194,6 +253,9 @@ Live migration resume path > (RESTORE_VM, _ACTIVE, _STOP) > | > For each device, .load_state() is called for that device section data > + transmitted via the main migration channel. > + For data transmitted via multifd channels .load_state_buffer() is called > + instead. > (RESTORE_VM, _ACTIVE, _RESUMING) > | > At the end, .load_cleanup() is called for each device and vCPUs are started >
On 27.02.2025 07:59, Cédric Le Goater wrote: > On 2/19/25 21:34, Maciej S. Szmigiero wrote: >> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> >> >> Update the VFIO documentation at docs/devel/migration describing the >> changes brought by the multifd device state transfer. >> >> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> >> --- >> docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++---- >> 1 file changed, 71 insertions(+), 9 deletions(-) >> >> diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst >> index c49482eab66d..d9b169d29921 100644 >> --- a/docs/devel/migration/vfio.rst >> +++ b/docs/devel/migration/vfio.rst >> @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy >> support by reporting the VFIO_MIGRATION_PRE_COPY flag in the >> VFIO_DEVICE_FEATURE_MIGRATION ioctl. > > Please add a new "multifd" documentation subsection at the end of the file > with this part : > >> +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device >> +_STOP_COPY state via multifd channels. This helps reduce downtime - especially >> +with multiple VFIO devices or with devices having a large migration state. >> +As an additional benefit, setting the VFIO device to _STOP_COPY state and >> +saving its config space is also parallelized (run in a separate thread) in >> +such migration mode. >> + >> +The multifd VFIO device state transfer is controlled by >> +"x-migration-multifd-transfer" VFIO device property. This property defaults to >> +AUTO, which means that VFIO device state transfer via multifd channels is >> +attempted in configurations that otherwise support it. >> + Done - I also moved the parts about x-migration-max-queued-buffers and x-migration-load-config-after-iter description there since obviously they wouldn't make sense being left alone in the top section. > I was expecting a much more detailed explanation on the design too : > > * in the cover letter > * in the hw/vfio/migration-multifd.c > * in some new file under docs/devel/migration/ > I'm not sure what descriptions you exactly want in these places, but since that's just documentation (not code) it could be added after the code freeze... > > This section : > >> +Since the target QEMU needs to load device state buffers in-order it needs to >> +queue incoming buffers until they can be loaded into the device. >> +This means that a malicious QEMU source could theoretically cause the target >> +QEMU to allocate unlimited amounts of memory for such buffers-in-flight. >> + >> +The "x-migration-max-queued-buffers" property allows capping the maximum count >> +of these VFIO device state buffers queued at the destination. >> + >> +Because a malicious QEMU source causing OOM on the target is not expected to be >> +a realistic threat in most of VFIO live migration use cases and the right value >> +depends on the particular setup by default this queued buffers limit is >> +disabled by setting it to UINT64_MAX. > > should be in patch 34. It is not obvious it will be merged. > ...which brings us to this point. I think by this point in time (less then 2 weeks to code freeze) we should finally decide what is going to be included in the patch set. This way this patch set could be well tested in its final form rather than having significant parts taken out of it at the eleventh hour. 
If the final form is known also the documentation can be adjusted accordingly and user/admin documentation eventually written once the code is considered okay. I though we discussed a few times the rationale behind both x-migration-max-queued-buffers and x-migration-load-config-after-iter properties but if you still have some concerns there please let me know before I prepare the next version of this patch set so I know whether to include these. > This section : > >> +Some host platforms (like ARM64) require that VFIO device config is loaded only >> +after all iterables were loaded. >> +Such interlocking is controlled by "x-migration-load-config-after-iter" VFIO >> +device property, which in its default setting (AUTO) does so only on platforms >> +that actually require it. > > Should be in 35. Same reason. > > >> When pre-copy is supported, it's possible to further reduce downtime by >> enabling "switchover-ack" migration capability. >> VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream >> @@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach as follows: >> * A ``switchover_ack_needed`` function that checks if the VFIO device uses >> "switchover-ack" migration capability when this capability is enabled. >> -* A ``save_state`` function to save the device config space if it is present. >> - >> -* A ``save_live_complete_precopy`` function that sets the VFIO device in >> - _STOP_COPY state and iteratively copies the data for the VFIO device until >> - the vendor driver indicates that no data remains. >> - >> -* A ``load_state`` function that loads the config section and the data >> - sections that are generated by the save functions above. >> +* A ``switchover_start`` function that in the multifd mode starts a thread that >> + reassembles the multifd received data and loads it in-order into the device. >> + In the non-multifd mode this function is a NOP. >> + >> +* A ``save_state`` function to save the device config space if it is present >> + in the non-multifd mode. >> + In the multifd mode it just emits either a dummy EOS marker or >> + "all iterables were loaded" flag for configurations that need to defer >> + loading device config space after them. >> + >> +* A ``save_live_complete_precopy`` function that in the non-multifd mode sets >> + the VFIO device in _STOP_COPY state and iteratively copies the data for the >> + VFIO device until the vendor driver indicates that no data remains. >> + In the multifd mode it just emits a dummy EOS marker. >> + >> +* A ``save_live_complete_precopy_thread`` function that in the multifd mode >> + provides thread handler performing multifd device state transfer. >> + It sets the VFIO device to _STOP_COPY state, iteratively reads the data >> + from the VFIO device and queues it for multifd transmission until the vendor >> + driver indicates that no data remains. >> + After that, it saves the device config space and queues it for multifd >> + transfer too. >> + In the non-multifd mode this thread is a NOP. >> + >> +* A ``load_state`` function that loads the data sections that are generated >> + by the main migration channel save functions above. >> + In the non-multifd mode it also loads the config section, while in the >> + multifd mode it handles the optional "all iterables were loaded" flag if >> + it is in use. >> + >> +* A ``load_state_buffer`` function that loads the device state and the device >> + config that arrived via multifd channels. >> + It's used only in the multifd mode. 
> > Please move the documentation of the new migration handlers in the > patch introducing them. > > > Thanks, > > C. > Thanks, Maciej
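
For reference, a sketch of how the two properties discussed above could be set
explicitly for a setup that wants them; the device address, the buffer cap and
the ON values are placeholders rather than recommendations::

    -device vfio-pci,host=0000:01:00.0,x-migration-multifd-transfer=on,\
            x-migration-max-queued-buffers=128,\
            x-migration-load-config-after-iter=on
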
On 2/27/25 23:01, Maciej S. Szmigiero wrote: > On 27.02.2025 07:59, Cédric Le Goater wrote: >> On 2/19/25 21:34, Maciej S. Szmigiero wrote: >>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> >>> >>> Update the VFIO documentation at docs/devel/migration describing the >>> changes brought by the multifd device state transfer. >>> >>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> >>> --- >>> docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++---- >>> 1 file changed, 71 insertions(+), 9 deletions(-) >>> >>> diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst >>> index c49482eab66d..d9b169d29921 100644 >>> --- a/docs/devel/migration/vfio.rst >>> +++ b/docs/devel/migration/vfio.rst >>> @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy >>> support by reporting the VFIO_MIGRATION_PRE_COPY flag in the >>> VFIO_DEVICE_FEATURE_MIGRATION ioctl. >> >> Please add a new "multifd" documentation subsection at the end of the file >> with this part : >> >>> +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device >>> +_STOP_COPY state via multifd channels. This helps reduce downtime - especially >>> +with multiple VFIO devices or with devices having a large migration state. >>> +As an additional benefit, setting the VFIO device to _STOP_COPY state and >>> +saving its config space is also parallelized (run in a separate thread) in >>> +such migration mode. >>> + >>> +The multifd VFIO device state transfer is controlled by >>> +"x-migration-multifd-transfer" VFIO device property. This property defaults to >>> +AUTO, which means that VFIO device state transfer via multifd channels is >>> +attempted in configurations that otherwise support it. >>> + > > Done - I also moved the parts about x-migration-max-queued-buffers > and x-migration-load-config-after-iter description there since > obviously they wouldn't make sense being left alone in the top section. > >> I was expecting a much more detailed explanation on the design too : >> >> * in the cover letter >> * in the hw/vfio/migration-multifd.c >> * in some new file under docs/devel/migration/ I forgot to add : * guide on how to use this new feature from QEMU and libvirt. something we can refer to for tests. That's a must have. * usage scenarios There are some benefits but it is not obvious a user would like to use multiple VFs in one VM, please explain. This is a major addition which needs justification anyhow * pros and cons > I'm not sure what descriptions you exactly want in these places, Looking from the VFIO subsystem, the way this series works is very opaque. There are a couple of a new migration handlers, new threads, new channels, etc. It has been discussed several times with migration folks, please provide a summary for a new reader as ignorant as everyone would be when looking at a new file. > but since > that's just documentation (not code) it could be added after the code freeze... That's the risk of not getting any ! and the initial proposal should be discussed before code freeze. 
For the general framework, I was expecting an extension of a "multifd" subsection under : https://qemu.readthedocs.io/en/v9.2.0/devel/migration/features.html but it doesn't exist :/ So, for now, let's use the new "multifd" subsection of https://qemu.readthedocs.io/en/v9.2.0/devel/migration/vfio.html > >> >> This section : >> >>> +Since the target QEMU needs to load device state buffers in-order it needs to >>> +queue incoming buffers until they can be loaded into the device. >>> +This means that a malicious QEMU source could theoretically cause the target >>> +QEMU to allocate unlimited amounts of memory for such buffers-in-flight. >>> + >>> +The "x-migration-max-queued-buffers" property allows capping the maximum count >>> +of these VFIO device state buffers queued at the destination. >>> + >>> +Because a malicious QEMU source causing OOM on the target is not expected to be >>> +a realistic threat in most of VFIO live migration use cases and the right value >>> +depends on the particular setup by default this queued buffers limit is >>> +disabled by setting it to UINT64_MAX. >> >> should be in patch 34. It is not obvious it will be merged. >> > > ...which brings us to this point. > > I think by this point in time (less then 2 weeks to code freeze) we should > finally decide what is going to be included in the patch set. > > This way this patch set could be well tested in its final form rather than > having significant parts taken out of it at the eleventh hour. > > If the final form is known also the documentation can be adjusted accordingly > and user/admin documentation eventually written once the code is considered > okay. > > I though we discussed a few times the rationale behind both > x-migration-max-queued-buffers and x-migration-load-config-after-iter properties > but if you still have some concerns there please let me know before I prepare > the next version of this patch set so I know whether to include these. Patch 34, not sure yet. Patch 35 is for next cycle IMO. For QEMU 10.0, let's focus on x86 first and see how it goes. We can add ARM support in QEMU 10.1 if nothing new arises. We will need the virt-arm folks in cc: then. Please keep patch 35 in v6 nevertheless, it is good for reference if someone wants to apply on an out of tree QEMU. Thanks, C. > >> This section : >> >>> +Some host platforms (like ARM64) require that VFIO device config is loaded only >>> +after all iterables were loaded. >>> +Such interlocking is controlled by "x-migration-load-config-after-iter" VFIO >>> +device property, which in its default setting (AUTO) does so only on platforms >>> +that actually require it. >> >> Should be in 35. Same reason. >> >> >>> When pre-copy is supported, it's possible to further reduce downtime by >>> enabling "switchover-ack" migration capability. >>> VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream >>> @@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach as follows: >>> * A ``switchover_ack_needed`` function that checks if the VFIO device uses >>> "switchover-ack" migration capability when this capability is enabled. >>> -* A ``save_state`` function to save the device config space if it is present. >>> - >>> -* A ``save_live_complete_precopy`` function that sets the VFIO device in >>> - _STOP_COPY state and iteratively copies the data for the VFIO device until >>> - the vendor driver indicates that no data remains. 
>>> - >>> -* A ``load_state`` function that loads the config section and the data >>> - sections that are generated by the save functions above. >>> +* A ``switchover_start`` function that in the multifd mode starts a thread that >>> + reassembles the multifd received data and loads it in-order into the device. >>> + In the non-multifd mode this function is a NOP. >>> + >>> +* A ``save_state`` function to save the device config space if it is present >>> + in the non-multifd mode. >>> + In the multifd mode it just emits either a dummy EOS marker or >>> + "all iterables were loaded" flag for configurations that need to defer >>> + loading device config space after them. >>> + >>> +* A ``save_live_complete_precopy`` function that in the non-multifd mode sets >>> + the VFIO device in _STOP_COPY state and iteratively copies the data for the >>> + VFIO device until the vendor driver indicates that no data remains. >>> + In the multifd mode it just emits a dummy EOS marker. >>> + >>> +* A ``save_live_complete_precopy_thread`` function that in the multifd mode >>> + provides thread handler performing multifd device state transfer. >>> + It sets the VFIO device to _STOP_COPY state, iteratively reads the data >>> + from the VFIO device and queues it for multifd transmission until the vendor >>> + driver indicates that no data remains. >>> + After that, it saves the device config space and queues it for multifd >>> + transfer too. >>> + In the non-multifd mode this thread is a NOP. >>> + >>> +* A ``load_state`` function that loads the data sections that are generated >>> + by the main migration channel save functions above. >>> + In the non-multifd mode it also loads the config section, while in the >>> + multifd mode it handles the optional "all iterables were loaded" flag if >>> + it is in use. >>> + >>> +* A ``load_state_buffer`` function that loads the device state and the device >>> + config that arrived via multifd channels. >>> + It's used only in the multifd mode. >> >> Please move the documentation of the new migration handlers in the >> patch introducing them. >> >> >> Thanks, >> >> C. >> > > Thanks, > Maciej >
Cédric Le Goater <clg@redhat.com> writes: > On 2/27/25 23:01, Maciej S. Szmigiero wrote: >> On 27.02.2025 07:59, Cédric Le Goater wrote: >>> On 2/19/25 21:34, Maciej S. Szmigiero wrote: >>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> >>>> >>>> Update the VFIO documentation at docs/devel/migration describing the >>>> changes brought by the multifd device state transfer. >>>> >>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> >>>> --- >>>> docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++---- >>>> 1 file changed, 71 insertions(+), 9 deletions(-) >>>> >>>> diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst >>>> index c49482eab66d..d9b169d29921 100644 >>>> --- a/docs/devel/migration/vfio.rst >>>> +++ b/docs/devel/migration/vfio.rst >>>> @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy >>>> support by reporting the VFIO_MIGRATION_PRE_COPY flag in the >>>> VFIO_DEVICE_FEATURE_MIGRATION ioctl. >>> >>> Please add a new "multifd" documentation subsection at the end of the file >>> with this part : >>> >>>> +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device >>>> +_STOP_COPY state via multifd channels. This helps reduce downtime - especially >>>> +with multiple VFIO devices or with devices having a large migration state. >>>> +As an additional benefit, setting the VFIO device to _STOP_COPY state and >>>> +saving its config space is also parallelized (run in a separate thread) in >>>> +such migration mode. >>>> + >>>> +The multifd VFIO device state transfer is controlled by >>>> +"x-migration-multifd-transfer" VFIO device property. This property defaults to >>>> +AUTO, which means that VFIO device state transfer via multifd channels is >>>> +attempted in configurations that otherwise support it. >>>> + >> >> Done - I also moved the parts about x-migration-max-queued-buffers >> and x-migration-load-config-after-iter description there since >> obviously they wouldn't make sense being left alone in the top section. >> >>> I was expecting a much more detailed explanation on the design too : >>> >>> * in the cover letter >>> * in the hw/vfio/migration-multifd.c >>> * in some new file under docs/devel/migration/ > > I forgot to add : > > * guide on how to use this new feature from QEMU and libvirt. > something we can refer to for tests. That's a must have. > * usage scenarios > There are some benefits but it is not obvious a user would > like to use multiple VFs in one VM, please explain. > This is a major addition which needs justification anyhow > * pros and cons > >> I'm not sure what descriptions you exactly want in these places, > > Looking from the VFIO subsystem, the way this series works is very opaque. > There are a couple of a new migration handlers, new threads, new channels, > etc. It has been discussed several times with migration folks, please provide > a summary for a new reader as ignorant as everyone would be when looking at > a new file. > > >> but since >> that's just documentation (not code) it could be added after the code freeze... > > That's the risk of not getting any ! and the initial proposal should be > discussed before code freeze. > > For the general framework, I was expecting an extension of a "multifd" > subsection under : > > https://qemu.readthedocs.io/en/v9.2.0/devel/migration/features.html > > but it doesn't exist :/ Hi, see if this helps. Let me know what can be improved and if something needs to be more detailed. 
Please ignore the formatting, I'll send a proper patch after the carnaval.

@Maciej, it's probably better if you keep your docs separate anyway so
we don't add another dependency. I can merge them later.

multifd.rst:

Multifd
=======

Multifd is the name given for the migration capability that enables
data transfer using multiple threads. Multifd supports all the
transport types currently in use with migration (inet, unix, vsock,
fd, file).

Restrictions
------------

For migration to a file, support is conditional on the presence of the
mapped-ram capability, see #mapped-ram.

Snapshots are currently not supported.

Postcopy migration is currently not supported.

Usage
-----

On both source and destination, enable the ``multifd`` capability:

``migrate_set_capability multifd on``

Define a number of channels to use (default is 2, but 8 usually
provides best performance).

``migrate_set_parameter multifd-channels 8``

Components
----------

Multifd consists of:

- A client that produces the data on the migration source side and
  consumes it on the destination. Currently the main client code is
  ram.c, which selects the RAM pages for migration;

- A shared data structure (MultiFDSendData), used to transfer data
  between multifd and the client. On the source side, this structure
  is further subdivided into payload types (MultiFDPayload);

- An API operating on the shared data structure to allow the client
  code to interact with multifd;

- multifd_send/recv(): A dispatcher that transfers work to/from the
  channels.

- multifd_*payload_* and MultiFDPayloadType: Support defining an
  opaque payload. The payload is always wrapped by
  MultiFDSend|RecvData.

- multifd_send_data_*: Used to manage the memory for the shared data
  structure.

- The threads that process the data (aka channels, due to a 1:1
  mapping to QIOChannels). Each multifd channel supports callbacks
  that can be used for fine-grained processing of the payload, such as
  compression and zero page detection.

- A packet which is the final result of all the data aggregation
  and/or transformation. The packet contains a header, a
  payload-specific header and a variable-size data portion.

- The packet header: contains a magic number, a version number and
  flags that inform of special processing needed on the destination.

- The payload-specific header: contains metadata relating to the
  packet's data portion, such as page counts.

- The data portion: contains the actual opaque payload data.

Note that due to historical reasons, the terminology around multifd
packets is inconsistent.

The mapped-ram feature ignores packets entirely.

Theory of operation
-------------------

The multifd channels operate in parallel with the main migration
thread. The transfer of data from the client code into multifd happens
from the main migration thread using the multifd API.

The interaction between the client code and the multifd channels
happens in the multifd_send() and multifd_recv() methods. These are
responsible for selecting the next idle channel and making the shared
data structure containing the payload accessible to that channel. The
client code receives back an empty object which it then uses for the
next iteration of data transfer.

The selection of idle channels is simply a round-robin over the idle
channels (!p->pending_job). Channels wait at a semaphore; once a
channel is released, it starts operating on the data immediately.

Aside from eventually transmitting the data over the underlying
QIOChannel, a channel's operation also includes calling back to the
client code at pre-determined points to allow for client-specific
handling such as data transformation (e.g. compression), creation of
the packet header and arranging the data into iovs (struct
iovec). Iovs are the type of data on which the QIOChannel operates.

Client code (migration thread):
1. Populate shared structure with opaque data (ram pages, device state)
2. Call multifd_send()
   2a. Loop over the channels until one is idle
   2b. Switch pointers between client data and channel data
   2c. Release channel semaphore
3. Receive back empty object
4. Repeat

Multifd channel (multifd thread):
1. Channel idle
2. Gets released by multifd_send()
3. Call multifd_ops methods to fill iov
   3a. Compression may happen
   3b. Zero page detection may happen
   3c. Packet is written
   3d. iov is written
4. Pass iov into QIOChannel for transferring
5. Repeat

The destination side operates similarly but with multifd_recv(),
decompression instead of compression, etc. One important aspect is
that when receiving the data, the iov will contain host virtual
addresses, so guest memory is written to directly from multifd
threads.

About flags
-----------

The main thread orchestrates the migration by issuing control flags on
the migration stream (QEMU_VM_*).

The main memory is migrated by ram.c and includes specific control
flags that are also put on the main migration stream
(RAM_SAVE_FLAG_*).

Multifd has its own set of MULTIFD_FLAGs that are included into each
packet. These may inform about properties such as the compression
algorithm used if the data is compressed.

Synchronization
---------------

Since the migration process is iterative due to RAM dirty tracking, it
is necessary to invalidate data that is no longer current (e.g. due to
the source VM touching the page). This is done by having a
synchronization point triggered by the migration thread at key points
during the migration. Data that's received after the synchronization
point is allowed to overwrite data received prior to that point.

To perform the synchronization, multifd provides the
multifd_send_sync_main() and multifd_recv_sync_main() helpers. These
are called whenever the client code wishes to ensure that all data
sent previously has now been received by the destination.

The synchronization process involves performing a flush of the
remaining client data still left to be transmitted and issuing a
multifd packet containing the MULTIFD_FLAG_SYNC flag. This flag
informs the receiving end that it should finish reading the data and
wait for a synchronization point.

To complete the sync, the main migration stream issues a
RAM_SAVE_FLAG_MULTIFD_FLUSH flag. When that flag is received by the
destination, it ensures all of its channels have seen the
MULTIFD_FLAG_SYNC and moves them to an idle state.

The client code can then continue with a second round of data by
issuing multifd_send() once again.

The synchronization process also ensures that internal synchronization
happens, i.e. between each thread. This is necessary to avoid threads
lagging behind sending or receiving when the migration approaches
completion.

The mapped-ram feature has different synchronization requirements
because it's an asynchronous migration (source and destination not
migrating at the same time). For that feature, only the internal sync
is relevant.

Data transformation
-------------------

Each multifd channel executes a set of callbacks before transmitting
the data. These callbacks allow the client code to alter the data
format right before sending and after receiving.

Since the object of the RAM migration is always the memory page and
the only processing done for memory pages is zero page detection,
which is already part of compression in a sense, the multifd_ops
functions are mutually exclusively divided into compression and
no-compression.

The migration without compression (i.e. regular ram migration) has a
further specificity, as mentioned, of possibly doing zero page
detection (see the zero-page-detection migration parameter). This
consists of sending all pages to multifd and letting the detection of
a zero page happen in the multifd channels instead of doing it
beforehand on the main migration thread as it was done in the past.

Code structure
--------------

Multifd code is divided into:

The main file containing the core routines

- multifd.c

RAM migration

- multifd-nocomp.c (nocomp, for "no compression")
- multifd-zero-page.c
- ram.c (also involved in non-multifd migrations + snapshots)

Compressors

- multifd-uadk.c
- multifd-qatzip.c
- multifd-zlib.c
- multifd-qpl.c
- multifd-zstd.c
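
As a concrete illustration of the Usage section above, a minimal sketch of a
multifd migration driven from the HMP monitor; the URI and port are
placeholders, and the destination is assumed to have been started with
``-incoming defer``::

    # destination monitor
    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_parameter multifd-channels 8
    (qemu) migrate_incoming tcp:0:4444

    # source monitor
    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_parameter multifd-channels 8
    (qemu) migrate -d tcp:dest-host:4444
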
On 1.03.2025 00:38, Fabiano Rosas wrote: > Cédric Le Goater <clg@redhat.com> writes: > >> On 2/27/25 23:01, Maciej S. Szmigiero wrote: >>> On 27.02.2025 07:59, Cédric Le Goater wrote: >>>> On 2/19/25 21:34, Maciej S. Szmigiero wrote: >>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> >>>>> >>>>> Update the VFIO documentation at docs/devel/migration describing the >>>>> changes brought by the multifd device state transfer. >>>>> >>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> >>>>> --- >>>>> docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++---- >>>>> 1 file changed, 71 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst >>>>> index c49482eab66d..d9b169d29921 100644 >>>>> --- a/docs/devel/migration/vfio.rst >>>>> +++ b/docs/devel/migration/vfio.rst >>>>> @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy >>>>> support by reporting the VFIO_MIGRATION_PRE_COPY flag in the >>>>> VFIO_DEVICE_FEATURE_MIGRATION ioctl. >>>> >>>> Please add a new "multifd" documentation subsection at the end of the file >>>> with this part : >>>> >>>>> +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device >>>>> +_STOP_COPY state via multifd channels. This helps reduce downtime - especially >>>>> +with multiple VFIO devices or with devices having a large migration state. >>>>> +As an additional benefit, setting the VFIO device to _STOP_COPY state and >>>>> +saving its config space is also parallelized (run in a separate thread) in >>>>> +such migration mode. >>>>> + >>>>> +The multifd VFIO device state transfer is controlled by >>>>> +"x-migration-multifd-transfer" VFIO device property. This property defaults to >>>>> +AUTO, which means that VFIO device state transfer via multifd channels is >>>>> +attempted in configurations that otherwise support it. >>>>> + >>> >>> Done - I also moved the parts about x-migration-max-queued-buffers >>> and x-migration-load-config-after-iter description there since >>> obviously they wouldn't make sense being left alone in the top section. >>> >>>> I was expecting a much more detailed explanation on the design too : >>>> >>>> * in the cover letter >>>> * in the hw/vfio/migration-multifd.c >>>> * in some new file under docs/devel/migration/ >> >> I forgot to add : >> >> * guide on how to use this new feature from QEMU and libvirt. >> something we can refer to for tests. That's a must have. >> * usage scenarios >> There are some benefits but it is not obvious a user would >> like to use multiple VFs in one VM, please explain. >> This is a major addition which needs justification anyhow >> * pros and cons >> >>> I'm not sure what descriptions you exactly want in these places, >> >> Looking from the VFIO subsystem, the way this series works is very opaque. >> There are a couple of a new migration handlers, new threads, new channels, >> etc. It has been discussed several times with migration folks, please provide >> a summary for a new reader as ignorant as everyone would be when looking at >> a new file. >> >> >>> but since >>> that's just documentation (not code) it could be added after the code freeze... >> >> That's the risk of not getting any ! and the initial proposal should be >> discussed before code freeze. 
>> >> For the general framework, I was expecting an extension of a "multifd" >> subsection under : >> >> https://qemu.readthedocs.io/en/v9.2.0/devel/migration/features.html >> >> but it doesn't exist :/ > > Hi, see if this helps. Let me know what can be improved and if something > needs to be more detailed. Please ignore the formatting, I'll send a > proper patch after the carnaval. > > @Maciej, it's probably better if you keep your docs separate anyway so > we don't add another dependency. I can merge them later. That's a very good idea, thanks for writing this multifd doc Fabiano! > multifd.rst: > > Multifd > ======= > > Multifd is the name given for the migration capability that enables > data transfer using multiple threads. Multifd supports all the > transport types currently in use with migration (inet, unix, vsock, > fd, file). (..) Thanks, Maciej
On 3/1/25 00:38, Fabiano Rosas wrote: > Cédric Le Goater <clg@redhat.com> writes: > >> On 2/27/25 23:01, Maciej S. Szmigiero wrote: >>> On 27.02.2025 07:59, Cédric Le Goater wrote: >>>> On 2/19/25 21:34, Maciej S. Szmigiero wrote: >>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com> >>>>> >>>>> Update the VFIO documentation at docs/devel/migration describing the >>>>> changes brought by the multifd device state transfer. >>>>> >>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> >>>>> --- >>>>> docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++---- >>>>> 1 file changed, 71 insertions(+), 9 deletions(-) >>>>> >>>>> diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst >>>>> index c49482eab66d..d9b169d29921 100644 >>>>> --- a/docs/devel/migration/vfio.rst >>>>> +++ b/docs/devel/migration/vfio.rst >>>>> @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy >>>>> support by reporting the VFIO_MIGRATION_PRE_COPY flag in the >>>>> VFIO_DEVICE_FEATURE_MIGRATION ioctl. >>>> >>>> Please add a new "multifd" documentation subsection at the end of the file >>>> with this part : >>>> >>>>> +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device >>>>> +_STOP_COPY state via multifd channels. This helps reduce downtime - especially >>>>> +with multiple VFIO devices or with devices having a large migration state. >>>>> +As an additional benefit, setting the VFIO device to _STOP_COPY state and >>>>> +saving its config space is also parallelized (run in a separate thread) in >>>>> +such migration mode. >>>>> + >>>>> +The multifd VFIO device state transfer is controlled by >>>>> +"x-migration-multifd-transfer" VFIO device property. This property defaults to >>>>> +AUTO, which means that VFIO device state transfer via multifd channels is >>>>> +attempted in configurations that otherwise support it. >>>>> + >>> >>> Done - I also moved the parts about x-migration-max-queued-buffers >>> and x-migration-load-config-after-iter description there since >>> obviously they wouldn't make sense being left alone in the top section. >>> >>>> I was expecting a much more detailed explanation on the design too : >>>> >>>> * in the cover letter >>>> * in the hw/vfio/migration-multifd.c >>>> * in some new file under docs/devel/migration/ >> >> I forgot to add : >> >> * guide on how to use this new feature from QEMU and libvirt. >> something we can refer to for tests. That's a must have. >> * usage scenarios >> There are some benefits but it is not obvious a user would >> like to use multiple VFs in one VM, please explain. >> This is a major addition which needs justification anyhow >> * pros and cons >> >>> I'm not sure what descriptions you exactly want in these places, >> >> Looking from the VFIO subsystem, the way this series works is very opaque. >> There are a couple of a new migration handlers, new threads, new channels, >> etc. It has been discussed several times with migration folks, please provide >> a summary for a new reader as ignorant as everyone would be when looking at >> a new file. >> >> >>> but since >>> that's just documentation (not code) it could be added after the code freeze... >> >> That's the risk of not getting any ! and the initial proposal should be >> discussed before code freeze. 
>> >> For the general framework, I was expecting an extension of a "multifd" >> subsection under : >> >> https://qemu.readthedocs.io/en/v9.2.0/devel/migration/features.html >> >> but it doesn't exist :/ > > Hi, see if this helps. Let me know what can be improved and if something > needs to be more detailed. Please ignore the formatting, I'll send a > proper patch after the carnaval. This is very good ! Thanks a lot Fabiano for providing this input. > @Maciej, it's probably better if you keep your docs separate anyway so > we don't add another dependency. I can merge them later. Perfect. Maciej, We will adjust the file to apply it to before merging. Thanks, C. > > multifd.rst: > > Multifd > ======= > > Multifd is the name given for the migration capability that enables > data transfer using multiple threads. Multifd supports all the > transport types currently in use with migration (inet, unix, vsock, > fd, file). > > Restrictions > ------------ > > For migration to a file, support is conditional on the presence of the > mapped-ram capability, see #mapped-ram. > > Snapshots are currently not supported. > > Postcopy migration is currently not supported. > > Usage > ----- > > On both source and destination, enable the ``multifd`` capability: > > ``migrate_set_capability multifd on`` > > Define a number of channels to use (default is 2, but 8 usually > provides best performance). > > ``migrate_set_parameter multifd-channels 8`` > > Components > ---------- > > Multifd consists of: > > - A client that produces the data on the migration source side and > consumes it on the destination. Currently the main client code is > ram.c, which selects the RAM pages for migration; > > - A shared data structure (MultiFDSendData), used to transfer data > between multifd and the client. On the source side, this structure > is further subdivided into payload types (MultiFDPayload); > > - An API operating on the shared data structure to allow the client > code to interact with multifd; > > - multifd_send/recv(): A dispatcher that transfers work to/from the > channels. > > - multifd_*payload_* and MultiFDPayloadType: Support defining an > opaque payload. The payload is always wrapped by > MultiFDSend|RecvData. > > - multifd_send_data_*: Used to manage the memory for the shared data > structure. > > - The threads that process the data (aka channels, due to a 1:1 > mapping to QIOChannels). Each multifd channel supports callbacks > that can be used for fine-grained processing of the payload, such as > compression and zero page detection. > > - A packet which is the final result of all the data aggregation > and/or transformation. The packet contains a header, a > payload-specific header and a variable-size data portion. > > - The packet header: contains a magic number, a version number and > flags that inform of special processing needed on the > destination. > > - The payload-specific header: contains metadata referent to the > packet's data portion, such as page counts. > > - The data portion: contains the actual opaque payload data. > > Note that due to historical reasons, the terminology around multifd > packets is inconsistent. > > The mapped-ram feature ignores packets entirely. > > Theory of operation > ------------------- > > The multifd channels operate in parallel with the main migration > thread. The transfer of data from a client code into multifd happens > from the main migration thread using the multifd API. 
> > The interaction between the client code and the multifd channels > happens in the multifd_send() and multifd_recv() methods. These are > reponsible for selecting the next idle channel and making the shared > data structure containing the payload accessible to that channel. The > client code receives back an empty object which it then uses for the > next iteration of data transfer. > > The selection of idle channels is simply a round-robin over the idle > channels (!p->pending_job). Channels wait at a semaphore, once a > channel is released, it starts operating on the data immediately. > > Aside from eventually transmitting the data over the underlying > QIOChannel, a channel's operation also includes calling back to the > client code at pre-determined points to allow for client-specific > handling such as data transformation (e.g. compression), creation of > the packet header and arranging the data into iovs (struct > iovec). Iovs are the type of data on which the QIOChannel operates. > > Client code (migration thread): > 1. Populate shared structure with opaque data (ram pages, device state) > 2. Call multifd_send() > 2a. Loop over the channels until one is idle > 2b. Switch pointers between client data and channel data > 2c. Release channel semaphore > 3. Receive back empty object > 4. Repeat > > Multifd channel (multifd thread): > 1. Channel idle > 2. Gets released by multifd_send() > 3. Call multifd_ops methods to fill iov > 3a. Compression may happen > 3b. Zero page detection may happen > 3c. Packet is written > 3d. iov is written > 4. Pass iov into QIOChannel for transferring > 5. Repeat > > The destination side operates similarly but with multifd_recv(), > decompression instead of compression, etc. One important aspect is > that when receiving the data, the iov will contain host virtual > addresses, so guest memory is written to directly from multifd > threads. > > About flags > ----------- > The main thread orchestrates the migration by issuing control flags on > the migration stream (QEMU_VM_*). > > The main memory is migrated by ram.c and includes specific control > flags that are also put on the main migration stream > (RAM_SAVE_FLAG_*). > > Multifd has its own set of MULTIFD_FLAGs that are included into each > packet. These may inform about properties such as the compression > algorithm used if the data is compressed. > > Synchronization > --------------- > > Since the migration process is iterative due to RAM dirty tracking, it > is necessary to invalidate data that is no longer current (e.g. due to > the source VM touching the page). This is done by having a > synchronization point triggered by the migration thread at key points > during the migration. Data that's received after the synchronization > point is allowed to overwrite data received prior to that point. > > To perform the synchronization, multifd provides the > multifd_send_sync_main() and multifd_recv_sync_main() helpers. These > are called whenever the client code whishes to ensure that all data > sent previously has now been received by the destination. > > The synchronization process involves performing a flush of the > ramaining client data still left to be transmitted and issuing a > multifd packet containing the MULTIFD_FLAG_SYNC flag. This flag > informs the receiving end that it should finish reading the data and > wait for a synchronization point. > > To complete the sync, the main migration stream issues a > RAM_SAVE_FLAG_MULTIFD_FLUSH flag. 
When that flag is received by the > destination, it ensures all of its channels have seen the > MULTIFD_FLAG_SYNC and moves them to an idle state. > > The client code can then continue with a second round of data by > issuing multifd_send() once again. > > The synchronization process also ensures that internal synchronization > happens, i.e. between each thread. This is necessary to avoid threads > lagging behind sending or receiving when the migration approaches > completion. > > The mapped-ram feature has different synchronization requirements > because it's an asynchronous migration (source and destination not > migrating at the same time). For that feature, only the internal sync > is relevant. > > Data transformation > ------------------- > > Each multifd channel executes a set of callbacks before transmitting > the data. These callbacks allow the client code to alter the data > format right before sending and after receiving. > > Since the object of the RAM migration is always the memory page and > the only processing done for memory pages is zero page detection, > which is already part of compression in a sense, the multifd_ops > functions are mutually exclusively divided into compression and > no-compression. > > The migration without compression (i.e. regular ram migration) has a > further specificity as mentioned of possibly doing zero page detection > (see zero-page-detection migration parameter). This consists of > sending all pages to multifd and letting the detection of a zero page > happen in the multifd channels instead of doing it beforehand on the > main migration thread as it was done in the past. > > Code structure > -------------- > > Multifd code is divided into: > > The main file containing the core routines > > - multifd.c > > RAM migration > > - multifd-nocomp.c (nocomp, for "no compression") > - multifd-zero-page.c > - ram.c (also involved in non-multifd migrations + snapshots) > > Compressors > > - multifd-uadk.c > - multifd-qatzip.c > - multifd-zlib.c > - multifd-qpl.c > - multifd-zstd.c >
On 28.02.2025 11:05, Cédric Le Goater wrote:
> On 2/27/25 23:01, Maciej S. Szmigiero wrote:
>> On 27.02.2025 07:59, Cédric Le Goater wrote:
>>> On 2/19/25 21:34, Maciej S. Szmigiero wrote:
>>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
>>>>
>>>> Update the VFIO documentation at docs/devel/migration describing the
>>>> changes brought by the multifd device state transfer.
>>>>
>>>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
>>>> ---
>>>>   docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++----
>>>>   1 file changed, 71 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
>>>> index c49482eab66d..d9b169d29921 100644
>>>> --- a/docs/devel/migration/vfio.rst
>>>> +++ b/docs/devel/migration/vfio.rst
>>>> @@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
>>>>   support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
>>>>   VFIO_DEVICE_FEATURE_MIGRATION ioctl.
>>>
>>> Please add a new "multifd" documentation subsection at the end of the file
>>> with this part :
>>>
>>>> +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device
>>>> +_STOP_COPY state via multifd channels. This helps reduce downtime - especially
>>>> +with multiple VFIO devices or with devices having a large migration state.
>>>> +As an additional benefit, setting the VFIO device to _STOP_COPY state and
>>>> +saving its config space is also parallelized (run in a separate thread) in
>>>> +such migration mode.
>>>> +
>>>> +The multifd VFIO device state transfer is controlled by
>>>> +"x-migration-multifd-transfer" VFIO device property. This property defaults to
>>>> +AUTO, which means that VFIO device state transfer via multifd channels is
>>>> +attempted in configurations that otherwise support it.
>>>> +
>>
>> Done - I also moved the parts about x-migration-max-queued-buffers
>> and x-migration-load-config-after-iter description there since
>> obviously they wouldn't make sense being left alone in the top section.
>>
>>> I was expecting a much more detailed explanation on the design too :
>>>
>>>  * in the cover letter
>>>  * in the hw/vfio/migration-multifd.c
>>>  * in some new file under docs/devel/migration/
> 
> I forgot to add :
> 
>  * guide on how to use this new feature from QEMU and libvirt.
>    something we can refer to for tests. That's a must have.

So basically a user's guide.
That's something I plan to write after the code is ready (a rough usage
sketch follows at the end of this exchange).

>  * usage scenarios
>    There are some benefits but it is not obvious a user would
>    like to use multiple VFs in one VM, please explain.

Hmm, this patch set does not bring the ability to use multiple VFs in a
single VM - that ability is already in QEMU even without this patch set.

As Yanghang has measured, the downtime improvement happens even with a
single VF, although with more VFs one can additionally see the scalability
benefits of this patch set.

>  This is a major addition which needs justification anyhow
>  * pros and cons

The biggest advantage is obviously the downtime performance.

I'm not sure if there are any obvious disadvantages (assuming the setup
supports the multifd migration in the first place), besides maybe slightly
bigger memory usage for in-flight buffers?
But we have an option for capping that if someone is concerned about it.

>> I'm not sure what descriptions you exactly want in these places,
> 
> Looking from the VFIO subsystem, the way this series works is very opaque.
> There are a couple of new migration handlers,

I've added descriptions of these 3 new migration handlers to
docs/devel/migration/vfio.rst.
They are also described in struct SaveVMHandlers in
include/migration/register.h and also in the commit messages that
introduce them.

> new threads,

A total of two of these; their function is described in
docs/devel/migration/vfio.rst and also in the commit messages that
introduce them.

> new channels,

I think you meant "new data type for multifd channel" here but that's in
migration core, not VFIO.

> etc. It has been discussed several times with migration folks, please provide
> a summary for a new reader as ignorant as everyone would be when looking at
> a new file.

I can certainly include all these in the new version cover letter if
that's easier for a new reader.

>> but since
>> that's just documentation (not code) it could be added after the code freeze...
> 
> That's the risk of not getting any ! and the initial proposal should be
> discussed before code freeze.
> 
> For the general framework, I was expecting an extension of a "multifd"
> subsection under :
> 
> https://qemu.readthedocs.io/en/v9.2.0/devel/migration/features.html
> 
> but it doesn't exist :/

Looking at the source file for this page at docs/devel/migration/features.rst
the "multifd" section should appear on this page automatically after I added
it to docs/devel/migration/vfio.rst.

> So, for now, let's use the new "multifd" subsection of
> 
> https://qemu.readthedocs.io/en/v9.2.0/devel/migration/vfio.html

Okay.

>>
>>> This section :
>>>
>>>> +Since the target QEMU needs to load device state buffers in-order it needs to
>>>> +queue incoming buffers until they can be loaded into the device.
>>>> +This means that a malicious QEMU source could theoretically cause the target
>>>> +QEMU to allocate unlimited amounts of memory for such buffers-in-flight.
>>>> +
>>>> +The "x-migration-max-queued-buffers" property allows capping the maximum count
>>>> +of these VFIO device state buffers queued at the destination.
>>>> +
>>>> +Because a malicious QEMU source causing OOM on the target is not expected to be
>>>> +a realistic threat in most of VFIO live migration use cases and the right value
>>>> +depends on the particular setup by default this queued buffers limit is
>>>> +disabled by setting it to UINT64_MAX.
>>>
>>> should be in patch 34. It is not obvious it will be merged.
>>>
>>
>> ...which brings us to this point.
>>
>> I think by this point in time (less than 2 weeks to code freeze) we should
>> finally decide what is going to be included in the patch set.
>> This way this patch set could be well tested in its final form rather than
>> having significant parts taken out of it at the eleventh hour.
>>
>> If the final form is known also the documentation can be adjusted accordingly
>> and user/admin documentation eventually written once the code is considered
>> okay.
>>
>> I thought we discussed a few times the rationale behind both
>> x-migration-max-queued-buffers and x-migration-load-config-after-iter properties
>> but if you still have some concerns there please let me know before I prepare
>> the next version of this patch set so I know whether to include these.
> 
> Patch 34, not sure yet.
> 
> Patch 35 is for next cycle IMO.
> 
> For QEMU 10.0, let's focus on x86 first and see how it goes. We can add
> ARM support in QEMU 10.1 if nothing new arises. We will need the virt-arm
> folks in cc: then.
> 
> Please keep patch 35 in v6 nevertheless, it is good for reference if
> someone wants to apply on an out of tree QEMU.

If we are to drop/skip adding the "x-migration-load-config-after-iter"
option for now then let's do it now so the next version could be already
tested in its target shape.

After this "x-migration-load-config-after-iter" option is proposed once
again in the QEMU 10.1 cycle then it obviously will be forward ported to
whatever the code looks like at that point and tested again.

The patch itself is not going to suddenly disappear :) - it's on the
mailing list and in my repository here:
https://gitlab.com/maciejsszmigiero/qemu/-/commit/6582ac5ac338c40ad74ec60820e85b06c4509a2a

> 
> Thanks,
> 
> C.

Thanks,
Maciej
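
Until the user's guide discussed above is written, here is a rough sketch of
how enabling the feature might look. The "x-migration-multifd-transfer" and
"x-migration-max-queued-buffers" property names come from this series; the
exact command line, monitor commands and values below are illustrative
assumptions, not a tested recipe.

  # Source and destination QEMU: attach the VF with multifd device state
  # transfer enabled (AUTO is the default; "on" forces it).
  qemu-system-x86_64 ... \
      -device vfio-pci,host=0000:af:00.1,id=vf0,x-migration-multifd-transfer=on

  # On both sides, enable multifd and pick a channel count (HMP monitor):
  (qemu) migrate_set_capability multifd on
  (qemu) migrate_set_parameter multifd-channels 8

  # Optionally cap the destination-side queue of device state buffers:
  #   -device vfio-pci,...,x-migration-max-queued-buffers=100

  # Then migrate as usual:
  (qemu) migrate -d tcp:destination-host:4444
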