[PATCH] vfio: Allow live migration with VFIO devices which use iommu dirty tracking

Zhenzhong Duan posted 1 patch 4 months, 1 week ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20250808040914.19837-1-zhenzhong.duan@intel.com
Maintainers: Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>
[PATCH] vfio: Allow live migration with VFIO devices which use iommu dirty tracking
Posted by Zhenzhong Duan 4 months, 1 week ago
Commit e46883204c38 ("vfio/migration: Block migration with vIOMMU")
introduces a migration blocker when vIOMMU is enabled, because we need
to calculate the IOVA ranges for device dirty tracking. But this is
unnecessary for iommu dirty tracking.

Limit the vfio_viommu_preset() check to those devices which use device
dirty tracking. This allows live migration with VFIO devices which use
iommu dirty tracking.

Introduce a helper vfio_device_dirty_pages_disabled() to facilitate it.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-device.h | 10 ++++++++++
 hw/vfio/container-base.c      |  5 +----
 hw/vfio/device.c              |  6 ++++++
 hw/vfio/migration.c           |  6 +++---
 4 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 6e4d5ccdac..0c663a49d5 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -148,6 +148,16 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
 
 void vfio_device_reset_handler(void *opaque);
 bool vfio_device_is_mdev(VFIODevice *vbasedev);
+/**
+ * vfio_device_dirty_pages_disabled: Check whether device dirty page tracking
+ * is disabled for a VFIO device
+ *
+ * @vbasedev: The VFIODevice to check
+ *
+ * Return: true if @vbasedev doesn't support device dirty tracking or it is
+ * forcibly disabled on the command line, otherwise false.
+ */
+bool vfio_device_dirty_pages_disabled(VFIODevice *vbasedev);
 bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
                                          const char *typename, Error **errp);
 bool vfio_device_attach(char *name, VFIODevice *vbasedev,
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 56304978e1..39c88812f8 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -177,10 +177,7 @@ bool vfio_container_devices_dirty_tracking_is_supported(
     VFIODevice *vbasedev;
 
     QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
-        if (vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) {
-            return false;
-        }
-        if (!vbasedev->dirty_pages_supported) {
+        if (vfio_device_dirty_pages_disabled(vbasedev)) {
             return false;
         }
     }
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 08f12ac31f..7a5f249826 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -400,6 +400,12 @@ bool vfio_device_is_mdev(VFIODevice *vbasedev)
     return subsys && (strcmp(subsys, "/sys/bus/mdev") == 0);
 }
 
+bool vfio_device_dirty_pages_disabled(VFIODevice *vbasedev)
+{
+    return (!vbasedev->dirty_pages_supported ||
+            vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF);
+}
+
 bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
                                          const char *typename, Error **errp)
 {
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 4c06e3db93..810a5cb157 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -1183,8 +1183,7 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
         return !vfio_block_migration(vbasedev, err, errp);
     }
 
-    if ((!vbasedev->dirty_pages_supported ||
-         vbasedev->device_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+    if (vfio_device_dirty_pages_disabled(vbasedev) &&
         !vbasedev->iommu_dirty_tracking) {
         if (vbasedev->enable_migration == ON_OFF_AUTO_AUTO) {
             error_setg(&err,
@@ -1202,7 +1201,8 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp)
         goto out_deinit;
     }
 
-    if (vfio_viommu_preset(vbasedev)) {
+    if (!vfio_device_dirty_pages_disabled(vbasedev) &&
+        vfio_viommu_preset(vbasedev)) {
         error_setg(&err, "%s: Migration is currently not supported "
                    "with vIOMMU enabled", vbasedev->name);
         goto add_blocker;
-- 
2.47.1
Re: [PATCH] vfio: Allow live migration with VFIO devices which use iommu dirty tracking
Posted by Joao Martins 4 months, 1 week ago
On 08/08/2025 05:09, Zhenzhong Duan wrote:
> Commit e46883204c38 ("vfio/migration: Block migration with vIOMMU")
> introduces a migration blocker when vIOMMU is enabled, because we need
> to calculate the IOVA ranges for device dirty tracking. But this is
> unnecessary for iommu dirty tracking.
> 
> Limit the vfio_viommu_preset() check to those devices which use device
> dirty tracking. This allows live migration with VFIO devices which use
> iommu dirty tracking.
> 

The subject of the patch is a little misleading, because LM is already allowed
with VFIO devices and IOMMU dirty tracking. Yet the patch is about VMs with
vIOMMU that get migration blocked due to VF device dirty tracking. I suggest instead:

vfio/migration: Allow live migration with vIOMMU without VFs using device dirty
tracking

... It's longer but I think it rings a bit more honest on what we are doing :)

> Introduce a helper vfio_device_dirty_pages_disabled() to facilitate it.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

This is unfortunately not enough to unblock vIOMMU migration with IOMMUs.
Have a look at the first four patches of this series:

	https://github.com/jpemartins/qemu/commits/vfio-migration-viommu/

These 4 are meant to do this (41d778dda00^..d27e5a5db5f4). Feel free to pick them
up. I hope to take care of the rest of the series; I have been heavily preempted
internally and haven't had the time to clear this series, but I think it's
finally coming to an end.

The gist of these first four patches is essentially that we need to query the
dirty bitmap before unmap, and we have an extra optimization that lets us simply
read the Dirty bit (without clearing it), so the query is much faster as you
don't have a TLB flush.
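
In pseudo-code, the ordering is roughly the sketch below. This is only an
illustration of the idea, not the actual patches; dirty_tracking_active(),
query_dirty_bitmap_no_clear() and unmap_iova_range() are made-up placeholder
names:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, declared only so the sketch is self-contained. */
extern bool dirty_tracking_active(void);
extern int query_dirty_bitmap_no_clear(uint64_t iova, uint64_t size);
extern int unmap_iova_range(uint64_t iova, uint64_t size);

/*
 * Illustration only: harvest the dirty bits of an IOVA range right before it
 * is unmapped, so writes that happened while the mapping was live are not
 * lost, then do the unmap.
 */
static int unmap_and_collect_dirty(uint64_t iova, uint64_t size)
{
    int ret;

    if (dirty_tracking_active()) {
        /*
         * Read the Dirty bits without clearing them: the unmap below
         * destroys the IOPTEs (and their Dirty bits) anyway, so skipping
         * the clear avoids an extra TLB flush.
         */
        ret = query_dirty_bitmap_no_clear(iova, size);
        if (ret) {
            return ret;
        }
    }

    return unmap_iova_range(iova, size);
}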

I think you can replace the fourth patch with yours, as yours is much
cleaner/simpler.

Mine was specific to IOMMUFD, given that perpetual dirty tracking (type1)
required forcefully enabling migration to let it go through. But I think it's ok
for both to work.

	Joao
RE: [PATCH] vfio: Allow live migration with VFIO devices which use iommu dirty tracking
Posted by Duan, Zhenzhong 4 months, 1 week ago
Hi Joao,

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH] vfio: Allow live migration with VFIO devices which use
>iommu dirty tracking
>
>On 08/08/2025 05:09, Zhenzhong Duan wrote:
>> Commit e46883204c38 ("vfio/migration: Block migration with vIOMMU")
>> introduces a migration blocker when vIOMMU is enabled, because we need
>> to calculate the IOVA ranges for device dirty tracking. But this is
>> unnecessary for iommu dirty tracking.
>>
>> Limit the vfio_viommu_preset() check to those devices which use device
>> dirty tracking. This allows live migration with VFIO devices which use
>> iommu dirty tracking.
>>
>
>The subject of the patch is a little misleading because LM is already allowed
>with VFIO devices and IOMMU dirty tracking. Yet the patch is about VMs with
>vIOMMU that get this blocked due to VF device dirty tracking. I suggest
>instead:
>
>vfio/migration: Allow live migration with vIOMMU without VFs using device
>dirty
>tracking
>
>... It's longer but I think it rings a bit more honest on what we are doing :)

Sure, will update subject.

>
>> Introduce a helper vfio_device_dirty_pages_disabled() to facilitate it.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>
>This is unfortunately not enough to unblock vIOMMU migration with
>IOMMUs.
>Have a look at the first four patches of this series:
>
>	https://github.com/jpemartins/qemu/commits/vfio-migration-viommu/
>
>These 4 are meant do this (41d778dda00^..d27e5a5db5f4). Feel free to pick
>them
>up. I hope to take care of the rest of the series; though I have been heavily
>preempted internally that I hadn't had the time to clear this series but I think
>it's finally coming to an end

Thanks, that helps a lot; we have an internal team at Intel that needs this
relaxation with vIOMMU enabled.
I'll pick up your first 4 patches together with mine and send an update.

>
>The gist of these first four patches is essentially that we need to query the
>dirty bitmap before unmap, and we have an extra optimization that let us
>simply
>read the Dirty bit (without clearing it) and so the query is much faster as you
>don't have a TLB flush.

I see the legacy backend uses dma_unmap_bitmap to do the same thing in one ioctl();
will you add that support in the kernel?
IIUC, between query and unmap, there is a window in which we will miss dirty pages
if a buggy guest still issues DMA.
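
For reference, the type1 flow being referred to looks roughly like the sketch
below. It is a trimmed illustration written from memory of the linux/vfio.h
uAPI (unmap_dma_get_dirty() is a made-up wrapper name), not the actual QEMU
code:

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Sketch only: unmap an IOVA range and fetch its dirty bitmap in a single
 * VFIO_IOMMU_UNMAP_DMA call, so there is no window between "query" and
 * "unmap".  Error handling trimmed.
 */
static int unmap_dma_get_dirty(int container_fd, uint64_t iova, uint64_t size,
                               uint64_t pgsize, void *bitmap_buf,
                               uint64_t bitmap_bytes)
{
    struct vfio_iommu_type1_dma_unmap *unmap;
    struct vfio_bitmap *bitmap;
    int ret;

    unmap = calloc(1, sizeof(*unmap) + sizeof(*bitmap));
    if (!unmap) {
        return -1;
    }

    unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
    unmap->flags = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
    unmap->iova  = iova;
    unmap->size  = size;

    /* The bitmap description is carried in the trailing data[] area. */
    bitmap = (struct vfio_bitmap *)&unmap->data;
    bitmap->pgsize = pgsize;        /* granularity of one bitmap bit */
    bitmap->size   = bitmap_bytes;  /* bytes available in bitmap_buf */
    bitmap->data   = (__u64 *)bitmap_buf;

    ret = ioctl(container_fd, VFIO_IOMMU_UNMAP_DMA, unmap);
    free(unmap);
    return ret;
}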

>
>I think you can replace the fourth patch with yours as yours it's much
>cleaner/simpler.

OK, will do.

Thanks
Zhenzhong

>
>Mine was specific to IOMMUFD given that perpetual dirty tracking (type1)
>required forcefully enabling migration to let it go through. But I think it's ok
>for both to work
>
>	Joao
Re: [PATCH] vfio: Allow live migration with VFIO devices which use iommu dirty tracking
Posted by Joao Martins 4 months, 1 week ago
On 08/08/2025 11:06, Duan, Zhenzhong wrote:
>>> Introduce a helper vfio_device_dirty_pages_disabled() to facilitate it.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>
>> This is unfortunately not enough to unblock vIOMMU migration with
>> IOMMUs.
>> Have a look at the first four patches of this series:
>>
>> 	https://github.com/jpemartins/qemu/commits/vfio-migration-viommu/
>>
>> These 4 are meant do this (41d778dda00^..d27e5a5db5f4). Feel free to pick
>> them
>> up. I hope to take care of the rest of the series; though I have been heavily
>> preempted internally that I hadn't had the time to clear this series but I think
>> it's finally coming to an end
> 
> Thanks, that help much, we have internal team in intel needing this relax with
> vIOMMU enabled.
> I'll pick your first 4 patches with mine and send an update.
> 

OK

>>
>> The gist of these first four patches is essentially that we need to query the
>> dirty bitmap before unmap, and we have an extra optimization that let us
>> simply
>> read the Dirty bit (without clearing it) and so the query is much faster as you
>> don't have a TLB flush.
> 
> I see the legacy backend use dma_unmap_bitmap to do the same thing in one ioctl(),
> will you add that support in kernel?
> IIUC, between query and unmap, there is a window in which we will miss dirty pages
> if a buggy guest still raises DMA.
> 

No. My first version of the kernel patches had something like this (but not
quite done fully correctly), but we deliberately did not handle it and accepted
that race ... as it was a theoretical use case and not worth the cost/complexity
it would bring, i.e. we would have to write-protect the IOPTE, flush, then read
the dirty bits and then unmap (for something we are not aware exists).

See commit 609848132c71 ("iommufd: Add a flag to skip clearing of IOPTE dirty")
for the history; it contains a link to the discussion thread.

This flag is meant to make it easier to get the dirty bits, given that the next
operation is the unmap, which will do a TLB flush (and also destroy the dirty bit).
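
Concretely, the no-clear query is roughly the sketch below. Struct and flag
names are written from memory of linux/iommufd.h, and hwpt_read_dirty_no_clear()
is a made-up wrapper name, so treat this as an illustration rather than the
actual series:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Sketch only: read the dirty bitmap of an HWPT without clearing the IOPTE
 * Dirty bits.  Meant to be called right before unmapping the same range:
 * the unmap destroys the IOPTEs and flushes the IOTLB anyway, so clearing
 * (and flushing) here would be wasted work.
 */
static int hwpt_read_dirty_no_clear(int iommufd, uint32_t hwpt_id,
                                    uint64_t iova, uint64_t length,
                                    uint64_t page_size, void *bitmap)
{
    struct iommu_hwpt_get_dirty_bitmap get = {
        .size = sizeof(get),
        .hwpt_id = hwpt_id,
        .flags = IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR,
        .iova = iova,
        .length = length,
        .page_size = page_size,     /* granularity of one bitmap bit */
        .data = (uintptr_t)bitmap,  /* user buffer receiving the bitmap */
    };

    return ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, &get);
}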

	Joao
RE: [PATCH] vfio: Allow live migration with VFIO devices which use iommu dirty tracking
Posted by Duan, Zhenzhong 4 months ago

>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>>> The gist of these first four patches is essentially that we need to query the
>>> dirty bitmap before unmap, and we have an extra optimization that let us
>>> simply
>>> read the Dirty bit (without clearing it) and so the query is much faster as
>you
>>> don't have a TLB flush.
>>
>> I see the legacy backend use dma_unmap_bitmap to do the same thing in
>one ioctl(),
>> will you add that support in kernel?
>> IIUC, between query and unmap, there is a window in which we will miss
>dirty pages
>> if a buggy guest still raises DMA.
>>
>
>No. My first version of the kernel patches had something like this (but not
>quite done fully correctly), but we deliberately not handle it and we accepted
>that race ... as it was a theoretical use-case and not worth the
>cost/complexity
>it would bring i.e. we would have to write protect the IOPTE, flush, then read
>the dirty bits and then unmap (for something we are not aware it exists).
>
>See commit 609848132c71 ("iommufd: Add a flag to skip clearing of IOPTE
>dirty")
>for the history and inside there's a link to the thread to the discussion.

Clear, thanks for the info.

>
>This flag is meant to make it easier to get dirty bits given the next operation
>is the unmap which will do TLB flush (and also destroy the dirty bit).

OK.

BRs,
Zhenzhong