[PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend

Posted by Zhenzhong Duan 4 weeks ago
If a VFIO device in the guest switches from an IOMMU domain to a blocking
domain, vtd_address_space_unmap() is called to unmap the whole address space.

If that happens during migration, migration fails with legacy VFIO
backend as below:

Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))

Because legacy VFIO limits the maximum dirty bitmap size to 256MB, which
covers 8TB of IOVA space with 4K pages, the unmap_bitmap ioctl fails when an
UNMAP notification spanning 16TB is sent.

Fix it by iterating over the DMAMap list and unmapping each range with an
active mapping while migration is running. When migration is not running,
unmapping the whole address space in one go remains optimal.

There is no such limitation with the iommufd backend, but allocating a large
bitmap is still suboptimal there: there may be large holes between IOVA
ranges, and allocating bitmap space for, and dirty-tracking, those holes is
time-consuming, useless work.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 6a168d5107..f32d4f5a15 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,7 @@
 #include "system/system.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
+#include "migration/misc.h"
 #include "migration/vmstate.h"
 #include "trace.h"
 
@@ -4533,6 +4534,42 @@ static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
     vtd_iommu_unlock(s);
 }
 
+/*
+ * Unmapping a large range in one go is not optimal during migration: a large
+ * dirty bitmap must be allocated even when there are only small mappings.
+ * Instead, iterate over the DMAMap list and unmap each active range.
+ */
+static void vtd_address_space_unmap_in_migration(VTDAddressSpace *as,
+                                                 IOMMUNotifier *n)
+{
+    const DMAMap *map;
+    const DMAMap target = {
+        .iova = n->start,
+        .size = n->end,
+    };
+    IOVATree *tree = as->iova_tree;
+
+    /*
+     * A DMAMap is created during IOMMU page table sync; it is either 4KB or
+     * huge page sized and always a power of 2 in size, so its range can be
+     * used for the UNMAP notification directly.
+     */
+    while ((map = iova_tree_find(tree, &target))) {
+        IOMMUTLBEvent event;
+
+        event.type = IOMMU_NOTIFIER_UNMAP;
+        event.entry.iova = map->iova;
+        event.entry.addr_mask = map->size;
+        event.entry.target_as = &address_space_memory;
+        event.entry.perm = IOMMU_NONE;
+        /* This field is meaningless for unmap */
+        event.entry.translated_addr = 0;
+        memory_region_notify_iommu_one(n, &event);
+
+        iova_tree_remove(tree, *map);
+    }
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -4542,6 +4579,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
     IntelIOMMUState *s = as->iommu_state;
     DMAMap map;
 
+    if (migration_is_running()) {
+        vtd_address_space_unmap_in_migration(as, n);
+        return;
+    }
+
     /*
      * Note: all the codes in this function has a assumption that IOVA
      * bits are no more than VTD_MGAW bits (which is restricted by
-- 
2.47.1
Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Posted by Yi Liu 3 weeks, 4 days ago
On 2025/10/17 16:22, Zhenzhong Duan wrote:
> If a VFIO device in guest switches from IOMMU domain to block domain,
> vtd_address_space_unmap() is called to unmap whole address space.
> 
> If that happens during migration, migration fails with legacy VFIO
> backend as below:
> 
> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))
> 
> Because legacy VFIO limits maximum bitmap size to 256MB which maps to 8TB on
> 4K page system, when 16TB sized UNMAP notification is sent, unmap_bitmap
> ioctl fails.

It would be great to add some words to note why vIOMMU can trigger this.

> Fix it by iterating over DMAMap list to unmap each range with active mapping
> when migration is active. If migration is not active, unmapping the whole
> address space in one go is optimal.
> 
> There is no such limitation with iommufd backend, but it's still not optimal
> to allocate large bitmap, e.g., there may be large hole between IOVA ranges,
> allocating large bitmap and dirty tracking on the hole is time consuming and
> useless work.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
> ---
>   hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 42 insertions(+)

with above comment, the patch LGTM.

Reviewed-by: Yi Liu <yi.l.liu@intel.com>

> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 6a168d5107..f32d4f5a15 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -37,6 +37,7 @@
>   #include "system/system.h"
>   #include "hw/i386/apic_internal.h"
>   #include "kvm/kvm_i386.h"
> +#include "migration/misc.h"
>   #include "migration/vmstate.h"
>   #include "trace.h"
>   
> @@ -4533,6 +4534,42 @@ static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
>       vtd_iommu_unlock(s);
>   }
>   
> +/*
> + * Unmapping a large range in one go is not optimal during migration because
> + * a large dirty bitmap needs to be allocated while there may be only small
> + * mappings, iterate over DMAMap list to unmap each range with active mapping.
> + */
> +static void vtd_address_space_unmap_in_migration(VTDAddressSpace *as,
> +                                                 IOMMUNotifier *n)
> +{
> +    const DMAMap *map;
> +    const DMAMap target = {
> +        .iova = n->start,
> +        .size = n->end,
> +    };
> +    IOVATree *tree = as->iova_tree;
> +
> +    /*
> +     * DMAMap is created during IOMMU page table sync, it's either 4KB or huge
> +     * page size and always a power of 2 in size. So the range of DMAMap could
> +     * be used for UNMAP notification directly.
> +     */
> +    while ((map = iova_tree_find(tree, &target))) {
> +        IOMMUTLBEvent event;
> +
> +        event.type = IOMMU_NOTIFIER_UNMAP;
> +        event.entry.iova = map->iova;
> +        event.entry.addr_mask = map->size;
> +        event.entry.target_as = &address_space_memory;
> +        event.entry.perm = IOMMU_NONE;
> +        /* This field is meaningless for unmap */
> +        event.entry.translated_addr = 0;
> +        memory_region_notify_iommu_one(n, &event);
> +
> +        iova_tree_remove(tree, *map);
> +    }
> +}
> +
>   /* Unmap the whole range in the notifier's scope. */
>   static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
>   {
> @@ -4542,6 +4579,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
>       IntelIOMMUState *s = as->iommu_state;
>       DMAMap map;
>   
> +    if (migration_is_running()) {
> +        vtd_address_space_unmap_in_migration(as, n);
> +        return;
> +    }
> +
>       /*
>        * Note: all the codes in this function has a assumption that IOVA
>        * bits are no more than VTD_MGAW bits (which is restricted by
RE: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Posted by Duan, Zhenzhong 3 weeks, 4 days ago

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with
>legacy VFIO backend
>
>On 2025/10/17 16:22, Zhenzhong Duan wrote:
>> If a VFIO device in guest switches from IOMMU domain to block domain,
>> vtd_address_space_unmap() is called to unmap whole address space.
>>
>> If that happens during migration, migration fails with legacy VFIO
>> backend as below:
>>
>> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90,
>0x100000000000, 0x100000000000) = -7 (Argument list too long))
>>
>> Because legacy VFIO limits maximum bitmap size to 256MB which maps to
>8TB on
>> 4K page system, when 16TB sized UNMAP notification is sent,
>unmap_bitmap
>> ioctl fails.
>
>It would be great to add some words to note why vIOMMU can trigger this.

Hi Yi, I think the first sentence of the description explains that?

"If a VFIO device in guest switches from IOMMU domain to block domain,
vtd_address_space_unmap() is called to unmap whole address space."

Thanks
Zhenzhong

Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Posted by Yi Liu 3 weeks, 4 days ago

On 2025/10/20 16:03, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Liu, Yi L <yi.l.liu@intel.com>
>> Subject: Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with
>> legacy VFIO backend
>>
>> On 2025/10/17 16:22, Zhenzhong Duan wrote:
>>> If a VFIO device in guest switches from IOMMU domain to block domain,
>>> vtd_address_space_unmap() is called to unmap whole address space.
>>>
>>> If that happens during migration, migration fails with legacy VFIO
>>> backend as below:
>>>
>>> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90,
>> 0x100000000000, 0x100000000000) = -7 (Argument list too long))
>>>
>>> Because legacy VFIO limits maximum bitmap size to 256MB which maps to
>> 8TB on
>>> 4K page system, when 16TB sized UNMAP notification is sent,
>> unmap_bitmap
>>> ioctl fails.
>>
>> It would be great to add some words to note why vIOMMU can trigger this.
> 
> Hi Yi, I think the first sentence in description is explaining that?
> 
> "If a VFIO device in guest switches from IOMMU domain to block domain,
> vtd_address_space_unmap() is called to unmap whole address space."

Aha, yes. I was trying to point out that it is NOT necessarily related to the
VM memory size. Could you note that the address space is guest IOVA space,
which is not system memory?

Regards,
Yi Liu


>
RE: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Posted by Duan, Zhenzhong 3 weeks, 4 days ago

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with
>legacy VFIO backend
>
>
>
>On 2025/10/20 16:03, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>> Subject: Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with
>>> legacy VFIO backend
>>>
>>> On 2025/10/17 16:22, Zhenzhong Duan wrote:
>>>> If a VFIO device in guest switches from IOMMU domain to block domain,
>>>> vtd_address_space_unmap() is called to unmap whole address space.
>>>>
>>>> If that happens during migration, migration fails with legacy VFIO
>>>> backend as below:
>>>>
>>>> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90,
>>> 0x100000000000, 0x100000000000) = -7 (Argument list too long))
>>>>
>>>> Because legacy VFIO limits maximum bitmap size to 256MB which maps
>to
>>> 8TB on
>>>> 4K page system, when 16TB sized UNMAP notification is sent,
>>> unmap_bitmap
>>>> ioctl fails.
>>>
>>> It would be great to add some words to note why vIOMMU can trigger
>this.
>>
>> Hi Yi, I think the first sentence in description is explaining that?
>>
>> "If a VFIO device in guest switches from IOMMU domain to block domain,
>> vtd_address_space_unmap() is called to unmap whole address space."
>
>aha, yes. I was trying to mark it is NOT necessarily related to VM
>memory size. Could you note that the address space is guest IOVA which
>is not system memory?

Sure, will do

Thanks
Zhenzhong
Re: [PATCH v2 6/8] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Posted by Cédric Le Goater 3 weeks, 4 days ago
Clément, Yi Liu,

On 10/17/25 10:22, Zhenzhong Duan wrote:
> If a VFIO device in guest switches from IOMMU domain to block domain,
> vtd_address_space_unmap() is called to unmap whole address space.
> 
> If that happens during migration, migration fails with legacy VFIO
> backend as below:
> 
> Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))
> 
> Because legacy VFIO limits maximum bitmap size to 256MB which maps to 8TB on
> 4K page system, when 16TB sized UNMAP notification is sent, unmap_bitmap
> ioctl fails.
> 
> Fix it by iterating over DMAMap list to unmap each range with active mapping
> when migration is active. If migration is not active, unmapping the whole
> address space in one go is optimal.
> 
> There is no such limitation with iommufd backend, but it's still not optimal
> to allocate large bitmap, e.g., there may be large hole between IOVA ranges,
> allocating large bitmap and dirty tracking on the hole is time consuming and
> useless work.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Tested-by: Giovannio Cabiddu <giovanni.cabiddu@intel.com>
> ---
>   hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 42 insertions(+)
> 

Could you ack this change please ?

Thanks,

C.