[PULL 27/41] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend

Cédric Le Goater posted 41 patches 3 weeks, 6 days ago
Maintainers: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Jason Wang <jasowang@redhat.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Alexey Kardashevskiy <aik@ozlabs.ru>, John Levon <john.levon@nutanix.com>, Thanos Makatos <thanos.makatos@nutanix.com>, "Cédric Le Goater" <clg@redhat.com>, Alex Williamson <alex@shazbot.org>, Magnus Kulke <magnus.kulke@linux.microsoft.com>, Wei Liu <wei.liu@kernel.org>, Cornelia Huck <cohuck@redhat.com>, Palmer Dabbelt <palmer@dabbelt.com>, Alistair Francis <alistair.francis@wdc.com>, Weiwei Li <liwei1518@gmail.com>, Daniel Henrique Barboza <dbarboza@ventanamicro.com>, Liu Zhiwei <zhiwei_liu@linux.alibaba.com>, Fabiano Rosas <farosas@suse.de>, Laurent Vivier <lvivier@redhat.com>
Posted by Cédric Le Goater 3 weeks, 6 days ago
From: Zhenzhong Duan <zhenzhong.duan@intel.com>

If a VFIO device in the guest switches from an IOMMU domain to a block
domain, vtd_address_space_unmap() is called to unmap the whole address space.

If that happens during migration, migration fails with the legacy VFIO
backend, as shown below:

Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))

This happens because the legacy VFIO backend limits the maximum bitmap size
to 256MB, which covers 8TB on a system with 4K pages. When a 16TB UNMAP
notification is sent, the unmap_bitmap ioctl fails. Normally such large UNMAP
notifications come from the IOVA range rather than from system memory.
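
The size arithmetic above can be checked with a small standalone sketch. It
is not QEMU or kernel code: LEGACY_MAX_BITMAP_BYTES, max_bitmap_coverage()
and unmap_fits_bitmap() are hypothetical names, and the 256MB cap is taken
from the description above rather than from any header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: the 256MB bitmap cap described above. */
#define LEGACY_MAX_BITMAP_BYTES (256ULL << 20)

/* Largest range, in bytes, that one dirty bitmap covers at one bit per page. */
static uint64_t max_bitmap_coverage(uint64_t page_size)
{
    /* The page size must be a non-zero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return LEGACY_MAX_BITMAP_BYTES * 8 * page_size;
}

/* True if a single unmap of `size` bytes stays within the bitmap cap. */
static bool unmap_fits_bitmap(uint64_t size, uint64_t page_size)
{
    return size <= max_bitmap_coverage(page_size);
}
```

With 4K pages, max_bitmap_coverage(4096) is 2^43 bytes (8TB), so
unmap_fits_bitmap(0x100000000000, 4096) is false for the 16TB range in the
failure message above.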

Apart from that, vtd_address_space_unmap() sends the UNMAP notification with
translated_addr = 0, because there is no valid translated_addr for unmapping
a whole IOMMU memory region. This breaks dirty tracking no matter which VFIO
backend is used.

Fix both problems by iterating over the DMAMap list and unmapping each range
that has an active mapping when global_dirty_tracking is active.
global_dirty_tracking is protected by the BQL, so it is safe to reference it
directly. If it is not active, unmapping the whole address space in one go
is optimal.
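
The per-range approach can be illustrated with a toy model. This is only a
sketch under stated assumptions: ToyMap, ToyTree and the toy_* functions are
hypothetical stand-ins, not the QEMU IOVA tree API, and the model assumes
that no mapping wraps around the 64-bit address space. As with DMAMap, the
size field is inclusive (length minus one).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMA mapping; size is inclusive (length - 1). */
typedef struct {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t size;
} ToyMap;

/* Hypothetical fixed-capacity container standing in for the IOVA tree. */
typedef struct {
    ToyMap maps[16];
    size_t count;
} ToyTree;

typedef void (*toy_notify_fn)(const ToyMap *map, void *opaque);

/* Return any mapping that overlaps [start, end], or NULL if none is left. */
static const ToyMap *toy_find(const ToyTree *t, uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < t->count; i++) {
        const ToyMap *m = &t->maps[i];
        if (m->iova <= end && m->iova + m->size >= start) {
            return m;
        }
    }
    return NULL;
}

/* Remove a mapping by moving the last entry into its slot. */
static void toy_remove(ToyTree *t, const ToyMap *m)
{
    size_t idx = (size_t)(m - t->maps);
    assert(idx < t->count);
    t->maps[idx] = t->maps[t->count - 1];
    t->count--;
}

/* Example notifier: adds the byte length of each unmapped range to a counter. */
static void toy_count_bytes(const ToyMap *map, void *opaque)
{
    *(uint64_t *)opaque += map->size + 1;
}

/*
 * Notify and remove every mapping overlapping [start, end], one at a time,
 * so each notification carries that mapping's own translated_addr.
 * Returns the number of notifications sent.
 */
static size_t toy_unmap_each(ToyTree *t, uint64_t start, uint64_t end,
                             toy_notify_fn notify, void *opaque)
{
    const ToyMap *m;
    size_t sent = 0;

    while ((m = toy_find(t, start, end)) != NULL) {
        notify(m, opaque);
        toy_remove(t, m);
        sent++;
    }
    return sent;
}
```

Each notification is bounded by the size of one mapping, so it cannot exceed
a bitmap limit the way a single whole-address-space unmap can.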

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
Link: https://lore.kernel.org/qemu-devel/20251218062643.624796-7-zhenzhong.duan@intel.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index f971cdd14c78fd80df26c0f67d4abc8cfb35645d..bb21ae1743d2bf0dc91c4fc18026f449a290814b 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4764,6 +4764,43 @@ static uint64_t vtd_get_viommu_flags(void *opaque)
     return flags;
 }
 
+/*
+ * There is no valid translated_addr for unmapping a whole iommu memory region.
+ * When dirty tracking is enabled, we need it to set dirty bitmaps. Iterate
+ * over DMAMap list to unmap each range with active mapping and translated_addr
+ * value.
+ */
+static void vtd_address_space_unmap_in_dirty_tracking(VTDAddressSpace *as,
+                                                      IOMMUNotifier *n)
+{
+    const DMAMap *map;
+    const DMAMap target = {
+        .iova = n->start,
+        .size = n->end,
+    };
+    IOVATree *tree = as->iova_tree;
+
+    /*
+     * DMAMap is created during IOMMU page table sync, it's either 4KB or huge
+     * page size and always a power of 2 in size. So the range of DMAMap could
+     * be used for UNMAP notification directly.
+     */
+    while ((map = iova_tree_find(tree, &target))) {
+        IOMMUTLBEvent event;
+
+        event.type = IOMMU_NOTIFIER_UNMAP;
+        event.entry.iova = map->iova;
+        event.entry.addr_mask = map->size;
+        event.entry.target_as = &address_space_memory;
+        event.entry.perm = IOMMU_NONE;
+        /* This field is needed to set dirty bitmap */
+        event.entry.translated_addr = map->translated_addr;
+        memory_region_notify_iommu_one(n, &event);
+
+        iova_tree_remove(tree, *map);
+    }
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -4773,6 +4810,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
     IntelIOMMUState *s = as->iommu_state;
     DMAMap map;
 
+    if (global_dirty_tracking) {
+        vtd_address_space_unmap_in_dirty_tracking(as, n);
+        return;
+    }
+
     /*
      * Note: all the codes in this function has a assumption that IOVA
      * bits are no more than VTD_MGAW bits (which is restricted by
-- 
2.52.0