[PATCH v5 6/9] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend

Zhenzhong Duan posted 9 patches 3 months ago
Maintainers: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, John Levon <john.levon@nutanix.com>, Thanos Makatos <thanos.makatos@nutanix.com>, "Cédric Le Goater" <clg@redhat.com>, Alex Williamson <alex@shazbot.org>
If a VFIO device in the guest switches from an IOMMU domain to a blocking
domain, vtd_address_space_unmap() is called to unmap the whole address space.

If that happens during migration, migration fails with the legacy VFIO
backend as below:

Status: failed (vfio_container_dma_unmap(0x561bbbd92d90, 0x100000000000, 0x100000000000) = -7 (Argument list too long))

Because legacy VFIO limits the maximum dirty bitmap size to 256MB, which
maps to 8TB on a 4K page system, the unmap_bitmap ioctl fails when a 16TB
sized UNMAP notification is sent. Normally such large UNMAP notifications
come from an IOVA range rather than from system memory.
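
For reference, the arithmetic behind those numbers, as a standalone sketch
(not code from QEMU or the kernel; all names below are illustrative):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint64_t bitmap_bytes = 256ULL << 20;      /* 256MB ioctl limit   */
    const uint64_t pages        = bitmap_bytes * 8;  /* one bit per page    */
    const uint64_t page_size    = 4ULL << 10;        /* 4K pages            */
    const uint64_t max_range    = pages * page_size; /* trackable IOVA span */

    /* 2^28 bytes * 8 bits/byte * 4KB/bit = 2^43 bytes, i.e. 8TB */
    printf("max trackable range: %lluTB\n",
           (unsigned long long)(max_range >> 40));

    return 0;
}

A 16TB UNMAP notification is therefore twice what a single unmap_bitmap call
can cover.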

Apart from that, vtd_address_space_unmap() sends the UNMAP notification with
translated_addr = 0, because there is no valid translated_addr when unmapping
a whole IOMMU memory region. This breaks dirty tracking no matter which VFIO
backend is used.

Fix both issues by iterating over the DMAMap list and unmapping each range
with an active mapping while migration is active. If migration is not active,
unmapping the whole address space in one go remains optimal.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Tested-by: Rohith S R <rohith.s.r@intel.com>
---
 hw/i386/intel_iommu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index c402643b56..8e98b0b71d 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -37,6 +37,7 @@
 #include "system/system.h"
 #include "hw/i386/apic_internal.h"
 #include "kvm/kvm_i386.h"
+#include "migration/misc.h"
 #include "migration/vmstate.h"
 #include "trace.h"
 
@@ -4695,6 +4696,42 @@ static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
     vtd_iommu_unlock(s);
 }
 
+/*
+ * Unmapping a large range in one go is not optimal during migration, since a
+ * large dirty bitmap must be allocated even if only small mappings exist, so
+ * iterate over the DMAMap list and unmap each range with an active mapping.
+ */
+static void vtd_address_space_unmap_in_migration(VTDAddressSpace *as,
+                                                 IOMMUNotifier *n)
+{
+    const DMAMap *map;
+    const DMAMap target = {
+        .iova = n->start,
+        .size = n->end,
+    };
+    IOVATree *tree = as->iova_tree;
+
+    /*
+     * DMAMap is created during IOMMU page table sync, it's either 4KB or huge
+     * page size and always a power of 2 in size. So the range of DMAMap could
+     * be used for UNMAP notification directly.
+     */
+    while ((map = iova_tree_find(tree, &target))) {
+        IOMMUTLBEvent event;
+
+        event.type = IOMMU_NOTIFIER_UNMAP;
+        event.entry.iova = map->iova;
+        event.entry.addr_mask = map->size;
+        event.entry.target_as = &address_space_memory;
+        event.entry.perm = IOMMU_NONE;
+        /* This field is needed to set the dirty bitmap */
+        event.entry.translated_addr = map->translated_addr;
+        memory_region_notify_iommu_one(n, &event);
+
+        iova_tree_remove(tree, *map);
+    }
+}
+
 /* Unmap the whole range in the notifier's scope. */
 static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
 {
@@ -4704,6 +4741,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
     IntelIOMMUState *s = as->iommu_state;
     DMAMap map;
 
+    if (migration_is_running()) {
+        vtd_address_space_unmap_in_migration(as, n);
+        return;
+    }
+
     /*
      * Note: all the codes in this function has a assumption that IOVA
      * bits are no more than VTD_MGAW bits (which is restricted by
-- 
2.47.1
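
A note on the comment about DMAMap ranges above: DMAMap is the element type
of QEMU's IOVATree, and its size field is inclusive (size == length - 1), so
for the power-of-2 sized and aligned mappings recorded during page table sync
it already has the 2^n - 1 shape that entry.addr_mask expects. A sketch of
the relevant fields, written from memory of include/qemu/iova-tree.h rather
than quoted verbatim:

typedef struct DMAMap {
    hwaddr iova;
    hwaddr translated_addr;
    hwaddr size;                /* Inclusive */
    IOMMUAccessFlags perm;
} DMAMap;

This is also why translated_addr can be copied straight into the event: each
DMAMap was recorded from a successful translation, so it points at the guest
memory that the dirty bitmap needs to mark.
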
RE: [PATCH v5 6/9] intel_iommu: Fix unmap_bitmap failure with legacy VFIO backend
Posted by Duan, Zhenzhong 2 months, 1 week ago

>-----Original Message-----
>From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Subject: [PATCH v5 6/9] intel_iommu: Fix unmap_bitmap failure with legacy
>VFIO backend
>
>[...]
>@@ -4704,6 +4741,11 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
>     IntelIOMMUState *s = as->iommu_state;
>     DMAMap map;
>
>+    if (migration_is_running()) {

Hmm, I just realized it may be better to check global_dirty_tracking instead,
because the dirty-rate and dirty-limit QMP commands also need it.
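
For illustration, that alternative would only change the new check; a sketch
on top of this patch (not what v5 does, and whether global_dirty_tracking is
the right condition is exactly the open question here):

-    if (migration_is_running()) {
+    if (global_dirty_tracking) {
         vtd_address_space_unmap_in_migration(as, n);
         return;
     }

global_dirty_tracking is set for migration as well as for the dirty-rate and
dirty-limit features, so it would also cover the QMP users mentioned above.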

Zhenzhong
