It is not very clear when iova_tree is used, and why. Add a detailed
comment above iova_tree to document why we need the iova_tree, and when we
need it.
Also comment on the map/unmap messages: how they're used and what the
implications are (e.g. an unmap can be larger than the mapped ranges).
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v2:
- Adjust according to Eric's comment
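Note for reviewers (not part of the commit message): the reasoning in the
new iova_tree comment can be illustrated with a toy model. This is only a
sketch under simplifying assumptions, not the intel_iommu code: every name
below (ToyShadow, toy_sync_range, TOY_PAGES) is hypothetical, a flat bool
array stands in for the IOVATree, and translated addresses and permissions
are ignored. It shows how remembering what was already notified lets an
enlarged invalidation be turned into only the MAP/UNMAP events that
actually changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 16 /* hypothetical: a tiny IOVA space of 16 pages */

typedef enum { TOY_EV_MAP, TOY_EV_UNMAP } ToyEventType;

typedef struct {
    ToyEventType type;
    size_t page;
} ToyEvent;

/* Stand-in for the iova_tree: pages already notified as mapped. */
typedef struct {
    bool mapped[TOY_PAGES];
} ToyShadow;

/*
 * Handle an invalidation covering pages [start, start + npages), which
 * may be larger than what the guest really changed.  guest_present[p]
 * tells whether page p is currently mapped in the guest page table.
 * Events are written to out (capacity: npages); the count is returned.
 */
static size_t toy_sync_range(ToyShadow *s, const bool *guest_present,
                             size_t start, size_t npages, ToyEvent *out)
{
    size_t n = 0;

    assert(start <= TOY_PAGES && npages <= TOY_PAGES - start);

    for (size_t p = start; p < start + npages; p++) {
        if (guest_present[p] && !s->mapped[p]) {
            /* Newly mapped in the guest: notify MAP exactly once. */
            s->mapped[p] = true;
            out[n++] = (ToyEvent){ TOY_EV_MAP, p };
        } else if (!guest_present[p] && s->mapped[p]) {
            /* Gone from the guest: notify UNMAP for this page only. */
            s->mapped[p] = false;
            out[n++] = (ToyEvent){ TOY_EV_UNMAP, p };
        }
        /* Otherwise the shadow already matches the guest: stay silent. */
    }
    return n;
}
```

With such a record, replaying the same enlarged range twice produces no
duplicate MAP, and a page that is still mapped is never unmapped, which are
the two failure cases listed in the comment.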
---
include/exec/memory.h | 20 ++++++++++++++++++
include/hw/i386/intel_iommu.h | 38 ++++++++++++++++++++++++++++++++++-
2 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..a8489feb51 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,26 @@ struct IOMMUTLBEntry {
/*
* Bitmap for different IOMMUNotifier capabilities. Each notifier can
* register with one or multiple IOMMU Notifier capability bit(s).
+ *
+ * Normally there are two use cases for the notifiers:
+ *
+ * (1) When the device needs accurate synchronization of the vIOMMU page
+ * tables, it needs to register with both MAP and UNMAP notifiers
+ * (defined together as IOMMU_NOTIFIER_IOTLB_EVENTS below). As long as
+ * MAP events are registered, the notifications will be accurate, but
+ * there is overhead in synchronizing the guest vIOMMU page tables.
+ *
+ * (2) When the device doesn't need accurate synchronization of the
+ * vIOMMU page tables (when the device can both cache translations
+ * and request translations dynamically during the DMA process), it
+ * needs to register only with UNMAP or DEVIOTLB_UNMAP notifiers.
+ * Note that in this working mode the vIOMMU will not maintain a
+ * shadowed page table for the address space, and the UNMAP messages
+ * can actually be larger than the real invalidations (just like how
+ * the Linux IOMMU driver normally works, where an invalidation can
+ * be enlarged as long as it still covers the target range). The
+ * IOMMU notifiee should be able to take care of over-sized
+ * invalidations.
*/
typedef enum {
IOMMU_NOTIFIER_NONE = 0,
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 46d973e629..89dcbc5e1e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -109,7 +109,43 @@ struct VTDAddressSpace {
QLIST_ENTRY(VTDAddressSpace) next;
/* Superset of notifier flags that this address space has */
IOMMUNotifierFlag notifier_flags;
- IOVATree *iova_tree; /* Traces mapped IOVA ranges */
+ /*
+ * @iova_tree traces mapped IOVA ranges.
+ *
+ * The tree is not needed if no MAP notifier is registered with current
+ * VTD address space, because all guest invalidate commands can be
+ * directly passed to the IOMMU UNMAP notifiers without any further
+ * reshuffling.
+ *
+ * The tree, on the other hand, is required for MAP-typed IOMMU
+ * notifiers for a few reasons.
+ *
+ * Firstly, there's no way to identify whether a PSI (Page Selective
+ * Invalidation) or DSI (Domain Selective Invalidation) event is a
+ * MAP or UNMAP event from the message itself. Without prior
+ * knowledge of the existing state, the vIOMMU doesn't know whether it
+ * should notify MAP or UNMAP for a PSI message it receives when
+ * caching mode is enabled (for MAP notifiers).
+ *
+ * Secondly, PSI messages received from the guest driver can be enlarged
+ * in range, covering but not limited to what the guest driver wanted to
+ * invalidate. When the range to invalidate gets bigger than the
+ * limit of a PSI message, it can even become a DSI, which will
+ * invalidate the whole domain. If the vIOMMU directly notifies the
+ * registered device with the unmodified range, it may confuse the
+ * registered drivers (e.g. vfio-pci) by either:
+ *
+ * (1) Trying to map the same region more than once (for
+ * VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
+ *
+ * (2) Trying to UNMAP a range that is still partially mapped.
+ *
+ * That accuracy is not required for UNMAP-only notifiers, but it is a
+ * must-have for notifiers registered with MAP events, because the
+ * vIOMMU needs to make sure the shadow page table is always in sync
+ * with the guest IOMMU pgtables for a device.
+ */
+ IOVATree *iova_tree;
};
struct VTDIOTLBEntry {
--
2.37.3