[PATCH v2] intel-iommu: Document iova_tree

Peter Xu posted 1 patch 1 year, 4 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20221206221039.395356-1-peterx@redhat.com
Maintainers: Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
There is a newer version of this series
include/exec/memory.h         | 20 ++++++++++++++++++
include/hw/i386/intel_iommu.h | 38 ++++++++++++++++++++++++++++++++++-
2 files changed, 57 insertions(+), 1 deletion(-)
[PATCH v2] intel-iommu: Document iova_tree
Posted by Peter Xu 1 year, 4 months ago
It seems not super clear on when iova_tree is used, and why.  Add a rich
comment above iova_tree to track why we needed the iova_tree, and when we
need it.

Also comment for the map/unmap messages, on how they're used and
implications (e.g. unmap can be larger than the mapped ranges).

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
v2:
- Adjust according to Eric's comment
---
 include/exec/memory.h         | 20 ++++++++++++++++++
 include/hw/i386/intel_iommu.h | 38 ++++++++++++++++++++++++++++++++++-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 91f8a2395a..a8489feb51 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -129,6 +129,26 @@ struct IOMMUTLBEntry {
 /*
  * Bitmap for different IOMMUNotifier capabilities. Each notifier can
  * register with one or multiple IOMMU Notifier capability bit(s).
+ *
+ * Normally there're two use cases for the notifiers:
+ *
+ *   (1) When the device needs accurate synchronizations of the vIOMMU page
+ *       tables, it needs to register with both MAP|UNMAP notifies (which
+ *       is defined as IOMMU_NOTIFIER_IOTLB_EVENTS below).  As long as MAP
+ *       events are registered, the notifications will be accurate but
+ *       there's overhead on synchronizing the guest vIOMMU page tables.
+ *
+ *   (2) When the device doesn't need accurate synchronizations of the
+ *       vIOMMU page tables (when the device can both cache translations
+ *       and requesting to translate dynamically during DMA process), it
+ *       needs to register only with UNMAP or DEVIOTLB_UNMAP notifies.
+ *       Note that in this working mode the vIOMMU will not maintain a
+ *       shadowed page table for the address space, and the UNMAP messages
+ *       can be actually larger than the real invalidations (just like how
+ *       the Linux IOMMU driver normally works, where an invalidation can
+ *       be enlarged as long as it still covers the target range).  The
+ *       IOMMU notifiee should be able to take care of over-sized
+ *       invalidations.
  */
 typedef enum {
     IOMMU_NOTIFIER_NONE = 0,
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 46d973e629..89dcbc5e1e 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -109,7 +109,43 @@ struct VTDAddressSpace {
     QLIST_ENTRY(VTDAddressSpace) next;
     /* Superset of notifier flags that this address space has */
     IOMMUNotifierFlag notifier_flags;
-    IOVATree *iova_tree;          /* Traces mapped IOVA ranges */
+    /*
+     * @iova_tree traces mapped IOVA ranges.
+     *
+     * The tree is not needed if no MAP notifier is registered with current
+     * VTD address space, because all guest invalidate commands can be
+     * directly passed to the IOMMU UNMAP notifiers without any further
+     * reshuffling.
+     *
+     * The tree OTOH is required for MAP typed iommu notifiers for a few
+     * reasons.
+     *
+     * Firstly, there's no way to identify whether an PSI (Page Selective
+     * Invalidations) or DSI (Domain Selective Invalidations) event is an
+     * MAP or UNMAP event within the message itself.  Without having prior
+     * knowledge of existing state vIOMMU doesn't know whether it should
+     * notify MAP or UNMAP for a PSI message it received when caching mode
+     * is enabled (for MAP notifiers).
+     *
+     * Secondly, PSI messages received from guest driver can be enlarged in
+     * range, covers but not limited to what the guest driver wanted to
+     * invalidate.  When the range to invalidates gets bigger than the
+     * limit of a PSI message, it can even become a DSI which will
+     * invalidate the whole domain.  If the vIOMMU directly notifies the
+     * registered device with the unmodified range, it may confuse the
+     * registered drivers (e.g. vfio-pci) on either:
+     *
+     *   (1) Trying to map the same region more than once (for
+     *       VFIO_IOMMU_MAP_DMA, -EEXIST will trigger), or,
+     *
+     *   (2) Trying to UNMAP a range that is still partially mapped.
+     *
+     * That accuracy is not required for UNMAP-only notifiers, but it is a
+     * must-to-have for notifiers registered with MAP events, because the
+     * vIOMMU needs to make sure the shadow page table is always in sync
+     * with the guest IOMMU pgtables for a device.
+     */
+    IOVATree *iova_tree;
 };
 
 struct VTDIOTLBEntry {
-- 
2.37.3
Re: [PATCH v2] intel-iommu: Document iova_tree
Posted by Peter Xu 1 year, 4 months ago
On Tue, Dec 06, 2022 at 05:10:39PM -0500, Peter Xu wrote:
> It seems not super clear on when iova_tree is used, and why.  Add a rich
> comment above iova_tree to track why we needed the iova_tree, and when we
> need it.
> 
> Also comment for the map/unmap messages, on how they're used and
> implications (e.g. unmap can be larger than the mapped ranges).
> 
> Suggested-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> v2:
> - Adjust according to Eric's comment

This is the old version.. sorry, will repost.

-- 
Peter Xu