[PATCH v3 00/15] vfio: VFIO migration support with vIOMMU

Joao Martins posted 15 patches 11 months, 1 week ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20230530175937.24202-1-joao.m.martins@oracle.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Peter Xu <peterx@redhat.com>, Jason Wang <jasowang@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, David Hildenbrand <david@redhat.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>
There is a newer version of this series
Hey folks,

This series introduces support for vIOMMU with VFIO device migration,
particularly related to how we do the dirty page tracking. This is a v3
follow-up containing the vIOMMU subset of v2[0], but with some new changes,
some simplifications and finer grained scope.

Today vIOMMUs serve two purposes: 1) enable interrupt remapping and 2)
provide DMA translation services for guests, to allow some form of
guest kernel managed DMA, e.g. for nested virt based usage; (1) is especially
required for big VMs with VFs with more than 255 vcpus. We tackle both
and remove the migration blocker when vIOMMU is present provided the
conditions are met. I have both use-cases here in one series, but I am happy
to tackle them in separate series.

As I found out, we don't necessarily need to expose the whole vIOMMU
functionality in order to just support interrupt remapping. x86 IOMMUs
on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
Linux guests with commit c40aaaac10 and since qemu commit 8646d9c773d8)
can instantiate an IOMMU just for interrupt remapping without needing to
advertise/support DMA translation. AMD IOMMU in theory can provide
the same, but Linux doesn't quite support the IR-only part there yet,
only for intel-iommu.

The series is organized as follows:

Patches 1-5: Today we can't gather vIOMMU details before the guest
establishes its first DMA mapping via the vIOMMU. So the first four
patches add a way for vIOMMUs to be queried for their properties,
essentially by being able to return the IOMMU MR to upper layers. The last
patch of this group uses the new pci_setup_iommu_info() in intel-iommu, which
then lets VFIO fetch the IOMMU MR and use it to gather the necessary
properties that are added in follow-up patches. I chose the least churn
possible way for now (as opposed to a treewide conversion), which allows easy
conversion a posteriori.
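
To illustrate the idea of letting upper layers ask for the IOMMU MR up front,
here is a minimal standalone sketch. It is not the code from these patches:
every Toy*/toy_* name is hypothetical, and the callback table is only an
assumption about the general shape such a registration helper could take.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are not QEMU types. */
typedef struct ToyAddressSpace { const char *name; } ToyAddressSpace;
typedef struct ToyIommuMR { const char *name; } ToyIommuMR;

/* Hypothetical ops table: one callback per thing a vIOMMU can hand out. */
typedef struct ToyIommuOps {
    ToyAddressSpace *(*get_address_space)(void *opaque, int devfn);
    ToyIommuMR *(*get_memory_region)(void *opaque, int devfn);
} ToyIommuOps;

typedef struct ToyBus {
    const ToyIommuOps *ops;
    void *opaque;
} ToyBus;

/* The vIOMMU model registers its callbacks on the bus once, at setup time. */
static void toy_setup_iommu_info(ToyBus *bus, const ToyIommuOps *ops,
                                 void *opaque)
{
    assert(bus != NULL && ops != NULL);
    bus->ops = ops;
    bus->opaque = opaque;
}

/*
 * Upper layers can ask for the IOMMU MR without waiting for any guest
 * DMA mapping. Returns NULL when no vIOMMU registered the callback.
 */
static ToyIommuMR *toy_device_iommu_memory_region(ToyBus *bus, int devfn)
{
    if (!bus->ops || !bus->ops->get_memory_region) {
        return NULL;
    }
    return bus->ops->get_memory_region(bus->opaque, devfn);
}

/* A toy vIOMMU model that owns one address space and one IOMMU MR. */
typedef struct ToyViommu {
    ToyAddressSpace as;
    ToyIommuMR mr;
} ToyViommu;

static ToyAddressSpace *toy_viommu_get_as(void *opaque, int devfn)
{
    (void)devfn; /* the toy model uses one AS for every device */
    return &((ToyViommu *)opaque)->as;
}

static ToyIommuMR *toy_viommu_get_mr(void *opaque, int devfn)
{
    (void)devfn; /* the toy model uses one MR for every device */
    return &((ToyViommu *)opaque)->mr;
}

static const ToyIommuOps toy_viommu_ops = {
    .get_address_space = toy_viommu_get_as,
    .get_memory_region = toy_viommu_get_mr,
};
```

A caller would register toy_viommu_ops with toy_setup_iommu_info() and later
call toy_device_iommu_memory_region() for a given devfn; a NULL result stands
for "no vIOMMU information available" in this sketch.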

Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
DMA translation allowed. Today the 'dma-translation' attribute is
x86-iommu only, but the way this series is structured nothing stops
other vIOMMUs from supporting it too, as long as they use
pci_setup_iommu_info() and the necessary IOMMU MR get_attr attributes
are handled. The blocker is thus relaxed when vIOMMUs are able to
toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
we've then tackled item (1) of the second paragraph.
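
The blocker decision at this point in the series can be sketched as below.
This is a standalone toy model under stated assumptions, not the VFIO code:
the Toy*/toy_* names are hypothetical, and it only covers the state "up to
patch 8", where a vIOMMU with DMA translation enabled still blocks migration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute id; stands in for a DMA translation attribute. */
enum toy_iommu_attr { TOY_ATTR_DMA_TRANSLATION };

typedef struct ToyIommuMR {
    /* Returns 0 and fills *data on success, negative if unsupported. */
    int (*get_attr)(const struct ToyIommuMR *mr, enum toy_iommu_attr attr,
                    void *data);
    bool dma_translation;
} ToyIommuMR;

static int toy_get_attr(const ToyIommuMR *mr, enum toy_iommu_attr attr,
                        void *data)
{
    assert(data != NULL);
    if (attr == TOY_ATTR_DMA_TRANSLATION) {
        *(bool *)data = mr->dma_translation;
        return 0;
    }
    return -1;
}

/*
 * Decide whether the vIOMMU should block migration:
 *  - no vIOMMU at all: nothing to block on;
 *  - vIOMMU that cannot report the attribute: stay conservative and block;
 *  - vIOMMU reporting DMA translation off (IR-only): do not block.
 */
static bool toy_viommu_blocks_migration(const ToyIommuMR *mr)
{
    bool dma_translation = true; /* conservative default */

    if (!mr) {
        return false;
    }
    if (!mr->get_attr ||
        mr->get_attr(mr, TOY_ATTR_DMA_TRANSLATION, &dma_translation) < 0) {
        return true;
    }
    return dma_translation;
}
```

The conservative default matters here: a model that does not implement the
attribute is treated as if DMA translation were on.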

Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
IOVA address space, leveraging the logic we use to compose the dirty ranges.
The blocker is once again relaxed for vIOMMUs that advertise their IOVA
addressing limits. This tackles item (2). So far I mainly use it with
intel-iommu, although I have a small set of patches for virtio-iommu per
Alex's suggestion in v2.

Comments, suggestions welcome. Thanks for the review!

Regards,
	Joao

Changes since v2[3]:
* New patches 1-9 to be able to handle vIOMMUs without DMA translation, and
introduce ways to know various IOMMU model attributes via the IOMMU MR. This
is partly meant to address a comment in previous versions where we can't
access the IOMMU MR prior to the DMA mapping happening. Before this series
vfio giommu_list only tracks 'mapped GIOVA', which is controlled by the
guest. It also better tackles the IOMMU usage for interrupt-remapping-only
purposes.
* Dropped Peter Xu's ack on patch 9 given that the code changed a bit.
* Adjusted patch 14 to account for the VFIO bitmaps no longer being pointers.
* The patches that existed in v2 of vIOMMU dirty tracking are mostly
untouched, except patch 12 which was greatly simplified.

Changes since v1[4]:
- Rebased on latest master branch. As part of it, made some changes in
  pre-copy to adjust it to Juan's new patches:
  1. Added a new patch that passes threshold_size parameter to
     .state_pending_{estimate,exact}() handlers.
  2. Added a new patch that refactors vfio_save_block().
  3. Changed the pre-copy patch to cache and report pending pre-copy
     size in the .state_pending_estimate() handler.
- Removed unnecessary P2P code. This should be added later on when P2P
  support is added. (Alex)
- Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
  (patch #11). (Alex)
- Stored vfio_devices_all_device_dirty_tracking()'s value in a local
  variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
- Refactored the viommu device dirty tracking ranges creation code to
  make it clearer (patch #15).
- Changed overflow check in vfio_iommu_range_is_device_tracked() to
  emphasize that we specifically check for 2^64 wrap around (patch #15).
- Added R-bs / Acks.
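
The 2^64 wrap-around check mentioned above can be sketched as follows. This
is a generic illustration of the technique, not the body of
vfio_iommu_range_is_device_tracked(); toy_range_is_tracked() and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [iova, iova + size - 1] lies fully inside the inclusive
 * tracked range [tracked_min, tracked_max] and does not wrap past 2^64.
 */
static bool toy_range_is_tracked(uint64_t iova, uint64_t size,
                                 uint64_t tracked_min, uint64_t tracked_max)
{
    uint64_t end;

    assert(tracked_min <= tracked_max);
    if (size == 0) {
        return false;
    }
    /* Unsigned arithmetic wraps modulo 2^64 instead of overflowing. */
    end = iova + size - 1;
    if (end < iova) {
        return false; /* the range wrapped around 2^64 */
    }
    return iova >= tracked_min && end <= tracked_max;
}
```

Computing the inclusive end (iova + size - 1) rather than the exclusive one
keeps a range that ends exactly at 2^64 - 1 valid while still catching a
real wrap.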

[0] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
[1] https://lore.kernel.org/qemu-devel/c66d2d8e-f042-964a-a797-a3d07c260a3b@oracle.com/
[2] https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
[3] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
[4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avihaih@nvidia.com/

Avihai Horon (4):
  memory/iommu: Add IOMMU_ATTR_MAX_IOVA attribute
  intel-iommu: Implement IOMMU_ATTR_MAX_IOVA get_attr() attribute
  vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
  vfio/common: Optimize device dirty page tracking with vIOMMU

Joao Martins (11):
  hw/pci: Refactor pci_device_iommu_address_space()
  hw/pci: Add a pci_setup_iommu_info() helper
  hw/pci: Add a pci_device_iommu_memory_region() helper
  intel-iommu: Switch to pci_setup_iommu_info()
  vfio/common: Track the IOMMU MR behind the device in addition to the AS
  memory/iommu: Add IOMMU_ATTR_DMA_TRANSLATION attribute
  intel-iommu: Implement get_attr() method
  vfio/common: Relax vIOMMU detection when DMA translation is off
  vfio/common: Move dirty tracking ranges update to helper
  vfio/common: Support device dirty page tracking with vIOMMU
  vfio/common: Block migration with vIOMMUs without address width limits

 hw/i386/intel_iommu.c         |  40 ++++-
 hw/pci/pci.c                  |  21 ++-
 hw/vfio/common.c              | 265 +++++++++++++++++++++++++++-------
 hw/vfio/pci.c                 |   4 +
 include/exec/memory.h         |   4 +-
 include/hw/pci/pci.h          |  38 ++++-
 include/hw/pci/pci_bus.h      |   1 +
 include/hw/vfio/vfio-common.h |   1 +
 8 files changed, 309 insertions(+), 65 deletions(-)

-- 
2.39.3