This work has been presented at Xen Summit 2024 during the
IOMMU paravirtualization and Xen IOMMU subsystem rework
design session.
Operating systems may want access to an IOMMU in order to do DMA
protection or implement certain features (e.g. VFIO on Linux).
VFIO support is mandatory for frameworks such as SPDK, which can be useful to
implement an alternative storage backend for virtual machines [1].
In this patch series, we introduce in Xen the ability to manage several
"IOMMU contexts" per domain and provide a new hypercall interface to allow
guests to manage IOMMU contexts.
The VT-d and AMD-Vi drivers are updated to support these new features.
Work remains to be done for ARM/PPC/RISC-V.
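To make the idea of several "IOMMU contexts" per domain more concrete, here is
a minimal toy model in C. It is only a sketch: every name in it (toy_domain,
toy_ctx_alloc, toy_reattach, toy_map, toy_translate) and every size constant is
hypothetical and is not the interface defined in xen/include/public/pv-iommu.h.
It assumes that each device is attached to exactly one context of its domain,
that context 0 is the default, and that mappings are private to a context. It
does not model how the default context relates to the p2m.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_CTX 4   /* hypothetical context count chosen for the domain */
#define TOY_MAX_DEV 8   /* hypothetical number of devices */
#define TOY_MAX_MAP 16  /* hypothetical number of mappings per context */

struct toy_mapping {
    bool used;
    uint64_t dfn;   /* device frame number (address seen by the device) */
    uint64_t mfn;   /* machine frame number it translates to */
};

struct toy_context {
    bool allocated;
    struct toy_mapping maps[TOY_MAX_MAP];
};

struct toy_domain {
    struct toy_context ctx[TOY_MAX_CTX];  /* ctx[0] is the default context */
    unsigned int dev_ctx[TOY_MAX_DEV];    /* context each device is attached to */
};

/* Start with only the default context; every device sits in context 0. */
void toy_domain_init(struct toy_domain *d)
{
    assert(d != NULL);
    memset(d, 0, sizeof(*d));
    d->ctx[0].allocated = true;
}

/* Allocate a non-default context; returns its number, or -1 if none is free. */
int toy_ctx_alloc(struct toy_domain *d)
{
    for (int i = 1; i < TOY_MAX_CTX; i++) {
        if (!d->ctx[i].allocated) {
            d->ctx[i].allocated = true;
            return i;
        }
    }
    return -1;
}

/* Move a device to another allocated context of the same domain. */
int toy_reattach(struct toy_domain *d, unsigned int dev, unsigned int ctx_no)
{
    if (dev >= TOY_MAX_DEV || ctx_no >= TOY_MAX_CTX ||
        !d->ctx[ctx_no].allocated)
        return -1;
    d->dev_ctx[dev] = ctx_no;
    return 0;
}

/* Add a dfn -> mfn mapping to one context only. */
int toy_map(struct toy_domain *d, unsigned int ctx_no,
            uint64_t dfn, uint64_t mfn)
{
    struct toy_mapping *slot = NULL;

    if (ctx_no >= TOY_MAX_CTX || !d->ctx[ctx_no].allocated)
        return -1;
    for (int i = 0; i < TOY_MAX_MAP; i++) {
        struct toy_mapping *m = &d->ctx[ctx_no].maps[i];
        if (m->used && m->dfn == dfn)
            return -1;              /* dfn is already mapped in this context */
        if (!m->used && slot == NULL)
            slot = m;               /* remember the first free slot */
    }
    if (slot == NULL)
        return -1;                  /* context is full */
    slot->used = true;
    slot->dfn = dfn;
    slot->mfn = mfn;
    return 0;
}

/* Translate a DMA access using the context the device is attached to. */
bool toy_translate(const struct toy_domain *d, unsigned int dev,
                   uint64_t dfn, uint64_t *mfn)
{
    if (dev >= TOY_MAX_DEV)
        return false;
    const struct toy_context *c = &d->ctx[d->dev_ctx[dev]];
    for (int i = 0; i < TOY_MAX_MAP; i++) {
        if (c->maps[i].used && c->maps[i].dfn == dfn) {
            *mfn = c->maps[i].mfn;
            return true;
        }
    }
    return false;                   /* no mapping: the access would fault */
}
```

In this model a mapping created in one context is invisible to a device until
that device is reattached to the same context, which is the isolation property
a guest driver (e.g. VFIO) would rely on.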
Assuming appropriate Dom0 drivers, besides making VFIO usable in Dom0, this
also changes the way Linux performs DMA with devices: it relies on the "IOMMU"
(thus "PV-IOMMU") instead of assuming all memory is device-visible and
eventually falling back on the swiotlb (this behavior can be disabled with
Linux's iommu=pt).
In this case, the address space of a device is no longer tied to the p2m, so
modifications of the p2m (e.g. grant, foreign) no longer require an IOTLB flush
(usually on unmap, when using Dom0 PVH or dom0-iommu=strict).
This makes virtualized I/O considerably faster with PVH Dom0, at least on Intel
platforms.
On an Intel i5-4670 platform:
PVH Dom0 with current Xen behavior or with iommu=pt:
iperf VM -> Dom0: ~600 Mbps
PVH Dom0 with Dom0 IOMMU driver and iommu=nopt (usually default):
iperf VM -> Dom0: ~7 Gbps (~11x performance increase)
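The reason decoupling the device address space from the p2m helps can be
illustrated with a small counting model. This is a sketch under stated
assumptions, not a measurement and not Xen code: toy_stats, toy_p2m_unmap,
toy_ctx_unmap and toy_run are hypothetical names, and the model simply assumes
one IOTLB flush per unmap whenever the device follows the p2m.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_stats {
    unsigned int p2m_updates;    /* p2m entries removed */
    unsigned int iotlb_flushes;  /* IOTLB flushes that had to be issued */
};

/* A p2m entry is removed (for example a grant or foreign mapping). */
void toy_p2m_unmap(struct toy_stats *s, bool dev_follows_p2m)
{
    assert(s != NULL);
    s->p2m_updates++;
    if (dev_follows_p2m)
        s->iotlb_flushes++;      /* device may still cache the translation */
}

/* The guest removes a DMA mapping from one of its own IOMMU contexts. */
void toy_ctx_unmap(struct toy_stats *s)
{
    assert(s != NULL);
    s->iotlb_flushes++;          /* only real DMA mappings cost a flush */
}

/* Simulate n grant/foreign unmaps and return the number of flushes issued. */
unsigned int toy_run(unsigned int n, bool dev_follows_p2m)
{
    struct toy_stats s = { 0, 0 };

    for (unsigned int i = 0; i < n; i++)
        toy_p2m_unmap(&s, dev_follows_p2m);
    return s.iotlb_flushes;
}
```

With the device tied to the p2m, every unmap in the loop costs a flush; with a
separate context, the same p2m traffic costs none and flushes are only paid for
mappings the guest actually set up for DMA.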
Dom0 driver branch (until I make a new patch):
https://gitlab.com/xen-project/people/tsnake41/linux/-/tree/xen-pviommu-6.18
[1] Using SPDK with the Xen hypervisor - FOSDEM 2023
---
This is an RFC; things are still experimental at this stage.
Cc: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Changed in v2 :
* fixed Xen crash when dumping IOMMU contexts (using X debug key)
with DomUs without IOMMU
* s/dettach/detach/
* removed some unused includes
* fix dangling devices in contexts with detach
Changed in v3 :
* lock entirely map/unmap in hypercall
* prevent IOMMU operations on dying contexts (fix race condition)
* iommu_check_context+iommu_get_context -> iommu_get_context and check for NULL
Changed in v4 :
* Part of initialization logic is moved to domain or toolstack (IOMMU_init)
+ domain/toolstack now decides on "context count" and "pagetable pool size"
+ for now, all domains are able to initialize PV-IOMMU
* introduce "dom0-iommu=no-dma" to make default context block all DMA
(disables HAP and sync-pt), enforcing usage of PV-IOMMU for DMA
Can be used to expose properly "Pre-boot DMA protection"
* redesigned locking logic for contexts
+ contexts are accessed using iommu_get_context and released with iommu_put_context
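The get/put access pattern, combined with the v3 rule that operations on dying
contexts are refused and that callers check for NULL, can be sketched as
follows. This is a simplified single-threaded model: toy_refctx and the toy_*
functions are hypothetical stand-ins, the real iommu_get_context and
iommu_put_context signatures are not reproduced here, and whether the real code
uses a reference count, a lock, or both is not something this sketch claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CTX 4    /* hypothetical number of context slots */

struct toy_refctx {
    bool valid;          /* context exists */
    bool dying;          /* teardown requested: refuse new users */
    unsigned int users;  /* callers between get and put */
};

static struct toy_refctx toy_table[TOY_NR_CTX];

void toy_create_context(unsigned int ctx_no)
{
    assert(ctx_no < TOY_NR_CTX);
    toy_table[ctx_no] = (struct toy_refctx){ .valid = true };
}

/* Returns the context, or NULL if it does not exist or is dying. */
struct toy_refctx *toy_get_context(unsigned int ctx_no)
{
    struct toy_refctx *ctx;

    if (ctx_no >= TOY_NR_CTX)
        return NULL;
    ctx = &toy_table[ctx_no];
    if (!ctx->valid || ctx->dying)
        return NULL;
    ctx->users++;
    return ctx;
}

/* Releases a context obtained with toy_get_context(). */
void toy_put_context(struct toy_refctx *ctx)
{
    assert(ctx != NULL && ctx->users > 0);
    ctx->users--;
    if (ctx->dying && ctx->users == 0)
        ctx->valid = false;      /* last user gone: finish the teardown */
}

/* Request teardown; it completes once no user holds the context. */
void toy_kill_context(unsigned int ctx_no)
{
    assert(ctx_no < TOY_NR_CTX);
    toy_table[ctx_no].dying = true;
    if (toy_table[ctx_no].users == 0)
        toy_table[ctx_no].valid = false;
}

/* Shape of an operation: get, check for NULL, work, put. */
int toy_map_op(unsigned int ctx_no)
{
    struct toy_refctx *ctx = toy_get_context(ctx_no);

    if (ctx == NULL)
        return -1;               /* missing or dying context */
    /* ... the mapping work would happen here ... */
    toy_put_context(ctx);
    return 0;
}
```

Folding the validity check into the getter means a caller cannot forget it,
which is the point of replacing the separate check and get calls with a single
call that may return NULL.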
Changed in v5 :
* various PCI Passthrough related fixes
+ rewrote parts of PCI Passthrough logic
+ various other related bug fixes
* simplified VT-d DID (for hardware) management by only having one map instead of two
(pseudo_domid map was previously used for old quarantine code then recycled for PV-IOMMU
in addition to another map also tracking Domain<->VT-d DID, now there is only one
map tracking both making things simpler)
* reworked parts of Xen quarantine logic (needed for PCI Passthrough)
* added cf_check annotations
* some changes to PV-IOMMU headers (Alejandro)
Changed in v6 :
* reorganized the patch series to allow bisecting
* it is split into various smaller patches
* initial AMD-Vi port (it doesn't completely work with PV-IOMMU though, but builds at
least)
* AMD-Vi lacks support for iommu_lookup_page (needed for several PV-IOMMU ops)
Changed in v7 :
* Proper AMD-Vi support for PV-IOMMU, mostly works with some quirks (e.g. 'AMD IOMMU' devices
that are visible to Dom0, but don't exist from the PV-IOMMU point of view)
* split some patches into smaller ones
* fixed numerous issues
* a notable one being an ASSERT in PV-IOMMU map operation due to problematic foreign page
reference counting
* fixed typo in design document
* introduce a transient "invalid context" ID for devices that aren't handled yet
* add proper "no-dma" documentation
TODO:
* Proper cleanup of AMD-Vi mappings (for ctx_no != 0)
* consider per-iommu domid limit (allocate did on first attach/reattach ?)
* ARM implementation
* properly define nested mode and PASID support
* define how PV-IOMMU should behave in DomUs (e.g. they don't see machine bdf)
* especially regarding how to expose "no-dma" mode
* better quarantine code (e.g. isolate devices with different reserved regions
using separate 'contexts')
* there are corner cases with PV-IOMMU and to-domain Xen PCI Passthrough
(e.g. pci-assignable-remove will reassign to context 0, while the driver
expects the device to be in context X)
Teddy Astie (14):
docs/designs: Add a design document for IOMMU subsystem redesign
docs/designs: Add a design document for PV-IOMMU
x86/domain: Defer domain iommu initialization.
iommu: Move IOMMU domain related structures to (arch_)iommu_context
iommu: Simplify quarantine logic
vtd: Remove MAP_ERROR_RECOVERY code path in domain_context_mapping_one
iommu: Simplify hardware did management
iommu: Introduce redesigned IOMMU subsystem
iommu: Provide 'X' debug key to dump IOMMU context infos
amd/iommu: Introduce lookup implementation
iommu: Introduce iommu_get_max_iova
x86/iommu: Introduce IOMMU arena
iommu: Introduce PV-IOMMU
iommu: Introduce no-dma feature
docs/designs/iommu-contexts.md | 403 +++++
docs/designs/pv-iommu.md | 118 ++
docs/misc/xen-command-line.pandoc | 16 +-
xen/arch/arm/include/asm/iommu.h | 4 +
xen/arch/ppc/include/asm/iommu.h | 3 +
xen/arch/x86/domain.c | 10 +-
xen/arch/x86/include/asm/arena.h | 54 +
xen/arch/x86/include/asm/iommu.h | 59 +-
xen/arch/x86/include/asm/pci.h | 17 -
xen/arch/x86/mm/p2m-ept.c | 2 +-
xen/arch/x86/pv/dom0_build.c | 6 +-
xen/arch/x86/tboot.c | 3 +-
xen/arch/x86/x86_64/mm.c | 3 +-
xen/common/Makefile | 1 +
xen/common/memory.c | 4 +-
xen/common/pv-iommu.c | 554 +++++++
xen/drivers/passthrough/amd/iommu.h | 23 +-
xen/drivers/passthrough/amd/iommu_cmd.c | 20 +-
xen/drivers/passthrough/amd/iommu_init.c | 57 +-
xen/drivers/passthrough/amd/iommu_map.c | 307 ++--
xen/drivers/passthrough/amd/pci_amd_iommu.c | 411 +++--
xen/drivers/passthrough/iommu.c | 751 ++++++++-
xen/drivers/passthrough/pci.c | 394 ++---
xen/drivers/passthrough/vtd/extern.h | 19 +-
xen/drivers/passthrough/vtd/iommu.c | 1623 ++++++-------------
xen/drivers/passthrough/vtd/iommu.h | 2 -
xen/drivers/passthrough/vtd/qinval.c | 2 +-
xen/drivers/passthrough/vtd/quirks.c | 21 +-
xen/drivers/passthrough/vtd/vtd.h | 3 +-
xen/drivers/passthrough/x86/Makefile | 1 +
xen/drivers/passthrough/x86/arena.c | 157 ++
xen/drivers/passthrough/x86/iommu.c | 294 +++-
xen/include/hypercall-defs.c | 6 +
xen/include/public/pv-iommu.h | 343 ++++
xen/include/public/xen.h | 1 +
xen/include/xen/iommu.h | 122 +-
xen/include/xen/pci.h | 3 +
37 files changed, 3777 insertions(+), 2040 deletions(-)
create mode 100644 docs/designs/iommu-contexts.md
create mode 100644 docs/designs/pv-iommu.md
create mode 100644 xen/arch/x86/include/asm/arena.h
create mode 100644 xen/common/pv-iommu.c
create mode 100644 xen/drivers/passthrough/x86/arena.c
create mode 100644 xen/include/public/pv-iommu.h
--
2.51.2
--
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech