Some operating systems want to use the IOMMU to implement various features
(e.g. VFIO) or DMA protection.
This patch introduces a proposal for IOMMU paravirtualization for Dom0.
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
---
docs/designs/pv-iommu.md | 116 +++++++++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
create mode 100644 docs/designs/pv-iommu.md
diff --git a/docs/designs/pv-iommu.md b/docs/designs/pv-iommu.md
new file mode 100644
index 0000000000..7df9fa0b94
--- /dev/null
+++ b/docs/designs/pv-iommu.md
@@ -0,0 +1,116 @@
+# IOMMU paravirtualization for Dom0
+
+Status: Experimental
+
+# Background
+
+By default, Xen only uses the IOMMU for itself, either to make device address
+space coherent with guest address space (x86 HVM/PVH) or to prevent devices
+from doing DMA outside their expected memory regions, including the hypervisor
+(x86 PV).
+
+A limitation is that guests (especially privileged ones) may want to use
+IOMMU hardware to implement features such as DMA protection and VFIO [1],
+but IOMMU functionality is currently not available outside of the
+hypervisor.
+
+[1] VFIO - "Virtual Function I/O" - https://www.kernel.org/doc/html/latest/driver-api/vfio.html
+
+# Design
+
+The operating system may want to have access to various IOMMU features such as
+context management and DMA remapping. We can create a new hypercall that allows
+the guest to have access to a new paravirtualized IOMMU interface.
+
+This feature is only meant to be available to Dom0. DomUs have some emulated
+devices that can't be managed on the Xen side and are not hardware, so we
+can't rely on the hardware IOMMU to enforce DMA remapping for them.
+
+This interface is exposed under the `iommu_op` hypercall.
+
+In addition, Xen domains are modified to allow the existence of several
+IOMMU contexts, including a default one that implements default behavior (e.g.
+hardware-assisted paging) and can't be modified by the guest. DomUs cannot have
+contexts, and therefore act as if they only have the default context.
+
+Each IOMMU context within a Xen domain is identified using a domain-specific
+context number that is used in the Xen IOMMU subsystem and the hypercall
+interface.
+
+The number of IOMMU contexts a domain can use is set by either the toolstack
+or the domain itself when PV-IOMMU is initialized.
+
+# IOMMU operations
+
+## Initialize PV-IOMMU
+
+Initialize PV-IOMMU for the domain.
+It can only be called once.
+
+## Alloc context
+
+Create a new IOMMU context for the guest and return the context number to the
+guest.
+Fail if the IOMMU context limit of the guest is reached.
+
+A flag can be specified to create an identity mapping.
+
+## Free context
+
+Destroy an IOMMU context created previously.
+It is not possible to free the default context.
+
+Reattach the context's devices to the default context if requested by the guest.
+
+Fail if there is a device in the context and the reattach-to-default flag is
+not specified.
+
+## Reattach device
+
+Reattach a device to another IOMMU context (including the default one).
+The target IOMMU context number must be valid and the context allocated.
+
+The guest needs to specify the PCI SBDF of a device it has access to.
+
+## Map/unmap page
+
+Map/unmap a page on a context.
+The guest needs to specify a gfn and target dfn to map.
+
+Refuse to create the mapping if one already exists for the same dfn.
+
+## Lookup page
+
+Get the gfn mapped by a specific dfn.
+
+## Remote command
+
+Perform a PV-IOMMU operation on behalf of another domain.
+Especially useful for implementing IOMMU emulation (e.g. using QEMU)
+or initializing PV-IOMMU with enforced limits.
+
+# Implementation considerations
+
+## Hypercall batching
+
+In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
+be able to batch some critical IOMMU operations (e.g. map/unmap multiple pages).
+
+## Hardware without IOMMU support
+
+The operating system needs to be aware of the PV-IOMMU capability, and whether
+it is able to create contexts. However, some operating systems may critically
+fail if they are unable to create a new IOMMU context, which is expected to
+happen if no IOMMU hardware is available.
+
+The hypercall interface needs a way to advertise the ability to create and
+manage IOMMU contexts, including the number of contexts the guest is able to
+use. Using this information, Dom0 may decide whether or not to use the
+PV-IOMMU interface.
+
+## Page pool for contexts
+
+To prevent a buggy Dom0 from unexpectedly starving the hypervisor of memory,
+we can preallocate the pages the contexts will use and make map/unmap use
+these pages instead of allocating them dynamically.
+
--
2.47.2
Teddy Astie | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
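
For illustration only, and not part of the patch: the context and mapping rules described in the design document above (context limit, unmodifiable default context, refusing a second mapping for the same dfn, refusing to free a context that still holds devices unless reattach-to-default is requested) can be modelled with a small C sketch. Everything here is an assumption made for the example: the `pv_*` names, the limits, the flag value and the in-memory table standing in for IOMMU page tables are hypothetical and do not reflect the actual `iommu_op` ABI.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes are hypothetical; this is not the Xen ABI. */
#define PV_MAX_CTX       4   /* assumed per-domain context limit */
#define PV_MAX_MAPS      8   /* assumed capacity of the toy mapping table */
#define PV_FREE_REATTACH 1u  /* assumed "reattach-to-default" flag */

struct pv_mapping { bool used; uint64_t dfn, gfn; };

struct pv_context {
    bool allocated;
    unsigned int nr_devices;             /* devices attached to this context */
    struct pv_mapping maps[PV_MAX_MAPS];
};

/* ctx[0] models the default context, which the guest cannot modify. */
struct pv_domain { struct pv_context ctx[PV_MAX_CTX]; };

/* Initialize the model: only the default context exists. */
static void pv_init(struct pv_domain *d)
{
    memset(d, 0, sizeof(*d));
    d->ctx[0].allocated = true;
}

/* Return a guest-modifiable context, or NULL (invalid, default, unallocated). */
static struct pv_context *pv_get(struct pv_domain *d, int no)
{
    if (no <= 0 || no >= PV_MAX_CTX || !d->ctx[no].allocated)
        return NULL;
    return &d->ctx[no];
}

/* Find the mapping for dfn in a context, or NULL (also if c is NULL). */
static struct pv_mapping *pv_find(struct pv_context *c, uint64_t dfn)
{
    for (int i = 0; c && i < PV_MAX_MAPS; i++)
        if (c->maps[i].used && c->maps[i].dfn == dfn)
            return &c->maps[i];
    return NULL;
}

/* Alloc context: return the new context number, or -1 at the limit. */
static int pv_alloc_context(struct pv_domain *d)
{
    for (int no = 1; no < PV_MAX_CTX; no++) {
        if (!d->ctx[no].allocated) {
            d->ctx[no].allocated = true;
            return no;
        }
    }
    return -1;
}

/* Free context: refuse if devices remain and reattach was not requested. */
static int pv_free_context(struct pv_domain *d, int no, unsigned int flags)
{
    struct pv_context *c = pv_get(d, no);
    if (!c || (c->nr_devices && !(flags & PV_FREE_REATTACH)))
        return -1;
    d->ctx[0].nr_devices += c->nr_devices; /* move devices to the default */
    memset(c, 0, sizeof(*c));
    return 0;
}

/* Map: refuse if the dfn is already mapped or the table is full. */
static int pv_map(struct pv_domain *d, int no, uint64_t gfn, uint64_t dfn)
{
    struct pv_context *c = pv_get(d, no);
    if (!c || pv_find(c, dfn))
        return -1;
    for (int i = 0; i < PV_MAX_MAPS; i++) {
        if (!c->maps[i].used) {
            c->maps[i] = (struct pv_mapping){ .used = true, .dfn = dfn, .gfn = gfn };
            return 0;
        }
    }
    return -1;
}

/* Lookup: store the gfn mapped at dfn in *gfn, or return -1 if unmapped. */
static int pv_lookup(struct pv_domain *d, int no, uint64_t dfn, uint64_t *gfn)
{
    struct pv_mapping *m = pv_find(pv_get(d, no), dfn);
    if (m)
        *gfn = m->gfn;
    return m ? 0 : -1;
}

/* Unmap: drop the mapping for dfn, or return -1 if there is none. */
static int pv_unmap(struct pv_domain *d, int no, uint64_t dfn)
{
    struct pv_mapping *m = pv_find(pv_get(d, no), dfn);
    if (m)
        m->used = false;
    return m ? 0 : -1;
}
```

In this model a guest would call `pv_alloc_context()`, then `pv_map()` for each page, and `pv_free_context()` with `PV_FREE_REATTACH` once done. A real implementation would additionally validate the gfn, flush the IOMMU and take the relevant locks, none of which is modelled here.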

On 17/02/2025 10:18, Teddy Astie wrote:
> +By default, Xen only uses the IOMMU for itself, either to make device adress
> +space coherent with guest adress space (x86 HVM/PVH) or to prevent devices

typo: adress -> address

> +from doing DMA outside it's expected memory regions including the hypervisor
> +(x86 PV).
> +
> +A limitation is that guests (especially privildged ones) may want to use

typo: privildged -> privileged

[...]

> +The number of IOMMU context a domain is specified by either the toolstack or
> +the domain itself.

I don't understand what you want express with the above sentence.
Maybe it's just me.

> +# IOMMU operations
> +
> +## Initialize PV-IOMMU
> +
> +Initialize PV-IOMMU for the domain.
> +It can only be called once.

Could this operation be done automatically on first context allocation ?

[...]

> +## Hypercall batching
> +
> +In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
> +be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).

I suppose that batching also implies preemption.

[...]

Regards,
  Frediano

Hello Frediano,

OK for the typo fixes.

On 19/02/2025 at 13:02, Frediano Ziglio wrote:
> On 17/02/2025 10:18, Teddy Astie wrote:
>> +The number of IOMMU context a domain is specified by either the toolstack or
>> +the domain itself.
>
> I don't understand what you want express with the above sentence.
> Maybe it's just me.
>
>> +# IOMMU operations
>> +
>> +## Initialize PV-IOMMU
>> +
>> +Initialize PV-IOMMU for the domain.
>> +It can only be called once.
>
> Could this operation be done automatically on first context allocation ?

To initialize PV-IOMMU, you need to pass some additional parameters
(memory/context limits). To prevent a guest from initializing with
arbitrary limits, initialization can also be done by the toolstack (e.g. the
domain builder) to enforce specific limits, as it can only be done once.

[...]

>> +## Hypercall batching
>> +
>> +In order to prevent unneeded hypercalls and IOMMU flushing, it is advisable to
>> +be able to batch some critical IOMMU operations (e.g map/unmap multiple pages).
>
> I suppose that batching also implies preemption.

Yes, the current implementation does it, but I haven't updated the doc on
that aspect.

[...]

Thanks
Teddy