> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Thursday, May 15, 2025 9:37 PM
> To: devel@lists.libvirt.org
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> nicolinc@nvidia.com; Nathan Chen <nathanc@nvidia.com>
> Subject: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple
> vSMMUs
>
> Hi,
>
> This is a follow-up to the first RFC patchset [0] for supporting multiple
> vSMMU instances in a qemu VM. This patchset also introduces support for
> using iommufd to propagate DMA mappings to the kernel for assigned devices.
>
> This patchset implements support for specifying multiple <iommu> devices
> within the VM definition when the smmuv3Dev IOMMU model is specified, and is
> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices
> [1].
Based on feedback received on the above RFC and the discussion here[0],
there are certain changes to the name of the vSMMU device and the way
we associate the PCIe bus.

Going forward, it is more likely to be something like below:

-device arm-smmuv3,primary-bus=pcie.0,accel=on
-device vfio-pci,host=xxx,bus=pcie.0
-device pxb-pcie,id=pcie.1,bus_nr=2
-device arm-smmuv3,primary-bus=pcie.1,accel=on
...
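
In libvirt's JSON device syntax (used in the command lines later in this
cover letter), the first of those lines would presumably map to something
like the sketch below; the property names are not final:

-device '{"driver":"arm-smmuv3","primary-bus":"pcie.0","accel":true}' \
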
Hopefully, this doesn't warrant any major changes to this libvirt
series, but please do make a note of it.
Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/aB25ZRu7pCJNpamt@redhat.com/
> Moreover, it adds a new 'iommufd' member to virDomainIOMMUDef, in order
> to represent the iommufd object on the qemu command line. This patchset
> also implements new 'iommufdId' and 'iommufdFd' attributes for hostdev
> devices to be associated with the iommufd object.
>
> For instance, here is a VM definition specifying the iommufd object and
> the associated hostdevs, with multiple IOMMUs routed to
> pcie-expander-bus controllers so that the VFIO-device-to-SMMUv3
> associations match the host (pcie-expander-bus and pcie-root-port
> controllers are no longer auto-added/auto-routed as in the first
> revision of this RFC, since the PCIe topology will be configured by
> management apps):
>
> <devices>
>   ...
>   <controller type='pci' index='1' model='pcie-expander-bus'>
>     <model name='pxb-pcie'/>
>     <target busNr='252'/>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>   </controller>
>   <controller type='pci' index='2' model='pcie-expander-bus'>
>     <model name='pxb-pcie'/>
>     <target busNr='248'/>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>   </controller>
>   ...
>   <controller type='pci' index='21' model='pcie-root-port'>
>     <model name='pcie-root-port'/>
>     <target chassis='21' port='0x0'/>
>     <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>   </controller>
>   <controller type='pci' index='22' model='pcie-root-port'>
>     <model name='pcie-root-port'/>
>     <target chassis='22' port='0xa8'/>
>     <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>   </controller>
>   ...
>   <hostdev mode='subsystem' type='pci' managed='no'>
>     <source>
>       <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
>     </source>
>     <iommufdId>iommufd0</iommufdId>
>     <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
>   </hostdev>
>   <hostdev mode='subsystem' type='pci' managed='no'>
>     <source>
>       <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
>     </source>
>     <iommufdId>iommufd0</iommufdId>
>     <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
>   </hostdev>
>   <iommu model='smmuv3Dev'>
>     <iommufd>
>       <id>iommufd0</id>
>     </iommufd>
>     <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
>   </iommu>
>   <iommu model='smmuv3Dev'>
>     <iommufd>
>       <id>iommufd0</id>
>     </iommufd>
>     <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
>   </iommu>
> </devices>
>
> This would get translated to a qemu command line with the arguments below:
>
> -device '{"driver":"pxb-
> pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
> -device '{"driver":"pxb-
> pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
> -device '{"driver":"pcie-root-
> port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
> -device '{"driver":"pcie-root-
> port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
> -object '{"qom-type":"iommufd","id":"iommufd0"}' \
> -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
> -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
> -device '{"driver":"vfio-
> pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci
> .21","addr":"0x0"}' \
> -device '{"driver":"vfio-
> pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci
> .22","addr":"0x0"}' \
>
> If users would like to leverage qemu's iommufd feature to open the VFIO
> cdev and /dev/iommu via an external management layer, the fds can be
> specified like so in the VM definition:
>
> <devices>
>   <hostdev mode='subsystem' type='pci' managed='yes'>
>     <driver name='vfio'/>
>     <source>
>       <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
>     </source>
>     <iommufdId>iommufd0</iommufdId>
>     <iommufdFd>23</iommufdFd>
>     <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>   </hostdev>
>   <iommu model='intel'>
>     <iommufd>
>       <id>iommufd0</id>
>       <fd>22</fd>
>     </iommufd>
>   </iommu>
> </devices>
>
> This would get translated to a qemu command line with the arguments below:
>
> -object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
> -device '{"driver":"vfio-
> pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23",
> "bus":"pci.0","addr":"0x3"}' \
>
> Summary of changes:
> - Introduced support for specifying multiple <iommu> stanzas in the VM
> XML definition when using smmuv3Dev model.
> - Excluded the automatic PCIe topology creation that populated the VM
> definition with multiple vSMMUs routed to pcie-expander-bus controllers,
> in favor of deferring creation of PXBs and routing of VFIO devices to
> management apps.
> - Introduced iommufd support.
>
> TODO:
> - I updated the namespace and cgroup configuration to allow access to the
> iommufd paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu
> needs to be launched with user and group set to 'root' in order for these
> paths to be accessible. A passthrough device represented by /dev/vfio/18
> normally has 'root' user and group permissions, but in the mount namespace
> they are changed to 'libvirt-qemu' and 'kvm'. I wasn't able to discern
> where this happens by looking at src/qemu/qemu_namespace.c and
> src/qemu/qemu_cgroup.c. Would you have any pointers on how to change the
> iommufd paths' user and group permissions in the libvirt mount namespace?
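>
> For reference, the ownership qemu actually sees can be inspected from
> inside the domain's mount namespace (a sketch; the PID lookup is
> illustrative):
>
> QEMU_PID=$(pgrep -f qemu-system | head -n1)
> sudo nsenter -t "$QEMU_PID" -m ls -l /dev/iommu /dev/vfio/devices/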
>
> This series is on Github:
> https://github.com/NathanChenNVIDIA/libvirt/tree/smmuv3Dev-iommufd-04-15-25
>
> Thanks,
> Nathan
>
> [0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/
> [1] https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.thodi@huawei.com/
>
> Signed-off-by: Nathan Chen <nathanc@nvidia.com>
>
> Nathan Chen (5):
> conf: Support multiple smmuv3Dev IOMMU devices
> conf: Add an iommufd member struct to virDomainIOMMUDef
> qemu: Implement support for associating iommufd to hostdev
> qemu: Update Cgroup and namespace for qemu to access iommufd paths
> qemu: Add test case for specifying iommufd
>
> docs/formatdomain.rst                          |   5 +-
> src/conf/domain_addr.c                         |  12 +-
> src/conf/domain_addr.h                         |   4 +-
> src/conf/domain_conf.c                         | 292 ++++++++++++++++--
> src/conf/domain_conf.h                         |  21 +-
> src/conf/domain_validate.c                     |  94 +++++-
> src/conf/schemas/domaincommon.rng              |  37 ++-
> src/conf/virconftypes.h                        |   2 +
> src/libvirt_private.syms                       |   2 +
> src/qemu/qemu_alias.c                          |  15 +-
> src/qemu/qemu_cgroup.c                         |  47 +++
> src/qemu/qemu_cgroup.h                         |   1 +
> src/qemu/qemu_command.c                        | 146 ++++++---
> src/qemu/qemu_domain_address.c                 |  33 +-
> src/qemu/qemu_driver.c                         |   8 +-
> src/qemu/qemu_namespace.c                      |  36 +++
> src/qemu/qemu_postparse.c                      |  11 +-
> src/qemu/qemu_validate.c                       |  22 +-
> ...fio-iommufd-intel-iommu.x86_64-latest.args  |  43 +++
> ...vfio-iommufd-intel-iommu.x86_64-latest.xml  |  80 +++++
> .../hostdev-vfio-iommufd-intel-iommu.xml       |  80 +++++
> tests/qemuxmlconftest.c                        |   1 +
> 22 files changed, 878 insertions(+), 114 deletions(-)
> create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.args
> create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.xml
> create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.xml
>
> --
> 2.43.0
On 5/16/2025 3:19 AM, Shameerali Kolothum Thodi wrote:
> Based on feedback received on the above RFC and the discussion here[0],
> there are certain changes to the name of the vSMMU device and the way
> we associate the PCIe bus.
> [...]
> Hopefully, this doesn't warrant any major changes to this libvirt
> series, but please do make a note of it.

Thanks Shameer, I will make a note of this for the next revision.

Best,
Nathan