[RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs

Nathan Chen via Devel posted 5 patches 7 months ago
git fetch https://github.com/patchew-project/libvirt tags/patchew/20250515203643.21109-1-nathanc@nvidia.com
docs/formatdomain.rst                         |   5 +-
src/conf/domain_addr.c                        |  12 +-
src/conf/domain_addr.h                        |   4 +-
src/conf/domain_conf.c                        | 292 ++++++++++++++++--
src/conf/domain_conf.h                        |  21 +-
src/conf/domain_validate.c                    |  94 +++++-
src/conf/schemas/domaincommon.rng             |  37 ++-
src/conf/virconftypes.h                       |   2 +
src/libvirt_private.syms                      |   2 +
src/qemu/qemu_alias.c                         |  15 +-
src/qemu/qemu_cgroup.c                        |  47 +++
src/qemu/qemu_cgroup.h                        |   1 +
src/qemu/qemu_command.c                       | 146 ++++++---
src/qemu/qemu_domain_address.c                |  33 +-
src/qemu/qemu_driver.c                        |   8 +-
src/qemu/qemu_namespace.c                     |  36 +++
src/qemu/qemu_postparse.c                     |  11 +-
src/qemu/qemu_validate.c                      |  22 +-
...fio-iommufd-intel-iommu.x86_64-latest.args |  43 +++
...vfio-iommufd-intel-iommu.x86_64-latest.xml |  80 +++++
.../hostdev-vfio-iommufd-intel-iommu.xml      |  80 +++++
tests/qemuxmlconftest.c                       |   1 +
22 files changed, 878 insertions(+), 114 deletions(-)
create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.args
create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.xml
create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.xml
[RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Nathan Chen via Devel 7 months ago
Hi,

This is a follow-up to the first RFC patchset [0] for supporting multiple
vSMMU instances in a qemu VM. This patchset also introduces support for
using iommufd to propagate DMA mappings to the kernel for assigned devices.

This patchset implements support for specifying multiple <iommu> devices
within the VM definition when the smmuv3Dev IOMMU model is specified, and
is tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices
[1].

Moreover, it adds a new 'iommufd' member to virDomainIOMMUDef in order to
represent the iommufd object on the qemu command line. This patchset also
implements new 'iommufdId' and 'iommufdFd' elements for hostdev devices so
that they can be associated with the iommufd object.

For instance, the VM definition below specifies the iommufd object and
its associated hostdevs alongside multiple IOMMUs. Each IOMMU is routed to
a pcie-expander-bus controller so that the VFIO device to SMMUv3
associations match those of the host. Unlike in the first revision of this
RFC, pcie-expander-bus and pcie-root-port controllers are no longer
auto-added/auto-routed, as the PCIe topology will be configured by
management apps:

  <devices>
...
    <controller type='pci' index='1' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='252'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pcie-expander-bus'>
      <model name='pxb-pcie'/>
      <target busNr='248'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>
...
    <controller type='pci' index='21' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='21' port='0x0'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='22' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='22' port='0xa8'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
...
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <iommufdId>iommufd0</iommufdId>
      <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <source>
        <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <iommufdId>iommufd0</iommufdId>
      <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
    </hostdev>
    <iommu model='smmuv3Dev'>
      <iommufd>
        <id>iommufd0</id>
      </iommufd>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
    </iommu>
    <iommu model='smmuv3Dev'>
      <iommufd>
        <id>iommufd0</id>
      </iommufd>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </iommu>
  </devices>

This would get translated to a qemu command line with the arguments below:

 -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
 -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
 -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
 -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
 -object '{"qom-type":"iommufd","id":"iommufd0"}' \
 -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
 -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
 -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
 -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
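
The mapping from a <hostdev> stanza to its vfio-pci argument can be
sketched in a few lines of Python. This is only an illustration of the
translation shown above, not the code in src/qemu/qemu_command.c: the
function name hostdev_to_device_arg is hypothetical, the 'hostdevN' alias
is derived from an index passed in by the caller, and the guest PCI
function number is ignored because it is 0 in the example.

```python
import json
import xml.etree.ElementTree as ET

HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <iommufdId>iommufd0</iommufdId>
  <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
</hostdev>
"""


def hostdev_to_device_arg(xml_text, index):
    """Build the JSON body of a vfio-pci -device argument (hypothetical helper)."""
    hostdev = ET.fromstring(xml_text)
    src = hostdev.find("source/address")   # host PCI address
    guest = hostdev.find("address")        # guest PCI address (direct child)

    # Host address in domain:bus:slot.function form, e.g. 0009:01:00.0.
    host = "{:04x}:{:02x}:{:02x}.{:x}".format(
        int(src.get("domain"), 16), int(src.get("bus"), 16),
        int(src.get("slot"), 16), int(src.get("function"), 16))

    props = {"driver": "vfio-pci", "host": host, "id": "hostdev{}".format(index)}

    # Only reference an iommufd object when the hostdev asks for one.
    iommufd_id = hostdev.findtext("iommufdId")
    if iommufd_id is not None:
        props["iommufd"] = iommufd_id

    # The guest bus number selects the controller alias, e.g. 0x15 -> pci.21.
    props["bus"] = "pci.{}".format(int(guest.get("bus"), 16))
    # Assumption: function 0 only, so addr is just the slot number.
    props["addr"] = hex(int(guest.get("slot"), 16))
    return json.dumps(props, separators=(",", ":"))


if __name__ == "__main__":
    print("-device '{}'".format(hostdev_to_device_arg(HOSTDEV_XML, 0)))
```

Run against the first <hostdev> above, the sketch yields the same
properties as the 'hostdev0' argument in the listing.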

If users would like to leverage qemu's iommufd feature to open the VFIO
cdev and /dev/iommu via an external management layer, the fd can be
specified like so in the VM definition:

  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
      </source>
      <iommufdId>iommufd0</iommufdId>
      <iommufdFd>23</iommufdFd>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </hostdev>
    <iommu model='intel'>
      <iommufd>
        <id>iommufd0</id>
        <fd>22</fd>
      </iommufd>
    </iommu>
  </devices>

This would get translated to a qemu command line with the arguments below:

-object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
-device '{"driver":"vfio-pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23","bus":"pci.0","addr":"0x3"}' \
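
For the fd-passing case, the management layer has to both name the
descriptors on the command line and keep them open across the exec of
qemu. A minimal sketch of that bookkeeping follows. The helper name
iommufd_fd_args and its parameters are hypothetical, and the remaining
device properties (host, bus, addr) are passed through unchanged via
'extra'.

```python
import json


def iommufd_fd_args(iommufd_id, iommufd_fd, hostdev_id, cdev_fd, extra=None):
    """Return (argv fragment, fds to inherit) for pre-opened descriptors.

    Hypothetical helper: iommufd_fd is an open descriptor for /dev/iommu and
    cdev_fd one for the VFIO cdev under /dev/vfio/devices/.
    """
    obj = {"qom-type": "iommufd", "id": iommufd_id, "fd": str(iommufd_fd)}
    dev = {"driver": "vfio-pci", "id": hostdev_id,
           "iommufd": iommufd_id, "fd": str(cdev_fd)}
    # Pass any further properties (host, bus, addr, ...) through as given.
    dev.update(extra or {})

    argv = [
        "-object", json.dumps(obj, separators=(",", ":")),
        "-device", json.dumps(dev, separators=(",", ":")),
    ]
    # Both descriptors must stay open in the child for the numbers to be valid.
    return argv, (iommufd_fd, cdev_fd)


if __name__ == "__main__":
    argv, fds = iommufd_fd_args("iommufd0", 22, "hostdev1", 23,
                                extra={"bus": "pci.0", "addr": "0x3"})
    print(" ".join(argv), "inherit:", fds)
```

A caller would typically hand the returned tuple to
subprocess.Popen(..., pass_fds=fds) so that the descriptor numbers named
in the arguments remain valid inside the qemu process.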

Summary of changes:
- Introduced support for specifying multiple <iommu> stanzas in the VM
XML definition when using the smmuv3Dev model.
- Dropped the automatic PCIe topology generation that populated the VM
definition with multiple vSMMUs routed to pcie-expander-bus controllers;
creation of PXBs and routing of VFIO devices is deferred to management
apps.
- Introduced iommufd support.

TODO:
- I updated the namespace and cgroup configuration to allow access to iommufd
paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu needs to be
launched with user and group set to 'root' in order for these paths to be
accessible. A passthrough device represented by /dev/vfio/18 normally has
'root' user and group permissions, but in the mount namespace it's changed to
'libvirt-qemu' and 'kvm'. I wasn't able to discern where this is happening by
looking at src/qemu/qemu_namespace.c and src/qemu/qemu_cgroup.c. Would you have
any pointers on how to change the iommufd paths' user and group permissions in
the libvirt mount namespace?

This series is on Github:
https://github.com/NathanChenNVIDIA/libvirt/tree/smmuv3Dev-iommufd-04-15-25

Thanks,
Nathan

[0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/
[1] https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.thodi@huawei.com/

Signed-off-by: Nathan Chen <nathanc@nvidia.com>

Nathan Chen (5):
  conf: Support multiple smmuv3Dev IOMMU devices
  conf: Add an iommufd member struct to virDomainIOMMUDef
  qemu: Implement support for associating iommufd to hostdev
  qemu: Update Cgroup and namespace for qemu to access iommufd paths
  qemu: Add test case for specifying iommufd

-- 
2.43.0
Re: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Daniel P. Berrangé via Devel 6 months, 3 weeks ago
On Thu, May 15, 2025 at 01:36:38PM -0700, Nathan Chen via Devel wrote:
> Hi,
> 
> This is a follow up to the first RFC patchset [0] for supporting multiple
> vSMMU instances in a qemu VM. This patchset also introduces support for
> using iommufd to propagate DMA mappings to kernel for assigned devices.
> 
> This patchset implements support for specifying multiple <iommu> devices
> within the VM definition when smmuv3Dev IOMMU model is specified, and is
> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [1]
> 
> Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
> in order to represent the iommufd object in qemu command line. This
> patchset also implements new 'iommufdId' and 'iommufdFd' attributes for
> hostdev devices to be associated with the iommufd object.
> 
> For instance, specifying the iommufd object and associated hostdev in a
> VM definition with multiple IOMMUs, configured to be routed to
> pcie-expander-bus controllers in a way where VFIO device to SMMUv3
> associations are matched with the host (pcie-expander-bus and
> pcie-root-port controllers are no longer auto-added/auto-routed
> like in the first revision of this RFC, as the PCIe topology will be
> configured by management apps):
> 
>   <devices>
> ...
>     <controller type='pci' index='1' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='252'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>     </controller>
>     <controller type='pci' index='2' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='248'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>     </controller>
> ...
>     <controller type='pci' index='21' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='21' port='0x0'/>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>     </controller>
>     <controller type='pci' index='22' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='22' port='0xa8'/>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>     </controller>
> ...
>     <hostdev mode='subsystem' type='pci' managed='no'>
>       <source>
>         <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='no'>
>       <source>
>         <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
>     </hostdev>
>     <iommu model='smmuv3Dev'>
>       <iommufd>
>         <id>iommufd0</id>
>       </iommufd>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>

IIUC, you're using <address> here to reference the earlier <controller>
pcie-expander-bus. This is a bit weird as it is making it look like the
smmuv3Dev itself has a PCI address, but this is just the PCI address
of the controller.

The smmuv3Dev also doesn't have an address on the pcie-expander-bus,
it is just an association IIUC.

So from this pov, I think I'd be inclined to say we should just
reference the <controller> based on its index, using an attribute

  <iommu model='smmuv3Dev' controller='2'/>


>     </iommu>
>     <iommu model='smmuv3Dev'>
>       <iommufd>
>         <id>iommufd0</id>
>       </iommufd>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
>     </iommu>
>   </devices>
> 
> This would get translated to a qemu command line with the arguments below:
> 
>  -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
>  -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
>  -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
>  -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
>  -object '{"qom-type":"iommufd","id":"iommufd0"}' \
>  -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
>  -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
>  -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
>  -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \

The iommufd integration in the XML looks a bit weird too - we have
four different elements all referencing 'iommufd0' but nothing
is defining this. The iommu references the iommufd0, but nothing
actually uses this on the arm-smmuv3-accel command line.


I've not been paying much attention to iommufd in QEMU, but IIUC
it will apply to x86_64 too. So I'm wondering how iommufd integration
should work in libvirt more broadly.

> If users would like to leverage qemu's iommufd feature to open the VFIO
> cdev and /dev/iommu via an external management layer, the fd can be
> specified like so in the VM definition:
> 
>   <devices>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <iommufdFd>23</iommufdFd>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>     </hostdev>
>     <iommu model='intel'>
>       <iommufd>
>         <id>iommufd0</id>
>         <fd>22</fd>
>       </iommufd>
>     </iommu>
>   </devices>
> 
> This would get translated to a qemu command line with the arguments below:
> 
> -object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
> -device '{"driver":"vfio-pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23","bus":"pci.0","addr":"0x3"}' \

I'm not getting why we have multiple different FDs here, when
we only have a single iommufd for the VM?

> 
> Summary of changes:
> - Introduced support for specifying multiple <iommu> stanzas in the VM
> XML definition when using smmuv3Dev model.
> - Automating PCIe topology to populate VM definition with multiple vSMMUs
> routed to pcie-expander-bus controllers is excluded, in favor of
> deferring creation of PXBs and routing of VFIO devices to management apps.
> - Introduced iommufd support.
> 
> TODO:
> - I updated the namespace and cgroup configuration to allow access to iommufd
> paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu needs to be
> launched with user and group set to 'root' in order for these paths to be
> accessible. A passthrough device represented by /dev/vfio/18 normally has
> 'root' user and group permissions, but in the mount namespace it's changed to
> 'libvirt-qemu' and 'kvm'. I wasn't able to discern where this is happening by
> looking at src/qemu/qemu_namespace.c and src/qemu/qemu_cgroup.c. Would you have
> any pointers on how to change the iommufd paths' user and group permissions in
> the libvirt mount namespace?

All permissions are handled by the security managers in src/security,
both DAC file permissions/ownership and SELinux labelling.
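
When debugging which owner and mode the DAC security driver left on a
device node, it can help to dump them for the paths in question. The
sketch below is a generic diagnostic, not part of libvirt. The helper
name describe_dac is hypothetical, the paths in the example are the ones
named in the TODO above, and SELinux labels are not covered.

```python
import grp
import os
import pwd
import stat


def describe_dac(path):
    """Return owner, group and mode string for a path (hypothetical helper)."""
    st = os.stat(path)
    # Fall back to the numeric id when it has no passwd/group entry.
    try:
        user = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        user = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return {
        "path": path,
        "user": user,
        "group": group,
        "mode": stat.filemode(st.st_mode),  # ls-style string, 10 characters
    }


if __name__ == "__main__":
    for node in ("/dev/iommu", "/dev/vfio/18"):
        if os.path.exists(node):
            print(describe_dac(node))
        else:
            print(node, "is not present on this host")
```

Comparing the result on the host with the result inside the qemu mount
namespace shows whether relabelling has been applied to a given node.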


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Nathan Chen via Devel 6 months, 2 weeks ago
Hi Daniel,

On 5/20/2025 5:51 AM, Daniel P. Berrangé wrote:
>> Hi,
>>
>> This is a follow up to the first RFC patchset [0] for supporting multiple
>> vSMMU instances in a qemu VM. This patchset also introduces support for
>> using iommufd to propagate DMA mappings to kernel for assigned devices.
>>
>> This patchset implements support for specifying multiple <iommu> devices
>> within the VM definition when smmuv3Dev IOMMU model is specified, and is
>> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [1]
>>
>> Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
>> in order to represent the iommufd object in qemu command line. This
>> patchset also implements new 'iommufdId' and 'iommufdFd' attributes for
>> hostdev devices to be associated with the iommufd object.
>>
>> For instance, specifying the iommufd object and associated hostdev in a
>> VM definition with multiple IOMMUs, configured to be routed to
>> pcie-expander-bus controllers in a way where VFIO device to SMMUv3
>> associations are matched with the host (pcie-expander-bus and
>> pcie-root-port controllers are no longer auto-added/auto-routed
>> like in the first revision of this RFC, as the PCIe topology will be
>> configured by management apps):
>>
>>    <devices>
>> ...
>>      <controller type='pci' index='1' model='pcie-expander-bus'>
>>        <model name='pxb-pcie'/>
>>        <target busNr='252'/>
>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>>      </controller>
>>      <controller type='pci' index='2' model='pcie-expander-bus'>
>>        <model name='pxb-pcie'/>
>>        <target busNr='248'/>
>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>>      </controller>
>> ...
>>      <controller type='pci' index='21' model='pcie-root-port'>
>>        <model name='pcie-root-port'/>
>>        <target chassis='21' port='0x0'/>
>>        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>>      </controller>
>>      <controller type='pci' index='22' model='pcie-root-port'>
>>        <model name='pcie-root-port'/>
>>        <target chassis='22' port='0xa8'/>
>>        <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>>      </controller>
>> ...
>>      <hostdev mode='subsystem' type='pci' managed='no'>
>>        <source>
>>          <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
>>        </source>
>>        <iommufdId>iommufd0</iommufdId>
>>        <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
>>      </hostdev>
>>      <hostdev mode='subsystem' type='pci' managed='no'>
>>        <source>
>>          <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
>>        </source>
>>        <iommufdId>iommufd0</iommufdId>
>>        <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
>>      </hostdev>
>>      <iommu model='smmuv3Dev'>
>>        <iommufd>
>>          <id>iommufd0</id>
>>        </iommufd>
>>        <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
> IIUC, you're using <address> here to reference the earlier <controller>
> pcie-expander-bus. This is a bit weird as it is making it look like the
> smmuv3Dev itself has a PCI address, but this is just the PCI address
> of the controller.
> 
> The smmuv3Dev also doesn't have an address on the pcie-expander-bus,
> it is just an association IIUC.
> 
> So from this pov, I think I'd be inclined to say we should just
> reference the <controller> based on its index, using an attribute
> 
>    <iommu model='smmuv3Dev' controller='2'/>
> 

I see, I will revise this to reference the controller index instead.
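
As a rough illustration of the controller-index form, the sketch below
resolves each <iommu controller='N'/> to the matching controller and
emits the corresponding arm-smmuv3-accel argument. Several things here
are assumptions: the 'controller' attribute is only the proposal from
this thread, the helper name smmu_device_args is hypothetical, and
restricting the reference to pcie-expander-bus controllers is a guess at
what validation might want.

```python
import json
import xml.etree.ElementTree as ET

DEVICES_XML = """
<devices>
  <controller type='pci' index='1' model='pcie-expander-bus'/>
  <controller type='pci' index='2' model='pcie-expander-bus'/>
  <iommu model='smmuv3Dev' controller='1'/>
  <iommu model='smmuv3Dev' controller='2'/>
</devices>
"""


def smmu_device_args(devices_xml):
    """Map <iommu controller='N'/> to arm-smmuv3-accel JSON (hypothetical)."""
    devices = ET.fromstring(devices_xml)

    # Indexes of the PCI controllers an smmuv3Dev may be associated with.
    # Assumption: only pcie-expander-bus controllers qualify.
    pxb_indexes = {
        c.get("index") for c in devices.findall("controller")
        if c.get("type") == "pci" and c.get("model") == "pcie-expander-bus"
    }

    args = []
    for iommu in devices.findall("iommu"):
        index = iommu.get("controller")
        if index not in pxb_indexes:
            raise ValueError(
                "iommu references unknown pcie-expander-bus controller "
                "{!r}".format(index))
        # The controller index doubles as the bus alias suffix, e.g. pci.2.
        args.append(json.dumps(
            {"driver": "arm-smmuv3-accel", "bus": "pci.{}".format(index)},
            separators=(",", ":")))
    return args


if __name__ == "__main__":
    for arg in smmu_device_args(DEVICES_XML):
        print("-device '{}'".format(arg))
```

Resolving by index keeps the <iommu> element free of a PCI address of its
own, which is the point of the suggestion quoted above.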

>>      </iommu>
>>      <iommu model='smmuv3Dev'>
>>        <iommufd>
>>          <id>iommufd0</id>
>>        </iommufd>
>>        <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
>>      </iommu>
>>    </devices>
>>
>> This would get translated to a qemu command line with the arguments below:
>>
>>   -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
>>   -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
>>   -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
>>   -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
>>   -object '{"qom-type":"iommufd","id":"iommufd0"}' \
>>   -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
>>   -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
>>   -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
>>   -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
> The iommufd integration in the XML looks a bit weird too - we have
> four different elements all referencing 'iommufd0' but nothing
> is defining this. The iommu references the iommufd0, but nothing
> actually uses this on the arm-smmuv3-accel command line.
> 
> 
> I've not been paying much attention to iommufd in QEMU, but IIUC
> it will apply to x86_64 too. So I'm wondering how iommufd integration
> should work in libvirt more broadly.
> 

It is my understanding that we want to consider device classes for 
libvirt device representation in XML, so I intended to have users 
declare the iommufd definition as an attribute under the <iommu> stanza, 
which would translate to the following qemu argument:

-object '{"qom-type":"iommufd","id":"iommufd0"}'

but since this series implements support for multiple <iommu> 
definitions, we specify iommufd0 for multiple <iommu> stanzas. For 
x86_64, we would just specify the iommufd attribute once under a single 
<iommu> stanza.

Would you suggest we move iommufd out of the <iommu> definition instead, 
like the examples below?

<domain type='kvm'>
...
   <devices>
     <iommufd>iommufd0</iommufd>
     <iommu model='smmuv3Dev' controller='22'/>
   </devices>
...
</domain>

*or*

<domain type='kvm'>
...
   <iommufd>iommufd0</iommufd>
   <devices>
     <iommu model='smmuv3Dev' controller='22'/>
   </devices>
...
</domain>

>> If users would like to leverage qemu's iommufd feature to open the VFIO
>> cdev and /dev/iommu via an external management layer, the fd can be
>> specified like so in the VM definition:
>>
>>    <devices>
>>      <hostdev mode='subsystem' type='pci' managed='yes'>
>>        <driver name='vfio'/>
>>        <source>
>>          <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
>>        </source>
>>        <iommufdId>iommufd0</iommufdId>
>>        <iommufdFd>23</iommufdFd>
>>        <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>>      </hostdev>
>>      <iommu model='intel'>
>>        <iommufd>
>>          <id>iommufd0</id>
>>          <fd>22</fd>
>>        </iommufd>
>>      </iommu>
>>    </devices>
>>
>> This would get translated to a qemu command line with the arguments below:
>>
>> -object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
>> -device '{"driver":"vfio-pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23","bus":"pci.0","addr":"0x3"}' \
> I'm not getting why we have multiple different FDs here, when
> we only have a single iommufd for the VM?
> 
>> Summary of changes:
>> - Introduced support for specifying multiple <iommu> stanzas in the VM
>> XML definition when using smmuv3Dev model.
>> - Automating PCIe topology to populate VM definition with multiple vSMMUs
>> routed to pcie-expander-bus controllers is excluded, in favor of
>> deferring creation of PXBs and routing of VFIO devices to management apps.
>> - Introduced iommufd support.
>>
>> TODO:
>> - I updated the namespace and cgroup configuration to allow access to iommufd
>> paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu needs to be
>> launched with user and group set to 'root' in order for these paths to be
>> accessible. A passthrough device represented by /dev/vfio/18 normally has
>> 'root' user and group permissions, but in the mount namespace it's changed to
>> 'libvirt-qemu' and 'kvm'. I wasn't able to discern where this is happening by
>> looking at src/qemu/qemu_namespace.c and src/qemu/qemu_cgroup.c. Would you have
>> any pointers on how to change the iommufd paths' user and group permissions in
>> the libvirt mount namespace?
> All permissions are handled by the security managers in src/security,
> both DAC file permissions/ownership and SELinux labelling.
> 
> 

Thanks, I will take a look under src/security/ and try to resolve this.

> With regards,
> Daniel

Best,
Nathan
Re: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Daniel P. Berrangé via Devel 6 months, 2 weeks ago
On Tue, May 27, 2025 at 06:43:46PM -0700, Nathan Chen wrote:
> Hi Daniel,
> 
> On 5/20/2025 5:51 AM, Daniel P. Berrangé wrote:
> > > Hi,
> > > 
> > > This is a follow up to the first RFC patchset [0] for supporting multiple
> > > vSMMU instances in a qemu VM. This patchset also introduces support for
> > > using iommufd to propagate DMA mappings to kernel for assigned devices.
> > > 
> > > This patchset implements support for specifying multiple <iommu> devices
> > > within the VM definition when smmuv3Dev IOMMU model is specified, and is
> > > tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [1]
> > > 
> > > Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
> > > in order to represent the iommufd object in qemu command line. This
> > > patchset also implements new 'iommufdId' and 'iommufdFd' attributes for
> > > hostdev devices to be associated with the iommufd object.
> > > 
> > > For instance, specifying the iommufd object and associated hostdev in a
> > > VM definition with multiple IOMMUs, configured to be routed to
> > > pcie-expander-bus controllers in a way where VFIO device to SMMUv3
> > > associations are matched with the host (pcie-expander-bus and
> > > pcie-root-port controllers are no longer auto-added/auto-routed
> > > like in the first revision of this RFC, as the PCIe topology will be
> > > configured by management apps):
> > > 
> > >    <devices>
> > > ...
> > >      <controller type='pci' index='1' model='pcie-expander-bus'>
> > >        <model name='pxb-pcie'/>
> > >        <target busNr='252'/>
> > >        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
> > >      </controller>
> > >      <controller type='pci' index='2' model='pcie-expander-bus'>
> > >        <model name='pxb-pcie'/>
> > >        <target busNr='248'/>
> > >        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
> > >      </controller>
> > > ...
> > >      <controller type='pci' index='21' model='pcie-root-port'>
> > >        <model name='pcie-root-port'/>
> > >        <target chassis='21' port='0x0'/>
> > >        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
> > >      </controller>
> > >      <controller type='pci' index='22' model='pcie-root-port'>
> > >        <model name='pcie-root-port'/>
> > >        <target chassis='22' port='0xa8'/>
> > >        <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
> > >      </controller>
> > > ...
> > >      <hostdev mode='subsystem' type='pci' managed='no'>
> > >        <source>
> > >          <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
> > >        </source>
> > >        <iommufdId>iommufd0</iommufdId>
> > >        <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
> > >      </hostdev>
> > >      <hostdev mode='subsystem' type='pci' managed='no'>
> > >        <source>
> > >          <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
> > >        </source>
> > >        <iommufdId>iommufd0</iommufdId>
> > >        <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
> > >      </hostdev>
> > >      <iommu model='smmuv3Dev'>
> > >        <iommufd>
> > >          <id>iommufd0</id>
> > >        </iommufd>
> > >        <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
> > IIUC, you're using <address> here to reference the earlier <controller>
> > pcie-expander-bus. This is a bit weird as it is making it look like the
> > smmuv3Dev itself has a PCI address, but this is just the PCI address
> > of the controller.
> > 
> > The smmuv3Dev also doesn't have an address on the pcie-expander-bus,
> > it is just an association IIUC.
> > 
> > So from this pov, I think I'd be inclined to say we should just
> > reference the <controller> based on its index, using an attribute
> > 
> >    <iommu model='smmuv3Dev' controller='2'/>
> > 
> 
> I see, I will revise this to reference the controller index instead.
> 
> > >      </iommu>
> > >      <iommu model='smmuv3Dev'>
> > >        <iommufd>
> > >          <id>iommufd0</id>
> > >        </iommufd>
> > >        <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
> > >      </iommu>
> > >    </devices>
> > > 
> > > This would get translated to a qemu command line with the arguments below:
> > > 
> > >   -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
> > >   -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
> > >   -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
> > >   -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
> > >   -object '{"qom-type":"iommufd","id":"iommufd0"}' \
> > >   -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
> > >   -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
> > >   -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
> > >   -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
> > > The iommufd integration in the XML looks a bit weird too - we have
> > > four different elements all referencing 'iommufd0'  but nothing
> > > is defining this. The iommu references the iommufd0, but nothing
> > > actually uses this on the arm-smmuv3-accel command line.
> > 
> > 
> > I've not been paying much attention to iommufd in QEMU, but IIUC
> > it will apply to x86_64 too. So I'm wondering how iommufd integration
> > > should work in libvirt more broadly.
> > 
> 
> It is my understanding that we want to consider device classes for libvirt
> device representation in XML, so I intended to have users declare the
> iommufd definition as an attribute under the <iommu> stanza, which would
> translate to the following qemu argument:
> 
> -object '{"qom-type":"iommufd","id":"iommufd0"}'
> 
> but since this series implements support for multiple <iommu> definitions,
> we specify iommufd0 for multiple <iommu> stanzas. For x86_64, we would just
> specify the iommufd attribute once under a single <iommu> stanza.
> 
> Would you suggest we move iommufd out of the <iommu> definition instead,
> like the examples below?

AFAICT iommufd isn't connected to the guest IOMMU at all in terms
of configuration; it is simply an attribute of the hostdev, e.g.
we could do

  <hostdev mode='subsystem' type='mdev' model='vfio-pci' iommufd='on'>

That does leave open the possibility that someone configures iommufd on
one hostdev but not on another, but that's not as bad as when we set it
on the <iommu> too. It is something we can validate in post-parse logic if
we need to ensure consistent usage. If qemu allows a mix of iommufd and
non-iommufd for vfio-pci, we can just allow that at libvirt too.
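
The post-parse consistency check mentioned above could look roughly like
the sketch below. It is an illustration only, written in Python rather than
libvirt's C: the iommufd='on' hostdev attribute is the proposal from this
mail and not an existing schema element, and the helper names
iommufd_usage and is_consistent are made up for the example.

```python
import xml.etree.ElementTree as ET


def iommufd_usage(domain_xml):
    """Return hostdev indexes grouped by whether they request iommufd.

    Assumes the proposed (hypothetical) iommufd='on' hostdev attribute.
    """
    root = ET.fromstring(domain_xml)
    usage = {"on": [], "off": []}
    for index, hostdev in enumerate(root.iter("hostdev")):
        key = "on" if hostdev.get("iommufd") == "on" else "off"
        usage[key].append(index)
    return usage


def is_consistent(domain_xml):
    """True when hostdevs either all use iommufd or none do."""
    usage = iommufd_usage(domain_xml)
    return not (usage["on"] and usage["off"])


# Hypothetical domain fragment: one hostdev opts in, the other does not.
MIXED = """
<domain>
  <devices>
    <hostdev mode='subsystem' type='pci' iommufd='on'/>
    <hostdev mode='subsystem' type='pci'/>
  </devices>
</domain>
"""

print(iommufd_usage(MIXED))   # {'on': [0], 'off': [1]}
print(is_consistent(MIXED))   # False
```

If qemu turns out to accept a mix, a check like is_consistent would simply
not be enforced.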

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Nathan Chen via Devel 6 months, 2 weeks ago

On 5/30/2025 8:05 AM, Daniel P. Berrangé wrote:
>> Hi Daniel,
>>
>> On 5/20/2025 5:51 AM, Daniel P. Berrangé wrote:
>>>> Hi,
>>>>
>>>> This is a follow up to the first RFC patchset [0] for supporting multiple
>>>> vSMMU instances in a qemu VM. This patchset also introduces support for
>>>> using iommufd to propagate DMA mappings to kernel for assigned devices.
>>>>
>>>> This patchset implements support for specifying multiple <iommu> devices
>>>> within the VM definition when smmuv3Dev IOMMU model is specified, and is
>>>> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices [1]
>>>>
>>>> Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
>>>> in order to represent the iommufd object in qemu command line. This
>>>> patchset also implements new 'iommufdId' and 'iommufdFd' attributes for
>>>> hostdev devices to be associated with the iommufd object.
>>>>
>>>> For instance, specifying the iommufd object and associated hostdev in a
>>>> VM definition with multiple IOMMUs, configured to be routed to
>>>> pcie-expander-bus controllers in a way where VFIO device to SMMUv3
>>>> associations are matched with the host (pcie-expander-bus and
>>>> pcie-root-port controllers are no longer auto-added/auto-routed
>>>> like in the first revision of this RFC, as the PCIe topology will be
>>>> configured by management apps):
>>>>
>>>>     <devices>
>>>> ...
>>>>       <controller type='pci' index='1' model='pcie-expander-bus'>
>>>>         <model name='pxb-pcie'/>
>>>>         <target busNr='252'/>
>>>>         <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>>>>       </controller>
>>>>       <controller type='pci' index='2' model='pcie-expander-bus'>
>>>>         <model name='pxb-pcie'/>
>>>>         <target busNr='248'/>
>>>>         <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>>>>       </controller>
>>>> ...
>>>>       <controller type='pci' index='21' model='pcie-root-port'>
>>>>         <model name='pcie-root-port'/>
>>>>         <target chassis='21' port='0x0'/>
>>>>         <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>>>>       </controller>
>>>>       <controller type='pci' index='22' model='pcie-root-port'>
>>>>         <model name='pcie-root-port'/>
>>>>         <target chassis='22' port='0xa8'/>
>>>>         <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>>>>       </controller>
>>>> ...
>>>>       <hostdev mode='subsystem' type='pci' managed='no'>
>>>>         <source>
>>>>           <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
>>>>         </source>
>>>>         <iommufdId>iommufd0</iommufdId>
>>>>         <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
>>>>       </hostdev>
>>>>       <hostdev mode='subsystem' type='pci' managed='no'>
>>>>         <source>
>>>>           <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
>>>>         </source>
>>>>         <iommufdId>iommufd0</iommufdId>
>>>>         <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
>>>>       </hostdev>
>>>>       <iommu model='smmuv3Dev'>
>>>>         <iommufd>
>>>>           <id>iommufd0</id>
>>>>         </iommufd>
>>>>         <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
>>> IIUC, you're using <address> here to reference the earlier <controller>
>>> pcie-expander-bus. This is a bit weird as it is making it look like the
>>> smmuv3Dev itself has a PCI address, but this is just the PCI address
>>> of the controller.
>>>
>>> The smmuv3dev also doesn't have an address on the pcie-expander-bus,
>>> it is just an association IIUC.
>>>
>>> So from this pov, I think I'd be inclined to say we should just
>>> reference the <controller> based on its index, using an attribute
>>>
>>>     <iommu model='smmuv3dev' controller='2'/>
>>>
>> I see, I will revise this to reference the controller index instead.
>>
>>>>       </iommu>
>>>>       <iommu model='smmuv3Dev'>
>>>>         <iommufd>
>>>>           <id>iommufd0</id>
>>>>         </iommufd>
>>>>         <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
>>>>       </iommu>
>>>>     </devices>
>>>>
>>>> This would get translated to a qemu command line with the arguments below:
>>>>
>>>>    -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
>>>>    -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
>>>>    -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
>>>>    -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
>>>>    -object '{"qom-type":"iommufd","id":"iommufd0"}' \
>>>>    -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
>>>>    -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
>>>>    -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
>>>>    -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
>>> The iommufd integration in the XML looks a bit weird too - we have
>>> four different elements all referencing 'iommufd0'  but nothing
>>> is defining this. The iommu references the iommufd0, but nothing
>>> actually uses this on the arm-smmuv3-accel command line.
>>>
>>>
>>> I've not been paying much attention to iommufd in QEMU, but IIUC
>>> it will apply to x86_64 too. So I'm wondering how iommufd integration
>>> should work in libvirt more broadly.
>>>
>> It is my understanding that we want to consider device classes for libvirt
>> device representation in XML, so I intended to have users declare the
>> iommufd definition as an attribute under the <iommu> stanza, which would
>> translate to the following qemu argument:
>> -object '{"qom-type":"iommufd","id":"iommufd0"}'
>>
>> but since this series implements support for multiple <iommu> definitions,
>> we specify iommufd0 for multiple <iommu> stanzas. For x86_64, we would just
>> specify the iommufd attribute once under a single <iommu> stanza.
>>
>> Would you suggest we move iommufd out of the <iommu> definition instead,
>> like the examples below?
> AFAICT iommufd isn't connected to the guest iommu at all in terms
> of configuration, it is simply an attribute of the hostdev. eg
> we could do
> 
>    <hostdev mode='subsystem' type='mdev' model='vfio-pci' iommufd='on'>
> 
> that does leave open the possibility that someone configures iommufd on
> one hostdev, but not on another, but that's not as bad as when we set it
> on the <iommu> too. So something we can validate in post-parse logic if
> we need to ensure consistent usage - if qemu allows a mix of iommufd and
> non-iommufd for vfio-pci, we can just allow that at libvirt too

Ok, sounds good. I will make this a hostdev-only attribute and generate
the qemu iommufd object argument when it is detected on a hostdev, instead
of having users specify it under the guest iommu definition. We should be
able to allow a mix of iommufd and non-iommufd for vfio-pci, and this avoids
the case you mentioned where someone specifies it for the hostdev but not
in the <iommu> stanza.
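
As a rough sketch of that plan (not the actual libvirt implementation, which
would live in C in src/qemu/qemu_command.c), the Python below emits the
shared iommufd object once if any hostdev asks for it and adds the "iommufd"
property only to those devices. The function names, the dict layout and the
boolean 'iommufd' key are assumptions for illustration; the "iommufd0" id and
the JSON property names are taken from the command lines quoted in this
thread.

```python
import json


def compact(props):
    """Serialize properties in the compact JSON form shown in this thread."""
    return json.dumps(props, separators=(",", ":"))


def build_hostdev_args(hostdevs, iommufd_id="iommufd0"):
    """Build QEMU arguments for a list of hostdev dicts.

    Each dict carries 'host', 'bus', 'addr' and an optional boolean
    'iommufd' (standing in for the proposed hostdev attribute).
    """
    args = []
    # Emit the shared iommufd object once, only if some hostdev asks for it.
    if any(dev.get("iommufd") for dev in hostdevs):
        args += ["-object", compact({"qom-type": "iommufd", "id": iommufd_id})]
    for index, dev in enumerate(hostdevs):
        props = {"driver": "vfio-pci", "host": dev["host"], "id": f"hostdev{index}"}
        if dev.get("iommufd"):
            props["iommufd"] = iommufd_id
        props["bus"] = dev["bus"]
        props["addr"] = dev["addr"]
        args += ["-device", compact(props)]
    return args


# Mixed usage: the first device uses iommufd, the second does not.
devices = [
    {"host": "0009:01:00.0", "bus": "pci.21", "addr": "0x0", "iommufd": True},
    {"host": "0019:01:00.0", "bus": "pci.22", "addr": "0x0"},
]
for arg in build_hostdev_args(devices):
    print(arg)
```

With no iommufd-enabled hostdev in the list, no -object argument is
generated at all, which matches the "when detected" behaviour described
above.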

Thanks,
Nathan
RE: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Shameerali Kolothum Thodi via Devel 7 months ago

> -----Original Message-----
> From: Nathan Chen <nathanc@nvidia.com>
> Sent: Thursday, May 15, 2025 9:37 PM
> To: devel@lists.libvirt.org
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> nicolinc@nvidia.com; Nathan Chen <nathanc@nvidia.com>
> Subject: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple
> vSMMUs
> 
> Hi,
> 
> This is a follow up to the first RFC patchset [0] for supporting multiple
> vSMMU instances in a qemu VM. This patchset also introduces support for
> using iommufd to propagate DMA mappings to kernel for assigned devices.
> 
> This patchset implements support for specifying multiple <iommu> devices
> within the VM definition when smmuv3Dev IOMMU model is specified, and is
> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices
> [1]

Based on feedback received on the above RFC and the discussion here[1],
there are certain changes to the name of the vSMMU device and the way
we associate the PCIe bus.

Going forward it is more likely to be something like below,

-device arm-smmuv3,primary-bus=pcie.0,accel=on
-device vfio-pci,host=xxx,bus=pcie.0
-device pxb-pcie,id=pcie.1,bus_nr=2
-device arm-smmuv3,primary-bus=pcie.1,accel=on
...

Hopefully, this doesn't warrant any major changes to this libvirt
series, but please do make a note of it.
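
For anyone adapting the libvirt side, a minimal sketch of generating the
form above is shown below. It only formats strings; the property names
primary-bus and accel are copied from the lines above and may still change
in the qemu series, and the helper names are hypothetical.

```python
def smmuv3_arg(primary_bus, accel=True):
    """Format one arm-smmuv3 -device value in key=value form."""
    return f"arm-smmuv3,primary-bus={primary_bus},accel={'on' if accel else 'off'}"


def smmuv3_args(buses):
    """Return a flat argument list with one vSMMU per PCIe bus."""
    args = []
    for bus in buses:
        args += ["-device", smmuv3_arg(bus)]
    return args


print(" ".join(smmuv3_args(["pcie.0", "pcie.1"])))
```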

Thanks,
Shameer
[0] https://lore.kernel.org/qemu-devel/aB25ZRu7pCJNpamt@redhat.com/

> Moreover, it adds a new 'iommufd' member for virDomainIOMMUDef,
> in order to represent the iommufd object in qemu command line. This
> patchset also implements new 'iommufdId' and 'iommufdFd' attributes for
> hostdev devices to be associated with the iommufd object.
> 
> For instance, specifying the iommufd object and associated hostdev in a
> VM definition with multiple IOMMUs, configured to be routed to
> pcie-expander-bus controllers in a way where VFIO device to SMMUv3
> associations are matched with the host (pcie-expander-bus and
> pcie-root-port controllers are no longer auto-added/auto-routed
> like in the first revision of this RFC, as the PCIe topology will be
> configured by management apps):
> 
>   <devices>
> ...
>     <controller type='pci' index='1' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='252'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
>     </controller>
>     <controller type='pci' index='2' model='pcie-expander-bus'>
>       <model name='pxb-pcie'/>
>       <target busNr='248'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
>     </controller>
> ...
>     <controller type='pci' index='21' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='21' port='0x0'/>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
>     </controller>
>     <controller type='pci' index='22' model='pcie-root-port'>
>       <model name='pcie-root-port'/>
>       <target chassis='22' port='0xa8'/>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
>     </controller>
> ...
>     <hostdev mode='subsystem' type='pci' managed='no'>
>       <source>
>         <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <address type='pci' domain='0x0000' bus='0x15' slot='0x00' function='0x0'/>
>     </hostdev>
>     <hostdev mode='subsystem' type='pci' managed='no'>
>       <source>
>         <address domain='0x0019' bus='0x01' slot='0x00' function='0x0'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <address type='pci' domain='0x0000' bus='0x16' slot='0x00' function='0x0'/>
>     </hostdev>
>     <iommu model='smmuv3Dev'>
>       <iommufd>
>         <id>iommufd0</id>
>       </iommufd>
>       <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
>     </iommu>
>     <iommu model='smmuv3Dev'>
>       <iommufd>
>         <id>iommufd0</id>
>       </iommufd>
>       <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
>     </iommu>
>   </devices>
> 
> This would get translated to a qemu command line with the arguments below:
> 
>  -device '{"driver":"pxb-pcie","bus_nr":252,"id":"pci.1","bus":"pcie.0","addr":"0x1"}' \
>  -device '{"driver":"pxb-pcie","bus_nr":248,"id":"pci.2","bus":"pcie.0","addr":"0x2"}' \
>  -device '{"driver":"pcie-root-port","port":0,"chassis":21,"id":"pci.21","bus":"pci.1","addr":"0x0"}' \
>  -device '{"driver":"pcie-root-port","port":168,"chassis":22,"id":"pci.22","bus":"pci.2","addr":"0x0"}' \
>  -object '{"qom-type":"iommufd","id":"iommufd0"}' \
>  -device '{"driver":"arm-smmuv3-accel","bus":"pci.1"}' \
>  -device '{"driver":"arm-smmuv3-accel","bus":"pci.2"}' \
>  -device '{"driver":"vfio-pci","host":"0009:01:00.0","id":"hostdev0","iommufd":"iommufd0","bus":"pci.21","addr":"0x0"}' \
>  -device '{"driver":"vfio-pci","host":"0019:01:00.0","id":"hostdev1","iommufd":"iommufd0","bus":"pci.22","addr":"0x0"}' \
> 
> If users would like to leverage qemu's iommufd feature to open the VFIO
> cdev and /dev/iommu via an external management layer, the fd can be
> specified like so in the VM definition:
> 
>   <devices>
>     <hostdev mode='subsystem' type='pci' managed='yes'>
>       <driver name='vfio'/>
>       <source>
>         <address domain='0x0000' bus='0x06' slot='0x12' function='0x2'/>
>       </source>
>       <iommufdId>iommufd0</iommufdId>
>       <iommufdFd>23</iommufdFd>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
>     </hostdev>
>     <iommu model='intel'>
>       <iommufd>
>         <id>iommufd0</id>
>         <fd>22</fd>
>       </iommufd>
>     </iommu>
>   </devices>
> 
> This would get translated to a qemu command line with the arguments below:
> 
> -object '{"qom-type":"iommufd","id":"iommufd0","fd":"22"}' \
> -device '{"driver":"vfio-pci","host":"0000:06:12.2","id":"hostdev1","iommufd":"iommufd0","fd":"23","bus":"pci.0","addr":"0x3"}' \
> 
> Summary of changes:
> - Introduced support for specifying multiple <iommu> stanzas in the VM
> XML definition when using smmuv3Dev model.
> - Automating PCIe topology to populate VM definition with multiple vSMMUs
> routed to pcie-expander-bus controllers is excluded, in favor of
> deferring creation of PXBs and routing of VFIO devices to management apps.
> - Introduced iommufd support.
> 
> TODO:
> - I updated the namespace and cgroup configuration to allow access to iommufd
> paths at /dev/vfio/devices/vfio* and /dev/iommu. However, qemu needs to be
> launched with user and group set to 'root' in order for these paths to be
> accessible. A passthrough device represented by /dev/vfio/18 normally has
> 'root' user and group permissions, but in the mount namespace it's changed to
> 'libvirt-qemu' and 'kvm'. I wasn't able to discern where this is happening by
> looking at src/qemu/qemu_namespace.c and src/qemu/qemu_cgroup.c. Would you have
> any pointers on how to change the iommufd paths' user and group permissions in
> the libvirt mount namespace?
> 
> This series is on Github:
> https://github.com/NathanChenNVIDIA/libvirt/tree/smmuv3Dev-iommufd-04-15-25
> 
> Thanks,
> Nathan
> 
> [0] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/7GDT6RX5LPAJMPP4ZSC4ACME6GVMG236/
> [1] https://lore.kernel.org/qemu-devel/20250311141045.66620-1-shameerali.kolothum.thodi@huawei.com/
> 
> Signed-off-by: Nathan Chen <nathanc@nvidia.com>
> 
> Nathan Chen (5):
>   conf: Support multiple smmuv3Dev IOMMU devices
>   conf: Add an iommufd member struct to virDomainIOMMUDef
>   qemu: Implement support for associating iommufd to hostdev
>   qemu: Update Cgroup and namespace for qemu to access iommufd paths
>   qemu: Add test case for specifying iommufd
> 
>  docs/formatdomain.rst                         |   5 +-
>  src/conf/domain_addr.c                        |  12 +-
>  src/conf/domain_addr.h                        |   4 +-
>  src/conf/domain_conf.c                        | 292 ++++++++++++++++--
>  src/conf/domain_conf.h                        |  21 +-
>  src/conf/domain_validate.c                    |  94 +++++-
>  src/conf/schemas/domaincommon.rng             |  37 ++-
>  src/conf/virconftypes.h                       |   2 +
>  src/libvirt_private.syms                      |   2 +
>  src/qemu/qemu_alias.c                         |  15 +-
>  src/qemu/qemu_cgroup.c                        |  47 +++
>  src/qemu/qemu_cgroup.h                        |   1 +
>  src/qemu/qemu_command.c                       | 146 ++++++---
>  src/qemu/qemu_domain_address.c                |  33 +-
>  src/qemu/qemu_driver.c                        |   8 +-
>  src/qemu/qemu_namespace.c                     |  36 +++
>  src/qemu/qemu_postparse.c                     |  11 +-
>  src/qemu/qemu_validate.c                      |  22 +-
>  ...fio-iommufd-intel-iommu.x86_64-latest.args |  43 +++
>  ...vfio-iommufd-intel-iommu.x86_64-latest.xml |  80 +++++
>  .../hostdev-vfio-iommufd-intel-iommu.xml      |  80 +++++
>  tests/qemuxmlconftest.c                       |   1 +
>  22 files changed, 878 insertions(+), 114 deletions(-)
>  create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.args
>  create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.x86_64-latest.xml
>  create mode 100644 tests/qemuxmlconfdata/hostdev-vfio-iommufd-intel-iommu.xml
> 
> --
> 2.43.0
Re: [RFC PATCH 0/5] qemu: Implement support for iommufd and multiple vSMMUs
Posted by Nathan Chen via Devel 6 months, 2 weeks ago

On 5/16/2025 3:19 AM, Shameerali Kolothum Thodi wrote:
>> Hi,
>>
>> This is a follow up to the first RFC patchset [0] for supporting multiple
>> vSMMU instances in a qemu VM. This patchset also introduces support for
>> using iommufd to propagate DMA mappings to kernel for assigned devices.
>>
>> This patchset implements support for specifying multiple <iommu> devices
>> within the VM definition when smmuv3Dev IOMMU model is specified, and is
>> tested with Shameer's latest qemu RFC for HW-accelerated vSMMU devices
>> [1]
> Based on feedback received on the above RFC and the discussion here[1],
> there are certain changes to the name of the vSMMU device and the way
> we associate the PCIe bus.
> 
> Going forward it is more likely to be something like below,
> 
> -device arm-smmuv3,primary-bus=pcie.0,accel=on
> -device vfio-pci,host=xxx,bus=pcie.0
> -device pxb-pcie,id=pcie.1,bus_nr=2
> -device arm-smmuv3,primary-bus=pcie.1,accel=on
> ...
> 
> Hopefully, this doesn't warrant any major changes to this libvirt
> series, but please do make a note of it.
> 
> Thanks,
> Shameer

Thanks Shameer, I will make a note of this for the next revision.

Best,
Nathan