[RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3

Shameer Kolothum via posted 20 patches 3 weeks, 4 days ago
Hi All,

This patch series introduces initial support for a user-creatable
accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.

Why this is needed:

Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
machine and does not support configuring the host SMMUv3 in nested
mode. This limitation prevents its use with vfio-pci passthrough
devices.

The new pluggable smmuv3-accel device enables host SMMUv3 configuration
with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
host) via the new IOMMUFD APIs. Additionally, it allows multiple 
accelerated vSMMUv3 instances for guests running on hosts with multiple
physical SMMUv3s.

This provides the following benefits:
-Reduces invalidation broadcasts and lookups for devices behind multiple
 physical SMMUv3s.
-Simplifies handling of host SMMUv3s with differing feature sets.
-Lays the groundwork for additional capabilities like vCMDQ support.
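The first benefit can be illustrated with a small model (hypothetical
helper names, not code from this series): with a single machine-wide
vSMMU a guest invalidation cannot be scoped and reaches every host
SMMU, while per-pxb accelerated instances confine it to the one
physical SMMU backing the device:

```c
#include <assert.h>

#define NUM_HOST_SMMUS 2

/* Invalidation commands issued to each host SMMUv3; a simple counter
 * standing in for real CMDQ traffic.  Illustrative model only. */
static int cmdq_issued[NUM_HOST_SMMUS];

/* Single machine-wide vSMMU: the invalidation cannot be scoped to one
 * physical SMMU, so it is broadcast to all of them. */
static void invalidate_machine_wide(void)
{
    for (int i = 0; i < NUM_HOST_SMMUS; i++) {
        cmdq_issued[i]++;
    }
}

/* One accelerated vSMMU per pxb-pcie: the invalidation only reaches
 * the host SMMU that backs that instance. */
static void invalidate_per_vsmmu(int vsmmu)
{
    cmdq_issued[vsmmu]++;
}
```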


Changes from RFCv1[0]:

Thanks to everyone who provided feedback on RFCv1!

-The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
 to better reflect its role in using the host's physical SMMUv3 for page
 table setup and cache invalidations.
-Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
-Merges patches from Nicolin’s GitHub repository that add accelerated
 functionality for page table setup and cache invalidations[1]. I have
 modified these a bit, but hopefully nothing is broken.
-Incorporates various fixes and improvements based on RFCv1 feedback.
-Adds support for vfio-pci hotplug with smmuv3-accel.

Note: IORT RMR patches for MSI setup are currently excluded as we may
adopt a different approach for MSI handling in the future [2].

This series also depends on the common iommufd/vfio patches from
Zhenzhong's series here[3].

ToDos:

-At least one vfio-pci device must currently be cold-plugged to a
 pxb-pcie bus associated with the arm-smmuv3-accel. This is required
 both to associate a vSMMUv3 with a host SMMUv3 and to retrieve the
 host SMMUv3 IDR registers for export to the guest.
 Future updates will remove this restriction by adding the
 necessary kernel support.
 Please find the discussion here[4].
-This version does not yet support host SMMUv3 fault handling or
 other event notifications. These will be addressed in a
 future patch series.


The complete branch can be found here:
https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext

I have done basic sanity testing on a HiSilicon platform using the kernel
branch here:
https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2

Usage example:

On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
devices and HNS VF devices are behind different host SMMUv3s. So for a
guest, specify two arm-smmuv3-accel devices, each behind a pxb-pcie, as below:


./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
-cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
-bios QEMU_EFI.fd \
-object iommufd,id=iommufd0 \
-device virtio-blk-device,drive=fs \
-drive if=none,file=rootfs.qcow2,id=fs \
-device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
-device arm-smmuv3-accel,bus=pcie.1 \
-device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
-device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
-device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
-device arm-smmuv3-accel,bus=pcie.2 \
-device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
-device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
-kernel Image \
-append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
-device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
-fsdev local,id=p9fs,path=p9root,security_model=mapped \
-net none \
-nographic

The guest will boot with two SMMUv3s:
...
arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq

with a PCI topology like below:

[root@localhost ~]# lspci -tv
-+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
 |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
 |           \-03.0  Virtio: Virtio filesystem
 +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
 \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)

Further tests are always welcome.

Please take a look and let me know your feedback!

Thanks,
Shameer

[0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
[1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
[2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
[3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
[4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/

Nicolin Chen (11):
  backends/iommufd: Introduce iommufd_backend_alloc_viommu
  backends/iommufd: Introduce iommufd_vdev_alloc
  hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
  hw/arm/smmuv3-accel: Support nested STE install/uninstall support
  hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
  hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
  hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache
    invalidations
  hw/arm/smmuv3: Forward invalidation commands to hw
  hw/arm/smmuv3-accel: Read host SMMUv3 device info
  hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
  hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3

Shameer Kolothum (9):
  hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel
    device
  hw/arm/virt: Add support for smmuv3-accel
  hw/arm/smmuv3-accel: Associate a pxb-pcie bus
  hw/arm/smmu-common: Factor out common helper functions and export
  hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
  hw/arm/smmuv3-accel: Provide get_address_space callback
  hw/arm/smmuv3: Install nested ste for CFGI_STE
  hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
  hw/arm/smmuv3-accel: Enable smmuv3-accel creation

 backends/iommufd.c            |  51 +++
 backends/trace-events         |   2 +
 hw/arm/Kconfig                |   5 +
 hw/arm/meson.build            |   1 +
 hw/arm/smmu-common.c          |  95 +++++-
 hw/arm/smmuv3-accel.c         | 616 ++++++++++++++++++++++++++++++++++
 hw/arm/smmuv3-internal.h      |  54 +++
 hw/arm/smmuv3.c               |  80 ++++-
 hw/arm/trace-events           |   6 +
 hw/arm/virt-acpi-build.c      | 113 ++++++-
 hw/arm/virt.c                 |  12 +
 hw/core/sysbus-fdt.c          |   1 +
 include/hw/arm/smmu-common.h  |  14 +
 include/hw/arm/smmuv3-accel.h |  75 +++++
 include/hw/arm/virt.h         |   1 +
 include/system/iommufd.h      |  14 +
 16 files changed, 1101 insertions(+), 39 deletions(-)
 create mode 100644 hw/arm/smmuv3-accel.c
 create mode 100644 include/hw/arm/smmuv3-accel.h

-- 
2.34.1


Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
Posted by Eric Auger 1 week, 4 days ago
Hi,

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> Hi All,
>
> This patch series introduces initial support for a user-creatable
> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.
>
> Why this is needed:
>
> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
> machine and does not support configuring the host SMMUv3 in nested
> mode.This limitation prevents its use with vfio-pci passthrough
> devices.
>
> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
> host) via the new IOMMUFD APIs. Additionally, it allows multiple 
> accelerated vSMMUv3 instances for guests running on hosts with multiple
> physical SMMUv3s.
>
> This will benefit in:
> -Reduced invalidation broadcasts and lookups for devices behind multiple
>  physical SMMUv3s.
> -Simplifies handling of host SMMUv3s with differing feature sets.
> -Lays the groundwork for additional capabilities like vCMDQ support.
>
>
> Changes from RFCv1[0]:
>
> Thanks to everyone who provided feedback on RFCv1!. 
>
> –The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>  to better reflect its role in using the host's physical SMMUv3 for page
>  table setup and cache invalidations.
> -Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
> -Merges patches from Nicolin’s GitHub repository that add accelerated
>  functionalityi for page table setup and cache invalidations[1]. I have
>  modified these a bit, but hopefully has not broken anything.
> -Incorporates various fixes and improvements based on RFCv1 feedback.
> –Adds support for vfio-pci hotplug with smmuv3-accel.
>
> Note: IORT RMR patches for MSI setup are currently excluded as we may
> adopt a different approach for MSI handling in the future [2].
>
> Also this has dependency on the common iommufd/vfio patches from
> Zhenzhong's series here[3]
>
> ToDos:
>
> –At least one vfio-pci device must currently be cold-plugged to a
>  pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
>  to associate a vSMMUv3 with a host SMMUv3 and also needed to
>  retrieve the host SMMUv3 IDR registers for guest export.
>  Future updates will remove this restriction by adding the
>  necessary kernel support.
>  Please find the discussion here[4]
> -This version does not yet support host SMMUv3 fault handling or
>  other event notifications. These will be addressed in a
>  future patch series.
>
>
> The complete branch can be found here:
> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
>
> I have done basic sanity testing on a Hisilicon Platform using the kernel
> branch here:
> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
>
> Usage Eg:
>
> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different host SMMUv3s. So for a
> Guest, specify two arm-smmuv3-accel devices each behind a pxb-pcie as below,
>
>
> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
> -bios QEMU_EFI.fd \
> -object iommufd,id=iommufd0 \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.1 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.2 \
> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
> -kernel Image \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> -net none \
> -nographic
>
> Guest will boot with two SMMUv3s,
> ...
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>
> With a pci topology like below,
>
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>  |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>  |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>  |           \-03.0  Virtio: Virtio filesystem
>  +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>  \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)

For the record I tested the series with a host VFIO device and a
virtio-blk-pci device put behind the same pxb-pcie/smmu protection, and
it works just fine.

-+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family
mlx5Gen Virtual Function
 |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
 \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
             +-01.0-[01]--
             +-01.1-[02]--
             \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge

This shows that, without the vCMDQ feature, there is no blocker to having
the same SMMU device protect both accelerated and emulated devices.

Thanks

Eric
>
> Further tests are always welcome.
>
> Please take a look and let me know your feedback!
>
> Thanks,
> Shameer
>
> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
> [4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/
>
> Nicolin Chen (11):
>   backends/iommufd: Introduce iommufd_backend_alloc_viommu
>   backends/iommufd: Introduce iommufd_vdev_alloc
>   hw/arm/smmuv3-accel: Add set/unset_iommu_device callback
>   hw/arm/smmuv3-accel: Support nested STE install/uninstall support
>   hw/arm/smmuv3-accel: Allocate a vDEVICE object for device
>   hw/arm/smmuv3-accel: Return sysmem if stage-1 is bypassed
>   hw/arm/smmuv3-accel: Introduce helpers to batch and issue cache
>     invalidations
>   hw/arm/smmuv3: Forward invalidation commands to hw
>   hw/arm/smmuv3-accel: Read host SMMUv3 device info
>   hw/arm/smmuv3: Check idr registers for STE_S1CDMAX and STE_S1STALLD
>   hw/arm/smmu-common: Bypass emulated IOTLB for a accel SMMUv3
>
> Shameer Kolothum (9):
>   hw/arm/smmuv3-accel: Add initial infrastructure for smmuv3-accel
>     device
>   hw/arm/virt: Add support for smmuv3-accel
>   hw/arm/smmuv3-accel: Associate a pxb-pcie bus
>   hw/arm/smmu-common: Factor out common helper functions and export
>   hw/arm/smmu-common: Introduce callbacks for PCIIOMMUOps
>   hw/arm/smmuv3-accel: Provide get_address_space callback
>   hw/arm/smmuv3: Install nested ste for CFGI_STE
>   hw/arm/virt-acpi-build: Update IORT with multiple smmuv3-accel nodes
>   hw/arm/smmuv3-accel: Enable smmuv3-accel creation
>
>  backends/iommufd.c            |  51 +++
>  backends/trace-events         |   2 +
>  hw/arm/Kconfig                |   5 +
>  hw/arm/meson.build            |   1 +
>  hw/arm/smmu-common.c          |  95 +++++-
>  hw/arm/smmuv3-accel.c         | 616 ++++++++++++++++++++++++++++++++++
>  hw/arm/smmuv3-internal.h      |  54 +++
>  hw/arm/smmuv3.c               |  80 ++++-
>  hw/arm/trace-events           |   6 +
>  hw/arm/virt-acpi-build.c      | 113 ++++++-
>  hw/arm/virt.c                 |  12 +
>  hw/core/sysbus-fdt.c          |   1 +
>  include/hw/arm/smmu-common.h  |  14 +
>  include/hw/arm/smmuv3-accel.h |  75 +++++
>  include/hw/arm/virt.h         |   1 +
>  include/system/iommufd.h      |  14 +
>  16 files changed, 1101 insertions(+), 39 deletions(-)
>  create mode 100644 hw/arm/smmuv3-accel.c
>  create mode 100644 include/hw/arm/smmuv3-accel.h
>


RE: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
Posted by Shameerali Kolothum Thodi via 1 week, 3 days ago
Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Tuesday, March 25, 2025 2:43 PM
> To: Shameerali Kolothum Thodi
> <shameerali.kolothum.thodi@huawei.com>; qemu-arm@nongnu.org;
> qemu-devel@nongnu.org
> Cc: peter.maydell@linaro.org; jgg@nvidia.com; nicolinc@nvidia.com;
> ddutile@redhat.com; berrange@redhat.com; nathanc@nvidia.com;
> mochs@nvidia.com; smostafa@google.com; Linuxarm
> <linuxarm@huawei.com>; Wangzhou (B) <wangzhou1@hisilicon.com>;
> jiangkunkun <jiangkunkun@huawei.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>; zhangfei.gao@linaro.org
> Subject: Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-
> creatable accelerated SMMUv3
> 


> For the record I tested the series with host VFIO device and a
> virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
> it works just fine
> 
> -+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family
> mlx5Gen Virtual Function
>  |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
>  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>              +-01.0-[01]--
>              +-01.1-[02]--
>              \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> 
> This shows that without vcmdq feature there is no blocker having the
> same smmu device protecting both accelerated and emulated devices.

Thanks for giving it a spin. Yes, it currently supports the above. 

At the moment we are not using the IOTLB for the emulated dev in a
config like the above. Have you checked performance for either the
emulated or the vfio dev with that config? In the light testing I have
done, it shows performance degradation for the emulated dev compared to
the default SMMUv3 (iommu=smmuv3).

And if the emulated dev issues _TLBI_NH_ASID, the code currently will propagate
that down to the host SMMUv3. This will affect the vfio dev as well.
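The distinction here comes down to which commands carry a StreamID.
A sketch of the resulting forwarding policy (opcode values per the Arm
SMMUv3 spec; the decision function itself is illustrative, not the
series' actual code):

```c
#include <stdbool.h>

/* A few SMMUv3 command opcodes (values per the Arm SMMUv3 spec). */
enum {
    CMD_CFGI_STE     = 0x03, /* carries a StreamID */
    CMD_CFGI_CD      = 0x05, /* carries a StreamID */
    CMD_TLBI_NH_ASID = 0x11, /* keyed by ASID only */
    CMD_TLBI_NH_VA   = 0x12, /* keyed by ASID + VA */
};

/* Illustrative policy: commands that identify a stream (SID) can be
 * forwarded only when the SID belongs to a passthrough device.  ASID/VA
 * invalidations carry no SID, so when emulated and accelerated devices
 * share one vSMMU they must be forwarded to the host unconditionally,
 * consuming host CMDQ slots even for emulated-only ASIDs. */
static bool must_forward_to_host(int opcode, bool sid_is_passthrough)
{
    switch (opcode) {
    case CMD_CFGI_STE:
    case CMD_CFGI_CD:
        return sid_is_passthrough;  /* scopable by SID */
    default:
        return true;                /* no SID field: cannot be scoped */
    }
}
```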

So the question is whether we want to allow this (assuming the user is
educated) or block such a config, given that the user has the option of
using a non-accel smmuv3 for emulated devices.

Thanks,
Shameer


Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
Posted by Nicolin Chen via 1 week, 3 days ago
On Tue, Mar 25, 2025 at 03:43:29PM +0000, Shameerali Kolothum Thodi wrote:
> > For the record I tested the series with host VFIO device and a
> > virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
> > it works just fine
> > 
> > -+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family
> > mlx5Gen Virtual Function
> >  |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
> >  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
> >              +-01.0-[01]--
> >              +-01.1-[02]--
> >              \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
> > 
> > This shows that without vcmdq feature there is no blocker having the
> > same smmu device protecting both accelerated and emulated devices.
> 
> Thanks for giving it a spin. Yes, it currently supports the above. 
> 
> At the moment we are not using the IOTLB for the emulated dev for a
> config like above.  Have you checked performance for either emulated or
> vfio dev with the above config? Whatever light tests I have done it shows
> performance degradation for emulated dev compared to the default
> SMMUv3(iommu=smmuv3). 
> 
> And if the emulated dev issues _TLBI_NH_ASID, the code currently will propagate
> that down to host SMMUv3. This will affect the vfio dev as well.

VA too. Only commands with an SID field can be simply excluded.
I think we should be concerned that the underlying SMMU CMDQ HW
has very limited command execution capacity, so wasting command
cycles doesn't feel ideal, as it could impact the host OS
(and other VMs too).

Thanks
Nicolin
Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
Posted by Eric Auger 1 week, 3 days ago
Hi Shameer, Nicolin,

On 3/25/25 7:26 PM, Nicolin Chen wrote:
> On Tue, Mar 25, 2025 at 03:43:29PM +0000, Shameerali Kolothum Thodi wrote:
>>> For the record I tested the series with host VFIO device and a
>>> virtio-blk-pci device put behind the same pxb-pcie/smmu protection and
>>> it works just fine
>>>
>>> -+-[0000:0a]-+-01.0-[0b]----00.0  Mellanox Technologies ConnectX Family
>>> mlx5Gen Virtual Function
>>>  |           \-01.1-[0c]----00.0  Red Hat, Inc. Virtio 1.0 block device
>>>  \-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>>>              +-01.0-[01]--
>>>              +-01.1-[02]--
>>>              \-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>>>
>>> This shows that without vcmdq feature there is no blocker having the
>>> same smmu device protecting both accelerated and emulated devices.
>> Thanks for giving it a spin. Yes, it currently supports the above. 
>>
>> At the moment we are not using the IOTLB for the emulated dev for a
>> config like above.  Have you checked performance for either emulated or
>> vfio dev with the above config? Whatever light tests I have done it shows
>> performance degradation for emulated dev compared to the default
>> SMMUv3(iommu=smmuv3). 
No, I have not checked yet. Again, I do not advocate for this kind of mix,
but I wanted to check that it still works conceptually.

Thanks

Eric
>>
>> And if the emulated dev issues _TLBI_NH_ASID, the code currently will propagate
>> that down to host SMMUv3. This will affect the vfio dev as well.
> VA too. Only commands with an SID field can be simply excluded.
> I think we should be concerned that the underlying SMMU CMDQ HW
> has a very limited command executing power, so wasting command
> cycles doesn't feel very ideal as it could impact the host OS
> (and other VMs too).
>
> Thanks
> Nicolin
>


Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
Posted by Philippe Mathieu-Daudé 2 weeks, 2 days ago
Hi,

On 11/3/25 15:10, Shameer Kolothum via wrote:
> Hi All,
> 
> This patch series introduces initial support for a user-creatable
> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.

I'm a bit confused by the design here. Why are we introducing this as
some device while it is a core component of the bus topology (here PCI)?

Is it because this device is inspired by how x86 IOMMUs are wired?

> Why this is needed:
> 
> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
> machine and does not support configuring the host SMMUv3 in nested
> mode.This limitation prevents its use with vfio-pci passthrough
> devices.
> 
> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
> host) via the new IOMMUFD APIs. Additionally, it allows multiple
> accelerated vSMMUv3 instances for guests running on hosts with multiple
> physical SMMUv3s.
> 
> This will benefit in:
> -Reduced invalidation broadcasts and lookups for devices behind multiple
>   physical SMMUv3s.
> -Simplifies handling of host SMMUv3s with differing feature sets.
> -Lays the groundwork for additional capabilities like vCMDQ support.
> 
> 
> Changes from RFCv1[0]:
> 
> Thanks to everyone who provided feedback on RFCv1!.
> 
> –The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>   to better reflect its role in using the host's physical SMMUv3 for page
>   table setup and cache invalidations.
> -Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
> -Merges patches from Nicolin’s GitHub repository that add accelerated
>   functionalityi for page table setup and cache invalidations[1]. I have
>   modified these a bit, but hopefully has not broken anything.
> -Incorporates various fixes and improvements based on RFCv1 feedback.
> –Adds support for vfio-pci hotplug with smmuv3-accel.
> 
> Note: IORT RMR patches for MSI setup are currently excluded as we may
> adopt a different approach for MSI handling in the future [2].
> 
> Also this has dependency on the common iommufd/vfio patches from
> Zhenzhong's series here[3]
> 
> ToDos:
> 
> –At least one vfio-pci device must currently be cold-plugged to a
>   pxb-pcie bus associated with the arm-smmuv3-accel. This is required both
>   to associate a vSMMUv3 with a host SMMUv3 and also needed to
>   retrieve the host SMMUv3 IDR registers for guest export.
>   Future updates will remove this restriction by adding the
>   necessary kernel support.
>   Please find the discussion here[4]
> -This version does not yet support host SMMUv3 fault handling or
>   other event notifications. These will be addressed in a
>   future patch series.
> 
> 
> The complete branch can be found here:
> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
> 
> I have done basic sanity testing on a Hisilicon Platform using the kernel
> branch here:
> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
> 
> Usage Eg:
> 
> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
> devices and HNS VF devices are behind different host SMMUv3s. So for a
> Guest, specify two arm-smmuv3-accel devices each behind a pxb-pcie as below,
> 
> 
> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
> -bios QEMU_EFI.fd \
> -object iommufd,id=iommufd0 \
> -device virtio-blk-device,drive=fs \
> -drive if=none,file=rootfs.qcow2,id=fs \
> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.1 \
> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
> -device arm-smmuv3-accel,bus=pcie.2 \
> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
> -kernel Image \
> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
> -net none \
> -nographic
> 
> Guest will boot with two SMMUv3s,
> ...
> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
> 
> With a pci topology like below,
> 
> [root@localhost ~]# lspci -tv
> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>   |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>   |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>   |           \-03.0  Virtio: Virtio filesystem
>   +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>   |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>   \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine(Virtual Function)
> 
> Further tests are always welcome.
> 
> Please take a look and let me know your feedback!
> 
> Thanks,
> Shameer
> 
> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
> [4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/


Re: [RFC PATCH v2 00/20] hw/arm/virt: Add support for user-creatable accelerated SMMUv3
Posted by Eric Auger 2 weeks, 2 days ago
Hi Philippe,

On 3/19/25 5:40 PM, Philippe Mathieu-Daudé wrote:
> Hi,
>
> On 11/3/25 15:10, Shameer Kolothum via wrote:
>> Hi All,
>>
>> This patch series introduces initial support for a user-creatable
>> accelerated SMMUv3 device (-device arm-smmuv3-accel) in QEMU.
>
> I'm a bit confused by the design here. Why are we introducing this as
> some device while it is a core component of the bus topology (here PCI)?

At the moment the SMMU is machine wide and is opted in via a machine option.

However, there is a need to be able to instantiate multiple of them to
match the physical implementation, and there is a need to define what
bus topology each instance is translating, hence the idea of attaching
it to a bus.

At the ACPI level, the IORT table makes it possible to define precisely
which RID is translated by each SMMU instance, and this is something we
fail to model with the machine-wide option.
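The IORT mechanism referred to above can be sketched as follows: each ID
mapping routes a contiguous range of input RIDs to one SMMU node.  The
struct below is a simplification of the IORT ID-mapping format (real
entries also rebase the ID and reference nodes by table offset), and the
example ranges mirror the two-pxb setup from the cover letter
(bus_nr=1 behind one vSMMU, bus_nr=8 behind the other); the exact
ranges are illustrative:

```c
#include <stdint.h>
#include <assert.h>

/* Simplified IORT-style ID mapping: a contiguous RID range routed to
 * one SMMUv3 node. */
struct id_mapping {
    uint32_t input_base;  /* first RID covered (bus << 8 | devfn) */
    uint32_t num_ids;     /* size of the range                    */
    int      smmu_node;   /* index of the owning vSMMU            */
};

/* Buses 0x01..0x07 (bus_nr=1 pxb) -> SMMU 0; buses 0x08..0x09
 * (bus_nr=8 pxb) -> SMMU 1. */
static const struct id_mapping maps[] = {
    { 0x0100, 0x0700, 0 },
    { 0x0800, 0x0200, 1 },
};

/* Resolve which vSMMU translates a given RID; -1 if untranslated. */
static int smmu_for_rid(uint32_t rid)
{
    for (unsigned i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
        if (rid >= maps[i].input_base &&
            rid < maps[i].input_base + maps[i].num_ids) {
            return maps[i].smmu_node;
        }
    }
    return -1;
}
```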

Eric
>
> Is is because this device is inspired on how x86 IOMMUs are wired?
>
>> Why this is needed:
>>
>> Currently, QEMU’s ARM SMMUv3 emulation (iommu=smmuv3) is tied to the
>> machine and does not support configuring the host SMMUv3 in nested
>> mode.This limitation prevents its use with vfio-pci passthrough
>> devices.
>>
>> The new pluggable smmuv3-accel device enables host SMMUv3 configuration
>> with nested stage support (Stage 1 owned by the Guest and Stage 2 by the
>> host) via the new IOMMUFD APIs. Additionally, it allows multiple
>> accelerated vSMMUv3 instances for guests running on hosts with multiple
>> physical SMMUv3s.
>>
>> This will benefit in:
>> -Reduced invalidation broadcasts and lookups for devices behind multiple
>>   physical SMMUv3s.
>> -Simplifies handling of host SMMUv3s with differing feature sets.
>> -Lays the groundwork for additional capabilities like vCMDQ support.
>>
>>
>> Changes from RFCv1[0]:
>>
>> Thanks to everyone who provided feedback on RFCv1!.
>>
>> -The device is now called arm-smmuv3-accel instead of arm-smmuv3-nested
>>   to better reflect its role in using the host's physical SMMUv3 for
>>   page table setup and cache invalidations.
>> -Includes patches for VIOMMU and VDEVICE IOMMUFD APIs (patches 1,2).
>> -Merges patches from Nicolin’s GitHub repository that add accelerated
>>   functionality for page table setup and cache invalidations[1]. I have
>>   modified these a bit, but hopefully have not broken anything.
>> -Incorporates various fixes and improvements based on RFCv1 feedback.
>> -Adds support for vfio-pci hotplug with smmuv3-accel.
>>
>> Note: IORT RMR patches for MSI setup are currently excluded as we may
>> adopt a different approach for MSI handling in the future [2].
>>
>> Also this has dependency on the common iommufd/vfio patches from
>> Zhenzhong's series here[3]
>>
>> ToDos:
>>
>> -At least one vfio-pci device must currently be cold-plugged to a
>>   pxb-pcie bus associated with the arm-smmuv3-accel. This is required
>>   both to associate a vSMMUv3 with a host SMMUv3 and to retrieve the
>>   host SMMUv3 IDR registers for export to the guest.
>>   Future updates will remove this restriction by adding the
>>   necessary kernel support.
>>   Please find the discussion here[4]
>> -This version does not yet support host SMMUv3 fault handling or
>>   other event notifications. These will be addressed in a
>>   future patch series.
>>
>>
>> The complete branch can be found here:
>> https://github.com/hisilicon/qemu/tree/master-smmuv3-accel-rfcv2-ext
>>
>> I have done basic sanity testing on a HiSilicon platform using the
>> kernel branch here:
>> https://github.com/nicolinc/iommufd/tree/iommufd_msi-rfcv2
>>
>> Example usage:
>>
>> On a HiSilicon platform that has multiple host SMMUv3s, the ACC ZIP VF
>> devices and HNS VF devices are behind different host SMMUv3s. So for a
>> guest, specify two arm-smmuv3-accel devices, each behind a pxb-pcie, as
>> below:
>>
>>
>> ./qemu-system-aarch64 -machine virt,accel=kvm,gic-version=3 \
>> -cpu host -smp cpus=4 -m size=4G,slots=4,maxmem=256G \
>> -bios QEMU_EFI.fd \
>> -object iommufd,id=iommufd0 \
>> -device virtio-blk-device,drive=fs \
>> -drive if=none,file=rootfs.qcow2,id=fs \
>> -device pxb-pcie,id=pcie.1,bus_nr=1,bus=pcie.0 \
>> -device arm-smmuv3-accel,bus=pcie.1 \
>> -device pcie-root-port,id=pcie.port1,bus=pcie.1,chassis=1,pref64-reserve=2M,io-reserve=1K \
>> -device vfio-pci,host=0000:7d:02.1,bus=pcie.port1,iommufd=iommufd0 \
>> -device pcie-root-port,id=pcie.port2,bus=pcie.1,chassis=2,pref64-reserve=2M,io-reserve=1K \
>> -device vfio-pci,host=0000:7d:02.2,bus=pcie.port2,iommufd=iommufd0 \
>> -device pxb-pcie,id=pcie.2,bus_nr=8,bus=pcie.0 \
>> -device arm-smmuv3-accel,bus=pcie.2 \
>> -device pcie-root-port,id=pcie.port3,bus=pcie.2,chassis=3,pref64-reserve=2M,io-reserve=1K \
>> -device vfio-pci,host=0000:75:00.1,bus=pcie.port3,iommufd=iommufd0 \
>> -kernel Image \
>> -append "rdinit=init console=ttyAMA0 root=/dev/vda2 rw earlycon=pl011,0x9000000" \
>> -device virtio-9p-pci,fsdev=p9fs,mount_tag=p9,bus=pcie.0 \
>> -fsdev local,id=p9fs,path=p9root,security_model=mapped \
>> -net none \
>> -nographic
>>
>> Guest will boot with two SMMUv3s,
>> ...
>> arm-smmu-v3 arm-smmu-v3.0.auto: option mask 0x0
>> arm-smmu-v3 arm-smmu-v3.0.auto: ias 44-bit, oas 44-bit (features 0x00008325)
>> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 65536 entries for cmdq
>> arm-smmu-v3 arm-smmu-v3.0.auto: allocated 32768 entries for evtq
>> arm-smmu-v3 arm-smmu-v3.1.auto: option mask 0x0
>> arm-smmu-v3 arm-smmu-v3.1.auto: ias 44-bit, oas 44-bit (features 0x00008325)
>> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 65536 entries for cmdq
>> arm-smmu-v3 arm-smmu-v3.1.auto: allocated 32768 entries for evtq
>>
>> With a pci topology like below,
>>
>> [root@localhost ~]# lspci -tv
>> -+-[0000:00]-+-00.0  Red Hat, Inc. QEMU PCIe Host bridge
>>   |           +-01.0  Red Hat, Inc. QEMU PCIe Expander bridge
>>   |           +-02.0  Red Hat, Inc. QEMU PCIe Expander bridge
>>   |           \-03.0  Virtio: Virtio filesystem
>>   +-[0000:01]-+-00.0-[02]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>>   |           \-01.0-[03]----00.0  Huawei Technologies Co., Ltd. HNS Network Controller (Virtual Function)
>>   \-[0000:08]---00.0-[09]----00.0  Huawei Technologies Co., Ltd. HiSilicon ZIP Engine (Virtual Function)
>>
>> Further tests are always welcome.
>>
>> Please take a look and let me know your feedback!
>>
>> Thanks,
>> Shameer
>>
>> [0] https://lore.kernel.org/qemu-devel/20241108125242.60136-1-shameerali.kolothum.thodi@huawei.com/
>> [1] https://github.com/nicolinc/qemu/commit/3acbb7f3d114d6bb70f4895aa66a9ec28e6561d6
>> [2] https://lore.kernel.org/linux-iommu/cover.1740014950.git.nicolinc@nvidia.com/
>> [3] https://lore.kernel.org/qemu-devel/20250219082228.3303163-1-zhenzhong.duan@intel.com/
>> [4] https://lore.kernel.org/qemu-devel/Z6TLSdwgajmHVmGH@redhat.com/
>