This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
Tegra241 (Grace) CMDQV as a test instance.

VIOMMU obj is used to represent a virtual interface (iommu) backed by an
underlying IOMMU's HW-accelerated feature for virtualization: for example,
NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.

VQUEUE obj is used to represent a virtual command queue (buffer) backed by
an underlying IOMMU command queue that is passed through for VMs to use
directly: for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.

NVIDIA's CMDQV requires a pair of physical and virtual device Stream IDs to
process ATC invalidation commands issued to the ARM SMMU. So, set/unset_dev_id
ops and ioctls are introduced to VIOMMU.

Also, a passthrough queue has a pair of start and tail pointers/indexes in
the real HW registers, which should be mmapped to user space so that the
hypervisor can map them directly into the VM's MMIO region. Thus, iommufd
needs an mmap op too.

Some todos/opens:
1. Add selftest coverage for the new ioctls.
2. The mmap needs a way to get viommu_id. Currently it gets it from
   vma->vm_pgoff, which might not be ideal.
3. This series has only been verified with a single passthrough device
   behind a physical ARM SMMU. So, devices behind two or more IOMMUs might
   need some additional support (and verification).
4. Request for comments from AMD folks on supporting AMD's vIOMMU feature.
This series is on Github (for review and reference only):
https://github.com/nicolinc/iommufd/commits/vcmdq_user_space-rfc-v1

Real HW tests were conducted with this QEMU branch:
https://github.com/nicolinc/qemu/commits/wip/iommufd_vcmdq/

Thanks

Nicolin Chen (14):
  iommufd: Move iommufd_object to public iommufd header
  iommufd: Swap _iommufd_object_alloc and __iommufd_object_alloc
  iommufd: Prepare for viommu structures and functions
  iommufd: Add struct iommufd_viommu and iommufd_viommu_ops
  iommufd: Add IOMMUFD_OBJ_VIOMMU and IOMMUFD_CMD_VIOMMU_ALLOC
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  iommufd: Add viommu set/unset_dev_id ops
  iommufd: Add IOMMU_VIOMMU_SET_DEV_ID ioctl
  iommufd/selftest: Add IOMMU_VIOMMU_SET_DEV_ID test coverage
  iommufd/selftest: Add IOMMU_TEST_OP_MV_CHECK_DEV_ID
  iommufd: Add struct iommufd_vqueue and its related viommu ops
  iommufd: Add IOMMUFD_OBJ_VQUEUE and IOMMUFD_CMD_VQUEUE_ALLOC
  iommufd: Add mmap infrastructure
  iommu/tegra241-cmdqv: Add user-space use support

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  19 ++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  19 ++
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 284 +++++++++++++++++-
 drivers/iommu/iommufd/Makefile                |   3 +-
 drivers/iommu/iommufd/device.c                |  11 +
 drivers/iommu/iommufd/hw_pagetable.c          |   4 +-
 drivers/iommu/iommufd/iommufd_private.h       |  71 +++--
 drivers/iommu/iommufd/iommufd_test.h          |   5 +
 drivers/iommu/iommufd/main.c                  |  69 ++++-
 drivers/iommu/iommufd/selftest.c              | 100 ++++++
 drivers/iommu/iommufd/viommu.c                | 235 +++++++++++++++
 include/linux/iommu.h                         |  16 +
 include/linux/iommufd.h                       | 100 ++++++
 include/uapi/linux/iommufd.h                  |  98 ++++++
 tools/testing/selftests/iommu/iommufd.c       |  44 +++
 tools/testing/selftests/iommu/iommufd_utils.h |  71 +++++
 16 files changed, 1103 insertions(+), 46 deletions(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c

--
2.43.0
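To-do item 2 in the cover letter notes that the mmap handler currently recovers viommu_id from vma->vm_pgoff. The user-space C sketch below illustrates one possible encoding and is offered only as an example. The caller passes the object ID multiplied by the page size as the mmap(2) offset. The kernel sees that offset in page units as vm_pgoff, so the page index is the ID itself. The sketch_* helpers and the fixed 4 KiB page size are hypothetical and are not this series' uAPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for illustration only; real user space would query
 * sysconf(_SC_PAGESIZE) and the kernel would use PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical user-space helper: turn an object ID into the byte offset
 * passed as the last argument of mmap(2). */
static uint64_t sketch_id_to_mmap_offset(uint32_t viommu_id)
{
	return (uint64_t)viommu_id << SKETCH_PAGE_SHIFT;
}

/* Models the conversion from a byte offset to vm_pgoff. mmap(2) requires
 * the offset to be a multiple of the page size and fails otherwise. */
static bool sketch_offset_to_pgoff(uint64_t offset, uint64_t *vm_pgoff)
{
	if (offset & (SKETCH_PAGE_SIZE - 1))
		return false;
	*vm_pgoff = offset >> SKETCH_PAGE_SHIFT;
	return true;
}

/* Hypothetical kernel-side model: the page index is the object ID. */
static uint32_t sketch_pgoff_to_id(uint64_t vm_pgoff)
{
	return (uint32_t)vm_pgoff;
}
```

A caller might then use mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, sketch_id_to_mmap_offset(viommu_id)), where fd is assumed to be the open /dev/iommu descriptor. One plausible drawback is that this design spends the entire offset on an object ID. That leaves no room to select among several regions of the same object without a further encoding.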
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, April 13, 2024 11:47 AM
>
> This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> Tegra241 (Grace) CMDQV as a test instance.
>
> VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> underlying IOMMU's HW-accelerated feature for virtualization: for example,
> NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
>
> VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> an underlying IOMMU command queue that is passed through for VMs to use
> directly: for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
>

Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
VQUEUE sounds broader than a cmd queue...
On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, April 13, 2024 11:47 AM
> >
> > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > Tegra241 (Grace) CMDQV as a test instance.
> >
> > VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> > underlying IOMMU's HW-accelerated feature for virtualization: for example,
> > NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> >
> > VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> > an underlying IOMMU command queue that is passed through for VMs to use
> > directly: for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> >
>
> Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> VQUEUE sounds broader than a cmd queue...

Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
only difference is direction; there is still a doorbell/etc.

Jason
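Jason's point that a fault or event queue differs from a command queue mainly in direction can be illustrated with a generic producer/consumer ring. The code below is a hypothetical user-space model, not code from this series. The structure, the sketch_* names, and the free-running 32-bit indexes are assumptions. Real hardware, such as SMMU queues, defines its own index and wrap-bit encoding in its registers.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_QSZ 8u /* number of entries; must be a power of two */

enum sketch_qdir {
	SKETCH_Q_GUEST_TO_HW, /* e.g. a command queue: guest produces, HW consumes */
	SKETCH_Q_HW_TO_GUEST, /* e.g. an event/fault queue: HW produces, guest consumes */
};

/* Hypothetical passthrough queue model. The indexes are free-running, so
 * prod - cons (in unsigned arithmetic) is the occupancy even after wrap. */
struct sketch_queue {
	enum sketch_qdir dir; /* informational only: the logic below ignores it */
	uint64_t ent[SKETCH_QSZ];
	uint32_t prod; /* producer ("tail") index, advanced by a doorbell write */
	uint32_t cons; /* consumer ("head") index */
};

static bool sketch_q_full(const struct sketch_queue *q)
{
	return q->prod - q->cons == SKETCH_QSZ;
}

static bool sketch_q_empty(const struct sketch_queue *q)
{
	return q->prod == q->cons;
}

/* The same path serves either direction; only the caller differs. */
static bool sketch_q_push(struct sketch_queue *q, uint64_t e)
{
	if (sketch_q_full(q))
		return false; /* producer must wait for the consumer to advance */
	q->ent[q->prod & (SKETCH_QSZ - 1)] = e;
	q->prod++; /* in HW this corresponds to the index/doorbell update */
	return true;
}

static bool sketch_q_pop(struct sketch_queue *q, uint64_t *e)
{
	if (sketch_q_empty(q))
		return false;
	*e = q->ent[q->cons & (SKETCH_QSZ - 1)];
	q->cons++;
	return true;
}
```

Nothing in the push/pop logic depends on dir. Only the side that calls sketch_q_push() changes, which is the argument for letting one VQUEUE object cover fault and event queues as well.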
On Wed, May 22, 2024 at 01:48:18PM -0300, Jason Gunthorpe wrote:
> On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Saturday, April 13, 2024 11:47 AM
> > >
> > > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > > Tegra241 (Grace) CMDQV as a test instance.
> > >
> > > VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> > > underlying IOMMU's HW-accelerated feature for virtualization: for example,
> > > NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> > >
> > > VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> > > an underlying IOMMU command queue that is passed through for VMs to use
> > > directly: for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> > >
> >
> > Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> > VQUEUE sounds broader than a cmd queue...
>
> Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
> only difference is direction; there is still a doorbell/etc.

Yea, SMMU also has an Event Queue and a PRI queue. Though I haven't had
time to sit down and look at Baolu's work closely, the uAPI seems to be a
unified one for all IOMMUs. I have no intention of opposing that design,
but maybe there could be an alternative in a somewhat HW-specific
language, as we do for invalidation? Or is it not worth it?

Thanks
Nicolin
On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> On Wed, May 22, 2024 at 01:48:18PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Saturday, April 13, 2024 11:47 AM
> > > >
> > > > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > > > Tegra241 (Grace) CMDQV as a test instance.
> > > >
> > > > VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> > > > underlying IOMMU's HW-accelerated feature for virtualization: for example,
> > > > NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> > > >
> > > > VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> > > > an underlying IOMMU command queue that is passed through for VMs to use
> > > > directly: for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> > > >
> > >
> > > Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> > > VQUEUE sounds broader than a cmd queue...
> >
> > Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
> > only difference is direction; there is still a doorbell/etc.
>
> Yea, SMMU also has an Event Queue and a PRI queue. Though I haven't had
> time to sit down and look at Baolu's work closely, the uAPI seems to be a
> unified one for all IOMMUs. I have no intention of opposing that design,
> but maybe there could be an alternative in a somewhat HW-specific
> language, as we do for invalidation? Or is it not worth it?

I was thinking it's not worth it. I expect the gain here is to do as AMD
has done and make the HW DMA the queues directly into guest memory.

IMHO the primary issue with the queues is DoS, as having any queue shared
across VMs is dangerous in that way. Allowing each VIOMMU to have its own
private queue and its own flow control helps with that.

Jason
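Jason's DoS argument can be made concrete with a toy occupancy model that compares one queue shared by all VMs against one private queue per VIOMMU. The sketch below is hypothetical and illustrative only: it tracks occupancy, not entries, and the consumer never drains, which is the worst case. VM 0 floods until its submissions are refused, and the model then checks whether VM 1 can still submit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: only occupancy is tracked, not entry contents. */
struct sketch_fifo {
	size_t used;
	size_t cap;
};

static bool sketch_fifo_push(struct sketch_fifo *f)
{
	if (f->used == f->cap)
		return false; /* back-pressure: the producer must wait */
	f->used++;
	return true;
}

/* Shared design: every VM's entries land in the same queue. */
static bool sketch_shared_submit(struct sketch_fifo *shared, int vm)
{
	(void)vm; /* a shared queue cannot tell VMs apart for flow control */
	return sketch_fifo_push(shared);
}

/* Per-VIOMMU design: each VM has a private queue with its own limit. */
static bool sketch_private_submit(struct sketch_fifo *per_vm, int vm)
{
	return sketch_fifo_push(&per_vm[vm]);
}

/* VM 0 floods, then report whether VM 1 can still submit one entry. */
static bool sketch_victim_survives_shared(size_t cap)
{
	struct sketch_fifo shared = { 0, cap };

	while (sketch_shared_submit(&shared, 0))
		;
	return sketch_shared_submit(&shared, 1);
}

static bool sketch_victim_survives_private(size_t cap)
{
	struct sketch_fifo per_vm[2] = { { 0, cap }, { 0, cap } };

	while (sketch_private_submit(per_vm, 0))
		;
	return sketch_private_submit(per_vm, 1);
}
```

With a real, draining consumer, a flooding VM cannot block a shared queue forever. It can still occupy most of the queue's capacity and delay the other VMs' entries, whereas private queues confine the back-pressure to the VM that causes it.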
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 23, 2024 7:29 AM
>
> On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> > On Wed, May 22, 2024 at 01:48:18PM -0300, Jason Gunthorpe wrote:
> > > On Wed, May 22, 2024 at 08:40:00AM +0000, Tian, Kevin wrote:
> > > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > > Sent: Saturday, April 13, 2024 11:47 AM
> > > > >
> > > > > This is an experimental RFC series for VIOMMU infrastructure, using NVIDIA
> > > > > Tegra241 (Grace) CMDQV as a test instance.
> > > > >
> > > > > VIOMMU obj is used to represent a virtual interface (iommu) backed by an
> > > > > underlying IOMMU's HW-accelerated feature for virtualization: for example,
> > > > > NVIDIA's VINTF (v-interface for CMDQV) and AMD's vIOMMU.
> > > > >
> > > > > VQUEUE obj is used to represent a virtual command queue (buffer) backed by
> > > > > an underlying IOMMU command queue that is passed through for VMs to use
> > > > > directly: for example, NVIDIA's Virtual Command Queue and AMD's Command Buffer.
> > > > >
> > > >
> > > > Is VCMDQ more accurate? AMD also supports fault queue passthrough, so
> > > > VQUEUE sounds broader than a cmd queue...
> > >
> > > Is there a reason VQUEUE couldn't handle the fault/etc queues too? The
> > > only difference is direction; there is still a doorbell/etc.

No reason. The description made it specific to a cmd queue, which gave me
the impression that we may want to create a separate fault queue.

> >
> > Yea, SMMU also has an Event Queue and a PRI queue. Though I haven't had
> > time to sit down and look at Baolu's work closely, the uAPI seems to be a
> > unified one for all IOMMUs. I have no intention of opposing that design,
> > but maybe there could be an alternative in a somewhat HW-specific
> > language, as we do for invalidation? Or is it not worth it?
>
> I was thinking it's not worth it. I expect the gain here is to do as AMD
> has done and make the HW DMA the queues directly into guest memory.
>
> IMHO the primary issue with the queues is DoS, as having any queue shared
> across VMs is dangerous in that way. Allowing each VIOMMU to have its own
> private queue and its own flow control helps with that.
>

And also a shorter delivery path with less data copying?
On Wed, May 22, 2024 at 11:43:51PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Thursday, May 23, 2024 7:29 AM
> > On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> > > Yea, SMMU also has an Event Queue and a PRI queue. Though I haven't had
> > > time to sit down and look at Baolu's work closely, the uAPI seems to be a
> > > unified one for all IOMMUs. I have no intention of opposing that design,
> > > but maybe there could be an alternative in a somewhat HW-specific
> > > language, as we do for invalidation? Or is it not worth it?
> >
> > I was thinking it's not worth it. I expect the gain here is to do as AMD
> > has done and make the HW DMA the queues directly into guest memory.
> >
> > IMHO the primary issue with the queues is DoS, as having any queue shared
> > across VMs is dangerous in that way. Allowing each VIOMMU to have its own
> > private queue and its own flow control helps with that.
> >
>
> And also a shorter delivery path with less data copying?

Should I interpret that as a yes for fault reporting via VQUEUE?

Only AMD's HW can DMA the events directly to the guest queue memory. All
the others need a backward translation of (at least) a physical dev ID to
a virtual dev ID. By the way, this is now doable in the kernel with the
ongoing vdev_id design. So could the kernel then write to the guest memory
directly to report events?

Thanks
Nicolin
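The set/unset_dev_id ops from the cover letter and the backward translation Nicolin describes here work in opposite directions over the same set of (virtual, physical) device ID pairs. The sketch below is a hypothetical user-space model, not the RFC's vdev_id implementation, which this thread does not show. The table size, sketch_* names, and error codes are invented for illustration. The cover letter indicates that CMDQV consumes such pairs in hardware for ATC invalidations, so the forward lookup here only models that step in software. A kernel version would more plausibly use an indexed structure such as an xarray than a linear scan.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_DEVS 4 /* hypothetical per-VIOMMU limit */

struct sketch_dev_id_pair {
	bool valid;
	uint32_t vdev_id; /* ID the guest sees, e.g. a virtual Stream ID */
	uint32_t pdev_id; /* ID the HW uses, e.g. a physical Stream ID */
};

struct sketch_viommu {
	struct sketch_dev_id_pair devs[SKETCH_MAX_DEVS];
};

/* Models a set_dev_id op: bind a pair, rejecting reuse of either ID. */
static int sketch_set_dev_id(struct sketch_viommu *v, uint32_t vid, uint32_t pid)
{
	int free_slot = -1;

	for (int i = 0; i < SKETCH_MAX_DEVS; i++) {
		if (!v->devs[i].valid) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (v->devs[i].vdev_id == vid || v->devs[i].pdev_id == pid)
			return -EEXIST;
	}
	if (free_slot < 0)
		return -ENOSPC;
	v->devs[free_slot] = (struct sketch_dev_id_pair){ true, vid, pid };
	return 0;
}

/* Returns the slot bound to a virtual ID, or -1 if there is none. */
static int sketch_find_vid(const struct sketch_viommu *v, uint32_t vid)
{
	for (int i = 0; i < SKETCH_MAX_DEVS; i++)
		if (v->devs[i].valid && v->devs[i].vdev_id == vid)
			return i;
	return -1;
}

/* Models an unset_dev_id op. */
static int sketch_unset_dev_id(struct sketch_viommu *v, uint32_t vid)
{
	int i = sketch_find_vid(v, vid);

	if (i < 0)
		return -ENOENT;
	v->devs[i].valid = false;
	return 0;
}

/* Forward: replace the virtual ID in a guest command, e.g. an ATC invalidation. */
static bool sketch_v_to_p(const struct sketch_viommu *v, uint32_t vid, uint32_t *pid)
{
	int i = sketch_find_vid(v, vid);

	if (i < 0)
		return false;
	*pid = v->devs[i].pdev_id;
	return true;
}

/* Backward: replace the physical ID in a HW event before the guest sees it. */
static bool sketch_p_to_v(const struct sketch_viommu *v, uint32_t pid, uint32_t *vid)
{
	for (int i = 0; i < SKETCH_MAX_DEVS; i++) {
		if (v->devs[i].valid && v->devs[i].pdev_id == pid) {
			*vid = v->devs[i].vdev_id;
			return true;
		}
	}
	return false;
}
```

Because both directions read the same table, unbinding a device removes it from invalidation forwarding and from event reporting at once. A single source of truth seems preferable to two separately maintained maps.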
On Wed, May 22, 2024 at 08:09:12PM -0700, Nicolin Chen wrote:
> On Wed, May 22, 2024 at 11:43:51PM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Thursday, May 23, 2024 7:29 AM
> > > On Wed, May 22, 2024 at 12:47:19PM -0700, Nicolin Chen wrote:
> > > > Yea, SMMU also has an Event Queue and a PRI queue. Though I haven't had
> > > > time to sit down and look at Baolu's work closely, the uAPI seems to be a
> > > > unified one for all IOMMUs. I have no intention of opposing that design,
> > > > but maybe there could be an alternative in a somewhat HW-specific
> > > > language, as we do for invalidation? Or is it not worth it?
> > >
> > > I was thinking it's not worth it. I expect the gain here is to do as AMD
> > > has done and make the HW DMA the queues directly into guest memory.
> > >
> > > IMHO the primary issue with the queues is DoS, as having any queue shared
> > > across VMs is dangerous in that way. Allowing each VIOMMU to have its own
> > > private queue and its own flow control helps with that.
> > >
> >
> > And also a shorter delivery path with less data copying?
>
> Should I interpret that as a yes for fault reporting via VQUEUE?
>
> Only AMD's HW can DMA the events directly to the guest queue memory. All
> the others need a backward translation of (at least) a physical dev ID to
> a virtual dev ID. By the way, this is now doable in the kernel with the
> ongoing vdev_id design. So could the kernel then write to the guest memory
> directly to report events?

I don't think we should get into the kernel doing direct access at this
point; let's focus on basic functionality before we get to
micro-optimizations like that.

So long as the API could support doing something like that, it could be
done after benchmarking/etc.

Jason