[PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)
Posted by Nicolin Chen 8 months, 1 week ago
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (mostly a VM)
in a way of HW acceleration. This extends the HWPT-based design to support
more advanced virtualization features.

A vCMDQ, introduced by this series as a part of the vIOMMU infrastructure,
represents a HW-supported queue/buffer for a VM to use exclusively, e.g.
  - NVIDIA's virtual command queue
  - AMD vIOMMU's command buffer
either of which is an IOMMU HW feature that directly loads and executes
cache invalidation commands issued by a guest kernel, to shoot down TLB
entries that the HW cached for guest-owned stage-1 page table entries.
This is a big improvement since there is no VM Exit during an
invalidation, compared to the traditional invalidation pathway of
trapping a guest-owned invalidation queue and forwarding those
commands/requests to the host kernel, which eventually fills a HW-owned
queue to execute those commands.

Thus, a vCMDQ object, as an initial use case, is all about a guest-owned
HW command queue that the VMM can allocate/configure depending on the
request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its
allocator IOMMUFD_CMD_VCMDQ_ALLOC, allowing the VMM to forward the
IOMMU-specific queue info, such as the queue base address and size.
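
As a rough feel for the uAPI, a VMM-side allocation could look like the
sketch below. This is illustrative only: the generic wrapper layout and
the IOMMU_VCMDQ_ALLOC / IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV names are
assumptions following iommufd's existing user-data convention, and
iommufd/viommu_id/vcmdq_gpa are presumed to come from the caller; only
the tegra241 driver-data layout is taken from this series:

	#include <err.h>
	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/iommufd.h>

	/* Driver-level data (tegra241-cmdqv), as defined by this series */
	struct iommu_vcmdq_tegra241_cmdqv drv = {
		.vcmdq_id = 0,			/* queue slot in the V-Interface */
		.vcmdq_log2size = 16,		/* 64KB queue */
		.vcmdq_base = vcmdq_gpa,	/* GPA of the guest queue buffer */
	};
	/* Generic wrapper; field names here are assumed for illustration */
	struct iommu_vcmdq_alloc cmd = {
		.size = sizeof(cmd),
		.viommu_id = viommu_id,		/* from IOMMU_VIOMMU_ALLOC */
		.type = IOMMU_VCMDQ_TYPE_TEGRA241_CMDQV,
		.data_len = sizeof(drv),
		.data_uptr = (uintptr_t)&drv,
	};

	if (ioctl(iommufd, IOMMU_VCMDQ_ALLOC, &cmd))
		err(1, "IOMMU_VCMDQ_ALLOC");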

Meanwhile, a guest-owned command queue needs the guest kernel (a command
queue driver) to control the queue by reading/writing its consumer and
producer indexes, which means the command queue HW must allow the guest
kernel direct R/W access to those registers. Introduce an mmap
infrastructure in the iommufd core to support passing through a piece of
MMIO region from the host physical address space to the guest physical
address space. The VMA info (vm_pgoff/size) used by an mmap must be
pre-allocated during IOMMUFD_CMD_VCMDQ_ALLOC and returned to user space
as output driver data by the same ioctl. So, this requires
driver-specific user data support by a vIOMMU object.
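
And as a sketch of the mmap side (the out_* names below are assumptions
standing in for the pre-allocated VMA info described above):

	#include <sys/mman.h>

	/* vm_pgoff/size come back as output driver data from the ioctl */
	void *mmio = mmap(NULL, cmd.out_mmap_length,
			  PROT_READ | PROT_WRITE, MAP_SHARED,
			  iommufd, cmd.out_mmap_offset);
	if (mmio == MAP_FAILED)
		err(1, "mmap");
	/* The VMM then maps this MMIO range into the guest physical
	 * address space, so the guest kernel can R/W the consumer and
	 * producer indexes directly, without a VM Exit.
	 */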

As a real-world use case, this series implements vCMDQ support in the
tegra241-cmdqv driver for the vCMDQ on NVIDIA Grace CPU. In other words,
this is also the Tegra CMDQV series Part-2 (user-space support), reworked
from the previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/

This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vcmdq-v1

Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vcmdq-v1

Thanks
Nicolin

Nicolin Chen (16):
  iommu: Pass in a driver-level user data structure to viommu_alloc op
  iommufd/viommu: Allow driver-specific user data for a vIOMMU object
  iommu: Add iommu_copy_struct_to_user helper
  iommufd: Add iommufd_struct_destroy to revert iommufd_viommu_alloc
  iommufd/selftest: Support user_data in mock_viommu_alloc
  iommufd/selftest: Add coverage for viommu data
  iommufd/viommu: Add driver-allocated vDEVICE support
  iommufd/viommu: Introduce IOMMUFD_OBJ_VCMDQ and its related struct
  iommufd/viommu: Add IOMMUFD_CMD_VCMDQ_ALLOC ioctl
  iommufd: Add mmap interface
  iommufd/selftest: Add coverage for the new mmap interface
  Documentation: userspace-api: iommufd: Update vCMDQ
  iommu/tegra241-cmdqv: Use request_threaded_irq
  iommu/arm-smmu-v3: Add vsmmu_alloc impl op
  iommu/tegra241-cmdqv: Add user-space use support
  iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  24 +-
 drivers/iommu/iommufd/iommufd_private.h       |  20 +-
 drivers/iommu/iommufd/iommufd_test.h          |  17 +
 include/linux/iommu.h                         |  43 ++-
 include/linux/iommufd.h                       |  93 +++++
 include/uapi/linux/iommufd.h                  |  87 +++++
 tools/testing/selftests/iommu/iommufd_utils.h |  21 +-
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     |  26 +-
 .../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c    | 349 +++++++++++++++++-
 drivers/iommu/iommufd/driver.c                |  54 +++
 drivers/iommu/iommufd/main.c                  |  54 ++-
 drivers/iommu/iommufd/selftest.c              |  58 ++-
 drivers/iommu/iommufd/viommu.c                |  78 +++-
 tools/testing/selftests/iommu/iommufd.c       |  34 +-
 .../selftests/iommu/iommufd_fail_nth.c        |   5 +-
 Documentation/userspace-api/iommufd.rst       |  11 +
 16 files changed, 912 insertions(+), 62 deletions(-)

-- 
2.43.0
Re: [PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)
Posted by Vasant Hegde 7 months, 4 weeks ago
Hi Nicolin,


On 4/11/2025 12:07 PM, Nicolin Chen wrote:
> The vIOMMU object is designed to represent a slice of an IOMMU HW for its
> virtualization features shared with or passed to user space (mostly a VM)
> in a way of HW acceleration. This extends the HWPT-based design to support
> more advanced virtualization features.
> 
> A vCMDQ, introduced by this series as a part of the vIOMMU infrastructure,
> represents a HW-supported queue/buffer for a VM to use exclusively, e.g.
>   - NVIDIA's virtual command queue
>   - AMD vIOMMU's command buffer

I assume we can pass multiple buffer details (like GPA, size) from guest to
hypervisor. Is that the correct understanding?


> either of which is an IOMMU HW feature that directly loads and executes
> cache invalidation commands issued by a guest kernel, to shoot down TLB
> entries that the HW cached for guest-owned stage-1 page table entries.
> This is a big improvement since there is no VM Exit during an
> invalidation, compared to the traditional invalidation pathway of
> trapping a guest-owned invalidation queue and forwarding those
> commands/requests to the host kernel, which eventually fills a HW-owned
> queue to execute those commands.
> 
> Thus, a vCMDQ object, as an initial use case, is all about a guest-owned
> HW command queue that the VMM can allocate/configure depending on the
> request from a guest kernel. Introduce a new IOMMUFD_OBJ_VCMDQ and its
> allocator IOMMUFD_CMD_VCMDQ_ALLOC, allowing the VMM to forward the
> IOMMU-specific queue info, such as the queue base address and size.
> 
> Meanwhile, a guest-owned command queue needs the guest kernel (a command
> queue driver) to control the queue by reading/writing its consumer and
> producer indexes, which means the command queue HW must allow the guest
> kernel direct R/W access to those registers. Introduce an mmap
> infrastructure in the iommufd core to support passing through a piece of
> MMIO region from the host physical address space to the guest physical
> address space. The VMA info (vm_pgoff/size) used by an mmap must be
> pre-allocated during IOMMUFD_CMD_VCMDQ_ALLOC and returned to user space
> as output driver data by the same ioctl. So, this requires
> driver-specific user data support by a vIOMMU object.

Nice! Thanks.

-Vasant
Re: [PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)
Posted by Nicolin Chen 7 months, 4 weeks ago
On Wed, Apr 23, 2025 at 12:58:19PM +0530, Vasant Hegde wrote:
> On 4/11/2025 12:07 PM, Nicolin Chen wrote:
> > The vIOMMU object is designed to represent a slice of an IOMMU HW for its
> > virtualization features shared with or passed to user space (mostly a VM)
> > in a way of HW acceleration. This extends the HWPT-based design to support
> > more advanced virtualization features.
> > 
> > A vCMDQ, introduced by this series as a part of the vIOMMU infrastructure,
> > represents a HW-supported queue/buffer for a VM to use exclusively, e.g.
> >   - NVIDIA's virtual command queue
> >   - AMD vIOMMU's command buffer
> 
> I assume we can pass multiple buffer details (like GPA, size) from guest to
> hypervisor. Is that the correct understanding?

Yes. The NVIDIA model passes through a Virtual-Interface to a VM,
and the VM can allocate and map multiple command queues (buffers)
to the V-Interface, by providing each command queue's info in:

+struct iommu_vcmdq_tegra241_cmdqv {
+	__u32 vcmdq_id;
+	__u32 vcmdq_log2size;		// size
+	__aligned_u64 vcmdq_base;	// GPA
 };
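
So, as a rough sketch (reusing the assumed generic wrapper from the
cover letter; only the driver-data layout above is from this series),
the VMM would simply issue one allocation per guest queue:

	/* Hypothetical loop: one IOMMUFD_CMD_VCMDQ_ALLOC per guest queue */
	for (unsigned int i = 0; i < nr_guest_queues; i++) {
		struct iommu_vcmdq_tegra241_cmdqv drv = {
			.vcmdq_id = i,
			.vcmdq_log2size = guest_q[i].log2size,	/* size */
			.vcmdq_base = guest_q[i].gpa,		/* GPA */
		};

		/* wrap drv in the generic alloc struct, then:
		 * ioctl(iommufd, IOMMU_VCMDQ_ALLOC, &cmd);
		 */
	}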

Thanks
Nicolin
Re: [PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)
Posted by Vasant Hegde 7 months, 4 weeks ago
On 4/23/2025 1:15 PM, Nicolin Chen wrote:
> On Wed, Apr 23, 2025 at 12:58:19PM +0530, Vasant Hegde wrote:
>> On 4/11/2025 12:07 PM, Nicolin Chen wrote:
>>> The vIOMMU object is designed to represent a slice of an IOMMU HW for its
>>> virtualization features shared with or passed to user space (mostly a VM)
>>> in a way of HW acceleration. This extends the HWPT-based design to support
>>> more advanced virtualization features.
>>>
>>> A vCMDQ, introduced by this series as a part of the vIOMMU infrastructure,
>>> represents a HW-supported queue/buffer for a VM to use exclusively, e.g.
>>>   - NVIDIA's virtual command queue
>>>   - AMD vIOMMU's command buffer
>>
>> I assume we can pass multiple buffer details (like GPA, size) from guest to
>> hypervisor. Is that the correct understanding?
> 
> Yes. The NVIDIA model passes through a Virtual-Interface to a VM,
> and the VM can allocate and map multiple command queues (buffers)
> to the V-Interface, by providing each command queue's info in:
> 
> +struct iommu_vcmdq_tegra241_cmdqv {
> +	__u32 vcmdq_id;
> +	__u32 vcmdq_log2size;		// size
> +	__aligned_u64 vcmdq_base;	// GPA
>  };

Nice. Thanks for the details.

-Vasant
RE: [PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)
Posted by Tian, Kevin 7 months, 4 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, April 11, 2025 2:38 PM
> 
[...]
> This is a big improvement since there is no VM Exit during an
> invalidation, compared to the traditional invalidation pathway of
> trapping a guest-owned invalidation queue and forwarding those
> commands/requests to the host kernel, which eventually fills a HW-owned
> queue to execute those commands.
> 

Any data to show how big the improvements could be in major
IOMMU usages (kernel DMA, user DMA and SVA)?
Re: [PATCH v1 00/16] iommufd: Add vIOMMU infrastructure (Part-4 vCMDQ)
Posted by Nicolin Chen 7 months, 4 weeks ago
On Thu, Apr 24, 2025 at 08:21:08AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, April 11, 2025 2:38 PM
> > 
> [...]
> > This is a big improvement since there is no VM Exit during an
> > invalidation, compared to the traditional invalidation pathway of
> > trapping a guest-owned invalidation queue and forwarding those
> > commands/requests to the host kernel, which eventually fills a HW-owned
> > queue to execute those commands.
> > 
> 
> Any data to show how big the improvements could be in major
> IOMMU usages (kernel DMA, user DMA and SVA)?

I thought I mentioned the percentage of the gain somewhere,
but seemingly not in this series. Will add.

Thanks!
Nicolin