[RFC 0/8] iommufd: Enable noiommu mode for cdev

Jacob Pan posted 8 patches 2 hours ago
drivers/iommu/Kconfig                        |  25 +++
drivers/iommu/Makefile                       |   1 +
drivers/iommu/generic_pt/fmt/Makefile        |   1 +
drivers/iommu/generic_pt/fmt/iommu_noiommu.c |  10 +
drivers/iommu/iommu.c                        |  12 +-
drivers/iommu/iommufd/hw_pagetable.c         |   8 +
drivers/iommu/iommufd/io_pagetable.c         |  44 ++++
drivers/iommu/iommufd/ioas.c                 |  24 +++
drivers/iommu/iommufd/iommufd_private.h      |   3 +
drivers/iommu/iommufd/main.c                 |   3 +
drivers/iommu/iommufd/vfio_compat.c          |   6 +-
drivers/iommu/noiommu.c                      | 204 +++++++++++++++++++
drivers/vfio/Kconfig                         |   3 +-
drivers/vfio/device_cdev.c                   |   6 +
drivers/vfio/group.c                         |   2 +-
drivers/vfio/vfio.h                          |  38 +++-
drivers/vfio/vfio_main.c                     |  20 +-
include/linux/generic_pt/iommu.h             |   5 +
include/linux/iommu.h                        |   1 +
include/linux/iommufd.h                      |   4 +-
include/linux/vfio.h                         |   2 +
include/uapi/linux/iommufd.h                 |  25 +++
22 files changed, 431 insertions(+), 16 deletions(-)
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_noiommu.c
create mode 100644 drivers/iommu/noiommu.c
VFIO's unsafe_noiommu_mode has long provided a way for userspace drivers
to operate on platforms lacking a hardware IOMMU. Today, IOMMUFD also
supports No-IOMMU mode for group-based devices under vfio_compat mode.
However, IOMMUFD's native character device (cdev) does not yet implement
No-IOMMU mode, which is the purpose of this series. In summary:

|-------------------------+------+---------------|
| Device access mode      | VFIO | IOMMUFD       |
|-------------------------+------+---------------|
| group /dev/vfio/$GROUP  | Yes  | Yes           |
|-------------------------+------+---------------|
| cdev /dev/vfio/devices/ | No   | This patch    |
|-------------------------+------+---------------|

Beyond enabling cdev for IOMMUFD, this series also addresses the following
deficiencies in the current No-IOMMU mode, as identified by Jason [1]:
- Devices operating under No-IOMMU mode are limited to device-level UAPI
  access, without container or IOAS-level capabilities. Consequently,
  user-space drivers lack structured mechanisms for page pinning and often
  resort to mlock(), which is less robust than pin_user_pages() used for
  devices backed by a physical IOMMU. For example, mlock() does not prevent
  page migration.
- There is no architectural mechanism for obtaining physical addresses for
  DMA. As a workaround, user-space drivers frequently rely on
  /proc/self/pagemap tricks or hardcoded values.
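For illustration, the pagemap workaround typically looks something like the minimal userspace sketch below. The helper names are hypothetical, and the bit layout follows the kernel's pagemap documentation. Since Linux 4.2 the PFN field reads as zero without CAP_SYS_ADMIN. The result is also only meaningful while the page stays resident and unmigrated, which mlock() alone does not guarantee.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PFN_MASK ((1ULL << 55) - 1) /* bits 0-54: page frame number */
#define PM_PRESENT  (1ULL << 63)       /* bit 63: page present in RAM */

/* Read the raw 64-bit pagemap entry for vaddr; returns 0 on success. */
static int pagemap_entry(uintptr_t vaddr, uint64_t *entry)
{
	long psz = sysconf(_SC_PAGESIZE);
	int fd;
	ssize_t n;

	assert(psz > 0);
	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;
	/* One 8-byte entry per virtual page */
	n = pread(fd, entry, sizeof(*entry),
		  (off_t)(vaddr / psz) * sizeof(*entry));
	close(fd);
	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/*
 * Translate vaddr to a physical address. Returns 0 if the page is not
 * present or the PFN is hidden (unprivileged readers see PFN 0).
 */
static uint64_t virt_to_phys_pagemap(uintptr_t vaddr)
{
	long psz = sysconf(_SC_PAGESIZE);
	uint64_t e, pfn;

	if (pagemap_entry(vaddr, &e) || !(e & PM_PRESENT))
		return 0;
	pfn = e & PM_PFN_MASK;
	if (!pfn)
		return 0;
	return pfn * psz + vaddr % psz;
}
```

Nothing in this scheme pins the page, which is exactly the gap the IOAS-based approach closes.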

By introducing a dummy IOMMU driver, this patch brings No-IOMMU mode closer
to full citizenship within the IOMMU subsystem. In addition to addressing
the two deficiencies mentioned above, the expectation is that it will also
enable No-IOMMU devices to seamlessly participate in KHO [2]. Furthermore,
these devices will use the IOMMUFD-based ownership checking model for
VFIO_DEVICE_PCI_HOT_RESET, eliminating the need for an iommufd_access object
as required in a previous attempt [3].

For in-kernel DMA, the DMA API uses direct mapping only, since this driver
provides only an identity domain.

The key implementation points are as follows:

1) Explicitly add a new cdev with a "noiommu-" prefix, e.g.
/dev/vfio/
|-- 7
|-- devices
|   `-- noiommu-vfio0
`-- vfio

2) Add a new dummy IOMMU driver that claims all PCI devices under its device
scope, e.g.:
$ ls /sys/class/iommu/noiommu/devices/
0000:00:00.0  0000:00:02.0  0000:00:04.0  0000:01:00.0

3) Leverage Jason's generic iommupt [4] for IOVA management, using a mock
AMDv1 page table format. The IOVA is not used for DMA; userspace drivers use
it as a key to look up the physical address for DMA.

4) Support IOAS attachment, map/unmap, and automatic iommu_domain/HWPT
allocation. Page pinning is done exactly as for devices backed by a physical
IOMMU.

5) Add a new IOMMUFD ioctl, IOMMU_IOAS_GET_PA, to retrieve the physical
address for a mock IOVA.
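Conceptually, the mock page table acts as a lookup structure keyed by IOVA. The toy model below is not iommupt or the AMDv1 format. It is a hypothetical flat table, with illustrative names only, that shows the translation semantics a GET_PA-style lookup provides: the page's physical address plus the offset within the page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL

/* One mapped range: IOVA [iova, iova + npages * page) -> per-page PAs. */
struct toy_mapping {
	uint64_t iova;            /* page-aligned start of the IOVA range */
	size_t npages;
	const uint64_t *page_pa;  /* physical address of each pinned page */
};

/*
 * Look up the physical address backing @iova. On success store
 * page PA + offset-in-page in *pa and return 0; return -1 if unmapped.
 */
static int toy_iova_to_pa(const struct toy_mapping *maps, size_t nmaps,
			  uint64_t iova, uint64_t *pa)
{
	assert(pa != NULL);
	for (size_t i = 0; i < nmaps; i++) {
		const struct toy_mapping *m = &maps[i];
		uint64_t end = m->iova + m->npages * TOY_PAGE_SIZE;

		if (iova < m->iova || iova >= end)
			continue;
		size_t idx = (iova - m->iova) / TOY_PAGE_SIZE;
		*pa = m->page_pa[idx] + (iova % TOY_PAGE_SIZE);
		return 0;
	}
	return -1;
}
```

Because the IOVA never reaches hardware, any allocator that yields unique keys would work. Reusing iommupt keeps the IOAS code paths identical to real IOMMUs.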

Enabling noiommu mode is backward compatible with VFIO, i.e. it is enabled
the same way:
echo 1 > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode
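A userspace driver may want to check this parameter before opening a noiommu cdev. Below is a minimal sketch with a hypothetical helper name. It assumes only the standard sysfs representation of boolean module parameters, which read back as "Y" or "N".

```c
#include <assert.h>
#include <stdio.h>

#define NOIOMMU_PARAM "/sys/module/vfio/parameters/enable_unsafe_noiommu_mode"

/*
 * Return 1 if the boolean module parameter at @path is enabled, 0 if it
 * is disabled, or -1 if it cannot be read (e.g. vfio is not loaded).
 */
static int read_bool_param(const char *path)
{
	FILE *f;
	int c;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);  /* only the first character matters */
	fclose(f);
	if (c == 'Y' || c == 'y' || c == '1')
		return 1;
	if (c == 'N' || c == 'n' || c == '0')
		return 0;
	return -1;
}
```

For example, `if (read_bool_param(NOIOMMU_PARAM) != 1)` lets a driver bail out early with a clear message instead of failing on a missing cdev.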

Other than that, using a noiommu cdev is nearly identical to using a normal
IOMMU-backed device, with the following exceptions:
1) Open /dev/vfio/devices/noiommu-vfio0 instead of /dev/vfio/devices/vfio0.
2) HWPT objects cannot be explicitly allocated from userspace (e.g. via
IOMMU_HWPT_ALLOC).
3) IOVAs returned by IOMMU_IOAS_MAP (whether or not IOMMU_IOAS_MAP_FIXED_IOVA
is set) are not usable for DMA. Instead, they are used as keys to look up
physical addresses.
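Exception 1 can be handled with a small path helper. This is a sketch with hypothetical helper names. It assumes only the "noiommu-" cdev naming described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NOIOMMU_CDEV_PREFIX "/dev/vfio/devices/noiommu-"

/*
 * Build the VFIO cdev path for device index @idx, adding the "noiommu-"
 * prefix for No-IOMMU devices. Returns 0 on success, -1 if @buf is too small.
 */
static int vfio_cdev_path(char *buf, size_t len, int idx, bool noiommu)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/dev/vfio/devices/%svfio%d",
		     noiommu ? "noiommu-" : "", idx);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Return true if @path names a No-IOMMU VFIO cdev. */
static bool is_noiommu_cdev(const char *path)
{
	assert(path != NULL);
	return strncmp(path, NOIOMMU_CDEV_PREFIX,
		       sizeof(NOIOMMU_CDEV_PREFIX) - 1) == 0;
}
```

Keeping the distinct name means a driver cannot open a No-IOMMU device by accident through the regular vfioN path.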

For example:
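	int iommufd = open("/dev/iommu", O_RDWR);
	int devfd = open("/dev/vfio/devices/noiommu-vfio0", O_RDWR);

	struct iommu_ioas_alloc alloc = { .size = sizeof(alloc) };
	ioctl(iommufd, IOMMU_IOAS_ALLOC, &alloc);
	__u32 ioas_id = alloc.out_ioas_id;

	struct vfio_device_bind_iommufd bind = {
		.argsz = sizeof(bind),
		.iommufd = iommufd,
	};
	ioctl(devfd, VFIO_DEVICE_BIND_IOMMUFD, &bind);

	/* Attach to the IOAS; the HWPT is allocated automatically */
	struct vfio_device_attach_iommufd_pt attach = {
		.argsz = sizeof(attach),
		.pt_id = ioas_id,
	};
	ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach);

	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct iommu_ioas_map map = {
		.size = sizeof(map),
		.flags = IOMMU_IOAS_MAP_READABLE | IOMMU_IOAS_MAP_WRITEABLE,
		.ioas_id = ioas_id,
		.user_va = (uintptr_t)buf,
		.length = len,
	};
	/* Without IOMMU_IOAS_MAP_FIXED_IOVA the kernel returns map.iova */
	ioctl(iommufd, IOMMU_IOAS_MAP, &map);

	struct iommu_ioas_get_pa get_pa = {
		.size = sizeof(get_pa),
		.flags = 0,
		.ioas_id = ioas_id,
		.iova = map.iova,	/* used only as a lookup key */
		.length = 0,
		.phys = 0,
	};
	ioctl(iommufd, IOMMU_IOAS_GET_PA, &get_pa);
	/* Do DMA with PA in get_pa.phys */

	struct iommu_ioas_unmap unmap = {
		.size = sizeof(unmap),
		.ioas_id = ioas_id,
		.iova = map.iova,
		.length = len,
	};
	ioctl(iommufd, IOMMU_IOAS_UNMAP, &unmap);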

There are still a few known issues I am trying to work through; suggestions
are welcome.
- A "late IOMMU probe at driver bind, something fishy here!" warning is
  reported. This is likely because PCI devices are artificially added to the
  dummy IOMMU's device scope (during IOMMU probe) without early fwspec
  initialization.
- Physical address lookup returns only the starting address and the default
  page size. It would probably be more useful to return the range of
  physically contiguous memory.
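One hypothetical way to compute such a range is sketched here in userspace terms only: coalesce the physical addresses of consecutive IOVA pages until a discontinuity is found. An in-kernel version would presumably walk the page table instead. The helper below is illustrative and is not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given the physical addresses of consecutive IOVA pages (e.g. gathered
 * by calling IOMMU_IOAS_GET_PA once per page), return how many bytes
 * starting at page 0 are physically contiguous.
 */
static uint64_t contiguous_bytes(const uint64_t *page_pa, size_t npages,
				 uint64_t page_size)
{
	size_t i;

	assert(page_size != 0);
	if (npages == 0)
		return 0;
	/* Stop at the first page that does not directly follow its predecessor */
	for (i = 1; i < npages; i++)
		if (page_pa[i] != page_pa[i - 1] + page_size)
			break;
	return (uint64_t)i * page_size;
}
```

Returning such a length alongside the start address would let a driver build one DMA descriptor per contiguous run rather than one per page.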

Thanks,

Jacob

References:
[1] https://lore.kernel.org/linux-iommu/20250603175403.GA407344@nvidia.com/
[2] https://lore.kernel.org/linux-pci/20251027134430.00007e46@linux.microsoft.com/
[3] https://lore.kernel.org/kvm/20230522115751.326947-1-yi.l.liu@intel.com/
[4] https://lore.kernel.org/linux-iommu/4-v7-ab019a8791e2+175b8-iommu_pt_jgg@nvidia.com/T/#u


Jacob Pan (8):
  iommu: Make iommu_device_register_bus available beyond selftest
  iommu: Add a helper to check if any iommu device is registered
  iommufd: Add a mock page table format for noiommu mode
  iommu: Add a dummy driver for noiommu mode
  vfio: IOMMUFD relax requirement for noiommu mode
  vfio: Rename and remove compat from noiommu set function
  iommu: Enable cdev noiommu mode under iommufd
  iommufd: Add an ioctl IOMMU_IOAS_GET_PA to query PA from IOVA

-- 
2.34.1