This series introduces a new vIOMMU infrastructure and related ioctls.

IOMMUFD has been using the HWPT infrastructure for all cases, including
nested IO page table support. Yet, there are limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable) with the HWPT infrastructure alone: a
parent HWPT typically holds one stage-2 IO pagetable and tags it with only
one ID in the cache entries. When sharing one large stage-2 IO pagetable
across physical IOMMU instances, that one ID may not always be available
across all the IOMMU instances. In other words, it's ideal for SW to have
a different container for the stage-2 IO pagetable so it can hold another
ID that's available. For this "different container", add vIOMMU, an
additional layer to hold extra virtualization information:

 _______________________________________________________________________
|                      iommufd (with vIOMMU)                            |
|                                                                       |
|                             [5]                                       |
|                        _____________                                  |
|                       |             |                                 |
|      [1]              |    vIOMMU   |    [4]              [2]         |
| ________________      |             |   _____________    ________     |
||                |     |     [3]     |  |             |  |        |    |
||      IOAS      |<----|(HWPT_PAGING)|<-| HWPT_NESTED |<-| DEVICE |    |
||________________|     |_____________|  |_____________|  |________|    |
|        |                     |                |              |        |
|________|_____________________|________________|______________|________|
         |                     |                |              |
         |               ______v_____     ______v_____       ___v__
         | PFN storage  |  (paging)  |   |  (nested)  |     |struct|
         |------------->|iommu_domain|<--|iommu_domain|<----|device|
                        |____________|   |____________|     |______|

The vIOMMU object should be seen as a slice of a physical IOMMU instance
that is passed to or shared with a VM. That can be some HW/SW resources:
 - Security namespace for guest owned ID, e.g. guest-controlled cache tags
 - Access to a sharable nesting parent pagetable across physical IOMMUs
 - Virtualization of various platform IDs, e.g. RIDs and others
 - Delivery of paravirtualized invalidation
 - Direct assigned invalidation queues
 - Direct assigned interrupts
 - Non-affiliated event reporting

On a multi-IOMMU system, one vIOMMU object must be instantiated per
physical IOMMU that is passed (via devices) to a guest VM, while being
able to hold the shareable parent HWPT. Each vIOMMU then just needs to
allocate its own individual ID to tag its own cache:

                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested0 |--->| viommu0 ------------------
 ----------------    |         |      IDx       |
                     ----------------------------
                     ----------------------------
 ----------------    |         |  paging_hwpt0  |
 | hwpt_nested1 |--->| viommu1 ------------------
 ----------------    |         |      IDy       |
                     ----------------------------

As an initial part-1, add the IOMMUFD_CMD_VIOMMU_ALLOC ioctl for the
allocation only. Later series will add more data structures and their
ioctls.

As for the implementation of the series, add an IOMMU_VIOMMU_TYPE_DEFAULT
type for a core-allocated, core-managed vIOMMU object, allowing drivers
to simply hook a default viommu ops for viommu-based invalidation alone.
Also add support for a driver-specific type of vIOMMU allocation, and
implement it in the ARM SMMUv3 driver for a real world use case.

More vIOMMU-based structs and ioctls will be introduced in the follow-up
series to support vDEVICE, vIRQ (vEVENT) and VQUEUE objects.
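To make that flow concrete, here is a minimal user space sketch of the
allocation sequence described above (two vIOMMUs, one per physical IOMMU,
sharing one nesting parent HWPT). Field names follow the uAPI struct added
by this series; the helper name, IDs, and error handling are illustrative
only, and this is a fragment rather than a full program:

	#include <sys/ioctl.h>
	#include <linux/iommufd.h>

	static __u32 viommu_alloc(int iommufd, __u32 dev_id, __u32 parent_hwpt_id)
	{
		struct iommu_viommu_alloc cmd = {
			.size = sizeof(cmd),
			.type = IOMMU_VIOMMU_TYPE_DEFAULT,
			.dev_id = dev_id,          /* a device behind this physical IOMMU */
			.hwpt_id = parent_hwpt_id, /* the shareable stage-2 HWPT */
		};

		if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
			return 0;
		return cmd.out_viommu_id;
	}

	/* one vIOMMU per physical IOMMU, both tagging the same stage-2 */
	__u32 viommu0 = viommu_alloc(iommufd, dev_id0, paging_hwpt0);
	__u32 viommu1 = viommu_alloc(iommufd, dev_id1, paging_hwpt0);

Each returned vIOMMU ID can then be passed as the pt_id to
IOMMU_HWPT_ALLOC to allocate the per-vIOMMU hwpt_nested shown above.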
Although we repurposed the vIOMMU object from an earlier RFC, just for a
reference:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v3
(Pairing QEMU branch for testing will be provided with the part-2 series)

Changelog
v3
 * Rebased on top of Jason's nesting v3 series
   https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvidia.com/
 * Split the series into smaller parts
 * Added Jason's Reviewed-by
 * Added back viommu->iommu_dev
 * Added support for driver-allocated vIOMMU vs. core-allocated
 * Dropped arm_smmu_cache_invalidate_user
 * Added an iommufd_test_wait_for_users() in selftest
 * Reworked test code to make viommu an individual FIXTURE
 * Added missing TEST_LENGTH case for the new ioctl command
v2
 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
 * Limited vdev_id to one per idev
 * Added a rw_sem to protect the vdev_id list
 * Reworked driver-level APIs with proper lockings
 * Added a new viommu_api file for IOMMUFD_DRIVER config
 * Dropped useless iommu_dev point from the viommu structure
 * Added missing index numbers to new types in the uAPI header
 * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
 * Reworked mock_viommu_cache_invalidate() using the new iommu helper
 * Reordered details of set/unset_vdev_id handlers for proper lockings
v1
 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/

Thanks!
Nicolin

Nicolin Chen (11):
  iommufd: Move struct iommufd_object to public iommufd header
  iommufd: Rename _iommufd_object_alloc to iommufd_object_alloc_elm
  iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
  iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
  iommu: Pass in a viommu pointer to domain_alloc_user op
  iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
  iommufd/selftest: Add refcount to mock_iommu_device
  iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
  iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
  Documentation: userspace-api: iommufd: Update vIOMMU
  iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support

 drivers/iommu/iommufd/Makefile                |  5 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 18 ++++
 drivers/iommu/iommufd/iommufd_private.h       | 23 ++---
 drivers/iommu/iommufd/iommufd_test.h          |  2 +
 include/linux/iommu.h                         | 15 +++
 include/linux/iommufd.h                       | 52 +++++++++++
 include/uapi/linux/iommufd.h                  | 54 +++++++++--
 tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
 drivers/iommu/amd/iommu.c                     |  1 +
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 24 +++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  2 +
 drivers/iommu/intel/iommu.c                   |  1 +
 drivers/iommu/iommufd/hw_pagetable.c          | 27 +++++-
 drivers/iommu/iommufd/main.c                  | 38 ++------
 drivers/iommu/iommufd/selftest.c              | 79 ++++++++++++++--
 drivers/iommu/iommufd/viommu.c                | 91 +++++++++++++++++++
 drivers/iommu/iommufd/viommu_api.c            | 57 ++++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 84 +++++++++++++++++
 Documentation/userspace-api/iommufd.rst       | 66 +++++++++++++-
 19 files changed, 602 insertions(+), 65 deletions(-)
 create mode 100644 drivers/iommu/iommufd/viommu.c
 create mode 100644 drivers/iommu/iommufd/viommu_api.c

-- 
2.43.0
Following the previous vIOMMU series, this adds another vDEVICE structure,
representing the association from an iommufd_device to an iommufd_viommu.
This gives the whole architecture a new "v" layer:

 _______________________________________________________________________
|                      iommufd (with vIOMMU/vDEVICE)                    |
|                                                                       |
|                        _____________        _____________             |
|                       |             |      |             |            |
|      |----------------|    vIOMMU   |<-----|   vDEVICE   |<----|      |
|      |                |             |      |_____________|     |      |
|      |   ______       |             |                          |      |
|      |  |      |      |             |   _____________      ____|___   |
|      |  | IOAS |<-----|(HWPT_PAGING)|<-| HWPT_NESTED |<---|        |  |
|      |  |______|      |_____________|  |_____________|    | DEVICE |  |
|      |      |                |                |           |________|  |
|______|______|________________|________________|_______________|______|
       |      |                |                |               |
 ______v_____ |          ______v_____     ______v_____        ___v__
|   struct   || PFN     |  (paging)  |   |  (nested)  |      |struct|
|iommu_device||-------->|iommu_domain|<--|iommu_domain|<-----|device|
|____________| storage  |____________|   |____________|      |______|

This vDEVICE object is used to collect and store all vIOMMU-related device
information/attributes in a VM. As an initial series for vDEVICE, add only
the virt_id to the vDEVICE, which is a vIOMMU-specific device ID in a VM:
e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to
a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
of the device against the physical IOMMU instance. This is essential for a
vIOMMU-based invalidation, where the request contains a device's vID for a
device cache flush, e.g. an ATC invalidation.

Therefore, with this vDEVICE object, support a vIOMMU-based invalidation,
by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache
with a given driver data.

As for the implementation of the series, add driver support in ARM SMMUv3
for a real world use case.

This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v3
Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p2-v3

Changelog
v3
 * Added Jason's Reviewed-by
 * Split this invalidation part out of the part-1 series
 * Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl
 * Reduced viommu_api functions by allowing drivers to access viommu
   and vdevice structures directly
 * Dropped vdevs_rwsem by using xa_lock instead
 * Dropped arm_smmu_cache_invalidate_user
v2
 https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
 * Limited vdev_id to one per idev
 * Added a rw_sem to protect the vdev_id list
 * Reworked driver-level APIs with proper lockings
 * Added a new viommu_api file for IOMMUFD_DRIVER config
 * Dropped useless iommu_dev point from the viommu structure
 * Added missing index numbers to new types in the uAPI header
 * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
 * Reworked mock_viommu_cache_invalidate() using the new iommu helper
 * Reordered details of set/unset_vdev_id handlers for proper lockings
v1
 https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/

Thanks!
Nicolin

Jason Gunthorpe (3):
  iommu: Add iommu_copy_struct_from_full_user_array helper
  iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
  iommu/arm-smmu-v3: Update comments about ATS and bypass

Nicolin Chen (13):
  iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct
  iommufd/viommu: Add a default_viommu_ops for IOMMU_VIOMMU_TYPE_DEFAULT
  iommufd/viommu: Add IOMMU_VDEVICE_ALLOC ioctl
  iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
  iommu/viommu: Add cache_invalidate for IOMMU_VIOMMU_TYPE_DEFAULT
  iommufd/hw_pagetable: Allow viommu->ops->cache_invalidate for hwpt_nested
  iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
  iommufd/viommu: Add vdev_to_dev helper
  iommufd/selftest: Add mock_viommu_cache_invalidate
  iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
  iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
  Documentation: userspace-api: iommufd: Update vDEVICE
  iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate

 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   9 +-
 drivers/iommu/iommufd/iommufd_private.h       |  12 ++
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 include/linux/iommu.h                         |  53 ++++-
 include/linux/iommufd.h                       |  50 +++++
 include/uapi/linux/iommufd.h                  |  61 +++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  83 +++++++
 .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c     | 162 +++++++++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  50 +++--
 drivers/iommu/iommufd/hw_pagetable.c          |  45 +++-
 drivers/iommu/iommufd/main.c                  |   6 +
 drivers/iommu/iommufd/selftest.c              | 122 ++++++++++-
 drivers/iommu/iommufd/viommu.c                |  93 +++++++-
 drivers/iommu/iommufd/viommu_api.c            |  21 ++
 tools/testing/selftests/iommu/iommufd.c       | 204 +++++++++++++++++-
 Documentation/userspace-api/iommufd.rst       |  58 +++--
 16 files changed, 1007 insertions(+), 52 deletions(-)

-- 
2.43.0
On Wed, Oct 09, 2024 at 09:38:12AM -0700, Nicolin Chen wrote:
> Following the previous vIOMMU series, this adds another vDEVICE structure,
> representing the association from an iommufd_device to an iommufd_viommu.
> This gives the whole architecture a new "v" layer:

Don't thread series together like this with reply-to, it breaks b4 and
other tools' ability to tell them apart.. Just post them separately
normally.

Jason
On Thu, Oct 17, 2024 at 04:14:16PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 09, 2024 at 09:38:12AM -0700, Nicolin Chen wrote:
> > Following the previous vIOMMU series, this adds another vDEVICE structure,
> > representing the association from an iommufd_device to an iommufd_viommu.
> > This gives the whole architecture a new "v" layer:
>
> Don't thread series together like this with reply-to, it breaks b4 and
> other tools' ability to tell them apart.. Just post them separately
> normally.

Yea, I didn't expect that a single git-send-email run would thread them
together. Will do separately next time.

Thanks
Nicolin
Introduce a new IOMMUFD_OBJ_VDEVICE to represent a physical device, i.e.
iommufd_device (idev) object, against an iommufd_viommu (vIOMMU) object in
the VM. This vDEVICE object (and its structure) holds all the information
and attributes in a VM, regarding the device related to the vIOMMU.
As an initial patch, add a per-vIOMMU virtual ID. This can be:
- Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
- Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
- Virtual ID on a nested Intel VT-d IOMMU, an index to a Context Table
Potentially, this vDEVICE structure can hold some vData for Confidential
Compute Architecture (CCA).
Add a pair of vdevice_alloc and vdevice_free in struct iommufd_viommu_ops
to allow driver-level vDEVICE structure allocations.
Similar to iommufd_viommu_alloc, add an iommufd_vdevice_alloc helper so
IOMMU drivers can allocate core-embedded style structures.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 1 +
include/linux/iommufd.h | 31 +++++++++++++++++++++++++
drivers/iommu/iommufd/viommu_api.c | 14 +++++++++++
3 files changed, 46 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index c80d880f8b6a..0c56a467e440 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -132,6 +132,7 @@ enum iommufd_object_type {
IOMMUFD_OBJ_ACCESS,
IOMMUFD_OBJ_FAULT,
IOMMUFD_OBJ_VIOMMU,
+ IOMMUFD_OBJ_VDEVICE,
#ifdef CONFIG_IOMMUFD_TEST
IOMMUFD_OBJ_SELFTEST,
#endif
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 069a38999cdd..510fc961a9ad 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -75,13 +75,30 @@ struct iommufd_viommu {
unsigned int type;
};
+struct iommufd_vdevice {
+ struct iommufd_object obj;
+ struct iommufd_ctx *ictx;
+ struct iommufd_device *idev;
+ struct iommufd_viommu *viommu;
+ u64 id; /* per-vIOMMU virtual ID */
+};
+
/**
* struct iommufd_viommu_ops - vIOMMU specific operations
* @free: Free all driver-specific parts of an iommufd_viommu. The memory of the
* vIOMMU will be free-ed by iommufd core after calling this free op.
+ * @vdevice_alloc: Allocate a driver-managed iommufd_vdevice to init some driver
+ *                 specific structure or HW procedure. Note that the core-level
+ *                 structure is filled by the iommufd core after calling this op.
+ * @vdevice_free: Free a driver-managed iommufd_vdevice to de-init its structure
+ *                or HW procedure. The memory of the vdevice will be free-ed by
+ *                iommufd core.
*/
struct iommufd_viommu_ops {
void (*free)(struct iommufd_viommu *viommu);
+ struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
+ struct device *dev, u64 id);
+ void (*vdevice_free)(struct iommufd_vdevice *vdev);
};
#if IS_ENABLED(CONFIG_IOMMUFD)
@@ -103,6 +120,8 @@ int iommufd_vfio_compat_set_no_iommu(struct iommufd_ctx *ictx);
struct iommufd_viommu *
__iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size,
const struct iommufd_viommu_ops *ops);
+struct iommufd_vdevice *
+__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size);
#else /* !CONFIG_IOMMUFD */
static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
{
@@ -150,6 +169,12 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size,
{
return ERR_PTR(-EOPNOTSUPP);
}
+
+static inline struct iommufd_vdevice *
+__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
#endif /* CONFIG_IOMMUFD */
/*
@@ -163,4 +188,10 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size,
struct drv_struct, member)), \
ops), \
struct drv_struct, member)
+#define iommufd_vdevice_alloc(ictx, drv_struct, member) \
+ container_of(__iommufd_vdevice_alloc(ictx, \
+ sizeof(struct drv_struct) + \
+ BUILD_BUG_ON_ZERO(offsetof( \
+ struct drv_struct, member))), \
+ struct drv_struct, member)
#endif
diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c
index c1731f080d6b..8419df3b658c 100644
--- a/drivers/iommu/iommufd/viommu_api.c
+++ b/drivers/iommu/iommufd/viommu_api.c
@@ -55,3 +55,17 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size,
return viommu;
}
EXPORT_SYMBOL_NS_GPL(__iommufd_viommu_alloc, IOMMUFD);
+
+struct iommufd_vdevice *
+__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
+{
+ struct iommufd_object *obj;
+
+ if (WARN_ON(size < sizeof(struct iommufd_vdevice)))
+ return ERR_PTR(-EINVAL);
+ obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VDEVICE);
+ if (IS_ERR(obj))
+ return ERR_CAST(obj);
+ return container_of(obj, struct iommufd_vdevice, obj);
+}
+EXPORT_SYMBOL_NS_GPL(__iommufd_vdevice_alloc, IOMMUFD);
--
2.43.0
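For reference, a minimal sketch of the driver side of the vdevice_alloc op
above, using the iommufd_vdevice_alloc() helper macro from this patch; the
names my_vdevice, my_vdevice_alloc, and vsid are made up for illustration:

	struct my_vdevice {
		struct iommufd_vdevice core;	/* must stay at offset 0 */
		u64 vsid;			/* driver-private per-device state */
	};

	static struct iommufd_vdevice *
	my_vdevice_alloc(struct iommufd_viommu *viommu, struct device *dev, u64 id)
	{
		struct my_vdevice *my_vdev;

		/* allocates sizeof(struct my_vdevice); core is filled by iommufd */
		my_vdev = iommufd_vdevice_alloc(viommu->ictx, my_vdevice, core);
		if (IS_ERR(my_vdev))
			return ERR_CAST(my_vdev);

		my_vdev->vsid = id;
		return &my_vdev->core;
	}

The BUILD_BUG_ON_ZERO(offsetof(...)) in the macro enforces that the core
member sits at offset 0, keeping the container_of() conversions valid in
both directions.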
> +struct iommufd_vdevice *
> +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
> +{
> +	struct iommufd_object *obj;
> +
> +	if (WARN_ON(size < sizeof(struct iommufd_vdevice)))
> +		return ERR_PTR(-EINVAL);
> +	obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VDEVICE);
> +	if (IS_ERR(obj))
> +		return ERR_CAST(obj);
> +	return container_of(obj, struct iommufd_vdevice, obj);
> +}

Like for the viommu this can all just be folded into the #define

Jason
On Thu, Oct 17, 2024 at 03:45:56PM -0300, Jason Gunthorpe wrote:
> > +struct iommufd_vdevice *
> > +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
> > +{
> > +	struct iommufd_object *obj;
> > +
> > +	if (WARN_ON(size < sizeof(struct iommufd_vdevice)))
> > +		return ERR_PTR(-EINVAL);
> > +	obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VDEVICE);
> > +	if (IS_ERR(obj))
> > +		return ERR_CAST(obj);
> > +	return container_of(obj, struct iommufd_vdevice, obj);
> > +}
>
> Like for the viommu this can all just be folded into the #define

It seems that we have to keep two small functions as follows, unless
we want to expose the enum iommufd_object_type.
---------------------------------------------------------------
diff --git a/drivers/iommu/iommufd/driver.c b/drivers/iommu/iommufd/driver.c
index 4495f1aaccca..fdd203487bac 100644
--- a/drivers/iommu/iommufd/driver.c
+++ b/drivers/iommu/iommufd/driver.c
@@ -38,10 +38,19 @@ _iommufd_object_alloc_member(struct iommufd_ctx *ictx, size_t size,
 EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc_member, IOMMUFD);

 struct iommufd_viommu *_iommufd_viommu_alloc(struct iommufd_ctx *ictx,
					      size_t size)
 {
	return container_of(_iommufd_object_alloc_member(ictx, size,
							 IOMMUFD_OBJ_VIOMMU),
			    struct iommufd_viommu, obj);
 }
 EXPORT_SYMBOL_NS_GPL(_iommufd_viommu_alloc, IOMMUFD);
+
+struct iommufd_vdevice *_iommufd_vdevice_alloc(struct iommufd_ctx *ictx,
+					       size_t size)
+{
+	return container_of(_iommufd_object_alloc_member(ictx, size,
+							 IOMMUFD_OBJ_VDEVICE),
+			    struct iommufd_vdevice, obj);
+}
+EXPORT_SYMBOL_NS_GPL(_iommufd_vdevice_alloc, IOMMUFD);
---------------------------------------------------------------

Thanks
Nicolin
On Sat, Oct 19, 2024 at 06:35:45PM -0700, Nicolin Chen wrote:
> On Thu, Oct 17, 2024 at 03:45:56PM -0300, Jason Gunthorpe wrote:
> > > +struct iommufd_vdevice *
> > > +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
> > > +{
> > > +	struct iommufd_object *obj;
> > > +
> > > +	if (WARN_ON(size < sizeof(struct iommufd_vdevice)))
> > > +		return ERR_PTR(-EINVAL);
> > > +	obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VDEVICE);
> > > +	if (IS_ERR(obj))
> > > +		return ERR_CAST(obj);
> > > +	return container_of(obj, struct iommufd_vdevice, obj);
> > > +}
> >
> > Like for the viommu this can all just be folded into the #define
>
> It seems that we have to keep two small functions as follows, unless
> we want to expose the enum iommufd_object_type.

I'd probably expose the enum along with the struct..

Jason
On Mon, Oct 21, 2024 at 09:21:48AM -0300, Jason Gunthorpe wrote:
> On Sat, Oct 19, 2024 at 06:35:45PM -0700, Nicolin Chen wrote:
> > On Thu, Oct 17, 2024 at 03:45:56PM -0300, Jason Gunthorpe wrote:
> > > > +struct iommufd_vdevice *
> > > > +__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
> > > > +{
> > > > +	struct iommufd_object *obj;
> > > > +
> > > > +	if (WARN_ON(size < sizeof(struct iommufd_vdevice)))
> > > > +		return ERR_PTR(-EINVAL);
> > > > +	obj = iommufd_object_alloc_elm(ictx, size, IOMMUFD_OBJ_VDEVICE);
> > > > +	if (IS_ERR(obj))
> > > > +		return ERR_CAST(obj);
> > > > +	return container_of(obj, struct iommufd_vdevice, obj);
> > > > +}
> > >
> > > Like for the viommu this can all just be folded into the #define
> >
> > It seems that we have to keep two small functions as follows, unless
> > we want to expose the enum iommufd_object_type.
>
> I'd probably expose the enum along with the struct..

OK. Let's do that.

Nicolin
An IOMMU_VIOMMU_TYPE_DEFAULT doesn't need a free() op since the core can
free everything in the destroy(). Now with the new vDEVICE structure, it
might want to allocate its own vDEVICEs.
Add a default_viommu_ops for drivers to hook ops for default vIOMMUs.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 4 ++++
drivers/iommu/iommufd/viommu.c | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 9105478bdbcd..1de2aebc4d92 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -44,6 +44,7 @@ struct iommu_dma_cookie;
struct iommu_fault_param;
struct iommufd_ctx;
struct iommufd_viommu;
+struct iommufd_viommu_ops;
#define IOMMU_FAULT_PERM_READ (1 << 0) /* read */
#define IOMMU_FAULT_PERM_WRITE (1 << 1) /* write */
@@ -551,6 +552,8 @@ static inline int __iommu_copy_struct_from_user_array(
* It is suggested to call iommufd_viommu_alloc() helper for
* a bundled allocation of the core and the driver structures,
* using the given @ictx pointer.
+ * @default_viommu_ops: Driver can choose to use a default core-allocated vIOMMU
+ *                      object by providing a default_viommu_ops.
* @pgsize_bitmap: bitmap of all possible supported page sizes
* @owner: Driver module providing these ops
* @identity_domain: An always available, always attachable identity
@@ -605,6 +608,7 @@ struct iommu_ops {
struct iommu_domain *domain,
struct iommufd_ctx *ictx,
unsigned int viommu_type);
+ const struct iommufd_viommu_ops *default_viommu_ops;
const struct iommu_domain_ops *default_domain_ops;
unsigned long pgsize_bitmap;
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index 3a903baeee6a..f512dfb535fd 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -44,7 +44,7 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
if (cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT) {
viommu = __iommufd_viommu_alloc(ucmd->ictx, sizeof(*viommu),
- NULL);
+ ops->default_viommu_ops);
} else {
if (!ops->viommu_alloc) {
rc = -EOPNOTSUPP;
--
2.43.0
On Wed, Oct 09, 2024 at 09:38:14AM -0700, Nicolin Chen wrote:
> An IOMMU_VIOMMU_TYPE_DEFAULT doesn't need a free() op since the core can
> free everything in the destroy(). Now with the new vDEVICE structure, it
> might want to allocate its own vDEVICEs.
>
> Add a default_viommu_ops for drivers to hook ops for default vIOMMUs.

Why? arm_smmu is now creating its own viommu object, so who will use
this?

Do we have any use for the default mode? It is already a bit
confusing, can we just drop it?

Jason
On Thu, Oct 17, 2024 at 03:47:29PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 09, 2024 at 09:38:14AM -0700, Nicolin Chen wrote:
> > An IOMMU_VIOMMU_TYPE_DEFAULT doesn't need a free() op since the core can
> > free everything in the destroy(). Now with the new vDEVICE structure, it
> > might want to allocate its own vDEVICEs.
> >
> > Add a default_viommu_ops for drivers to hook ops for default vIOMMUs.
>
> Why? arm_smmu is now creating its own viommu object, so who will use
> this?
>
> Do we have any use for the default mode? It is already a bit
> confusing, can we just drop it?

Hmm, that would make the default model completely useless..

Should we unsupport a default viommu allocation?

Nicolin
On Thu, Oct 17, 2024 at 11:50:44AM -0700, Nicolin Chen wrote:
> On Thu, Oct 17, 2024 at 03:47:29PM -0300, Jason Gunthorpe wrote:
> > On Wed, Oct 09, 2024 at 09:38:14AM -0700, Nicolin Chen wrote:
> > > An IOMMU_VIOMMU_TYPE_DEFAULT doesn't need a free() op since the core can
> > > free everything in the destroy(). Now with the new vDEVICE structure, it
> > > might want to allocate its own vDEVICEs.
> > >
> > > Add a default_viommu_ops for drivers to hook ops for default vIOMMUs.
> >
> > Why? arm_smmu is now creating its own viommu object, so who will use
> > this?
> >
> > Do we have any use for the default mode? It is already a bit
> > confusing, can we just drop it?
>
> Hmm, that would make the default model completely useless..
>
> Should we unsupport a default viommu allocation?

This is my ask?

Jason
Introduce a new ioctl to allocate a vDEVICE object. Since a vDEVICE object
is a connection of an iommufd_device object and an iommufd_viommu object,
require both as the ioctl inputs and take refcounts in the ioctl handler.
Add to the vIOMMU object a "vdevs" xarray, indexed by a per-vIOMMU virtual
device ID, which can be:
- Virtual StreamID on a nested ARM SMMUv3, an index to a Stream Table
- Virtual DeviceID on a nested AMD IOMMU, an index to a Device Table
- Virtual ID on a nested Intel VT-d IOMMU, an index to a Context Table
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_private.h | 11 +++
include/linux/iommufd.h | 2 +
include/uapi/linux/iommufd.h | 26 +++++++
drivers/iommu/iommufd/main.c | 6 ++
drivers/iommu/iommufd/viommu.c | 91 +++++++++++++++++++++++++
5 files changed, 136 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 0c56a467e440..e160edb22b67 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -409,6 +409,7 @@ struct iommufd_device {
struct iommufd_object obj;
struct iommufd_ctx *ictx;
struct iommufd_group *igroup;
+ struct iommufd_vdevice *vdev;
struct list_head group_item;
/* always the physical device */
struct device *dev;
@@ -523,8 +524,18 @@ static inline int iommufd_hwpt_replace_device(struct iommufd_device *idev,
return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
}
+static inline struct iommufd_viommu *
+iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
+{
+ return container_of(iommufd_get_object(ucmd->ictx, id,
+ IOMMUFD_OBJ_VIOMMU),
+ struct iommufd_viommu, obj);
+}
+
int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
void iommufd_viommu_destroy(struct iommufd_object *obj);
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_vdevice_destroy(struct iommufd_object *obj);
#ifdef CONFIG_IOMMUFD_TEST
int iommufd_test(struct iommufd_ucmd *ucmd);
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 510fc961a9ad..5a630952150d 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -72,6 +72,8 @@ struct iommufd_viommu {
const struct iommufd_viommu_ops *ops;
+ struct xarray vdevs;
+
unsigned int type;
};
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 6ee841a8c79b..3039519d71b7 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -52,6 +52,7 @@ enum {
IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
IOMMUFD_CMD_VIOMMU_ALLOC = 0x8f,
+ IOMMUFD_CMD_VDEVICE_ALLOC = 0x90,
};
/**
@@ -894,4 +895,29 @@ struct iommu_viommu_alloc {
__u32 out_viommu_id;
};
#define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
+
+/**
+ * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
+ * @size: sizeof(struct iommu_vdevice_alloc)
+ * @viommu_id: vIOMMU ID to associate with the virtual device
+ * @dev_id: The physical device to allocate a virtual instance on the vIOMMU
+ * @__reserved: Must be 0
+ * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
+ *           of AMD IOMMU, and vID of a nested Intel VT-d to a Context Table.
+ * @out_vdevice_id: Output virtual instance ID for the allocated object
+ * @__reserved2: Must be 0
+ *
+ * Allocate a virtual device instance (for a physical device) against a vIOMMU.
+ * This instance holds the device's information (related to its vIOMMU) in a VM.
+ */
+struct iommu_vdevice_alloc {
+ __u32 size;
+ __u32 viommu_id;
+ __u32 dev_id;
+ __u32 __reserved;
+ __aligned_u64 virt_id;
+ __u32 out_vdevice_id;
+ __u32 __reserved2;
+};
+#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC)
#endif
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index cbd0a80b2d67..9a8c3cbecc11 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -302,6 +302,7 @@ union ucmd_buffer {
struct iommu_option option;
struct iommu_vfio_ioas vfio_ioas;
struct iommu_viommu_alloc viommu;
+ struct iommu_vdevice_alloc vdev;
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
@@ -355,6 +356,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
__reserved),
IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
struct iommu_viommu_alloc, out_viommu_id),
+ IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
+ struct iommu_vdevice_alloc, __reserved2),
#ifdef CONFIG_IOMMUFD_TEST
IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
#endif
@@ -493,6 +496,9 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
[IOMMUFD_OBJ_VIOMMU] = {
.destroy = iommufd_viommu_destroy,
},
+ [IOMMUFD_OBJ_VDEVICE] = {
+ .destroy = iommufd_vdevice_destroy,
+ },
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
.destroy = iommufd_selftest_destroy,
diff --git a/drivers/iommu/iommufd/viommu.c b/drivers/iommu/iommufd/viommu.c
index f512dfb535fd..eb78c928c60f 100644
--- a/drivers/iommu/iommufd/viommu.c
+++ b/drivers/iommu/iommufd/viommu.c
@@ -9,6 +9,8 @@ void iommufd_viommu_destroy(struct iommufd_object *obj)
struct iommufd_viommu *viommu =
container_of(obj, struct iommufd_viommu, obj);
+ xa_destroy(&viommu->vdevs);
+
if (viommu->ops && viommu->ops->free)
viommu->ops->free(viommu);
refcount_dec(&viommu->hwpt->common.obj.users);
@@ -72,6 +74,8 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
*/
viommu->iommu_dev = idev->dev->iommu->iommu_dev;
+ xa_init(&viommu->vdevs);
+
refcount_inc(&viommu->hwpt->common.obj.users);
cmd->out_viommu_id = viommu->obj.id;
@@ -89,3 +93,90 @@ int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
iommufd_put_object(ucmd->ictx, &idev->obj);
return rc;
}
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+ struct iommufd_vdevice *old, *vdev =
+ container_of(obj, struct iommufd_vdevice, obj);
+ struct iommufd_viommu *viommu = vdev->viommu;
+ struct iommufd_device *idev = vdev->idev;
+
+ if (viommu->ops && viommu->ops->vdevice_free)
+ viommu->ops->vdevice_free(vdev);
+
+ old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
+ if (old)
+ WARN_ON(old != vdev);
+
+ refcount_dec(&viommu->obj.users);
+ refcount_dec(&idev->obj.users);
+ idev->vdev = NULL;
+}
+
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+ struct iommu_vdevice_alloc *cmd = ucmd->cmd;
+ struct iommufd_vdevice *vdev, *curr;
+ struct iommufd_viommu *viommu;
+ struct iommufd_device *idev;
+ u64 virt_id = cmd->virt_id;
+ int rc = 0;
+
+ if (virt_id > ULONG_MAX)
+ return -EINVAL;
+
+ viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+ if (IS_ERR(viommu))
+ return PTR_ERR(viommu);
+
+ idev = iommufd_get_device(ucmd, cmd->dev_id);
+ if (IS_ERR(idev)) {
+ rc = PTR_ERR(idev);
+ goto out_put_viommu;
+ }
+
+ mutex_lock(&idev->igroup->lock);
+ if (idev->vdev) {
+ rc = -EEXIST;
+ goto out_unlock_igroup;
+ }
+
+ if (viommu->ops && viommu->ops->vdevice_alloc)
+ vdev = viommu->ops->vdevice_alloc(viommu, idev->dev, virt_id);
+ else
+ vdev = __iommufd_vdevice_alloc(ucmd->ictx, sizeof(*vdev));
+ if (IS_ERR(vdev)) {
+ rc = PTR_ERR(vdev);
+ goto out_unlock_igroup;
+ }
+
+ vdev->idev = idev;
+ vdev->id = virt_id;
+ vdev->viommu = viommu;
+
+ idev->vdev = vdev;
+ refcount_inc(&idev->obj.users);
+ refcount_inc(&viommu->obj.users);
+
+ curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
+ if (curr) {
+ rc = xa_err(curr) ? : -EBUSY;
+ goto out_abort;
+ }
+
+ cmd->out_vdevice_id = vdev->obj.id;
+ rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+ if (rc)
+ goto out_abort;
+ iommufd_object_finalize(ucmd->ictx, &vdev->obj);
+ goto out_unlock_igroup;
+
+out_abort:
+ iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_unlock_igroup:
+ mutex_unlock(&idev->igroup->lock);
+ iommufd_put_object(ucmd->ictx, &idev->obj);
+out_put_viommu:
+ iommufd_put_object(ucmd->ictx, &viommu->obj);
+ return rc;
+}
--
2.43.0
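For reference, a minimal user space sketch of the new ioctl, assuming
viommu_id and dev_id came from earlier IOMMU_VIOMMU_ALLOC and device
binding steps; error handling is elided and the virt_id value is
arbitrary:

	struct iommu_vdevice_alloc cmd = {
		.size = sizeof(cmd),
		.viommu_id = viommu_id,
		.dev_id = dev_id,
		.virt_id = 0x99,	/* e.g. the device's vSID on a vSMMU */
	};

	ioctl(iommufd, IOMMU_VDEVICE_ALLOC, &cmd);
	/* cmd.out_vdevice_id now refers to the vDEVICE object */

A second allocation for the same idev fails with -EEXIST until the vDEVICE
is destroyed via IOMMU_DESTROY, which also erases the virt_id from the
vIOMMU's vdevs xarray; the selftest patch later in the series exercises
exactly this sequence.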
On Wed, Oct 09, 2024 at 09:38:15AM -0700, Nicolin Chen wrote:
> +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> +{
> +	struct iommufd_vdevice *old, *vdev =
> +		container_of(obj, struct iommufd_vdevice, obj);
> +	struct iommufd_viommu *viommu = vdev->viommu;
> +	struct iommufd_device *idev = vdev->idev;
> +
> +	if (viommu->ops && viommu->ops->vdevice_free)
> +		viommu->ops->vdevice_free(vdev);
> +
> +	old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> +	if (old)
> +		WARN_ON(old != vdev);
> +
> +	refcount_dec(&viommu->obj.users);
> +	refcount_dec(&idev->obj.users);
> +	idev->vdev = NULL;

This should hold the igroup lock when touching vdev?

> +int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_vdevice_alloc *cmd = ucmd->cmd;
> +	struct iommufd_vdevice *vdev, *curr;
> +	struct iommufd_viommu *viommu;
> +	struct iommufd_device *idev;
> +	u64 virt_id = cmd->virt_id;
> +	int rc = 0;
> +
> +	if (virt_id > ULONG_MAX)
> +		return -EINVAL;
> +
> +	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> +	if (IS_ERR(viommu))
> +		return PTR_ERR(viommu);
> +
> +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> +	if (IS_ERR(idev)) {
> +		rc = PTR_ERR(idev);
> +		goto out_put_viommu;
> +	}
> +
> +	mutex_lock(&idev->igroup->lock);
> +	if (idev->vdev) {
> +		rc = -EEXIST;
> +		goto out_unlock_igroup;
> +	}

Otherwise this won't work right

> +	if (viommu->ops && viommu->ops->vdevice_alloc)
> +		vdev = viommu->ops->vdevice_alloc(viommu, idev->dev, virt_id);
> +	else
> +		vdev = __iommufd_vdevice_alloc(ucmd->ictx, sizeof(*vdev));
> +	if (IS_ERR(vdev)) {
> +		rc = PTR_ERR(vdev);
> +		goto out_unlock_igroup;
> +	}
> +
> +	vdev->idev = idev;
> +	vdev->id = virt_id;
> +	vdev->viommu = viommu;
> +
> +	idev->vdev = vdev;
> +	refcount_inc(&idev->obj.users);
> +	refcount_inc(&viommu->obj.users);
> +
> +	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
> +	if (curr) {
> +		rc = xa_err(curr) ? : -EBUSY;
> +		goto out_abort;
> +	}
> +
> +	cmd->out_vdevice_id = vdev->obj.id;
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +	if (rc)
> +		goto out_abort;
> +	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
> +	goto out_unlock_igroup;
> +
> +out_abort:
> +	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);

But be mindful of this abort, it doesn't want to be inside the lock if
it also gets the lock.. fail_nth should be updated to cover these new
ioctls to look for tricky things like that

But the design looks OK

Jason
On Thu, Oct 17, 2024 at 03:52:30PM -0300, Jason Gunthorpe wrote:
> > +	if (viommu->ops && viommu->ops->vdevice_alloc)
> > +		vdev = viommu->ops->vdevice_alloc(viommu, idev->dev, virt_id);
> > +	else
> > +		vdev = __iommufd_vdevice_alloc(ucmd->ictx, sizeof(*vdev));
> > +	if (IS_ERR(vdev)) {
> > +		rc = PTR_ERR(vdev);
> > +		goto out_unlock_igroup;
> > +	}
> > +
> > +	vdev->idev = idev;
> > +	vdev->id = virt_id;
> > +	vdev->viommu = viommu;
> > +
> > +	idev->vdev = vdev;
> > +	refcount_inc(&idev->obj.users);
> > +	refcount_inc(&viommu->obj.users);
> > +
> > +	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
> > +	if (curr) {
> > +		rc = xa_err(curr) ? : -EBUSY;
> > +		goto out_abort;
> > +	}
> > +
> > +	cmd->out_vdevice_id = vdev->obj.id;
> > +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> > +	if (rc)
> > +		goto out_abort;
> > +	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
> > +	goto out_unlock_igroup;
> > +
> > +out_abort:
> > +	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
>
> But be mindful of this abort, it doesn't want to be inside the lock if
> it also gets the lock.. fail_nth should be updated to cover these new
> ioctls to look for tricky things like that

I added an abort() beside destroy():

+void iommufd_vdevice_abort(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *old, *vdev =
+		container_of(obj, struct iommufd_vdevice, obj);
+	struct iommufd_viommu *viommu = vdev->viommu;
+	struct iommufd_device *idev = vdev->idev;
+
+	lockdep_assert_not_held(&idev->igroup->lock);
+
+	if (viommu->ops && viommu->ops->vdevice_free)
+		viommu->ops->vdevice_free(vdev);
+
+	old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
+	if (old)
+		WARN_ON(old != vdev);
+
+	refcount_dec(&viommu->obj.users);
+	refcount_dec(&idev->obj.users);
+	idev->vdev = NULL;
+}
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *vdev =
+		container_of(obj, struct iommufd_vdevice, obj);
+
+	mutex_lock(&vdev->idev->igroup->lock);
+	iommufd_vdevice_abort(obj);
+	mutex_unlock(&vdev->idev->igroup->lock);
+}
----------------------------------------------------------

And I added fail_nth coverage for IOMMU_VIOMMU/VDEVICE_ALLOC cmds.

Thanks
Nicolin
On Sat, Oct 19, 2024 at 06:42:30PM -0700, Nicolin Chen wrote:
> > But be mindful of this abort, it doesn't want to be inside the lock if
> > it also gets the lock.. fail_nth should be updated to cover these new
> > ioctls to look for tricky things like that
>
> I added an abort() beside destroy():
>
> +void iommufd_vdevice_abort(struct iommufd_object *obj)
> +{
> +	struct iommufd_vdevice *old, *vdev =
> +		container_of(obj, struct iommufd_vdevice, obj);
> +	struct iommufd_viommu *viommu = vdev->viommu;
> +	struct iommufd_device *idev = vdev->idev;
> +
> +	lockdep_assert_not_held(&idev->igroup->lock);

???

> +
> +	if (viommu->ops && viommu->ops->vdevice_free)
> +		viommu->ops->vdevice_free(vdev);
> +
> +	old = xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
> +	if (old)
> +		WARN_ON(old != vdev);
> +
> +	refcount_dec(&viommu->obj.users);
> +	refcount_dec(&idev->obj.users);
> +	idev->vdev = NULL;
> +}
> +
> +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> +{
> +	struct iommufd_vdevice *vdev =
> +		container_of(obj, struct iommufd_vdevice, obj);
> +
> +	mutex_lock(&vdev->idev->igroup->lock);
> +	iommufd_vdevice_abort(obj);

When we get it here??

Jason
On Mon, Oct 21, 2024 at 09:22:48AM -0300, Jason Gunthorpe wrote:
> On Sat, Oct 19, 2024 at 06:42:30PM -0700, Nicolin Chen wrote:
> > > But be mindful of this abort, it doesn't want to be inside the lock if
> > > it also gets the lock.. fail_nth should be updated to cover these new
> > > ioctls to look for tricky things like that
> >
> > I added an abort() beside destroy():
> >
> > +void iommufd_vdevice_abort(struct iommufd_object *obj)
> > +{
> > +	struct iommufd_vdevice *old, *vdev =
> > +		container_of(obj, struct iommufd_vdevice, obj);
> > +	struct iommufd_viommu *viommu = vdev->viommu;
> > +	struct iommufd_device *idev = vdev->idev;
> > +
> > +	lockdep_assert_not_held(&idev->igroup->lock);
>
> ???

Oops. That's a typo from auto-completion.

> > +void iommufd_vdevice_destroy(struct iommufd_object *obj)
> > +{
> > +	struct iommufd_vdevice *vdev =
> > +		container_of(obj, struct iommufd_vdevice, obj);
> > +
> > +	mutex_lock(&vdev->idev->igroup->lock);
> > +	iommufd_vdevice_abort(obj);
>
> When we get it here??

LOCKDEP wasn't enabled when I tested it.. Just fixed and retested.

Thanks
Nicolin
Add a vdevice_alloc TEST_F to cover the new IOMMU_VDEVICE_ALLOC ioctl.
Also add a vdevice_alloc op to the viommu mock_viommu_ops for coverage of
IOMMU_VIOMMU_TYPE_SELFTEST. The DEFAULT coverage is done via a
core-allocated vDEVICE object.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 27 +++++++++++++++++++
drivers/iommu/iommufd/selftest.c | 19 +++++++++++++
tools/testing/selftests/iommu/iommufd.c | 20 ++++++++++++++
3 files changed, 66 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 307d097db9dd..ccd1f65df0b0 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -790,3 +790,30 @@ static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id,
EXPECT_ERRNO(_errno, _test_cmd_viommu_alloc(self->fd, device_id, \
hwpt_id, type, 0, \
viommu_id))
+
+static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id,
+ __u64 virt_id, __u32 *vdev_id)
+{
+ struct iommu_vdevice_alloc cmd = {
+ .size = sizeof(cmd),
+ .dev_id = idev_id,
+ .viommu_id = viommu_id,
+ .virt_id = virt_id,
+ };
+ int ret;
+
+ ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd);
+ if (ret)
+ return ret;
+ if (vdev_id)
+ *vdev_id = cmd.out_vdevice_id;
+ return 0;
+}
+
+#define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id) \
+ ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \
+ virt_id, vdev_id))
+#define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \
+ EXPECT_ERRNO(_errno, \
+ _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \
+ virt_id, vdev_id))
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 4fcf475facb1..87bc45b86f9e 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -136,6 +136,11 @@ struct mock_viommu {
struct iommufd_viommu core;
};
+struct mock_vdevice {
+ struct iommufd_vdevice core;
+ u64 rid;
+};
+
enum selftest_obj_type {
TYPE_IDEV,
};
@@ -560,8 +565,22 @@ static void mock_viommu_free(struct iommufd_viommu *viommu)
/* iommufd core frees mock_viommu and viommu */
}
+static struct iommufd_vdevice *mock_vdevice_alloc(struct iommufd_viommu *viommu,
+ struct device *dev, u64 id)
+{
+ struct mock_vdevice *mock_vdev;
+
+ mock_vdev = iommufd_vdevice_alloc(viommu->ictx, mock_vdevice, core);
+ if (IS_ERR(mock_vdev))
+ return ERR_CAST(mock_vdev);
+
+ mock_vdev->rid = id;
+ return &mock_vdev->core;
+}
+
static struct iommufd_viommu_ops mock_viommu_ops = {
.free = mock_viommu_free,
+ .vdevice_alloc = mock_vdevice_alloc,
};
static struct iommufd_viommu *
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c03705825576..af00b082656e 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -129,6 +129,7 @@ TEST_F(iommufd, cmd_length)
TEST_LENGTH(iommu_option, IOMMU_OPTION, val64);
TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved);
TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id);
+ TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, __reserved2);
#undef TEST_LENGTH
}
@@ -2470,4 +2471,23 @@ TEST_F(iommufd_viommu, viommu_auto_destroy)
{
}
+TEST_F(iommufd_viommu, vdevice_alloc)
+{
+ uint32_t viommu_id = self->viommu_id;
+ uint32_t dev_id = self->device_id;
+ uint32_t vdev_id = 0;
+
+ if (dev_id) {
+ /* Set vdev_id to 0x99, unset it, and set to 0x88 */
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+ test_err_vdevice_alloc(EEXIST,
+ viommu_id, dev_id, 0x99, &vdev_id);
+ test_ioctl_destroy(vdev_id);
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id);
+ test_ioctl_destroy(vdev_id);
+ } else {
+ test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL);
+ }
+}
+
TEST_HARNESS_MAIN
--
2.43.0
This per-vIOMMU cache_invalidate op is like the cache_invalidate_user op
in struct iommu_domain_ops, but wider, supporting device cache (e.g. PCI
ATC invalidations).
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommufd.h | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 5a630952150d..e58b3e43aa7b 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -14,6 +14,7 @@
struct device;
struct file;
struct iommu_group;
+struct iommu_user_data_array;
struct iommufd_access;
struct iommufd_ctx;
struct iommufd_device;
@@ -95,12 +96,21 @@ struct iommufd_vdevice {
* @vdevice_free: Free a driver-managed iommufd_vdevice to de-init its structure
* or HW procedure. The memory of the vdevice will be free-ed by
* iommufd core.
+ * @cache_invalidate: Flush hardware cache used by a vIOMMU. It can be used for
+ *                    any IOMMU hardware specific cache: TLB and device cache.
+ *                    The @array passes in the cache invalidation requests, in
+ *                    form of a driver data structure. A driver must update the
+ *                    array->entry_num to report the number of handled requests.
+ *                    The data structure of the array entry must be defined in
+ *                    include/uapi/linux/iommufd.h
*/
struct iommufd_viommu_ops {
void (*free)(struct iommufd_viommu *viommu);
struct iommufd_vdevice *(*vdevice_alloc)(struct iommufd_viommu *viommu,
struct device *dev, u64 id);
void (*vdevice_free)(struct iommufd_vdevice *vdev);
+ int (*cache_invalidate)(struct iommufd_viommu *viommu,
+ struct iommu_user_data_array *array);
};
#if IS_ENABLED(CONFIG_IOMMUFD)
--
2.43.0
Cache entries for a hwpt_nested can now alternatively be invalidated via
its vIOMMU's cache_invalidate op. Allow iommufd_hwpt_nested_alloc to
support such a case.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/hw_pagetable.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index b88a638d07da..ccaaf801955c 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -202,6 +202,17 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
return ERR_PTR(rc);
}
+static inline bool
+iommufd_hwpt_nested_has_invalidate_op(struct iommufd_hwpt_nested *hwpt_nested)
+{
+ struct iommufd_viommu *viommu = hwpt_nested->viommu;
+
+ if (viommu)
+ return viommu->ops && viommu->ops->cache_invalidate;
+ else
+ return hwpt_nested->common.domain->ops->cache_invalidate_user;
+}
+
/**
* iommufd_hwpt_nested_alloc() - Get a NESTED iommu_domain for a device
* @ictx: iommufd context
@@ -257,7 +268,7 @@ iommufd_hwpt_nested_alloc(struct iommufd_ctx *ictx,
hwpt->domain->owner = ops;
if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED ||
- !hwpt->domain->ops->cache_invalidate_user)) {
+ !iommufd_hwpt_nested_has_invalidate_op(hwpt_nested))) {
rc = -EINVAL;
goto out_abort;
}
--
2.43.0
On Wed, Oct 09, 2024 at 09:38:18AM -0700, Nicolin Chen wrote:
> Cache entries for a hwpt_nested can now alternatively be invalidated via
> its vIOMMU's cache_invalidate op. Allow iommufd_hwpt_nested_alloc to
> support such a case.
>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  drivers/iommu/iommufd/hw_pagetable.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
With a vIOMMU object, user space can flush any IOMMU-related cache that
can be directed via the vIOMMU object. It is similar to the
IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g.
device/descriptor cache.
Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id,
and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers
can define different structures for vIOMMU invalidations vs. HWPT ones.
Update the uAPI, kdoc, and selftest case accordingly.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/uapi/linux/iommufd.h | 9 ++++---
drivers/iommu/iommufd/hw_pagetable.c | 32 +++++++++++++++++++------
tools/testing/selftests/iommu/iommufd.c | 4 ++--
3 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 3039519d71b7..cd9e135ef563 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -728,7 +728,7 @@ struct iommu_hwpt_vtd_s1_invalidate {
/**
* struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
* @size: sizeof(struct iommu_hwpt_invalidate)
- * @hwpt_id: ID of a nested HWPT for cache invalidation
+ * @hwpt_id: ID of a nested HWPT or a vIOMMU, for cache invalidation
* @data_uptr: User pointer to an array of driver-specific cache invalidation
* data.
* @data_type: One of enum iommu_hwpt_invalidate_data_type, defining the data
@@ -739,8 +739,11 @@ struct iommu_hwpt_vtd_s1_invalidate {
* Output the number of requests successfully handled by kernel.
* @__reserved: Must be 0.
*
- * Invalidate the iommu cache for user-managed page table. Modifications on a
- * user-managed page table should be followed by this operation to sync cache.
+ * Invalidate iommu cache for user-managed page table or vIOMMU. Modifications
+ * on a user-managed page table should be followed by this operation, if a HWPT
+ * is passed in via @hwpt_id. Other caches, such as device cache or descriptor
+ * cache can be flushed if a vIOMMU is passed in via the @hwpt_id field.
+ *
* Each ioctl can support one or more cache invalidation requests in the array
* that has a total size of @entry_len * @entry_num.
*
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index ccaaf801955c..211b3f8b4366 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -444,7 +444,7 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
.entry_len = cmd->entry_len,
.entry_num = cmd->entry_num,
};
- struct iommufd_hw_pagetable *hwpt;
+ struct iommufd_object *pt_obj;
u32 done_num = 0;
int rc;
@@ -458,17 +458,35 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
goto out;
}
- hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id);
- if (IS_ERR(hwpt)) {
- rc = PTR_ERR(hwpt);
+ pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY);
+ if (IS_ERR(pt_obj)) {
+ rc = PTR_ERR(pt_obj);
goto out;
}
+ if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) {
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(pt_obj, struct iommufd_hw_pagetable, obj);
+
+ rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
+ &data_array);
+ } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) {
+ struct iommufd_viommu *viommu =
+ container_of(pt_obj, struct iommufd_viommu, obj);
+
+ if (!viommu->ops || !viommu->ops->cache_invalidate) {
+ rc = -EOPNOTSUPP;
+ goto out_put_pt;
+ }
+ rc = viommu->ops->cache_invalidate(viommu, &data_array);
+ } else {
+ rc = -EINVAL;
+ goto out_put_pt;
+ }
- rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain,
- &data_array);
done_num = data_array.entry_num;
- iommufd_put_object(ucmd->ictx, &hwpt->obj);
+out_put_pt:
+ iommufd_put_object(ucmd->ictx, pt_obj);
out:
cmd->entry_num = done_num;
if (iommufd_ucmd_respond(ucmd, sizeof(*cmd)))
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index af00b082656e..2651e2b58423 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -362,9 +362,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
EXPECT_ERRNO(EBUSY,
_test_ioctl_destroy(self->fd, parent_hwpt_id));
- /* hwpt_invalidate only supports a user-managed hwpt (nested) */
+ /* hwpt_invalidate does not support a parent hwpt */
num_inv = 1;
- test_err_hwpt_invalidate(ENOENT, parent_hwpt_id, inv_reqs,
+ test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs,
IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
sizeof(*inv_reqs), &num_inv);
assert(!num_inv);
--
2.43.0
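For reference, a minimal user space sketch of issuing an invalidation
through a vIOMMU with the reused ioctl; the request layout and data type
below are the selftest's, standing in for a driver-defined structure, and
includes/error handling are elided:

	struct iommu_hwpt_invalidate_selftest inv_reqs[1] = {};
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = viommu_id,	/* a vIOMMU ID is now accepted here */
		.data_type = IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
		.data_uptr = (uintptr_t)inv_reqs,
		.entry_len = sizeof(*inv_reqs),
		.entry_num = 1,
	};

	ioctl(iommufd, IOMMU_HWPT_INVALIDATE, &cmd);
	/* on return, cmd.entry_num holds the number of handled requests */

Kernel-side, the dispatch in the hw_pagetable.c hunk above routes the
array to viommu->ops->cache_invalidate whenever the ID resolves to an
IOMMUFD_OBJ_VIOMMU instead of a nested HWPT.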
On Wed, Oct 09, 2024 at 09:38:19AM -0700, Nicolin Chen wrote:
> With a vIOMMU object, user space can flush any IOMMU-related cache that
> can be directed via the vIOMMU object. It is similar to the
> IOMMU_HWPT_INVALIDATE uAPI, but can cover a wider range than IOTLB, e.g.
> device/descriptor cache.
>
> Allow hwpt_id of the iommu_hwpt_invalidate structure to carry a viommu_id,
> and reuse the IOMMU_HWPT_INVALIDATE uAPI for vIOMMU invalidations. Drivers
> can define different structures for vIOMMU invalidations vs. HWPT ones.
>
> Update the uAPI, kdoc, and selftest case accordingly.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
>  include/uapi/linux/iommufd.h            |  9 ++++---
>  drivers/iommu/iommufd/hw_pagetable.c    | 32 +++++++++++++++++++------
>  tools/testing/selftests/iommu/iommufd.c |  4 ++--
>  3 files changed, 33 insertions(+), 12 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
From: Jason Gunthorpe <jgg@nvidia.com>
The iommu_copy_struct_from_user_array helper can be used to copy a single
entry from a user array which might not be efficient if the array is big.
Add a new iommu_copy_struct_from_full_user_array to copy the entire user
array at once. Update the existing iommu_copy_struct_from_user_array kdoc
accordingly.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommu.h | 49 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 48 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1de2aebc4d92..c980fd0e2174 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -494,7 +494,9 @@ static inline int __iommu_copy_struct_from_user_array(
* @index: Index to the location in the array to copy user data from
* @min_last: The last member of the data structure @kdst points in the
* initial version.
- * Return 0 for success, otherwise -error.
+ *
+ * Copy a single entry from a user array. Return 0 for success, otherwise
+ * -error.
*/
#define iommu_copy_struct_from_user_array(kdst, user_array, data_type, index, \
min_last) \
@@ -502,6 +504,51 @@ static inline int __iommu_copy_struct_from_user_array(
kdst, user_array, data_type, index, sizeof(*(kdst)), \
offsetofend(typeof(*(kdst)), min_last))
+
+/**
+ * iommu_copy_struct_from_full_user_array - Copy iommu driver specific user
+ * space data from an iommu_user_data_array
+ * @kdst: Pointer to an iommu driver specific user data that is defined in
+ * include/uapi/linux/iommufd.h
+ * @kdst_entry_size: sizeof(*kdst)
+ * @user_array: Pointer to a struct iommu_user_data_array for a user space
+ * array
+ * @data_type: The data type of the @kdst. Must match with @user_array->type
+ *
+ * Copy the entire user array. kdst must have room for kdst_entry_size *
+ * user_array->entry_num bytes. Return 0 for success, otherwise -error.
+ */
+static inline int
+iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
+ struct iommu_user_data_array *user_array,
+ unsigned int data_type)
+{
+ unsigned int i;
+ int ret;
+
+ if (user_array->type != data_type)
+ return -EINVAL;
+ if (!user_array->entry_num)
+ return -EINVAL;
+ if (likely(user_array->entry_len == kdst_entry_size)) {
+ if (copy_from_user(kdst, user_array->uptr,
+ user_array->entry_num *
+ user_array->entry_len))
+ return -EFAULT;
+ return 0;
+ }
+
+ /* Copy item by item */
+ for (i = 0; i != user_array->entry_num; i++) {
+ ret = copy_struct_from_user(
+ kdst + kdst_entry_size * i, kdst_entry_size,
+ user_array->uptr + user_array->entry_len * i,
+ user_array->entry_len);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
/**
* struct iommu_ops - iommu ops and capabilities
* @capable: check capability
--
2.43.0
Add a vdev_to_dev() helper to look up the struct device behind a vDEVICE.
This avoids the bigger trouble of moving struct iommufd_device to the
public header.
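A minimal usage sketch, assuming the caller holds a struct iommufd_viommu *viommu and a guest-assigned vdev_id (the lookup pattern mirrors the callers added later in this series):

	XA_STATE(xas, &viommu->vdevs, (unsigned long)vdev_id);
	struct device *dev;

	xa_lock(&viommu->vdevs);
	dev = vdev_to_dev(xas_load(&xas));
	if (dev) {
		/* dev is only stable while the xa_lock is held */
	}
	xa_unlock(&viommu->vdevs);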
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
include/linux/iommufd.h | 7 +++++++
drivers/iommu/iommufd/viommu_api.c | 7 +++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index e58b3e43aa7b..18d9b95f1cbc 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -85,6 +85,7 @@ struct iommufd_vdevice {
struct iommufd_viommu *viommu;
u64 id; /* per-vIOMMU virtual ID */
};
+struct device *vdev_to_dev(struct iommufd_vdevice *vdev);
/**
* struct iommufd_viommu_ops - vIOMMU specific operations
@@ -134,6 +135,7 @@ __iommufd_viommu_alloc(struct iommufd_ctx *ictx, size_t size,
const struct iommufd_viommu_ops *ops);
struct iommufd_vdevice *
__iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size);
+struct device *vdev_to_dev(struct iommufd_vdevice *vdev);
#else /* !CONFIG_IOMMUFD */
static inline struct iommufd_ctx *iommufd_ctx_from_file(struct file *file)
{
@@ -187,6 +189,11 @@ __iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
{
return ERR_PTR(-EOPNOTSUPP);
}
+
+static inline struct device *vdev_to_dev(struct iommufd_vdevice *vdev)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMUFD */
/*
diff --git a/drivers/iommu/iommufd/viommu_api.c b/drivers/iommu/iommufd/viommu_api.c
index 8419df3b658c..281e85be520d 100644
--- a/drivers/iommu/iommufd/viommu_api.c
+++ b/drivers/iommu/iommufd/viommu_api.c
@@ -69,3 +69,10 @@ __iommufd_vdevice_alloc(struct iommufd_ctx *ictx, size_t size)
return container_of(obj, struct iommufd_vdevice, obj);
}
EXPORT_SYMBOL_NS_GPL(__iommufd_vdevice_alloc, IOMMUFD);
+
+/* Caller should xa_lock(&viommu->vdevs) to protect the return value */
+struct device *vdev_to_dev(struct iommufd_vdevice *vdev)
+{
+ return vdev ? vdev->idev->dev : NULL;
+}
+EXPORT_SYMBOL_NS_GPL(vdev_to_dev, IOMMUFD);
--
2.43.0
On Wed, Oct 09, 2024 at 09:38:21AM -0700, Nicolin Chen wrote: > Add a vdev_to_dev() helper to look up the struct device behind a vDEVICE. > This avoids the bigger trouble of moving struct iommufd_device to the > public header. > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > --- > include/linux/iommufd.h | 7 +++++++ > drivers/iommu/iommufd/viommu_api.c | 7 +++++++ > 2 files changed, 14 insertions(+) Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason
Similar to the coverage of cache_invalidate_user for iotlb invalidation,
add a device cache and a viommu_cache_invalidate function to test it out.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_test.h | 25 +++++++++
drivers/iommu/iommufd/selftest.c | 79 +++++++++++++++++++++++++++-
2 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index edced4ac7cd3..05f57a968d25 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -54,6 +54,11 @@ enum {
MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
};
+enum {
+ MOCK_DEV_CACHE_ID_MAX = 3,
+ MOCK_DEV_CACHE_NUM = 4,
+};
+
struct iommu_test_cmd {
__u32 size;
__u32 op;
@@ -152,6 +157,7 @@ struct iommu_test_hw_info {
/* Should not be equal to any defined value in enum iommu_hwpt_data_type */
#define IOMMU_HWPT_DATA_SELFTEST 0xdead
#define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef
+#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad
/**
* struct iommu_hwpt_selftest
@@ -182,4 +188,23 @@ struct iommu_hwpt_invalidate_selftest {
#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef
+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef
+
+/**
+ * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU
+ * (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST)
+ * @flags: Invalidate flags
+ * @cache_id: Invalidate cache entry index
+ *
+ * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored
+ */
+struct iommu_viommu_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0)
+ __u32 flags;
+ __u32 vdev_id;
+ __u32 cache_id;
+};
+
#endif
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 87bc45b86f9e..8a1aef857922 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -149,6 +149,7 @@ struct mock_dev {
struct device dev;
unsigned long flags;
int id;
+ u32 cache[MOCK_DEV_CACHE_NUM];
};
struct selftest_obj {
@@ -578,9 +579,80 @@ static struct iommufd_vdevice *mock_vdevice_alloc(struct iommufd_viommu *viommu,
return &mock_vdev->core;
}
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu,
+ struct iommu_user_data_array *array)
+{
+ struct iommu_viommu_invalidate_selftest *cmds;
+ struct iommu_viommu_invalidate_selftest *cur;
+ struct iommu_viommu_invalidate_selftest *end;
+ int rc;
+
+ /* A zero-length array is allowed to validate the array type */
+ if (array->entry_num == 0 &&
+ array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) {
+ array->entry_num = 0;
+ return 0;
+ }
+
+ cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+ if (!cmds)
+ return -ENOMEM;
+ cur = cmds;
+ end = cmds + array->entry_num;
+
+ static_assert(sizeof(*cmds) == 3 * sizeof(u32));
+ rc = iommu_copy_struct_from_full_user_array(
+ cmds, sizeof(*cmds), array,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST);
+ if (rc)
+ goto out;
+
+ while (cur != end) {
+ XA_STATE(xas, &viommu->vdevs, (unsigned long)cur->vdev_id);
+ struct mock_dev *mdev;
+ struct device *dev;
+ int i;
+
+ if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+ rc = -EOPNOTSUPP;
+ goto out;
+ }
+
+ if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) {
+ rc = -EINVAL;
+ goto out;
+ }
+
+ xa_lock(&viommu->vdevs);
+ dev = vdev_to_dev(xas_load(&xas));
+ if (!dev) {
+ xa_unlock(&viommu->vdevs);
+ rc = -EINVAL;
+ goto out;
+ }
+ mdev = container_of(dev, struct mock_dev, dev);
+
+ if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+ /* Invalidate all cache entries and ignore cache_id */
+ for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+ mdev->cache[i] = 0;
+ } else {
+ mdev->cache[cur->cache_id] = 0;
+ }
+ xa_unlock(&viommu->vdevs);
+
+ cur++;
+ }
+out:
+ array->entry_num = cur - cmds;
+ kfree(cmds);
+ return rc;
+}
+
static struct iommufd_viommu_ops mock_viommu_ops = {
.free = mock_viommu_free,
.vdevice_alloc = mock_vdevice_alloc,
+ .cache_invalidate = mock_viommu_cache_invalidate,
};
static struct iommufd_viommu *
@@ -627,6 +699,9 @@ static const struct iommu_ops mock_ops = {
.dev_disable_feat = mock_dev_disable_feat,
.user_pasid_table = true,
.viommu_alloc = mock_viommu_alloc,
+ .default_viommu_ops = &(struct iommufd_viommu_ops){
+ .cache_invalidate = mock_viommu_cache_invalidate,
+ },
.default_domain_ops =
&(struct iommu_domain_ops){
.free = mock_domain_free,
@@ -759,7 +834,7 @@ static void mock_dev_release(struct device *dev)
static struct mock_dev *mock_dev_create(unsigned long dev_flags)
{
struct mock_dev *mdev;
- int rc;
+ int rc, i;
if (dev_flags &
~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
@@ -773,6 +848,8 @@ static struct mock_dev *mock_dev_create(unsigned long dev_flags)
mdev->flags = dev_flags;
mdev->dev.release = mock_dev_release;
mdev->dev.bus = &iommufd_mock_bus_type.bus;
+ for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+ mdev->cache[i] = IOMMU_TEST_DEV_CACHE_DEFAULT;
rc = ida_alloc(&mock_dev_ida, GFP_KERNEL);
if (rc < 0)
--
2.43.0
Similar to IOMMU_TEST_OP_MD_CHECK_IOTLB verifying a mock_domain's iotlb,
IOMMU_TEST_OP_DEV_CHECK_CACHE will be used to verify a mock_dev's cache.
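From user space, the new op goes through the usual iommufd selftest ioctl path. A minimal sketch, assuming fd is an opened iommufd and device_id is a mock device ID:

	struct iommu_test_cmd cmd = {
		.size = sizeof(cmd),
		.op = IOMMU_TEST_OP_DEV_CHECK_CACHE,
		.id = device_id,
		.check_dev_cache = {
			.id = 0,	/* cache entry index to check */
			.cache = IOMMU_TEST_DEV_CACHE_DEFAULT,
		},
	};
	int rc;

	/* Returns 0 only if cache entry 0 still holds the expected value */
	rc = ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_DEV_CHECK_CACHE), &cmd);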
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommufd/iommufd_test.h | 5 ++++
tools/testing/selftests/iommu/iommufd_utils.h | 24 +++++++++++++++++++
drivers/iommu/iommufd/selftest.c | 24 +++++++++++++++++++
tools/testing/selftests/iommu/iommufd.c | 7 +++++-
4 files changed, 59 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 05f57a968d25..b226636aa07a 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -23,6 +23,7 @@ enum {
IOMMU_TEST_OP_DIRTY,
IOMMU_TEST_OP_MD_CHECK_IOTLB,
IOMMU_TEST_OP_TRIGGER_IOPF,
+ IOMMU_TEST_OP_DEV_CHECK_CACHE,
};
enum {
@@ -140,6 +141,10 @@ struct iommu_test_cmd {
__u32 perm;
__u64 addr;
} trigger_iopf;
+ struct {
+ __u32 id;
+ __u32 cache;
+ } check_dev_cache;
};
__u32 last;
};
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index ccd1f65df0b0..b3da8c96a828 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -234,6 +234,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, __u32 ft_i
test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \
})
+#define test_cmd_dev_check_cache(device_id, cache_id, expected) \
+ ({ \
+ struct iommu_test_cmd test_cmd = { \
+ .size = sizeof(test_cmd), \
+ .op = IOMMU_TEST_OP_DEV_CHECK_CACHE, \
+ .id = device_id, \
+ .check_dev_cache = { \
+ .id = cache_id, \
+ .cache = expected, \
+ }, \
+ }; \
+ ASSERT_EQ(0, \
+ ioctl(self->fd, \
+ _IOMMU_TEST_CMD(IOMMU_TEST_OP_DEV_CHECK_CACHE),\
+ &test_cmd)); \
+ })
+
+#define test_cmd_dev_check_cache_all(device_id, expected) \
+ ({ \
+ int c; \
+ for (c = 0; c < MOCK_DEV_CACHE_NUM; c++) \
+ test_cmd_dev_check_cache(device_id, c, expected); \
+ })
+
static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs,
uint32_t data_type, uint32_t lreq,
uint32_t *nreqs)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 8a1aef857922..ac3836c1dbcd 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -1106,6 +1106,26 @@ static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd,
return rc;
}
+static int iommufd_test_dev_check_cache(struct iommufd_ucmd *ucmd,
+ u32 idev_id, unsigned int cache_id,
+ u32 cache)
+{
+ struct iommufd_device *idev;
+ struct mock_dev *mdev;
+ int rc = 0;
+
+ idev = iommufd_get_device(ucmd, idev_id);
+ if (IS_ERR(idev))
+ return PTR_ERR(idev);
+ mdev = container_of(idev->dev, struct mock_dev, dev);
+
+ if (cache_id > MOCK_DEV_CACHE_ID_MAX ||
+ mdev->cache[cache_id] != cache)
+ rc = -EINVAL;
+ iommufd_put_object(ucmd->ictx, &idev->obj);
+ return rc;
+}
+
struct selftest_access {
struct iommufd_access *access;
struct file *file;
@@ -1615,6 +1635,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd)
return iommufd_test_md_check_iotlb(ucmd, cmd->id,
cmd->check_iotlb.id,
cmd->check_iotlb.iotlb);
+ case IOMMU_TEST_OP_DEV_CHECK_CACHE:
+ return iommufd_test_dev_check_cache(ucmd, cmd->id,
+ cmd->check_dev_cache.id,
+ cmd->check_dev_cache.cache);
case IOMMU_TEST_OP_CREATE_ACCESS:
return iommufd_test_create_access(ucmd, cmd->id,
cmd->create_access.flags);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 2651e2b58423..d5c5e3389182 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -222,6 +222,8 @@ FIXTURE_SETUP(iommufd_ioas)
for (i = 0; i != variant->mock_domains; i++) {
test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
&self->hwpt_id, &self->device_id);
+ test_cmd_dev_check_cache_all(self->device_id,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
self->base_iova = MOCK_APERTURE_START;
}
}
@@ -1386,9 +1388,12 @@ FIXTURE_SETUP(iommufd_mock_domain)
ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
- for (i = 0; i != variant->mock_domains; i++)
+ for (i = 0; i != variant->mock_domains; i++) {
test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i],
&self->hwpt_ids[i], &self->idev_ids[i]);
+ test_cmd_dev_check_cache_all(self->idev_ids[i],
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ }
self->hwpt_id = self->hwpt_ids[0];
self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
--
2.43.0
Add a vdevice_cache test case to cover vIOMMU invalidations using the
updated IOMMU_HWPT_INVALIDATE ioctl, which now allows passing in a vIOMMU
via its hwpt_id field.
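The central uAPI contract the test exercises: on failure, the kernel writes back how many requests actually completed via entry_num. A user-space sketch, assuming fd and viommu_id are valid and handle_partial() is a hypothetical error path:

	struct iommu_viommu_invalidate_selftest reqs[2] = {};
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = viommu_id,	/* a vIOMMU ID is now accepted here */
		.data_type = IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
		.data_uptr = (uint64_t)reqs,
		.entry_len = sizeof(*reqs),
		.entry_num = 2,
	};

	if (ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd))
		handle_partial(cmd.entry_num);	/* entries that succeeded */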
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
tools/testing/selftests/iommu/iommufd_utils.h | 32 ++++
tools/testing/selftests/iommu/iommufd.c | 173 ++++++++++++++++++
2 files changed, 205 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index b3da8c96a828..29e9cd1fdd82 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -289,6 +289,38 @@ static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs,
data_type, lreq, nreqs)); \
})
+static int _test_cmd_viommu_invalidate(int fd, __u32 viommu_id, void *reqs,
+ uint32_t data_type, uint32_t lreq,
+ uint32_t *nreqs)
+{
+ struct iommu_hwpt_invalidate cmd = {
+ .size = sizeof(cmd),
+ .hwpt_id = viommu_id,
+ .data_type = data_type,
+ .data_uptr = (uint64_t)reqs,
+ .entry_len = lreq,
+ .entry_num = *nreqs,
+ };
+ int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd);
+ *nreqs = cmd.entry_num;
+ return rc;
+}
+
+#define test_cmd_viommu_invalidate(viommu, reqs, lreq, nreqs) \
+ ({ \
+ ASSERT_EQ(0, \
+ _test_cmd_viommu_invalidate(self->fd, viommu, reqs, \
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, \
+ lreq, nreqs)); \
+ })
+#define test_err_viommu_invalidate(_errno, viommu_id, reqs, data_type, lreq, \
+ nreqs) \
+ ({ \
+ EXPECT_ERRNO(_errno, _test_cmd_viommu_invalidate( \
+ self->fd, viommu_id, reqs, \
+ data_type, lreq, nreqs)); \
+ })
+
static int _test_cmd_access_replace_ioas(int fd, __u32 access_id,
unsigned int ioas_id)
{
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index d5c5e3389182..5a41bac3c1ab 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -2495,4 +2495,177 @@ TEST_F(iommufd_viommu, vdevice_alloc)
}
}
+TEST_F(iommufd_viommu, vdevice_cache)
+{
+ struct iommu_viommu_invalidate_selftest inv_reqs[2] = {};
+ uint32_t viommu_id = self->viommu_id;
+ uint32_t dev_id = self->device_id;
+ uint32_t vdev_id = 0;
+ uint32_t num_inv;
+
+ if (dev_id) {
+ test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id);
+
+ test_cmd_dev_check_cache_all(dev_id,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Check data_type by passing zero-length array */
+ num_inv = 0;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: Invalid data_type */
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: structure size sanity */
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs) + 1, &num_inv);
+ assert(!num_inv);
+
+ num_inv = 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ 1, &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid flag is passed */
+ num_inv = 1;
+ inv_reqs[0].flags = 0xffffffff;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid data_uptr when array is not empty */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EINVAL, viommu_id, NULL,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid entry_len when array is not empty */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ 0, &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid cache_id */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /* Negative test: invalid vdev_id */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x9;
+ inv_reqs[0].cache_id = 0;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(!num_inv);
+
+ /*
+ * Invalidate the 1st cache entry but fail the 2nd request
+ * due to invalid flags configuration in the 2nd request.
+ */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 0;
+ inv_reqs[1].flags = 0xffffffff;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = 1;
+ test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 2,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /*
+ * Invalidate the 1st cache entry but fail the 2nd request
+ * due to invalid cache_id configuration in the 2nd request.
+ */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 0;
+ inv_reqs[1].flags = 0;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1;
+ test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs,
+ IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 2,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Invalidate the 2nd cache entry and verify */
+ num_inv = 1;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 1;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache(dev_id, 0, 0);
+ test_cmd_dev_check_cache(dev_id, 1, 0);
+ test_cmd_dev_check_cache(dev_id, 2,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+ test_cmd_dev_check_cache(dev_id, 3,
+ IOMMU_TEST_DEV_CACHE_DEFAULT);
+
+ /* Invalidate the 3rd and 4th cache entries and verify */
+ num_inv = 2;
+ inv_reqs[0].flags = 0;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].cache_id = 2;
+ inv_reqs[1].flags = 0;
+ inv_reqs[1].vdev_id = 0x99;
+ inv_reqs[1].cache_id = 3;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 2);
+ test_cmd_dev_check_cache_all(dev_id, 0);
+
+ /* Invalidate all cache entries for the device and verify */
+ num_inv = 1;
+ inv_reqs[0].vdev_id = 0x99;
+ inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL;
+ test_cmd_viommu_invalidate(viommu_id, inv_reqs,
+ sizeof(*inv_reqs), &num_inv);
+ assert(num_inv == 1);
+ test_cmd_dev_check_cache_all(dev_id, 0);
+ test_ioctl_destroy(vdev_id);
+ }
+}
+
TEST_HARNESS_MAIN
--
2.43.0
With the introduction of the new object and its infrastructure, update the
doc and the vIOMMU graph to reflect that.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
Documentation/userspace-api/iommufd.rst | 58 ++++++++++++++++++-------
1 file changed, 42 insertions(+), 16 deletions(-)
diff --git a/Documentation/userspace-api/iommufd.rst b/Documentation/userspace-api/iommufd.rst
index 37eb1adda57b..3c27cc92c2cb 100644
--- a/Documentation/userspace-api/iommufd.rst
+++ b/Documentation/userspace-api/iommufd.rst
@@ -94,6 +94,19 @@ Following IOMMUFD objects are exposed to userspace:
backed by corresponding vIOMMU objects, in which case a guest OS would do
the "dispatch" naturally instead of VMM trappings.
+ - IOMMUFD_OBJ_VDEVICE, representing a virtual device for an IOMMUFD_OBJ_DEVICE
+ against an IOMMUFD_OBJ_VIOMMU. This virtual device holds the device's virtual
+ information or attributes (related to the vIOMMU) in a VM. An immediate vDATA
+ example can be the virtual ID of the device on a vIOMMU, which is a unique ID
+ that the VMM assigns to the device for a translation channel/port of the
+ vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel
+ VT-d to a Context Table. Some advanced security information could potentially
+ be forwarded via this object too, such as security level or realm information
+ in a Confidential Compute Architecture. A VMM should create a vDEVICE object
+ to forward all the device information in a VM when it connects a device to a
+ vIOMMU; this is a separate ioctl call from attaching the same device to an
+ HWPT_PAGING that the vIOMMU holds.
+
All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
The diagrams below show relationships between user-visible objects and kernel
@@ -133,23 +146,26 @@ creating the objects and links::
|____________| |____________| |______|
_______________________________________________________________________
- | iommufd (with vIOMMU) |
+ | iommufd (with vIOMMU/vDEVICE) |
| |
- | [5] |
- | _____________ |
- | | | |
- | [1] | vIOMMU | [4] [2] |
- | ________________ | | _____________ ________ |
- | | | | [3] | | | | | |
- | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
- | |________________| |_____________| |_____________| |________| |
- | | | | | |
- |_________|____________________|__________________|_______________|_____|
- | | | |
- | ______v_____ ______v_____ ___v__
- | PFN storage | (paging) | | (nested) | |struct|
- |------------>|iommu_domain|<----|iommu_domain|<----|device|
- |____________| |____________| |______|
+ | [5] [6] |
+ | _____________ _____________ |
+ | | | | | |
+ | |----------------| vIOMMU |<---| vDEVICE |<----| |
+ | | | | |_____________| | |
+ | | | | | |
+ | | [1] | | [4] | [2] |
+ | | ______ | | _____________ _|______ |
+ | | | | | [3] | | | | | |
+ | | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
+ | | |______| |_____________| |_____________| |________| |
+ | | | | | | |
+ |______|________|______________|__________________|_______________|_____|
+ | | | | |
+ ______v_____ | ______v_____ ______v_____ ___v__
+ | struct | | PFN | (paging) | | (nested) | |struct|
+ |iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device|
+ |____________| storage|____________| |____________| |______|
1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can
hold multiple IOAS objects. IOAS is the most generic object and does not
@@ -212,6 +228,15 @@ creating the objects and links::
the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used
as a nesting parent object to allocate an HWPT_NESTED object described above.
+6. IOMMUFD_OBJ_VDEVICE can only be created manually via the IOMMU_VDEVICE_ALLOC
+ uAPI, given a viommu_id for an iommufd_viommu object and a dev_id for an
+ iommufd_device object. The vDEVICE object will be the binding between these
+ two parent objects. A @virt_id must also be set via the uAPI, giving the
+ iommufd core an index at which to store the vDEVICE object in a per-vIOMMU
+ vDEVICE array. If necessary, the IOMMU driver may choose to implement a
+ vdevice_alloc op to init its HW for a virtualization feature related to a
+ vDEVICE. Successful completion of this operation sets up the linkages between
+ the vIOMMU and the device.
+
A device can only bind to an iommufd due to DMA ownership claim and attach to at
most one IOAS object (no support of PASID yet).
@@ -225,6 +250,7 @@ User visible objects are backed by following datastructures:
- iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
- iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
+- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
Several terminologies when looking at these datastructures:
--
2.43.0
On Wed, Oct 09, 2024 at 09:38:25AM -0700, Nicolin Chen wrote: > With the introduction of the new object and its infrastructure, update the > doc and the vIOMMU graph to reflect that. > > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > --- > Documentation/userspace-api/iommufd.rst | 58 ++++++++++++++++++------- > 1 file changed, 42 insertions(+), 16 deletions(-) Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Jason
Implement the vIOMMU's cache_invalidate op for user space to invalidate the
IOTLB entries, Device ATS and CD entries that are still cached by hardware.
Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation entries
that are simply in the native format of a 128-bit TLBI command. Scan those
commands against the permitted command list and fix their VMID/SID fields.
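As an illustration of the native format, a VMM forwarding a guest's ASID invalidation might fill one array entry along these lines. This is a sketch: guest_asid is a placeholder, and CMDQ_OP_TLBI_NH_ASID/CMDQ_TLBI_0_ASID are the kernel driver's internal field names, reused here only for readability. The kernel fixes up the VMID itself, per the patch below:

	struct iommu_viommu_arm_smmuv3_invalidate inv = {};

	/* 128-bit TLBI command in native CMDQ layout, little-endian */
	inv.cmd[0] = cpu_to_le64(CMDQ_OP_TLBI_NH_ASID |
				 FIELD_PREP(CMDQ_TLBI_0_ASID, guest_asid));
	inv.cmd[1] = cpu_to_le64(0);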
Co-developed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 +
include/uapi/linux/iommufd.h | 24 ++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 131 +++++++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +-
4 files changed, 162 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 844d1dfdea55..000af931a30c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -529,6 +529,7 @@ struct arm_smmu_cmdq_ent {
#define CMDQ_OP_TLBI_NH_ALL 0x10
#define CMDQ_OP_TLBI_NH_ASID 0x11
#define CMDQ_OP_TLBI_NH_VA 0x12
+ #define CMDQ_OP_TLBI_NH_VAA 0x13
#define CMDQ_OP_TLBI_EL2_ALL 0x20
#define CMDQ_OP_TLBI_EL2_ASID 0x21
#define CMDQ_OP_TLBI_EL2_VA 0x22
@@ -949,6 +950,10 @@ void arm_smmu_attach_commit(struct arm_smmu_attach_state *state);
void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
const struct arm_smmu_ste *target);
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
+ bool sync);
+
#ifdef CONFIG_ARM_SMMU_V3_SVA
bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);
bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index cd9e135ef563..d9e510ce67cf 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -684,9 +684,11 @@ struct iommu_hwpt_get_dirty_bitmap {
* enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation
* Data Type
* @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
*/
enum iommu_hwpt_invalidate_data_type {
IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0,
+ IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 = 1,
};
/**
@@ -725,6 +727,28 @@ struct iommu_hwpt_vtd_s1_invalidate {
__u32 __reserved;
};
+/**
+ * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation
+ * (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3)
+ * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ.
+ * Must be little-endian.
+ *
+ * Supported command list, available only when passing in a vIOMMU via @hwpt_id:
+ * CMDQ_OP_TLBI_NSNH_ALL
+ * CMDQ_OP_TLBI_NH_VA
+ * CMDQ_OP_TLBI_NH_VAA
+ * CMDQ_OP_TLBI_NH_ALL
+ * CMDQ_OP_TLBI_NH_ASID
+ * CMDQ_OP_ATC_INV
+ * CMDQ_OP_CFGI_CD
+ * CMDQ_OP_CFGI_CD_ALL
+ *
+ * -EIO will be returned if the command is not supported.
+ */
+struct iommu_viommu_arm_smmuv3_invalidate {
+ __aligned_le64 cmd[2];
+};
+
/**
* struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
* @size: sizeof(struct iommu_hwpt_invalidate)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 5e235fca8f13..1b82579eb252 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -191,6 +191,14 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
if (smmu_parent->smmu != master->smmu)
return ERR_PTR(-EINVAL);
+ /*
+ * FORCE_SYNC is not set with FEAT_NESTING. Some study of the exact HW
+ * defect is needed to determine if arm_vsmmu_cache_invalidate() needs
+ * any change to remove this.
+ */
+ if (WARN_ON(master->smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+ return ERR_PTR(-EOPNOTSUPP);
+
ret = iommu_copy_struct_from_user(&arg, user_data,
IOMMU_HWPT_DATA_ARM_SMMUV3, ste);
if (ret)
@@ -213,6 +221,127 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
return &nested_domain->domain;
}
+static int arm_vsmmu_vsid_to_sid(struct arm_vsmmu *vsmmu, u32 vsid, u32 *sid)
+{
+ XA_STATE(xas, &vsmmu->core.vdevs, (unsigned long)vsid);
+ struct arm_smmu_master *master;
+ struct device *dev;
+ int ret = 0;
+
+ xa_lock(&vsmmu->core.vdevs);
+ dev = vdev_to_dev(xas_load(&xas));
+ if (!dev) {
+ ret = -EIO;
+ goto unlock;
+ }
+ master = dev_iommu_priv_get(dev);
+
+ /* At this moment, iommufd only supports PCI devices that have one SID */
+ if (sid)
+ *sid = master->streams[0].id;
+unlock:
+ xa_unlock(&vsmmu->core.vdevs);
+ return ret;
+}
+
+/*
+ * Convert, in place, the raw invalidation command into an internal format that
+ * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are
+ * stored in CPU endian.
+ *
+ * Enforce the VMID or SID on the command.
+ */
+static int
+arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
+ struct iommu_viommu_arm_smmuv3_invalidate *cmd)
+{
+ cmd->cmd[0] = le64_to_cpu(cmd->cmd[0]);
+ cmd->cmd[1] = le64_to_cpu(cmd->cmd[1]);
+
+ switch (cmd->cmd[0] & CMDQ_0_OP) {
+ case CMDQ_OP_TLBI_NSNH_ALL:
+ /* Convert to NH_ALL */
+ cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL |
+ FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+ cmd->cmd[1] = 0;
+ break;
+ case CMDQ_OP_TLBI_NH_VA:
+ case CMDQ_OP_TLBI_NH_VAA:
+ case CMDQ_OP_TLBI_NH_ALL:
+ case CMDQ_OP_TLBI_NH_ASID:
+ cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
+ cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+ break;
+ case CMDQ_OP_ATC_INV:
+ case CMDQ_OP_CFGI_CD:
+ case CMDQ_OP_CFGI_CD_ALL:
+ u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]);
+
+ if (arm_vsmmu_vsid_to_sid(vsmmu, vsid, &sid))
+ return -EIO;
+ cmd->cmd[0] &= ~CMDQ_CFGI_0_SID;
+ cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid);
+ break;
+ default:
+ return -EIO;
+ }
+ return 0;
+}
+
+static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+ struct iommu_user_data_array *array)
+{
+ struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+ struct iommu_viommu_arm_smmuv3_invalidate *last;
+ struct iommu_viommu_arm_smmuv3_invalidate *cmds;
+ struct iommu_viommu_arm_smmuv3_invalidate *cur;
+ struct iommu_viommu_arm_smmuv3_invalidate *end;
+ struct arm_smmu_device *smmu = vsmmu->smmu;
+ int ret;
+
+ cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+ if (!cmds)
+ return -ENOMEM;
+ cur = cmds;
+ end = cmds + array->entry_num;
+
+ static_assert(sizeof(*cmds) == 2 * sizeof(u64));
+ ret = iommu_copy_struct_from_full_user_array(
+ cmds, sizeof(*cmds), array,
+ IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3);
+ if (ret)
+ goto out;
+
+ last = cmds;
+ while (cur != end) {
+ ret = arm_vsmmu_convert_user_cmd(vsmmu, cur);
+ if (ret)
+ goto out;
+
+ /* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
+ cur++;
+ if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+ continue;
+
+ /* FIXME always uses the main cmdq rather than trying to group by type */
+ ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, last->cmd,
+ cur - last, true);
+ if (ret) {
+ cur--;
+ goto out;
+ }
+ last = cur;
+ }
+out:
+ array->entry_num = cur - cmds;
+ kfree(cmds);
+ return ret;
+}
+
+const struct iommufd_viommu_ops arm_vsmmu_ops = {
+ .cache_invalidate = arm_vsmmu_cache_invalidate,
+};
+
struct iommufd_viommu *
arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent,
struct iommufd_ctx *ictx, unsigned int viommu_type)
@@ -225,7 +354,7 @@ arm_vsmmu_alloc(struct iommu_device *iommu_dev, struct iommu_domain *parent,
if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
return ERR_PTR(-EOPNOTSUPP);
- vsmmu = iommufd_viommu_alloc(ictx, arm_vsmmu, core, NULL);
+ vsmmu = iommufd_viommu_alloc(ictx, arm_vsmmu, core, &arm_vsmmu_ops);
if (IS_ERR(vsmmu))
return ERR_CAST(vsmmu);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6a23e6dcd5cf..a2bbd140e232 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -766,9 +766,9 @@ static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds,
* insert their own list of commands then all of the commands from one
* CPU will appear before any of the commands from the other CPU.
*/
-static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
- struct arm_smmu_cmdq *cmdq,
- u64 *cmds, int n, bool sync)
+int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
+ struct arm_smmu_cmdq *cmdq, u64 *cmds, int n,
+ bool sync)
{
u64 cmd_sync[CMDQ_ENT_DWORDS];
u32 prod;
--
2.43.0
On Thu, 10 Oct 2024 at 00:44, Nicolin Chen <nicolinc@nvidia.com> wrote: > > Implement the vIOMMU's cache_invalidate op for user space to invalidate the > IOTLB entries, Device ATS and CD entries that are still cached by hardware. > > Add struct iommu_viommu_arm_smmuv3_invalidate defining invalidation entries > that are simply in the native format of a 128-bit TLBI command. Scan those > commands against the permitted command list and fix their VMID/SID fields. > > Co-developed-by: Eric Auger <eric.auger@redhat.com> > Signed-off-by: Eric Auger <eric.auger@redhat.com> > Co-developed-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> > --- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 + > include/uapi/linux/iommufd.h | 24 ++++ > .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 131 +++++++++++++++++- > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 6 +- > 4 files changed, 162 insertions(+), 4 deletions(-) > > +static int > +arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu, > + struct iommu_viommu_arm_smmuv3_invalidate *cmd) > +{ > + cmd->cmd[0] = le64_to_cpu(cmd->cmd[0]); > + cmd->cmd[1] = le64_to_cpu(cmd->cmd[1]); > + > + switch (cmd->cmd[0] & CMDQ_0_OP) { > + case CMDQ_OP_TLBI_NSNH_ALL: > + /* Convert to NH_ALL */ > + cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL | > + FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid); > + cmd->cmd[1] = 0; > + break; > + case CMDQ_OP_TLBI_NH_VA: > + case CMDQ_OP_TLBI_NH_VAA: > + case CMDQ_OP_TLBI_NH_ALL: > + case CMDQ_OP_TLBI_NH_ASID: > + cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID; > + cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid); > + break; > + case CMDQ_OP_ATC_INV: > + case CMDQ_OP_CFGI_CD: > + case CMDQ_OP_CFGI_CD_ALL: > + u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); Here got build error drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c:302:3: error: a label can only be part of a statement and a declaration is not a statement 302 | u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); | ^~~ Need {} to include. case CMDQ_OP_CFGI_CD_ALL: { ... } Thanks
On Sat, Oct 12, 2024 at 11:12:09AM +0800, Zhangfei Gao wrote: > > + case CMDQ_OP_ATC_INV: > > + case CMDQ_OP_CFGI_CD: > > + case CMDQ_OP_CFGI_CD_ALL: > > + u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); > > Here got build error > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c:302:3: error: a > label can only be part of a statement and a declaration is not a > statement > 302 | u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]); > | ^~~ > > Need {} to include. > case CMDQ_OP_CFGI_CD_ALL: { > ... > } Oh, yea. Will fix. Thanks! Nicolin
From: Jason Gunthorpe <jgg@nvidia.com>
Now that ATC invalidation can be done with the vIOMMU invalidation op, a
guest-owned IOMMU_DOMAIN_NESTED can use ATS too. Allow it to pass in the
EATS field via the vSTE words.
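For example, a VMM enabling ATS could hand in a vSTE along these lines. A sketch only, using the driver's internal STRTAB_* names for readability; a real S1 vSTE also carries S1ContextPtr and friends. Per the patch, the kernel validates the EATS value and then masks it out of the stored vSTE:

	struct iommu_hwpt_arm_smmuv3 arg = {};

	arg.ste[0] = cpu_to_le64(STRTAB_STE_0_V |
				 FIELD_PREP(STRTAB_STE_0_CFG,
					    STRTAB_STE_0_CFG_S1_TRANS));
	/* EATS_TRANS (0b01) requests full ATS on this stream */
	arg.ste[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
					    STRTAB_STE_1_EATS_TRANS));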
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++-
include/uapi/linux/iommufd.h | 2 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 31 ++++++++++++++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 +++++++++++++---
4 files changed, 54 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 000af931a30c..470bc3ee25ef 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -305,7 +305,7 @@ static inline u32 arm_smmu_strtab_l2_idx(u32 sid)
#define STRTAB_STE_1_NESTING_ALLOWED \
cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \
STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \
- STRTAB_STE_1_S1STALLD)
+ STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS)
/*
* Context descriptors.
@@ -838,6 +838,7 @@ struct arm_smmu_domain {
struct arm_smmu_nested_domain {
struct iommu_domain domain;
struct arm_smmu_domain *s2_parent;
+ bool enable_ats : 1;
__le64 ste[2];
};
@@ -879,6 +880,7 @@ struct arm_smmu_master_domain {
struct list_head devices_elm;
struct arm_smmu_master *master;
ioasid_t ssid;
+ bool nested_ats_flush : 1;
};
static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index d9e510ce67cf..9527a4ecfd56 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -404,7 +404,7 @@ struct iommu_hwpt_vtd_s1 {
* a user stage-1 Context Descriptor Table. Must be little-endian.
* Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec)
* - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax
- * - word-1: S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
+ * - word-1: EATS, S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
*
* -EIO will be returned if @ste is not legal or contains any non-allowed field.
* Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 1b82579eb252..b491017921df 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -103,8 +103,6 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
.master = master,
.old_domain = iommu_get_domain_for_dev(dev),
.ssid = IOMMU_NO_PASID,
- /* Currently invalidation of ATC is not supported */
- .disable_ats = true,
};
struct arm_smmu_ste ste;
int ret;
@@ -115,6 +113,15 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
return -EBUSY;
mutex_lock(&arm_smmu_asid_lock);
+ /*
+ * The VM has to control the actual ATS state at the PCI device because
+ * we forward the invalidations directly from the VM. If the VM doesn't
+ * think ATS is on it will not generate ATC flushes and the ATC will
+ * become incoherent. Since we can't access the actual virtual PCI ATS
+ * config bit here base this off the EATS value in the STE. If the EATS
+ * is set then the VM must generate ATC flushes.
+ */
+ state.disable_ats = !nested_domain->enable_ats;
ret = arm_smmu_attach_prepare(&state, domain);
if (ret) {
mutex_unlock(&arm_smmu_asid_lock);
@@ -140,8 +147,10 @@ static const struct iommu_domain_ops arm_smmu_nested_ops = {
.free = arm_smmu_domain_nested_free,
};
-static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
+static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg,
+ bool *enable_ats)
{
+ unsigned int eats;
unsigned int cfg;
if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
@@ -158,6 +167,18 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg)
if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS &&
cfg != STRTAB_STE_0_CFG_S1_TRANS)
return -EIO;
+
+ /*
+ * Only Full ATS or ATS UR is supported
+ * The EATS field will be set by arm_smmu_make_nested_domain_ste()
+ */
+ eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg->ste[1]));
+ arg->ste[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS);
+ if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS)
+ return -EIO;
+
+ if (cfg == STRTAB_STE_0_CFG_S1_TRANS)
+ *enable_ats = (eats == STRTAB_STE_1_EATS_TRANS);
return 0;
}
@@ -170,6 +191,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
struct arm_smmu_nested_domain *nested_domain;
struct arm_smmu_domain *smmu_parent;
struct iommu_hwpt_arm_smmuv3 arg;
+ bool enable_ats = false;
int ret;
if (flags || !(master->smmu->features & ARM_SMMU_FEAT_NESTING))
@@ -204,7 +226,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
if (ret)
return ERR_PTR(ret);
- ret = arm_smmu_validate_vste(&arg);
+ ret = arm_smmu_validate_vste(&arg, &enable_ats);
if (ret)
return ERR_PTR(ret);
@@ -215,6 +237,7 @@ arm_smmu_domain_alloc_nesting(struct device *dev, u32 flags,
nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
nested_domain->domain.ops = &arm_smmu_nested_ops;
nested_domain->s2_parent = smmu_parent;
+ nested_domain->enable_ats = enable_ats;
nested_domain->ste[0] = arg.ste[0];
nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a2bbd140e232..1cb4afe7a90a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2107,7 +2107,16 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
if (!master->ats_enabled)
continue;
- arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd);
+ if (master_domain->nested_ats_flush) {
+ /*
+ * If a S2 used as a nesting parent is changed we have
+ * no option but to completely flush the ATC.
+ */
+ arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd);
+ } else {
+ arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size,
+ &cmd);
+ }
for (i = 0; i < master->num_streams; i++) {
cmd.atc.sid = master->streams[i].id;
@@ -2630,7 +2639,8 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
static struct arm_smmu_master_domain *
arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
- struct arm_smmu_master *master, ioasid_t ssid)
+ struct arm_smmu_master *master, ioasid_t ssid,
+ bool nested_ats_flush)
{
struct arm_smmu_master_domain *master_domain;
@@ -2639,7 +2649,8 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
list_for_each_entry(master_domain, &smmu_domain->devices,
devices_elm) {
if (master_domain->master == master &&
- master_domain->ssid == ssid)
+ master_domain->ssid == ssid &&
+ master_domain->nested_ats_flush == nested_ats_flush)
return master_domain;
}
return NULL;
@@ -2670,13 +2681,18 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
{
struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain);
struct arm_smmu_master_domain *master_domain;
+ bool nested_ats_flush = false;
unsigned long flags;
if (!smmu_domain)
return;
+ if (domain->type == IOMMU_DOMAIN_NESTED)
+ nested_ats_flush = to_smmu_nested_domain(domain)->enable_ats;
+
spin_lock_irqsave(&smmu_domain->devices_lock, flags);
- master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid);
+ master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid,
+ nested_ats_flush);
if (master_domain) {
list_del(&master_domain->devices_elm);
kfree(master_domain);
@@ -2743,6 +2759,9 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
return -ENOMEM;
master_domain->master = master;
master_domain->ssid = state->ssid;
+ if (new_domain->type == IOMMU_DOMAIN_NESTED)
+ master_domain->nested_ats_flush =
+ to_smmu_nested_domain(new_domain)->enable_ats;
/*
* During prepare we want the current smmu_domain and new
--
2.43.0
From: Jason Gunthorpe <jgg@nvidia.com>
The SMMUv3 spec has a note that BYPASS and ATS don't work together under
the STE EATS field definition. However, there is another section, "13.6.4
Full ATS skipping stage 1", that explains that under certain conditions
BYPASS and ATS do work together: if the STE is using S1DSS to select
BYPASS and the CD table has the possibility for a substream.
When these comments were written, the understanding was that all forms of
BYPASS just didn't work and that this was to be a future problem to solve.
It turns out that ATS and IDENTITY will always work just fine:
- If STE.Config = BYPASS then the PCI ATS is disabled
- If a PASID domain is attached then S1DSS = BYPASS and ATS will be
enabled. This meets the requirements of 13.6.4 to automatically
generate 1:1 ATS replies on the RID.
Update the comments to reflect this.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1cb4afe7a90a..236f930f9a97 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2745,9 +2745,14 @@ int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state,
* Translation Requests and Translated transactions are denied
* as though ATS is disabled for the stream (STE.EATS == 0b00),
* causing F_BAD_ATS_TREQ and F_TRANSL_FORBIDDEN events
- * (IHI0070Ea 5.2 Stream Table Entry). Thus ATS can only be
- * enabled if we have arm_smmu_domain, those always have page
- * tables.
+ * (IHI0070Ea 5.2 Stream Table Entry).
+ *
+ * However, if we have installed a CD table and are using S1DSS
+ * then ATS will work in S1DSS bypass. See "13.6.4 Full ATS
+ * skipping stage 1".
+ *
+ * Disable ATS if we are going to create a normal 0b100 bypass
+ * STE.
*/
state->ats_enabled = !state->disable_ats &&
arm_smmu_ats_supported(master);
@@ -3072,8 +3077,10 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain,
if (arm_smmu_ssids_in_use(&master->cd_table)) {
/*
* If a CD table has to be present then we need to run with ATS
- * on even though the RID will fail ATS queries with UR. This is
- * because we have no idea what the PASID's need.
+ * on because we have to assume a PASID is using ATS. For
+ * IDENTITY this will setup things so that S1DSS=bypass which
+ * follows the explanation in "13.6.4 Full ATS skipping stage 1"
+ * and allows for ATS on the RID to work.
*/
state.cd_needs_ats = true;
arm_smmu_attach_prepare(&state, domain);
--
2.43.0