[PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper

Xu Yilun posted 7 patches 3 months ago
There is a newer version of this series
[PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Xu Yilun 3 months ago
Add the iommufd_object_tombstone_user() helper, which allows the caller
to destroy an iommufd object created by userspace.

This is useful on some destroy paths when the kernel caller finds the
object should have been removed by userspace but is still alive. With
this helper, the caller destroys the object but leave the object ID
reserved (so called tombstone). The tombstone prevents repurposing the
object ID without awareness of the original user.

Since this happens for abnormal userspace behavior, for simplicity, the
tombstoned object ID would be permanently leaked until
iommufd_fops_release(). I.e. the original user gets an error when
calling ioctl(IOMMU_DESTROY) on that ID.

The first use case would be to ensure the iommufd_vdevice can't outlive
the associated iommufd_device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
 drivers/iommu/iommufd/main.c            | 24 ++++++++++++++++++++++--
 2 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 4f5e8cd99c96..da1bced8c945 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -188,7 +188,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
 			     struct iommufd_object *obj);
 
 enum {
-	REMOVE_WAIT_SHORTTERM = 1,
+	REMOVE_WAIT_SHORTTERM	= BIT(0),
+	REMOVE_OBJ_TOMBSTONE	= BIT(1),
 };
 int iommufd_object_remove(struct iommufd_ctx *ictx,
 			  struct iommufd_object *to_destroy, u32 id,
@@ -214,6 +215,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
 	WARN_ON(ret);
 }
 
+/*
+ * Similar to iommufd_object_destroy_user(), except that the object ID is left
+ * reserved/tombstoned.
+ */
+static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
+						 struct iommufd_object *obj)
+{
+	int ret;
+
+	ret = iommufd_object_remove(ictx, obj, obj->id,
+				    REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+
+	/*
+	 * If there is a bug and we couldn't destroy the object then we did put
+	 * back the caller's users refcount and will eventually try to free it
+	 * again during close.
+	 */
+	WARN_ON(ret);
+}
+
 /*
  * The HWPT allocated by autodomains is used in possibly many devices and
  * is automatically destroyed when its refcount reaches zero.
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 778694d7c207..25ab2f41d650 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -216,7 +216,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
 		goto err_xa;
 	}
 
-	xas_store(&xas, NULL);
+	xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
 	if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
 		ictx->vfio_ioas = NULL;
 	xa_unlock(&ictx->objects);
@@ -298,22 +298,42 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 	 * until the entire list is destroyed. If this can't progress then there
 	 * is some bug related to object refcounting.
 	 */
-	while (!xa_empty(&ictx->objects)) {
+	for (;;) {
 		unsigned int destroyed = 0;
 		unsigned long index;
+		bool empty = true;
 
+		/*
+		 * xa_for_each() will not return tomestones (zeroed entries),
+		 * which prevent the xarray being empty. So use an empty flag
+		 * instead of xa_empty() to indicate all entries are either
+		 * NULLed or tomestoned.
+		 */
 		xa_for_each(&ictx->objects, index, obj) {
+			empty = false;
 			if (!refcount_dec_if_one(&obj->users))
 				continue;
+
 			destroyed++;
 			xa_erase(&ictx->objects, index);
 			iommufd_object_ops[obj->type].destroy(obj);
 			kfree(obj);
 		}
+
+		if (empty)
+			break;
+
 		/* Bug related to users refcount */
 		if (WARN_ON(!destroyed))
 			break;
 	}
+
+	/*
+	 * There may be some tombstones left over from
+	 * iommufd_object_tombstone_user()
+	 */
+	xa_destroy(&ictx->objects);
+
 	WARN_ON(!xa_empty(&ictx->groups));
 
 	mutex_destroy(&ictx->sw_msi_lock);
-- 
2.25.1
Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Jason Gunthorpe 2 months, 3 weeks ago
On Wed, Jul 09, 2025 at 12:02:29PM +0800, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnormal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
>  drivers/iommu/iommufd/main.c            | 24 ++++++++++++++++++++++--
>  2 files changed, 44 insertions(+), 3 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Nicolin Chen 2 months, 4 weeks ago
On Wed, Jul 09, 2025 at 12:02:29PM +0800, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnormal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

With one nit:

> -	while (!xa_empty(&ictx->objects)) {
> +	for (;;) {
>  		unsigned int destroyed = 0;
>  		unsigned long index;
> +		bool empty = true;
>  
> +		/*
> +		 * xa_for_each() will not return tomestones (zeroed entries),
> +		 * which prevent the xarray being empty. So use an empty flags

Since the first "empty" and the second "empty" are different things,

> +		 * instead of xa_empty() to indicate all entries are either
> +		 * NULLed or tomestoned.
> +		 */

let's write something like this (correcting typos too):

		/*
		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
		 * prevent it returning true, unlike xa_for_each() ignoring the
		 * NULLed entries. So use an empty flag instead of xa_empty() to
		 * indicate all entries are either NULLed or tombstoned.
		 */
Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Xu Yilun 2 months, 3 weeks ago
> With one nit:
> 
> > -	while (!xa_empty(&ictx->objects)) {
> > +	for (;;) {
> >  		unsigned int destroyed = 0;
> >  		unsigned long index;
> > +		bool empty = true;
> >  
> > +		/*
> > +		 * xa_for_each() will not return tomestones (zeroed entries),
> > +		 * which prevent the xarray being empty. So use an empty flags
> 
> Since the first "empty" and the second "empty" are different things,
> 
> > +		 * instead of xa_empty() to indicate all entries are either
> > +		 * NULLed or tomestoned.
> > +		 */
> 
> let's write something like this (correcting typos too):
> 
> 		/*
> 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
                                                            ^
> 		 * prevent it returning true, unlike xa_for_each() ignoring the
> 		 * NULLed entries. So use an empty flag instead of xa_empty() to
                   ^
s/NULLed/zeroed, are they?

Thanks,
Yilun

> 		 * indicate all entries are either NULLed or tombstoned.
> 		 */
Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Jason Gunthorpe 2 months, 3 weeks ago
On Fri, Jul 11, 2025 at 11:02:53PM +0800, Xu Yilun wrote:
> > With one nit:
> > 
> > > -	while (!xa_empty(&ictx->objects)) {
> > > +	for (;;) {

Actually just leave this, if xa_empty does work out then the loop
exits quickly, otherwise the break will deal with the tombstones
 it.

> > >  		unsigned int destroyed = 0;
> > >  		unsigned long index;
> > > +		bool empty = true;
> > >  
> > > +		/*
> > > +		 * xa_for_each() will not return tomestones (zeroed entries),
> > > +		 * which prevent the xarray being empty. So use an empty flags
> > 
> > Since the first "empty" and the second "empty" are different things,
> > 
> > > +		 * instead of xa_empty() to indicate all entries are either
> > > +		 * NULLed or tomestoned.
> > > +		 */
> > 
> > let's write something like this (correcting typos too):
> > 
> > 		/*
> > 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
>                                                             ^
> > 		 * prevent it returning true, unlike xa_for_each() ignoring the
> > 		 * NULLed entries. So use an empty flag instead of xa_empty() to
>                    ^
> s/NULLed/zeroed, are they?

Maybe:

 We can't use xa_empty() to end the loop as the tombstones are stored
 as XA_ZERO_ENTRY in the xarray. However xa_for_each() automatically
 converts them to NULL and skips them causing xa_empty() to be kept
 false. Thus once xa_for_each() finds no further !NULL entries the
 loop is done.

?

Jason
Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Nicolin Chen 2 months, 3 weeks ago
On Fri, Jul 11, 2025 at 06:33:23PM -0300, Jason Gunthorpe wrote:
> On Fri, Jul 11, 2025 at 11:02:53PM +0800, Xu Yilun wrote:
> > > let's write something like this (correcting typos too):
> > > 
> > > 		/*
> > > 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
> >                                                             ^
> > > 		 * prevent it returning true, unlike xa_for_each() ignoring the
> > > 		 * NULLed entries. So use an empty flag instead of xa_empty() to
> >                    ^
> > s/NULLed/zeroed, are they?
> 
> Maybe:
> 
>  We can't use xa_empty() to end the loop as the tombstones are stored
>  as XA_ZERO_ENTRY in the xarray. However xa_for_each() automatically
>  converts them to NULL and skips them causing xa_empty() to be kept
>  false. Thus once xa_for_each() finds no further !NULL entries the
>  loop is done.
> 
> ?

+1
Re: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Nicolin Chen 2 months, 3 weeks ago
On Fri, Jul 11, 2025 at 11:02:53PM +0800, Xu Yilun wrote:
> > With one nit:
> > 
> > > -	while (!xa_empty(&ictx->objects)) {
> > > +	for (;;) {
> > >  		unsigned int destroyed = 0;
> > >  		unsigned long index;
> > > +		bool empty = true;
> > >  
> > > +		/*
> > > +		 * xa_for_each() will not return tomestones (zeroed entries),
> > > +		 * which prevent the xarray being empty. So use an empty flags
> > 
> > Since the first "empty" and the second "empty" are different things,
> > 
> > > +		 * instead of xa_empty() to indicate all entries are either
> > > +		 * NULLed or tomestoned.
> > > +		 */
> > 
> > let's write something like this (correcting typos too):
> > 
> > 		/*
> > 		 * We can't use xa_empty(), as a tombstone (NULLed entry) would
>                                                             ^
> > 		 * prevent it returning true, unlike xa_for_each() ignoring the
> > 		 * NULLed entries. So use an empty flag instead of xa_empty() to
>                    ^
> s/NULLed/zeroed, are they?

They are. Just nicer to pick one for consistency.

Typo was "tomestones", any way..

Thanks
Nicolin
RE: [PATCH v4 2/7] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Tian, Kevin 2 months, 4 weeks ago
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Wednesday, July 9, 2025 12:02 PM
> 
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnormal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>

Reviewed-by: Kevin Tian <kevin.tian@intel.com>