[PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper

Xu Yilun posted 5 patches 3 months, 1 week ago
There is a newer version of this series
[PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Xu Yilun 3 months, 1 week ago
Add the iommufd_object_tombstone_user() helper, which allows the caller
to destroy an iommufd object created by userspace.

This is useful on some destroy paths when the kernel caller finds the
object should have been removed by userspace but is still alive. With
this helper, the caller destroys the object but leave the object ID
reserved (so called tombstone). The tombstone prevents repurposing the
object ID without awareness of the original user.

Since this happens for abnomal userspace behavior, for simplicity, the
tombstoned object ID would be permanently leaked until
iommufd_fops_release(). I.e. the original user gets an error when
calling ioctl(IOMMU_DESTROY) on that ID.

The first use case would be to ensure the iommufd_vdevice can't outlive
the associated iommufd_device.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Co-developed-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
 drivers/iommu/iommufd/main.c            | 19 ++++++++++++++++---
 2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 9ccc83341f32..fbc9ef78d81f 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
 			     struct iommufd_object *obj);
 
 enum {
-	REMOVE_WAIT_SHORTTERM = 1,
+	REMOVE_WAIT_SHORTTERM	= BIT(0),
+	REMOVE_OBJ_TOMBSTONE	= BIT(1),
 };
 int iommufd_object_remove(struct iommufd_ctx *ictx,
 			  struct iommufd_object *to_destroy, u32 id,
@@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
 	WARN_ON(ret);
 }
 
+/*
+ * Similar to iommufd_object_destroy_user(), except that the object ID is left
+ * reserved/tombstoned.
+ */
+static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
+						 struct iommufd_object *obj)
+{
+	int ret;
+
+	ret = iommufd_object_remove(ictx, obj, obj->id,
+				    REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
+
+	/*
+	 * If there is a bug and we couldn't destroy the object then we did put
+	 * back the caller's users refcount and will eventually try to free it
+	 * again during close.
+	 */
+	WARN_ON(ret);
+}
+
 /*
  * The HWPT allocated by autodomains is used in possibly many devices and
  * is automatically destroyed when its refcount reaches zero.
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 3df468f64e7d..620923669b42 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
 		goto err_xa;
 	}
 
-	xas_store(&xas, NULL);
+	xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
 	if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
 		ictx->vfio_ioas = NULL;
 	xa_unlock(&ictx->objects);
@@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 	struct iommufd_sw_msi_map *next;
 	struct iommufd_sw_msi_map *cur;
 	struct iommufd_object *obj;
+	bool empty;
 
 	/*
 	 * The objects in the xarray form a graph of "users" counts, and we have
@@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
 	 * until the entire list is destroyed. If this can't progress then there
 	 * is some bug related to object refcounting.
 	 */
-	while (!xa_empty(&ictx->objects)) {
+	for (;;) {
 		unsigned int destroyed = 0;
 		unsigned long index;
 
+		empty = true;
 		xa_for_each(&ictx->objects, index, obj) {
+			empty = false;
 			if (!refcount_dec_if_one(&obj->users))
 				continue;
+
 			destroyed++;
 			xa_erase(&ictx->objects, index);
 			iommufd_object_ops[obj->type].destroy(obj);
 			kfree(obj);
 		}
+
+		if (empty)
+			break;
+
 		/* Bug related to users refcount */
 		if (WARN_ON(!destroyed))
 			break;
 	}
-	WARN_ON(!xa_empty(&ictx->groups));
+
+	/*
+	 * There may be some tombstones left over from
+	 * iommufd_object_tombstone_user()
+	 */
+	xa_destroy(&ictx->groups);
 
 	mutex_destroy(&ictx->sw_msi_lock);
 	list_for_each_entry_safe(cur, next, &ictx->sw_msi_list, sw_msi_item)
-- 
2.25.1
Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Nicolin Chen 3 months, 1 week ago
On Fri, Jun 27, 2025 at 11:38:05AM +0800, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.

Should we describe it "partially destroy"?

> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnomal userspace behavior, for simplicity, the

s/abnomal/abnormal

Nicolin
Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Xu Yilun 3 months ago
On Mon, Jun 30, 2025 at 12:50:57PM -0700, Nicolin Chen wrote:
> On Fri, Jun 27, 2025 at 11:38:05AM +0800, Xu Yilun wrote:
> > Add the iommufd_object_tombstone_user() helper, which allows the caller
> > to destroy an iommufd object created by userspace.
> 
> Should we describe it "partially destroy"?

I don't prefer "partially destroy", it gives the impression the object
is still in memory, in fact with the helper it is totally destroyed and
freed. Only the ID is reserved, but the ID actually has nothing to do
with the object anymore.

> 
> > This is useful on some destroy paths when the kernel caller finds the
> > object should have been removed by userspace but is still alive. With
> > this helper, the caller destroys the object but leave the object ID
> > reserved (so called tombstone). The tombstone prevents repurposing the
> > object ID without awareness of the original user.
> > 
> > Since this happens for abnomal userspace behavior, for simplicity, the
> 
> s/abnomal/abnormal

Applied, thanks for catching.

> 
> Nicolin
RE: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Tian, Kevin 3 months, 1 week ago
> From: Xu Yilun <yilun.xu@linux.intel.com>
> Sent: Friday, June 27, 2025 11:38 AM
>
> @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode,
> struct file *filp)
>  	struct iommufd_sw_msi_map *next;
>  	struct iommufd_sw_msi_map *cur;
>  	struct iommufd_object *obj;
> +	bool empty;

move into for(;;) loop

> 
>  	/*
>  	 * The objects in the xarray form a graph of "users" counts, and we
> have
> @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode
> *inode, struct file *filp)
>  	 * until the entire list is destroyed. If this can't progress then there
>  	 * is some bug related to object refcounting.
>  	 */
> -	while (!xa_empty(&ictx->objects)) {
> +	for (;;) {
>  		unsigned int destroyed = 0;
>  		unsigned long index;
> 
> +		empty = true;
>  		xa_for_each(&ictx->objects, index, obj) {
> +			empty = false;
>  			if (!refcount_dec_if_one(&obj->users))
>  				continue;
> +
>  			destroyed++;
>  			xa_erase(&ictx->objects, index);
>  			iommufd_object_ops[obj->type].destroy(obj);
>  			kfree(obj);
>  		}
> +
> +		if (empty)
> +			break;
> +
>  		/* Bug related to users refcount */
>  		if (WARN_ON(!destroyed))
>  			break;
>  	}
> -	WARN_ON(!xa_empty(&ictx->groups));

I didn't get the rationale of this change. tombstone only means the
object ID reserved, but all the destroy work for the object has been
done, e.g. after all idevice objects are destroyed there should be no
valid igroup entries in ictx->groups (and there is no tombstone
state for igroup).

Is it a wrong change by misreading ictx->groups as ictx->objects?

> +
> +	/*
> +	 * There may be some tombstones left over from
> +	 * iommufd_object_tombstone_user()
> +	 */
> +	xa_destroy(&ictx->groups);
> 

And here should be ictx->objects?
Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Xu Yilun 3 months, 1 week ago
On Mon, Jun 30, 2025 at 05:52:26AM +0000, Tian, Kevin wrote:
> > From: Xu Yilun <yilun.xu@linux.intel.com>
> > Sent: Friday, June 27, 2025 11:38 AM
> >
> > @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode,
> > struct file *filp)
> >  	struct iommufd_sw_msi_map *next;
> >  	struct iommufd_sw_msi_map *cur;
> >  	struct iommufd_object *obj;
> > +	bool empty;
> 
> move into for(;;) loop

Yes.

> 
> > 
> >  	/*
> >  	 * The objects in the xarray form a graph of "users" counts, and we
> > have
> > @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode
> > *inode, struct file *filp)
> >  	 * until the entire list is destroyed. If this can't progress then there
> >  	 * is some bug related to object refcounting.
> >  	 */
> > -	while (!xa_empty(&ictx->objects)) {
> > +	for (;;) {
> >  		unsigned int destroyed = 0;
> >  		unsigned long index;
> > 
> > +		empty = true;
> >  		xa_for_each(&ictx->objects, index, obj) {
> > +			empty = false;
> >  			if (!refcount_dec_if_one(&obj->users))
> >  				continue;
> > +
> >  			destroyed++;
> >  			xa_erase(&ictx->objects, index);
> >  			iommufd_object_ops[obj->type].destroy(obj);
> >  			kfree(obj);
> >  		}
> > +
> > +		if (empty)
> > +			break;
> > +
> >  		/* Bug related to users refcount */
> >  		if (WARN_ON(!destroyed))
> >  			break;
> >  	}
> > -	WARN_ON(!xa_empty(&ictx->groups));
> 
> I didn't get the rationale of this change. tombstone only means the
> object ID reserved, but all the destroy work for the object has been
> done, e.g. after all idevice objects are destroyed there should be no
> valid igroup entries in ictx->groups (and there is no tombstone
> state for igroup).
> 
> Is it a wrong change by misreading ictx->groups as ictx->objects?

Sorry, this is a true mistake.

> 
> > +
> > +	/*
> > +	 * There may be some tombstones left over from
> > +	 * iommufd_object_tombstone_user()
> > +	 */
> > +	xa_destroy(&ictx->groups);
> > 
> 
> And here should be ictx->objects?

Yes, thanks for catching up.

-       xa_destroy(&ictx->groups);
+       xa_destroy(&ictx->objects);
+
+       WARN_ON(!xa_empty(&ictx->groups));

Thanks,
Yilun
Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Baolu Lu 3 months, 1 week ago
On 6/27/25 11:38, Xu Yilun wrote:
> Add the iommufd_object_tombstone_user() helper, which allows the caller
> to destroy an iommufd object created by userspace.
> 
> This is useful on some destroy paths when the kernel caller finds the
> object should have been removed by userspace but is still alive. With
> this helper, the caller destroys the object but leave the object ID
> reserved (so called tombstone). The tombstone prevents repurposing the
> object ID without awareness of the original user.
> 
> Since this happens for abnomal userspace behavior, for simplicity, the
> tombstoned object ID would be permanently leaked until
> iommufd_fops_release(). I.e. the original user gets an error when
> calling ioctl(IOMMU_DESTROY) on that ID.
> 
> The first use case would be to ensure the iommufd_vdevice can't outlive
> the associated iommufd_device.
> 
> Suggested-by: Jason Gunthorpe<jgg@nvidia.com>
> Co-developed-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org>
> Signed-off-by: Aneesh Kumar K.V (Arm)<aneesh.kumar@kernel.org>
> Signed-off-by: Xu Yilun<yilun.xu@linux.intel.com>
> ---
>   drivers/iommu/iommufd/iommufd_private.h | 23 ++++++++++++++++++++++-
>   drivers/iommu/iommufd/main.c            | 19 ++++++++++++++++---
>   2 files changed, 38 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 9ccc83341f32..fbc9ef78d81f 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -187,7 +187,8 @@ void iommufd_object_finalize(struct iommufd_ctx *ictx,
>   			     struct iommufd_object *obj);
>   
>   enum {
> -	REMOVE_WAIT_SHORTTERM = 1,
> +	REMOVE_WAIT_SHORTTERM	= BIT(0),
> +	REMOVE_OBJ_TOMBSTONE	= BIT(1),
>   };
>   int iommufd_object_remove(struct iommufd_ctx *ictx,
>   			  struct iommufd_object *to_destroy, u32 id,
> @@ -213,6 +214,26 @@ static inline void iommufd_object_destroy_user(struct iommufd_ctx *ictx,
>   	WARN_ON(ret);
>   }
>   
> +/*
> + * Similar to iommufd_object_destroy_user(), except that the object ID is left
> + * reserved/tombstoned.
> + */
> +static inline void iommufd_object_tombstone_user(struct iommufd_ctx *ictx,
> +						 struct iommufd_object *obj)
> +{
> +	int ret;
> +
> +	ret = iommufd_object_remove(ictx, obj, obj->id,
> +				    REMOVE_WAIT_SHORTTERM | REMOVE_OBJ_TOMBSTONE);
> +
> +	/*
> +	 * If there is a bug and we couldn't destroy the object then we did put
> +	 * back the caller's users refcount and will eventually try to free it
> +	 * again during close.
> +	 */
> +	WARN_ON(ret);
> +}
> +
>   /*
>    * The HWPT allocated by autodomains is used in possibly many devices and
>    * is automatically destroyed when its refcount reaches zero.
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index 3df468f64e7d..620923669b42 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -167,7 +167,7 @@ int iommufd_object_remove(struct iommufd_ctx *ictx,
>   		goto err_xa;
>   	}
>   
> -	xas_store(&xas, NULL);
> +	xas_store(&xas, (flags & REMOVE_OBJ_TOMBSTONE) ? XA_ZERO_ENTRY : NULL);
>   	if (ictx->vfio_ioas == container_of(obj, struct iommufd_ioas, obj))
>   		ictx->vfio_ioas = NULL;
>   	xa_unlock(&ictx->objects);
> @@ -239,6 +239,7 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
>   	struct iommufd_sw_msi_map *next;
>   	struct iommufd_sw_msi_map *cur;
>   	struct iommufd_object *obj;
> +	bool empty;
>   
>   	/*
>   	 * The objects in the xarray form a graph of "users" counts, and we have
> @@ -249,23 +250,35 @@ static int iommufd_fops_release(struct inode *inode, struct file *filp)
>   	 * until the entire list is destroyed. If this can't progress then there
>   	 * is some bug related to object refcounting.
>   	 */
> -	while (!xa_empty(&ictx->objects)) {
> +	for (;;) {
>   		unsigned int destroyed = 0;
>   		unsigned long index;
>   
> +		empty = true;
>   		xa_for_each(&ictx->objects, index, obj) {
> +			empty = false;

People like me, who don't know the details of how xa_for_each() and
xa_empty() work, will ask why "empty = xa_empty()" isn't used here. So
it's better to add some comments to make it more readable.

/* XA_ZERO_ENTRY is treated as a null entry by xa_for_each(). */

Others look good to me.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>

>   			if (!refcount_dec_if_one(&obj->users))
>   				continue;
> +
>   			destroyed++;
>   			xa_erase(&ictx->objects, index);
>   			iommufd_object_ops[obj->type].destroy(obj);
>   			kfree(obj);
>   		}
> +
> +		if (empty)
> +			break;
> +
>   		/* Bug related to users refcount */
>   		if (WARN_ON(!destroyed))
>   			break;
>   	}

Thanks,
baolu
Re: [PATCH v3 1/5] iommufd: Add iommufd_object_tombstone_user() helper
Posted by Xu Yilun 3 months, 1 week ago
> > -	while (!xa_empty(&ictx->objects)) {
> > +	for (;;) {
> >   		unsigned int destroyed = 0;
> >   		unsigned long index;
> > +		empty = true;
> >   		xa_for_each(&ictx->objects, index, obj) {
> > +			empty = false;
> 
> People like me, who don't know the details of how xa_for_each() and
> xa_empty() work, will ask why "empty = xa_empty()" isn't used here. So
> it's better to add some comments to make it more readable.
> 
> /* XA_ZERO_ENTRY is treated as a null entry by xa_for_each(). */

Yes. I try to make it more elaborate:

+               /*
+                * xa_for_each() will not return tomestones (zeroed entries),
+                * which prevent the xarray being empty. So use an empty flag
+                * instead of xa_empty() to indicate all entries are either
+                * NULLed or tomestoned.
+                */
                empty = true;

Thanks,
Yilun

> 
> Others look good to me.
> 
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>