[PATCH v3] Fixes a race in iopt_unmap_iova_range

Sina Hassani posted 1 patch 2 months ago
There is a newer version of this series
drivers/iommu/iommufd/io_pagetable.c | 8 ++++++++
1 file changed, 8 insertions(+)
[PATCH v3] Fixes a race in iopt_unmap_iova_range
Posted by Sina Hassani 2 months ago
Bug: iopt_unmap_iova_range releases the lock on iova_rwsem inside the loop
body when getting to the more expensive unmap operations. This is fine on
its own except the loop condition is based on the first area that matches
the unmap address range. If a concurrent call to map picks an area that was
unmapped in the previous iterations, this loop will try to mistakenly unmap
them.

How to reproduce: I was able to reproduce this by having one userspace
thread mapping buffers and passing them to another thread that unmaps
them. The problem easily shows up as ebusy errors if you use single page
mappings.

The fix: A simple fix that I implemented here is to advance the start
pointer after we unmap an area. That way we are only looking at the
IOVA range that is mapped and hence guaranteed to not have any overlaps
in each iteration.

Test: I tested this against the repro mentioned above and it works fine.

Cc: stable@vger.kernel.org
Signed-off-by: Sina Hassani <sina@openai.com>
---
 drivers/iommu/iommufd/io_pagetable.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/iommufd/io_pagetable.c
b/drivers/iommu/iommufd/io_pagetable.c
index ee003bb2f647..e306871de06d 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -814,6 +814,14 @@ static int iopt_unmap_iova_range(struct
io_pagetable *iopt, unsigned long start,
                unmapped_bytes += area_last - area_first + 1;

                down_write(&iopt->iova_rwsem);
+
+               /* Do not reconsider things already unmapped in case of
+                * concurrent allocation */
+               if (area_last >= last) {
+                       break;
+               } else {
+                       start = area_last + 1;
+               }
        }

 out_unlock_iova:
--
2.43.0
RE: [PATCH v3] Fixes a race in iopt_unmap_iova_range
Posted by Tian, Kevin 2 months ago
> From: Sina Hassani <sina@openai.com>
> Sent: Friday, April 10, 2026 6:10 AM
> 
> Bug: iopt_unmap_iova_range releases the lock on iova_rwsem inside the
> loop
> body when getting to the more expensive unmap operations. This is fine on
> its own except the loop condition is based on the first area that matches
> the unmap address range. If a concurrent call to map picks an area that was
> unmapped in the previous iterations, this loop will try to mistakenly unmap
> them.
> 
> How to reproduce: I was able to reproduce this by having one userspace
> thread mapping buffers and passing them to another thread that unmaps
> them. The problem easily shows up as ebusy errors if you use single page
> mappings.
> 
> The fix: A simple fix that I implemented here is to advance the start
> pointer after we unmap an area. That way we are only looking at the
> IOVA range that is mapped and hence guaranteed to not have any overlaps
> in each iteration.
> 
> Test: I tested this against the repro mentioned above and it works fine.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Sina Hassani <sina@openai.com>
> ---
>  drivers/iommu/iommufd/io_pagetable.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/iommu/iommufd/io_pagetable.c
> b/drivers/iommu/iommufd/io_pagetable.c
> index ee003bb2f647..e306871de06d 100644
> --- a/drivers/iommu/iommufd/io_pagetable.c
> +++ b/drivers/iommu/iommufd/io_pagetable.c
> @@ -814,6 +814,14 @@ static int iopt_unmap_iova_range(struct
> io_pagetable *iopt, unsigned long start,
>                 unmapped_bytes += area_last - area_first + 1;
> 
>                 down_write(&iopt->iova_rwsem);
> +
> +               /* Do not reconsider things already unmapped in case of
> +                * concurrent allocation */
> +               if (area_last >= last) {
> +                       break;
> +               } else {
> +                       start = area_last + 1;
> +               }
>         }

this could simply be:

	if (area_last >= last)
		break;
	start = area_last + 1;

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Re: [PATCH v3] Fixes a race in iopt_unmap_iova_range
Posted by Sina Hassani 2 months ago
On Thu, Apr 9, 2026 at 8:28 PM Tian, Kevin <kevin.tian@intel.com> wrote:
>
> > From: Sina Hassani <sina@openai.com>
> > Sent: Friday, April 10, 2026 6:10 AM
> >
> > Bug: iopt_unmap_iova_range releases the lock on iova_rwsem inside the
> > loop
> > body when getting to the more expensive unmap operations. This is fine on
> > its own except the loop condition is based on the first area that matches
> > the unmap address range. If a concurrent call to map picks an area that was
> > unmapped in the previous iterations, this loop will try to mistakenly unmap
> > them.
> >
> > How to reproduce: I was able to reproduce this by having one userspace
> > thread mapping buffers and passing them to another thread that unmaps
> > them. The problem easily shows up as ebusy errors if you use single page
> > mappings.
> >
> > The fix: A simple fix that I implemented here is to advance the start
> > pointer after we unmap an area. That way we are only looking at the
> > IOVA range that is mapped and hence guaranteed to not have any overlaps
> > in each iteration.
> >
> > Test: I tested this against the repro mentioned above and it works fine.
> >
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Sina Hassani <sina@openai.com>
> > ---
> >  drivers/iommu/iommufd/io_pagetable.c | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/iommu/iommufd/io_pagetable.c
> > b/drivers/iommu/iommufd/io_pagetable.c
> > index ee003bb2f647..e306871de06d 100644
> > --- a/drivers/iommu/iommufd/io_pagetable.c
> > +++ b/drivers/iommu/iommufd/io_pagetable.c
> > @@ -814,6 +814,14 @@ static int iopt_unmap_iova_range(struct
> > io_pagetable *iopt, unsigned long start,
> >                 unmapped_bytes += area_last - area_first + 1;
> >
> >                 down_write(&iopt->iova_rwsem);
> > +
> > +               /* Do not reconsider things already unmapped in case of
> > +                * concurrent allocation */
> > +               if (area_last >= last) {
> > +                       break;
> > +               } else {
> > +                       start = area_last + 1;
> > +               }
> >         }
>
> this could simply be:
>
>         if (area_last >= last)
>                 break;
>         start = area_last + 1;
done
>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>