[v1] iommufd: Fix veventq_depth boundary

[PATCH rc v1 0/4] iommufd: Fix veventq_depth boundary

Posted by Nicolin Chen 1 week ago

The upper bound of veventq_depth has been missing for veventq allocation,
leaving a vulnerability where userspace could exhaust the atomic memory pool.

Fix it properly:
 - Allocate outside the spinlock to avoid GFP_ATOMIC
 - Cap the veventq_depth upper bound
 - Fix event_data byte-count
 - Add selftest coverage
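
The first two items can be pictured with a small userspace sketch. It is not the kernel code from this series: a pthread mutex stands in for the spinlock, malloc() stands in for the kernel allocator, and every sketch_* name as well as the SKETCH_MAX_DEPTH value is a placeholder invented for illustration. Rejecting a zero depth is also an assumption of the sketch. The return values follow the convention discussed in this thread: 0 when the event is queued or dropped because the queue is full, -ENOMEM when the allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder cap for this sketch only; not the value chosen by the series. */
#define SKETCH_MAX_DEPTH 1024u

struct sketch_event {
	struct sketch_event *next;
	size_t len;            /* payload length in bytes */
	unsigned char data[];  /* payload copied from the reporter */
};

struct sketch_queue {
	pthread_mutex_t lock;  /* stands in for the kernel spinlock */
	struct sketch_event *head, *tail;
	unsigned int num_events;
	unsigned int depth;
};

/* Refuse a zero or oversized depth so the queue size is always bounded. */
static int sketch_queue_init(struct sketch_queue *q, unsigned int depth)
{
	if (depth == 0 || depth > SKETCH_MAX_DEPTH)
		return -EINVAL;
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
	q->num_events = 0;
	q->depth = depth;
	return 0;
}

/* Returns 0 if queued or dropped (queue full), -ENOMEM if allocation fails. */
static int sketch_report(struct sketch_queue *q, const void *data, size_t len)
{
	struct sketch_event *ev;
	int dropped = 0;

	assert(q && data);

	/* Allocate and copy before taking the lock, where blocking is allowed. */
	ev = malloc(sizeof(*ev) + len);
	if (!ev)
		return -ENOMEM;
	ev->next = NULL;
	ev->len = len;
	memcpy(ev->data, data, len);

	pthread_mutex_lock(&q->lock);
	if (q->num_events >= q->depth) {
		dropped = 1;               /* full: the event is lost */
	} else {
		if (q->tail)
			q->tail->next = ev;
		else
			q->head = ev;
		q->tail = ev;
		q->num_events++;
	}
	pthread_mutex_unlock(&q->lock);

	if (dropped)
		free(ev);                  /* release outside the lock */
	return 0;
}
```

With a queue initialized to depth 2, a third sketch_report() call still returns 0 but leaves num_events at 2, because the pre-allocated event is freed after the lock is dropped.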

Note that QEMU's SMMU has already been allocating veventq using a "HW"
EVTQ entry number. So this series picks 19, the known use case, for a
minimal level of ABI consistency.

This is on github:
https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v1

Nicolin Chen (4):
  iommufd: Move vevent memory allocation outside spinlock
  iommufd: Set veventq_depth upper bound
  iommufd: Fix data_len byte-count vs element-count mismatch
  iommufd/selftest: Add boundary tests for veventq_depth

 drivers/iommu/iommufd/iommufd_private.h       |  2 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
 drivers/iommu/iommufd/driver.c                | 19 ++++++++++++++++---
 drivers/iommu/iommufd/eventq.c                |  5 ++++-
 tools/testing/selftests/iommu/iommufd.c       | 19 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 6 files changed, 48 insertions(+), 16 deletions(-)

-- 
2.43.0

Re: [PATCH rc v1 0/4] iommufd: Fix veventq_depth boundary

Posted by Jason Gunthorpe 3 days, 14 hours ago

On Sun, May 17, 2026 at 07:28:45PM -0700, Nicolin Chen wrote:
> The upper bound of veventq_depth has been missing for veventq allocation,
> leaving a vulnerability where userspace could exhaust the atomic memory pool.
> 
> Fix it properly:
>  - Allocate outside the spinlock to avoid GFP_ATOMIC
>  - Cap the veventq_depth upper bound
>  - Fix event_data byte-count
>  - Add selftest coverage
> 
> Note that QEMU's SMMU has already been allocating veventq using a "HW"
> EVTQ entry number. So this series picks 19, the known use case, for a
> minimal level of ABI consistency.
> 
> This is on github:
> https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v1
> 
> Nicolin Chen (4):
>   iommufd: Move vevent memory allocation outside spinlock
>   iommufd: Set veventq_depth upper bound
>   iommufd: Fix data_len byte-count vs element-count mismatch
>   iommufd/selftest: Add boundary tests for veventq_depth

Please adjust for the Sashiko remarks:

1) Put "iommufd: Fix data_len byte-count vs element-count mismatch"
   first
2) This "Returning -ENOMEM for allocation failures but 0 for queue overflows treats
   the conditions differently, which seems to contradict the stated
   intent." seems bogus; I think just adjust the commit message. We do want
   0 for queue full conditions.
3) Let's fix the "Will this lockless read concurrent with a plain write cause a
   data race?" by removing the optimization, just pre-allocate and
   fail. We don't expect this to be a normal condition worth
   optimizing.
4) I'm OK with ENOMEM here, leave it. EAGAIN should mean it is
   pollable, and it won't become pollable.
5) The sizeof(hdr) has been fixed in my rc branch. You can rebase on
   top of that and also make sure to send a base-commit trailer to help
   Sashiko apply the patches properly.
6) What do you think about the "but done has
   already been incremented by sizeof(*hdr)"? Unrelated issue? If it
   is simple, let's add a patch here to fix it.

Jason

Re: [PATCH rc v1 0/4] iommufd: Fix veventq_depth boundary

Posted by Nicolin Chen 3 days, 11 hours ago

On Thu, May 21, 2026 at 11:30:04AM -0300, Jason Gunthorpe wrote:
> 1) Put "iommufd: Fix data_len byte-count vs element-count mismatch"
>    first

OK.

> 2) This "Returning -ENOMEM for allocation failures but 0 for queue overflows treats
>    the conditions differently, which seems to contradict the stated
>    intent." seems bogus; I think just adjust the commit message. We do want
>    0 for queue full conditions.

Ack.

> 3) Let's fix the "Will this lockless read concurrent with a plain write cause a
>    data race?" by removing the optimization, just pre-allocate and
>    fail. We don't expect this to be a normal condition worth
>    optimizing

I can drop it.

FWIW, it was added to address a Sashiko review also:

  By moving the allocation outside the spinlock, the precondition check that
  skipped the allocation when the queue was full is bypassed.

  When the queue is full, which can be common during a hardware fault storm
  if userspace cannot keep up, the code now unconditionally allocates memory,
  copies data, acquires the lock, and then immediately frees the memory and
  drops the event.

  Can this tight loop of wasteful slab allocations, memory copies, and
  deallocations exacerbate IOMMU fault storms by adding unnecessary CPU
  overhead?

  Would it be possible to add an optimistic lockless check, such as
  READ_ONCE(veventq->num_events) < veventq->depth, to bypass the allocation
  when the queue appears full?
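
For reference, the data-race concern behind such a check can be sketched in userspace C11. This is an illustration only, not code from the series: relaxed atomics stand in for the kernel's READ_ONCE()/WRITE_ONCE(), and the sketch_* names are hypothetical. The point is that an unlocked read is only race-free if the writer side uses marked accesses too, and even then the answer is just a hint that can be stale once the lock is taken.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_counter {
	atomic_uint num_events;  /* written under the queue lock, read without it */
	unsigned int depth;      /* fixed after setup */
};

/* Optimistic, lock-free hint: the result may be stale immediately. */
static bool sketch_maybe_full(struct sketch_counter *c)
{
	return atomic_load_explicit(&c->num_events,
				    memory_order_relaxed) >= c->depth;
}

/* Writer side; the caller is assumed to hold the queue lock (not shown). */
static void sketch_count_add(struct sketch_counter *c)
{
	unsigned int n = atomic_load_explicit(&c->num_events,
					      memory_order_relaxed);

	/* A marked store pairs with the marked load in sketch_maybe_full(). */
	atomic_store_explicit(&c->num_events, n + 1, memory_order_relaxed);
}
```

A caller would still have to re-check the count under the lock before queueing, so the hint only saves work when the queue really is full.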

> 4) I'm OK with ENOMEM here, leave it. EAGAIN should mean it is
>    pollable, and it won't become pollable.

Yea. Sashiko would complain about an EAGAIN as well :-)

> 5) The sizeof(hdr) has been fixed in my rc branch. You can rebase on
>    top of that and also make sure to send a base-commit trailer to help
>    Sashiko apply the patches properly.

Oh, I forgot to add the base commit ID. Will use your for-rc branch.

> 6) What do you think about the "but done has
>    already been incremented by sizeof(*hdr)"? Unrelated issue? If it
>    is simple, let's add a patch here to fix it.

I added a patch but didn't include it in the series -- Sashiko would
raise more questions against that patch...

I think it's a separate bug; Sashiko pointed out another one in the fault
queue as well. Both bugs are on failure (corner-case?) paths.

I'd like to address them separately.

Thanks
Nicolin

Re: [PATCH rc v1 0/4] iommufd: Fix veventq_depth boundary

Posted by Jason Gunthorpe 3 days, 5 hours ago

On Thu, May 21, 2026 at 11:01:48AM -0700, Nicolin Chen wrote:

> FWIW, it was added to address a Sashiko review also:
> 
>   By moving the allocation outside the spinlock, the precondition check that
>   skipped the allocation when the queue was full is bypassed.
> 
>   When the queue is full, which can be common during a hardware fault storm
>   if userspace cannot keep up, the code now unconditionally allocates memory,
>   copies data, acquires the lock, and then immediately frees the memory and
>   drops the event.
> 
>   Can this tight loop of wasteful slab allocations, memory copies, and
>   deallocations exacerbate IOMMU fault storms by adding unnecessary CPU
>   overhead?
> 
>   Would it be possible to add an optimistic lockless check, such as
>   READ_ONCE(veventq->num_events) < veventq->depth, to bypass the allocation
>   when the queue appears full?

That seems like nonsense to me.

> > 6) What do you think about the "but done has
> >    already been incremented by sizeof(*hdr)"? Unrelated issue? If it
> >    is simple, let's add a patch here to fix it.
> 
> I added a patch but didn't include it in the series -- Sashiko would
> raise more questions against that patch...
> 
> I think it's a separate bug; Sashiko pointed out another one in the fault
> queue as well. Both bugs are on failure (corner-case?) paths.
> 
> I'd like to address them separately.

Ok

Jason

Re: [PATCH rc v1 0/4] iommufd: Fix veventq_depth boundary

Posted by Jason Gunthorpe 6 days, 11 hours ago

On Sun, May 17, 2026 at 07:28:45PM -0700, Nicolin Chen wrote:
> The upper bound of veventq_depth has been missing for veventq allocation,
> leaving a vulnerability where userspace could exhaust the atomic memory pool.
> 
> Fix it properly:
>  - Allocate outside the spinlock to avoid GFP_ATOMIC
>  - Cap the veventq_depth upper bound
>  - Fix event_data byte-count
>  - Add selftest coverage
> 
> Note that QEMU's SMMU has already been allocating veventq using a "HW"
> EVTQ entry number. So this series picks 19, the known use case, for a
> minimal level of ABI consistency.
> 
> This is on github:
> https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v1
> 
> Nicolin Chen (4):
>   iommufd: Move vevent memory allocation outside spinlock
>   iommufd: Set veventq_depth upper bound
>   iommufd: Fix data_len byte-count vs element-count mismatch
>   iommufd/selftest: Add boundary tests for veventq_depth

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason