The upper bound of veventq_depth has been missing for veventq allocation,
leaving a vulnerability where userspace could exhaust atomic memory pool.

Fix it properly:
 - Allocate outside the spinlock to avoid GFP_ATOMIC
 - Cap the veventq_depth upper bound
 - Fix event_data byte-count
 - Add selftest coverage

Note that QEMU's SMMU has been already allocating veventq using a "HW"
EVTQ entry number. So, picking 19 as the known use case, for a minimal
level of ABI consistency.

This is on github:
https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v1

Nicolin Chen (4):
  iommufd: Move vevent memory allocation outside spinlock
  iommufd: Set veventq_depth upper bound
  iommufd: Fix data_len byte-count vs element-count mismatch
  iommufd/selftest: Add boundary tests for veventq_depth

 drivers/iommu/iommufd/iommufd_private.h       |  2 +-
 tools/testing/selftests/iommu/iommufd_utils.h | 17 +++++++++--------
 drivers/iommu/iommufd/driver.c                | 19 ++++++++++++++++---
 drivers/iommu/iommufd/eventq.c                |  5 ++++-
 tools/testing/selftests/iommu/iommufd.c       | 19 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c        |  2 +-
 6 files changed, 48 insertions(+), 16 deletions(-)

-- 
2.43.0
On Sun, May 17, 2026 at 07:28:45PM -0700, Nicolin Chen wrote:
> The upper bound of veventq_depth has been missing for veventq allocation,
> leaving a vulnerability where userspace could exhaust atomic memory pool.
>
> Fix it properly:
> - Allocate outside the spinlock to avoid GFP_ATOMIC
> - Cap the veventq_depth upper bound
> - Fix event_data byte-count
> - Add selftest coverage
>
> Note that QEMU's SMMU has been already allocating veventq using a "HW"
> EVTQ entry number. So, picking 19 as the known use case, for a minimal
> level of ABI consistency.
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v1
>
> Nicolin Chen (4):
> iommufd: Move vevent memory allocation outside spinlock
> iommufd: Set veventq_depth upper bound
> iommufd: Fix data_len byte-count vs element-count mismatch
> iommufd/selftest: Add boundary tests for veventq_depth

Please adjust for the sashiko remarks:

1) Put "iommufd: Fix data_len byte-count vs element-count mismatch"
   first

2) This "Returning -ENOMEM for allocation failures but 0 for queue
   overflows treats the conditions differently, which seems to
   contradict the stated intent."

   Seems bogus, I think adjust the commit message. We do want 0 for
   queue full conditions.

3) Let's fix the "Will this lockless read concurrent with a plain write
   cause a data race?" by removing the optimization, just pre-allocate
   and fail. We don't expect this to be a normal condition worth
   optimizing

4) I'm OK with ENOMEM here, leave it, EAGAIN should mean it is pollable
   and it won't become pollable..

5) The sizeof(hdr) has been fixed in my rc branch. You can rebase on
   top of that and also ensure to send a base-commit trailer to help
   Sashiko apply the patches properly

6) What do you think about the "but done has already been incremented
   by sizeof(*hdr)" ? unrelated issue? If it is simple lets add a patch
   here to fix it

Jason
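For readers following points 2) to 4), here is a minimal userspace sketch of the pattern being agreed on: allocate and copy before taking the lock, check the depth only under the lock, return 0 when a full queue drops the event, and return -ENOMEM only when the allocation itself fails. This is not the iommufd code. Every name below (struct event, struct event_queue, queue_init, queue_report) is hypothetical, a pthread mutex stands in for the kernel spinlock, and malloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model only; names do not match drivers/iommu/iommufd. */
struct event {
	struct event *next;
	size_t data_len;		/* payload length in bytes, not elements */
	unsigned char data[];
};

struct event_queue {
	pthread_mutex_t lock;		/* stands in for the kernel spinlock */
	struct event *head, *tail;
	unsigned int num_events;
	unsigned int depth;		/* assumed to be validated by the caller */
	unsigned int dropped;
};

static void queue_init(struct event_queue *q, unsigned int depth)
{
	assert(depth > 0);
	memset(q, 0, sizeof(*q));
	pthread_mutex_init(&q->lock, NULL);
	q->depth = depth;
}

/*
 * Returns 0 when the event was queued or dropped because the queue is
 * full, and -ENOMEM when the allocation failed.
 */
static int queue_report(struct event_queue *q, const void *data,
			size_t data_len)
{
	struct event *ev;

	/* Pre-allocate and copy before taking the lock. */
	ev = malloc(sizeof(*ev) + data_len);
	if (!ev)
		return -ENOMEM;
	ev->next = NULL;
	ev->data_len = data_len;
	memcpy(ev->data, data, data_len);

	pthread_mutex_lock(&q->lock);
	if (q->num_events >= q->depth) {
		/* Queue full: count the drop, free the event, report success. */
		q->dropped++;
		pthread_mutex_unlock(&q->lock);
		free(ev);
		return 0;
	}
	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	q->num_events++;
	pthread_mutex_unlock(&q->lock);
	return 0;
}
```

With a depth of 2, a third queue_report() call still returns 0 but leaves num_events at 2 and bumps dropped to 1. The depth comparison happens only with the lock held, so there is no lockless read of num_events to reason about, at the cost of one wasted allocation per dropped event.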
On Thu, May 21, 2026 at 11:30:04AM -0300, Jason Gunthorpe wrote:
> 1) Put "iommufd: Fix data_len byte-count vs element-count mismatch"
> first

OK.

> 2) This "Returning -ENOMEM for allocation failures but 0 for queue overflows treats
> the conditions differently, which seems to contradict the stated
> intent." Seems bogus, I think adjust the commit message. We do want
> 0 for queue full conditions.

Ack.

> 3) Let's fix the "Will this lockless read concurrent with a plain write cause a
> data race?" by removing the optimization, just pre-allocate and
> fail. We don't expect this to be a normal condition worth
> optimizing

I can drop it.

FWIW, it was added to address a Sashiko review also:

  By moving the allocation outside the spinlock, the precondition check that
  skipped the allocation when the queue was full is bypassed.

  When the queue is full, which can be common during a hardware fault storm
  if userspace cannot keep up, the code now unconditionally allocates memory,
  copies data, acquires the lock, and then immediately frees the memory and
  drops the event.

  Can this tight loop of wasteful slab allocations, memory copies, and
  deallocations exacerbate IOMMU fault storms by adding unnecessary CPU
  overhead?

  Would it be possible to add an optimistic lockless check, such as
  READ_ONCE(veventq->num_events) < veventq->depth, to bypass the allocation
  when the queue appears full?

> 4) I'm OK with ENOMEM here, leave it, EAGAIN should mean it is
> pollable and it won't become pollable..

Yea. Sashiko would complain about an EAGAIN as well :-)

> 5) The sizeof(hdr) has been fixed in my rc branch. You can rebase on
> top of that and also ensure to send a base-commit trailer to help
> Sashiko apply the patches properly

Oh, I forgot to add base commit ID. Will use your for-rc branch.

> 6) What do you think about the "but done has
> already been incremented by sizeof(*hdr)" ? unrelated issue? If it
> is simple lets add a patch here to fix it

I added a patch but didn't include in the series -- Sashiko would
raise more questions against that patch...

I think it's a separate bug; Sashiko pointed out another in fault
queue as well. Both bugs are at failure (corner cases?) path.

I'd like to address them separately.

Thanks
Nicolin
On Thu, May 21, 2026 at 11:01:48AM -0700, Nicolin Chen wrote:
> FWIW, it was added to address a Sashiko review also:
>
>   By moving the allocation outside the spinlock, the precondition check that
>   skipped the allocation when the queue was full is bypassed.
>
>   When the queue is full, which can be common during a hardware fault storm
>   if userspace cannot keep up, the code now unconditionally allocates memory,
>   copies data, acquires the lock, and then immediately frees the memory and
>   drops the event.
>
>   Can this tight loop of wasteful slab allocations, memory copies, and
>   deallocations exacerbate IOMMU fault storms by adding unnecessary CPU
>   overhead?
>
>   Would it be possible to add an optimistic lockless check, such as
>   READ_ONCE(veventq->num_events) < veventq->depth, to bypass the allocation
>   when the queue appears full?

That seems like nonsense to me.

> > 6) What do you think about the "but done has
> > already been incremented by sizeof(*hdr)" ? unrelated issue? If it
> > is simple lets add a patch here to fix it
>
> I added a patch but didn't include in the series -- Sashiko would
> raise more questions against that patch...
>
> I think it's a separate bug; Sashiko pointed out another in fault
> queue as well. Both bugs are at failure (corner cases?) path.
>
> I'd like to address them separately.

Ok

Jason
On Sun, May 17, 2026 at 07:28:45PM -0700, Nicolin Chen wrote:
> The upper bound of veventq_depth has been missing for veventq allocation,
> leaving a vulnerability where userspace could exhaust atomic memory pool.
>
> Fix it properly:
> - Allocate outside the spinlock to avoid GFP_ATOMIC
> - Cap the veventq_depth upper bound
> - Fix event_data byte-count
> - Add selftest coverage
>
> Note that QEMU's SMMU has been already allocating veventq using a "HW"
> EVTQ entry number. So, picking 19 as the known use case, for a minimal
> level of ABI consistency.
>
> This is on github:
> https://github.com/nicolinc/iommufd/commits/fix_veventq_depth-v1
>
> Nicolin Chen (4):
> iommufd: Move vevent memory allocation outside spinlock
> iommufd: Set veventq_depth upper bound
> iommufd: Fix data_len byte-count vs element-count mismatch
> iommufd/selftest: Add boundary tests for veventq_depth

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason