[PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

Caleb Sander Mateos posted 5 patches 4 weeks ago
io_uring/cancel.c    |  1 +
io_uring/fdinfo.c    |  2 +-
io_uring/filetable.c |  3 +-
io_uring/io_uring.c  | 67 +++++++++++++++++++++++++++++---------------
io_uring/io_uring.h  | 43 ++++++++++++++++++++++------
io_uring/kbuf.c      |  6 ++--
io_uring/net.c       |  1 +
io_uring/notif.c     |  5 ++--
io_uring/notif.h     |  3 +-
io_uring/openclose.c |  1 +
io_uring/poll.c      |  2 +-
io_uring/register.c  |  1 +
io_uring/rsrc.c      | 10 ++++++-
io_uring/rsrc.h      |  3 +-
io_uring/rw.c        |  3 +-
io_uring/splice.c    |  1 +
io_uring/waitid.c    |  2 +-
17 files changed, 111 insertions(+), 43 deletions(-)
[PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Caleb Sander Mateos 4 weeks ago
As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
an io_uring doesn't actually enable any additional optimizations (aside
from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
submits SQEs to skip taking the uring_lock mutex in the submission and
task work paths.

First, we need to close a hole in the IORING_SETUP_SINGLE_ISSUER checks
where IORING_REGISTER_CLONE_BUFFERS only checks whether the thread is
allowed to access one of the two io_urings. It assumes the uring_lock
will prevent concurrent access to the other io_uring, but this will no
longer be the case after the optimization to skip taking uring_lock.

We also need to remove the unused filetable.h #include from io_uring.h
to avoid an #include cycle.

v2:
- Don't enable these optimizations for IORING_SETUP_SQPOLL, as we still
  need to synchronize SQ thread submission with io_uring_register()

Caleb Sander Mateos (5):
  io_uring: don't include filetable.h in io_uring.h
  io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
  io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
  io_uring: factor out uring_lock helpers
  io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

 io_uring/cancel.c    |  1 +
 io_uring/fdinfo.c    |  2 +-
 io_uring/filetable.c |  3 +-
 io_uring/io_uring.c  | 67 +++++++++++++++++++++++++++++---------------
 io_uring/io_uring.h  | 43 ++++++++++++++++++++++------
 io_uring/kbuf.c      |  6 ++--
 io_uring/net.c       |  1 +
 io_uring/notif.c     |  5 ++--
 io_uring/notif.h     |  3 +-
 io_uring/openclose.c |  1 +
 io_uring/poll.c      |  2 +-
 io_uring/register.c  |  1 +
 io_uring/rsrc.c      | 10 ++++++-
 io_uring/rsrc.h      |  3 +-
 io_uring/rw.c        |  3 +-
 io_uring/splice.c    |  1 +
 io_uring/waitid.c    |  2 +-
 17 files changed, 111 insertions(+), 43 deletions(-)

-- 
2.45.2
Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Jens Axboe 3 weeks, 2 days ago
On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> an io_uring doesn't actually enable any additional optimizations (aside
> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> submits SQEs to skip taking the uring_lock mutex in the submission and
> task work paths.
> 
> [...]

Applied, thanks!

[1/5] io_uring: don't include filetable.h in io_uring.h
      commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
[2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
      commit: 2f076a453f75de691a081c89bce31b530153d53b
[3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
      commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
[4/5] io_uring: factor out uring_lock helpers
      commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
[5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
      commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2

Best regards,
-- 
Jens Axboe
Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Pavel Begunkov 3 weeks, 1 day ago
On 9/9/25 14:35, Jens Axboe wrote:
> 
> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
>> an io_uring doesn't actually enable any additional optimizations (aside
>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
>> submits SQEs to skip taking the uring_lock mutex in the submission and
>> task work paths.
>>
>> [...]
> 
> Applied, thanks!
> 
> [1/5] io_uring: don't include filetable.h in io_uring.h
>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
>        commit: 2f076a453f75de691a081c89bce31b530153d53b
> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> [4/5] io_uring: factor out uring_lock helpers
>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2

FWIW, from a glance that should be quite broken, there is a bunch of
bits protected from parallel use by the lock. I described this
optimisation few years back around when first introduced SINGLE_ISSUER
and the DEFER_TASKRUN locking model, but to this day think it's not
worth it as it'll be a major pain for any future changes. It would've
been more feasible if links wasn't a thing. Though, none of it is
my problem anymore, and I'm not insisting.

-- 
Pavel Begunkov
Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Caleb Sander Mateos 3 weeks ago
On Wed, Sep 10, 2025 at 4:56 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 9/9/25 14:35, Jens Axboe wrote:
> >
> > On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> >> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> >> an io_uring doesn't actually enable any additional optimizations (aside
> >> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> >> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> >> submits SQEs to skip taking the uring_lock mutex in the submission and
> >> task work paths.
> >>
> >> [...]
> >
> > Applied, thanks!
> >
> > [1/5] io_uring: don't include filetable.h in io_uring.h
> >        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> > [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
> >        commit: 2f076a453f75de691a081c89bce31b530153d53b
> > [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> >        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> > [4/5] io_uring: factor out uring_lock helpers
> >        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> > [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> >        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
>
> FWIW, from a glance that should be quite broken, there is a bunch of
> bits protected from parallel use by the lock. I described this
> optimisation few years back around when first introduced SINGLE_ISSUER
> and the DEFER_TASKRUN locking model, but to this day think it's not
> worth it as it'll be a major pain for any future changes. It would've
> been more feasible if links wasn't a thing. Though, none of it is
> my problem anymore, and I'm not insisting.

If you have a link to this prior discussion, I'd appreciate it. I had
tried searching through the io-uring lore archives, but apparently I
don't know the magic keywords. If there are specific issues with this
locking model, I'd like to understand and address them. But opaque
references to known "bugs" are not helpful in improving these patches.

Thanks,
Caleb
Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Jens Axboe 3 weeks, 1 day ago
On 9/10/25 5:57 AM, Pavel Begunkov wrote:
> On 9/9/25 14:35, Jens Axboe wrote:
>>
>> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
>>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
>>> an io_uring doesn't actually enable any additional optimizations (aside
>>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
>>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
>>> submits SQEs to skip taking the uring_lock mutex in the submission and
>>> task work paths.
>>>
>>> [...]
>>
>> Applied, thanks!
>>
>> [1/5] io_uring: don't include filetable.h in io_uring.h
>>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
>> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
>>        commit: 2f076a453f75de691a081c89bce31b530153d53b
>> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
>>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
>> [4/5] io_uring: factor out uring_lock helpers
>>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
>> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
> 
> FWIW, from a glance that should be quite broken, there is a bunch of
> bits protected from parallel use by the lock. I described this
> optimisation few years back around when first introduced SINGLE_ISSUER
> and the DEFER_TASKRUN locking model, but to this day think it's not
> worth it as it'll be a major pain for any future changes. It would've
> been more feasible if links wasn't a thing. Though, none of it is
> my problem anymore, and I'm not insisting.

Hmm yes, was actually pondering this last night as well and was going
to take a closer look today as I have a flight coming up. I'll leave
them in there for now just to see if syzbot finds anything, and take
that closer look and see if it's salvageable for now or if we just need
a new revised take on this.

-- 
Jens Axboe
Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Caleb Sander Mateos 2 days, 4 hours ago
On Wed, Sep 10, 2025 at 8:36 AM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 9/10/25 5:57 AM, Pavel Begunkov wrote:
> > On 9/9/25 14:35, Jens Axboe wrote:
> >>
> >> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
> >>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
> >>> an io_uring doesn't actually enable any additional optimizations (aside
> >>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
> >>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
> >>> submits SQEs to skip taking the uring_lock mutex in the submission and
> >>> task work paths.
> >>>
> >>> [...]
> >>
> >> Applied, thanks!
> >>
> >> [1/5] io_uring: don't include filetable.h in io_uring.h
> >>        commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
> >> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
> >>        commit: 2f076a453f75de691a081c89bce31b530153d53b
> >> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
> >>        commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
> >> [4/5] io_uring: factor out uring_lock helpers
> >>        commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
> >> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
> >>        commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
> >
> > FWIW, from a glance that should be quite broken, there is a bunch of
> > bits protected from parallel use by the lock. I described this
> > optimisation few years back around when first introduced SINGLE_ISSUER
> > and the DEFER_TASKRUN locking model, but to this day think it's not
> > worth it as it'll be a major pain for any future changes. It would've
> > been more feasible if links wasn't a thing. Though, none of it is
> > my problem anymore, and I'm not insisting.
>
> Hmm yes, was actually pondering this last night as well and was going
> to take a closer look today as I have a flight coming up. I'll leave
> them in there for now just to see if syzbot finds anything, and take
> that closer look and see if it's salvageable for now or if we just need
> a new revised take on this.

Hi Jens,
I'd love to understand the race condition concerns you have about not
holding the uring_lock during submission and task work. My limited
mental model of io_uring is that the io_ring_ctx is only accessed in
the context of the task that submitted work to it (during the
io_uring_enter() syscall or task work) or from some interrupt context
that reports a completed operation. Since it's not possible to block
on a mutex in interrupt context, the uring_lock couldn't be used to
synchronize anything running in interrupt context. And then all other
work would be running in the context of the single issuer task, which
couldn't race with itself. Perhaps io_uring worker threads complicate
this picture?
No urgency on this, but I would love to find a way forward here.
Acquiring and releasing the uring_lock is one of the single hottest
CPU instructions in some of our workloads even though each mutex is
accessed only on one CPU. If  uring_lock is necessary to prevent some
other critical sections from racing, perhaps there's an alternate way
to synchronize them (e.g. by deferring them to task work). Or if the
racing sections are specific to certain uses of io_uring, maybe we
could add a setup flag allowing an io_uring user to indicate that it
won't use those features.

Thanks,
Caleb
Re: [PATCH v2 0/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Posted by Pavel Begunkov 3 weeks ago
On 9/10/25 16:36, Jens Axboe wrote:
> On 9/10/25 5:57 AM, Pavel Begunkov wrote:
>> On 9/9/25 14:35, Jens Axboe wrote:
>>>
>>> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote:
>>>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating
>>>> an io_uring doesn't actually enable any additional optimizations (aside
>>>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series
>>>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task
>>>> submits SQEs to skip taking the uring_lock mutex in the submission and
>>>> task work paths.
>>>>
>>>> [...]
>>>
>>> Applied, thanks!
>>>
>>> [1/5] io_uring: don't include filetable.h in io_uring.h
>>>         commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1
>>> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers()
>>>         commit: 2f076a453f75de691a081c89bce31b530153d53b
>>> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
>>>         commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd
>>> [4/5] io_uring: factor out uring_lock helpers
>>>         commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d
>>> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
>>>         commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2
>>
>> FWIW, from a glance that should be quite broken, there is a bunch of
>> bits protected from parallel use by the lock. I described this
>> optimisation few years back around when first introduced SINGLE_ISSUER
>> and the DEFER_TASKRUN locking model, but to this day think it's not
>> worth it as it'll be a major pain for any future changes. It would've
>> been more feasible if links wasn't a thing. Though, none of it is
>> my problem anymore, and I'm not insisting.
> 
> Hmm yes, was actually pondering this last night as well and was going
> to take a closer look today as I have a flight coming up. I'll leave
> them in there for now just to see if syzbot finds anything, and take

Better not to? People are already nervous about security / bugs,
and there is no ambiguity how and where it's wrong. Also because
it breaks uapi in a couple places, with at least one of them
being a security hole.

> that closer look and see if it's salvageable for now or if we just need
> a new revised take on this.

-- 
Pavel Begunkov