io_uring/cancel.c | 1 + io_uring/fdinfo.c | 2 +- io_uring/filetable.c | 3 +- io_uring/io_uring.c | 67 +++++++++++++++++++++++++++++--------------- io_uring/io_uring.h | 43 ++++++++++++++++++++++------ io_uring/kbuf.c | 6 ++-- io_uring/net.c | 1 + io_uring/notif.c | 5 ++-- io_uring/notif.h | 3 +- io_uring/openclose.c | 1 + io_uring/poll.c | 2 +- io_uring/register.c | 1 + io_uring/rsrc.c | 10 ++++++- io_uring/rsrc.h | 3 +- io_uring/rw.c | 3 +- io_uring/splice.c | 1 + io_uring/waitid.c | 2 +- 17 files changed, 111 insertions(+), 43 deletions(-)
As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating an io_uring doesn't actually enable any additional optimizations (aside from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task submits SQEs to skip taking the uring_lock mutex in the submission and task work paths. First, we need to close a hole in the IORING_SETUP_SINGLE_ISSUER checks where IORING_REGISTER_CLONE_BUFFERS only checks whether the thread is allowed to access one of the two io_urings. It assumes the uring_lock will prevent concurrent access to the other io_uring, but this will no longer be the case after the optimization to skip taking uring_lock. We also need to remove the unused filetable.h #include from io_uring.h to avoid an #include cycle. v2: - Don't enable these optimizations for IORING_SETUP_SQPOLL, as we still need to synchronize SQ thread submission with io_uring_register() Caleb Sander Mateos (5): io_uring: don't include filetable.h in io_uring.h io_uring/rsrc: respect submitter_task in io_register_clone_buffers() io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL io_uring: factor out uring_lock helpers io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER io_uring/cancel.c | 1 + io_uring/fdinfo.c | 2 +- io_uring/filetable.c | 3 +- io_uring/io_uring.c | 67 +++++++++++++++++++++++++++++--------------- io_uring/io_uring.h | 43 ++++++++++++++++++++++------ io_uring/kbuf.c | 6 ++-- io_uring/net.c | 1 + io_uring/notif.c | 5 ++-- io_uring/notif.h | 3 +- io_uring/openclose.c | 1 + io_uring/poll.c | 2 +- io_uring/register.c | 1 + io_uring/rsrc.c | 10 ++++++- io_uring/rsrc.h | 3 +- io_uring/rw.c | 3 +- io_uring/splice.c | 1 + io_uring/waitid.c | 2 +- 17 files changed, 111 insertions(+), 43 deletions(-) -- 2.45.2
On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote: > As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating > an io_uring doesn't actually enable any additional optimizations (aside > from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series > leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task > submits SQEs to skip taking the uring_lock mutex in the submission and > task work paths. > > [...] Applied, thanks! [1/5] io_uring: don't include filetable.h in io_uring.h commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1 [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() commit: 2f076a453f75de691a081c89bce31b530153d53b [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd [4/5] io_uring: factor out uring_lock helpers commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2 Best regards, -- Jens Axboe
On 9/9/25 14:35, Jens Axboe wrote: > > On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote: >> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating >> an io_uring doesn't actually enable any additional optimizations (aside >> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series >> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task >> submits SQEs to skip taking the uring_lock mutex in the submission and >> task work paths. >> >> [...] > > Applied, thanks! > > [1/5] io_uring: don't include filetable.h in io_uring.h > commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1 > [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() > commit: 2f076a453f75de691a081c89bce31b530153d53b > [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL > commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd > [4/5] io_uring: factor out uring_lock helpers > commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d > [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER > commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2 FWIW, from a glance that should be quite broken, there is a bunch of bits protected from parallel use by the lock. I described this optimisation few years back around when first introduced SINGLE_ISSUER and the DEFER_TASKRUN locking model, but to this day think it's not worth it as it'll be a major pain for any future changes. It would've been more feasible if links wasn't a thing. Though, none of it is my problem anymore, and I'm not insisting. -- Pavel Begunkov
On Wed, Sep 10, 2025 at 4:56 AM Pavel Begunkov <asml.silence@gmail.com> wrote: > > On 9/9/25 14:35, Jens Axboe wrote: > > > > On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote: > >> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating > >> an io_uring doesn't actually enable any additional optimizations (aside > >> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series > >> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task > >> submits SQEs to skip taking the uring_lock mutex in the submission and > >> task work paths. > >> > >> [...] > > > > Applied, thanks! > > > > [1/5] io_uring: don't include filetable.h in io_uring.h > > commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1 > > [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() > > commit: 2f076a453f75de691a081c89bce31b530153d53b > > [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL > > commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd > > [4/5] io_uring: factor out uring_lock helpers > > commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d > > [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER > > commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2 > > FWIW, from a glance that should be quite broken, there is a bunch of > bits protected from parallel use by the lock. I described this > optimisation few years back around when first introduced SINGLE_ISSUER > and the DEFER_TASKRUN locking model, but to this day think it's not > worth it as it'll be a major pain for any future changes. It would've > been more feasible if links wasn't a thing. Though, none of it is > my problem anymore, and I'm not insisting. If you have a link to this prior discussion, I'd appreciate it. I had tried searching through the io-uring lore archives, but apparently I don't know the magic keywords. If there are specific issues with this locking model, I'd like to understand and address them. But opaque references to known "bugs" are not helpful in improving these patches. Thanks, Caleb
On 9/10/25 5:57 AM, Pavel Begunkov wrote: > On 9/9/25 14:35, Jens Axboe wrote: >> >> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote: >>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating >>> an io_uring doesn't actually enable any additional optimizations (aside >>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series >>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task >>> submits SQEs to skip taking the uring_lock mutex in the submission and >>> task work paths. >>> >>> [...] >> >> Applied, thanks! >> >> [1/5] io_uring: don't include filetable.h in io_uring.h >> commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1 >> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() >> commit: 2f076a453f75de691a081c89bce31b530153d53b >> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL >> commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd >> [4/5] io_uring: factor out uring_lock helpers >> commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d >> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER >> commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2 > > FWIW, from a glance that should be quite broken, there is a bunch of > bits protected from parallel use by the lock. I described this > optimisation few years back around when first introduced SINGLE_ISSUER > and the DEFER_TASKRUN locking model, but to this day think it's not > worth it as it'll be a major pain for any future changes. It would've > been more feasible if links wasn't a thing. Though, none of it is > my problem anymore, and I'm not insisting. Hmm yes, was actually pondering this last night as well and was going to take a closer look today as I have a flight coming up. I'll leave them in there for now just to see if syzbot finds anything, and take that closer look and see if it's salvageable for now or if we just need a new revised take on this. -- Jens Axboe
On Wed, Sep 10, 2025 at 8:36 AM Jens Axboe <axboe@kernel.dk> wrote: > > On 9/10/25 5:57 AM, Pavel Begunkov wrote: > > On 9/9/25 14:35, Jens Axboe wrote: > >> > >> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote: > >>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating > >>> an io_uring doesn't actually enable any additional optimizations (aside > >>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series > >>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task > >>> submits SQEs to skip taking the uring_lock mutex in the submission and > >>> task work paths. > >>> > >>> [...] > >> > >> Applied, thanks! > >> > >> [1/5] io_uring: don't include filetable.h in io_uring.h > >> commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1 > >> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() > >> commit: 2f076a453f75de691a081c89bce31b530153d53b > >> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL > >> commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd > >> [4/5] io_uring: factor out uring_lock helpers > >> commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d > >> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER > >> commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2 > > > > FWIW, from a glance that should be quite broken, there is a bunch of > > bits protected from parallel use by the lock. I described this > > optimisation few years back around when first introduced SINGLE_ISSUER > > and the DEFER_TASKRUN locking model, but to this day think it's not > > worth it as it'll be a major pain for any future changes. It would've > > been more feasible if links wasn't a thing. Though, none of it is > > my problem anymore, and I'm not insisting. > > Hmm yes, was actually pondering this last night as well and was going > to take a closer look today as I have a flight coming up. I'll leave > them in there for now just to see if syzbot finds anything, and take > that closer look and see if it's salvageable for now or if we just need > a new revised take on this. Hi Jens, I'd love to understand the race condition concerns you have about not holding the uring_lock during submission and task work. My limited mental model of io_uring is that the io_ring_ctx is only accessed in the context of the task that submitted work to it (during the io_uring_enter() syscall or task work) or from some interrupt context that reports a completed operation. Since it's not possible to block on a mutex in interrupt context, the uring_lock couldn't be used to synchronize anything running in interrupt context. And then all other work would be running in the context of the single issuer task, which couldn't race with itself. Perhaps io_uring worker threads complicate this picture? No urgency on this, but I would love to find a way forward here. Acquiring and releasing the uring_lock is one of the single hottest CPU instructions in some of our workloads even though each mutex is accessed only on one CPU. If uring_lock is necessary to prevent some other critical sections from racing, perhaps there's an alternate way to synchronize them (e.g. by deferring them to task work). Or if the racing sections are specific to certain uses of io_uring, maybe we could add a setup flag allowing an io_uring user to indicate that it won't use those features. Thanks, Caleb
On 9/10/25 16:36, Jens Axboe wrote: > On 9/10/25 5:57 AM, Pavel Begunkov wrote: >> On 9/9/25 14:35, Jens Axboe wrote: >>> >>> On Thu, 04 Sep 2025 11:08:57 -0600, Caleb Sander Mateos wrote: >>>> As far as I can tell, setting IORING_SETUP_SINGLE_ISSUER when creating >>>> an io_uring doesn't actually enable any additional optimizations (aside >>>> from being a requirement for IORING_SETUP_DEFER_TASKRUN). This series >>>> leverages IORING_SETUP_SINGLE_ISSUER's guarantee that only one task >>>> submits SQEs to skip taking the uring_lock mutex in the submission and >>>> task work paths. >>>> >>>> [...] >>> >>> Applied, thanks! >>> >>> [1/5] io_uring: don't include filetable.h in io_uring.h >>> commit: 5d4c52bfa8cdc1dc1ff701246e662be3f43a3fe1 >>> [2/5] io_uring/rsrc: respect submitter_task in io_register_clone_buffers() >>> commit: 2f076a453f75de691a081c89bce31b530153d53b >>> [3/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL >>> commit: 6f5a203998fcf43df1d43f60657d264d1918cdcd >>> [4/5] io_uring: factor out uring_lock helpers >>> commit: 7940a4f3394a6af801af3f2bcd1d491a71a7631d >>> [5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER >>> commit: 4cc292a0faf1f0755935aebc9b288ce578d0ced2 >> >> FWIW, from a glance that should be quite broken, there is a bunch of >> bits protected from parallel use by the lock. I described this >> optimisation few years back around when first introduced SINGLE_ISSUER >> and the DEFER_TASKRUN locking model, but to this day think it's not >> worth it as it'll be a major pain for any future changes. It would've >> been more feasible if links wasn't a thing. Though, none of it is >> my problem anymore, and I'm not insisting. > > Hmm yes, was actually pondering this last night as well and was going > to take a closer look today as I have a flight coming up. I'll leave > them in there for now just to see if syzbot finds anything, and take Better not to? People are already nervous about security / bugs, and there is no ambiguity how and where it's wrong. Also because it breaks uapi in a couple places, with at least one of them being a security hole. > that closer look and see if it's salvageable for now or if we just need > a new revised take on this. -- Pavel Begunkov
© 2016 - 2025 Red Hat, Inc.