[PATCH v2 3/4] aio-posix: enable IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG

Stefan Hajnoczi posted 4 patches 1 month, 2 weeks ago
Maintainers: Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>
Posted by Stefan Hajnoczi 1 month, 2 weeks ago
The IORING_SETUP_COOP_TASKRUN flag reduces interprocessor interrupts
when an io_uring event occurs on a different CPU. The idea is that the
QEMU thread will wait for a CQE anyway, so there is no need to interrupt
the CPU that it is on.

The IORING_SETUP_TASKRUN_FLAG ensures that QEMU's io_uring CQ ring
polling still works with COOP_TASKRUN. The kernel will set a flag in the
SQ ring (this is not a typo, the flag is located in the SQ ring even
though it pertains to the CQ ring) that can be polled from userspace.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 util/fdmon-io_uring.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index ec056b4818..2e2c0e6785 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -423,13 +423,16 @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list,
 
 static bool fdmon_io_uring_need_wait(AioContext *ctx)
 {
+    struct io_uring *ring = &ctx->fdmon_io_uring;
+
     /* Have io_uring events completed? */
-    if (io_uring_cq_ready(&ctx->fdmon_io_uring)) {
+    if (io_uring_cq_ready(ring) ||
+        IO_URING_READ_ONCE(*ring->sq.kflags) & IORING_SQ_TASKRUN) {
         return true;
     }
 
     /* Are there pending sqes to submit? */
-    if (io_uring_sq_ready(&ctx->fdmon_io_uring)) {
+    if (io_uring_sq_ready(ring)) {
         return true;
     }
 
@@ -459,7 +462,15 @@ static inline bool is_creating_iothread(void)
 
 bool fdmon_io_uring_setup(AioContext *ctx, Error **errp)
 {
-    unsigned flags = 0;
+    /* Enable modern flags supported by the host kernel */
+    unsigned flags =
+#ifdef IORING_SETUP_COOP_TASKRUN
+        IORING_SETUP_COOP_TASKRUN |
+#endif
+#ifdef IORING_SETUP_TASKRUN_FLAG
+        IORING_SETUP_TASKRUN_FLAG |
+#endif
+        0;
     int ret;
 
     /*
-- 
2.53.0
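
A note on the setup hunk: the #ifdef guards cover building against headers that lack the two flags. They do not by themselves cover a host kernel that rejects the flags at runtime. As far as I know, both flags arrived in Linux 5.19 and io_uring_setup(2) fails with EINVAL on setup flags it does not recognize. Code outside this hunk may already handle that. If it does not, a retry along the following lines is one option. This is only a sketch: ring_init_fn, ring_init_with_fallback() and the two mock_* functions are hypothetical names, and the callback stands in for the real ring initialization call so the logic can be exercised without a kernel.

```c
#include <assert.h>
#include <errno.h>

/* Values mirror <linux/io_uring.h>; defined locally for older headers. */
#ifndef IORING_SETUP_COOP_TASKRUN
#define IORING_SETUP_COOP_TASKRUN (1U << 8)
#endif
#ifndef IORING_SETUP_TASKRUN_FLAG
#define IORING_SETUP_TASKRUN_FLAG (1U << 9)
#endif

/* Hypothetical stand-in for ring initialization: returns 0 or -errno. */
typedef int (*ring_init_fn)(unsigned flags, void *opaque);

/*
 * Try the optional flags first. If the kernel rejects them with -EINVAL,
 * retry once with only the base flags. On success, *used_flags (if given)
 * reports which flags were actually accepted.
 */
static int ring_init_with_fallback(ring_init_fn init, void *opaque,
                                   unsigned base_flags, unsigned *used_flags)
{
    unsigned optional = IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG;
    int ret;

    assert(init);

    ret = init(base_flags | optional, opaque);
    if (ret == -EINVAL) {
        optional = 0;
        ret = init(base_flags, opaque);
    }
    if (ret == 0 && used_flags) {
        *used_flags = base_flags | optional;
    }
    return ret;
}

/* Mock of an older kernel: rejects any flag at or above bit 8. */
static int mock_old_kernel_init(unsigned flags, void *opaque)
{
    (void)opaque;
    return (flags & ~0xffU) ? -EINVAL : 0;
}

/* Mock of a newer kernel: accepts everything. */
static int mock_new_kernel_init(unsigned flags, void *opaque)
{
    (void)flags;
    (void)opaque;
    return 0;
}
```

If TASKRUN_FLAG ends up disabled by the fallback, the IORING_SQ_TASKRUN check in fdmon_io_uring_need_wait() is harmless: the kernel never sets the bit, so the check reduces to io_uring_cq_ready().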
Re: [PATCH v2 3/4] aio-posix: enable IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG
Posted by Stefan Hajnoczi 1 month, 2 weeks ago
On Wed, Feb 25, 2026 at 04:13:35PM +0800, Stefan Hajnoczi wrote:
> The IORING_SETUP_COOP_TASKRUN flag reduces interprocessor interrupts
> when an io_uring event occurs on a different CPU. The idea is that the
> QEMU thread will wait for a CQE anyway, so there is no need to interrupt
> the CPU that it is on.
> 
> The IORING_SETUP_TASKRUN_FLAG ensures that QEMU's io_uring CQ ring
> polling still works with COOP_TASKRUN. The kernel will set a flag in the
> SQ ring (this is not a typo, the flag is located in the SQ ring even
> though it pertains to the CQ ring) that can be polled from userspace.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  util/fdmon-io_uring.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)

Hi Jens,
I noticed that liburing's io_uring_cq_ready() does not check the
IORING_SQ_TASKRUN flag. Maybe QEMU's fdmon_io_uring_gsource_check()
needs to check it as well, so that io_uring_enter(2) will be called with
IORING_ENTER_GETEVENTS in the glib event loop?

(This is a similar idea to your recent patch but needed when
IORING_SETUP_TASKRUN_FLAG is enabled.)

I benchmarked the change below but couldn't observe a difference in IOPS:

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index 652d269e03..ef4257924b 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -356,7 +356,8 @@ static bool fdmon_io_uring_gsource_check(AioContext *ctx)
      * the main loop can miss completions and sleep in ppoll() until the
      * next timer fires.
      */
-    return io_uring_cq_ready(&ctx->fdmon_io_uring);
+    return io_uring_cq_ready(&ctx->fdmon_io_uring) ||
+           (IO_URING_READ_ONCE(*ctx->fdmon_io_uring.sq.kflags) & IORING_SQ_TASKRUN);
 }

 /* Dispatch CQE handlers that are ready */
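
For readers following the thread, the condition added in both diffs can be modelled without liburing. With COOP_TASKRUN, a completion may sit in pending task work rather than in the CQ ring until the thread next enters the kernel, so a CQ-only check can report nothing while IORING_SQ_TASKRUN is set. The sketch below is only an illustration: struct mock_ring and mock_completions_pending() are hypothetical names, and the IORING_SQ_TASKRUN value is copied from <linux/io_uring.h> so the snippet builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Value mirrors <linux/io_uring.h>; defined locally for older headers. */
#ifndef IORING_SQ_TASKRUN
#define IORING_SQ_TASKRUN (1U << 2)
#endif

/* Hypothetical stand-in for the two pieces of ring state the check reads. */
struct mock_ring {
    unsigned cq_ready;          /* CQEs already visible in the CQ ring */
    const unsigned *sq_kflags;  /* kernel-written SQ ring flags word */
};

/*
 * True if there is completion work to pick up: either CQEs are already
 * posted, or the kernel has flagged pending task work that will post
 * them on the next io_uring_enter(2) with IORING_ENTER_GETEVENTS.
 */
static bool mock_completions_pending(const struct mock_ring *ring)
{
    unsigned kflags;

    assert(ring && ring->sq_kflags);

    /* Volatile read: the kernel may update the word at any time. */
    kflags = *(const volatile unsigned *)ring->sq_kflags;
    return ring->cq_ready > 0 || (kflags & IORING_SQ_TASKRUN) != 0;
}
```

The middle case is the interesting one: cq_ready is zero but the flag is set, which is the situation a check based only on io_uring_cq_ready() would miss.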