[v4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

[PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

Posted by Caleb Sander Mateos 2 months, 1 week ago

io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
workloads. Even when only one thread pinned to a single CPU is accessing
the io_ring_ctx, the atomic CASes required to lock and unlock the mutex
are very hot instructions. The mutex's primary purpose is to prevent
concurrent io_uring system calls on the same io_ring_ctx. However, there
is already a flag IORING_SETUP_SINGLE_ISSUER that promises only one
task will make io_uring_enter() and io_uring_register() system calls on
the io_ring_ctx once it's enabled.

So if the io_ring_ctx is set up with IORING_SETUP_SINGLE_ISSUER, skip the
uring_lock mutex_lock() and mutex_unlock() on the submitter_task. On
other tasks that acquire the ctx uring lock, use a task work item to
suspend the submitter_task for the duration of the critical section.

If the io_ring_ctx is IORING_SETUP_R_DISABLED (possible during
io_uring_setup(), io_uring_register(), or io_uring exit), submitter_task
may be set concurrently, so acquire the uring_lock before checking it.
If submitter_task isn't set yet, the uring_lock alone suffices to provide
mutual exclusion.
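The lock-acquisition policy described above can be sketched as a pure decision function. This is a hypothetical userspace model, not kernel code: `struct model_ctx`, the `MODEL_*` flag bits, `enum lock_action` and `model_lock_action()` are illustrative stand-ins for io_ring_ctx, IORING_SETUP_SINGLE_ISSUER / IORING_SETUP_R_DISABLED and the branches of io_ring_ctx_lock_nested(). Tasks are modeled as nonzero integer ids, with 0 meaning "submitter_task not set".

```c
#include <assert.h>

/* Hypothetical flag bits standing in for the IORING_SETUP_* flags. */
#define MODEL_SINGLE_ISSUER (1u << 0)
#define MODEL_R_DISABLED    (1u << 1)

/* Hypothetical model of the io_ring_ctx fields the lock path reads. */
struct model_ctx {
	unsigned int flags;
	int submitter;  /* task id of submitter_task; 0 means "not set" */
};

/* Steps the lock path takes, as a bit set. */
enum lock_action {
	ACT_NONE    = 0,       /* submitter_task itself: no atomic ops */
	ACT_MUTEX   = 1 << 0,  /* mutex_lock(&ctx->uring_lock) */
	ACT_SUSPEND = 1 << 1,  /* task work to suspend submitter_task */
};

/* current_task is the caller's id and is never 0 (current is never NULL). */
int model_lock_action(const struct model_ctx *ctx, int current_task)
{
	int act = ACT_NONE;

	if (!(ctx->flags & MODEL_SINGLE_ISSUER))
		return ACT_MUTEX;          /* unchanged mutex path */

	if (ctx->flags & MODEL_R_DISABLED) {
		act |= ACT_MUTEX;          /* submitter_task may be set concurrently */
		if (!ctx->submitter)
			return act;        /* every other locker also takes the mutex */
	}

	if (current_task == ctx->submitter)
		return act;                /* a single thread can't race with itself */

	/* The model assumes submitter_task is set here; suspending "task 0" is a bug. */
	assert(ctx->submitter != 0);
	return act | ACT_SUSPEND;
}
```

For example, a ring set up with only SINGLE_ISSUER yields ACT_NONE for its submitter and ACT_SUSPEND for any other task. Per the patch, io_ring_ctx_trylock() follows the same branches except that, where this model returns ACT_SUSPEND, it gives up (releasing uring_lock if it took it) instead of suspending the submitter.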

Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Tested-by: syzbot@syzkaller.appspotmail.com
---
 io_uring/io_uring.c |  12 +++++
 io_uring/io_uring.h | 114 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 123 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8d934bba21fa..054667880bfb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -363,10 +363,22 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
 	return NULL;
 }
 
+void io_ring_suspend_work(struct callback_head *cb_head)
+{
+	struct io_ring_suspend_work *suspend_work =
+		container_of(cb_head, struct io_ring_suspend_work, cb_head);
+	DECLARE_COMPLETION_ONSTACK(suspend_end);
+
+	*suspend_work->suspend_end = &suspend_end;
+	complete(&suspend_work->suspend_start);
+
+	wait_for_completion(&suspend_end);
+}
+
 static void io_clean_op(struct io_kiocb *req)
 {
 	if (unlikely(req->flags & REQ_F_BUFFER_SELECTED))
 		io_kbuf_drop_legacy(req);
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 57c3eef26a88..2b08d0ddab30 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -1,8 +1,9 @@
 #ifndef IOU_CORE_H
 #define IOU_CORE_H
 
+#include <linux/completion.h>
 #include <linux/errno.h>
 #include <linux/lockdep.h>
 #include <linux/resume_user_mode.h>
 #include <linux/kasan.h>
 #include <linux/poll.h>
@@ -195,19 +196,85 @@ void io_queue_next(struct io_kiocb *req);
 void io_task_refs_refill(struct io_uring_task *tctx);
 bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 
 void io_activate_pollwq(struct io_ring_ctx *ctx);
 
+/*
+ * The ctx uring lock protects most of the mutable struct io_ring_ctx state
+ * accessed in the struct io_kiocb issue path. In the I/O path, it is typically
+ * acquired in the io_uring_enter() syscall and in io_handle_tw_list(). For
+ * IORING_SETUP_SQPOLL, it's acquired by io_sq_thread() instead. io_kiocb's
+ * issued with IO_URING_F_UNLOCKED in issue_flags (e.g. by io_wq_submit_work())
+ * acquire and release the ctx uring lock whenever they must touch io_ring_ctx
+ * state. io_uring_register() also acquires the ctx uring lock because most
+ * opcodes mutate io_ring_ctx state accessed in the issue path.
+ *
+ * For !IORING_SETUP_SINGLE_ISSUER io_ring_ctx's, acquiring the ctx uring lock
+ * is done via mutex_(try)lock(&ctx->uring_lock).
+ *
+ * However, for IORING_SETUP_SINGLE_ISSUER, we can avoid the mutex_lock() +
+ * mutex_unlock() overhead on submitter_task because a single thread can't race
+ * with itself. In the uncommon case where the ctx uring lock is needed on
+ * another thread, it must suspend submitter_task by scheduling a task work item
+ * on it. io_ring_ctx_lock() returns once the task work item has started.
+ * io_ring_ctx_unlock() allows the task work item to complete.
+ * If io_ring_ctx_lock() is called while the ctx is IORING_SETUP_R_DISABLED
+ * (e.g. during ctx create or exit), io_ring_ctx_lock() must acquire uring_lock
+ * because submitter_task isn't set yet. submitter_task can be accessed once
+ * uring_lock is held. If submitter_task exists, we do the same thing as in the
+ * non-IORING_SETUP_R_DISABLED case (except with uring_lock also held). If
+ * submitter_task isn't set, all other io_ring_ctx_lock() callers will also
+ * acquire uring_lock, so it suffices for mutual exclusion.
+ */
+
+struct io_ring_suspend_work {
+	struct callback_head cb_head;
+	struct completion suspend_start;
+	struct completion **suspend_end;
+};
+
+void io_ring_suspend_work(struct callback_head *cb_head);
+
 struct io_ring_ctx_lock_state {
+	bool need_mutex;
+	struct completion *suspend_end;
 };
 
 /* Acquire the ctx uring lock with the given nesting level */
 static inline void io_ring_ctx_lock_nested(struct io_ring_ctx *ctx,
 					   unsigned int subclass,
 					   struct io_ring_ctx_lock_state *state)
 {
-	mutex_lock_nested(&ctx->uring_lock, subclass);
+	struct io_ring_suspend_work suspend_work;
+
+	if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) {
+		mutex_lock_nested(&ctx->uring_lock, subclass);
+		return;
+	}
+
+	state->suspend_end = NULL;
+	state->need_mutex =
+		!!(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED);
+	if (unlikely(state->need_mutex)) {
+		mutex_lock_nested(&ctx->uring_lock, subclass);
+		if (likely(!ctx->submitter_task))
+			return;
+	}
+
+	if (likely(current == ctx->submitter_task))
+		return;
+
+	/* Use task work to suspend submitter_task */
+	init_task_work(&suspend_work.cb_head, io_ring_suspend_work);
+	init_completion(&suspend_work.suspend_start);
+	suspend_work.suspend_end = &state->suspend_end;
+	/* If task_work_add() fails, task is exiting, so no need to suspend */
+	if (unlikely(task_work_add(ctx->submitter_task, &suspend_work.cb_head,
+				   TWA_SIGNAL)))
+		return;
+
+	wait_for_completion(&suspend_work.suspend_start);
 }
 
 /* Acquire the ctx uring lock */
 static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx,
 				    struct io_ring_ctx_lock_state *state)
@@ -217,29 +284,70 @@ static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx,
 
 /* Attempt to acquire the ctx uring lock without blocking */
 static inline bool io_ring_ctx_trylock(struct io_ring_ctx *ctx,
 				       struct io_ring_ctx_lock_state *state)
 {
-	return mutex_trylock(&ctx->uring_lock);
+	if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
+		return mutex_trylock(&ctx->uring_lock);
+
+	state->suspend_end = NULL;
+	state->need_mutex =
+		!!(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED);
+	if (unlikely(state->need_mutex)) {
+		if (!mutex_trylock(&ctx->uring_lock))
+			return false;
+		if (likely(!ctx->submitter_task))
+			return true;
+	}
+
+	if (unlikely(current != ctx->submitter_task))
+		goto unlock;
+
+	return true;
+
+unlock:
+	if (unlikely(state->need_mutex))
+		mutex_unlock(&ctx->uring_lock);
+	return false;
 }
 
 /* Release the ctx uring lock */
 static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx,
 				      struct io_ring_ctx_lock_state *state)
 {
-	mutex_unlock(&ctx->uring_lock);
+	if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) {
+		mutex_unlock(&ctx->uring_lock);
+		return;
+	}
+
+	if (unlikely(state->need_mutex))
+		mutex_unlock(&ctx->uring_lock);
+	if (unlikely(state->suspend_end))
+		complete(state->suspend_end);
 }
 
 /* Return (if CONFIG_LOCKDEP) whether the ctx uring lock is held */
 static inline bool io_ring_ctx_lock_held(const struct io_ring_ctx *ctx)
 {
+	/*
+	 * No straightforward way to check that submitter_task is suspended
+	 * without access to struct io_ring_ctx_lock_state
+	 */
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+	    !(ctx->flags & IORING_SETUP_R_DISABLED))
+		return true;
+
 	return lockdep_is_held(&ctx->uring_lock);
 }
 
 /* Assert (if CONFIG_LOCKDEP) that the ctx uring lock is held */
 static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
 {
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+	    !(ctx->flags & IORING_SETUP_R_DISABLED))
+		return;
+
 	lockdep_assert_held(&ctx->uring_lock);
 }
 
 static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
 {
-- 
2.45.2
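The suspend handshake between io_ring_ctx_lock_nested(), io_ring_suspend_work() and io_ring_ctx_unlock() can be illustrated with threads. The sketch below is a userspace approximation under stated assumptions: `struct model_completion` is a pthread-based stand-in for `struct completion`, a one-shot `queued` completion stands in for task_work_add() with TWA_SIGNAL plus the submitter reaching a point where it runs task work, and `struct submitter`, `counter`, `model_lock()`, `model_unlock()` and `run_demo()` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Minimal pthread stand-in for the kernel's struct completion. */
struct model_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

void model_init_completion(struct model_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

void model_complete(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

void model_wait_for_completion(struct model_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Mirrors struct io_ring_suspend_work, minus the callback_head. */
struct suspend_work {
	struct model_completion suspend_start;
	struct model_completion **suspend_end;
};

/* Runs on the submitter, like io_ring_suspend_work(). */
void suspend_work_fn(struct suspend_work *work)
{
	struct model_completion suspend_end; /* on the submitter's stack */
	model_init_completion(&suspend_end);
	*work->suspend_end = &suspend_end;   /* publish before signalling start */
	model_complete(&work->suspend_start);
	model_wait_for_completion(&suspend_end); /* parked until unlock */
}

/* Hypothetical submitter thread state. */
struct submitter {
	struct model_completion queued; /* stand-in for task_work_add() */
	struct suspend_work *work;
	int counter;                    /* state guarded by the ctx uring lock */
};

void *submitter_main(void *arg)
{
	struct submitter *s = arg;
	model_wait_for_completion(&s->queued); /* reaches a task-work point */
	suspend_work_fn(s->work);
	s->counter += 1;                       /* resumes only after unlock */
	return NULL;
}

/* Non-submitter lock: *suspend_end plays the io_ring_ctx_lock_state role. */
void model_lock(struct submitter *s, struct suspend_work *work,
		struct model_completion **suspend_end)
{
	*suspend_end = NULL;
	model_init_completion(&work->suspend_start);
	work->suspend_end = suspend_end;
	s->work = work;
	model_complete(&s->queued);
	model_wait_for_completion(&work->suspend_start);
}

void model_unlock(struct model_completion *suspend_end)
{
	if (suspend_end)
		model_complete(suspend_end);
}

int run_demo(void)
{
	struct submitter s = { .counter = 0 };
	struct suspend_work work;
	struct model_completion *suspend_end;
	pthread_t tid;
	model_init_completion(&s.queued);
	if (pthread_create(&tid, NULL, submitter_main, &s) != 0)
		return -1;
	model_lock(&s, &work, &suspend_end);
	assert(suspend_end != NULL); /* submitter is parked */
	s.counter += 10;             /* critical section */
	model_unlock(suspend_end);
	pthread_join(tid, NULL);
	return s.counter;            /* 10 from the locker, then 1 from the submitter */
}
```

run_demo() should return 11: the locker's increment happens while the submitter is parked in suspend_work_fn(), and the submitter's own increment runs only after model_unlock(). As in the patch, the submitter publishes a pointer to a completion on its own stack before signalling suspend_start, so nothing is allocated, at the cost of blocking the submitter for the whole critical section.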

Re: [PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

Posted by kernel test robot 2 months ago


Hello,

kernel test robot noticed "Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]KASAN" on:

commit: a924e7ffd1b0b2e015ed1174662d52053a2339c4 ("[PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER")
url: https://github.com/intel-lab-lkp/linux/commits/Caleb-Sander-Mateos/io_uring-use-release-acquire-ordering-for-IORING_SETUP_R_DISABLED/20251203-004502
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux.git for-next
patch link: https://lore.kernel.org/all/20251202164121.3612929-6-csander@purestorage.com/
patch subject: [PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

in testcase: trinity
version: 
with following parameters:

	runtime: 300s
	group: group-00
	nr_groups: 5



config: x86_64-randconfig-015-20251205
compiler: gcc-14
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G

(please refer to attached dmesg/kmsg for entire log/backtrace)



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202512101405.a7a2bdb2-lkp@intel.com


[  617.261968][ T3783] Oops: general protection fault, probably for non-canonical address 0xdffffc00000000f3: 0000 [#1] KASAN
[  617.267361][ T3783] KASAN: null-ptr-deref in range [0x0000000000000798-0x000000000000079f]
[  617.268334][ T3783] CPU: 0 UID: 65534 PID: 3783 Comm: trinity-c0 Not tainted 6.18.0-rc6-00312-ga924e7ffd1b0 #1 PREEMPT(lazy)  f22e3d733e0666690a06b271bf82578b56b40aa3
[  617.269927][ T3783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  617.271108][ T3783] RIP: 0010:task_work_add (kbuild/src/consumer/kernel/task_work.c:68 (discriminator 2))
[  617.271772][ T3783] Code: 39 25 df fe 67 03 0f 85 8c 01 00 00 e8 1c bd 24 00 4d 8d ac 24 98 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 2f 02 00 00 49 89 df 48 8d 44 24 38 4d 8b b4 24
All code
========
   0:	39 25 df fe 67 03    	cmp    %esp,0x367fedf(%rip)        # 0x367fee5
   6:	0f 85 8c 01 00 00    	jne    0x198
   c:	e8 1c bd 24 00       	call   0x24bd2d
  11:	4d 8d ac 24 98 07 00 	lea    0x798(%r12),%r13
  18:	00 
  19:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  20:	fc ff df 
  23:	4c 89 ea             	mov    %r13,%rdx
  26:	48 c1 ea 03          	shr    $0x3,%rdx
  2a:*	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)		<-- trapping instruction
  2e:	0f 85 2f 02 00 00    	jne    0x263
  34:	49 89 df             	mov    %rbx,%r15
  37:	48 8d 44 24 38       	lea    0x38(%rsp),%rax
  3c:	4d                   	rex.WRB
  3d:	8b                   	.byte 0x8b
  3e:	b4 24                	mov    $0x24,%ah

Code starting with the faulting instruction
===========================================
   0:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
   4:	0f 85 2f 02 00 00    	jne    0x239
   a:	49 89 df             	mov    %rbx,%r15
   d:	48 8d 44 24 38       	lea    0x38(%rsp),%rax
  12:	4d                   	rex.WRB
  13:	8b                   	.byte 0x8b
  14:	b4 24                	mov    $0x24,%ah
[  617.273774][ T3783] RSP: 0018:ffff88816ac9fb10 EFLAGS: 00010206
[  617.274486][ T3783] RAX: dffffc0000000000 RBX: ffff88816ac9fbe0 RCX: 0000000000000000
[  617.275413][ T3783] RDX: 00000000000000f3 RSI: 0000000000000000 RDI: 0000000000000000
[  617.276336][ T3783] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
[  617.277257][ T3783] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  617.278178][ T3783] R13: 0000000000000798 R14: 1ffff1102d593f65 R15: ffff88816ac9fcf0
[  617.279075][ T3783] FS:  00000000010a2880(0000) GS:0000000000000000(0000) knlGS:0000000000000000
[  617.280114][ T3783] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  617.280856][ T3783] CR2: 00000000d684d000 CR3: 000000015f35b000 CR4: 00000000000406f0
[  617.281749][ T3783] Call Trace:
[  617.282202][ T3783]  <TASK>
[  617.282613][ T3783]  ? lockdep_init_map_type (kbuild/src/consumer/kernel/locking/lockdep.c:4973 (discriminator 1))
[  617.283274][ T3783]  ? task_work_set_notify_irq (kbuild/src/consumer/kernel/task_work.c:56)
[  617.283904][ T3783]  ? lockdep_init_map_type (kbuild/src/consumer/kernel/locking/lockdep.c:4973 (discriminator 1))
[  617.284515][ T3783]  ? __init_swait_queue_head (kbuild/src/consumer/include/linux/list.h:45 (discriminator 2) kbuild/src/consumer/kernel/sched/swait.c:12 (discriminator 2))


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251210/202512101405.a7a2bdb2-lkp@intel.com
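The faulting address can be decoded by hand. With generic KASAN on x86_64, each 8-byte granule has one shadow byte at (addr >> 3) + 0xdffffc0000000000, the constant visible in RAX. In the code dump, R12 is 0 and R13 = R12 + 0x798, so the instrumented access targets offset 0x798 of a NULL pointer; its shadow check touches 0xdffffc00000000f3, which is non-canonical and raises the general protection fault, and KASAN maps it back to the range [0x798-0x79f]. The helper names below are illustrative; the two constants are named after the kernel's KASAN macros.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN parameters on x86_64, matching RAX in the report. */
#define KASAN_SHADOW_SCALE_SHIFT 3                       /* 8-byte granules */
#define KASAN_SHADOW_OFFSET      0xdffffc0000000000ULL

/* Shadow byte checked before an access: the shr $0x3 / cmpb (%rdx,%rax,1) pair. */
uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Inverse, used to report which 8-byte range a shadow address covers. */
uint64_t kasan_granule_start(uint64_t shadow)
{
	assert(shadow >= KASAN_SHADOW_OFFSET);
	return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}
```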



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

Re: [PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER

Posted by Caleb Sander Mateos 1 month, 3 weeks ago

On Tue, Dec 9, 2025 at 10:20 PM kernel test robot <oliver.sang@intel.com> wrote:
> Hello,
>
> kernel test robot noticed "Oops:general_protection_fault,probably_for_non-canonical_address#:#[##]KASAN" on:
>
> commit: a924e7ffd1b0b2e015ed1174662d52053a2339c4 ("[PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER")
> [...]

The full Call Trace is more useful:
[  617.261968][ T3783] Oops: general protection fault, probably for non-canonical address 0xdffffc00000000f3: 0000 [#1] KASAN
[  617.267361][ T3783] KASAN: null-ptr-deref in range [0x0000000000000798-0x000000000000079f]
[  617.268334][ T3783] CPU: 0 UID: 65534 PID: 3783 Comm: trinity-c0 Not tainted 6.18.0-rc6-00312-ga924e7ffd1b0 #1 PREEMPT(lazy)  f22e3d733e0666690a06b271bf82578b56b40aa3
[  617.269927][ T3783] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[  617.271108][ T3783] RIP: 0010:task_work_add+0xbd/0x330
[  617.271772][ T3783] Code: 39 25 df fe 67 03 0f 85 8c 01 00 00 e8 1c bd 24 00 4d 8d ac 24 98 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 2f 02 00 00 49 89 df 48 8d 44 24 38 4d 8b b4 24
[  617.273774][ T3783] RSP: 0018:ffff88816ac9fb10 EFLAGS: 00010206
[  617.274486][ T3783] RAX: dffffc0000000000 RBX: ffff88816ac9fbe0 RCX: 0000000000000000
[  617.275413][ T3783] RDX: 00000000000000f3 RSI: 0000000000000000 RDI: 0000000000000000
[  617.276336][ T3783] RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
[  617.277257][ T3783] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[  617.278178][ T3783] R13: 0000000000000798 R14: 1ffff1102d593f65 R15: ffff88816ac9fcf0
[  617.279075][ T3783] FS:  00000000010a2880(0000) GS:0000000000000000(0000) knlGS:0000000000000000
[  617.280114][ T3783] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  617.280856][ T3783] CR2: 00000000d684d000 CR3: 000000015f35b000 CR4: 00000000000406f0
[  617.281749][ T3783] Call Trace:
[  617.282202][ T3783]  <TASK>
[  617.282613][ T3783]  ? lockdep_init_map_type+0x5c/0x240
[  617.283274][ T3783]  ? task_work_set_notify_irq+0x60/0x60
[  617.283904][ T3783]  ? lockdep_init_map_type+0x5c/0x240
[  617.284515][ T3783]  ? __init_swait_queue_head+0xca/0x160
[  617.285149][ T3783]  io_ring_ctx_lock_nested+0x295/0x340
[  617.285859][ T3783]  ? io_cqring_timer_wakeup+0xb0/0xb0
[  617.286469][ T3783]  ? perf_trace_io_uring_submit_req+0x20/0x20
[  617.287193][ T3783]  ? nohz_balancer_kick+0x140/0x7a0
[  617.294458][ T3783]  ? io_rings_free+0x7b/0xe0
[  617.295099][ T3783]  io_ring_ctx_wait_and_kill+0x6e/0x220
[  617.295749][ T3783]  ? percpu_counter_add+0x90/0x90
[  617.296356][ T3783]  ? security_capset+0xa0/0xc0
[  617.296941][ T3783]  io_uring_create+0xa0f/0xa52
[  617.297537][ T3783]  ? io_uring_poll.cold+0x1a/0x1a
[  617.298159][ T3783]  io_uring_setup.cold+0x1a/0x2e
[  617.298772][ T3783]  ? io_prepare_config+0xb10/0xb10
[  617.299469][ T3783]  __x64_sys_io_uring_setup+0xc4/0x170
[  617.300135][ T3783]  do_syscall_64+0x72/0xe40
[  617.300725][ T3783]  entry_SYSCALL_64_after_hwframe+0x4b/0x53

So we're calling io_ring_ctx_wait_and_kill() from io_uring_create(),
meaning the io_uring creation errored out early. It looks like there are
several "goto err;" paths before ctx->submitter_task is assigned. If
those error paths are taken, io_ring_ctx_lock_nested() can be called
in the IORING_SETUP_SINGLE_ISSUER && !IORING_SETUP_R_DISABLED case
with a NULL submitter_task. Since current is never NULL, it then passes
that NULL pointer to task_work_add(), which faults on the access at
offset 0x798 (R12 = 0, R13 = 0x798 above). I think this can be fixed by just
initializing submitter_task earlier.
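The failure mode and the proposed fix come down to ordering. The model below is hypothetical: `model_create()`, its three anonymous fallible steps, `needs_submitter()` and `model_lock_is_safe()` are stand-ins, not io_uring functions. It only checks whether error-path cleanup could reach the task_work_add() branch of io_ring_ctx_lock_nested() with a NULL submitter_task, and it assumes (as the analysis above implies) that submitter_task is assigned at creation only for SINGLE_ISSUER rings that are not R_DISABLED.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IORING_SETUP_SINGLE_ISSUER / IORING_SETUP_R_DISABLED. */
#define MODEL_SINGLE_ISSUER (1u << 0)
#define MODEL_R_DISABLED    (1u << 1)

struct model_ctx {
	unsigned int flags;
	int submitter;  /* 0 plays the role of a NULL submitter_task */
};

/* False if the v4 lock logic would hand a NULL submitter_task to task_work_add(). */
bool model_lock_is_safe(const struct model_ctx *ctx, int current_task)
{
	(void)current_task;  /* never 0, so it can't match a NULL submitter */
	if (!(ctx->flags & MODEL_SINGLE_ISSUER))
		return true;     /* plain mutex path */
	if (ctx->flags & MODEL_R_DISABLED)
		return true;     /* mutex taken first; NULL submitter is handled */
	return ctx->submitter != 0;
}

static bool needs_submitter(unsigned int flags)
{
	return (flags & MODEL_SINGLE_ISSUER) && !(flags & MODEL_R_DISABLED);
}

/*
 * Hypothetical create path with three fallible steps; fail_at selects the
 * failing step (-1 for none) and early_owner chooses where submitter is set.
 * Returns false if the cleanup would hit the NULL dereference.
 */
bool model_create(unsigned int flags, int current_task, int fail_at,
		  bool early_owner)
{
	struct model_ctx ctx = { .flags = flags, .submitter = 0 };

	assert(current_task != 0);  /* current is never NULL */
	if (early_owner && needs_submitter(flags))
		ctx.submitter = current_task;  /* proposed: before any "goto err" */

	for (int step = 0; step < 3; step++) {
		if (step == fail_at)           /* "goto err;": cleanup takes the lock */
			return model_lock_is_safe(&ctx, current_task);
	}

	if (!early_owner && needs_submitter(flags))
		ctx.submitter = current_task;  /* v4 ordering: after the fallible steps */
	return model_lock_is_safe(&ctx, current_task);
}
```

With the late assignment, any failing step on a SINGLE_ISSUER ring without R_DISABLED reproduces the unsafe case; moving the assignment ahead of the first fallible step removes it without touching the lock helpers.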

Thanks,
Caleb

[PATCH v4 1/5] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
[PATCH v4 2/5] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
[PATCH v4 3/5] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
[PATCH v4 4/5] io_uring: factor out uring_lock helpers
[PATCH v4 5/5] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER