From: Caleb Sander Mateos
To: Jens Axboe
Cc: io-uring@vger.kernel.org, linux-kernel@vger.kernel.org, Caleb Sander Mateos
Subject: [PATCH v3 4/4] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER
Date: Tue, 25 Nov 2025 16:39:28 -0700
Message-ID: <20251125233928.3962947-5-csander@purestorage.com>
X-Mailer: git-send-email 2.45.2
In-Reply-To: <20251125233928.3962947-1-csander@purestorage.com>
References: <20251125233928.3962947-1-csander@purestorage.com>

io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS
workloads. Even when only one thread pinned to a single CPU is accessing
the io_ring_ctx, the atomic CASes required to lock and unlock the mutex
are very hot instructions.

The mutex's primary purpose is to prevent concurrent io_uring system
calls on the same io_ring_ctx. However, there is already a flag
IORING_SETUP_SINGLE_ISSUER that promises only one task will make
io_uring_enter() and io_uring_register() system calls on the io_ring_ctx
once it's enabled. So if the io_ring_ctx is set up with
IORING_SETUP_SINGLE_ISSUER, skip the uring_lock mutex_lock() and
mutex_unlock() on the submitter_task. When another task needs to acquire
the ctx uring lock, use a task work item to suspend the submitter_task
for the critical section.

In io_uring_register(), continue to always acquire the uring_lock mutex.
io_uring_register() can be called on a disabled io_ring_ctx (indeed, it's
required to enable it), when submitter_task isn't set yet. After
submitter_task is set, io_uring_register() is only permitted on
submitter_task, so uring_lock suffices to exclude all other users.
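For illustration, here is a minimal userspace sketch of a ring that
qualifies for this fast path. It uses liburing (io_uring_queue_init(),
io_uring_enable_rings(), and friends); the queue depth and the nop
request are arbitrary assumptions and not part of this patch. It only
shows the IORING_SETUP_SINGLE_ISSUER / IORING_SETUP_R_DISABLED setup
that the locking change keys off of.

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	/* Create the ring disabled; only io_uring_register() is allowed yet */
	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SINGLE_ISSUER |
					    IORING_SETUP_R_DISABLED);
	if (ret < 0)
		return 1;

	/* Any buffer/file registration would go here, then enable the ring */
	ret = io_uring_enable_rings(&ring);
	if (ret < 0)
		return 1;

	/* From now on, only this (enabling) task may submit and reap */
	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_nop(sqe);
	ret = io_uring_submit_and_wait(&ring, 1);
	if (ret < 0)
		return 1;

	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("nop completed: res=%d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}

All io_uring_enter() calls then come from the single enabling task, which
is the case where io_ring_ctx_lock() below can skip the uring_lock mutex
entirely.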
Signed-off-by: Caleb Sander Mateos
---
 io_uring/io_uring.c |  11 +++++
 io_uring/io_uring.h | 101 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e05e56a840f9..64e4e57e2c11 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -363,10 +363,21 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	xa_destroy(&ctx->io_bl_xa);
 	kfree(ctx);
 	return NULL;
 }
 
+void io_ring_suspend_work(struct callback_head *cb_head)
+{
+	struct io_ring_suspend_work *suspend_work =
+		container_of(cb_head, struct io_ring_suspend_work, cb_head);
+	DECLARE_COMPLETION_ONSTACK(suspend_end);
+
+	suspend_work->lock_state->suspend_end = &suspend_end;
+	complete(&suspend_work->suspend_start);
+	wait_for_completion(&suspend_end);
+}
+
 static void io_clean_op(struct io_kiocb *req)
 {
 	if (unlikely(req->flags & REQ_F_BUFFER_SELECTED))
 		io_kbuf_drop_legacy(req);
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 23dae0af530b..262971224cc6 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -1,8 +1,9 @@
 #ifndef IOU_CORE_H
 #define IOU_CORE_H
 
+#include <linux/completion.h>
 #include
 #include
 #include
 #include
 #include
@@ -195,36 +196,130 @@ void io_queue_next(struct io_kiocb *req);
 void io_task_refs_refill(struct io_uring_task *tctx);
 bool __io_alloc_req_refill(struct io_ring_ctx *ctx);
 
 void io_activate_pollwq(struct io_ring_ctx *ctx);
 
+/*
+ * The ctx uring lock protects most of the mutable struct io_ring_ctx state
+ * accessed in the struct io_kiocb issue path. In the I/O path, it is typically
+ * acquired in the io_uring_enter() syscall and io_handle_tw_list(). For
+ * IORING_SETUP_SQPOLL, it's acquired by io_sq_thread() instead. io_kiocb's
+ * issued with IO_URING_F_UNLOCKED in issue_flags (e.g. by io_wq_submit_work())
+ * acquire and release the ctx uring lock whenever they must touch io_ring_ctx
+ * state. io_uring_register() also acquires the ctx uring lock because most
+ * opcodes mutate io_ring_ctx state accessed in the issue path.
+ *
+ * For !IORING_SETUP_SINGLE_ISSUER io_ring_ctx's, acquiring the ctx uring lock
+ * is always done via mutex_(try)lock(&ctx->uring_lock).
+ *
+ * However, for IORING_SETUP_SINGLE_ISSUER, we can avoid the mutex_lock() +
+ * mutex_unlock() overhead on submitter_task because a single thread can't race
+ * with itself. In the uncommon case where the ctx uring lock is needed on
+ * another thread, it must suspend submitter_task by scheduling a task work item
+ * on it. io_ring_ctx_lock() returns once the task work item has started.
+ * submitter_task is unblocked once io_ring_ctx_unlock() is called.
+ *
+ * io_uring_register() requires special treatment for IORING_SETUP_SINGLE_ISSUER
+ * since it's allowed on a IORING_SETUP_R_DISABLED io_ring_ctx, where
+ * submitter_task isn't set yet. Hence the io_ring_register_ctx_*() family
+ * of helpers. They unconditionally acquire the uring_lock mutex, which always
+ * works to exclude other ctx uring lock users:
+ * - For !IORING_SETUP_SINGLE_ISSUER, all users acquire the ctx uring lock via
+ *   the uring_lock mutex
+ * - For IORING_SETUP_SINGLE_ISSUER and IORING_SETUP_R_DISABLED, only
+ *   io_uring_register() is allowed before the io_ring_ctx is enabled.
+ *   So again, all ctx uring lock users acquire the uring_lock mutex.
+ * - For IORING_SETUP_SINGLE_ISSUER and !IORING_SETUP_R_DISABLED,
+ *   io_uring_register() is only permitted on submitter_task, which is always
+ *   granted the ctx uring lock unless suspended.
+ *   Acquiring the uring_lock mutex is unnecessary but still correct.
+ */
+
 struct io_ring_ctx_lock_state {
+	struct completion *suspend_end;
 };
 
+struct io_ring_suspend_work {
+	struct callback_head cb_head;
+	struct completion suspend_start;
+	struct io_ring_ctx_lock_state *lock_state;
+};
+
+void io_ring_suspend_work(struct callback_head *cb_head);
+
 /* Acquire the ctx uring lock */
 static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx,
 				    struct io_ring_ctx_lock_state *state)
 {
-	mutex_lock(&ctx->uring_lock);
+	struct io_ring_suspend_work suspend_work;
+	struct task_struct *submitter_task;
+
+	if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) {
+		mutex_lock(&ctx->uring_lock);
+		return;
+	}
+
+	submitter_task = ctx->submitter_task;
+	/*
+	 * Not suitable for use while IORING_SETUP_R_DISABLED.
+	 * Must use io_ring_register_ctx_lock() in that case.
+	 */
+	WARN_ON_ONCE(!submitter_task);
+	if (likely(current == submitter_task))
+		return;
+
+	/* Use task work to suspend submitter_task */
+	init_task_work(&suspend_work.cb_head, io_ring_suspend_work);
+	init_completion(&suspend_work.suspend_start);
+	suspend_work.lock_state = state;
+	/* If task_work_add() fails, task is exiting, so no need to suspend */
+	if (unlikely(task_work_add(submitter_task, &suspend_work.cb_head,
+				   TWA_SIGNAL))) {
+		state->suspend_end = NULL;
+		return;
+	}
+
+	wait_for_completion(&suspend_work.suspend_start);
 }
 
 /* Attempt to acquire the ctx uring lock without blocking */
 static inline bool io_ring_ctx_trylock(struct io_ring_ctx *ctx)
 {
-	return mutex_trylock(&ctx->uring_lock);
+	if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER))
+		return mutex_trylock(&ctx->uring_lock);
+
+	/* Not suitable for use while IORING_SETUP_R_DISABLED */
+	WARN_ON_ONCE(!ctx->submitter_task);
+	return current == ctx->submitter_task;
 }
 
 /* Release the ctx uring lock */
 static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx,
 				      struct io_ring_ctx_lock_state *state)
 {
-	mutex_unlock(&ctx->uring_lock);
+	if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) {
+		mutex_unlock(&ctx->uring_lock);
+		return;
+	}
+
+	if (likely(current == ctx->submitter_task))
+		return;
+
+	if (likely(state->suspend_end))
+		complete(state->suspend_end);
 }
 
 /* Assert (if CONFIG_LOCKDEP) that the ctx uring lock is held */
 static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx)
 {
+	/*
+	 * No straightforward way to check that submitter_task is suspended
+	 * without access to struct io_ring_ctx_lock_state
+	 */
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER)
+		return;
+
 	lockdep_assert_held(&ctx->uring_lock);
 }
 
 /* Acquire the ctx uring lock during the io_uring_register() syscall */
 static inline void io_ring_register_ctx_lock(struct io_ring_ctx *ctx)
-- 
2.45.2