From nobody Thu Dec 18 22:29:49 2025
From: Caleb Sander Mateos
To: Jens Axboe, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Joanne Koong, Caleb Sander Mateos
Subject: [PATCH v6 1/6] io_uring: use release-acquire ordering for IORING_SETUP_R_DISABLED
Date: Wed, 17 Dec 2025 19:44:54 -0700
Message-ID: <20251218024459.1083572-2-csander@purestorage.com>
In-Reply-To: <20251218024459.1083572-1-csander@purestorage.com>
References: <20251218024459.1083572-1-csander@purestorage.com>

io_uring_enter() and io_msg_ring() read ctx->flags and ctx->submitter_task
without holding the ctx's uring_lock. This means they may race with the
assignment to ctx->submitter_task and the clearing of
IORING_SETUP_R_DISABLED from ctx->flags in io_register_enable_rings().
Ensure the correct ordering of the ctx->flags and ctx->submitter_task
memory accesses by storing to ctx->flags using release ordering and
loading it using acquire ordering.

Signed-off-by: Caleb Sander Mateos
Fixes: 4add705e4eeb ("io_uring: remove io_register_submitter")
Reviewed-by: Joanne Koong
---
 io_uring/io_uring.c | 2 +-
 io_uring/msg_ring.c | 4 ++--
 io_uring/register.c | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6cb24cdf8e68..761b9612c5b6 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3249,11 +3249,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		goto out;
 	}

 	ctx = file->private_data;
 	ret = -EBADFD;
-	if (unlikely(ctx->flags & IORING_SETUP_R_DISABLED))
+	if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED))
 		goto out;

 	/*
 	 * For SQ polling, the thread will do all submissions and completions.
 	 * Just return the requested submit count, and wake the thread if
diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c
index 7063ea7964e7..c48588e06bfb 100644
--- a/io_uring/msg_ring.c
+++ b/io_uring/msg_ring.c
@@ -123,11 +123,11 @@ static int __io_msg_ring_data(struct io_ring_ctx *target_ctx,

 	if (msg->src_fd || msg->flags & ~IORING_MSG_RING_FLAGS_PASS)
 		return -EINVAL;
 	if (!(msg->flags & IORING_MSG_RING_FLAGS_PASS) && msg->dst_fd)
 		return -EINVAL;
-	if (target_ctx->flags & IORING_SETUP_R_DISABLED)
+	if (smp_load_acquire(&target_ctx->flags) & IORING_SETUP_R_DISABLED)
 		return -EBADFD;

 	if (io_msg_need_remote(target_ctx))
 		return io_msg_data_remote(target_ctx, msg);

@@ -243,11 +243,11 @@ static int io_msg_send_fd(struct io_kiocb *req, unsigned int issue_flags)

 	if (msg->len)
 		return -EINVAL;
 	if (target_ctx == ctx)
 		return -EINVAL;
-	if (target_ctx->flags & IORING_SETUP_R_DISABLED)
+	if (smp_load_acquire(&target_ctx->flags) & IORING_SETUP_R_DISABLED)
 		return -EBADFD;
 	if (!msg->src_file) {
 		int ret = io_msg_grab_file(req, issue_flags);
 		if (unlikely(ret))
 			return ret;
diff --git a/io_uring/register.c b/io_uring/register.c
index 62d39b3ff317..9e473c244041 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -191,11 +191,11 @@ static int io_register_enable_rings(struct io_ring_ctx *ctx)
 	}

 	if (ctx->restrictions.registered)
 		ctx->restricted = 1;

-	ctx->flags &= ~IORING_SETUP_R_DISABLED;
+	smp_store_release(&ctx->flags, ctx->flags & ~IORING_SETUP_R_DISABLED);
 	if (ctx->sq_data && wq_has_sleeper(&ctx->sq_data->wait))
 		wake_up(&ctx->sq_data->wait);
 	return 0;
 }

-- 
2.45.2
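To illustrate the ordering the patch above relies on, here is a minimal userspace sketch using C11 atomics as an analogue of smp_store_release()/smp_load_acquire(). It is not kernel code: the struct and function names are illustrative, and it assumes a single registering thread updates the flags. The writer publishes submitter_task and only then clears the disabled bit with a release store, so a reader that acquire-loads the flags and sees the bit cleared is guaranteed to also observe the submitter_task assignment.

/*
 * Illustrative analogue only (not kernel code), assuming one writer.
 */
#include <stdatomic.h>
#include <stddef.h>

#define SETUP_R_DISABLED 0x1u		/* stand-in for IORING_SETUP_R_DISABLED */

struct task;				/* opaque; only a pointer is published */

struct ring_ctx {
	_Atomic unsigned int flags;
	struct task *submitter_task;	/* plain field, published via flags */
};

static void enable_rings(struct ring_ctx *ctx, struct task *tsk)
{
	unsigned int flags = atomic_load_explicit(&ctx->flags, memory_order_relaxed);

	ctx->submitter_task = tsk;	/* 1: write the payload */
	/* 2: release store orders the payload write before the flag clear */
	atomic_store_explicit(&ctx->flags, flags & ~SETUP_R_DISABLED,
			      memory_order_release);
}

static struct task *enter(struct ring_ctx *ctx)
{
	/* acquire load: seeing the bit cleared implies seeing submitter_task */
	if (atomic_load_explicit(&ctx->flags, memory_order_acquire) & SETUP_R_DISABLED)
		return NULL;		/* still disabled: -EBADFD in the kernel */
	return ctx->submitter_task;
}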
From nobody Thu Dec 18 22:29:49 2025
From: Caleb Sander Mateos
To: Jens Axboe, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Joanne Koong, Caleb Sander Mateos
Subject: [PATCH v6 2/6] io_uring: clear IORING_SETUP_SINGLE_ISSUER for IORING_SETUP_SQPOLL
Date: Wed, 17 Dec 2025 19:44:55 -0700
Message-ID: <20251218024459.1083572-3-csander@purestorage.com>
In-Reply-To: <20251218024459.1083572-1-csander@purestorage.com>
References: <20251218024459.1083572-1-csander@purestorage.com>
IORING_SETUP_SINGLE_ISSUER doesn't currently enable any optimizations,
but it will soon be used to avoid taking io_ring_ctx's uring_lock when
submitting from the single issuer task. If the IORING_SETUP_SQPOLL flag
is set, the SQ thread is the sole task issuing SQEs. However, other
tasks may make io_uring_register() syscalls, which must be synchronized
with SQE submission. So it wouldn't be safe to skip the uring_lock
around the SQ thread's submission even if IORING_SETUP_SINGLE_ISSUER is
set. Therefore, clear IORING_SETUP_SINGLE_ISSUER from the io_ring_ctx
flags if IORING_SETUP_SQPOLL is set.

Signed-off-by: Caleb Sander Mateos
---
 io_uring/io_uring.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 761b9612c5b6..44ff5756b328 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3478,10 +3478,19 @@ static int io_uring_sanitise_params(struct io_uring_params *p)
 	 */
 	if ((flags & (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED)) ==
 	    (IORING_SETUP_SQE128|IORING_SETUP_SQE_MIXED))
 		return -EINVAL;

+	/*
+	 * If IORING_SETUP_SQPOLL is set, only the SQ thread issues SQEs,
+	 * but other threads may call io_uring_register() concurrently.
+	 * We still need ctx uring lock to synchronize these io_ring_ctx
+	 * accesses, so disable the single issuer optimizations.
+	 */
+	if (flags & IORING_SETUP_SQPOLL)
+		p->flags &= ~IORING_SETUP_SINGLE_ISSUER;
+
 	return 0;
 }

 static int io_uring_fill_params(struct io_uring_params *p)
 {
-- 
2.45.2
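For context, here is a minimal liburing-based sketch of what requesting both flags looks like from userspace; it assumes liburing is installed and keeps error handling minimal. Per the patch above, the kernel keeps SQPOLL and drops SINGLE_ISSUER internally, so the ring behaves as a plain SQPOLL ring.

#include <liburing.h>
#include <stdio.h>
#include <string.h>

/* Request SQPOLL plus SINGLE_ISSUER; after the patch above the kernel
 * clears SINGLE_ISSUER internally for SQPOLL rings. */
static int setup_sqpoll_ring(struct io_uring *ring)
{
	struct io_uring_params p;
	int ret;

	memset(&p, 0, sizeof(p));
	p.flags = IORING_SETUP_SQPOLL | IORING_SETUP_SINGLE_ISSUER;
	p.sq_thread_idle = 1000;	/* ms before the SQ thread sleeps */

	ret = io_uring_queue_init_params(8, ring, &p);
	if (ret < 0)
		fprintf(stderr, "io_uring_queue_init_params: %d\n", ret);
	return ret;
}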
From nobody Thu Dec 18 22:29:49 2025
From: Caleb Sander Mateos
To: Jens Axboe, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Joanne Koong, Caleb Sander Mateos, kernel test robot, syzbot@syzkaller.appspotmail.com
Subject: [PATCH v6 3/6] io_uring: ensure submitter_task is valid for io_ring_ctx's lifetime
Date: Wed, 17 Dec 2025 19:44:56 -0700
Message-ID: <20251218024459.1083572-4-csander@purestorage.com>
In-Reply-To: <20251218024459.1083572-1-csander@purestorage.com>
References: <20251218024459.1083572-1-csander@purestorage.com>
If io_uring_create() fails after allocating the struct io_ring_ctx, it
may call io_ring_ctx_wait_and_kill() before submitter_task has been
assigned. This is currently harmless, as the submit and register paths
that check submitter_task aren't reachable until the io_ring_ctx has
been successfully created. However, a subsequent commit will expect
submitter_task to be set for every IORING_SETUP_SINGLE_ISSUER &&
!IORING_SETUP_R_DISABLED ctx. So assign ctx->submitter_task immediately
after allocating the ctx in io_uring_create().

Similarly, the reference on submitter_task is currently released early
in io_ring_ctx_free(). But it will soon be needed to acquire the uring
lock during the later call to io_req_caches_free(). So release the
submitter_task reference as the last thing before freeing the ctx.

Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-lkp/202512101405.a7a2bdb2-lkp@intel.com
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Caleb Sander Mateos
---
 io_uring/io_uring.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 44ff5756b328..22086ac84278 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2852,12 +2852,10 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_destroy_buffers(ctx);
 	io_free_region(ctx->user, &ctx->param_region);
 	mutex_unlock(&ctx->uring_lock);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
-	if (ctx->submitter_task)
-		put_task_struct(ctx->submitter_task);

 	WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list));

 	if (ctx->mm_account) {
 		mmdrop(ctx->mm_account);
@@ -2877,10 +2875,13 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	if (ctx->hash_map)
 		io_wq_put_hash(ctx->hash_map);
 	io_napi_free(ctx);
 	kvfree(ctx->cancel_table.hbs);
 	xa_destroy(&ctx->io_bl_xa);
+	/* Release submitter_task last, as any io_ring_ctx_lock() may use it */
+	if (ctx->submitter_task)
+		put_task_struct(ctx->submitter_task);
 	kfree(ctx);
 }

 static __cold void io_activate_pollwq_cb(struct callback_head *cb)
 {
@@ -3594,10 +3595,20 @@ static __cold int io_uring_create(struct io_ctx_config *config)

 	ctx = io_ring_ctx_alloc(p);
 	if (!ctx)
 		return -ENOMEM;

+	/* Assign submitter_task first, as any io_ring_ctx_lock() may use it */
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER
+	    && !(ctx->flags & IORING_SETUP_R_DISABLED)) {
+		/*
+		 * Unlike io_register_enable_rings(), don't need WRITE_ONCE()
+		 * since ctx isn't yet accessible from other tasks
+		 */
+		ctx->submitter_task = get_task_struct(current);
+	}
+
 	ctx->clockid = CLOCK_MONOTONIC;
 	ctx->clock_offset = 0;

 	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
 		static_branch_inc(&io_key_has_sqarray);
@@ -3662,19 +3673,10 @@ static __cold int io_uring_create(struct io_ctx_config *config)
 	if (copy_to_user(config->uptr, p, sizeof(*p))) {
 		ret = -EFAULT;
 		goto err;
 	}

-	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER
-	    && !(ctx->flags & IORING_SETUP_R_DISABLED)) {
-		/*
-		 * Unlike io_register_enable_rings(), don't need WRITE_ONCE()
-		 * since ctx isn't yet accessible from other tasks
-		 */
-		ctx->submitter_task = get_task_struct(current);
-	}
-
 	file = io_uring_get_file(ctx);
 	if (IS_ERR(file)) {
 		ret = PTR_ERR(file);
 		goto err;
 	}
-- 
2.45.2
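A minimal sketch of the lifetime rule the patch above establishes, in plain C with hypothetical names (task_ref, obj): the reference that later teardown or lock helpers may consult is taken as the first step after allocation and dropped as the very last step of teardown, after everything that might still use it.

#include <stdlib.h>

/* Hypothetical stand-ins; only the ordering matters. */
struct task_ref { int refs; };

static struct task_ref *task_ref_get(struct task_ref *t) { t->refs++; return t; }
static void task_ref_put(struct task_ref *t) { t->refs--; }

struct obj {
	struct task_ref *owner;		/* may be consulted by later teardown steps */
};

static struct obj *obj_create(struct task_ref *current_task)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (!o)
		return NULL;
	o->owner = task_ref_get(current_task);	/* first thing after allocation */
	/* ... remaining setup may already rely on o->owner ... */
	return o;
}

static void obj_free(struct obj *o)
{
	/* ... teardown steps that may still take a lock via o->owner ... */
	task_ref_put(o->owner);			/* dropped last */
	free(o);
}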
From nobody Thu Dec 18 22:29:49 2025
From: Caleb Sander Mateos
To: Jens Axboe, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Joanne Koong, Caleb Sander Mateos
Subject: [PATCH v6 4/6] io_uring: use io_ring_submit_lock() in io_iopoll_req_issued()
Date: Wed, 17 Dec 2025 19:44:57 -0700
Message-ID: <20251218024459.1083572-5-csander@purestorage.com>
In-Reply-To: <20251218024459.1083572-1-csander@purestorage.com>
References: <20251218024459.1083572-1-csander@purestorage.com>

Use the io_ring_submit_lock() helper in io_iopoll_req_issued() instead
of open-coding the logic. io_ring_submit_unlock() can't be used for the
unlock, though, due to the extra logic before releasing the mutex.

Signed-off-by: Caleb Sander Mateos
Reviewed-by: Joanne Koong
---
 io_uring/io_uring.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 22086ac84278..ab0af4a38714 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1670,15 +1670,13 @@ void io_req_task_complete(struct io_tw_req tw_req, io_tw_token_t tw)
  * accessing the kiocb cookie.
  */
 static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = req->ctx;
-	const bool needs_lock = issue_flags & IO_URING_F_UNLOCKED;

 	/* workqueue context doesn't hold uring_lock, grab it now */
-	if (unlikely(needs_lock))
-		mutex_lock(&ctx->uring_lock);
+	io_ring_submit_lock(ctx, issue_flags);

 	/*
 	 * Track whether we have multiple files in our lists. This will impact
 	 * how we do polling eventually, not spinning if we're on potentially
 	 * different devices.
@@ -1701,11 +1699,11 @@ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_flags)
 	if (READ_ONCE(req->iopoll_completed))
 		wq_list_add_head(&req->comp_list, &ctx->iopoll_list);
 	else
 		wq_list_add_tail(&req->comp_list, &ctx->iopoll_list);

-	if (unlikely(needs_lock)) {
+	if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
 		/*
 		 * If IORING_SETUP_SQPOLL is enabled, sqes are either handle
 		 * in sq thread task context or in io worker task context. If
 		 * current task context is sq thread, we don't need to check
 		 * whether should wake up sq thread.
-- 
2.45.2
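Here is a hedged sketch of the conditional-locking pattern that io_ring_submit_lock()/io_ring_submit_unlock() implement, modeled with pthreads and illustrative names. It only shows the idea the patch above relies on (take the lock when the calling context does not already hold it), not the kernel helpers themselves.

#include <pthread.h>

#define F_UNLOCKED	(1u << 0)	/* stand-in for IO_URING_F_UNLOCKED */

struct ring {
	pthread_mutex_t lock;
};

/* Take the lock only if this call context does not already hold it. */
static void ring_submit_lock(struct ring *r, unsigned int issue_flags)
{
	if (issue_flags & F_UNLOCKED)
		pthread_mutex_lock(&r->lock);
}

static void ring_submit_unlock(struct ring *r, unsigned int issue_flags)
{
	if (issue_flags & F_UNLOCKED)
		pthread_mutex_unlock(&r->lock);
}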
From nobody Thu Dec 18 22:29:49 2025
From: Caleb Sander Mateos
To: Jens Axboe, io-uring@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Joanne Koong, Caleb Sander Mateos
Subject: [PATCH v6 5/6] io_uring: factor out uring_lock helpers
Date: Wed, 17 Dec 2025 19:44:58 -0700
Message-ID: <20251218024459.1083572-6-csander@purestorage.com>
In-Reply-To: <20251218024459.1083572-1-csander@purestorage.com>
References: <20251218024459.1083572-1-csander@purestorage.com>

A subsequent commit will skip acquiring the io_ring_ctx uring_lock in
io_uring_enter() and io_handle_tw_list() for IORING_SETUP_SINGLE_ISSUER.
Prepare for this change by factoring out the uring_lock accesses under
these functions into helpers. Aside from the helpers, the only remaining
access of uring_lock is its mutex_init() call.

Define a struct io_ring_ctx_lock_state to pass state from
io_ring_ctx_lock() to io_ring_ctx_unlock(). It's currently empty but a
subsequent commit will add fields.
Helpers:
- io_ring_ctx_lock() for mutex_lock()
- io_ring_ctx_lock_nested() for mutex_lock_nested()
- io_ring_ctx_trylock() for mutex_trylock()
- io_ring_ctx_unlock() for mutex_unlock()
- io_ring_ctx_lock_held() for lockdep_is_held()
- io_ring_ctx_assert_locked() for lockdep_assert_held()

Signed-off-by: Caleb Sander Mateos
---
 include/linux/io_uring_types.h |  12 +--
 io_uring/cancel.c              |  40 ++++----
 io_uring/cancel.h              |   5 +-
 io_uring/eventfd.c             |   5 +-
 io_uring/fdinfo.c              |   8 +-
 io_uring/filetable.c           |   8 +-
 io_uring/futex.c               |  14 +--
 io_uring/io_uring.c            | 181 +++++++++++++++++++--------------
 io_uring/io_uring.h            |  75 +++++++++++---
 io_uring/kbuf.c                |  32 +++---
 io_uring/memmap.h              |   2 +-
 io_uring/msg_ring.c            |  29 ++++--
 io_uring/notif.c               |   5 +-
 io_uring/notif.h               |   3 +-
 io_uring/openclose.c           |  14 +--
 io_uring/poll.c                |  21 ++--
 io_uring/register.c            |  79 +++++++-------
 io_uring/rsrc.c                |  51 ++++++----
 io_uring/rsrc.h                |   6 +-
 io_uring/rw.c                  |   2 +-
 io_uring/splice.c              |   5 +-
 io_uring/sqpoll.c              |   5 +-
 io_uring/tctx.c                |  27 +++--
 io_uring/tctx.h                |   5 +-
 io_uring/uring_cmd.c           |  13 ++-
 io_uring/waitid.c              |  13 +--
 io_uring/zcrx.c                |   2 +-
 27 files changed, 404 insertions(+), 258 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index e1adb0d20a0a..74d202394b20 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -86,11 +86,11 @@ struct io_mapped_region {

 /*
  * Return value from io_buffer_list selection, to avoid stashing it in
  * struct io_kiocb. For legacy/classic provided buffers, keeping a reference
  * across execution contexts are fine. But for ring provided buffers, the
- * list may go away as soon as ->uring_lock is dropped. As the io_kiocb
+ * list may go away as soon as the ctx uring lock is dropped. As the io_kiocb
  * persists, it's better to just keep the buffer local for those cases.
  */
 struct io_br_sel {
 	struct io_buffer_list *buf_list;
 	/*
@@ -231,11 +231,11 @@ struct io_submit_link {
 	struct io_kiocb *head;
 	struct io_kiocb *last;
 };

 struct io_submit_state {
-	/* inline/task_work completion list, under ->uring_lock */
+	/* inline/task_work completion list, under ctx uring lock */
 	struct io_wq_work_node free_list;
 	/* batch completion logic */
 	struct io_wq_work_list compl_reqs;
 	struct io_submit_link link;

@@ -303,16 +303,16 @@ struct io_ring_ctx {
 		unsigned	cached_sq_head;
 		unsigned	sq_entries;

 		/*
 		 * Fixed resources fast path, should be accessed only under
-		 * uring_lock, and updated through io_uring_register(2)
+		 * ctx uring lock, and updated through io_uring_register(2)
 		 */
 		atomic_t	cancel_seq;

 		/*
-		 * ->iopoll_list is protected by the ctx->uring_lock for
+		 * ->iopoll_list is protected by the ctx uring lock for
 		 * io_uring instances that don't use IORING_SETUP_SQPOLL.
 		 * For SQPOLL, only the single threaded io_sq_thread() will
 		 * manipulate the list, hence no extra locking is needed there.
 		 */
 		bool	poll_multi_queue;
@@ -324,11 +324,11 @@ struct io_ring_ctx {
 		struct io_alloc_cache	imu_cache;

 		struct io_submit_state	submit_state;

 		/*
-		 * Modifications are protected by ->uring_lock and ->mmap_lock.
+		 * Modifications protected by ctx uring lock and ->mmap_lock.
 		 * The buffer list's io mapped region should be stable once
 		 * published.
 		 */
 		struct xarray	io_bl_xa;

@@ -467,11 +467,11 @@ struct io_ring_ctx {
 	struct io_mapped_region	param_region;
 };

 /*
  * Token indicating function is called in task work context:
- * ctx->uring_lock is held and any completions generated will be flushed.
+ * ctx uring lock is held and any completions generated will be flushed. * ONLY core io_uring.c should instantiate this struct. */ struct io_tw_state { bool cancel; }; diff --git a/io_uring/cancel.c b/io_uring/cancel.c index ca12ac10c0ae..68b58c7765ef 100644 --- a/io_uring/cancel.c +++ b/io_uring/cancel.c @@ -168,10 +168,11 @@ int io_async_cancel_prep(struct io_kiocb *req, const = struct io_uring_sqe *sqe) static int __io_async_cancel(struct io_cancel_data *cd, struct io_uring_task *tctx, unsigned int issue_flags) { bool all =3D cd->flags & (IORING_ASYNC_CANCEL_ALL|IORING_ASYNC_CANCEL_ANY= ); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D cd->ctx; struct io_tctx_node *node; int ret, nr =3D 0; =20 do { @@ -182,21 +183,21 @@ static int __io_async_cancel(struct io_cancel_data *c= d, return ret; nr++; } while (1); =20 /* slow path, try all io-wq's */ - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); ret =3D -ENOENT; list_for_each_entry(node, &ctx->tctx_list, ctx_node) { ret =3D io_async_cancel_one(node->task->io_uring, cd); if (ret !=3D -ENOENT) { if (!all) break; nr++; } } - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return all ? nr : ret; } =20 int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags) { @@ -238,11 +239,11 @@ int io_async_cancel(struct io_kiocb *req, unsigned in= t issue_flags) static int __io_sync_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd, int fd) { struct io_ring_ctx *ctx =3D cd->ctx; =20 - /* fixed must be grabbed every time since we drop the uring_lock */ + /* fixed must be grabbed every time since we drop the ctx uring lock */ if ((cd->flags & IORING_ASYNC_CANCEL_FD) && (cd->flags & IORING_ASYNC_CANCEL_FD_FIXED)) { struct io_rsrc_node *node; =20 node =3D io_rsrc_node_lookup(&ctx->file_table.data, fd); @@ -254,12 +255,12 @@ static int __io_sync_cancel(struct io_uring_task *tct= x, } =20 return __io_async_cancel(cd, tctx, 0); } =20 -int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg) - __must_hold(&ctx->uring_lock) +int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg, + struct io_ring_ctx_lock_state *lock_state) { struct io_cancel_data cd =3D { .ctx =3D ctx, .seq =3D atomic_inc_return(&ctx->cancel_seq), }; @@ -267,10 +268,12 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __us= er *arg) struct io_uring_sync_cancel_reg sc; struct file *file =3D NULL; DEFINE_WAIT(wait); int ret, i; =20 + io_ring_ctx_assert_locked(ctx); + if (copy_from_user(&sc, arg, sizeof(sc))) return -EFAULT; if (sc.flags & ~CANCEL_FLAGS) return -EINVAL; for (i =3D 0; i < ARRAY_SIZE(sc.pad); i++) @@ -317,11 +320,11 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __us= er *arg) =20 prepare_to_wait(&ctx->cq_wait, &wait, TASK_INTERRUPTIBLE); =20 ret =3D __io_sync_cancel(current->io_uring, &cd, sc.fd); =20 - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); if (ret !=3D -EALREADY) break; =20 ret =3D io_run_task_work_sig(ctx); if (ret < 0) @@ -329,15 +332,15 @@ int io_sync_cancel(struct io_ring_ctx *ctx, void __us= er *arg) ret =3D schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS); if (!ret) { ret =3D -ETIME; break; } - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); } while (1); =20 finish_wait(&ctx->cq_wait, &wait); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); =20 if (ret =3D=3D -ENOENT || ret > 0) ret =3D 0; out: if (file) @@ -351,11 +354,11 @@ bool 
io_cancel_remove_all(struct io_ring_ctx *ctx, st= ruct io_uring_task *tctx, { struct hlist_node *tmp; struct io_kiocb *req; bool found =3D false; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 hlist_for_each_entry_safe(req, tmp, list, hash_node) { if (!io_match_task_safe(req, tctx, cancel_all)) continue; hlist_del_init(&req->hash_node); @@ -368,24 +371,25 @@ bool io_cancel_remove_all(struct io_ring_ctx *ctx, st= ruct io_uring_task *tctx, =20 int io_cancel_remove(struct io_ring_ctx *ctx, struct io_cancel_data *cd, unsigned int issue_flags, struct hlist_head *list, bool (*cancel)(struct io_kiocb *)) { + struct io_ring_ctx_lock_state lock_state; struct hlist_node *tmp; struct io_kiocb *req; int nr =3D 0; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); hlist_for_each_entry_safe(req, tmp, list, hash_node) { if (!io_cancel_req_match(req, cd)) continue; if (cancel(req)) nr++; if (!(cd->flags & IORING_ASYNC_CANCEL_ALL)) break; } - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return nr ?: -ENOENT; } =20 static bool io_match_linked(struct io_kiocb *head) { @@ -477,37 +481,39 @@ __cold bool io_cancel_ctx_cb(struct io_wq_work *work,= void *data) return req->ctx =3D=3D data; } =20 static __cold bool io_uring_try_cancel_iowq(struct io_ring_ctx *ctx) { + struct io_ring_ctx_lock_state lock_state; struct io_tctx_node *node; enum io_wq_cancel cret; bool ret =3D false; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); list_for_each_entry(node, &ctx->tctx_list, ctx_node) { struct io_uring_task *tctx =3D node->task->io_uring; =20 /* - * io_wq will stay alive while we hold uring_lock, because it's - * killed after ctx nodes, which requires to take the lock. + * io_wq will stay alive while we hold ctx uring lock, because + * it's killed after ctx nodes, which requires to take the lock. 
*/ if (!tctx || !tctx->io_wq) continue; cret =3D io_wq_cancel_cb(tctx->io_wq, io_cancel_ctx_cb, ctx, true); ret |=3D (cret !=3D IO_WQ_CANCEL_NOTFOUND); } - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 return ret; } =20 __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx, struct io_uring_task *tctx, bool cancel_all, bool is_sqpoll_thread) { struct io_task_cancel cancel =3D { .tctx =3D tctx, .all =3D cancel_all, }; + struct io_ring_ctx_lock_state lock_state; enum io_wq_cancel cret; bool ret =3D false; =20 /* set it so io_req_local_work_add() would wake us up */ if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) { @@ -542,17 +548,17 @@ __cold bool io_uring_try_cancel_requests(struct io_ri= ng_ctx *ctx, } =20 if ((ctx->flags & IORING_SETUP_DEFER_TASKRUN) && io_allowed_defer_tw_run(ctx)) ret |=3D io_run_local_work(ctx, INT_MAX, INT_MAX) > 0; - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); ret |=3D io_cancel_defer_files(ctx, tctx, cancel_all); ret |=3D io_poll_remove_all(ctx, tctx, cancel_all); ret |=3D io_waitid_remove_all(ctx, tctx, cancel_all); ret |=3D io_futex_remove_all(ctx, tctx, cancel_all); ret |=3D io_uring_try_cancel_uring_cmd(ctx, tctx, cancel_all); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); ret |=3D io_kill_timeouts(ctx, tctx, cancel_all); if (tctx) ret |=3D io_run_task_work() > 0; else ret |=3D flush_delayed_work(&ctx->fallback_work); diff --git a/io_uring/cancel.h b/io_uring/cancel.h index 6783961ede1b..ce4f6b69218e 100644 --- a/io_uring/cancel.h +++ b/io_uring/cancel.h @@ -2,10 +2,12 @@ #ifndef IORING_CANCEL_H #define IORING_CANCEL_H =20 #include =20 +#include "io_uring.h" + struct io_cancel_data { struct io_ring_ctx *ctx; union { u64 data; struct file *file; @@ -19,11 +21,12 @@ int io_async_cancel_prep(struct io_kiocb *req, const st= ruct io_uring_sqe *sqe); int io_async_cancel(struct io_kiocb *req, unsigned int issue_flags); =20 int io_try_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd, unsigned int issue_flags); =20 -int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg); +int io_sync_cancel(struct io_ring_ctx *ctx, void __user *arg, + struct io_ring_ctx_lock_state *lock_state); bool io_cancel_req_match(struct io_kiocb *req, struct io_cancel_data *cd); bool io_match_task_safe(struct io_kiocb *head, struct io_uring_task *tctx, bool cancel_all); =20 bool io_cancel_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *t= ctx, diff --git a/io_uring/eventfd.c b/io_uring/eventfd.c index 78f8ab7db104..0c615be71edf 100644 --- a/io_uring/eventfd.c +++ b/io_uring/eventfd.c @@ -6,10 +6,11 @@ #include #include #include #include =20 +#include "io_uring.h" #include "io-wq.h" #include "eventfd.h" =20 struct io_ev_fd { struct eventfd_ctx *cq_ev_fd; @@ -118,11 +119,11 @@ int io_eventfd_register(struct io_ring_ctx *ctx, void= __user *arg, struct io_ev_fd *ev_fd; __s32 __user *fds =3D arg; int fd; =20 ev_fd =3D rcu_dereference_protected(ctx->io_ev_fd, - lockdep_is_held(&ctx->uring_lock)); + io_ring_ctx_lock_held(ctx)); if (ev_fd) return -EBUSY; =20 if (copy_from_user(&fd, fds, sizeof(*fds))) return -EFAULT; @@ -154,11 +155,11 @@ int io_eventfd_register(struct io_ring_ctx *ctx, void= __user *arg, int io_eventfd_unregister(struct io_ring_ctx *ctx) { struct io_ev_fd *ev_fd; =20 ev_fd =3D rcu_dereference_protected(ctx->io_ev_fd, - lockdep_is_held(&ctx->uring_lock)); + io_ring_ctx_lock_held(ctx)); if (ev_fd) { ctx->has_evfd =3D false; rcu_assign_pointer(ctx->io_ev_fd, NULL); 
io_eventfd_put(ev_fd); return 0; diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c index a87d4e26eee8..886c06278a9b 100644 --- a/io_uring/fdinfo.c +++ b/io_uring/fdinfo.c @@ -9,10 +9,11 @@ #include =20 #include =20 #include "filetable.h" +#include "io_uring.h" #include "sqpoll.h" #include "fdinfo.h" #include "cancel.h" #include "rsrc.h" #include "opdef.h" @@ -75,11 +76,11 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx *= ctx, struct seq_file *m) if (ctx->flags & IORING_SETUP_SQE128) sq_shift =3D 1; =20 /* * we may get imprecise sqe and cqe info if uring is actively running - * since we get cached_sq_head and cached_cq_tail without uring_lock + * since we get cached_sq_head and cached_cq_tail without ctx uring lock * and sq_tail and cq_head are changed by userspace. But it's ok since * we usually use these info when it is stuck. */ seq_printf(m, "SqMask:\t0x%x\n", sq_mask); seq_printf(m, "SqHead:\t%u\n", sq_head); @@ -249,16 +250,17 @@ static void __io_uring_show_fdinfo(struct io_ring_ctx= *ctx, struct seq_file *m) * anything else to get an extra reference. */ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file) { struct io_ring_ctx *ctx =3D file->private_data; + struct io_ring_ctx_lock_state lock_state; =20 /* * Avoid ABBA deadlock between the seq lock and the io_uring mutex, * since fdinfo case grabs it in the opposite direction of normal use * cases. */ - if (mutex_trylock(&ctx->uring_lock)) { + if (io_ring_ctx_trylock(ctx, &lock_state)) { __io_uring_show_fdinfo(ctx, m); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); } } diff --git a/io_uring/filetable.c b/io_uring/filetable.c index 794ef95df293..40ad4a08dc89 100644 --- a/io_uring/filetable.c +++ b/io_uring/filetable.c @@ -55,14 +55,15 @@ void io_free_file_tables(struct io_ring_ctx *ctx, struc= t io_file_table *table) table->bitmap =3D NULL; } =20 static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *fil= e, u32 slot_index) - __must_hold(&ctx->uring_lock) { struct io_rsrc_node *node; =20 + io_ring_ctx_assert_locked(ctx); + if (io_is_uring_fops(file)) return -EBADF; if (!ctx->file_table.data.nr) return -ENXIO; if (slot_index >=3D ctx->file_table.data.nr) @@ -105,16 +106,17 @@ int __io_fixed_fd_install(struct io_ring_ctx *ctx, st= ruct file *file, * fput() is called correspondingly. 
*/ int io_fixed_fd_install(struct io_kiocb *req, unsigned int issue_flags, struct file *file, unsigned int file_slot) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; int ret; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); ret =3D __io_fixed_fd_install(ctx, file, file_slot); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); =20 if (unlikely(ret < 0)) fput(file); return ret; } diff --git a/io_uring/futex.c b/io_uring/futex.c index 11bfff5a80df..aeda00981c7a 100644 --- a/io_uring/futex.c +++ b/io_uring/futex.c @@ -220,22 +220,23 @@ static void io_futex_wake_fn(struct wake_q_head *wake= _q, struct futex_q *q) =20 int io_futexv_wait(struct io_kiocb *req, unsigned int issue_flags) { struct io_futex *iof =3D io_kiocb_to_cmd(req, struct io_futex); struct io_futexv_data *ifd =3D req->async_data; + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; int ret, woken =3D -1; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); =20 ret =3D futex_wait_multiple_setup(ifd->futexv, iof->futex_nr, &woken); =20 /* * Error case, ret is < 0. Mark the request as failed. */ if (unlikely(ret < 0)) { - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); req_set_fail(req); io_req_set_res(req, ret, 0); io_req_async_data_free(req); return IOU_COMPLETE; } @@ -265,27 +266,28 @@ int io_futexv_wait(struct io_kiocb *req, unsigned int= issue_flags) iof->futexv_unqueued =3D 1; if (woken !=3D -1) io_req_set_res(req, woken, 0); } =20 - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return IOU_ISSUE_SKIP_COMPLETE; } =20 int io_futex_wait(struct io_kiocb *req, unsigned int issue_flags) { struct io_futex *iof =3D io_kiocb_to_cmd(req, struct io_futex); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_futex_data *ifd =3D NULL; int ret; =20 if (!iof->futex_mask) { ret =3D -EINVAL; goto done; } =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); ifd =3D io_cache_alloc(&ctx->futex_cache, GFP_NOWAIT); if (!ifd) { ret =3D -ENOMEM; goto done_unlock; } @@ -299,17 +301,17 @@ int io_futex_wait(struct io_kiocb *req, unsigned int = issue_flags) =20 ret =3D futex_wait_setup(iof->uaddr, iof->futex_val, iof->futex_flags, &ifd->q, NULL, NULL); if (!ret) { hlist_add_head(&req->hash_node, &ctx->futex_list); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); =20 return IOU_ISSUE_SKIP_COMPLETE; } =20 done_unlock: - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); done: if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); io_req_async_data_free(req); diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index ab0af4a38714..237663382a5e 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -234,20 +234,21 @@ static inline bool io_should_terminate_tw(struct io_r= ing_ctx *ctx) static __cold void io_fallback_req_func(struct work_struct *work) { struct io_ring_ctx *ctx =3D container_of(work, struct io_ring_ctx, fallback_work.work); struct llist_node *node =3D llist_del_all(&ctx->fallback_llist); + struct io_ring_ctx_lock_state lock_state; struct io_kiocb *req, *tmp; struct io_tw_state ts =3D {}; =20 percpu_ref_get(&ctx->refs); - 
mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); ts.cancel =3D io_should_terminate_tw(ctx); llist_for_each_entry_safe(req, tmp, node, io_task_work.node) req->io_task_work.func((struct io_tw_req){req}, ts); io_submit_flush_completions(ctx); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); percpu_ref_put(&ctx->refs); } =20 static int io_alloc_hash_table(struct io_hash_table *table, unsigned bits) { @@ -514,11 +515,11 @@ unsigned io_linked_nr(struct io_kiocb *req) =20 static __cold noinline void io_queue_deferred(struct io_ring_ctx *ctx) { bool drain_seen =3D false, first =3D true; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); __io_req_caches_free(ctx); =20 while (!list_empty(&ctx->defer_list)) { struct io_defer_entry *de =3D list_first_entry(&ctx->defer_list, struct io_defer_entry, list); @@ -577,13 +578,15 @@ static void io_cq_unlock_post(struct io_ring_ctx *ctx) spin_unlock(&ctx->completion_lock); io_cqring_wake(ctx); io_commit_cqring_flush(ctx); } =20 -static void __io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying) +static void +__io_cqring_overflow_flush(struct io_ring_ctx *ctx, bool dying, + struct io_ring_ctx_lock_state *lock_state) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 /* don't abort if we're dying, entries must get freed */ if (!dying && __io_cqring_events(ctx) =3D=3D ctx->cq_entries) return; =20 @@ -620,13 +623,13 @@ static void __io_cqring_overflow_flush(struct io_ring= _ctx *ctx, bool dying) * to care for a non-real case. */ if (need_resched()) { ctx->cqe_sentinel =3D ctx->cqe_cached; io_cq_unlock_post(ctx); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); cond_resched(); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); io_cq_lock(ctx); } } =20 if (list_empty(&ctx->cq_overflow_list)) { @@ -634,21 +637,24 @@ static void __io_cqring_overflow_flush(struct io_ring= _ctx *ctx, bool dying) atomic_andnot(IORING_SQ_CQ_OVERFLOW, &ctx->rings->sq_flags); } io_cq_unlock_post(ctx); } =20 -static void io_cqring_overflow_kill(struct io_ring_ctx *ctx) +static void io_cqring_overflow_kill(struct io_ring_ctx *ctx, + struct io_ring_ctx_lock_state *lock_state) { if (ctx->rings) - __io_cqring_overflow_flush(ctx, true); + __io_cqring_overflow_flush(ctx, true, lock_state); } =20 static void io_cqring_do_overflow_flush(struct io_ring_ctx *ctx) { - mutex_lock(&ctx->uring_lock); - __io_cqring_overflow_flush(ctx, false); - mutex_unlock(&ctx->uring_lock); + struct io_ring_ctx_lock_state lock_state; + + io_ring_ctx_lock(ctx, &lock_state); + __io_cqring_overflow_flush(ctx, false, &lock_state); + io_ring_ctx_unlock(ctx, &lock_state); } =20 /* must to be called somewhat shortly after putting a request */ static inline void io_put_task(struct io_kiocb *req) { @@ -883,15 +889,15 @@ bool io_post_aux_cqe(struct io_ring_ctx *ctx, u64 use= r_data, s32 res, u32 cflags return filled; } =20 /* * Must be called from inline task_work so we know a flush will happen lat= er, - * and obviously with ctx->uring_lock held (tw always has that). + * and obviously with ctx uring lock held (tw always has that). 
*/ void io_add_aux_cqe(struct io_ring_ctx *ctx, u64 user_data, s32 res, u32 c= flags) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); lockdep_assert(ctx->lockless_cq); =20 if (!io_fill_cqe_aux(ctx, user_data, res, cflags)) { struct io_cqe cqe =3D io_init_cqe(user_data, res, cflags); =20 @@ -916,11 +922,11 @@ bool io_req_post_cqe(struct io_kiocb *req, s32 res, u= 32 cflags) */ if (!wq_list_empty(&ctx->submit_state.compl_reqs)) __io_submit_flush_completions(ctx); =20 lockdep_assert(!io_wq_current_is_worker()); - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 if (!ctx->lockless_cq) { spin_lock(&ctx->completion_lock); posted =3D io_fill_cqe_aux(ctx, req->cqe.user_data, res, cflags); spin_unlock(&ctx->completion_lock); @@ -940,11 +946,11 @@ bool io_req_post_cqe32(struct io_kiocb *req, struct i= o_uring_cqe cqe[2]) { struct io_ring_ctx *ctx =3D req->ctx; bool posted; =20 lockdep_assert(!io_wq_current_is_worker()); - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 cqe[0].user_data =3D req->cqe.user_data; if (!ctx->lockless_cq) { spin_lock(&ctx->completion_lock); posted =3D io_fill_cqe_aux32(ctx, cqe); @@ -969,11 +975,11 @@ static void io_req_complete_post(struct io_kiocb *req= , unsigned issue_flags) if (WARN_ON_ONCE(!(issue_flags & IO_URING_F_IOWQ))) return; =20 /* * Handle special CQ sync cases via task_work. DEFER_TASKRUN requires - * the submitter task context, IOPOLL protects with uring_lock. + * the submitter task context, IOPOLL protects with ctx uring lock. */ if (ctx->lockless_cq || (req->flags & REQ_F_REISSUE)) { defer_complete: req->io_task_work.func =3D io_req_task_complete; io_req_task_work_add(req); @@ -994,15 +1000,14 @@ static void io_req_complete_post(struct io_kiocb *re= q, unsigned issue_flags) */ req_ref_put(req); } =20 void io_req_defer_failed(struct io_kiocb *req, s32 res) - __must_hold(&ctx->uring_lock) { const struct io_cold_def *def =3D &io_cold_defs[req->opcode]; =20 - lockdep_assert_held(&req->ctx->uring_lock); + io_ring_ctx_assert_locked(req->ctx); =20 req_set_fail(req); io_req_set_res(req, res, io_put_kbuf(req, res, NULL)); if (def->fail) def->fail(req); @@ -1010,20 +1015,21 @@ void io_req_defer_failed(struct io_kiocb *req, s32 = res) } =20 /* * A request might get retired back into the request caches even before op= code * handlers and io_issue_sqe() are done with it, e.g. inline completion pa= th. - * Because of that, io_alloc_req() should be called only under ->uring_lock + * Because of that, io_alloc_req() should be called only under ctx uring l= ock * and with extra caution to not get a request that is still worked on. */ __cold bool __io_alloc_req_refill(struct io_ring_ctx *ctx) - __must_hold(&ctx->uring_lock) { gfp_t gfp =3D GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO; void *reqs[IO_REQ_ALLOC_BATCH]; int ret; =20 + io_ring_ctx_assert_locked(ctx); + ret =3D kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs); =20 /* * Bulk alloc is all-or-nothing. If we fail to get a batch, * retry single alloc to be on the safe side. 
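The hunk above ends inside the comment in __io_alloc_req_refill() describing its all-or-nothing bulk allocation. For readers unfamiliar with that pattern, the sketch below paraphrases how such a fallback is typically written (it mirrors the existing mainline function and is not part of this diff):

	/*
	 * Sketch only: bulk allocation either fills the whole batch or
	 * returns nothing, so on failure retry with a single allocation
	 * before giving up.
	 */
	ret = kmem_cache_alloc_bulk(req_cachep, gfp, ARRAY_SIZE(reqs), reqs);
	if (unlikely(ret <= 0)) {
		reqs[0] = kmem_cache_alloc(req_cachep, gfp);
		if (!reqs[0])
			return false;
		ret = 1;
	}
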
@@ -1080,19 +1086,20 @@ static inline struct io_kiocb *io_req_find_next(str= uct io_kiocb *req) nxt =3D req->link; req->link =3D NULL; return nxt; } =20 -static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw) +static void ctx_flush_and_put(struct io_ring_ctx *ctx, io_tw_token_t tw, + struct io_ring_ctx_lock_state *lock_state) { if (!ctx) return; if (ctx->flags & IORING_SETUP_TASKRUN_FLAG) atomic_andnot(IORING_SQ_TASKRUN, &ctx->rings->sq_flags); =20 io_submit_flush_completions(ctx); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); percpu_ref_put(&ctx->refs); } =20 /* * Run queued task_work, returning the number of entries processed in *cou= nt. @@ -1101,38 +1108,39 @@ static void ctx_flush_and_put(struct io_ring_ctx *c= tx, io_tw_token_t tw) */ struct llist_node *io_handle_tw_list(struct llist_node *node, unsigned int *count, unsigned int max_entries) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D NULL; struct io_tw_state ts =3D { }; =20 do { struct llist_node *next =3D node->next; struct io_kiocb *req =3D container_of(node, struct io_kiocb, io_task_work.node); =20 if (req->ctx !=3D ctx) { - ctx_flush_and_put(ctx, ts); + ctx_flush_and_put(ctx, ts, &lock_state); ctx =3D req->ctx; - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); percpu_ref_get(&ctx->refs); ts.cancel =3D io_should_terminate_tw(ctx); } INDIRECT_CALL_2(req->io_task_work.func, io_poll_task_func, io_req_rw_complete, (struct io_tw_req){req}, ts); node =3D next; (*count)++; if (unlikely(need_resched())) { - ctx_flush_and_put(ctx, ts); + ctx_flush_and_put(ctx, ts, &lock_state); ctx =3D NULL; cond_resched(); } } while (node && *count < max_entries); =20 - ctx_flush_and_put(ctx, ts); + ctx_flush_and_put(ctx, ts, &lock_state); return node; } =20 static __cold void __io_fallback_tw(struct llist_node *node, bool sync) { @@ -1401,16 +1409,17 @@ static inline int io_run_local_work_locked(struct i= o_ring_ctx *ctx, max(IO_LOCAL_TW_DEFAULT_MAX, min_events)); } =20 int io_run_local_work(struct io_ring_ctx *ctx, int min_events, int max_eve= nts) { + struct io_ring_ctx_lock_state lock_state; struct io_tw_state ts =3D {}; int ret; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); ret =3D __io_run_local_work(ctx, ts, min_events, max_events); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); return ret; } =20 static void io_req_task_cancel(struct io_tw_req tw_req, io_tw_token_t tw) { @@ -1465,12 +1474,13 @@ static inline void io_req_put_rsrc_nodes(struct io_= kiocb *req) io_put_rsrc_node(req->ctx, req->buf_node); } =20 static void io_free_batch_list(struct io_ring_ctx *ctx, struct io_wq_work_node *node) - __must_hold(&ctx->uring_lock) { + io_ring_ctx_assert_locked(ctx); + do { struct io_kiocb *req =3D container_of(node, struct io_kiocb, comp_list); =20 if (unlikely(req->flags & IO_REQ_CLEAN_SLOW_FLAGS)) { @@ -1506,15 +1516,16 @@ static void io_free_batch_list(struct io_ring_ctx *= ctx, io_req_add_to_cache(req, ctx); } while (node); } =20 void __io_submit_flush_completions(struct io_ring_ctx *ctx) - __must_hold(&ctx->uring_lock) { struct io_submit_state *state =3D &ctx->submit_state; struct io_wq_work_node *node; =20 + io_ring_ctx_assert_locked(ctx); + __io_cq_lock(ctx); __wq_list_for_each(node, &state->compl_reqs) { struct io_kiocb *req =3D container_of(node, struct io_kiocb, comp_list); =20 @@ -1555,51 +1566,54 @@ static unsigned io_cqring_events(struct io_ring_ctx= *ctx) * We can't just wait for polled events 
to come to us, we have to actively * find and complete them. */ __cold void io_iopoll_try_reap_events(struct io_ring_ctx *ctx) { + struct io_ring_ctx_lock_state lock_state; + if (!(ctx->flags & IORING_SETUP_IOPOLL)) return; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); while (!wq_list_empty(&ctx->iopoll_list)) { /* let it sleep and repeat later if can't complete a request */ if (io_do_iopoll(ctx, true) =3D=3D 0) break; /* * Ensure we allow local-to-the-cpu processing to take place, * in this case we need to ensure that we reap all events. * Also let task_work, etc. to progress by releasing the mutex */ if (need_resched()) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); cond_resched(); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); } } - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) io_move_task_work_from_local(ctx); } =20 -static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_event= s) +static int io_iopoll_check(struct io_ring_ctx *ctx, unsigned int min_event= s, + struct io_ring_ctx_lock_state *lock_state) { unsigned int nr_events =3D 0; unsigned long check_cq; =20 min_events =3D min(min_events, ctx->cq_entries); =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 if (!io_allowed_run_tw(ctx)) return -EEXIST; =20 check_cq =3D READ_ONCE(ctx->check_cq); if (unlikely(check_cq)) { if (check_cq & BIT(IO_CHECK_CQ_OVERFLOW_BIT)) - __io_cqring_overflow_flush(ctx, false); + __io_cqring_overflow_flush(ctx, false, lock_state); /* * Similarly do not spin if we have not informed the user of any * dropped CQE. */ if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT)) @@ -1617,11 +1631,11 @@ static int io_iopoll_check(struct io_ring_ctx *ctx,= unsigned int min_events) int ret =3D 0; =20 /* * If a submit got punted to a workqueue, we can have the * application entering polling for a command before it gets - * issued. That app will hold the uring_lock for the duration + * issued. That app holds the ctx uring lock for the duration * of the poll right here, so we need to take a breather every * now and then to ensure that the issue has a chance to add * the poll to the issued list. Otherwise we can spin here * forever, while the workqueue is stuck trying to acquire the * very same mutex. @@ -1632,13 +1646,13 @@ static int io_iopoll_check(struct io_ring_ctx *ctx,= unsigned int min_events) =20 (void) io_run_local_work_locked(ctx, min_events); =20 if (task_work_pending(current) || wq_list_empty(&ctx->iopoll_list)) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); io_run_task_work(); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); } /* some requests don't go through iopoll_list */ if (tail !=3D ctx->cached_cq_tail || wq_list_empty(&ctx->iopoll_list)) break; @@ -1669,14 +1683,15 @@ void io_req_task_complete(struct io_tw_req tw_req, = io_tw_token_t tw) * find it from a io_do_iopoll() thread before the issuer is done * accessing the kiocb cookie. 
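Several of the loops converted above (the CQ overflow flush earlier and the iopoll reap/check loops here) periodically drop the ring lock so task_work and other lockers can make progress, which is why they now receive the caller's lock state instead of touching ctx->uring_lock directly. A minimal sketch of that "breather" shape, using only the helpers this series introduces (more_work_to_do() and do_one_unit_of_work() are hypothetical placeholders):

	static void example_reap_loop(struct io_ring_ctx *ctx,
				      struct io_ring_ctx_lock_state *lock_state)
	{
		/* caller already holds the ctx uring lock via *lock_state */
		while (more_work_to_do(ctx)) {		/* hypothetical */
			do_one_unit_of_work(ctx);	/* hypothetical */
			if (need_resched()) {
				/* let task_work and other lockers run */
				io_ring_ctx_unlock(ctx, lock_state);
				cond_resched();
				io_ring_ctx_lock(ctx, lock_state);
			}
		}
	}
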
*/ static void io_iopoll_req_issued(struct io_kiocb *req, unsigned int issue_= flags) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; =20 - /* workqueue context doesn't hold uring_lock, grab it now */ - io_ring_submit_lock(ctx, issue_flags); + /* workqueue context doesn't hold ctx uring lock, grab it now */ + io_ring_submit_lock(ctx, issue_flags, &lock_state); =20 /* * Track whether we have multiple files in our lists. This will impact * how we do polling eventually, not spinning if we're on potentially * different devices. @@ -1710,11 +1725,11 @@ static void io_iopoll_req_issued(struct io_kiocb *r= eq, unsigned int issue_flags) */ if ((ctx->flags & IORING_SETUP_SQPOLL) && wq_has_sleeper(&ctx->sq_data->wait)) wake_up(&ctx->sq_data->wait); =20 - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); } } =20 io_req_flags_t io_file_get_flags(struct file *file) { @@ -1728,16 +1743,17 @@ io_req_flags_t io_file_get_flags(struct file *file) res |=3D REQ_F_SUPPORT_NOWAIT; return res; } =20 static __cold void io_drain_req(struct io_kiocb *req) - __must_hold(&ctx->uring_lock) { struct io_ring_ctx *ctx =3D req->ctx; bool drain =3D req->flags & IOSQE_IO_DRAIN; struct io_defer_entry *de; =20 + io_ring_ctx_assert_locked(ctx); + de =3D kmalloc(sizeof(*de), GFP_KERNEL_ACCOUNT); if (!de) { io_req_defer_failed(req, -ENOMEM); return; } @@ -1960,23 +1976,24 @@ void io_wq_submit_work(struct io_wq_work *work) } =20 inline struct file *io_file_get_fixed(struct io_kiocb *req, int fd, unsigned int issue_flags) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_rsrc_node *node; struct file *file =3D NULL; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); node =3D io_rsrc_node_lookup(&ctx->file_table.data, fd); if (node) { node->refs++; req->file_node =3D node; req->flags |=3D io_slot_flags(node); file =3D io_slot_file(node); } - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return file; } =20 struct file *io_file_get_normal(struct io_kiocb *req, int fd) { @@ -2004,12 +2021,13 @@ static int io_req_sqe_copy(struct io_kiocb *req, un= signed int issue_flags) def->sqe_copy(req); return 0; } =20 static void io_queue_async(struct io_kiocb *req, unsigned int issue_flags,= int ret) - __must_hold(&req->ctx->uring_lock) { + io_ring_ctx_assert_locked(req->ctx); + if (ret !=3D -EAGAIN || (req->flags & REQ_F_NOWAIT)) { fail: io_req_defer_failed(req, ret); return; } @@ -2029,16 +2047,17 @@ static void io_queue_async(struct io_kiocb *req, un= signed int issue_flags, int r break; } } =20 static inline void io_queue_sqe(struct io_kiocb *req, unsigned int extra_f= lags) - __must_hold(&req->ctx->uring_lock) { unsigned int issue_flags =3D IO_URING_F_NONBLOCK | IO_URING_F_COMPLETE_DEFER | extra_flags; int ret; =20 + io_ring_ctx_assert_locked(req->ctx); + ret =3D io_issue_sqe(req, issue_flags); =20 /* * We async punt it if the file wasn't marked NOWAIT, or if the file * doesn't support non-blocking read/write attempts @@ -2046,12 +2065,13 @@ static inline void io_queue_sqe(struct io_kiocb *re= q, unsigned int extra_flags) if (unlikely(ret)) io_queue_async(req, issue_flags, ret); } =20 static void io_queue_sqe_fallback(struct io_kiocb *req) - __must_hold(&req->ctx->uring_lock) { + io_ring_ctx_assert_locked(req->ctx); + if (unlikely(req->flags & REQ_F_FAIL)) { /* * We don't submit, fail them all, for that replace hardlinks * with normal links. 
Extra REQ_F_LINK is tolerated. */ @@ -2116,17 +2136,18 @@ static __cold int io_init_fail_req(struct io_kiocb = *req, int err) return err; } =20 static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req, const struct io_uring_sqe *sqe, unsigned int *left) - __must_hold(&ctx->uring_lock) { const struct io_issue_def *def; unsigned int sqe_flags; int personality; u8 opcode; =20 + io_ring_ctx_assert_locked(ctx); + req->ctx =3D ctx; req->opcode =3D opcode =3D READ_ONCE(sqe->opcode); /* same numerical values with corresponding REQ_F_*, safe to copy */ sqe_flags =3D READ_ONCE(sqe->flags); req->flags =3D (__force io_req_flags_t) sqe_flags; @@ -2269,15 +2290,16 @@ static __cold int io_submit_fail_init(const struct = io_uring_sqe *sqe, return 0; } =20 static inline int io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *= req, const struct io_uring_sqe *sqe, unsigned int *left) - __must_hold(&ctx->uring_lock) { struct io_submit_link *link =3D &ctx->submit_state.link; int ret; =20 + io_ring_ctx_assert_locked(ctx); + ret =3D io_init_req(ctx, req, sqe, left); if (unlikely(ret)) return io_submit_fail_init(sqe, req, ret); =20 trace_io_uring_submit_req(req); @@ -2398,16 +2420,17 @@ static bool io_get_sqe(struct io_ring_ctx *ctx, con= st struct io_uring_sqe **sqe) *sqe =3D &ctx->sq_sqes[head]; return true; } =20 int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr) - __must_hold(&ctx->uring_lock) { unsigned int entries =3D io_sqring_entries(ctx); unsigned int left; int ret; =20 + io_ring_ctx_assert_locked(ctx); + entries =3D min(nr, entries); if (unlikely(!entries)) return 0; =20 ret =3D left =3D entries; @@ -2830,28 +2853,33 @@ static __cold void __io_req_caches_free(struct io_r= ing_ctx *ctx) } } =20 static __cold void io_req_caches_free(struct io_ring_ctx *ctx) { - guard(mutex)(&ctx->uring_lock); + struct io_ring_ctx_lock_state lock_state; + + io_ring_ctx_lock(ctx, &lock_state); __io_req_caches_free(ctx); + io_ring_ctx_unlock(ctx, &lock_state); } =20 static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx) { + struct io_ring_ctx_lock_state lock_state; + io_sq_thread_finish(ctx); =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); io_sqe_buffers_unregister(ctx); io_sqe_files_unregister(ctx); io_unregister_zcrx_ifqs(ctx); - io_cqring_overflow_kill(ctx); + io_cqring_overflow_kill(ctx, &lock_state); io_eventfd_unregister(ctx); io_free_alloc_caches(ctx); io_destroy_buffers(ctx); io_free_region(ctx->user, &ctx->param_region); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); if (ctx->sq_creds) put_cred(ctx->sq_creds); =20 WARN_ON_ONCE(!list_empty(&ctx->ltimeout_list)); =20 @@ -2883,14 +2911,15 @@ static __cold void io_ring_ctx_free(struct io_ring_= ctx *ctx) =20 static __cold void io_activate_pollwq_cb(struct callback_head *cb) { struct io_ring_ctx *ctx =3D container_of(cb, struct io_ring_ctx, poll_wq_task_work); + struct io_ring_ctx_lock_state lock_state; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); ctx->poll_activated =3D true; - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 /* * Wake ups for some events between start of polling and activation * might've been lost due to loose synchronisation. 
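One conversion in the hunk above is worth noting: io_req_caches_free() previously used guard(mutex)(&ctx->uring_lock), but a scoped guard bound to the raw mutex would bypass the new wrappers and their lock state, so it becomes an explicit io_ring_ctx_lock()/io_ring_ctx_unlock() pair. If a scoped form were wanted later, it could presumably be built with the generic cleanup.h lock-guard machinery; the following is only a hypothetical sketch, not something this series adds:

	/* Hypothetical: a guard class that carries the lock state
	 * alongside the ctx pointer. */
	DEFINE_LOCK_GUARD_1(io_ring_ctx, struct io_ring_ctx,
			    io_ring_ctx_lock(_T->lock, &_T->state),
			    io_ring_ctx_unlock(_T->lock, &_T->state),
			    struct io_ring_ctx_lock_state state);

	/* usage (hypothetical): guard(io_ring_ctx)(ctx); */
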
*/ @@ -2980,10 +3009,11 @@ static __cold void io_tctx_exit_cb(struct callback_= head *cb) } =20 static __cold void io_ring_exit_work(struct work_struct *work) { struct io_ring_ctx *ctx =3D container_of(work, struct io_ring_ctx, exit_w= ork); + struct io_ring_ctx_lock_state lock_state; unsigned long timeout =3D jiffies + HZ * 60 * 5; unsigned long interval =3D HZ / 20; struct io_tctx_exit exit; struct io_tctx_node *node; int ret; @@ -2994,13 +3024,13 @@ static __cold void io_ring_exit_work(struct work_st= ruct *work) * we're waiting for refs to drop. We need to reap these manually, * as nobody else will be looking for them. */ do { if (test_bit(IO_CHECK_CQ_OVERFLOW_BIT, &ctx->check_cq)) { - mutex_lock(&ctx->uring_lock); - io_cqring_overflow_kill(ctx); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); + io_cqring_overflow_kill(ctx, &lock_state); + io_ring_ctx_unlock(ctx, &lock_state); } =20 if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) io_move_task_work_from_local(ctx); =20 @@ -3041,11 +3071,11 @@ static __cold void io_ring_exit_work(struct work_st= ruct *work) =20 init_completion(&exit.completion); init_task_work(&exit.task_work, io_tctx_exit_cb); exit.ctx =3D ctx; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); while (!list_empty(&ctx->tctx_list)) { WARN_ON_ONCE(time_after(jiffies, timeout)); =20 node =3D list_first_entry(&ctx->tctx_list, struct io_tctx_node, ctx_node); @@ -3053,20 +3083,20 @@ static __cold void io_ring_exit_work(struct work_st= ruct *work) list_rotate_left(&ctx->tctx_list); ret =3D task_work_add(node->task, &exit.task_work, TWA_SIGNAL); if (WARN_ON_ONCE(ret)) continue; =20 - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); /* * See comment above for * wait_for_completion_interruptible_timeout() on why this * wait is marked as interruptible. 
*/ wait_for_completion_interruptible(&exit.completion); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); } - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); spin_lock(&ctx->completion_lock); spin_unlock(&ctx->completion_lock); =20 /* pairs with RCU read section in io_req_local_work_add() */ if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) @@ -3075,18 +3105,19 @@ static __cold void io_ring_exit_work(struct work_st= ruct *work) io_ring_ctx_free(ctx); } =20 static __cold void io_ring_ctx_wait_and_kill(struct io_ring_ctx *ctx) { + struct io_ring_ctx_lock_state lock_state; unsigned long index; struct creds *creds; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); percpu_ref_kill(&ctx->refs); xa_for_each(&ctx->personalities, index, creds) io_unregister_personality(ctx, index); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 flush_delayed_work(&ctx->fallback_work); =20 INIT_WORK(&ctx->exit_work, io_ring_exit_work); /* @@ -3217,10 +3248,11 @@ static int io_get_ext_arg(struct io_ring_ctx *ctx, = unsigned flags, =20 SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit, u32, min_complete, u32, flags, const void __user *, argp, size_t, argsz) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx; struct file *file; long ret; =20 if (unlikely(flags & ~IORING_ENTER_FLAGS)) @@ -3273,14 +3305,14 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u= 32, to_submit, } else if (to_submit) { ret =3D io_uring_add_tctx_node(ctx); if (unlikely(ret)) goto out; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); ret =3D io_submit_sqes(ctx, to_submit); if (ret !=3D to_submit) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); goto out; } if (flags & IORING_ENTER_GETEVENTS) { if (ctx->syscall_iopoll) goto iopoll_locked; @@ -3289,11 +3321,11 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u= 32, to_submit, * it should handle ownership problems if any. */ if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) (void)io_run_local_work_locked(ctx, min_complete); } - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); } =20 if (flags & IORING_ENTER_GETEVENTS) { int ret2; =20 @@ -3302,16 +3334,17 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u= 32, to_submit, * We disallow the app entering submit/complete with * polling, but we still need to lock the ring to * prevent racing with polled issue that got punted to * a workqueue. 
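The exit-work loop above follows a recurring rule: the ring lock is never held across a sleep that waits on another task, because the work that task runs (here the queued exit task_work, elsewhere the SQPOLL thread or io-wq) may itself need the same ring lock. A compact sketch of the unlock-wait-relock shape with the new helpers, where wait_for_thing() stands in for wait_for_completion_interruptible() and similar blocking calls:

	/* lock_state describes a ring lock we currently hold */
	io_ring_ctx_unlock(ctx, &lock_state);
	wait_for_thing();			/* hypothetical blocking wait */
	io_ring_ctx_lock(ctx, &lock_state);
	/* re-validate anything derived from ctx state before the unlock */
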
*/ - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); iopoll_locked: ret2 =3D io_validate_ext_arg(ctx, flags, argp, argsz); if (likely(!ret2)) - ret2 =3D io_iopoll_check(ctx, min_complete); - mutex_unlock(&ctx->uring_lock); + ret2 =3D io_iopoll_check(ctx, min_complete, + &lock_state); + io_ring_ctx_unlock(ctx, &lock_state); } else { struct ext_arg ext_arg =3D { .argsz =3D argsz }; =20 ret2 =3D io_get_ext_arg(ctx, flags, argp, &ext_arg); if (likely(!ret2)) diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index a790c16854d3..57c3eef26a88 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -195,20 +195,64 @@ void io_queue_next(struct io_kiocb *req); void io_task_refs_refill(struct io_uring_task *tctx); bool __io_alloc_req_refill(struct io_ring_ctx *ctx); =20 void io_activate_pollwq(struct io_ring_ctx *ctx); =20 +struct io_ring_ctx_lock_state { +}; + +/* Acquire the ctx uring lock with the given nesting level */ +static inline void io_ring_ctx_lock_nested(struct io_ring_ctx *ctx, + unsigned int subclass, + struct io_ring_ctx_lock_state *state) +{ + mutex_lock_nested(&ctx->uring_lock, subclass); +} + +/* Acquire the ctx uring lock */ +static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx, + struct io_ring_ctx_lock_state *state) +{ + io_ring_ctx_lock_nested(ctx, 0, state); +} + +/* Attempt to acquire the ctx uring lock without blocking */ +static inline bool io_ring_ctx_trylock(struct io_ring_ctx *ctx, + struct io_ring_ctx_lock_state *state) +{ + return mutex_trylock(&ctx->uring_lock); +} + +/* Release the ctx uring lock */ +static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx, + struct io_ring_ctx_lock_state *state) +{ + mutex_unlock(&ctx->uring_lock); +} + +/* Return (if CONFIG_LOCKDEP) whether the ctx uring lock is held */ +static inline bool io_ring_ctx_lock_held(const struct io_ring_ctx *ctx) +{ + return lockdep_is_held(&ctx->uring_lock); +} + +/* Assert (if CONFIG_LOCKDEP) that the ctx uring lock is held */ +static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx) +{ + lockdep_assert_held(&ctx->uring_lock); +} + static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) { #if defined(CONFIG_PROVE_LOCKING) lockdep_assert(in_task()); =20 if (ctx->flags & IORING_SETUP_DEFER_TASKRUN) - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 if (ctx->flags & IORING_SETUP_IOPOLL) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); } else if (!ctx->task_complete) { lockdep_assert_held(&ctx->completion_lock); } else if (ctx->submitter_task) { /* * ->submitter_task may be NULL and we can still post a CQE, @@ -373,30 +417,32 @@ static inline void io_put_file(struct io_kiocb *req) { if (!(req->flags & REQ_F_FIXED_FILE) && req->file) fput(req->file); } =20 -static inline void io_ring_submit_unlock(struct io_ring_ctx *ctx, - unsigned issue_flags) +static inline void +io_ring_submit_unlock(struct io_ring_ctx *ctx, unsigned issue_flags, + struct io_ring_ctx_lock_state *lock_state) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); } =20 -static inline void io_ring_submit_lock(struct io_ring_ctx *ctx, - unsigned issue_flags) +static inline void +io_ring_submit_lock(struct io_ring_ctx *ctx, unsigned issue_flags, + struct io_ring_ctx_lock_state *lock_state) { /* - * "Normal" inline submissions always hold the uring_lock, since we + * 
"Normal" inline submissions always hold the ctx uring lock, since we * grab it from the system call. Same is true for the SQPOLL offload. * The only exception is when we've detached the request and issue it * from an async worker thread, grab the lock for that case. */ if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) - mutex_lock(&ctx->uring_lock); - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); + io_ring_ctx_assert_locked(ctx); } =20 static inline void io_commit_cqring(struct io_ring_ctx *ctx) { /* order cqe stores with ring update */ @@ -504,24 +550,23 @@ static inline bool io_task_work_pending(struct io_rin= g_ctx *ctx) return task_work_pending(current) || io_local_work_pending(ctx); } =20 static inline void io_tw_lock(struct io_ring_ctx *ctx, io_tw_token_t tw) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); } =20 /* * Don't complete immediately but use deferred completion infrastructure. - * Protected by ->uring_lock and can only be used either with + * Protected by ctx uring lock and can only be used either with * IO_URING_F_COMPLETE_DEFER or inside a tw handler holding the mutex. */ static inline void io_req_complete_defer(struct io_kiocb *req) - __must_hold(&req->ctx->uring_lock) { struct io_submit_state *state =3D &req->ctx->submit_state; =20 - lockdep_assert_held(&req->ctx->uring_lock); + io_ring_ctx_assert_locked(req->ctx); =20 wq_list_add_tail(&req->comp_list, &state->compl_reqs); } =20 static inline void io_commit_cqring_flush(struct io_ring_ctx *ctx) diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index 796d131107dd..0fb9b22171d4 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -72,22 +72,22 @@ bool io_kbuf_commit(struct io_kiocb *req, } =20 static inline struct io_buffer_list *io_buffer_get_list(struct io_ring_ctx= *ctx, unsigned int bgid) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 return xa_load(&ctx->io_bl_xa, bgid); } =20 static int io_buffer_add_list(struct io_ring_ctx *ctx, struct io_buffer_list *bl, unsigned int bgid) { /* * Store buffer group ID and finally mark the list as visible. * The normal lookup doesn't care about the visibility as we're - * always under the ->uring_lock, but lookups from mmap do. + * always under the ctx uring lock, but lookups from mmap do. 
*/ bl->bgid =3D bgid; guard(mutex)(&ctx->mmap_lock); return xa_err(xa_store(&ctx->io_bl_xa, bgid, bl, GFP_KERNEL)); } @@ -101,23 +101,24 @@ void io_kbuf_drop_legacy(struct io_kiocb *req) req->kbuf =3D NULL; } =20 bool io_kbuf_recycle_legacy(struct io_kiocb *req, unsigned issue_flags) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_buffer_list *bl; struct io_buffer *buf; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); =20 buf =3D req->kbuf; bl =3D io_buffer_get_list(ctx, buf->bgid); list_add(&buf->list, &bl->buf_list); bl->nbufs++; req->flags &=3D ~REQ_F_BUFFER_SELECTED; =20 - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return true; } =20 static void __user *io_provided_buffer_select(struct io_kiocb *req, size_t= *len, struct io_buffer_list *bl) @@ -210,24 +211,25 @@ static struct io_br_sel io_ring_buffer_select(struct = io_kiocb *req, size_t *len, } =20 struct io_br_sel io_buffer_select(struct io_kiocb *req, size_t *len, unsigned buf_group, unsigned int issue_flags) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_br_sel sel =3D { }; struct io_buffer_list *bl; =20 - io_ring_submit_lock(req->ctx, issue_flags); + io_ring_submit_lock(req->ctx, issue_flags, &lock_state); =20 bl =3D io_buffer_get_list(ctx, buf_group); if (likely(bl)) { if (bl->flags & IOBL_BUF_RING) sel =3D io_ring_buffer_select(req, len, bl, issue_flags); else sel.addr =3D io_provided_buffer_select(req, len, bl); } - io_ring_submit_unlock(req->ctx, issue_flags); + io_ring_submit_unlock(req->ctx, issue_flags, &lock_state); return sel; } =20 /* cap it at a reasonable 256, will be one page even for 4K */ #define PEEK_MAX_IMPORT 256 @@ -315,14 +317,15 @@ static int io_ring_buffers_peek(struct io_kiocb *req,= struct buf_sel_arg *arg, } =20 int io_buffers_select(struct io_kiocb *req, struct buf_sel_arg *arg, struct io_br_sel *sel, unsigned int issue_flags) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; int ret =3D -ENOENT; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); sel->buf_list =3D io_buffer_get_list(ctx, arg->buf_group); if (unlikely(!sel->buf_list)) goto out_unlock; =20 if (sel->buf_list->flags & IOBL_BUF_RING) { @@ -342,11 +345,11 @@ int io_buffers_select(struct io_kiocb *req, struct bu= f_sel_arg *arg, ret =3D io_provided_buffers_select(req, &arg->out_len, sel->buf_list, ar= g->iovs); } out_unlock: if (issue_flags & IO_URING_F_UNLOCKED) { sel->buf_list =3D NULL; - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); } return ret; } =20 int io_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg, @@ -354,11 +357,11 @@ int io_buffers_peek(struct io_kiocb *req, struct buf_= sel_arg *arg, { struct io_ring_ctx *ctx =3D req->ctx; struct io_buffer_list *bl; int ret; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 bl =3D io_buffer_get_list(ctx, arg->buf_group); if (unlikely(!bl)) return -ENOENT; =20 @@ -410,11 +413,11 @@ static int io_remove_buffers_legacy(struct io_ring_ct= x *ctx, { unsigned long i =3D 0; struct io_buffer *nxt; =20 /* protects io_buffers_cache */ - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); WARN_ON_ONCE(bl->flags & IOBL_BUF_RING); =20 for (i =3D 0; i < nbufs && !list_empty(&bl->buf_list); i++) { nxt =3D list_first_entry(&bl->buf_list, struct 
io_buffer, list); list_del(&nxt->list); @@ -579,18 +582,19 @@ static int __io_manage_buffers_legacy(struct io_kiocb= *req, } =20 int io_manage_buffers_legacy(struct io_kiocb *req, unsigned int issue_flag= s) { struct io_provide_buf *p =3D io_kiocb_to_cmd(req, struct io_provide_buf); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_buffer_list *bl; int ret; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); bl =3D io_buffer_get_list(ctx, p->bgid); ret =3D __io_manage_buffers_legacy(req, bl); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); =20 if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); return IOU_COMPLETE; @@ -604,11 +608,11 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, vo= id __user *arg) struct io_uring_buf_ring *br; unsigned long mmap_offset; unsigned long ring_size; int ret; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 if (copy_from_user(®, arg, sizeof(reg))) return -EFAULT; if (!mem_is_zero(reg.resv, sizeof(reg.resv))) return -EINVAL; @@ -680,11 +684,11 @@ int io_register_pbuf_ring(struct io_ring_ctx *ctx, vo= id __user *arg) int io_unregister_pbuf_ring(struct io_ring_ctx *ctx, void __user *arg) { struct io_uring_buf_reg reg; struct io_buffer_list *bl; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 if (copy_from_user(®, arg, sizeof(reg))) return -EFAULT; if (!mem_is_zero(reg.resv, sizeof(reg.resv)) || reg.flags) return -EINVAL; diff --git a/io_uring/memmap.h b/io_uring/memmap.h index a39d9e518905..080285686a05 100644 --- a/io_uring/memmap.h +++ b/io_uring/memmap.h @@ -35,11 +35,11 @@ static inline void io_region_publish(struct io_ring_ctx= *ctx, struct io_mapped_region *src_region, struct io_mapped_region *dst_region) { /* * Once published mmap can find it without holding only the ->mmap_lock - * and not ->uring_lock. + * and not the ctx uring lock. */ guard(mutex)(&ctx->mmap_lock); *dst_region =3D *src_region; } =20 diff --git a/io_uring/msg_ring.c b/io_uring/msg_ring.c index c48588e06bfb..47c7cc56782d 100644 --- a/io_uring/msg_ring.c +++ b/io_uring/msg_ring.c @@ -30,29 +30,31 @@ struct io_msg { u32 cqe_flags; }; u32 flags; }; =20 -static void io_double_unlock_ctx(struct io_ring_ctx *octx) +static void io_double_unlock_ctx(struct io_ring_ctx *octx, + struct io_ring_ctx_lock_state *lock_state) { - mutex_unlock(&octx->uring_lock); + io_ring_ctx_unlock(octx, lock_state); } =20 static int io_lock_external_ctx(struct io_ring_ctx *octx, - unsigned int issue_flags) + unsigned int issue_flags, + struct io_ring_ctx_lock_state *lock_state) { /* * To ensure proper ordering between the two ctxs, we can only * attempt a trylock on the target. If that fails and we already have * the source ctx lock, punt to io-wq. 
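The comment above captures the locking rule for IORING_OP_MSG_RING: with the source ring lock already held, only a trylock on the target ring is safe, since two rings locking each other in opposite orders could deadlock; on trylock failure the request is punted to io-wq, which runs unlocked and may block. A condensed sketch of that decision using the helpers from this series:

	struct io_ring_ctx_lock_state lock_state;

	if (!(issue_flags & IO_URING_F_UNLOCKED)) {
		/* source ring lock held: never block on the target ring */
		if (!io_ring_ctx_trylock(target_ctx, &lock_state))
			return -EAGAIN;	/* caller punts to io-wq */
	} else {
		/* io-wq holds no ring lock, a sleeping lock is fine */
		io_ring_ctx_lock(target_ctx, &lock_state);
	}
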
*/ if (!(issue_flags & IO_URING_F_UNLOCKED)) { - if (!mutex_trylock(&octx->uring_lock)) + if (!io_ring_ctx_trylock(octx, lock_state)) return -EAGAIN; return 0; } - mutex_lock(&octx->uring_lock); + io_ring_ctx_lock(octx, lock_state); return 0; } =20 void io_msg_ring_cleanup(struct io_kiocb *req) { @@ -116,10 +118,11 @@ static int io_msg_data_remote(struct io_ring_ctx *tar= get_ctx, } =20 static int __io_msg_ring_data(struct io_ring_ctx *target_ctx, struct io_msg *msg, unsigned int issue_flags) { + struct io_ring_ctx_lock_state lock_state; u32 flags =3D 0; int ret; =20 if (msg->src_fd || msg->flags & ~IORING_MSG_RING_FLAGS_PASS) return -EINVAL; @@ -134,17 +137,18 @@ static int __io_msg_ring_data(struct io_ring_ctx *tar= get_ctx, if (msg->flags & IORING_MSG_RING_FLAGS_PASS) flags =3D msg->cqe_flags; =20 ret =3D -EOVERFLOW; if (target_ctx->flags & IORING_SETUP_IOPOLL) { - if (unlikely(io_lock_external_ctx(target_ctx, issue_flags))) + if (unlikely(io_lock_external_ctx(target_ctx, issue_flags, + &lock_state))) return -EAGAIN; } if (io_post_aux_cqe(target_ctx, msg->user_data, msg->len, flags)) ret =3D 0; if (target_ctx->flags & IORING_SETUP_IOPOLL) - io_double_unlock_ctx(target_ctx); + io_double_unlock_ctx(target_ctx, &lock_state); return ret; } =20 static int io_msg_ring_data(struct io_kiocb *req, unsigned int issue_flags) { @@ -155,35 +159,38 @@ static int io_msg_ring_data(struct io_kiocb *req, uns= igned int issue_flags) } =20 static int io_msg_grab_file(struct io_kiocb *req, unsigned int issue_flags) { struct io_msg *msg =3D io_kiocb_to_cmd(req, struct io_msg); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_rsrc_node *node; int ret =3D -EBADF; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); node =3D io_rsrc_node_lookup(&ctx->file_table.data, msg->src_fd); if (node) { msg->src_file =3D io_slot_file(node); if (msg->src_file) get_file(msg->src_file); req->flags |=3D REQ_F_NEED_CLEANUP; ret =3D 0; } - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return ret; } =20 static int io_msg_install_complete(struct io_kiocb *req, unsigned int issu= e_flags) { struct io_ring_ctx *target_ctx =3D req->file->private_data; struct io_msg *msg =3D io_kiocb_to_cmd(req, struct io_msg); + struct io_ring_ctx_lock_state lock_state; struct file *src_file =3D msg->src_file; int ret; =20 - if (unlikely(io_lock_external_ctx(target_ctx, issue_flags))) + if (unlikely(io_lock_external_ctx(target_ctx, issue_flags, + &lock_state))) return -EAGAIN; =20 ret =3D __io_fixed_fd_install(target_ctx, src_file, msg->dst_fd); if (ret < 0) goto out_unlock; @@ -200,11 +207,11 @@ static int io_msg_install_complete(struct io_kiocb *r= eq, unsigned int issue_flag * later IORING_OP_MSG_RING delivers the message. 
*/ if (!io_post_aux_cqe(target_ctx, msg->user_data, ret, 0)) ret =3D -EOVERFLOW; out_unlock: - io_double_unlock_ctx(target_ctx); + io_double_unlock_ctx(target_ctx, &lock_state); return ret; } =20 static void io_msg_tw_fd_complete(struct callback_head *head) { diff --git a/io_uring/notif.c b/io_uring/notif.c index f476775ba44b..8099b87af588 100644 --- a/io_uring/notif.c +++ b/io_uring/notif.c @@ -15,11 +15,11 @@ static void io_notif_tw_complete(struct io_tw_req tw_re= q, io_tw_token_t tw) { struct io_kiocb *notif =3D tw_req.req; struct io_notif_data *nd =3D io_notif_to_data(notif); struct io_ring_ctx *ctx =3D notif->ctx; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 do { notif =3D cmd_to_io_kiocb(nd); =20 if (WARN_ON_ONCE(ctx !=3D notif->ctx)) @@ -109,15 +109,16 @@ static const struct ubuf_info_ops io_ubuf_ops =3D { .complete =3D io_tx_ubuf_complete, .link_skb =3D io_link_skb, }; =20 struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx) - __must_hold(&ctx->uring_lock) { struct io_kiocb *notif; struct io_notif_data *nd; =20 + io_ring_ctx_assert_locked(ctx); + if (unlikely(!io_alloc_req(ctx, ¬if))) return NULL; notif->ctx =3D ctx; notif->opcode =3D IORING_OP_NOP; notif->flags =3D 0; diff --git a/io_uring/notif.h b/io_uring/notif.h index f3589cfef4a9..c33c9a1179c9 100644 --- a/io_uring/notif.h +++ b/io_uring/notif.h @@ -31,14 +31,15 @@ static inline struct io_notif_data *io_notif_to_data(st= ruct io_kiocb *notif) { return io_kiocb_to_cmd(notif, struct io_notif_data); } =20 static inline void io_notif_flush(struct io_kiocb *notif) - __must_hold(¬if->ctx->uring_lock) { struct io_notif_data *nd =3D io_notif_to_data(notif); =20 + io_ring_ctx_assert_locked(notif->ctx); + io_tx_ubuf_complete(NULL, &nd->uarg, true); } =20 static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned le= n) { diff --git a/io_uring/openclose.c b/io_uring/openclose.c index bfeb91b31bba..432a7a68eec1 100644 --- a/io_uring/openclose.c +++ b/io_uring/openclose.c @@ -189,15 +189,16 @@ void io_open_cleanup(struct io_kiocb *req) } =20 int __io_close_fixed(struct io_ring_ctx *ctx, unsigned int issue_flags, unsigned int offset) { + struct io_ring_ctx_lock_state lock_state; int ret; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); ret =3D io_fixed_fd_remove(ctx, offset); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); =20 return ret; } =20 static inline int io_close_fixed(struct io_kiocb *req, unsigned int issue_= flags) @@ -333,18 +334,19 @@ int io_pipe_prep(struct io_kiocb *req, const struct i= o_uring_sqe *sqe) =20 static int io_pipe_fixed(struct io_kiocb *req, struct file **files, unsigned int issue_flags) { struct io_pipe *p =3D io_kiocb_to_cmd(req, struct io_pipe); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; int ret, fds[2] =3D { -1, -1 }; int slot =3D p->file_slot; =20 if (p->flags & O_CLOEXEC) return -EINVAL; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); =20 ret =3D __io_fixed_fd_install(ctx, files[0], slot); if (ret < 0) goto err; fds[0] =3D ret; @@ -361,23 +363,23 @@ static int io_pipe_fixed(struct io_kiocb *req, struct= file **files, if (ret < 0) goto err; fds[1] =3D ret; files[1] =3D NULL; =20 - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); =20 if (!copy_to_user(p->fds, fds, sizeof(fds))) return 0; =20 ret =3D -EFAULT; - 
io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); err: if (fds[0] !=3D -1) io_fixed_fd_remove(ctx, fds[0]); if (fds[1] !=3D -1) io_fixed_fd_remove(ctx, fds[1]); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return ret; } =20 static int io_pipe_fd(struct io_kiocb *req, struct file **files) { diff --git a/io_uring/poll.c b/io_uring/poll.c index aac4b3b881fb..9e82315f977b 100644 --- a/io_uring/poll.c +++ b/io_uring/poll.c @@ -121,11 +121,11 @@ static struct io_poll *io_poll_get_single(struct io_k= iocb *req) static void io_poll_req_insert(struct io_kiocb *req) { struct io_hash_table *table =3D &req->ctx->cancel_table; u32 index =3D hash_long(req->cqe.user_data, table->hash_bits); =20 - lockdep_assert_held(&req->ctx->uring_lock); + io_ring_ctx_assert_locked(req->ctx); =20 hlist_add_head(&req->hash_node, &table->hbs[index].list); } =20 static void io_init_poll_iocb(struct io_poll *poll, __poll_t events) @@ -339,11 +339,11 @@ void io_poll_task_func(struct io_tw_req tw_req, io_tw= _token_t tw) } else if (ret =3D=3D IOU_POLL_REQUEUE) { __io_poll_execute(req, 0); return; } io_poll_remove_entries(req); - /* task_work always has ->uring_lock held */ + /* task_work always holds ctx uring lock */ hash_del(&req->hash_node); =20 if (req->opcode =3D=3D IORING_OP_POLL_ADD) { if (ret =3D=3D IOU_POLL_DONE) { struct io_poll *poll; @@ -525,15 +525,16 @@ static bool io_poll_can_finish_inline(struct io_kiocb= *req, return pt->owning || io_poll_get_ownership(req); } =20 static void io_poll_add_hash(struct io_kiocb *req, unsigned int issue_flag= s) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); io_poll_req_insert(req); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); } =20 /* * Returns 0 when it's handed over for polling. The caller owns the reques= ts if * it returns non-zero, but otherwise should not touch it. 
Negative values @@ -728,11 +729,11 @@ __cold bool io_poll_remove_all(struct io_ring_ctx *ct= x, struct io_uring_task *tc struct hlist_node *tmp; struct io_kiocb *req; bool found =3D false; int i; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 for (i =3D 0; i < nr_buckets; i++) { struct io_hash_bucket *hb =3D &ctx->cancel_table.hbs[i]; =20 hlist_for_each_entry_safe(req, tmp, &hb->list, hash_node) { @@ -814,15 +815,16 @@ static int __io_poll_cancel(struct io_ring_ctx *ctx, = struct io_cancel_data *cd) } =20 int io_poll_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd, unsigned issue_flags) { + struct io_ring_ctx_lock_state lock_state; int ret; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); ret =3D __io_poll_cancel(ctx, cd); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return ret; } =20 static __poll_t io_poll_parse_events(const struct io_uring_sqe *sqe, unsigned int flags) @@ -905,16 +907,17 @@ int io_poll_add(struct io_kiocb *req, unsigned int is= sue_flags) } =20 int io_poll_remove(struct io_kiocb *req, unsigned int issue_flags) { struct io_poll_update *poll_update =3D io_kiocb_to_cmd(req, struct io_pol= l_update); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_cancel_data cd =3D { .ctx =3D ctx, .data =3D poll_update->old_u= ser_data, }; struct io_kiocb *preq; int ret2, ret =3D 0; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); preq =3D io_poll_find(ctx, true, &cd); ret2 =3D io_poll_disarm(preq); if (ret2) { ret =3D ret2; goto out; @@ -950,11 +953,11 @@ int io_poll_remove(struct io_kiocb *req, unsigned int= issue_flags) if (preq->cqe.res < 0) req_set_fail(preq); preq->io_task_work.func =3D io_req_task_complete; io_req_task_work_add(preq); out: - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); if (ret < 0) { req_set_fail(req); return ret; } /* complete update request, we're done with it */ diff --git a/io_uring/register.c b/io_uring/register.c index 9e473c244041..da5030bcae2f 100644 --- a/io_uring/register.c +++ b/io_uring/register.c @@ -197,28 +197,30 @@ static int io_register_enable_rings(struct io_ring_ct= x *ctx) if (ctx->sq_data && wq_has_sleeper(&ctx->sq_data->wait)) wake_up(&ctx->sq_data->wait); return 0; } =20 -static __cold int __io_register_iowq_aff(struct io_ring_ctx *ctx, - cpumask_var_t new_mask) +static __cold int +__io_register_iowq_aff(struct io_ring_ctx *ctx, cpumask_var_t new_mask, + struct io_ring_ctx_lock_state *lock_state) { int ret; =20 if (!(ctx->flags & IORING_SETUP_SQPOLL)) { ret =3D io_wq_cpu_affinity(current->io_uring, new_mask); } else { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); ret =3D io_sqpoll_wq_cpu_affinity(ctx, new_mask); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); } =20 return ret; } =20 -static __cold int io_register_iowq_aff(struct io_ring_ctx *ctx, - void __user *arg, unsigned len) +static __cold int +io_register_iowq_aff(struct io_ring_ctx *ctx, void __user *arg, unsigned l= en, + struct io_ring_ctx_lock_state *lock_state) { cpumask_var_t new_mask; int ret; =20 if (!alloc_cpumask_var(&new_mask, GFP_KERNEL)) @@ -240,30 +242,34 @@ static __cold int io_register_iowq_aff(struct io_ring= _ctx *ctx, if (ret) { free_cpumask_var(new_mask); return -EFAULT; } =20 - ret =3D __io_register_iowq_aff(ctx, new_mask); + ret 
=3D __io_register_iowq_aff(ctx, new_mask, lock_state); free_cpumask_var(new_mask); return ret; } =20 -static __cold int io_unregister_iowq_aff(struct io_ring_ctx *ctx) +static __cold int +io_unregister_iowq_aff(struct io_ring_ctx *ctx, + struct io_ring_ctx_lock_state *lock_state) { - return __io_register_iowq_aff(ctx, NULL); + return __io_register_iowq_aff(ctx, NULL, lock_state); } =20 -static __cold int io_register_iowq_max_workers(struct io_ring_ctx *ctx, - void __user *arg) - __must_hold(&ctx->uring_lock) +static __cold int +io_register_iowq_max_workers(struct io_ring_ctx *ctx, void __user *arg, + struct io_ring_ctx_lock_state *lock_state) { struct io_tctx_node *node; struct io_uring_task *tctx =3D NULL; struct io_sq_data *sqd =3D NULL; __u32 new_count[2]; int i, ret; =20 + io_ring_ctx_assert_locked(ctx); + if (copy_from_user(new_count, arg, sizeof(new_count))) return -EFAULT; for (i =3D 0; i < ARRAY_SIZE(new_count); i++) if (new_count[i] > INT_MAX) return -EINVAL; @@ -272,18 +278,18 @@ static __cold int io_register_iowq_max_workers(struct= io_ring_ctx *ctx, sqd =3D ctx->sq_data; if (sqd) { struct task_struct *tsk; =20 /* - * Observe the correct sqd->lock -> ctx->uring_lock - * ordering. Fine to drop uring_lock here, we hold + * Observe the correct sqd->lock -> ctx uring lock + * ordering. Fine to drop ctx uring lock here, we hold * a ref to the ctx. */ refcount_inc(&sqd->refs); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); mutex_lock(&sqd->lock); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); tsk =3D sqpoll_task_locked(sqd); if (tsk) tctx =3D tsk->io_uring; } } else { @@ -304,14 +310,14 @@ static __cold int io_register_iowq_max_workers(struct= io_ring_ctx *ctx, } else { memset(new_count, 0, sizeof(new_count)); } =20 if (sqd) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); } =20 if (copy_to_user(arg, new_count, sizeof(new_count))) return -EFAULT; =20 @@ -331,14 +337,14 @@ static __cold int io_register_iowq_max_workers(struct= io_ring_ctx *ctx, (void)io_wq_max_workers(tctx->io_wq, new_count); } return 0; err: if (sqd) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); mutex_unlock(&sqd->lock); io_put_sq_data(sqd); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); } return ret; } =20 static int io_register_clock(struct io_ring_ctx *ctx, @@ -394,11 +400,12 @@ static void io_register_free_rings(struct io_ring_ctx= *ctx, #define RESIZE_FLAGS (IORING_SETUP_CQSIZE | IORING_SETUP_CLAMP) #define COPY_FLAGS (IORING_SETUP_NO_SQARRAY | IORING_SETUP_SQE128 | \ IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP | \ IORING_SETUP_CQE_MIXED | IORING_SETUP_SQE_MIXED) =20 -static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *= arg) +static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *= arg, + struct io_ring_ctx_lock_state *lock_state) { struct io_ctx_config config; struct io_uring_region_desc rd; struct io_ring_ctx_rings o =3D { }, n =3D { }, *to_free =3D NULL; unsigned i, tail, old_head; @@ -468,13 +475,13 @@ static int io_register_resize_rings(struct io_ring_ct= x *ctx, void __user *arg) =20 /* * If using SQPOLL, park the thread */ if (ctx->sq_data) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); io_sq_thread_park(ctx->sq_data); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); } =20 /* * We'll 
do the swap. Grab the ctx->mmap_lock, which will exclude * any new mmap's on the ring fd. Clear out existing mappings to prevent @@ -605,13 +612,12 @@ static int io_register_mem_region(struct io_ring_ctx = *ctx, void __user *uarg) io_region_publish(ctx, ®ion, &ctx->param_region); return 0; } =20 static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, - void __user *arg, unsigned nr_args) - __releases(ctx->uring_lock) - __acquires(ctx->uring_lock) + void __user *arg, unsigned nr_args, + struct io_ring_ctx_lock_state *lock_state) { int ret; =20 /* * We don't quiesce the refs for register anymore and so it can't be @@ -718,26 +724,26 @@ static int __io_uring_register(struct io_ring_ctx *ct= x, unsigned opcode, break; case IORING_REGISTER_IOWQ_AFF: ret =3D -EINVAL; if (!arg || !nr_args) break; - ret =3D io_register_iowq_aff(ctx, arg, nr_args); + ret =3D io_register_iowq_aff(ctx, arg, nr_args, lock_state); break; case IORING_UNREGISTER_IOWQ_AFF: ret =3D -EINVAL; if (arg || nr_args) break; - ret =3D io_unregister_iowq_aff(ctx); + ret =3D io_unregister_iowq_aff(ctx, lock_state); break; case IORING_REGISTER_IOWQ_MAX_WORKERS: ret =3D -EINVAL; if (!arg || nr_args !=3D 2) break; - ret =3D io_register_iowq_max_workers(ctx, arg); + ret =3D io_register_iowq_max_workers(ctx, arg, lock_state); break; case IORING_REGISTER_RING_FDS: - ret =3D io_ringfd_register(ctx, arg, nr_args); + ret =3D io_ringfd_register(ctx, arg, nr_args, lock_state); break; case IORING_UNREGISTER_RING_FDS: ret =3D io_ringfd_unregister(ctx, arg, nr_args); break; case IORING_REGISTER_PBUF_RING: @@ -754,11 +760,11 @@ static int __io_uring_register(struct io_ring_ctx *ct= x, unsigned opcode, break; case IORING_REGISTER_SYNC_CANCEL: ret =3D -EINVAL; if (!arg || nr_args !=3D 1) break; - ret =3D io_sync_cancel(ctx, arg); + ret =3D io_sync_cancel(ctx, arg, lock_state); break; case IORING_REGISTER_FILE_ALLOC_RANGE: ret =3D -EINVAL; if (!arg || nr_args) break; @@ -790,11 +796,11 @@ static int __io_uring_register(struct io_ring_ctx *ct= x, unsigned opcode, break; case IORING_REGISTER_CLONE_BUFFERS: ret =3D -EINVAL; if (!arg || nr_args !=3D 1) break; - ret =3D io_register_clone_buffers(ctx, arg); + ret =3D io_register_clone_buffers(ctx, arg, lock_state); break; case IORING_REGISTER_ZCRX_IFQ: ret =3D -EINVAL; if (!arg || nr_args !=3D 1) break; @@ -802,11 +808,11 @@ static int __io_uring_register(struct io_ring_ctx *ct= x, unsigned opcode, break; case IORING_REGISTER_RESIZE_RINGS: ret =3D -EINVAL; if (!arg || nr_args !=3D 1) break; - ret =3D io_register_resize_rings(ctx, arg); + ret =3D io_register_resize_rings(ctx, arg, lock_state); break; case IORING_REGISTER_MEM_REGION: ret =3D -EINVAL; if (!arg || nr_args !=3D 1) break; @@ -894,10 +900,11 @@ static int io_uring_register_blind(unsigned int opcod= e, void __user *arg, } =20 SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, void __user *, arg, unsigned int, nr_args) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx; long ret =3D -EBADF; struct file *file; bool use_registered_ring; =20 @@ -913,15 +920,15 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, = unsigned int, opcode, file =3D io_uring_register_get_file(fd, use_registered_ring); if (IS_ERR(file)) return PTR_ERR(file); ctx =3D file->private_data; =20 - mutex_lock(&ctx->uring_lock); - ret =3D __io_uring_register(ctx, opcode, arg, nr_args); + io_ring_ctx_lock(ctx, &lock_state); + ret =3D __io_uring_register(ctx, opcode, arg, nr_args, &lock_state); =20 
trace_io_uring_register(ctx, opcode, ctx->file_table.data.nr, ctx->buf_table.nr, ret); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 fput(file); return ret; } diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index 41c89f5c616d..19ccfb1ee612 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -349,11 +349,11 @@ static int __io_register_rsrc_update(struct io_ring_c= tx *ctx, unsigned type, struct io_uring_rsrc_update2 *up, unsigned nr_args) { __u32 tmp; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 if (check_add_overflow(up->offset, nr_args, &tmp)) return -EOVERFLOW; =20 switch (type) { @@ -497,14 +497,16 @@ int io_files_update(struct io_kiocb *req, unsigned in= t issue_flags) up2.resv2 =3D 0; =20 if (up->offset =3D=3D IORING_FILE_INDEX_ALLOC) { ret =3D io_files_update_with_index_alloc(req, issue_flags); } else { - io_ring_submit_lock(ctx, issue_flags); + struct io_ring_ctx_lock_state lock_state; + + io_ring_submit_lock(ctx, issue_flags, &lock_state); ret =3D __io_register_rsrc_update(ctx, IORING_RSRC_FILE, &up2, up->nr_args); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); } =20 if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); @@ -940,18 +942,19 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd,= struct request *rq, void (*release)(void *), unsigned int index, unsigned int issue_flags) { struct io_ring_ctx *ctx =3D cmd_to_io_kiocb(cmd)->ctx; struct io_rsrc_data *data =3D &ctx->buf_table; + struct io_ring_ctx_lock_state lock_state; struct req_iterator rq_iter; struct io_mapped_ubuf *imu; struct io_rsrc_node *node; struct bio_vec bv; unsigned int nr_bvecs =3D 0; int ret =3D 0; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); if (index >=3D data->nr) { ret =3D -EINVAL; goto unlock; } index =3D array_index_nospec(index, data->nr); @@ -993,24 +996,25 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd,= struct request *rq, imu->nr_bvecs =3D nr_bvecs; =20 node->buf =3D imu; data->nodes[index] =3D node; unlock: - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return ret; } EXPORT_SYMBOL_GPL(io_buffer_register_bvec); =20 int io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index, unsigned int issue_flags) { struct io_ring_ctx *ctx =3D cmd_to_io_kiocb(cmd)->ctx; struct io_rsrc_data *data =3D &ctx->buf_table; + struct io_ring_ctx_lock_state lock_state; struct io_rsrc_node *node; int ret =3D 0; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); if (index >=3D data->nr) { ret =3D -EINVAL; goto unlock; } index =3D array_index_nospec(index, data->nr); @@ -1026,11 +1030,11 @@ int io_buffer_unregister_bvec(struct io_uring_cmd *= cmd, unsigned int index, } =20 io_put_rsrc_node(ctx, node); data->nodes[index] =3D NULL; unlock: - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return ret; } EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec); =20 static int validate_fixed_range(u64 buf_addr, size_t len, @@ -1118,27 +1122,28 @@ static int io_import_fixed(int ddir, struct iov_ite= r *iter, } =20 inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req, unsigned issue_flags) { + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; struct io_rsrc_node *node; =20 if (req->flags & REQ_F_BUF_NODE) return req->buf_node; req->flags |=3D 
REQ_F_BUF_NODE; =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); node =3D io_rsrc_node_lookup(&ctx->buf_table, req->buf_index); if (node) { node->refs++; req->buf_node =3D node; - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return node; } req->flags &=3D ~REQ_F_BUF_NODE; - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return NULL; } =20 int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter, u64 buf_addr, size_t len, int ddir, @@ -1151,28 +1156,32 @@ int io_import_reg_buf(struct io_kiocb *req, struct = iov_iter *iter, return -EFAULT; return io_import_fixed(ddir, iter, node->buf, buf_addr, len); } =20 /* Lock two rings at once. The rings must be different! */ -static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *c= tx2) +static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *c= tx2, + struct io_ring_ctx_lock_state *lock_state1, + struct io_ring_ctx_lock_state *lock_state2) { - if (ctx1 > ctx2) + if (ctx1 > ctx2) { swap(ctx1, ctx2); - mutex_lock(&ctx1->uring_lock); - mutex_lock_nested(&ctx2->uring_lock, SINGLE_DEPTH_NESTING); + swap(lock_state1, lock_state2); + } + io_ring_ctx_lock(ctx1, lock_state1); + io_ring_ctx_lock_nested(ctx2, SINGLE_DEPTH_NESTING, lock_state2); } =20 /* Both rings are locked by the caller. */ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *s= rc_ctx, struct io_uring_clone_buffers *arg) { struct io_rsrc_data data; int i, ret, off, nr; unsigned int nbufs; =20 - lockdep_assert_held(&ctx->uring_lock); - lockdep_assert_held(&src_ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); + io_ring_ctx_assert_locked(src_ctx); =20 /* * Accounting state is shared between the two rings; that only works if * both rings are accounted towards the same counters. */ @@ -1272,12 +1281,14 @@ static int io_clone_buffers(struct io_ring_ctx *ctx= , struct io_ring_ctx *src_ctx * is given in the src_fd to the current ring. This is identical to regist= ering * the buffers with ctx, except faster as mappings already exist. * * Since the memory is already accounted once, don't account it again. 
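lock_two_rings() above encodes the standard discipline for taking two locks of the same class: order the acquisitions by a stable key (here the ctx pointer value) so every path nests them the same way, and annotate the inner acquisition for lockdep. The same idea stripped of io_uring specifics, as an illustration only:

	/* Order same-class locks by pointer value so that concurrent
	 * A/B and B/A callers cannot deadlock; tell lockdep the second
	 * acquisition is an intentional nesting. */
	static void lock_pair(struct mutex *a, struct mutex *b)
	{
		if (a > b)
			swap(a, b);
		mutex_lock(a);
		mutex_lock_nested(b, SINGLE_DEPTH_NESTING);
	}
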
*/ -int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg) +int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg, + struct io_ring_ctx_lock_state *lock_state) { + struct io_ring_ctx_lock_state lock_state2; struct io_uring_clone_buffers buf; struct io_ring_ctx *src_ctx; bool registered_src; struct file *file; int ret; @@ -1296,12 +1307,12 @@ int io_register_clone_buffers(struct io_ring_ctx *c= tx, void __user *arg) if (IS_ERR(file)) return PTR_ERR(file); =20 src_ctx =3D file->private_data; if (src_ctx !=3D ctx) { - mutex_unlock(&ctx->uring_lock); - lock_two_rings(ctx, src_ctx); + io_ring_ctx_unlock(ctx, lock_state); + lock_two_rings(ctx, src_ctx, lock_state, &lock_state2); =20 if (src_ctx->submitter_task && src_ctx->submitter_task !=3D current) { ret =3D -EEXIST; goto out; @@ -1310,11 +1321,11 @@ int io_register_clone_buffers(struct io_ring_ctx *c= tx, void __user *arg) =20 ret =3D io_clone_buffers(ctx, src_ctx, &buf); =20 out: if (src_ctx !=3D ctx) - mutex_unlock(&src_ctx->uring_lock); + io_ring_ctx_unlock(src_ctx, &lock_state2); =20 fput(file); return ret; } =20 diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index d603f6a47f5e..388a0508ec59 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -2,10 +2,11 @@ #ifndef IOU_RSRC_H #define IOU_RSRC_H =20 #include #include +#include "io_uring.h" =20 #define IO_VEC_CACHE_SOFT_CAP 256 =20 enum { IORING_RSRC_FILE =3D 0, @@ -68,11 +69,12 @@ int io_import_reg_vec(int ddir, struct iov_iter *iter, struct io_kiocb *req, struct iou_vec *vec, unsigned nr_iovs, unsigned issue_flags); int io_prep_reg_iovec(struct io_kiocb *req, struct iou_vec *iv, const struct iovec __user *uvec, size_t uvec_segs); =20 -int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg); +int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg, + struct io_ring_ctx_lock_state *lock_state); int io_sqe_buffers_unregister(struct io_ring_ctx *ctx); int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg, unsigned int nr_args, u64 __user *tags); int io_sqe_files_unregister(struct io_ring_ctx *ctx); int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg, @@ -97,11 +99,11 @@ static inline struct io_rsrc_node *io_rsrc_node_lookup(= struct io_rsrc_data *data return NULL; } =20 static inline void io_put_rsrc_node(struct io_ring_ctx *ctx, struct io_rsr= c_node *node) { - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); if (!--node->refs) io_free_rsrc_node(ctx, node); } =20 static inline bool io_reset_rsrc_node(struct io_ring_ctx *ctx, diff --git a/io_uring/rw.c b/io_uring/rw.c index 331af6bf4234..4688b210cff8 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -462,11 +462,11 @@ int io_read_mshot_prep(struct io_kiocb *req, const st= ruct io_uring_sqe *sqe) =20 void io_readv_writev_cleanup(struct io_kiocb *req) { struct io_async_rw *rw =3D req->async_data; =20 - lockdep_assert_held(&req->ctx->uring_lock); + io_ring_ctx_assert_locked(req->ctx); io_vec_free(&rw->vec); io_rw_recycle(req, 0); } =20 static inline loff_t *io_kiocb_update_pos(struct io_kiocb *req) diff --git a/io_uring/splice.c b/io_uring/splice.c index e81ebbb91925..567695c39091 100644 --- a/io_uring/splice.c +++ b/io_uring/splice.c @@ -58,26 +58,27 @@ void io_splice_cleanup(struct io_kiocb *req) =20 static struct file *io_splice_get_file(struct io_kiocb *req, unsigned int issue_flags) { struct io_splice *sp =3D io_kiocb_to_cmd(req, struct io_splice); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx 
=3D req->ctx; struct io_rsrc_node *node; struct file *file =3D NULL; =20 if (!(sp->flags & SPLICE_F_FD_IN_FIXED)) return io_file_get_normal(req, sp->splice_fd_in); =20 - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); node =3D io_rsrc_node_lookup(&ctx->file_table.data, sp->splice_fd_in); if (node) { node->refs++; sp->rsrc_node =3D node; file =3D io_slot_file(node); req->flags |=3D REQ_F_NEED_CLEANUP; } - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return file; } =20 int io_tee(struct io_kiocb *req, unsigned int issue_flags) { diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c index 74c1a130cd87..0b4573b53cf3 100644 --- a/io_uring/sqpoll.c +++ b/io_uring/sqpoll.c @@ -211,29 +211,30 @@ static int __io_sq_thread(struct io_ring_ctx *ctx, st= ruct io_sq_data *sqd, /* if we're handling multiple rings, cap submit size for fairness */ if (cap_entries && to_submit > IORING_SQPOLL_CAP_ENTRIES_VALUE) to_submit =3D IORING_SQPOLL_CAP_ENTRIES_VALUE; =20 if (to_submit || !wq_list_empty(&ctx->iopoll_list)) { + struct io_ring_ctx_lock_state lock_state; const struct cred *creds =3D NULL; =20 io_sq_start_worktime(ist); =20 if (ctx->sq_creds !=3D current_cred()) creds =3D override_creds(ctx->sq_creds); =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); if (!wq_list_empty(&ctx->iopoll_list)) io_do_iopoll(ctx, true); =20 /* * Don't submit if refs are dying, good for io_uring_register(), * but also it is relied upon by io_ring_exit_work() */ if (to_submit && likely(!percpu_ref_is_dying(&ctx->refs)) && !(ctx->flags & IORING_SETUP_R_DISABLED)) ret =3D io_submit_sqes(ctx, to_submit); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 if (to_submit && wq_has_sleeper(&ctx->sqo_sq_wait)) wake_up(&ctx->sqo_sq_wait); if (creds) revert_creds(creds); diff --git a/io_uring/tctx.c b/io_uring/tctx.c index 5b66755579c0..add6134e934d 100644 --- a/io_uring/tctx.c +++ b/io_uring/tctx.c @@ -13,27 +13,28 @@ #include "tctx.h" =20 static struct io_wq *io_init_wq_offload(struct io_ring_ctx *ctx, struct task_struct *task) { + struct io_ring_ctx_lock_state lock_state; struct io_wq_hash *hash; struct io_wq_data data; unsigned int concurrency; =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); hash =3D ctx->hash_map; if (!hash) { hash =3D kzalloc(sizeof(*hash), GFP_KERNEL); if (!hash) { - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); return ERR_PTR(-ENOMEM); } refcount_set(&hash->refs, 1); init_waitqueue_head(&hash->wait); ctx->hash_map =3D hash; } - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); =20 data.hash =3D hash; data.task =3D task; =20 /* Do QD, or 4 * CPUS, whatever is smallest */ @@ -121,10 +122,12 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx) if (ret) return ret; } } if (!xa_load(&tctx->xa, (unsigned long)ctx)) { + struct io_ring_ctx_lock_state lock_state; + node =3D kmalloc(sizeof(*node), GFP_KERNEL); if (!node) return -ENOMEM; node->ctx =3D ctx; node->task =3D current; @@ -134,13 +137,13 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx) if (ret) { kfree(node); return ret; } =20 - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, &lock_state); list_add(&node->ctx_node, &ctx->tctx_list); - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, &lock_state); } return 0; } =20 int __io_uring_add_tctx_node_from_submit(struct io_ring_ctx *ctx) @@ -163,10 +166,11 @@ int 
__io_uring_add_tctx_node_from_submit(struct io_ri= ng_ctx *ctx) * Remove this io_uring_file -> task mapping. */ __cold void io_uring_del_tctx_node(unsigned long index) { struct io_uring_task *tctx =3D current->io_uring; + struct io_ring_ctx_lock_state lock_state; struct io_tctx_node *node; =20 if (!tctx) return; node =3D xa_erase(&tctx->xa, index); @@ -174,13 +178,13 @@ __cold void io_uring_del_tctx_node(unsigned long inde= x) return; =20 WARN_ON_ONCE(current !=3D node->task); WARN_ON_ONCE(list_empty(&node->ctx_node)); =20 - mutex_lock(&node->ctx->uring_lock); + io_ring_ctx_lock(node->ctx, &lock_state); list_del(&node->ctx_node); - mutex_unlock(&node->ctx->uring_lock); + io_ring_ctx_unlock(node->ctx, &lock_state); =20 if (tctx->last =3D=3D node->ctx) tctx->last =3D NULL; kfree(node); } @@ -196,11 +200,11 @@ __cold void io_uring_clean_tctx(struct io_uring_task = *tctx) cond_resched(); } if (wq) { /* * Must be after io_uring_del_tctx_node() (removes nodes under - * uring_lock) to avoid race with io_uring_try_cancel_iowq(). + * ctx uring lock) to avoid race with io_uring_try_cancel_iowq() */ io_wq_put_and_exit(wq); tctx->io_wq =3D NULL; } } @@ -259,23 +263,24 @@ static int io_ring_add_registered_fd(struct io_uring_= task *tctx, int fd, * index. If no index is desired, application may set ->offset =3D=3D -1U * and we'll find an available index. Returns number of entries * successfully processed, or < 0 on error if none were processed. */ int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg, - unsigned nr_args) + unsigned nr_args, + struct io_ring_ctx_lock_state *lock_state) { struct io_uring_rsrc_update __user *arg =3D __arg; struct io_uring_rsrc_update reg; struct io_uring_task *tctx; int ret, i; =20 if (!nr_args || nr_args > IO_RINGFD_REG_MAX) return -EINVAL; =20 - mutex_unlock(&ctx->uring_lock); + io_ring_ctx_unlock(ctx, lock_state); ret =3D __io_uring_add_tctx_node(ctx); - mutex_lock(&ctx->uring_lock); + io_ring_ctx_lock(ctx, lock_state); if (ret) return ret; =20 tctx =3D current->io_uring; for (i =3D 0; i < nr_args; i++) { diff --git a/io_uring/tctx.h b/io_uring/tctx.h index 608e96de70a2..f35dbf19bb80 100644 --- a/io_uring/tctx.h +++ b/io_uring/tctx.h @@ -1,7 +1,9 @@ // SPDX-License-Identifier: GPL-2.0 =20 +#include "io_uring.h" + struct io_tctx_node { struct list_head ctx_node; struct task_struct *task; struct io_ring_ctx *ctx; }; @@ -13,11 +15,12 @@ int __io_uring_add_tctx_node(struct io_ring_ctx *ctx); int __io_uring_add_tctx_node_from_submit(struct io_ring_ctx *ctx); void io_uring_clean_tctx(struct io_uring_task *tctx); =20 void io_uring_unreg_ringfd(void); int io_ringfd_register(struct io_ring_ctx *ctx, void __user *__arg, - unsigned nr_args); + unsigned nr_args, + struct io_ring_ctx_lock_state *lock_state); int io_ringfd_unregister(struct io_ring_ctx *ctx, void __user *__arg, unsigned nr_args); =20 /* * Note that this task has used io_uring. We use it for cancelation purpos= es. 
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 197474911f04..a8a128a3f0a2 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -51,11 +51,11 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *= ctx, { struct hlist_node *tmp; struct io_kiocb *req; bool ret =3D false; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 hlist_for_each_entry_safe(req, tmp, &ctx->cancelable_uring_cmd, hash_node) { struct io_uring_cmd *cmd =3D io_kiocb_to_cmd(req, struct io_uring_cmd); @@ -76,19 +76,20 @@ bool io_uring_try_cancel_uring_cmd(struct io_ring_ctx *= ctx, =20 static void io_uring_cmd_del_cancelable(struct io_uring_cmd *cmd, unsigned int issue_flags) { struct io_kiocb *req =3D cmd_to_io_kiocb(cmd); + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; =20 if (!(cmd->flags & IORING_URING_CMD_CANCELABLE)) return; =20 cmd->flags &=3D ~IORING_URING_CMD_CANCELABLE; - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); hlist_del(&req->hash_node); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); } =20 /* * Mark this command as concelable, then io_uring_try_cancel_uring_cmd() * will try to cancel this issued command by sending ->uring_cmd() with @@ -103,14 +104,16 @@ void io_uring_cmd_mark_cancelable(struct io_uring_cmd= *cmd, { struct io_kiocb *req =3D cmd_to_io_kiocb(cmd); struct io_ring_ctx *ctx =3D req->ctx; =20 if (!(cmd->flags & IORING_URING_CMD_CANCELABLE)) { + struct io_ring_ctx_lock_state lock_state; + cmd->flags |=3D IORING_URING_CMD_CANCELABLE; - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); hlist_add_head(&req->hash_node, &ctx->cancelable_uring_cmd); - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); } } EXPORT_SYMBOL_GPL(io_uring_cmd_mark_cancelable); =20 void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd, diff --git a/io_uring/waitid.c b/io_uring/waitid.c index 2d4cbd47c67c..a69eb1b30b89 100644 --- a/io_uring/waitid.c +++ b/io_uring/waitid.c @@ -130,11 +130,11 @@ static void io_waitid_complete(struct io_kiocb *req, = int ret) struct io_waitid *iw =3D io_kiocb_to_cmd(req, struct io_waitid); =20 /* anyone completing better be holding a reference */ WARN_ON_ONCE(!(atomic_read(&iw->refs) & IO_WAITID_REF_MASK)); =20 - lockdep_assert_held(&req->ctx->uring_lock); + io_ring_ctx_assert_locked(req->ctx); =20 hlist_del_init(&req->hash_node); io_waitid_remove_wq(req); =20 ret =3D io_waitid_finish(req, ret); @@ -145,11 +145,11 @@ static void io_waitid_complete(struct io_kiocb *req, = int ret) =20 static bool __io_waitid_cancel(struct io_kiocb *req) { struct io_waitid *iw =3D io_kiocb_to_cmd(req, struct io_waitid); =20 - lockdep_assert_held(&req->ctx->uring_lock); + io_ring_ctx_assert_locked(req->ctx); =20 /* * Mark us canceled regardless of ownership. This will prevent a * potential retry from a spurious wakeup. 
*/ @@ -280,10 +280,11 @@ int io_waitid_prep(struct io_kiocb *req, const struct= io_uring_sqe *sqe) =20 int io_waitid(struct io_kiocb *req, unsigned int issue_flags) { struct io_waitid *iw =3D io_kiocb_to_cmd(req, struct io_waitid); struct io_waitid_async *iwa =3D req->async_data; + struct io_ring_ctx_lock_state lock_state; struct io_ring_ctx *ctx =3D req->ctx; int ret; =20 ret =3D kernel_waitid_prepare(&iwa->wo, iw->which, iw->upid, &iw->info, iw->options, NULL); @@ -301,11 +302,11 @@ int io_waitid(struct io_kiocb *req, unsigned int issu= e_flags) * Cancel must hold the ctx lock, so there's no risk of cancelation * finding us until a) we remain on the list, and b) the lock is * dropped. We only need to worry about racing with the wakeup * callback. */ - io_ring_submit_lock(ctx, issue_flags); + io_ring_submit_lock(ctx, issue_flags, &lock_state); =20 /* * iw->head is valid under the ring lock, and as long as the request * is on the waitid_list where cancelations may find it. */ @@ -321,27 +322,27 @@ int io_waitid(struct io_kiocb *req, unsigned int issu= e_flags) /* * Nobody else grabbed a reference, it'll complete when we get * a waitqueue callback, or if someone cancels it. */ if (!io_waitid_drop_issue_ref(req)) { - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return IOU_ISSUE_SKIP_COMPLETE; } =20 /* * Wakeup triggered, racing with us. It was prevented from * completing because of that, queue up the tw to do that. */ - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); return IOU_ISSUE_SKIP_COMPLETE; } =20 hlist_del_init(&req->hash_node); io_waitid_remove_wq(req); ret =3D io_waitid_finish(req, ret); =20 - io_ring_submit_unlock(ctx, issue_flags); + io_ring_submit_unlock(ctx, issue_flags, &lock_state); done: if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); return IOU_COMPLETE; diff --git a/io_uring/zcrx.c b/io_uring/zcrx.c index b99cf2c6670a..f2ed49bbad63 100644 --- a/io_uring/zcrx.c +++ b/io_uring/zcrx.c @@ -851,11 +851,11 @@ static struct net_iov *__io_zcrx_get_free_niov(struct= io_zcrx_area *area) =20 void io_unregister_zcrx_ifqs(struct io_ring_ctx *ctx) { struct io_zcrx_ifq *ifq; =20 - lockdep_assert_held(&ctx->uring_lock); + io_ring_ctx_assert_locked(ctx); =20 while (1) { scoped_guard(mutex, &ctx->mmap_lock) { unsigned long id =3D 0; =20 --=20 2.45.2 From nobody Thu Dec 18 22:29:49 2025 Received: from mail-pl1-f226.google.com (mail-pl1-f226.google.com [209.85.214.226]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AF27C23D7E0 for ; Thu, 18 Dec 2025 02:45:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.226 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766025907; cv=none; b=Cr1YcHBXTf0dI72AxASSjQ+VmsDOquFftKxWZM7ls3ch6r/PwVOxdPT0RqC+vDRZfAdcpu9IE9pm5juXximXHEU0l8z0ZKcsqlCWuOCesG9W0H1CyWOsNCQUD5yF20tEtx2yShOejwhjL+GNRkuU0aCWfcWzwd6nL93k6+r7Z7M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766025907; c=relaxed/simple; bh=hZIRs+tNNb1mG9XvgPl7q/rGi8qz0UKB7MRvjX+k19Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Yth+oPJlhQFv6JuXai1AFwxVpOxoLPubdWKewJqFYZBRC6oLeEaicCFdEpu9CCdqnIq8kiQvUKpxv7hSlvUDN4Sg0KVJ6H6Ymqoq88LVLYBdSdJauTX7+aUfhXh5JaCYzdfr+l83gIS/0a9KYQb30lxx2lwmpBNZoCwAPCJ+pxM= ARC-Authentication-Results: i=1; 
smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=purestorage.com; spf=fail smtp.mailfrom=purestorage.com; dkim=pass (2048-bit key) header.d=purestorage.com header.i=@purestorage.com header.b=SpGrw55M; arc=none smtp.client-ip=209.85.214.226 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=purestorage.com Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=purestorage.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=purestorage.com header.i=@purestorage.com header.b="SpGrw55M" Received: by mail-pl1-f226.google.com with SMTP id d9443c01a7336-2a08cb5e30eso341915ad.1 for ; Wed, 17 Dec 2025 18:45:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=purestorage.com; s=google2022; t=1766025904; x=1766630704; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=xCOT1eJLhQi6SepBGbijdrE2xjpshazJgxdmtCDHEFc=; b=SpGrw55M9zv1zOs3PguZ/uQ05fJH9mwHNWWuvdBV6sIGZuoNI4CErDsDa3Mw7b0y9B 90awx051BmKasWL4Mbk4AMnEmdLpnOTAp7OnCYwHvVadc2fZIRveMw3sFnXjKvfBxnG1 qbWJfIvmsva2eZF5loetuvUlLSCc8WOKoagiovKfc0nB8gtZUiJA65FyKDtjqKKPXszx sWdFdCMmLkff4vycKzyuTO1j/UnhDTWfQ2LXyumuD/h22yQ3tJesfNsgI9sEPMnb70G/ hsvpBkIMLEj4lTP+KwwC6B7l8CgF7/hevwmZ86QQv9ZtBYHpDG3lgFb3Y/scXSnH2jQG T6dQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1766025904; x=1766630704; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=xCOT1eJLhQi6SepBGbijdrE2xjpshazJgxdmtCDHEFc=; b=asZ6aEOJo30vUxLlwFwHFw4aTm0a5jBd5+XtZFPh9V6hp0QLz+uBO09ZJ9vbf1wgjG l1OAuqBV8m/ClAK5EFb5NfR5QJr+kyaE3XQIU36Wz9kwQEHHJchpL+09/GfnE2k2r7uR xXlc5mCHtcGLX2+Izs0wFXm3LQd2i7hXJHXkWG7txGxBghgq2FrbKdH6P7n/COPK1yPK XulUUayD9gzfXIuXyf01htgUsZ+p7FRwragxr/S5LrcQwBbr6QFKqZ2OY9v1FKiEFXzT ghAc1cSyWEuQWfts9TIXLcV2bHHB9Md5aUkYtJsEfuPleARpP6NuRc8UcIiG8wRDVBTZ jRVg== X-Forwarded-Encrypted: i=1; AJvYcCVQyu0jCns7blYEhVjVikWEtfQDVG078BohWOk1VbngREpmU6nRdRwMqORBCe7uINDn/pVcPCSnwb5Htlk=@vger.kernel.org X-Gm-Message-State: AOJu0YzPJ/5KxWbZcXADj/dQuwDxeEZezF4vb2xmhlUNh0LPMjkczma7 rTseyXIaZF0MW/rUcYOQC/6vx5KEY5qIMTx70/eAreqhs9X9/PS9eozP2nBLtTsY1VzHlf3XEHC 0dLCqMvgksYZJwXoM3VIrEYE36XM+5GO24xk8NuryMW8I38sjQ9hR X-Gm-Gg: AY/fxX6AV57XcAIQ65jDIKbzz7xIhalSMFC+HsO648Fp9iXrZ8PY94gCrSGNka4p/rm ZfQqfYYJhJoW8eu91IcVqlkFagjiY8n8/IZhpfOuixlSU6bdDO7DJ4i9Gdatxn1uN9HSwIHSMBC QGE0xb8o5Y/PsJIwcBJiZeFvbF/I72FPt8S1BP0E9wEBXT2pWRVM2R3ubjeRo6TXPz4PxA5AqYw UGIfqDZ5hgPBGCCC+F6ijLPQbOhr9lO1VvRLAMESM+d3maYJq6LczrtHQorpPYHT8HgxENkdkwJ IZHWCgE8CklTTO+YzxoYEU711TtiDXI769VdroHwDkvCrwVNWovq1owvBcFKW45xcVjSm7SGZcW EUSJmHo5AMFGFn8GjxfJhLraVq7A= X-Google-Smtp-Source: AGHT+IGjOqg3N+0jyfihCulwzlp/UmV9H0wtrz3m1WHnufqQWqck6o1b14hPo+tV+fQgv+Iim0Otdg9FdiQt X-Received: by 2002:a05:7022:1e05:b0:11e:3e9:3e89 with SMTP id a92af1059eb24-1206289ff38mr414137c88.7.1766025903683; Wed, 17 Dec 2025 18:45:03 -0800 (PST) Received: from c7-smtp-2023.dev.purestorage.com ([2620:125:9017:12:36:3:5:0]) by smtp-relay.gmail.com with ESMTPS id a92af1059eb24-12061f9c99dsm154191c88.3.2025.12.17.18.45.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 17 Dec 2025 18:45:03 -0800 (PST) X-Relaying-Domain: purestorage.com Received: from dev-csander.dev.purestorage.com (unknown [IPv6:2620:125:9007:640:ffff::1199]) by 
c7-smtp-2023.dev.purestorage.com (Postfix) with ESMTP id 361A734023F; Wed, 17 Dec 2025 19:45:03 -0700 (MST) Received: by dev-csander.dev.purestorage.com (Postfix, from userid 1557716354) id 34016E417CF; Wed, 17 Dec 2025 19:45:03 -0700 (MST) From: Caleb Sander Mateos To: Jens Axboe , io-uring@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Joanne Koong , Caleb Sander Mateos , syzbot@syzkaller.appspotmail.com Subject: [PATCH v6 6/6] io_uring: avoid uring_lock for IORING_SETUP_SINGLE_ISSUER Date: Wed, 17 Dec 2025 19:44:59 -0700 Message-ID: <20251218024459.1083572-7-csander@purestorage.com> X-Mailer: git-send-email 2.45.2 In-Reply-To: <20251218024459.1083572-1-csander@purestorage.com> References: <20251218024459.1083572-1-csander@purestorage.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" io_ring_ctx's mutex uring_lock can be quite expensive in high-IOPS workloads. Even when only one thread pinned to a single CPU is accessing the io_ring_ctx, the atomic CASes required to lock and unlock the mutex are very hot instructions. The mutex's primary purpose is to prevent concurrent io_uring system calls on the same io_ring_ctx. However, there is already a flag IORING_SETUP_SINGLE_ISSUER that promises only one task will make io_uring_enter() and io_uring_register() system calls on the io_ring_ctx once it's enabled. So if the io_ring_ctx is setup with IORING_SETUP_SINGLE_ISSUER, skip the uring_lock mutex_lock() and mutex_unlock() on the submitter_task. On other tasks acquiring the ctx uring lock, use a task work item to suspend the submitter_task for the critical section. If the io_ring_ctx is IORING_SETUP_R_DISABLED (possible during io_uring_setup(), io_uring_register(), or io_uring exit), submitter_task may be set concurrently, so acquire the uring_lock before checking it. If submitter_task isn't set yet, the uring_lock suffices to provide mutual exclusion. If task work can't be queued because submitter_task has exited, also use the uring_lock for mutual exclusion. 
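
As an aside, not part of the change itself: a minimal userspace sketch, assuming liburing, of the kind of single-issuer ring this optimization targets. Only the IORING_SETUP_SINGLE_ISSUER flag is the point here; everything else is example scaffolding.

#include <liburing.h>
#include <stdio.h>

int main(void)
{
	struct io_uring_params p = { };
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	int ret;

	/*
	 * Promise the kernel that only this task will issue
	 * io_uring_enter()/io_uring_register() on the ring, which is the
	 * condition that lets the fast path skip the uring_lock mutex.
	 */
	p.flags = IORING_SETUP_SINGLE_ISSUER;

	ret = io_uring_queue_init_params(8, &ring, &p);
	if (ret < 0) {
		fprintf(stderr, "io_uring_queue_init_params: %d\n", ret);
		return 1;
	}

	sqe = io_uring_get_sqe(&ring);
	if (sqe) {
		io_uring_prep_nop(sqe);
		ret = io_uring_submit_and_wait(&ring, 1);
		if (ret >= 0 && !io_uring_wait_cqe(&ring, &cqe))
			io_uring_cqe_seen(&ring, cqe);
	}

	io_uring_queue_exit(&ring);
	return 0;
}
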
Signed-off-by: Caleb Sander Mateos Tested-by: syzbot@syzkaller.appspotmail.com --- io_uring/io_uring.c | 12 +++++ io_uring/io_uring.h | 118 ++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 127 insertions(+), 3 deletions(-) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index 237663382a5e..38390c8c54e0 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -363,10 +363,22 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(s= truct io_uring_params *p) xa_destroy(&ctx->io_bl_xa); kfree(ctx); return NULL; } =20 +void io_ring_suspend_work(struct callback_head *cb_head) +{ + struct io_ring_suspend_work *suspend_work =3D + container_of(cb_head, struct io_ring_suspend_work, cb_head); + DECLARE_COMPLETION_ONSTACK(suspend_end); + + *suspend_work->suspend_end =3D &suspend_end; + complete(&suspend_work->suspend_start); + + wait_for_completion(&suspend_end); +} + static void io_clean_op(struct io_kiocb *req) { if (unlikely(req->flags & REQ_F_BUFFER_SELECTED)) io_kbuf_drop_legacy(req); =20 diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 57c3eef26a88..c2e39ca55569 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -1,8 +1,9 @@ #ifndef IOU_CORE_H #define IOU_CORE_H =20 +#include #include #include #include #include #include @@ -195,19 +196,93 @@ void io_queue_next(struct io_kiocb *req); void io_task_refs_refill(struct io_uring_task *tctx); bool __io_alloc_req_refill(struct io_ring_ctx *ctx); =20 void io_activate_pollwq(struct io_ring_ctx *ctx); =20 +/* + * The ctx uring lock protects most of the mutable struct io_ring_ctx state + * accessed in the struct io_kiocb issue path. In the I/O path, it is typi= cally + * acquired in the io_uring_enter() syscall and in io_handle_tw_list(). For + * IORING_SETUP_SQPOLL, it's acquired by io_sq_thread() instead. io_kiocb's + * issued with IO_URING_F_UNLOCKED in issue_flags (e.g. by io_wq_submit_wo= rk()) + * acquire and release the ctx uring lock whenever they must touch io_ring= _ctx + * state. io_uring_register() also acquires the ctx uring lock because most + * opcodes mutate io_ring_ctx state accessed in the issue path. + * + * For !IORING_SETUP_SINGLE_ISSUER io_ring_ctx's, acquiring the ctx uring = lock + * is done via mutex_(try)lock(&ctx->uring_lock). + * + * However, for IORING_SETUP_SINGLE_ISSUER, we can avoid the mutex_lock() + + * mutex_unlock() overhead on submitter_task because a single thread can't= race + * with itself. In the uncommon case where the ctx uring lock is needed on + * another thread, it must suspend submitter_task by scheduling a task wor= k item + * on it. io_ring_ctx_lock() returns once the task work item has started. + * io_ring_ctx_unlock() allows the task work item to complete. + * If io_ring_ctx_lock() is called while the ctx is IORING_SETUP_R_DISABLED + * (e.g. during ctx create or exit), io_ring_ctx_lock() must acquire uring= _lock + * because submitter_task isn't set yet. submitter_task can be accessed on= ce + * uring_lock is held. If submitter_task exists, we do the same thing as i= n the + * non-IORING_SETUP_R_DISABLED case. If submitter_task isn't set, all other + * io_ring_ctx_lock() callers will also acquire uring_lock, so it suffices= for + * mutual exclusion. + * Similarly, if io_ring_ctx_lock() is called after submitter_task has exi= ted, + * task work can't be queued on it. Acquire uring_lock to exclude other ca= llers. 
+ */ + +struct io_ring_suspend_work { + struct callback_head cb_head; + struct completion suspend_start; + struct completion **suspend_end; +}; + +void io_ring_suspend_work(struct callback_head *cb_head); + struct io_ring_ctx_lock_state { + bool mutex_held; + struct completion *suspend_end; }; =20 /* Acquire the ctx uring lock with the given nesting level */ static inline void io_ring_ctx_lock_nested(struct io_ring_ctx *ctx, unsigned int subclass, struct io_ring_ctx_lock_state *state) { - mutex_lock_nested(&ctx->uring_lock, subclass); + struct io_ring_suspend_work suspend_work; + + if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) { + mutex_lock_nested(&ctx->uring_lock, subclass); + return; + } + + state->mutex_held =3D false; + state->suspend_end =3D NULL; + if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED)) { + mutex_lock_nested(&ctx->uring_lock, subclass); + if (likely(!ctx->submitter_task)) { + state->mutex_held =3D true; + return; + } + + /* submitter_task set concurrently, must suspend it */ + mutex_unlock(&ctx->uring_lock); + } else if (likely(current =3D=3D ctx->submitter_task)) { + return; + } + + /* Use task work to suspend submitter_task */ + init_task_work(&suspend_work.cb_head, io_ring_suspend_work); + init_completion(&suspend_work.suspend_start); + suspend_work.suspend_end =3D &state->suspend_end; + if (unlikely(task_work_add(ctx->submitter_task, &suspend_work.cb_head, + TWA_SIGNAL))) { + /* submitter_task is exiting, use mutex instead */ + state->mutex_held =3D true; + mutex_lock_nested(&ctx->uring_lock, subclass); + return; + } + + wait_for_completion(&suspend_work.suspend_start); } =20 /* Acquire the ctx uring lock */ static inline void io_ring_ctx_lock(struct io_ring_ctx *ctx, struct io_ring_ctx_lock_state *state) @@ -217,29 +292,66 @@ static inline void io_ring_ctx_lock(struct io_ring_ct= x *ctx, =20 /* Attempt to acquire the ctx uring lock without blocking */ static inline bool io_ring_ctx_trylock(struct io_ring_ctx *ctx, struct io_ring_ctx_lock_state *state) { - return mutex_trylock(&ctx->uring_lock); + if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) + return mutex_trylock(&ctx->uring_lock); + + state->suspend_end =3D NULL; + if (unlikely(smp_load_acquire(&ctx->flags) & IORING_SETUP_R_DISABLED)) { + if (!mutex_trylock(&ctx->uring_lock)) + return false; + if (likely(!ctx->submitter_task)) { + state->mutex_held =3D true; + return true; + } + + mutex_unlock(&ctx->uring_lock); + return false; + } + + state->mutex_held =3D false; + return current =3D=3D ctx->submitter_task; } =20 /* Release the ctx uring lock */ static inline void io_ring_ctx_unlock(struct io_ring_ctx *ctx, struct io_ring_ctx_lock_state *state) { - mutex_unlock(&ctx->uring_lock); + if (!(ctx->flags & IORING_SETUP_SINGLE_ISSUER)) { + mutex_unlock(&ctx->uring_lock); + return; + } + + if (unlikely(state->mutex_held)) + mutex_unlock(&ctx->uring_lock); + if (unlikely(state->suspend_end)) + complete(state->suspend_end); } =20 /* Return (if CONFIG_LOCKDEP) whether the ctx uring lock is held */ static inline bool io_ring_ctx_lock_held(const struct io_ring_ctx *ctx) { + /* + * No straightforward way to check that submitter_task is suspended + * without access to struct io_ring_ctx_lock_state + */ + if (ctx->flags & IORING_SETUP_SINGLE_ISSUER && + !(ctx->flags & IORING_SETUP_R_DISABLED)) + return true; + return lockdep_is_held(&ctx->uring_lock); } =20 /* Assert (if CONFIG_LOCKDEP) that the ctx uring lock is held */ static inline void io_ring_ctx_assert_locked(const struct io_ring_ctx *ctx) { + if 
(ctx->flags & IORING_SETUP_SINGLE_ISSUER && + !(ctx->flags & IORING_SETUP_R_DISABLED)) + return; + lockdep_assert_held(&ctx->uring_lock); } =20 static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx) { --=20 2.45.2
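
For reference, the caller-side convention the series converges on, shown as a hypothetical issue handler mirroring the call sites converted in the earlier patches (example_issue() is not a real function): the lock state lives on the caller's stack and is threaded through the matching lock/unlock pair, so the unlock path knows whether it holds uring_lock or must instead let a suspended submitter_task resume.

/* Hypothetical caller, mirroring the converted call sites above */
static int example_issue(struct io_kiocb *req, unsigned int issue_flags)
{
	struct io_ring_ctx *ctx = req->ctx;
	struct io_ring_ctx_lock_state lock_state;
	int ret = 0;

	io_ring_submit_lock(ctx, issue_flags, &lock_state);
	/* ctx state guarded by the ctx uring lock is safe to touch here */
	io_ring_ctx_assert_locked(ctx);
	io_ring_submit_unlock(ctx, issue_flags, &lock_state);
	return ret;
}
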