From nobody Mon May 25 18:05:06 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v17 01/14] nvmet-tcp: define listen socket ops
Date: Mon, 25 May 2026 17:32:56 +0800

From: Geliang Tang

To support MPTCP on the target side, the listen socket needs to pass
IPPROTO_MPTCP to sock_create() for MPTCP ports, and use MPTCP-specific
setsockopt functions.

This patch adds struct nvmet_tcp_proto_ops to hold listen socket protocol
operations (protocol, set_reuseaddr, set_nodelay, set_priority). A TCP
version is defined and used for TCP ports.

v2:
 - use trtype instead of tsas (Hannes).

v3:
 - check mptcp protocol from disc_addr.trtype instead of passing a
   parameter (Hannes).

v4:
 - check CONFIG_MPTCP.

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 30 ++++++++++++++++++++++++++----
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 20f150d17a96..094f8eb0107e 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2043,8 +2043,23 @@ static void nvmet_tcp_listen_data_ready(struct sock *sk)
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
+struct nvmet_tcp_proto_ops {
+	int protocol;
+	void (*set_reuseaddr)(struct sock *sk);
+	void (*set_nodelay)(struct sock *sk);
+	void (*set_priority)(struct sock *sk, u32 priority);
+};
+
+static const struct nvmet_tcp_proto_ops nvmet_tcp_proto_ops = {
+	.protocol	= IPPROTO_TCP,
+	.set_reuseaddr	= sock_set_reuseaddr,
+	.set_nodelay	= tcp_sock_set_nodelay,
+	.set_priority	= sock_set_priority,
+};
+
 static int nvmet_tcp_add_port(struct nvmet_port *nport)
 {
+	const struct nvmet_tcp_proto_ops *ops;
 	struct nvmet_tcp_port *port;
 	__kernel_sa_family_t af;
 	int ret;
@@ -2067,6 +2082,13 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		goto err_port;
 	}
 
+	if (nport->disc_addr.trtype == NVMF_TRTYPE_TCP) {
+		ops = &nvmet_tcp_proto_ops;
+	} else {
+		ret = -EINVAL;
+		goto err_port;
+	}
+
 	ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
 			nport->disc_addr.trsvcid, &port->addr);
 	if (ret) {
@@ -2081,7 +2103,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
 
 	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
-				IPPROTO_TCP, &port->sock);
+				ops->protocol, &port->sock);
 	if (ret) {
 		pr_err("failed to create a socket\n");
 		goto err_port;
@@ -2090,10 +2112,10 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	port->sock->sk->sk_user_data = port;
 	port->data_ready = port->sock->sk->sk_data_ready;
 	port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
-	sock_set_reuseaddr(port->sock->sk);
-	tcp_sock_set_nodelay(port->sock->sk);
+	ops->set_reuseaddr(port->sock->sk);
+	ops->set_nodelay(port->sock->sk);
 	if (so_priority > 0)
-		sock_set_priority(port->sock->sk, so_priority);
+		ops->set_priority(port->sock->sk, so_priority);
 
 	ret = kernel_bind(port->sock, (struct sockaddr_unsized *)&port->addr,
 			sizeof(port->addr));
-- 
2.53.0

From nobody Mon May 25 18:05:06 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v17 02/14] nvmet-tcp: register target mptcp transport
Date: Mon, 25 May 2026 17:32:57 +0800
Message-ID: <5cafc7c9b08656dfca4a00a18f5afd9de552891d.1779701391.git.tanggeliang@kylinos.cn>

From: Geliang Tang

This patch adds a new NVMe target transport type, NVMF_TRTYPE_MPTCP, for
MPTCP. It also defines a new nvmet_fabrics_ops named nvmet_mptcp_ops,
which is almost the same as nvmet_tcp_ops except for .type.

It is registered in nvmet_tcp_init() and unregistered in nvmet_tcp_exit().
A MODULE_ALIAS for "nvmet-transport-4" is also added.

Note: NVMF_TRTYPE_MPTCP is temporarily assigned 4, a value currently
reserved in the NVMe over Fabrics specification. A request will be
submitted to the NVMe working group to officially allocate this value
for MPTCP.
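
As a rough userspace sketch of the registration pattern described above:
the mandatory transport is registered first, the optional one is
registered on a best-effort basis, and a flag records whether it must be
unregistered on exit. The demo_* registry below is hypothetical and only
stands in for nvmet_register_transport() and
nvmet_unregister_transport(); it is not the nvmet core.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_TRANSPORTS 8

/* Hypothetical transport descriptor; only the type matters here. */
struct demo_transport {
	int type;
	const char *name;
};

static const struct demo_transport *demo_registry[DEMO_MAX_TRANSPORTS];

/* Returns 0 on success, -1 if the type is out of range or already taken. */
int demo_register_transport(const struct demo_transport *t)
{
	if (t->type < 0 || t->type >= DEMO_MAX_TRANSPORTS ||
	    demo_registry[t->type])
		return -1;
	demo_registry[t->type] = t;
	return 0;
}

void demo_unregister_transport(const struct demo_transport *t)
{
	demo_registry[t->type] = NULL;
}

static const struct demo_transport demo_tcp = { 3, "tcp" };
static const struct demo_transport demo_mptcp = { 4, "mptcp" };
static bool demo_mptcp_registered;

int demo_init(void)
{
	int ret = demo_register_transport(&demo_tcp);

	if (ret)
		return ret;

	/* Optional transport: a failure here does not fail init. */
	if (!demo_register_transport(&demo_mptcp))
		demo_mptcp_registered = true;

	return 0;
}

void demo_exit(void)
{
	/* Only undo the optional registration if it actually happened. */
	if (demo_mptcp_registered) {
		demo_unregister_transport(&demo_mptcp);
		demo_mptcp_registered = false;
	}
	demo_unregister_transport(&demo_tcp);
}
```

The flag keeps the exit path from unregistering a transport that never
made it into the registry, while plain TCP keeps working when the
optional registration fails.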

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/configfs.c |  1 +
 drivers/nvme/target/tcp.c      | 29 +++++++++++++++++++++++++++++
 include/linux/nvme.h           |  1 +
 3 files changed, 31 insertions(+)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index b88f897f06e2..51fc0f4d0c32 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
 	{ NVMF_TRTYPE_RDMA,	"rdma" },
 	{ NVMF_TRTYPE_FC,	"fc" },
 	{ NVMF_TRTYPE_TCP,	"tcp" },
+	{ NVMF_TRTYPE_MPTCP,	"mptcp" },
 	{ NVMF_TRTYPE_PCI,	"pci" },
 	{ NVMF_TRTYPE_LOOP,	"loop" },
 };
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 094f8eb0107e..fa4053cfed12 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2255,6 +2255,23 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
 	.host_traddr		= nvmet_tcp_host_port_addr,
 };
 
+#ifdef CONFIG_MPTCP
+static bool nvmet_mptcp_registered;
+
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops = {
+	.owner			= THIS_MODULE,
+	.type			= NVMF_TRTYPE_MPTCP,
+	.msdbd			= 1,
+	.add_port		= nvmet_tcp_add_port,
+	.remove_port		= nvmet_tcp_remove_port,
+	.queue_response		= nvmet_tcp_queue_response,
+	.delete_ctrl		= nvmet_tcp_delete_ctrl,
+	.install_queue		= nvmet_tcp_install_queue,
+	.disc_traddr		= nvmet_tcp_disc_port_addr,
+	.host_traddr		= nvmet_tcp_host_port_addr,
+};
+#endif
+
 static int __init nvmet_tcp_init(void)
 {
 	int ret;
@@ -2268,6 +2285,11 @@ static int __init nvmet_tcp_init(void)
 	if (ret)
 		goto err;
 
+#ifdef CONFIG_MPTCP
+	if (!nvmet_register_transport(&nvmet_mptcp_ops))
+		nvmet_mptcp_registered = true;
+#endif
+
 	return 0;
 err:
 	destroy_workqueue(nvmet_tcp_wq);
@@ -2278,6 +2300,10 @@ static void __exit nvmet_tcp_exit(void)
 {
 	struct nvmet_tcp_queue *queue;
 
+#ifdef CONFIG_MPTCP
+	if (nvmet_mptcp_registered)
+		nvmet_unregister_transport(&nvmet_mptcp_ops);
+#endif
 	nvmet_unregister_transport(&nvmet_tcp_ops);
 
 	flush_workqueue(nvmet_wq);
@@ -2297,3 +2323,6 @@ module_exit(nvmet_tcp_exit);
 MODULE_DESCRIPTION("NVMe target TCP transport driver");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("nvmet-transport-3"); /* 3 == NVMF_TRTYPE_TCP */
+#ifdef CONFIG_MPTCP
+MODULE_ALIAS("nvmet-transport-4"); /* 4 == NVMF_TRTYPE_MPTCP */
+#endif
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 041f30931a90..0eada1e0c652 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -68,6 +68,7 @@ enum {
 	NVMF_TRTYPE_RDMA	= 1,	/* RDMA */
 	NVMF_TRTYPE_FC		= 2,	/* Fibre Channel */
 	NVMF_TRTYPE_TCP		= 3,	/* TCP/IP */
+	NVMF_TRTYPE_MPTCP	= 4,	/* Multipath TCP */
 	NVMF_TRTYPE_LOOP	= 254,	/* Reserved for host usage */
 	NVMF_TRTYPE_MAX,
 };
-- 
2.53.0

From nobody Mon May 25 18:05:06 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v17 03/14] nvmet-tcp: implement mptcp listen socket ops
Date: Mon, 25 May 2026 17:32:58 +0800
Message-ID: <85d313f78c4515c34c144448f7ddb6598d9aae88.1779701391.git.tanggeliang@kylinos.cn>

From: Geliang Tang

An MPTCP-specific version of struct nvmet_tcp_proto_ops is implemented
for listen sockets. It is selected in nvmet_tcp_add_port() when the
transport type is NVMF_TRTYPE_MPTCP.

Dedicated MPTCP helpers are introduced for setting listen socket options.
The set_nodelay and set_priority helpers set the values on all existing
subflows using mptcp_for_each_subflow(). The set_reuseaddr helper only
applies to the first subflow. The values are then synchronized to other
newly created subflows in sync_socket_options().

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 13 ++++++++
 include/net/mptcp.h       | 12 ++++++++
 net/mptcp/sockopt.c       | 63 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index fa4053cfed12..7093faf3e985 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2057,6 +2057,15 @@ static const struct nvmet_tcp_proto_ops nvmet_tcp_proto_ops = {
 	.set_priority	= sock_set_priority,
 };
 
+#ifdef CONFIG_MPTCP
+static const struct nvmet_tcp_proto_ops nvmet_mptcp_proto_ops = {
+	.protocol	= IPPROTO_MPTCP,
+	.set_reuseaddr	= mptcp_sock_set_reuseaddr,
+	.set_nodelay	= mptcp_sock_set_nodelay,
+	.set_priority	= mptcp_sock_set_priority,
+};
+#endif
+
 static int nvmet_tcp_add_port(struct nvmet_port *nport)
 {
 	const struct nvmet_tcp_proto_ops *ops;
@@ -2084,6 +2093,10 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 
 	if (nport->disc_addr.trtype == NVMF_TRTYPE_TCP) {
 		ops = &nvmet_tcp_proto_ops;
+#ifdef CONFIG_MPTCP
+	} else if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP) {
+		ops = &nvmet_mptcp_proto_ops;
+#endif
 	} else {
 		ret = -EINVAL;
 		goto err_port;
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4cf59e83c1c5..555af4022741 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -237,6 +237,12 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
 }
 
 void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
+
+void mptcp_sock_set_reuseaddr(struct sock *sk);
+
+void mptcp_sock_set_nodelay(struct sock *sk);
+
+void mptcp_sock_set_priority(struct sock *sk, u32 priority);
 #else
 
 static inline void mptcp_init(void)
@@ -323,6 +329,12 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
 static inline __be32 mptcp_reset_option(const struct sk_buff *skb)  { return htonl(0u); }
 
 static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
+
+static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
+
+static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
+
+static inline void mptcp_sock_set_priority(struct sock *sk, u32 priority) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 87b5796d0135..9be1105a1ec2 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1596,6 +1596,8 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
 	inet_assign_bit(BIND_ADDRESS_NO_PORT, ssk, inet_test_bit(BIND_ADDRESS_NO_PORT, sk));
 	WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_port_range));
+
+	ssk->sk_reuse = sk->sk_reuse;
 }
 
 void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk)
@@ -1662,3 +1664,64 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
 	}
 	return 0;
 }
+
+void mptcp_sock_set_reuseaddr(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	sk->sk_reuse = SK_CAN_REUSE;
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+	ssk->sk_reuse = SK_CAN_REUSE;
+	release_sock(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+
+void mptcp_sock_set_nodelay(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	msk->nodelay = true;
+	mptcp_for_each_subflow(msk, subflow) {
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		if (ssk) {
+			lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+			__tcp_sock_set_nodelay(ssk, true);
+			release_sock(ssk);
+		}
+	}
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+
+void mptcp_sock_set_priority(struct sock *sk, u32 priority)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	sock_set_priority(sk, priority);
+	mptcp_for_each_subflow(msk, subflow) {
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		if (ssk) {
+			lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+			sock_set_priority(ssk, priority);
+			release_sock(ssk);
+		}
+	}
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_priority);
-- 
2.53.0

From nobody Mon May 25 18:05:06 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v17 04/14] nvmet-tcp: define accept tcp_proto struct
Date: Mon, 25 May 2026 17:32:59 +0800

From: Geliang Tang

To handle accepted sockets, this patch adds struct nvmet_tcp_proto to
hold accept socket operations (no_linger, set_priority, set_tos, ops).
A proto field is added to struct nvmet_tcp_queue, which points to the
appropriate protocol structure. A TCP version is defined and assigned
to queue->proto for TCP connections.

Also modify nvmet_tcp_set_queue_sock() and nvmet_tcp_done_recv_pdu() to
use queue->proto for socket operations and fabrics callbacks.

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 40 +++++++++++++++++++++++++++++++++------
 1 file changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 7093faf3e985..4c198be45bef 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -145,6 +145,13 @@ enum nvmet_tcp_queue_state {
 	NVMET_TCP_Q_FAILED,
 };
 
+struct nvmet_tcp_proto {
+	void (*no_linger)(struct sock *sk);
+	void (*set_priority)(struct sock *sk, u32 priority);
+	void (*set_tos)(struct sock *sk);
+	const struct nvmet_fabrics_ops *ops;
+};
+
 struct nvmet_tcp_queue {
 	struct socket		*sock;
 	struct nvmet_tcp_port	*port;
@@ -196,6 +203,7 @@ struct nvmet_tcp_queue {
 	void (*data_ready)(struct sock *);
 	void (*state_change)(struct sock *);
 	void (*write_space)(struct sock *);
+	const struct nvmet_tcp_proto *proto;
 };
 
 struct nvmet_tcp_port {
@@ -1081,7 +1089,8 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 	req = &queue->cmd->req;
 	memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
 
-	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
+	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq,
+			queue->proto->ops))) {
 		pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
 			req->cmd, req->cmd->common.command_id,
 			req->cmd->common.opcode,
@@ -1698,7 +1707,6 @@ static void nvmet_tcp_state_change(struct sock *sk)
 static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
 {
 	struct socket *sock = queue->sock;
-	struct inet_sock *inet = inet_sk(sock->sk);
 	int ret;
 
 	ret = kernel_getsockname(sock,
@@ -1716,14 +1724,13 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
 	 * close. This is done to prevent stale data from being sent should
 	 * the network connection be restored before TCP times out.
 	 */
-	sock_no_linger(sock->sk);
+	queue->proto->no_linger(sock->sk);
 
 	if (so_priority > 0)
-		sock_set_priority(sock->sk, so_priority);
+		queue->proto->set_priority(sock->sk, so_priority);
 
 	/* Set socket type of service */
-	if (inet->rcv_tos > 0)
-		ip_sock_set_tos(sock->sk, inet->rcv_tos);
+	queue->proto->set_tos(sock->sk);
 
 	ret = 0;
 	write_lock_bh(&sock->sk->sk_callback_lock);
@@ -1906,6 +1913,21 @@ static int nvmet_tcp_tls_handshake(struct nvmet_tcp_queue *queue)
 static void nvmet_tcp_tls_handshake_timeout(struct work_struct *w) {}
 #endif
 
+static void tcp_sock_set_tos(struct sock *sk)
+{
+	struct inet_sock *inet = inet_sk(sk);
+
+	if (inet->rcv_tos > 0)
+		ip_sock_set_tos(sk, inet->rcv_tos);
+}
+
+static const struct nvmet_tcp_proto nvmet_tcp_proto = {
+	.no_linger	= sock_no_linger,
+	.set_priority	= sock_set_priority,
+	.set_tos	= tcp_sock_set_tos,
+	.ops		= &nvmet_tcp_ops,
+};
+
 static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 				    struct socket *newsock)
 {
@@ -1923,6 +1945,12 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	INIT_WORK(&queue->io_work, nvmet_tcp_io_work);
 	kref_init(&queue->kref);
 	queue->sock = newsock;
+	if (newsock->sk->sk_protocol == IPPROTO_TCP) {
+		queue->proto = &nvmet_tcp_proto;
+	} else {
+		ret = -EINVAL;
+		goto out_free_queue;
+	}
 	queue->port = port;
 	queue->nr_cmds = 0;
 	spin_lock_init(&queue->state_lock);
-- 
2.53.0

From nobody Mon May 25 18:05:06 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v17 05/14] nvmet-tcp: implement accept mptcp proto
Date: Mon, 25 May 2026 17:33:00 +0800

From: Geliang Tang

An MPTCP-specific version of struct nvmet_tcp_proto is implemented for
accept sockets. It is assigned to queue->proto when the accepted socket
protocol is IPPROTO_MPTCP.

Dedicated MPTCP helpers are introduced for setting accept socket options.
These helpers (no_linger, set_priority, set_tos) set the values on all
existing subflows using mptcp_for_each_subflow(). The values are then
synchronized to other newly created subflows in sync_socket_options().

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 16 +++++++++++
 include/net/mptcp.h       |  8 ++++++
 net/mptcp/sockopt.c       | 58 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 4c198be45bef..8c2dc4bcbcd3 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -220,6 +220,9 @@ static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
 
 static struct workqueue_struct *nvmet_tcp_wq;
 static const struct nvmet_fabrics_ops nvmet_tcp_ops;
+#ifdef CONFIG_MPTCP
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops;
+#endif
 static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
 static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
 
@@ -1928,6 +1931,15 @@ static const struct nvmet_tcp_proto nvmet_tcp_proto = {
 	.ops		= &nvmet_tcp_ops,
 };
 
+#ifdef CONFIG_MPTCP
+static const struct nvmet_tcp_proto nvmet_mptcp_proto = {
+	.no_linger	= mptcp_sock_no_linger,
+	.set_priority	= mptcp_sock_set_priority,
+	.set_tos	= mptcp_sock_set_tos,
+	.ops		= &nvmet_mptcp_ops,
+};
+#endif
+
 static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 				    struct socket *newsock)
 {
@@ -1947,6 +1959,10 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_port *port,
 	queue->sock = newsock;
 	if (newsock->sk->sk_protocol == IPPROTO_TCP) {
 		queue->proto = &nvmet_tcp_proto;
+#ifdef CONFIG_MPTCP
+	} else if (newsock->sk->sk_protocol == IPPROTO_MPTCP) {
+		queue->proto = &nvmet_mptcp_proto;
+#endif
 	} else {
 		ret = -EINVAL;
 		goto out_free_queue;
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 555af4022741..8eacb9424b37 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -243,6 +243,10 @@ void mptcp_sock_set_reuseaddr(struct sock *sk);
 void mptcp_sock_set_nodelay(struct sock *sk);
 
 void mptcp_sock_set_priority(struct sock *sk, u32 priority);
+
+void mptcp_sock_no_linger(struct sock *sk);
+
+void mptcp_sock_set_tos(struct sock *sk);
 #else
 
 static inline void mptcp_init(void)
@@ -335,6 +339,10 @@ static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
 static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
 
 static inline void mptcp_sock_set_priority(struct sock *sk, u32 priority) { }
+
+static inline void mptcp_sock_no_linger(struct sock *sk) { }
+
+static inline void mptcp_sock_set_tos(struct sock *sk) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 9be1105a1ec2..730e945bf8cd 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1725,3 +1725,61 @@ void mptcp_sock_set_priority(struct sock *sk, u32 priority)
 	release_sock(sk);
 }
 EXPORT_SYMBOL(mptcp_sock_set_priority);
+
+void mptcp_sock_no_linger(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	WRITE_ONCE(sk->sk_lingertime, 0);
+	sock_set_flag(sk, SOCK_LINGER);
+	mptcp_for_each_subflow(msk, subflow) {
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		if (ssk) {
+			lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+			WRITE_ONCE(ssk->sk_lingertime, 0);
+			sock_set_flag(ssk, SOCK_LINGER);
+			release_sock(ssk);
+		}
+	}
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_no_linger);
+
+static void __mptcp_sock_set_tos(struct sock *sk, int val)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	__ip_sock_set_tos(sk, val);
+	mptcp_for_each_subflow(msk, subflow) {
+		ssk = mptcp_subflow_tcp_sock(subflow);
+		if (ssk) {
+			lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+			__ip_sock_set_tos(ssk, val);
+			release_sock(ssk);
+		}
+	}
+	release_sock(sk);
+}
+
+void mptcp_sock_set_tos(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	int val = 0;
+
+	lock_sock(sk);
+	if (msk->first)
+		val = inet_sk(msk->first)->rcv_tos;
+	release_sock(sk);
+
+	if (val > 0)
+		__mptcp_sock_set_tos(sk, val);
+}
+EXPORT_SYMBOL(mptcp_sock_set_tos);
-- 
2.53.0

From nobody Mon May 25 18:05:06 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v17 06/14] nvme-tcp: define host tcp_proto struct
Date: Mon, 25 May 2026 17:33:01 +0800

From: Geliang Tang

To add MPTCP support in "NVMe over TCP", the host side needs to pass
IPPROTO_MPTCP to sock_create_kern() instead of IPPROTO_TCP to create an
MPTCP socket.
Similar to the target-side nvmet_tcp_proto, this patch defines the host-side nvme_tcp_proto structure, which contains the protocol of the socket and a set of function pointers for socket operations. The only difference is that it defines .set_syncnt instead of .set_reuseaddr. A TCP-specific version of this structure is defined, and a proto field is added to nvme_tcp_ctrl. When the transport string is "tcp", it is assigned to ctrl->proto. All locations that previously called TCP setsockopt functions are updated to call the corresponding function pointers in the nvme_tcp_proto structure. The controller's proto pointer is set during initialization and remains valid throughout the controller's lifetime. v2: - use 'trtype' instead of '--mptcp' (Hannes) v3: - check mptcp protocol from opts->transport instead of passing a parameter (Hannes). v4: - check CONFIG_MPTCP. Cc: Hannes Reinecke Co-developed-by: zhenwei pi Signed-off-by: zhenwei pi Co-developed-by: Hui Zhu Signed-off-by: Hui Zhu Co-developed-by: Gang Yan Signed-off-by: Gang Yan Signed-off-by: Geliang Tang --- drivers/nvme/host/tcp.c | 44 ++++++++++++++++++++++++++++++++++------- 1 file changed, 37 insertions(+), 7 deletions(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 15d36d6a728e..13a5240623ef 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -182,6 +182,16 @@ struct nvme_tcp_queue { void (*write_space)(struct sock *); }; =20 +struct nvme_tcp_proto { + int protocol; + int (*set_syncnt)(struct sock *sk, int val); + void (*set_nodelay)(struct sock *sk); + void (*no_linger)(struct sock *sk); + void (*set_priority)(struct sock *sk, u32 priority); + void (*set_tos)(struct sock *sk, int val); + const struct nvme_ctrl_ops *ops; +}; + struct nvme_tcp_ctrl { /* read only in the hot path */ struct nvme_tcp_queue *queues; @@ -198,6 +208,8 @@ struct nvme_tcp_ctrl { struct delayed_work connect_work; struct nvme_tcp_request async_req; u32 io_queues[HCTX_MAX_TYPES]; + + const 
struct nvme_tcp_proto *proto; }; =20 static LIST_HEAD(nvme_tcp_ctrl_list); @@ -1799,7 +1811,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nct= rl, int qid, =20 ret =3D sock_create_kern(current->nsproxy->net_ns, ctrl->addr.ss_family, SOCK_STREAM, - IPPROTO_TCP, &queue->sock); + ctrl->proto->protocol, &queue->sock); if (ret) { dev_err(nctrl->device, "failed to create socket: %d\n", ret); @@ -1816,24 +1828,24 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *n= ctrl, int qid, nvme_tcp_reclassify_socket(queue->sock); =20 /* Single syn retry */ - tcp_sock_set_syncnt(queue->sock->sk, 1); + ctrl->proto->set_syncnt(queue->sock->sk, 1); =20 /* Set TCP no delay */ - tcp_sock_set_nodelay(queue->sock->sk); + ctrl->proto->set_nodelay(queue->sock->sk); =20 /* * Cleanup whatever is sitting in the TCP transmit queue on socket * close. This is done to prevent stale data from being sent should * the network connection be restored before TCP times out. */ - sock_no_linger(queue->sock->sk); + ctrl->proto->no_linger(queue->sock->sk); =20 if (so_priority > 0) - sock_set_priority(queue->sock->sk, so_priority); + ctrl->proto->set_priority(queue->sock->sk, so_priority); =20 /* Set socket type of service */ if (nctrl->opts->tos >=3D 0) - ip_sock_set_tos(queue->sock->sk, nctrl->opts->tos); + ctrl->proto->set_tos(queue->sock->sk, nctrl->opts->tos); =20 /* Set 10 seconds timeout for icresp recvmsg */ queue->sock->sk->sk_rcvtimeo =3D 10 * HZ; @@ -2900,6 +2912,17 @@ nvme_tcp_existing_controller(struct nvmf_ctrl_option= s *opts) return found; } =20 +static const struct nvme_tcp_proto nvme_tcp_proto =3D { + .protocol =3D IPPROTO_TCP, + .set_syncnt =3D tcp_sock_set_syncnt, + .set_nodelay =3D tcp_sock_set_nodelay, + .no_linger =3D sock_no_linger, + .set_priority =3D sock_set_priority, + .set_tos =3D ip_sock_set_tos, + .ops =3D &nvme_tcp_ctrl_ops, + +}; + static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev, struct nvmf_ctrl_options *opts) { @@ -2964,13 +2987,20 @@ static 
struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(st= ruct device *dev, goto out_free_ctrl; } =20 + if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) { + ctrl->proto =3D &nvme_tcp_proto; + } else { + ret =3D -EINVAL; + goto out_free_ctrl; + } + ctrl->queues =3D kzalloc_objs(*ctrl->queues, ctrl->ctrl.queue_count); if (!ctrl->queues) { ret =3D -ENOMEM; goto out_free_ctrl; } =20 - ret =3D nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_tcp_ctrl_ops, 0); + ret =3D nvme_init_ctrl(&ctrl->ctrl, dev, ctrl->proto->ops, 0); if (ret) goto out_kfree_queues; =20 --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B4553537ED for ; Mon, 25 May 2026 09:33:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701634; cv=none; b=Hs9BRRHGT9SklP9QehOPiJVHsLFN8jv307zNRp4WTvLYNBZgFaQCpmZcmDNvTzGo0l8OUk4C7ZhT84XWZDY4XqFPqACeHV8boSFMsR9oM8wDJNgAgC7Afo5ryt00/XL1BdRkrUGETHi+INTNZHsPhMcLLB70k+oQG4c0QNxrvUE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701634; c=relaxed/simple; bh=dIgLnqVTYVcMQjUAFiSzZFhfQodv/DxLSR9woHUYvPM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GNyF6Bf/lhVItpGxOYEW0PWWwU5rWheLqkf8Qzvpo/MjeyNwlQ19mV7uG5AijUMNTv2P6vcr7Fd1AH3Qh9YqbKxHxmaBv99PsOGc7G6Je76ZVCMI57Y5NlPRaow3Wcl94Js4nev5cTjTtLh4iPo6Su3gWwpheeMGYatoE8aBoF8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=IwisjWTj; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="IwisjWTj" Received: by 
smtp.kernel.org (Postfix) with ESMTPSA id 250F91F00A3A; Mon, 25 May 2026 09:33:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701633; bh=OHIpmDygvz0d3+mAZQXcQkX8JBWfz/mALeYLpbWThIM=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=IwisjWTjE52Zr1dxj8lKRWr2o8B/FsV33ULDN2QV+d/AZ60wpl7y7SqloFAFicVNB WYzC+/VH4/QdLMxEeWJ2qMMQOO70Qdn8jueqcxZk+alz86l7sa//ZofVr7thRDOltA xAZN4/cAKwS+ICO89FS4e8mulvz5F+dsXLtRen87+FGRAFX8KVSqWCUs6r5pkHoVnn jkZl5h0bPJOo/NsAKtgrTOVFcbn74wYWCAba6A6s+VJxXZCgVqG58SUnnWnpbnrPao bgXDaDmON9+itdfz6XPC612Xm1JJHDR8Ba7p4/z74pKJKnLXjOwIec0hi79FKdarJm e3rFNTL5rTcWw== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 07/14] nvme-tcp: register host mptcp transport Date: Mon, 25 May 2026 17:33:02 +0800 Message-ID: <55802b756982f001a2f37518a3dba448009dd538.1779701391.git.tanggeliang@kylinos.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang This patch defines a new nvmf_transport_ops named nvme_mptcp_transport, which is almost the same as nvme_tcp_transport except .name and .allowed_opts. MPTCP currently does not support TLS. The four TLS-related options (NVMF_OPT_TLS, NVMF_OPT_KEYRING, NVMF_OPT_TLS_KEY, and NVMF_OPT_CONCAT) have been removed from allowed_opts. They will be added back once MPTCP TLS is supported. It is registered in nvme_tcp_init_module() and unregistered in nvme_tcp_cleanup_module(). A MODULE_ALIAS("nvme-mptcp") declaration is added at the end of the file. 
Cc: Hannes Reinecke Co-developed-by: zhenwei pi Signed-off-by: zhenwei pi Co-developed-by: Hui Zhu Signed-off-by: Hui Zhu Co-developed-by: Gang Yan Signed-off-by: Gang Yan Signed-off-by: Geliang Tang --- drivers/nvme/host/tcp.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 13a5240623ef..305624d59c50 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -3067,6 +3067,20 @@ static struct nvmf_transport_ops nvme_tcp_transport = =3D { .create_ctrl =3D nvme_tcp_create_ctrl, }; =20 +#ifdef CONFIG_MPTCP +static struct nvmf_transport_ops nvme_mptcp_transport =3D { + .name =3D "mptcp", + .module =3D THIS_MODULE, + .required_opts =3D NVMF_OPT_TRADDR, + .allowed_opts =3D NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY | + NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO | + NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST | + NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES | + NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE, + .create_ctrl =3D nvme_tcp_create_ctrl, +}; +#endif + static int __init nvme_tcp_init_module(void) { unsigned int wq_flags =3D WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS; @@ -3092,6 +3106,9 @@ static int __init nvme_tcp_init_module(void) atomic_set(&nvme_tcp_cpu_queues[cpu], 0); =20 nvmf_register_transport(&nvme_tcp_transport); +#ifdef CONFIG_MPTCP + nvmf_register_transport(&nvme_mptcp_transport); +#endif return 0; } =20 @@ -3099,6 +3116,9 @@ static void __exit nvme_tcp_cleanup_module(void) { struct nvme_tcp_ctrl *ctrl; =20 +#ifdef CONFIG_MPTCP + nvmf_unregister_transport(&nvme_mptcp_transport); +#endif nvmf_unregister_transport(&nvme_tcp_transport); =20 mutex_lock(&nvme_tcp_ctrl_mutex); @@ -3116,3 +3136,6 @@ module_exit(nvme_tcp_cleanup_module); MODULE_DESCRIPTION("NVMe host TCP transport driver"); MODULE_LICENSE("GPL v2"); MODULE_ALIAS("nvme-tcp"); +#ifdef CONFIG_MPTCP +MODULE_ALIAS("nvme-mptcp"); +#endif --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from 
smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C35272EDD6B for ; Mon, 25 May 2026 09:33:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701638; cv=none; b=p3XyUXQ3FvtfMAJwrGxEUjcOGNL4eqqxru3XotNQy6Qc3zjxPCyXwbNRgRJR0bWjPtooN+7IzRYCul9i9tpvj64MWHEsVe2hEOi1H+fA8qIo3we93G2Snet7yM2JwKlmT2IJ8uAVCGyXN0TV7igwTC/ZIM/57x4XGmCVZXkPTJA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701638; c=relaxed/simple; bh=aYHWoIXanMnv7qBBelhKhdB6SGqkDGOrxdzo8PJoaL8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jNx1RigHTO+rdPOM5m/9N0nszvi/UjFC6ROEcjR8woIajMpVe/bUBjWdpr5wrxTD5AmSdvgHU4rhonvdGc3RUxEsqN8hOFZEh/JTq/Bq9Vj31H8A+RDzpMVTnxdAtxDxUJjEJUYzkHNUYzFeIQzL61quFrBEDPyERu7AxFVAbig= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=T4DwK9KX; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="T4DwK9KX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 04ED81F000E9; Mon, 25 May 2026 09:33:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701636; bh=77zcnKNMFNSyGnZK3VynpGkDJG1w0Yt2ls1JAkvUxZM=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=T4DwK9KXAyhXlMSJUc64YCIXoNHZEXV5ZNm4I2FIicqscG2KITDAXE3iOlT5SaIhZ BhOLEIlafe52MBEtN/GMSvzdNExVUA4uHyfC9KoBk2zx/xRaUo/FcGDHLoCRLO7oo+ /v2NBBhNPGdms8xVz8qTO3TnCu/OJ/ICThPfjWuzilGlxLnHi0EZLrVTgfHQOK+Nm8 YW13F85QWOgQVddPQrSWANIODNGUZl7MT9IK5vCT4+lL7WGmF0WDl/NvnRXpsWUuHx 
//IcdakYuS40UJPmwZJ1x7gSLmpj99yG+BBq+i5Z5jMFjlvRPGWzH5jYTeMR0OpkE0 73g+/UMPSIurg== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 08/14] nvme-tcp: implement host mptcp proto Date: Mon, 25 May 2026 17:33:03 +0800 Message-ID: <5217d46d662389e8679a4838b26fa03d504d86b9.1779701391.git.tanggeliang@kylinos.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang An MPTCP-specific version of struct nvme_tcp_proto is implemented, and it is assigned to ctrl->proto when the transport string is "mptcp". The socket option setting logic is similar to the target side, except that mptcp_sock_set_syncnt is newly defined for the host side. These helpers set the values on all existing subflows of an MPTCP connection, except for set_reuseaddr which only applies to the first subflow. The values are then synchronized to other newly created subflows in sync_socket_options(). A separate nvme_mptcp_ctrl_ops structure with .name =3D "mptcp" is defined and used for MPTCP controllers. "mptcp" is planned to be introduced as a new NVMe transport type into the NVMe Base Specification in the future. 
Currently, the Discovery Log does not yet recognize trtype=3D4 (MPTCP), and will show "trtype: unrecognized" for such entries: =3D=3D=3D=3D=3DDiscovery Log Entry 0=3D=3D=3D=3D=3D=3D trtype: unrecognized adrfam: ipv4 subtype: current discovery subsystem treq: not specified, sq flow control disable supported portid: 23106 trsvcid: 23601 subnqn: nqn.2014-08.org.nvmexpress.discovery traddr: 10.1.1.1 eflags: none Cc: Hannes Reinecke Co-developed-by: zhenwei pi Signed-off-by: zhenwei pi Co-developed-by: Hui Zhu Signed-off-by: Hui Zhu Co-developed-by: Gang Yan Signed-off-by: Gang Yan Signed-off-by: Geliang Tang --- drivers/nvme/host/tcp.c | 34 ++++++++++++++++++++++++++++++++++ include/net/mptcp.h | 11 +++++++++++ net/mptcp/sockopt.c | 30 +++++++++++++++++++++++++++++- 3 files changed, 74 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 305624d59c50..2388a8c443cc 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -2895,6 +2895,24 @@ static const struct nvme_ctrl_ops nvme_tcp_ctrl_ops = =3D { .get_virt_boundary =3D nvmf_get_virt_boundary, }; =20 +#ifdef CONFIG_MPTCP +static const struct nvme_ctrl_ops nvme_mptcp_ctrl_ops =3D { + .name =3D "mptcp", + .module =3D THIS_MODULE, + .flags =3D NVME_F_FABRICS | NVME_F_BLOCKING, + .reg_read32 =3D nvmf_reg_read32, + .reg_read64 =3D nvmf_reg_read64, + .reg_write32 =3D nvmf_reg_write32, + .subsystem_reset =3D nvmf_subsystem_reset, + .free_ctrl =3D nvme_tcp_free_ctrl, + .submit_async_event =3D nvme_tcp_submit_async_event, + .delete_ctrl =3D nvme_tcp_delete_ctrl, + .get_address =3D nvme_tcp_get_address, + .stop_ctrl =3D nvme_tcp_stop_ctrl, + .get_virt_boundary =3D nvmf_get_virt_boundary, +}; +#endif + static bool nvme_tcp_existing_controller(struct nvmf_ctrl_options *opts) { @@ -2923,6 +2941,18 @@ static const struct nvme_tcp_proto nvme_tcp_proto = =3D { =20 }; =20 +#ifdef CONFIG_MPTCP +static const struct nvme_tcp_proto nvme_mptcp_proto =3D { + .protocol =3D 
IPPROTO_MPTCP, + .set_syncnt =3D mptcp_sock_set_syncnt, + .set_nodelay =3D mptcp_sock_set_nodelay, + .no_linger =3D mptcp_sock_no_linger, + .set_priority =3D mptcp_sock_set_priority, + .set_tos =3D __mptcp_sock_set_tos, + .ops =3D &nvme_mptcp_ctrl_ops, +}; +#endif + static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev, struct nvmf_ctrl_options *opts) { @@ -2989,6 +3019,10 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(str= uct device *dev, =20 if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) { ctrl->proto =3D &nvme_tcp_proto; +#ifdef CONFIG_MPTCP + } else if (!strcmp(ctrl->ctrl.opts->transport, "mptcp")) { + ctrl->proto =3D &nvme_mptcp_proto; +#endif } else { ret =3D -EINVAL; goto out_free_ctrl; diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 8eacb9424b37..9d5f0bf49d31 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -246,7 +246,11 @@ void mptcp_sock_set_priority(struct sock *sk, u32 prio= rity); =20 void mptcp_sock_no_linger(struct sock *sk); =20 +void __mptcp_sock_set_tos(struct sock *sk, int val); + void mptcp_sock_set_tos(struct sock *sk); + +int mptcp_sock_set_syncnt(struct sock *sk, int val); #else =20 static inline void mptcp_init(void) @@ -342,7 +346,14 @@ static inline void mptcp_sock_set_priority(struct sock= *sk, u32 priority) { } =20 static inline void mptcp_sock_no_linger(struct sock *sk) { } =20 +static inline void __mptcp_sock_set_tos(struct sock *sk, int val) { } + static inline void mptcp_sock_set_tos(struct sock *sk) { } + +static inline int mptcp_sock_set_syncnt(struct sock *sk, int val) +{ + return 0; +} #endif /* CONFIG_MPTCP */ =20 #if IS_ENABLED(CONFIG_MPTCP_IPV6) diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 730e945bf8cd..8ed7c8b875ed 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -1598,6 +1598,8 @@ static void sync_socket_options(struct mptcp_sock *ms= k, struct sock *ssk) WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_p= ort_range)); =20 
ssk->sk_reuse =3D sk->sk_reuse; + if (inet_csk(sk)->icsk_syn_retries > 0) + tcp_sock_set_syncnt(ssk, inet_csk(sk)->icsk_syn_retries); } =20 void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk) @@ -1749,7 +1751,7 @@ void mptcp_sock_no_linger(struct sock *sk) } EXPORT_SYMBOL(mptcp_sock_no_linger); =20 -static void __mptcp_sock_set_tos(struct sock *sk, int val) +void __mptcp_sock_set_tos(struct sock *sk, int val) { struct mptcp_sock *msk =3D mptcp_sk(sk); struct mptcp_subflow_context *subflow; @@ -1768,6 +1770,7 @@ static void __mptcp_sock_set_tos(struct sock *sk, int= val) } release_sock(sk); } +EXPORT_SYMBOL(__mptcp_sock_set_tos); =20 void mptcp_sock_set_tos(struct sock *sk) { @@ -1783,3 +1786,28 @@ void mptcp_sock_set_tos(struct sock *sk) __mptcp_sock_set_tos(sk, val); } EXPORT_SYMBOL(mptcp_sock_set_tos); + +int mptcp_sock_set_syncnt(struct sock *sk, int val) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + struct mptcp_subflow_context *subflow; + struct sock *ssk; + + if (val < 1 || val > MAX_TCP_SYNCNT) + return -EINVAL; + + lock_sock(sk); + sockopt_seq_inc(msk); + inet_csk(sk)->icsk_syn_retries =3D val; + mptcp_for_each_subflow(msk, subflow) { + ssk =3D mptcp_subflow_tcp_sock(subflow); + if (ssk) { + lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); + tcp_sock_set_syncnt(ssk, val); + release_sock(ssk); + } + } + release_sock(sk); + return 0; +} +EXPORT_SYMBOL(mptcp_sock_set_syncnt); --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D22CD3537ED for ; Mon, 25 May 2026 09:33:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701640; cv=none; 
b=KvV1CYNpnkD/XxpZ+GtiKEsmxTqQGmDATR20/+T/dHcazkL64ILKbI4FiTHx9LF57Hvk8BNz/xDxFO0Kc8glVqZAVlGHwlWO7EXmdSGuSm+EVunIpQSWNozDDH9IYZzmu16NKvAgQVQEjscFG16lIO5/ABbKFJE65N6CCvpt3dw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701640; c=relaxed/simple; bh=G3OApdTCjuOewvHSYGBUUDZYzD7d0WKMs9vTBUKKWpw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Yo51sWivbMopAlkS96sW70hG+7BQNnQhyjSc3Co8A8ECQ67mIknW8pXPcr6OljduVuiZGvjWE5DczcdIQSjLMwqzzFeo98arp58+5gmpxbJZFnmcR83t8EvTiBwpfOc4bbJAqJvD4gSLWc8/hsDNPGWnUZBt3DK8iWfUBJvVasU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Slrcxq8J; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Slrcxq8J" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 169F01F00A3A; Mon, 25 May 2026 09:33:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701639; bh=9E98rJDebtkrmniVOv6GhgzET2aDoHLnpUSZmwh9LEI=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=Slrcxq8JvZh5lGd8topxtIr9o19Cz5cq3jR0AjQOuGQ31qQfMGqJ0aJi2jZdbtdQY ry7a3sSDFI/Z8mH0NfW7dG/kQRZBkeBKkiNYZPd7raPDFyOwm9Kn7mJ5zMNmq8n+CG 9deIzdu0oZcH5LeXdJTpzZ0zmJAfM0VQL7mh2GMVIddPBxqHVrHtMu7BzPs2G+ywS5 IhRaB7X8/muCvGO2rm25AY98AXqgqpbt4bAnJ6tiefTNk0x4ESILJNJN5PEfQ3UOBo c4IRmeZCL5fmQ3F/XePSRsa4e93YAw44Z7VC2xZZtz/qR3jLxOMqnTtb/61y4Yvidz 2h/9iiBUN2now== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 09/14] nvme-fabrics: compare transport in ip_options_match Date: Mon, 25 May 2026 17:33:04 +0800 Message-ID: <37a0840daaf0668b42c86f975a91edb554d3f3a5.1779701391.git.tanggeliang@kylinos.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: 
mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang When checking for an existing controller, nvmf_ip_options_match() does not compare the transport type. This can cause a TCP connection request to incorrectly match an existing MPTCP controller, or an MPTCP connection request to match an existing TCP controller, resulting in a false -EALREADY error. Fix this by adding strcmp(opts->transport, ctrl->opts->transport) to the matching condition. Cc: Hannes Reinecke Cc: zhenwei pi Cc: Hui Zhu Cc: Gang Yan Signed-off-by: Geliang Tang --- drivers/nvme/host/fabrics.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index ac3d4f400601..e086e61e8f94 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -1220,6 +1220,7 @@ bool nvmf_ip_options_match(struct nvme_ctrl *ctrl, struct nvmf_ctrl_options *opts) { if (!nvmf_ctlr_matches_baseopts(ctrl, opts) || + strcmp(opts->transport, ctrl->opts->transport) || strcmp(opts->traddr, ctrl->opts->traddr) || strcmp(opts->trsvcid, ctrl->opts->trsvcid)) return false; --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5BCD72EDD6B for ; Mon, 25 May 2026 09:34:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701643; cv=none; b=IXWK6ks7MpkaMMeLyY0wyCX0kfHRRBOGb/V1JmeaxGJJvx9u0fESVAoISalusRLKLUXJp7T0xWEr3YbKUGR57wyMMaygXcVzCWsDWYVIlsnW6F+rnbOL0NDnUx/3aYywOc6uER/jewOhLdtl3it/LUQgRsp4hVfY9gPYV82qGGU= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1779701643; c=relaxed/simple; bh=sPCvndLtC4IDi+Y1nU54FY1r59m530pzpVWMvk1DBEI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=HTCPGUhANNsIcXPvYEdhC9wXfpL9wcPkjbvH2iwlwTTqgIvwRGp8Jw/rxMVGshPX3nZ77J/r41Guu3puML5tV8IftLCF/dIcpMUzAlFBhYJ8kj3YU7sMKGrCCAWCGiOWAzqeJ0GdrOG0OSQESJFN1bVPAQwzEGzbxh9H0dy6dQw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fIZrDUQh; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fIZrDUQh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 28C231F000E9; Mon, 25 May 2026 09:33:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701642; bh=by91ApEM8gvLFvbo0HM1CsSaGopdTBbRJHx2NM4gPpw=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=fIZrDUQha8pGuXQINALJfBUdH2RiSRU6/4mJON1ZZPQwHVCN9WZ5TybLJmfRSzRke S3xF12u2WM/NywqNhVBUhCZC0W1VyiNapQBF+WkLeeviMPQRw2kmH9VCF/6LgxJcC3 hfX8okWIwvrskTWB7ooj2gjZMrajX6Us0j0MK9m86hent3nltpr/2qpbT/px5Y5j30 UrFomKz8JNafKRNoTKIbQTHZx9JatuYRhGwAk9VkQxbHwgiFbtJMAjngyQxo4AXHtI /o13f7cdFvXgpftpsZkNJpJZbZPqUBpo0vs3A7eTYFg0leovsMZoHutKoZGiGf8gs0 fKHecPYsKQoEQ== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 10/14] selftests: mptcp: add nvme over mptcp test Date: Mon, 25 May 2026 17:33:05 +0800 Message-ID: <7716719744dd9c5ddf67faa155982903cbfb6e71.1779701391.git.tanggeliang@kylinos.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang A test case for NVMe over MPTCP has been implemented. 
It verifies that the nvme discover and connect commands can establish NVMe over MPTCP connections. The test then measures read/write performance using fio, and ensures proper cleanup with nvme disconnect.

This script accepts two positional parameters:

 trtype - Transport type (mptcp|tcp). Default: mptcp
 path - Number of NVMe multipath paths (1-4). Default: 1

This test simulates four NICs on both target and host sides, each limited to 125MB/s. With a single NVMe path (path=3D1), 'NVMe over MPTCP' delivered nearly four times the bandwidth of standard TCP (about 3.8x for reads and 3.5x for writes in the runs below):

 # ./mptcp_nvme.sh tcp
 READ: bw=3D112MiB/s (118MB/s), 112MiB/s-112MiB/s (118MB/s-118MB/s), io=3D1123MiB (1177MB), run=3D10018-10018msec
 WRITE: bw=3D112MiB/s (117MB/s), 112MiB/s-112MiB/s (117MB/s-117MB/s), io=3D1118MiB (1173MB), run=3D10018-10018msec

 # ./mptcp_nvme.sh mptcp
 READ: bw=3D427MiB/s (448MB/s), 427MiB/s-427MiB/s (448MB/s-448MB/s), io=3D4286MiB (4494MB), run=3D10039-10039msec
 WRITE: bw=3D387MiB/s (406MB/s), 387MiB/s-387MiB/s (406MB/s-406MB/s), io=3D3885MiB (4073MB), run=3D10043-10043msec

This indicates that MPTCP can aggregate bandwidth across multiple interfaces, as NVMe multipath does.
Cc: Hannes Reinecke Co-developed-by: zhenwei pi Signed-off-by: zhenwei pi Co-developed-by: Hui Zhu Signed-off-by: Hui Zhu Co-developed-by: Gang Yan Signed-off-by: Gang Yan Signed-off-by: Geliang Tang --- tools/testing/selftests/net/mptcp/Makefile | 1 + tools/testing/selftests/net/mptcp/config | 8 + .../testing/selftests/net/mptcp/mptcp_lib.sh | 12 + .../testing/selftests/net/mptcp/mptcp_nvme.sh | 329 ++++++++++++++++++ 4 files changed, 350 insertions(+) create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/sel= ftests/net/mptcp/Makefile index 22ba0da2adb8..7b308447a58b 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -13,6 +13,7 @@ TEST_PROGS :=3D \ mptcp_connect_sendfile.sh \ mptcp_connect_splice.sh \ mptcp_join.sh \ + mptcp_nvme.sh \ mptcp_sockopt.sh \ pm_netlink.sh \ simult_flows.sh \ diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selft= ests/net/mptcp/config index 59051ee2a986..e59cf7398f19 100644 --- a/tools/testing/selftests/net/mptcp/config +++ b/tools/testing/selftests/net/mptcp/config @@ -34,3 +34,11 @@ CONFIG_NFT_SOCKET=3Dm CONFIG_NFT_TPROXY=3Dm CONFIG_SYN_COOKIES=3Dy CONFIG_VETH=3Dy +CONFIG_BLK_DEV_LOOP=3Dy +CONFIG_CONFIGFS_FS=3Dy +CONFIG_NVME_CORE=3Dy +CONFIG_NVME_FABRICS=3Dy +CONFIG_NVME_TCP=3Dy +CONFIG_NVME_TARGET=3Dy +CONFIG_NVME_TARGET_TCP=3Dy +CONFIG_NVME_MULTIPATH=3Dy diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing= /selftests/net/mptcp/mptcp_lib.sh index 5ef6033775c8..e08854ba42bd 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -530,6 +530,18 @@ mptcp_lib_check_tools() { exit ${KSFT_SKIP} fi ;; + "nvme") + if ! nvme --version &> /dev/null; then + mptcp_lib_pr_skip "nvme tool not found" + exit ${KSFT_SKIP} + fi + ;; + "fio") + if ! 
fio -h &> /dev/null; then + mptcp_lib_pr_skip "fio tool not found" + exit ${KSFT_SKIP} + fi + ;; *) mptcp_lib_pr_fail "Internal error: unsupported tool: ${tool}" exit ${KSFT_FAIL} diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testin= g/selftests/net/mptcp/mptcp_nvme.sh new file mode 100755 index 000000000000..5b1133dbc2d5 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh @@ -0,0 +1,329 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +. "$(dirname "$0")/mptcp_lib.sh" + +ret=3D0 +trtype=3D"${1:-mptcp}" +path=3D"${2:-1}" +nqn=3D"nqn.2014-08.org.nvmexpress.${trtype}dev.$$.${RANDOM}" +ns=3D1 +port=3D$((RANDOM % 10000 + 20000)) +trsvcid=3D$((RANDOM % 64512 + 1024)) +ns1=3D"" +ns2=3D"" +temp_file=3D"" +loop_dev=3D"" + +export trtype path nqn ns port trsvcid +export loop_dev temp_file + +usage() +{ + cat << EOF + +Usage: + + $(basename "$0") [trtype] [path] + + trtype Transport type (tcp|mptcp) - default: mptcp + path Number of multipath (1-4) - default: 1 + +EOF +exit ${KSFT_FAIL} +} + +validate_params() +{ + if [[ ! "${trtype}" =3D~ ^(tcp|mptcp)$ ]]; then + echo "Invalid trtype ${trtype}. Must be tcp or mptcp" + usage + fi + + if [[ ! "${path}" =3D~ ^[1-4]$ ]]; then + echo "Invalid path count ${path}. Must be between 1 and 4" + usage + fi +} + +# This function is invoked indirectly +#shellcheck disable=3DSC2317,SC2329 +ns1_cleanup() +{ + pushd /sys/kernel/config/nvmet || exit 1 + + for i in $(seq 1 "${path}"); do + local portdir=3D$((port + i)) + + rm -rf "ports/${portdir}/subsystems/${nqn}" + rmdir "ports/${portdir}" + done + + echo 0 > "subsystems/${nqn}/namespaces/${ns}/enable" + rmdir "subsystems/${nqn}/namespaces/${ns}" + rmdir "subsystems/${nqn}" + + popd || exit 1 +} + +# This function is invoked indirectly +#shellcheck disable=3DSC2317,SC2329 +ns2_cleanup() +{ + nvme disconnect -n "${nqn}" || true +} + +# This function is used in the cleanup trap +#shellcheck disable=3DSC2317,SC2329 +cleanup() +{ + if ! 
ip netns exec "$ns2" bash <<- EOF + $(declare -f ns2_cleanup) + ns2_cleanup + EOF + then + echo "ns2_cleanup failed" >&2 + fi + + sleep 1 + + if ! ip netns exec "$ns1" unshare -m bash <<- EOF + mount -t configfs none /sys/kernel/config + $(declare -f ns1_cleanup) + ns1_cleanup + EOF + then + echo "ns1_cleanup failed" >&2 + fi + + if [ -n "${loop_dev}" ] && [ -b "${loop_dev}" ]; then + losetup -d "${loop_dev}" 2>/dev/null || true + fi + rm -rf "${temp_file}" + + mptcp_lib_ns_exit "$ns1" "$ns2" + + unset -v trtype path nqn ns port trsvcid + unset -v loop_dev temp_file +} + +# $tc_args needs word splitting to pass multiple arguments to netem +# shellcheck disable=3DSC2086 +init() +{ + local tc_args=3D"rate 1000mbit" + + mptcp_lib_ns_init ns1 ns2 + + # ns1 ns2 + # 10.1.1.1 10.1.1.2 + # 10.1.2.1 10.1.2.2 + # 10.1.3.1 10.1.3.2 + # 10.1.4.1 10.1.4.2 + for i in {1..4}; do + ip link add ns1eth"$i" netns "$ns1" type veth peer \ + name ns2eth"$i" netns "$ns2" + ip -net "$ns1" addr add 10.1."$i".1/24 dev ns1eth"$i" + ip -net "$ns1" addr add dead:beef:"$i"::1/64 \ + dev ns1eth"$i" nodad + ip -net "$ns1" link set ns1eth"$i" up + ip -net "$ns2" addr add 10.1."$i".2/24 dev ns2eth"$i" + ip -net "$ns2" addr add dead:beef:"$i"::2/64 \ + dev ns2eth"$i" nodad + ip -net "$ns2" link set ns2eth"$i" up + ip -net "$ns2" route add default via 10.1."$i".1 \ + dev ns2eth"$i" metric 10"$i" + ip -net "$ns2" route add default via dead:beef:"$i"::1 \ + dev ns2eth"$i" metric 10"$i" + + # Add tc qdisc to both namespaces for bandwidth limiting + tc -n "$ns1" qdisc add dev ns1eth"$i" root netem $tc_args + tc -n "$ns2" qdisc add dev ns2eth"$i" root netem $tc_args + + tc -n "$ns1" qdisc show dev ns1eth"$i" + tc -n "$ns2" qdisc show dev ns2eth"$i" + done + + mptcp_lib_pm_nl_set_limits "${ns1}" 8 8 + + mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.1.1 flags signal + mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.2.1 flags signal + mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.3.1 flags signal + 
mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.4.1 flags signal + + mptcp_lib_pm_nl_set_limits "${ns2}" 8 8 + + mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.1.2 flags subflow + mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.2.2 flags subflow + mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.3.2 flags subflow + mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.4.2 flags subflow +} + +# This function is invoked indirectly +#shellcheck disable=3DSC2317,SC2329 +run_target() +{ + cd /sys/kernel/config/nvmet/subsystems || exit + mkdir -p "${nqn}" + cd "${nqn}" || exit + echo 1 > attr_allow_any_host + mkdir -p namespaces/"${ns}" + echo "${loop_dev}" > namespaces/"${ns}"/device_path + echo 1 > namespaces/"${ns}"/enable + + # Create ${path} ports, each on a different IP address + for i in $(seq 1 "${path}"); do + local portdir=3D$((port + i)) + + cd /sys/kernel/config/nvmet/ports || exit + mkdir -p "${portdir}" + cd "${portdir}" || exit 1 + echo "${trtype}" > addr_trtype + echo ipv4 > addr_adrfam + if [ "${path}" -eq 1 ]; then + echo "0.0.0.0" > addr_traddr + else + echo "10.1.${i}.1" > addr_traddr + fi + echo "${trsvcid}" > addr_trsvcid + + mkdir -p subsystems + ln -sf "../../subsystems/${nqn}" "subsystems/${nqn}" + cd - >/dev/null || exit + done +} + +# This function is invoked indirectly +#shellcheck disable=3DSC2317,SC2329 +run_host() +{ + local traddr=3D10.1.1.1 + local devname + + echo "nvme discover -a ${traddr}" + if ! nvme discover -t "${trtype}" -a "${traddr}" \ + -s "${trsvcid}"; then + echo "Failed to discover ${traddr}" + return 1 + fi + + for i in $(seq 1 "${path}"); do + traddr=3D10.1.${i}.1 + echo "Connecting to ${traddr}:${trsvcid}" + if ! 
nvme connect -t "${trtype}" -a "${traddr}" \ + -s "${trsvcid}" -n "${nqn}"; then + echo "Failed to connect to ${traddr}" + return 1 + fi + done + + for i in $(seq 1 10); do + for dev in /dev/nvme*n1; do + if [ -b "$dev" ] 2>/dev/null; then + if nvme id-ctrl "$dev" 2>/dev/null | + grep -q "${nqn}"; then + devname=3D$(basename "$dev") + break 2 + fi + fi + done 2>/dev/null + [ -n "$devname" ] && break + sleep 1 + done + + if [ -z "$devname" ]; then + echo "No block device found for NQN ${nqn}" >&2 + return 1 + fi + + echo "nvme list" + if ! nvme list; then + echo "nvme list failed" >&2 + return 1 + fi + + sleep 1 + + echo "fio randread /dev/${devname}" + if ! fio --name=3Dglobal --direct=3D1 --norandommap --randrepeat=3D0 \ + --ioengine=3Dlibaio --thread=3D1 --blocksize=3D128k --runtime=3D10 \ + --time_based --rw=3Drandread --numjobs=3D4 --iodepth=3D256 \ + --group_reporting --size=3D100% \ + --name=3Dlibaio_4_256_128k_randread \ + --filename=3D"/dev/${devname}"; then + echo "fio randread failed" + return 1 + fi + + sleep 1 + + echo "fio randwrite /dev/${devname}" + if ! fio --name=3Dglobal --direct=3D1 --norandommap --randrepeat=3D0 \ + --ioengine=3Dlibaio --thread=3D1 --blocksize=3D128k --runtime=3D10 \ + --time_based --rw=3Drandwrite --numjobs=3D4 --iodepth=3D256 \ + --group_reporting --size=3D100% \ + --name=3Dlibaio_4_256_128k_randwrite \ + --filename=3D"/dev/${devname}"; then + echo "fio randwrite failed" + return 1 + fi + + nvme flush "/dev/${devname}" +} + +mptcp_lib_check_tools nvme fio +validate_params + +if ! temp_file=3D$(mktemp --suffix=3D.raw /tmp/nvme_test.XXXXXX); then + echo "Failed to create temp file" + exit 1 +fi + +trap cleanup EXIT + +if ! dd if=3D/dev/zero of=3D"${temp_file}" bs=3D1M count=3D0 seek=3D512; t= hen + echo "Failed to create backing file" >&2 + exit 1 +fi + +if ! loop_dev=3D$(losetup -f --show "${temp_file}"); then + echo "Failed to create loop device" >&2 + exit 1 +fi + +init + +run_test() +{ + if ! 
ip netns exec "$ns1" unshare -m bash <<- EOF + mount -t configfs none /sys/kernel/config + $(declare -f run_target) + run_target + exit \$? + EOF + then + ret=3D"${KSFT_FAIL}" + fi + + if ! ip netns exec "$ns2" bash <<- EOF + $(declare -f run_host) + run_host + exit \$? + EOF + then + ret=3D"${KSFT_FAIL}" + fi + + sleep 1 +} + +run_test "$@" + +if [ "${ret}" -eq 0 ]; then + mptcp_lib_result_pass "nvme over ${trtype} test" +else + mptcp_lib_result_fail "nvme over ${trtype} test" +fi + +mptcp_lib_result_print_all_tap +exit "$ret" --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E7513E314D for ; Mon, 25 May 2026 09:34:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701646; cv=none; b=u6WI1TYo1Keg2akN9TGh4WsFkyxKAdBr7cJQTUVq9PQMFOioLqIx6WBm/ib985e6wSAR678PSjgvPpXGn6T4pqMs92vVBJHhJ0+du94E5P8iwJIfYUQErg6EoW7+lO3tHlLoKwK4K45VNqxCvp/5YDY3sZUM/HdJE0yONVz35dA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701646; c=relaxed/simple; bh=0s7T3WkqK9bn/FdcmLdBcpEzRbaP8esJ8c0EGQL1GTE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Bap2fYoKGdqQtGDJ7rWLZAJx5CRGgqW9mphDWIoszoiJ7VVp8fi03V3ehasvM31YMTColEhdSF79QOtvv+sHVrAKvYZgKGobyC9TKKVSg0liwPCmla6mm71FfzNmP6QtdpLQKWRW4XHKEuD7CgbEj5eKpIwXbtlrlZ1D7EeFrls= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=k/g3gx2X; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="k/g3gx2X" Received: by 
smtp.kernel.org (Postfix) with ESMTPSA id E32131F00A3C; Mon, 25 May 2026 09:34:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701645; bh=z9wY9Ma0QN9MHEczeDOxCGf47kMA502HbBeSGdtk7rk=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=k/g3gx2X95ziPi1140+zwoanQCXbtLpIQOnG7qgq4dH7auo26TNshzrdcn5TGnIp1 Kt1PCvF+cVvwOa9KvS14YLOsTZ8FnfTMuZoSLcp9vuvMp6EqHX78zJUYggWdSh5CP+ EaOluXlGSMIEx0Q5aamkyHk/QUGfAO311WyRagkOhpunexVOsHy3QJeLElgtmPtf+g AIfb4cdFIPPDTdhB6sXvsYruUFtL8P5b9xlIFIm0qcFL4DuuZmPoQWUAL73Ar9dE05 jJTo//YKrDGovYZ5GF+d4ExGF01fgLwcBCbvCHKybPpXYg70GUjsW/Vr/UtOvDcPqV ne6VSzjTu5zlw== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 11/14] selftests: mptcp: nvme: add iopolicy tests Date: Mon, 25 May 2026 17:33:06 +0800 Message-ID: X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang Add NVMe iopolicy testing to mptcp_nvme.sh, with the default set to "numa". It can be set to "round-robin" or "queue-depth". Test results with 4 NVMe multipath paths and round-robin iopolicy show that TCP and MPTCP achieve similar bandwidth: # ./mptcp_nvme.sh tcp 4 round-robin READ: bw=3D455MiB/s (478MB/s), 455MiB/s-455MiB/s (478MB/s-478MB/s), io=3D4665MiB (4891MB), run=3D10242-10242msec WRITE: bw=3D455MiB/s (477MB/s), 455MiB/s-455MiB/s (477MB/s-477MB/s), io=3D4633MiB (4858MB), run=3D10184-10184msec # ./mptcp_nvme.sh mptcp 4 round-robin READ: bw=3D445MiB/s (466MB/s), 445MiB/s-445MiB/s (466MB/s-466MB/s), io=3D4575MiB (4797MB), run=3D10287-10287msec WRITE: bw=3D445MiB/s (467MB/s), 445MiB/s-445MiB/s (467MB/s-467MB/s), io=3D4572MiB (4794MB), run=3D10267-10267msec A "loss" argument is added to simulate network packet loss. 
When loss=3D1, each veth interface is configured with "delay 5ms loss 0.5%" using tc qdisc. Under this scenario, TCP throughput is roughly three to four times lower than MPTCP throughput: # ./mptcp_nvme.sh tcp 4 round-robin 1 READ: bw=3D144MiB/s (151MB/s), 144MiB/s-144MiB/s (151MB/s-151MB/s), io=3D1909MiB (2001MB), run=3D13231-13231msec WRITE: bw=3D100.0MiB/s (105MB/s), 100.0MiB/s-100.0MiB/s (105MB/s-105MB/s), io=3D1397MiB (1465MB), run=3D13980-13980msec # ./mptcp_nvme.sh mptcp 4 round-robin 1 READ: bw=3D428MiB/s (449MB/s), 428MiB/s-428MiB/s (449MB/s-449MB/s), io=3D4524MiB (4743MB), run=3D10564-10564msec WRITE: bw=3D431MiB/s (452MB/s), 431MiB/s-431MiB/s (452MB/s-452MB/s), io=3D4513MiB (4732MB), run=3D10481-10481msec These results suggest that MPTCP is more resilient to packet loss than TCP, as it can leverage multiple subflows to mitigate network degradation. Cc: Hannes Reinecke Co-developed-by: zhenwei pi Signed-off-by: zhenwei pi Co-developed-by: Hui Zhu Signed-off-by: Hui Zhu Co-developed-by: Gang Yan Signed-off-by: Gang Yan Signed-off-by: Geliang Tang --- .../testing/selftests/net/mptcp/mptcp_nvme.sh | 70 ++++++++++++++++++- 1 file changed, 69 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testin= g/selftests/net/mptcp/mptcp_nvme.sh index 5b1133dbc2d5..3ab04be05dff 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh @@ -6,6 +6,8 @@ ret=3D0 trtype=3D"${1:-mptcp}" path=3D"${2:-1}" +iopolicy=3D${3:-"numa"} # round-robin, queue-depth +loss=3D${4:-0} nqn=3D"nqn.2014-08.org.nvmexpress.${trtype}dev.$$.${RANDOM}" ns=3D1 port=3D$((RANDOM % 10000 + 20000)) @@ -17,6 +19,7 @@ loop_dev=3D"" =20 export trtype path nqn ns port trsvcid export loop_dev temp_file +export iopolicy loss =20 usage() { @@ -24,10 +27,12 @@ usage() =20 Usage: =20 - $(basename "$0") [trtype] [path] + $(basename "$0") [trtype] [path] [iopolicy] [loss] =20 trtype Transport type
(tcp|mptcp) - default: mptcp path Number of multipath (1-4) - default: 1 + iopolicy I/O policy (numa|round-robin|queue-depth) - default: numa + loss Enable packet loss (0|1) - default: 0 =20 EOF exit ${KSFT_FAIL} @@ -44,6 +49,16 @@ validate_params() echo "Invalid path count ${path}. Must be between 1 and 4" usage fi + + if [[ ! "${iopolicy}" =3D~ ^(numa|round-robin|queue-depth)$ ]]; then + echo "Invalid iopolicy ${iopolicy}." + usage + fi + + if [[ ! "${loss}" =3D~ ^[01]$ ]]; then + echo "Invalid loss value ${loss}. Must be 0 or 1" + usage + fi } =20 # This function is invoked indirectly @@ -105,6 +120,7 @@ cleanup() =20 unset -v trtype path nqn ns port trsvcid unset -v loop_dev temp_file + unset -v iopolicy loss } =20 # $tc_args needs word splitting to pass multiple arguments to netem @@ -113,6 +129,10 @@ init() { local tc_args=3D"rate 1000mbit" =20 + if [ "${loss}" -eq 1 ]; then + tc_args+=3D" delay 5ms loss 0.5%" + fi + mptcp_lib_ns_init ns1 ns2 =20 # ns1 ns2 @@ -193,6 +213,48 @@ run_target() done } =20 +# This function is invoked indirectly +#shellcheck disable=3DSC2317,SC2329 +set_io_policy() +{ + local nqn=3D"$1" + local iopolicy=3D"$2" + local subname + local policy + local current + + subname=3D$(nvme list-subsys 2>/dev/null | grep "${nqn}" | + grep -o 'nvme-subsys[0-9]*' | head -1) + if [ -z "$subname" ]; then + return 1 + fi + + policy=3D"/sys/class/nvme-subsystem/${subname}/iopolicy" + if [ ! -e "$policy" ]; then + # NVMe multipath not supported, skip iopolicy setting + return 0 + fi + + if [ ! -w "$policy" ]; then + return 1 + fi + + if ! echo "${iopolicy}" > "$policy" 2>/dev/null; then + return 1 + fi + + current=3D$(cat "$policy" 2>/dev/null) + if [ -z "$current" ]; then + return 1 + fi + + if [[ "$current" !=3D *"${iopolicy}"* ]]; then + return 1 + fi + + return 0 +} + # This function is invoked indirectly #shellcheck disable=3DSC2317,SC2329 run_host() @@ -242,6 +304,11 @@ run_host() return 1 fi =20 + if ! 
set_io_policy "${nqn}" "${iopolicy}"; then + echo "Failed to set I/O policy to ${iopolicy}" + return 1 + fi + sleep 1 =20 echo "fio randread /dev/${devname}" @@ -306,6 +373,7 @@ run_test() fi =20 if ! ip netns exec "$ns2" bash <<- EOF + $(declare -f set_io_policy) $(declare -f run_host) run_host exit \$? --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7D5B3E63B3 for ; Mon, 25 May 2026 09:34:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701649; cv=none; b=FDvXpLz66/2ebOdJvv3M/XDQ6gG5RcCwNdw5uZIGDasVCKcnsJsaV9A+/AjISmNNPQnmdH2JCvpMDJZXvyPnXGx/hE1AG+HV7EubSzm79MWgUGSDbncxGe76YERa1o7VPEHoJ7TqGZv0yWjzyw1VTvLei1mdzejgvxuNI7MwUIs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701649; c=relaxed/simple; bh=EtPqnx9xDbeYrG2da+/fXaOoIo5aL5hwuH9pFNnPEhk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KuhOs5eLoOnx6xKSJLExt8tmP3zZm0ymu2wIbVfxoFfyw71SRsnnx/dPqpIun+00/VKiGv3hMRj+b3UgONqE3trPtERTfSFnO6Xbgs0mQqkO6xa6mr9f9AIuw/jAcWOTwK3tKprNb1xXjMGOD804dXMJA3PoRFrelsVwh5NLFWM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DRjeG6eV; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DRjeG6eV" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 90FC21F000E9; Mon, 25 May 2026 09:34:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701647; 
bh=rsHMAHkJ4NGX23mCCuPPA0NOe0OeSYKZDS3Wd72PLcs=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=DRjeG6eVb1Q7iWLuWVxjl+fzoPIdvMuIq+lprk3vLK7LZAqgizQaG7nn0TzOzmtdU p6psfQkIMTiidUgrBDuSJDET1SC7LiqlzL1zbLS1AW6m0ECrlD4azguKI5Nk6LCbNO usa2znNByDh4iuA/haOqvXtdh66KDTbN8Ruqe1S/2iRsM3fRGsv07faBc5pU47yEDy pHeM1uyMfIxaa3S15l+eWvGicnCF7BkiAfE/qino0A+Pww1FqVs9SE/Z3CFXh0KSjw mtE9laRWnyI5gBCr4QBjDnmkdnilEm6qPtM0ZVppsFJJrzdq3+eOaD+OjXkPfg9Pzh IcUNTgOKCcIuQ== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 12/14] nvmet-tcp: check return value of nvmet_tcp_set_queue_sock Date: Mon, 25 May 2026 17:33:07 +0800 Message-ID: <236a9ed0a2ad14b534681e12970fc1139d19f7fa.1779701391.git.tanggeliang@kylinos.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang The return value of nvmet_tcp_set_queue_sock() is currently ignored in nvmet_tcp_tls_handshake_done(). If it fails (e.g., due to the socket not being in TCP_ESTABLISHED state), the socket callbacks will not be properly set, leading to queue and socket leakage. Fix this by capturing the return value and calling nvmet_tcp_schedule_release_queue() on failure to ensure proper cleanup. 
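The control flow described above can be pictured with a small userspace sketch. It is only an illustration under stated assumptions: struct hs_queue and every hs_* function are hypothetical stand-ins rather than the nvmet-tcp code, and -ENOTCONN is used merely as an example failure.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the queue; not the kernel structure. */
struct hs_queue {
	int sock_established;	/* 1 if the socket is still connected */
	int callbacks_set;	/* set once socket callbacks are installed */
	int release_scheduled;	/* set once teardown has been requested */
};

/* Hypothetical key lookup that always succeeds in this sketch. */
int hs_key_lookup(struct hs_queue *q)
{
	(void)q;
	return 0;
}

/* Installs callbacks only on a connected socket, else returns an error. */
int hs_set_queue_sock(struct hs_queue *q)
{
	if (!q->sock_established)
		return -ENOTCONN;
	q->callbacks_set = 1;
	return 0;
}

void hs_schedule_release(struct hs_queue *q)
{
	q->release_scheduled = 1;
}

/* Each step runs only if the previous one succeeded; any error releases. */
void hs_handshake_done(struct hs_queue *q, int status)
{
	if (!status)
		status = hs_key_lookup(q);
	if (!status)
		status = hs_set_queue_sock(q);
	if (status)
		hs_schedule_release(q);
}

/* Exercises both outcomes; assert() aborts if the flow is wrong. */
void hs_self_check(void)
{
	struct hs_queue ok = { .sock_established = 1 };
	struct hs_queue gone = { .sock_established = 0 };

	hs_handshake_done(&ok, 0);
	assert(ok.callbacks_set && !ok.release_scheduled);

	hs_handshake_done(&gone, 0);
	assert(!gone.callbacks_set && gone.release_scheduled);
}
```

Chaining every step through the same status variable keeps a single failure branch, so a step added later cannot be skipped by the cleanup path.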
Cc: Hannes Reinecke Cc: zhenwei pi Cc: Hui Zhu Cc: Gang Yan Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall") Signed-off-by: Geliang Tang --- drivers/nvme/target/tcp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 8c2dc4bcbcd3..1cf0ee464a22 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -1854,10 +1854,11 @@ static void nvmet_tcp_tls_handshake_done(void *data= , int status, if (!status) status =3D nvmet_tcp_tls_key_lookup(queue, peerid); =20 + if (!status) + status =3D nvmet_tcp_set_queue_sock(queue); + if (status) nvmet_tcp_schedule_release_queue(queue); - else - nvmet_tcp_set_queue_sock(queue); kref_put(&queue->kref, nvmet_tcp_release_queue); } =20 --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CB1083E5A38 for ; Mon, 25 May 2026 09:34:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701652; cv=none; b=DB7UvVUW/nEp/D0qbeEE9j8lwNF3Pc+ESEtyCKdIU2oJuaP0dnX8IlYVjKaNu3gPyUsY4BDCoRItOSQY7ARyiD5p0USS3WMD7Oi+bfig2BdxnA6PzGYApSc5Ufni12fZHpGAjLn5DHIbB+uSmOBPZV2Ims7TIpIEYxqbTTOWSg4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701652; c=relaxed/simple; bh=TaW2gJ+Xg5i3BerQbGr+5vKutvhz5JFjl2EuwCQnDL4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GSuv96zRTfQt23RnXPTuEVXSujFvv0ltYt8iTwwIqlLZfz+ihMvjCX/uAMWBwuXwZna2keRH1riBMX/6BrEkjIB5iwcCJNTxTf+47qTU+LnOl48cjC13+gZQVa7KFDbRy30JfTidH+nih0EgDgaeijaVimIyiQpkHfyvUIBPVlo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b=H5NtaK1O; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="H5NtaK1O" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3EF961F00A3A; Mon, 25 May 2026 09:34:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701650; bh=CPo8LVR5xUphb9eW21gS7U0P7l2x/dnnILCfYkOUsOk=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=H5NtaK1OptAe7VXP8Mo0QN/xJhuP5XtkHJLTPJrPk+7+YFjnffH1Ol/P1FnKUpAw2 Yw6Pdq6Pk8qRkin3WXqnj05tJ9pubf22yLHzZtJYCQmNEHMZuHaM/B2gi8HUbk94wx 4IFIzJNp+PuR/LQ+3MuCNmvoQcfN/khGNu2NBKuPuZ8hO8jv/2hQjRo3HXwkYZKaoJ 3H+QE9Ye38Ban6V5ZGPT/id56cE5qJbTGd7ciWxrAqYib05+VXc/mTIXNHI3ctA/PK G6pmzg+gTsgDxFtH+t7IFillSw4WPj5mJjFOGHKE/3oQV2cotD+XOMspmDw5BlzzjN H+GOM+ENHytSg== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 13/14] nvmet-tcp: fix page fragment cache leak in error path Date: Mon, 25 May 2026 17:33:08 +0800 Message-ID: X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang In nvmet_tcp_alloc_queue(), when a connection is closed during the allocation process (e.g., nvmet_tcp_set_queue_sock() returns -ENOTCONN), the error handling jumps to out_free_queue without draining the page fragment cache. Although nvmet_tcp_free_cmd() is called in some error paths to release individual page fragments, the underlying page cache reference held by queue->pf_cache is never released. This results in a page leak each time a connection fails during allocation, which could lead to memory exhaustion over time if connections are repeatedly opened and closed. 
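The ownership problem described above can be pictured with a userspace sketch of a goto-based cleanup ladder. It is a simplified model under stated assumptions: struct pf_cache, struct pf_queue and the pf_* helpers are hypothetical, malloc() stands in for a page reference, and nothing here is the kernel page_frag_cache API.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel page_frag_cache API. */
struct pf_cache { void *page; };
struct pf_queue { struct pf_cache cache; };

int pf_live_pages;	/* pages currently held by any cache */

/* Takes one "page" into the cache, as a first fragment allocation would. */
void pf_cache_fill(struct pf_cache *c)
{
	c->page = malloc(4096);
	if (c->page)
		pf_live_pages++;
}

/* Drops the page held by the cache, if any. */
void pf_cache_drain(struct pf_cache *c)
{
	if (!c->page)
		return;
	free(c->page);
	c->page = NULL;
	pf_live_pages--;
}

/* Returns 0 on success or a negative errno; never leaves a page behind. */
int pf_alloc_queue(int sock_connected)
{
	struct pf_queue *q = calloc(1, sizeof(*q));
	int ret = 0;

	if (!q)
		return -ENOMEM;

	pf_cache_fill(&q->cache);	/* cache now holds a page */

	if (!sock_connected) {
		ret = -ENOTCONN;
		goto out_free_queue;
	}

	/* A real queue would live on; the sketch tears it down right away. */
out_free_queue:
	pf_cache_drain(&q->cache);	/* drop the cache's page first */
	free(q);			/* then free the owner */
	return ret;
}

/* Checks that the error path leaves no page outstanding. */
void pf_self_check(void)
{
	assert(pf_alloc_queue(0) == -ENOTCONN);
	assert(pf_live_pages == 0);
}
```

Freeing the owning structure without draining first would lose the only pointer to the cached page, which is the kind of leak this sketch is meant to make visible.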
Fix this by calling page_frag_cache_drain() before freeing the queue structure in the out_free_queue label. Cc: Hannes Reinecke Cc: zhenwei pi Cc: Hui Zhu Cc: Gang Yan Fixes: 872d26a391da ("nvmet-tcp: add NVMe over TCP target driver") Signed-off-by: Geliang Tang --- drivers/nvme/target/tcp.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c index 1cf0ee464a22..500793b56983 100644 --- a/drivers/nvme/target/tcp.c +++ b/drivers/nvme/target/tcp.c @@ -2047,6 +2047,7 @@ static void nvmet_tcp_alloc_queue(struct nvmet_tcp_po= rt *port, out_sock: fput(queue->sock->file); out_free_queue: + page_frag_cache_drain(&queue->pf_cache); kfree(queue); out_release: pr_err("failed to allocate queue, error %d\n", ret); --=20 2.53.0 From nobody Mon May 25 18:05:06 2026 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 829C43E3DB8 for ; Mon, 25 May 2026 09:34:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701654; cv=none; b=PcNZ6WqZM8Gubp5U8dTiGSzkgM8F0YrWD9enLnD5243tlMoj6fuYscRafcFGJhH0ytVdzepBO+muMyRraIU9jdvp+Utr2/f7iT7H8t89Fr2lvOLG107frz1nA+6LGVEh5zbAQg+48SlRuYxypS7MEX7TPNRURN2Ij2u572C8Axk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779701654; c=relaxed/simple; bh=Vk22Xl9WEDtsqiVP3W/98LneQ5j71Y7CNTPMxMRjysA=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=i28UO+Q3o3QzGpzCnyMTrl3AwLniQz+KLWBwIipYOKolW7sis2nzn9Uc1SFPtpkrr9C1VjEbbiQjr9Euf0azLF+s76WfLz3se0YLFkgPGrKKnMNCXFRcpmzSa2AhrTnp1E8OWOhPKEsNexMeyHiUzT0sX7V0bciUZSPpuJGvaVY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org 
header.i=@kernel.org header.b=N+VJOAoC; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="N+VJOAoC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1DEF11F000E9; Mon, 25 May 2026 09:34:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779701653; bh=JwpmLWzrcZiTQG7mv2NOc+FB61GC3uj+q2Noe/Sk/Pw=; h=From:To:Cc:Subject:Date:In-Reply-To:References; b=N+VJOAoC291ivhi8N/nlOxaCEMlEXcRm7xiphm4lp4oJ3hETHRNuTMJOYhbpKfXwa Cu3agLqN8bab2sRCA6zZjsT+iZSTSd3T5VWDyR1bPApJqgqjcOE3SHxmNSflPgJDeo +KPPsi4HNWB0HKu+9F7uIRm3pvBsMd81Fm0FC5WZjIb9lRqQYGS4tnYLve6pt4nBO5 bjraI5HgW6UzQDhul80IwAKZSnaXhoJKZhs21v+RMtHfzWFhvThfHjXzvJD9NQISzl gTfgJbye81/U53LI5PgEGpH1XlXyieVm7TnyVqugF2KnVMLfT94AO/xLV1jm+F1aOm INCkjFxcAn52Q== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v17 14/14] nvme-tcp: add RCU protection for host_iface validation Date: Mon, 25 May 2026 17:33:09 +0800 Message-ID: <9f89cc6c87d4b65603b686df0c6f339c745f249f.1779701391.git.tanggeliang@kylinos.cn> X-Mailer: git-send-email 2.53.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Geliang Tang When the host_iface option is specified, nvme_tcp_alloc_ctrl() uses __dev_get_by_name(&init_net, opts->host_iface) to validate the interface. This has two issues: it hardcodes init_net instead of using the caller's network namespace, and it lacks RCU protection required by __dev_get_by_nam= e(). Fix by using dev_get_by_name_rcu() with current->nsproxy->net_ns and wrapping the lookup with rcu_read_lock()/rcu_read_unlock(). 
Cc: Hannes Reinecke Cc: zhenwei pi Cc: Hui Zhu Cc: Gang Yan Fixes: 8b43ced64d2b ("nvme-tcp: use __dev_get_by_name instead dev_get_by_na= me for OPT_HOST_IFACE") Signed-off-by: Geliang Tang --- drivers/nvme/host/tcp.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c index 2388a8c443cc..a600fa3f9caa 100644 --- a/drivers/nvme/host/tcp.c +++ b/drivers/nvme/host/tcp.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -3004,12 +3005,16 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(st= ruct device *dev, } =20 if (opts->mask & NVMF_OPT_HOST_IFACE) { - if (!__dev_get_by_name(&init_net, opts->host_iface)) { + rcu_read_lock(); + if (!dev_get_by_name_rcu(current->nsproxy->net_ns, + opts->host_iface)) { + rcu_read_unlock(); pr_err("invalid interface passed: %s\n", opts->host_iface); ret =3D -ENODEV; goto out_free_ctrl; } + rcu_read_unlock(); } =20 if (!opts->duplicate_connect && nvme_tcp_existing_controller(opts)) { --=20 2.53.0