From: Geliang Tang
To: mptcp@lists.linux.dev, hare@suse.de, hare@kernel.org
Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
Subject: [RFC mptcp-next 1/6] mptcp: add sock_set_nodelay
Date: Fri, 7 Nov 2025 11:37:32 +0800

From: Geliang Tang

This patch adds support for the TCP_NODELAY socket option in MPTCP by
introducing a new mptcp_sock_set_nodelay() function. The implementation:

1. Exposes the function in include/net/mptcp.h for external use.
2. Provides the actual implementation in protocol.c, which:
   - locks the MPTCP socket (msk),
   - retrieves the first subflow socket with __mptcp_nmpc_sk(),
   - sets TCP_NODELAY on that subflow with __tcp_sock_set_nodelay(), and
   - releases the subflow and msk locks, skipping the update if no
     subflow can be obtained.

The function follows MPTCP's usual pattern for changing subflow options
while holding the proper socket locks. Calling it disables Nagle's
algorithm on the underlying TCP subflow, which can reduce latency for
applications that send many small messages.
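For comparison, userspace reaches the same option through setsockopt(). The sketch below is illustrative only and not part of this patch: open_stream_nodelay() is a hypothetical helper that tries an IPPROTO_MPTCP socket first, falls back to plain TCP if the kernel rejects MPTCP or the option, and enables TCP_NODELAY with the usual IPPROTO_TCP level. It assumes Linux headers; IPPROTO_MPTCP is defined as 262 only if the C library does not already provide it.

```c
#include <stddef.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux UAPI value, for older C libraries */
#endif

/*
 * Hypothetical helper: open an IPv4 stream socket, preferring MPTCP and
 * falling back to TCP, with Nagle's algorithm disabled (TCP_NODELAY).
 * On success, returns the fd and stores the protocol used in *proto.
 * Returns -1 if neither protocol works.
 */
static int open_stream_nodelay(int *proto)
{
	static const int protos[] = { IPPROTO_MPTCP, IPPROTO_TCP };
	const int one = 1;

	for (size_t i = 0; i < sizeof(protos) / sizeof(protos[0]); i++) {
		int fd = socket(AF_INET, SOCK_STREAM, protos[i]);

		if (fd < 0)
			continue; /* e.g. kernel built without CONFIG_MPTCP */

		/* Same option level and name for MPTCP and TCP sockets. */
		if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) == 0) {
			*proto = protos[i];
			return fd;
		}
		close(fd);
	}
	return -1;
}
```

The fallback loop keeps the sketch usable on kernels without MPTCP; an in-kernel caller such as nvme-tcp instead picks the helper explicitly, as the later patches in this series do.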
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Signed-off-by: Geliang Tang
---
 include/net/mptcp.h  |  4 ++++
 net/mptcp/protocol.c | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4cf59e83c1c5..f275eae0d32f 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -237,6 +237,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
 }
 
 void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
+
+void mptcp_sock_set_nodelay(struct sock *sk);
 #else
 
 static inline void mptcp_init(void)
@@ -323,6 +325,8 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
 static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return htonl(0u); }
 
 static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
+
+static void mptcp_sock_set_nodelay(struct sock *sk) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e796c9c7971d..b2285b651ebc 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3696,6 +3696,24 @@ void mptcp_sock_graft(struct sock *sk, struct socket *parent)
 	write_unlock_bh(&sk->sk_callback_lock);
 }
 
+void mptcp_sock_set_nodelay(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+
+	lock_sock(ssk);
+	__tcp_sock_set_nodelay(ssk, true);
+	release_sock(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+
 bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-- 
2.43.0

From: Geliang Tang
To: mptcp@lists.linux.dev, hare@suse.de, hare@kernel.org
Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
Subject: [RFC mptcp-next 2/6] nvmet-tcp: add mptcp support
Date: Fri, 7 Nov 2025 11:37:33 +0800
Message-ID: <188ee24f0e00c882748ba8765b8459d226ed554f.1762485513.git.tanggeliang@kylinos.cn>

From: Geliang Tang

This patch adds a new NVMe target transport type, NVMF_TRTYPE_MPTCP, for
MPTCP, and defines a new nvmet_fabrics_ops named nvmet_mptcp_ops, which
is almost the same as nvmet_tcp_ops except for .type.

nvmet_tcp_add_port() checks whether disc_addr.trtype is
NVMF_TRTYPE_MPTCP to decide whether to pass IPPROTO_MPTCP to
sock_create(), creating an MPTCP socket instead of a TCP one.

nvmet_tcp_done_recv_pdu() then selects nvmet_mptcp_ops or nvmet_tcp_ops
according to the protocol of the queue's socket.

v2:
 - use trtype instead of tsas (Hannes).

v3:
 - check mptcp protocol from disc_addr.trtype instead of passing a
   parameter (Hannes).

v4:
 - check CONFIG_MPTCP.

v5:
 - Thanks to Hui Zhu for helping me debug the following list corruption
   issue using gdb:

[ 13.043520][ T179] nvmet: adding nsid 1 to subsystem nqn.2014-08.org.nvmexpress.mptcpdev
[ 13.197544][ T181] nvmet_tcp: enabling port 1234 (127.0.0.1:4420)
[ 13.395800][ T182] slab MPTCP start ffff8880108f0b80 pointer offset 2480 size 2816
[ 13.396422][ T182] list_add corruption. prev->next should be next (ffff8880108f1530), but was ffff8885108f1530. (prev=ffff8880108f1530).
[ 13.397064][ T182] ------------[ cut here ]------------
[ 13.397305][ T182] kernel BUG at lib/list_debug.c:32!
[ 13.397668][ T182] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
[ 13.397914][ T182] CPU: 1 UID: 0 PID: 182 Comm: nvme Not tainted 6.16.0-rc3+ #1 PREEMPT(full)
[ 13.398282][ T182] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011

 - The issue was traced to tcp_sock_set_nodelay(): it treats the socket
   as a struct tcp_sock, which an MPTCP socket is not, so it must not be
   called on MPTCP sockets. An MPTCP-specific helper is needed instead.

Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Signed-off-by: Geliang Tang
---
 drivers/nvme/host/tcp.c        |  4 +++-
 drivers/nvme/target/configfs.c |  1 +
 drivers/nvme/target/tcp.c      | 34 ++++++++++++++++++++++++++++++++--
 include/linux/nvme.h           |  1 +
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 6795b8286c35..a80af6471b10 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #include "nvme.h"
 #include "fabrics.h"
@@ -1804,7 +1805,8 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 	tcp_sock_set_syncnt(queue->sock->sk, 1);
 
 	/* Set TCP no delay */
-	tcp_sock_set_nodelay(queue->sock->sk);
+	proto == IPPROTO_MPTCP ? mptcp_sock_set_nodelay(queue->sock->sk) :
+				 tcp_sock_set_nodelay(queue->sock->sk);
 
 	/*
 	 * Cleanup whatever is sitting in the TCP transmit queue on socket
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index e44ef69dffc2..14c642cd458e 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -37,6 +37,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
 	{ NVMF_TRTYPE_RDMA,	"rdma" },
 	{ NVMF_TRTYPE_FC,	"fc" },
 	{ NVMF_TRTYPE_TCP,	"tcp" },
+	{ NVMF_TRTYPE_MPTCP,	"mptcp" },
 	{ NVMF_TRTYPE_PCI,	"pci" },
 	{ NVMF_TRTYPE_LOOP,	"loop" },
 };
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index d543da09ef8e..066dd88e2449 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -212,6 +212,7 @@ static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
 
 static struct workqueue_struct *nvmet_tcp_wq;
 static const struct nvmet_fabrics_ops nvmet_tcp_ops;
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops;
 static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
 static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
 
@@ -999,6 +1000,7 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 {
 	struct nvme_tcp_hdr *hdr = &queue->pdu.cmd.hdr;
 	struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
+	const struct nvmet_fabrics_ops *ops;
 	struct nvmet_req *req;
 	int ret;
 
@@ -1039,7 +1041,9 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 	req = &queue->cmd->req;
 	memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
 
-	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
+	ops = queue->sock->sk->sk_protocol == IPPROTO_MPTCP ?
+		&nvmet_mptcp_ops : &nvmet_tcp_ops;
+	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, ops))) {
 		pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
 			req->cmd, req->cmd->common.command_id,
 			req->cmd->common.opcode,
@@ -2007,6 +2011,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 {
 	struct nvmet_tcp_port *port;
 	__kernel_sa_family_t af;
+	int proto = IPPROTO_TCP;
 	int ret;
 
 	port = kzalloc(sizeof(*port), GFP_KERNEL);
@@ -2027,6 +2032,11 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		goto err_port;
 	}
 
+#ifdef CONFIG_MPTCP
+	if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP)
+		proto = IPPROTO_MPTCP;
+#endif
+
 	ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
 			nport->disc_addr.trsvcid, &port->addr);
 	if (ret) {
@@ -2041,7 +2051,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
 
 	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
-				IPPROTO_TCP, &port->sock);
+				proto, &port->sock);
 	if (ret) {
 		pr_err("failed to create a socket\n");
 		goto err_port;
@@ -2193,6 +2203,19 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
 	.host_traddr		= nvmet_tcp_host_port_addr,
 };
 
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops = {
+	.owner			= THIS_MODULE,
+	.type			= NVMF_TRTYPE_MPTCP,
+	.msdbd			= 1,
+	.add_port		= nvmet_tcp_add_port,
+	.remove_port		= nvmet_tcp_remove_port,
+	.queue_response		= nvmet_tcp_queue_response,
+	.delete_ctrl		= nvmet_tcp_delete_ctrl,
+	.install_queue		= nvmet_tcp_install_queue,
+	.disc_traddr		= nvmet_tcp_disc_port_addr,
+	.host_traddr		= nvmet_tcp_host_port_addr,
+};
+
 static int __init nvmet_tcp_init(void)
 {
 	int ret;
@@ -2206,6 +2229,12 @@ static int __init nvmet_tcp_init(void)
 	if (ret)
 		goto err;
 
+	ret = nvmet_register_transport(&nvmet_mptcp_ops);
+	if (ret) {
+		nvmet_unregister_transport(&nvmet_tcp_ops);
+		goto err;
+	}
+
 	return 0;
 err:
 	destroy_workqueue(nvmet_tcp_wq);
@@ -2216,6 +2245,7 @@ static void __exit nvmet_tcp_exit(void)
 {
 	struct nvmet_tcp_queue *queue;
 
+	nvmet_unregister_transport(&nvmet_mptcp_ops);
 	nvmet_unregister_transport(&nvmet_tcp_ops);
 
 	flush_workqueue(nvmet_wq);
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 655d194f8e72..8069667ad47e 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -68,6 +68,7 @@ enum {
 	NVMF_TRTYPE_RDMA	= 1,	/* RDMA */
 	NVMF_TRTYPE_FC		= 2,	/* Fibre Channel */
 	NVMF_TRTYPE_TCP		= 3,	/* TCP/IP */
+	NVMF_TRTYPE_MPTCP	= 4,	/* Multipath TCP */
 	NVMF_TRTYPE_LOOP	= 254,	/* Reserved for host usage */
 	NVMF_TRTYPE_MAX,
 };
-- 
2.43.0

From: Geliang Tang
To: mptcp@lists.linux.dev, hare@suse.de, hare@kernel.org
Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
Subject: [RFC mptcp-next 3/6] mptcp: add sock_set_reuseaddr
Date: Fri, 7 Nov 2025 11:37:34 +0800
Message-ID: <32da8f2876b30d35602232e9021d312bd105b2de.1762485513.git.tanggeliang@kylinos.cn>

From: Geliang Tang

This patch adds support for the SO_REUSEADDR socket option in MPTCP by
introducing a new mptcp_sock_set_reuseaddr() function. The
implementation:

1. Exposes the function in include/net/mptcp.h for external use.
2. Provides the actual implementation in protocol.c, which:
   - locks the MPTCP socket (msk),
   - retrieves the first subflow socket with __mptcp_nmpc_sk(),
   - sets sk_reuse to SK_CAN_REUSE on that subflow, and
   - releases the msk lock, skipping the update if no subflow can be
     obtained.

The function follows the same pattern as the other MPTCP socket option
helpers, keeping the locking semantics intact while allowing address
reuse on the underlying TCP subflow.
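The userspace counterpart is setting SO_REUSEADDR before bind() on a listening socket. The sketch below is illustrative and not part of this patch: listen_reuseaddr() is a hypothetical helper that prefers MPTCP, falls back to TCP, binds to 127.0.0.1 on a kernel-chosen port, and reports which protocol and port were used. IPPROTO_MPTCP is defined as 262 only if the C library lacks it.

```c
#include <stddef.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux UAPI value, for older C libraries */
#endif

/*
 * Hypothetical helper: create a listening socket on 127.0.0.1 with
 * SO_REUSEADDR set before bind(), preferring MPTCP and falling back to
 * TCP. Port 0 lets the kernel pick a free port, reported in *port.
 * Returns the listening fd, or -1 on failure.
 */
static int listen_reuseaddr(int *proto, unsigned short *port)
{
	static const int protos[] = { IPPROTO_MPTCP, IPPROTO_TCP };
	const int one = 1;

	for (size_t i = 0; i < sizeof(protos) / sizeof(protos[0]); i++) {
		struct sockaddr_in addr;
		socklen_t len = sizeof(addr);
		int fd = socket(AF_INET, SOCK_STREAM, protos[i]);

		if (fd < 0)
			continue;

		memset(&addr, 0, sizeof(addr));
		addr.sin_family = AF_INET;
		addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
		addr.sin_port = htons(0);

		/* SO_REUSEADDR only affects bind() if set beforehand. */
		if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == 0 &&
		    bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
		    listen(fd, 16) == 0 &&
		    getsockname(fd, (struct sockaddr *)&addr, &len) == 0) {
			*proto = protos[i];
			*port = ntohs(addr.sin_port);
			return fd;
		}
		close(fd);
	}
	return -1;
}
```

The ordering mirrors nvmet_tcp_add_port(), which applies the reuse option to the listening socket before binding it.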
This lets MPTCP sockets bind to a local address and port that are still
held by earlier connections (for example, ones lingering in TIME_WAIT),
which is particularly useful for:
 - server applications that need to restart quickly
 - applications using well-known ports
 - multi-homed scenarios where the same port may be used across
   interfaces

Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Signed-off-by: Geliang Tang
---
 include/net/mptcp.h  |  4 ++++
 net/mptcp/protocol.c | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index f275eae0d32f..3488f3506a8e 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -239,6 +239,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
 void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
 
 void mptcp_sock_set_nodelay(struct sock *sk);
+
+void mptcp_sock_set_reuseaddr(struct sock *sk);
 #else
 
 static inline void mptcp_init(void)
@@ -327,6 +329,8 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return hto
 static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
 
 static void mptcp_sock_set_nodelay(struct sock *sk) { }
+
+static void mptcp_sock_set_reuseaddr(struct sock *sk) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b2285b651ebc..34aa3f13e5c1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3714,6 +3714,21 @@ void mptcp_sock_set_nodelay(struct sock *sk)
 }
 EXPORT_SYMBOL(mptcp_sock_set_nodelay);
 
+void mptcp_sock_set_reuseaddr(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	ssk->sk_reuse = SK_CAN_REUSE;
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+
 bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
-- 
2.43.0

From: Geliang Tang
To: mptcp@lists.linux.dev, hare@suse.de, hare@kernel.org
Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
Subject: [RFC mptcp-next 4/6] nvme-tcp: add mptcp support
Date: Fri, 7 Nov 2025 11:37:35 +0800

From: Geliang Tang

This patch defines a new nvmf_transport_ops named nvme_mptcp_transport,
which is almost the same as nvme_tcp_transport except for .name.

nvme_tcp_alloc_queue() checks whether opts->transport is "mptcp" to
decide whether to pass IPPROTO_MPTCP to sock_create_kern(), creating an
MPTCP socket instead of a TCP one.

On the target side, nvmet_tcp_add_port() now uses the MPTCP variants of
the reuseaddr and nodelay helpers when the listening socket is MPTCP.

v2:
 - use 'trtype' instead of '--mptcp' (Hannes)

v3:
 - check mptcp protocol from opts->transport instead of passing a
   parameter (Hannes).

v4:
 - check CONFIG_MPTCP.
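The protocol choice made on both sides can be modelled in a few lines. This is a hypothetical userspace model, not kernel code: host_transport_to_proto() mirrors the check in nvme_tcp_alloc_queue(), target_trtype_to_proto() mirrors the check in nvmet_tcp_add_port() from patch 2/6, and the have_mptcp flag stands in for the #ifdef CONFIG_MPTCP guard. The trtype values 3 and 4 are the ones this series uses for NVMF_TRTYPE_TCP and NVMF_TRTYPE_MPTCP.

```c
#include <stdbool.h>
#include <string.h>
#include <netinet/in.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262 /* Linux UAPI value, for older C libraries */
#endif

/* NVMF_TRTYPE_TCP and NVMF_TRTYPE_MPTCP as defined in this series. */
enum { TRTYPE_TCP = 3, TRTYPE_MPTCP = 4 };

/*
 * Host side: default to TCP, switch to MPTCP only for the "mptcp"
 * transport string. have_mptcp models #ifdef CONFIG_MPTCP.
 */
static int host_transport_to_proto(const char *transport, bool have_mptcp)
{
	int proto = IPPROTO_TCP;

	if (have_mptcp && strcmp(transport, "mptcp") == 0)
		proto = IPPROTO_MPTCP;
	return proto;
}

/* Target side: same rule, keyed on the port's discovery trtype. */
static int target_trtype_to_proto(int trtype, bool have_mptcp)
{
	int proto = IPPROTO_TCP;

	if (have_mptcp && trtype == TRTYPE_MPTCP)
		proto = IPPROTO_MPTCP;
	return proto;
}
```

One consequence the model makes visible: as the diffs are written, the "mptcp" transport is registered even when CONFIG_MPTCP is disabled, and in that case it silently creates plain TCP sockets.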
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Signed-off-by: Geliang Tang
---
 drivers/nvme/host/tcp.c   | 23 ++++++++++++++++++++++-
 drivers/nvme/target/tcp.c | 10 ++++++++--
 2 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a80af6471b10..194728ffdf05 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1767,6 +1767,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 {
 	struct nvme_tcp_ctrl *ctrl = to_tcp_ctrl(nctrl);
 	struct nvme_tcp_queue *queue = &ctrl->queues[qid];
+	int proto = IPPROTO_TCP;
 	int ret, rcv_pdu_size;
 	struct file *sock_file;
 
@@ -1783,9 +1784,14 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 	queue->cmnd_capsule_len = sizeof(struct nvme_command) +
 						NVME_TCP_ADMIN_CCSZ;
 
+#ifdef CONFIG_MPTCP
+	if (!strcmp(ctrl->ctrl.opts->transport, "mptcp"))
+		proto = IPPROTO_MPTCP;
+#endif
+
 	ret = sock_create_kern(current->nsproxy->net_ns,
 			ctrl->addr.ss_family, SOCK_STREAM,
-			IPPROTO_TCP, &queue->sock);
+			proto, &queue->sock);
 	if (ret) {
 		dev_err(nctrl->device,
 			"failed to create socket: %d\n", ret);
@@ -3024,6 +3030,19 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
 	.create_ctrl	= nvme_tcp_create_ctrl,
 };
 
+static struct nvmf_transport_ops nvme_mptcp_transport = {
+	.name		= "mptcp",
+	.module		= THIS_MODULE,
+	.required_opts	= NVMF_OPT_TRADDR,
+	.allowed_opts	= NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
+			  NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			  NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+			  NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
+			  NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE | NVMF_OPT_TLS |
+			  NVMF_OPT_KEYRING | NVMF_OPT_TLS_KEY | NVMF_OPT_CONCAT,
+	.create_ctrl	= nvme_tcp_create_ctrl,
+};
+
 static int __init nvme_tcp_init_module(void)
 {
 	unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
@@ -3049,6 +3068,7 @@ static int __init nvme_tcp_init_module(void)
 		atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
 
 	nvmf_register_transport(&nvme_tcp_transport);
+	nvmf_register_transport(&nvme_mptcp_transport);
 	return 0;
 }
 
@@ -3056,6 +3076,7 @@ static void __exit nvme_tcp_cleanup_module(void)
 {
 	struct nvme_tcp_ctrl *ctrl;
 
+	nvmf_unregister_transport(&nvme_mptcp_transport);
 	nvmf_unregister_transport(&nvme_tcp_transport);
 
 	mutex_lock(&nvme_tcp_ctrl_mutex);
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 066dd88e2449..854b70b4a6f4 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -2060,8 +2061,13 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	port->sock->sk->sk_user_data = port;
 	port->data_ready = port->sock->sk->sk_data_ready;
 	port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
-	sock_set_reuseaddr(port->sock->sk);
-	tcp_sock_set_nodelay(port->sock->sk);
+	if (proto == IPPROTO_MPTCP) {
+		mptcp_sock_set_reuseaddr(port->sock->sk);
+		mptcp_sock_set_nodelay(port->sock->sk);
+	} else {
+		sock_set_reuseaddr(port->sock->sk);
+		tcp_sock_set_nodelay(port->sock->sk);
+	}
 	if (so_priority > 0)
 		sock_set_priority(port->sock->sk, so_priority);
 
-- 
2.43.0

From: Geliang Tang
To: mptcp@lists.linux.dev, hare@suse.de, hare@kernel.org
Cc: Geliang Tang, Hui Zhu, Gang Yan, zhenwei pi
Subject: [RFC mptcp-next 5/6] selftests: mptcp: add NVMe-over-MPTCP test
Date: Fri, 7 Nov 2025 11:37:36 +0800

From: Geliang Tang

This patch introduces a new selftest that evaluates NVMe-over-Fabrics
performance when MPTCP is used as the transport protocol. The test:

1. Sets up a local NVMe target configuration using:
   - a loop device as the storage backend
   - MPTCP as the transport type
   - IPv4 addressing
2. Performs standard NVMe operations, including:
   - discovery and connection
   - device listing
   - random read/write performance tests using fio
3. Cleans up all test resources.

The test script (mptcp_nvme.sh) accepts the transport type and the
target address as optional arguments, defaulting to mptcp and
127.0.0.1. It measures performance using fio with:
 - 4 parallel jobs
 - an I/O depth of 256
 - a 4 KiB block size
 - a 10-second runtime
 - both random read and random write patterns

The kernel config options required for NVMe over TCP are added to the
MPTCP selftest config file.

This test helps validate MPTCP's suitability for storage networking
workloads and provides a baseline for performance comparisons.
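The fio parameters imply a simple load calculation, sketched below with hypothetical helper names (nothing here is part of the script). Each of the 4 jobs keeps up to 256 I/Os queued, so fio submits at most 4 × 256 = 1024 outstanding requests, or 1024 × 4 KiB = 4 MiB in flight. At a 4 KiB block size, an IOPS figure converts to MiB/s by dividing by 256. These are upper bounds on what fio requests; the depth that actually reaches the target may be lower if the NVMe queues are smaller than the requested depth.

```c
#include <stdint.h>

/* fio parameters shared by both jobs in mptcp_nvme.sh. */
enum { FIO_NUMJOBS = 4, FIO_IODEPTH = 256, FIO_BS = 4096 };

/* Each of the numjobs jobs keeps up to iodepth I/Os queued. */
static unsigned int max_inflight_ios(void)
{
	return FIO_NUMJOBS * FIO_IODEPTH;
}

/* Upper bound on data in flight at once, in bytes. */
static uint64_t max_inflight_bytes(void)
{
	return (uint64_t)max_inflight_ios() * FIO_BS;
}

/* Convert an IOPS figure reported by fio into MiB/s for this block size. */
static double iops_to_mibps(double iops)
{
	return iops * FIO_BS / (1024.0 * 1024.0);
}
```

For example, a run reporting 256000 IOPS corresponds to 1000 MiB/s at this block size.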
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Signed-off-by: Geliang Tang
---
 tools/testing/selftests/net/mptcp/config      |  7 +++
 .../testing/selftests/net/mptcp/mptcp_nvme.sh | 57 +++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh

diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 59051ee2a986..0eee348eff8b 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -34,3 +34,10 @@ CONFIG_NFT_SOCKET=m
 CONFIG_NFT_TPROXY=m
 CONFIG_SYN_COOKIES=y
 CONFIG_VETH=y
+CONFIG_CONFIGFS_FS=y
+CONFIG_NVME_CORE=y
+CONFIG_NVME_FABRICS=y
+CONFIG_NVME_TCP=y
+CONFIG_NVME_TARGET=y
+CONFIG_NVME_TARGET_TCP=y
+CONFIG_NVME_MULTIPATH=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
new file mode 100755
index 000000000000..329169fac4ca
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+trtype="${1:-mptcp}"
+traddr="${2:-127.0.0.1}"
+ns=1
+port=1234
+trsvcid=4420
+nqn=nqn.2014-08.org.nvmexpress.${trtype}dev
+
+dd if=/dev/zero of=/tmp/test.raw bs=1M count=0 seek=512
+losetup /dev/loop100 /tmp/test.raw
+cd /sys/kernel/config/nvmet/subsystems
+mkdir ${nqn}
+cd ${nqn}
+echo 1 > attr_allow_any_host
+cd namespaces
+mkdir ${ns}
+cd ${ns}
+echo /dev/loop100 > device_path
+echo 1 > enable
+cd /sys/kernel/config/nvmet/ports
+mkdir ${port}
+cd ${port}
+echo ${trtype} > addr_trtype
+echo ipv4 > addr_adrfam
+echo ${traddr} > addr_traddr
+echo ${trsvcid} > addr_trsvcid
+cd subsystems
+ln -s ../../../subsystems/${nqn} ${trtype}subsys
+
+echo "nvme discover"
+nvme discover -t ${trtype} -a ${traddr} -s ${trsvcid}
+
+echo "nvme connect"
+devname=$(nvme connect -t ${trtype} -a ${traddr} -s ${trsvcid} -n ${nqn} | awk '{print $4}')
+
+sleep 0.5
+echo "nvme list"
+nvme list
+
+fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randread --numjobs=4 --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randread --filename=/dev/${devname}n1
+
+fio --name=global --direct=1 --norandommap --randrepeat=0 --ioengine=libaio --thread=1 --blocksize=4k --runtime=10 --time_based --rw=randwrite --numjobs=4 --iodepth=256 --group_reporting --size=100% --name=libaio_4_256_4k_randwrite --filename=/dev/${devname}n1
+
+sleep 0.5
+echo "nvme disconnect"
+nvme disconnect -n ${nqn}
+
+rm -rf /sys/kernel/config/nvmet/ports/${port}/subsystems/${trtype}subsys
+rmdir /sys/kernel/config/nvmet/ports/${port}
+echo 0 > /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}/enable
+echo -n 0 > /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}/device_path
+rmdir /sys/kernel/config/nvmet/subsystems/${nqn}/namespaces/${ns}
+rmdir /sys/kernel/config/nvmet/subsystems/${nqn}
+losetup -d /dev/loop100
+rm -rf /tmp/test.raw
-- 
2.43.0

From nobody Thu Nov 27 12:37:14 2025
From: Geliang Tang
To: mptcp@lists.linux.dev, hare@suse.de, hare@kernel.org
Cc: Geliang Tang
Subject: [RFC mptcp-next 6/6] nvmet-tcp: clear MSG_DONTWAIT for MPTCP (TODO: HELP WANTED)
Date: Fri, 7 Nov 2025 11:37:37 +0800
X-Mailer: git-send-email 2.43.0
Precedence: bulk
X-Mailing-List: mptcp@lists.linux.dev
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Geliang Tang

When the test added in the previous commit is run in a loop, there is
roughly a one-in-ten chance of hitting the following "timeout" errors:

'''
[ 32.867710] nvme nvme0: I/O tag 1 (0001) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.867929] nvme nvme0: starting error recovery
[ 32.867994] nvme nvme0: I/O tag 2 (0002) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.868112] nvme nvme0: I/O tag 3 (0003) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.868359] nvme nvme0: I/O tag 4 (0004) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.868446] nvme nvme0: I/O tag 5 (0005) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.868592] nvme0c0n1: I/O Cmd(0x2) @ LBA 1046528, 8 blocks, I/O Error (sct 0x3 / sc 0x70)
[ 32.868817] recoverable transport error, dev nvme0c0n1, sector 1046528 op 0x0:(READ) flags 0x2080700 phys_seg 1 prio class 0
[ 32.868976] block nvme0n1: no usable path - requeuing I/O
[ 32.869038] nvme nvme0: I/O tag 6 (0006) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.869119] nvme nvme0: I/O tag 7 (0007) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
...
...
[ 32.877945] nvme nvme0: I/O tag 122 (107a) type 4 opcode 0x2 (I/O Cmd) QID 2 timeout
[ 32.878025] block nvme0n1: no usable path - requeuing I/O
[ 32.878079] block nvme0n1: no usable path - requeuing I/O
[ 32.878128] block nvme0n1: no usable path - requeuing I/O
[ 32.878180] block nvme0n1: no usable path - requeuing I/O
[ 32.878238] block nvme0n1: no usable path - requeuing I/O
[ 32.878296] block nvme0n1: no usable path - requeuing I/O
[ 32.878350] block nvme0n1: no usable path - requeuing I/O
[ 32.878403] block nvme0n1: no usable path - requeuing I/O
[ 32.878455] block nvme0n1: no usable path - requeuing I/O
[ 32.883603] nvme nvme0: Reconnecting in 10 seconds...
'''

Debugging showed that setting the MSG_DONTWAIT flag causes
mptcp_sendmsg() to return -EAGAIN. This patch addresses it by dropping
MSG_DONTWAIT for MPTCP sockets.

I know this isn't an ideal fix, and I would appreciate your suggestions.
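For reference, MSG_DONTWAIT turns a send that would otherwise sleep waiting for send-buffer space into an immediate EAGAIN, so a non-blocking sender has to retry once the socket becomes writable again. The userspace C sketch below illustrates only those generic semantics. It uses an AF_UNIX socketpair, an assumption chosen because its buffer behaviour is deterministic; it does not exercise MPTCP or nvmet-tcp, and the helper names fill_until_eagain(), wait_writable() and drain() are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue data on a connected stream socket with
 * non-blocking sends until the kernel reports EAGAIN (send buffer full).
 * Returns the number of bytes queued, or -1 on any other outcome.
 */
ssize_t fill_until_eagain(int fd)
{
	char buf[4096];
	ssize_t total = 0;

	memset(buf, 'x', sizeof(buf));
	for (;;) {
		ssize_t n = send(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (n > 0) {
			total += n;
			continue;
		}
		/* With MSG_DONTWAIT, a full buffer is reported, not waited on. */
		if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
			return total;
		return -1;
	}
}

/*
 * Hypothetical helper: wait up to timeout_ms for POLLOUT.
 * Returns 1 if writable, 0 on timeout, -1 on poll() failure.
 */
int wait_writable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -1;
	return (rc > 0 && (pfd.revents & POLLOUT)) ? 1 : 0;
}

/* Hypothetical helper: read and discard everything queued on fd. */
ssize_t drain(int fd)
{
	char buf[4096];
	ssize_t total = 0, n;

	while ((n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT)) > 0)
		total += n;
	return total;
}
```

Usage: create a pair with socketpair(AF_UNIX, SOCK_STREAM, 0, sv) and call fill_until_eagain(sv[0]). At that point wait_writable(sv[0], 0) reports 0; once drain(sv[1]) has emptied the peer's queue, wait_writable() reports 1 and the sender can retry.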
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 854b70b4a6f4..683a451bcb08 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -645,12 +645,15 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
 
 	while (cmd->cur_sg) {
 		struct msghdr msg = {
-			.msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
+			.msg_flags = MSG_SPLICE_PAGES,
 		};
 		struct page *page = sg_page(cmd->cur_sg);
 		struct bio_vec bvec;
 		u32 left = cmd->cur_sg->length - cmd->offset;
 
+		if (cmd->queue->sock->sk->sk_protocol != IPPROTO_MPTCP)
+			msg.msg_flags |= MSG_DONTWAIT;
+
 		if ((!last_in_batch && cmd->queue->send_list_len) ||
 		    cmd->wbytes_done + left < cmd->req.transfer_len ||
 		    queue->data_digest || !queue->nvme_sq.sqhd_disabled)
@@ -694,12 +697,15 @@ static int nvmet_try_send_data(struct nvmet_tcp_cmd *cmd, bool last_in_batch)
 static int nvmet_try_send_response(struct nvmet_tcp_cmd *cmd,
 		bool last_in_batch)
 {
-	struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES, };
+	struct msghdr msg = { .msg_flags = MSG_SPLICE_PAGES, };
 	struct bio_vec bvec;
 	u8 hdgst = nvmet_tcp_hdgst_len(cmd->queue);
 	int left = sizeof(*cmd->rsp_pdu) - cmd->offset + hdgst;
 	int ret;
 
+	if (cmd->queue->sock->sk->sk_protocol != IPPROTO_MPTCP)
+		msg.msg_flags |= MSG_DONTWAIT;
+
 	if (!last_in_batch && cmd->queue->send_list_len)
 		msg.msg_flags |= MSG_MORE;
 	else
-- 
2.43.0