From nobody Wed Apr 1 23:46:40 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v7 1/7] nvmet-tcp: define target tcp_sockops struct
Date: Tue, 31 Mar 2026 18:28:25 +0800
X-Mailer: git-send-email 2.51.0
X-Mailing-List: mptcp@lists.linux.dev

From: Geliang Tang

To add MPTCP support in "NVMe over TCP", the target side needs to pass
IPPROTO_MPTCP to sock_create() instead of IPPROTO_TCP to create an MPTCP
socket. Additionally, the setsockopt operations for this socket need to
be switched to a set of MPTCP-specific functions.

This patch defines the nvmet_tcp_sockops structure, which contains the
protocol of the socket and a set of function pointers for these socket
operations. A "sockops" field is also added to struct nvmet_tcp_port.

A TCP-specific version of struct nvmet_tcp_sockops is defined. In
nvmet_tcp_add_port(), port->sockops is set to nvmet_tcp_sockops when
trtype is TCP; any other trtype is rejected with -EINVAL. All locations
that previously called TCP setsockopt functions are updated to call the
corresponding function pointers in the nvmet_tcp_sockops structure.
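
The ops-table pattern described above can be sketched in plain userspace
C. This is a minimal illustration under stated assumptions, not code from
this series: demo_sock, demo_sockops, demo_pick_sockops(),
demo_setup_sock() and the DEMO_* constants are hypothetical stand-ins,
and the plain -1 return stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for transport types and protocol numbers. */
enum demo_trtype { DEMO_TRTYPE_TCP = 3, DEMO_TRTYPE_OTHER = 99 };
#define DEMO_PROTO_TCP 6

/* Hypothetical stand-in for struct sock: it only records what was set. */
struct demo_sock {
	int nodelay;
	unsigned int priority;
};

/* Ops table: a protocol number plus per-protocol option setters. */
struct demo_sockops {
	int proto;
	void (*set_nodelay)(struct demo_sock *sk);
	void (*set_priority)(struct demo_sock *sk, unsigned int priority);
};

static void demo_tcp_set_nodelay(struct demo_sock *sk)
{
	sk->nodelay = 1;
}

static void demo_tcp_set_priority(struct demo_sock *sk, unsigned int priority)
{
	sk->priority = priority;
}

static const struct demo_sockops demo_tcp_sockops = {
	.proto = DEMO_PROTO_TCP,
	.set_nodelay = demo_tcp_set_nodelay,
	.set_priority = demo_tcp_set_priority,
};

/* Select the table by transport type; NULL means "unsupported". */
static const struct demo_sockops *demo_pick_sockops(enum demo_trtype trtype)
{
	if (trtype == DEMO_TRTYPE_TCP)
		return &demo_tcp_sockops;
	return NULL;
}

/* Callers go through the table instead of naming TCP helpers directly. */
static int demo_setup_sock(struct demo_sock *sk, enum demo_trtype trtype,
			   unsigned int so_priority)
{
	const struct demo_sockops *ops = demo_pick_sockops(trtype);

	assert(sk != NULL);
	if (!ops)
		return -1;	/* stands in for -EINVAL in this sketch */
	ops->set_nodelay(sk);
	if (so_priority > 0)
		ops->set_priority(sk, so_priority);
	return 0;
}
```

With this shape, supporting another protocol only means adding a second
table and one more branch in the selection function; the call sites stay
unchanged.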

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 43 ++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index acc71a26733f..dc1207d96b30 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -198,12 +198,22 @@ struct nvmet_tcp_queue {
 	void (*write_space)(struct sock *);
 };
 
+struct nvmet_tcp_sockops {
+	int proto;
+	void (*set_reuseaddr)(struct sock *sk);
+	void (*set_nodelay)(struct sock *sk);
+	void (*set_priority)(struct sock *sk, u32 priority);
+	void (*no_linger)(struct sock *sk);
+	void (*set_tos)(struct sock *sk, int val);
+};
+
 struct nvmet_tcp_port {
 	struct socket *sock;
 	struct work_struct accept_work;
 	struct nvmet_port *nport;
 	struct sockaddr_storage addr;
 	void (*data_ready)(struct sock *);
+	const struct nvmet_tcp_sockops *sockops;
 };
 
 static DEFINE_IDA(nvmet_tcp_queue_ida);
@@ -1698,19 +1708,22 @@ static int nvmet_tcp_set_queue_sock(struct nvmet_tcp_queue *queue)
 	if (ret < 0)
 		return ret;
 
+	if (!queue->port || !queue->port->sockops)
+		return -EINVAL;
+
 	/*
 	 * Cleanup whatever is sitting in the TCP transmit queue on socket
 	 * close. This is done to prevent stale data from being sent should
 	 * the network connection be restored before TCP times out.
 	 */
-	sock_no_linger(sock->sk);
+	queue->port->sockops->no_linger(sock->sk);
 
 	if (so_priority > 0)
-		sock_set_priority(sock->sk, so_priority);
+		queue->port->sockops->set_priority(sock->sk, so_priority);
 
 	/* Set socket type of service */
 	if (inet->rcv_tos > 0)
-		ip_sock_set_tos(sock->sk, inet->rcv_tos);
+		queue->port->sockops->set_tos(sock->sk, inet->rcv_tos);
 
 	ret = 0;
 	write_lock_bh(&sock->sk->sk_callback_lock);
@@ -2030,6 +2043,15 @@ static void nvmet_tcp_listen_data_ready(struct sock *sk)
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
+static const struct nvmet_tcp_sockops nvmet_tcp_sockops = {
+	.proto = IPPROTO_TCP,
+	.set_reuseaddr = sock_set_reuseaddr,
+	.set_nodelay = tcp_sock_set_nodelay,
+	.set_priority = sock_set_priority,
+	.no_linger = sock_no_linger,
+	.set_tos = ip_sock_set_tos,
+};
+
 static int nvmet_tcp_add_port(struct nvmet_port *nport)
 {
 	struct nvmet_tcp_port *port;
@@ -2054,6 +2076,13 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		goto err_port;
 	}
 
+	if (nport->disc_addr.trtype == NVMF_TRTYPE_TCP) {
+		port->sockops = &nvmet_tcp_sockops;
+	} else {
+		ret = -EINVAL;
+		goto err_port;
+	}
+
 	ret = inet_pton_with_scope(&init_net, af, nport->disc_addr.traddr,
 			nport->disc_addr.trsvcid, &port->addr);
 	if (ret) {
@@ -2068,7 +2097,7 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 		port->nport->inline_data_size = NVMET_TCP_DEF_INLINE_DATA_SIZE;
 
 	ret = sock_create(port->addr.ss_family, SOCK_STREAM,
-				IPPROTO_TCP, &port->sock);
+				port->sockops->proto, &port->sock);
 	if (ret) {
 		pr_err("failed to create a socket\n");
 		goto err_port;
@@ -2077,10 +2106,10 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 	port->sock->sk->sk_user_data = port;
 	port->data_ready = port->sock->sk->sk_data_ready;
 	port->sock->sk->sk_data_ready = nvmet_tcp_listen_data_ready;
-	sock_set_reuseaddr(port->sock->sk);
-	tcp_sock_set_nodelay(port->sock->sk);
+	port->sockops->set_reuseaddr(port->sock->sk);
+	port->sockops->set_nodelay(port->sock->sk);
 	if (so_priority > 0)
-		sock_set_priority(port->sock->sk, so_priority);
+		port->sockops->set_priority(port->sock->sk, so_priority);
 
 	ret = kernel_bind(port->sock, (struct sockaddr_unsized *)&port->addr,
 			sizeof(port->addr));
-- 
2.51.0

From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v7 2/7] nvmet-tcp: implement target mptcp sockops
Date: Tue, 31 Mar 2026 18:28:26 +0800
Message-ID: <08d0655c8758754aef67b9c9fa211664c2f6e73e.1774952107.git.tanggeliang@kylinos.cn>

From: Geliang Tang

This patch introduces a new NVMe target transport type NVMF_TRTYPE_MPTCP
to support MPTCP. An MPTCP-specific version of struct nvmet_tcp_sockops
is implemented, and it is assigned to port->sockops when the transport
type is MPTCP.

Dedicated MPTCP helpers are introduced for setting socket options. These
helpers set the values on the first subflow socket of an MPTCP
connection. The values are then synchronized to other newly created
subflows in sync_socket_options().
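
The "set on the first subflow, then sync to later subflows" behaviour
described above can be modelled in userspace C. This is only a sketch
under stated assumptions: demo_msk, demo_subflow, demo_set_priority(),
demo_sync_options() and demo_add_subflow() are hypothetical names, and
the model ignores locking and every option other than priority.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_SUBFLOWS 4

/* Hypothetical stand-in for one subflow socket. */
struct demo_subflow {
	unsigned int priority;
};

/* Hypothetical stand-in for the parent (MPTCP-level) socket. */
struct demo_msk {
	unsigned int priority;	/* value remembered on the parent */
	struct demo_subflow subflows[DEMO_MAX_SUBFLOWS];
	size_t nr_subflows;
};

/* Record the option on the parent and apply it to the first subflow. */
static void demo_set_priority(struct demo_msk *msk, unsigned int priority)
{
	assert(msk != NULL);
	msk->priority = priority;
	if (msk->nr_subflows > 0)
		msk->subflows[0].priority = priority;
}

/* Copy parent-level options onto a newly created subflow. */
static void demo_sync_options(const struct demo_msk *msk,
			      struct demo_subflow *ssk)
{
	if (msk->priority > 0)
		ssk->priority = msk->priority;
}

/* Create a subflow and sync the parent's options onto it. */
static struct demo_subflow *demo_add_subflow(struct demo_msk *msk)
{
	struct demo_subflow *ssk;

	assert(msk != NULL);
	if (msk->nr_subflows >= DEMO_MAX_SUBFLOWS)
		return NULL;
	ssk = &msk->subflows[msk->nr_subflows++];
	ssk->priority = 0;
	demo_sync_options(msk, ssk);
	return ssk;
}
```

Keeping the value on the parent is what makes the later sync possible: a
subflow created after the setter ran has nothing else to copy from.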

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/tcp.c | 15 ++++++
 include/linux/nvme.h      |  1 +
 include/net/mptcp.h       | 20 ++++++++
 net/mptcp/sockopt.c       | 98 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 134 insertions(+)

diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index dc1207d96b30..8471b14a7ee8 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -2052,6 +2052,17 @@ static const struct nvmet_tcp_sockops nvmet_tcp_sockops = {
 	.set_tos = ip_sock_set_tos,
 };
 
+#ifdef CONFIG_MPTCP
+static const struct nvmet_tcp_sockops nvmet_mptcp_sockops = {
+	.proto = IPPROTO_MPTCP,
+	.set_reuseaddr = mptcp_sock_set_reuseaddr,
+	.set_nodelay = mptcp_sock_set_nodelay,
+	.set_priority = mptcp_sock_set_priority,
+	.no_linger = mptcp_sock_no_linger,
+	.set_tos = mptcp_sock_set_tos,
+};
+#endif
+
 static int nvmet_tcp_add_port(struct nvmet_port *nport)
 {
 	struct nvmet_tcp_port *port;
@@ -2078,6 +2089,10 @@ static int nvmet_tcp_add_port(struct nvmet_port *nport)
 
 	if (nport->disc_addr.trtype == NVMF_TRTYPE_TCP) {
 		port->sockops = &nvmet_tcp_sockops;
+#ifdef CONFIG_MPTCP
+	} else if (nport->disc_addr.trtype == NVMF_TRTYPE_MPTCP) {
+		port->sockops = &nvmet_mptcp_sockops;
+#endif
 	} else {
 		ret = -EINVAL;
 		goto err_port;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 655d194f8e72..8069667ad47e 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -68,6 +68,7 @@ enum {
 	NVMF_TRTYPE_RDMA = 1, /* RDMA */
 	NVMF_TRTYPE_FC = 2, /* Fibre Channel */
 	NVMF_TRTYPE_TCP = 3, /* TCP/IP */
+	NVMF_TRTYPE_MPTCP = 4, /* Multipath TCP */
 	NVMF_TRTYPE_LOOP = 254, /* Reserved for host usage */
 	NVMF_TRTYPE_MAX,
 };
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 4cf59e83c1c5..91ce7b9b639d 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -237,6 +237,16 @@ static inline __be32 mptcp_reset_option(const struct sk_buff *skb)
 }
 
 void mptcp_active_detect_blackhole(struct sock *sk, bool expired);
+
+void mptcp_sock_set_reuseaddr(struct sock *sk);
+
+void mptcp_sock_set_nodelay(struct sock *sk);
+
+void mptcp_sock_set_priority(struct sock *sk, u32 priority);
+
+void mptcp_sock_no_linger(struct sock *sk);
+
+void mptcp_sock_set_tos(struct sock *sk, int val);
 #else
 
 static inline void mptcp_init(void)
@@ -323,6 +333,16 @@ static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct reques
 static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return htonl(0u); }
 
 static inline void mptcp_active_detect_blackhole(struct sock *sk, bool expired) { }
+
+static inline void mptcp_sock_set_reuseaddr(struct sock *sk) { }
+
+static inline void mptcp_sock_set_nodelay(struct sock *sk) { }
+
+static inline void mptcp_sock_set_priority(struct sock *sk, u32 priority) { }
+
+static inline void mptcp_sock_no_linger(struct sock *sk) { }
+
+static inline void mptcp_sock_set_tos(struct sock *sk, int val) { }
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index de90a2897d2d..2ea2e46977b9 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1537,6 +1537,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	static const unsigned int tx_rx_locks = SOCK_RCVBUF_LOCK | SOCK_SNDBUF_LOCK;
 	struct sock *sk = (struct sock *)msk;
 	bool keep_open;
+	u32 priority;
 
 	keep_open = sock_flag(sk, SOCK_KEEPOPEN);
 	if (ssk->sk_prot->keepalive)
@@ -1586,6 +1587,11 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	inet_assign_bit(FREEBIND, ssk, inet_test_bit(FREEBIND, sk));
 	inet_assign_bit(BIND_ADDRESS_NO_PORT, ssk, inet_test_bit(BIND_ADDRESS_NO_PORT, sk));
 	WRITE_ONCE(inet_sk(ssk)->local_port_range, READ_ONCE(inet_sk(sk)->local_port_range));
+
+	ssk->sk_reuse = sk->sk_reuse;
+	priority = READ_ONCE(sk->sk_priority);
+	if (priority > 0)
+		sock_set_priority(ssk, priority);
 }
 
 void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk)
@@ -1652,3 +1658,95 @@ int mptcp_set_rcvlowat(struct sock *sk, int val)
 	}
 	return 0;
 }
+
+void mptcp_sock_set_reuseaddr(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	sk->sk_reuse = SK_CAN_REUSE;
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	sock_set_reuseaddr(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_reuseaddr);
+
+void mptcp_sock_set_nodelay(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	msk->nodelay = true;
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	lock_sock(ssk);
+	__tcp_sock_set_nodelay(ssk, true);
+	release_sock(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_nodelay);
+
+void mptcp_sock_set_priority(struct sock *sk, u32 priority)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	sock_set_priority(sk, priority);
+	ssk = msk->first;
+	if (IS_ERR(ssk))
+		goto unlock;
+	lock_sock(ssk);
+	sock_set_priority(ssk, priority);
+	release_sock(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_priority);
+
+void mptcp_sock_no_linger(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	WRITE_ONCE(sk->sk_lingertime, 0);
+	sock_set_flag(sk, SOCK_LINGER);
+	ssk = msk->first;
+	if (IS_ERR(ssk))
+		goto unlock;
+	sock_no_linger(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_no_linger);
+
+void mptcp_sock_set_tos(struct sock *sk, int val)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	__ip_sock_set_tos(sk, val);
+	ssk = msk->first;
+	if (IS_ERR(ssk))
+		goto unlock;
+	lock_sock(ssk);
+	__ip_sock_set_tos(ssk, val);
+	release_sock(ssk);
+unlock:
+	release_sock(sk);
+}
+EXPORT_SYMBOL(mptcp_sock_set_tos);
-- 
2.51.0

From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v7 3/7] nvmet-tcp: register target mptcp transport
Date: Tue, 31 Mar 2026 18:28:27 +0800
Message-ID: <2f5ae03c4a7511e757c777f6b263a41df286ac26.1774952107.git.tanggeliang@kylinos.cn>

From: Geliang Tang

This patch defines a new nvmet_fabrics_ops named nvmet_mptcp_ops, which
is almost identical to nvmet_tcp_ops except for the .type field. It is
registered in nvmet_tcp_init() and unregistered in nvmet_tcp_exit().

This new nvmet_fabrics_ops is selected in nvmet_tcp_done_recv_pdu() based
on the protocol type.

A MODULE_ALIAS for "nvmet-transport-4" is also added.

v2:
 - use trtype instead of tsas (Hannes).

v3:
 - check mptcp protocol from disc_addr.trtype instead of passing a
   parameter (Hannes).

v4:
 - check CONFIG_MPTCP.

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/target/configfs.c |  1 +
 drivers/nvme/target/tcp.c      | 41 +++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 3088e044dbcb..4b7498ffb102 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -38,6 +38,7 @@ static struct nvmet_type_name_map nvmet_transport[] = {
 	{ NVMF_TRTYPE_RDMA, "rdma" },
 	{ NVMF_TRTYPE_FC, "fc" },
 	{ NVMF_TRTYPE_TCP, "tcp" },
+	{ NVMF_TRTYPE_MPTCP, "mptcp" },
 	{ NVMF_TRTYPE_PCI, "pci" },
 	{ NVMF_TRTYPE_LOOP, "loop" },
 };
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index 8471b14a7ee8..1ca9fbbaea92 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -222,6 +222,9 @@ static DEFINE_MUTEX(nvmet_tcp_queue_mutex);
 
 static struct workqueue_struct *nvmet_tcp_wq;
 static const struct nvmet_fabrics_ops nvmet_tcp_ops;
+#ifdef CONFIG_MPTCP
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops;
+#endif
 static void nvmet_tcp_free_cmd(struct nvmet_tcp_cmd *c);
 static void nvmet_tcp_free_cmd_buffers(struct nvmet_tcp_cmd *cmd);
 
@@ -1037,6 +1040,7 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 {
 	struct nvme_tcp_hdr *hdr = &queue->pdu.cmd.hdr;
 	struct nvme_command *nvme_cmd = &queue->pdu.cmd.cmd;
+	const struct nvmet_fabrics_ops *ops;
 	struct nvmet_req *req;
 	int ret;
 
@@ -1077,7 +1081,15 @@ static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
 	req = &queue->cmd->req;
 	memcpy(req->cmd, nvme_cmd, sizeof(*nvme_cmd));
 
-	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, &nvmet_tcp_ops))) {
+	if (queue->sock->sk->sk_protocol == IPPROTO_TCP)
+		ops = &nvmet_tcp_ops;
+#ifdef CONFIG_MPTCP
+	else if (queue->sock->sk->sk_protocol == IPPROTO_MPTCP)
+		ops = &nvmet_mptcp_ops;
+#endif
+	else
+		return -EINVAL;
+	if (unlikely(!nvmet_req_init(req, &queue->nvme_sq, ops))) {
 		pr_err("failed cmd %p id %d opcode %d, data_len: %d, status: %04x\n",
 			req->cmd, req->cmd->common.command_id,
 			req->cmd->common.opcode,
@@ -2264,6 +2276,21 @@ static const struct nvmet_fabrics_ops nvmet_tcp_ops = {
 	.host_traddr = nvmet_tcp_host_port_addr,
 };
 
+#ifdef CONFIG_MPTCP
+static const struct nvmet_fabrics_ops nvmet_mptcp_ops = {
+	.owner = THIS_MODULE,
+	.type = NVMF_TRTYPE_MPTCP,
+	.msdbd = 1,
+	.add_port = nvmet_tcp_add_port,
+	.remove_port = nvmet_tcp_remove_port,
+	.queue_response = nvmet_tcp_queue_response,
+	.delete_ctrl = nvmet_tcp_delete_ctrl,
+	.install_queue = nvmet_tcp_install_queue,
+	.disc_traddr = nvmet_tcp_disc_port_addr,
+	.host_traddr = nvmet_tcp_host_port_addr,
+};
+#endif
+
 static int __init nvmet_tcp_init(void)
 {
 	int ret;
@@ -2277,6 +2304,14 @@ static int __init nvmet_tcp_init(void)
 	if (ret)
 		goto err;
 
+#ifdef CONFIG_MPTCP
+	ret = nvmet_register_transport(&nvmet_mptcp_ops);
+	if (ret) {
+		nvmet_unregister_transport(&nvmet_tcp_ops);
+		goto err;
+	}
+#endif
+
 	return 0;
 err:
 	destroy_workqueue(nvmet_tcp_wq);
@@ -2287,6 +2322,9 @@ static void __exit nvmet_tcp_exit(void)
 {
 	struct nvmet_tcp_queue *queue;
 
+#ifdef CONFIG_MPTCP
+	nvmet_unregister_transport(&nvmet_mptcp_ops);
+#endif
 	nvmet_unregister_transport(&nvmet_tcp_ops);
 
 	flush_workqueue(nvmet_wq);
@@ -2306,3 +2344,4 @@ module_exit(nvmet_tcp_exit);
 MODULE_DESCRIPTION("NVMe target TCP transport driver");
 MODULE_LICENSE("GPL v2");
 MODULE_ALIAS("nvmet-transport-3"); /* 3 == NVMF_TRTYPE_TCP */
+MODULE_ALIAS("nvmet-transport-4"); /* 4 == NVMF_TRTYPE_MPTCP */
-- 
2.51.0

From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v7 4/7] nvme-tcp: define host tcp_sockops struct
Date: Tue, 31 Mar 2026 18:28:28 +0800

From: Geliang Tang

To add MPTCP support in "NVMe over TCP", the host side needs to pass
IPPROTO_MPTCP to sock_create_kern() instead of IPPROTO_TCP to create an
MPTCP socket.

Similar to the target-side nvmet_tcp_sockops, this patch defines the
host-side nvme_tcp_sockops structure, which contains the protocol of the
socket and a set of function pointers for socket operations. The only
difference is that it defines .set_syncnt instead of .set_reuseaddr.

A TCP-specific version of this structure is defined, and a sockops field
is added to nvme_tcp_ctrl. When the transport string is "tcp", it is
assigned to ctrl->sockops. All locations that previously called TCP
setsockopt functions are updated to call the corresponding function
pointers in the nvme_tcp_sockops structure.
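
On the host side the table is keyed by the transport string rather than
by trtype, and .set_syncnt returns a status. A minimal userspace sketch
of that lookup follows. It rests on assumptions: demo_host_sock,
demo_host_sockops, demo_lookup_sockops() and the DEMO_* constants are
hypothetical, the range check is only a plausible model of a SYN retry
setter, and -1 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PROTO_TCP	6
#define DEMO_MAX_SYNCNT	127	/* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for struct sock on the host side. */
struct demo_host_sock {
	int syn_retries;
};

/* Host-side table: note that set_syncnt reports success or failure. */
struct demo_host_sockops {
	int proto;
	int (*set_syncnt)(struct demo_host_sock *sk, int val);
};

static int demo_tcp_set_syncnt(struct demo_host_sock *sk, int val)
{
	if (val < 1 || val > DEMO_MAX_SYNCNT)
		return -1;	/* stands in for -EINVAL */
	sk->syn_retries = val;
	return 0;
}

static const struct demo_host_sockops demo_host_tcp_sockops = {
	.proto = DEMO_PROTO_TCP,
	.set_syncnt = demo_tcp_set_syncnt,
};

/* Select the table from the transport string; NULL means "unsupported". */
static const struct demo_host_sockops *demo_lookup_sockops(const char *transport)
{
	assert(transport != NULL);
	if (!strcmp(transport, "tcp"))
		return &demo_host_tcp_sockops;
	return NULL;
}
```

A caller would look the table up once when the controller is allocated
and then use ops->proto and ops->set_syncnt() for every queue socket.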

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/host/tcp.c | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 243dab830dc8..1f45f388b9c1 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -182,6 +182,15 @@ struct nvme_tcp_queue {
 	void (*write_space)(struct sock *);
 };
 
+struct nvme_tcp_sockops {
+	int proto;
+	int (*set_syncnt)(struct sock *sk, int val);
+	void (*set_nodelay)(struct sock *sk);
+	void (*no_linger)(struct sock *sk);
+	void (*set_priority)(struct sock *sk, u32 priority);
+	void (*set_tos)(struct sock *sk, int val);
+};
+
 struct nvme_tcp_ctrl {
 	/* read only in the hot path */
 	struct nvme_tcp_queue *queues;
@@ -198,6 +207,8 @@ struct nvme_tcp_ctrl {
 	struct delayed_work connect_work;
 	struct nvme_tcp_request async_req;
 	u32 io_queues[HCTX_MAX_TYPES];
+
+	const struct nvme_tcp_sockops *sockops;
 };
 
 static LIST_HEAD(nvme_tcp_ctrl_list);
@@ -1785,7 +1796,7 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 
 	ret = sock_create_kern(current->nsproxy->net_ns,
 			ctrl->addr.ss_family, SOCK_STREAM,
-			IPPROTO_TCP, &queue->sock);
+			ctrl->sockops->proto, &queue->sock);
 	if (ret) {
 		dev_err(nctrl->device,
 			"failed to create socket: %d\n", ret);
@@ -1802,24 +1813,24 @@ static int nvme_tcp_alloc_queue(struct nvme_ctrl *nctrl, int qid,
 	nvme_tcp_reclassify_socket(queue->sock);
 
 	/* Single syn retry */
-	tcp_sock_set_syncnt(queue->sock->sk, 1);
+	ctrl->sockops->set_syncnt(queue->sock->sk, 1);
 
 	/* Set TCP no delay */
-	tcp_sock_set_nodelay(queue->sock->sk);
+	ctrl->sockops->set_nodelay(queue->sock->sk);
 
 	/*
 	 * Cleanup whatever is sitting in the TCP transmit queue on socket
 	 * close. This is done to prevent stale data from being sent should
 	 * the network connection be restored before TCP times out.
 	 */
-	sock_no_linger(queue->sock->sk);
+	ctrl->sockops->no_linger(queue->sock->sk);
 
 	if (so_priority > 0)
-		sock_set_priority(queue->sock->sk, so_priority);
+		ctrl->sockops->set_priority(queue->sock->sk, so_priority);
 
 	/* Set socket type of service */
 	if (nctrl->opts->tos >= 0)
-		ip_sock_set_tos(queue->sock->sk, nctrl->opts->tos);
+		ctrl->sockops->set_tos(queue->sock->sk, nctrl->opts->tos);
 
 	/* Set 10 seconds timeout for icresp recvmsg */
 	queue->sock->sk->sk_rcvtimeo = 10 * HZ;
@@ -2886,6 +2897,15 @@ nvme_tcp_existing_controller(struct nvmf_ctrl_options *opts)
 	return found;
 }
 
+static const struct nvme_tcp_sockops nvme_tcp_sockops = {
+	.proto = IPPROTO_TCP,
+	.set_syncnt = tcp_sock_set_syncnt,
+	.set_nodelay = tcp_sock_set_nodelay,
+	.no_linger = sock_no_linger,
+	.set_priority = sock_set_priority,
+	.set_tos = ip_sock_set_tos,
+};
+
 static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 		struct nvmf_ctrl_options *opts)
 {
@@ -2950,6 +2970,13 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 		goto out_free_ctrl;
 	}
 
+	if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) {
+		ctrl->sockops = &nvme_tcp_sockops;
+	} else {
+		ret = -EINVAL;
+		goto out_free_ctrl;
+	}
+
 	ctrl->queues = kzalloc_objs(*ctrl->queues, ctrl->ctrl.queue_count);
 	if (!ctrl->queues) {
 		ret = -ENOMEM;
-- 
2.51.0
t=1774952929; cv=none; b=BluW0O9U1KOnegnfnRcRtASXWh18l/Jd8F6zVn/CMvuwKCHS7lRknu3urRxd8ZMGOQqhdUzG/WvCB4cugY7nVEYLE+5pWmEX8uqye2g3cTmdn16HukZ2Ef4cpX3+IkVEnqphzyD31Jy0kSE7wQwecAquIw6CtZVhpQ/eOfi7ccI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774952929; c=relaxed/simple; bh=L+JVOydn3eZ7Nv0hZnbVuP7CPTy2AU5s5xYCzb8pOwU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TOo6vN2fFzLSQEUH03SS7jCDmurLa26CoY19DmSQgE7ngAnAlAgw0V6HggIi5lsElnjkn00dcGajPa+nruR0qsMbVblgrvZoyGjpQA3uCuHf1MPZ/vqyQB31o5Ujhu1+G6cx1PjXUEwdW2hDYQWa9uzfh6jAgW4aWIj3lzkMhI4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=oOUYa6bh; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="oOUYa6bh" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D9590C19423; Tue, 31 Mar 2026 10:28:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1774952929; bh=L+JVOydn3eZ7Nv0hZnbVuP7CPTy2AU5s5xYCzb8pOwU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=oOUYa6bhLxlfdQ1h6kzxBb5RrkNo/LlYu61tm925cf6sxq+YlvVmpf1vGZO1JY2g+ uluEqJwoonJL9wt49acYVGGnpvXumwrtp9n9D/NtgHrSJSPCuDDkcoEU2B1rcc9WNC DFyOHaDiw89LubqJBGhsSCD4/JWx7TPhD2+kwwRP6d7a2hx+pWVeSZPwBNYxbSIczN XTcHI1D2FD5nvNuDA45Q58C1YKRs8tsT4RH1JsSwwJbg2T8Nwa17hJGr8T17RuB91i LN1C4vULTJH+0d+hgn8FVRlKJcvNR+Ll6vkzSzAE6LEn/OS8EHjL0qrA547YDdfz0J ksIM3URTMavDQ== From: Geliang Tang To: mptcp@lists.linux.dev Cc: Geliang Tang , Hannes Reinecke , zhenwei pi , Hui Zhu , Gang Yan Subject: [RFC mptcp-next v7 5/7] nvme-tcp: implement host mptcp sockops Date: Tue, 31 Mar 2026 18:28:29 +0800 Message-ID: X-Mailer: git-send-email 2.51.0 In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 
Content-Type: text/plain; charset="utf-8"

From: Geliang Tang

An MPTCP-specific version of struct nvme_tcp_sockops is implemented, and
it is assigned to ctrl->sockops when the transport string is "mptcp".

The socket option setting logic is similar to the target side, except
that mptcp_sock_set_syncnt() is newly defined for the host side. It
records the value in msk->icsk_syn_retries and applies it to the first
subflow socket of the MPTCP connection. sync_socket_options() then
propagates the value to subflows created later.

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/host/tcp.c | 15 +++++++++++++++
 include/net/mptcp.h     |  7 +++++++
 net/mptcp/protocol.h    |  1 +
 net/mptcp/sockopt.c     | 22 ++++++++++++++++++++++
 4 files changed, 45 insertions(+)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index 1f45f388b9c1..f0ce09b0c142 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2906,6 +2906,17 @@ static const struct nvme_tcp_sockops nvme_tcp_sockops = {
 	.set_tos = ip_sock_set_tos,
 };
 
+#ifdef CONFIG_MPTCP
+static const struct nvme_tcp_sockops nvme_mptcp_sockops = {
+	.proto = IPPROTO_MPTCP,
+	.set_syncnt = mptcp_sock_set_syncnt,
+	.set_nodelay = mptcp_sock_set_nodelay,
+	.no_linger = mptcp_sock_no_linger,
+	.set_priority = mptcp_sock_set_priority,
+	.set_tos = mptcp_sock_set_tos,
+};
+#endif
+
 static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 		struct nvmf_ctrl_options *opts)
 {
@@ -2972,6 +2983,10 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 
 	if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) {
 		ctrl->sockops = &nvme_tcp_sockops;
+#ifdef CONFIG_MPTCP
+	} else if (!strcmp(ctrl->ctrl.opts->transport, "mptcp")) {
+		ctrl->sockops = &nvme_mptcp_sockops;
+#endif
 	} else {
 		ret = -EINVAL;
 		goto out_free_ctrl;
diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 91ce7b9b639d..49031a111e69 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -247,6 +247,8 @@ void mptcp_sock_set_priority(struct sock *sk, u32 priority);
 void mptcp_sock_no_linger(struct sock *sk);
 
 void mptcp_sock_set_tos(struct sock *sk, int val);
+
+int mptcp_sock_set_syncnt(struct sock *sk, int val);
 #else
 
 static inline void mptcp_init(void)
@@ -343,6 +345,11 @@ static inline void mptcp_sock_set_priority(struct sock *sk, u32 priority) { }
 static inline void mptcp_sock_no_linger(struct sock *sk) { }
 
 static inline void mptcp_sock_set_tos(struct sock *sk, int val) { }
+
+static inline int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+	return 0;
+}
 #endif /* CONFIG_MPTCP */
 
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f5d4d7d030f2..84e80816b2a4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -335,6 +335,7 @@ struct mptcp_sock {
 	int keepalive_idle;
 	int keepalive_intvl;
 	int maxseg;
+	int icsk_syn_retries;
 	struct work_struct work;
 	struct sk_buff *ooo_last_skb;
 	struct rb_root out_of_order_queue;
diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c
index 2ea2e46977b9..489558b3797c 100644
--- a/net/mptcp/sockopt.c
+++ b/net/mptcp/sockopt.c
@@ -1592,6 +1592,7 @@ static void sync_socket_options(struct mptcp_sock *msk, struct sock *ssk)
 	priority = READ_ONCE(sk->sk_priority);
 	if (priority > 0)
 		sock_set_priority(ssk, priority);
+	tcp_sock_set_syncnt(ssk, msk->icsk_syn_retries);
 }
 
 void mptcp_sockopt_sync_locked(struct mptcp_sock *msk, struct sock *ssk)
@@ -1750,3 +1751,24 @@ void mptcp_sock_set_tos(struct sock *sk, int val)
 	release_sock(sk);
 }
 EXPORT_SYMBOL(mptcp_sock_set_tos);
+
+int mptcp_sock_set_syncnt(struct sock *sk, int val)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sock *ssk;
+
+	if (val < 1 || val > MAX_TCP_SYNCNT)
+		return -EINVAL;
+
+	lock_sock(sk);
+	sockopt_seq_inc(msk);
+	msk->icsk_syn_retries = val;
+	ssk = __mptcp_nmpc_sk(msk);
+	if (IS_ERR(ssk))
+		goto unlock;
+	tcp_sock_set_syncnt(ssk, val);
+unlock:
+	release_sock(sk);
+	return 0;
+}
+EXPORT_SYMBOL(mptcp_sock_set_syncnt);
-- 
2.51.0

From nobody Wed Apr 1 23:46:40 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v7 6/7] nvme-tcp: register host mptcp transport
Date: Tue, 31 Mar 2026 18:28:30 +0800
Message-ID: <89096187f25fd5fdecfe2fb8646fdb243b702c93.1774952107.git.tanggeliang@kylinos.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Geliang Tang

This patch defines a new nvmf_transport_ops named nvme_mptcp_transport,
which is almost the same as nvme_tcp_transport except for .name and
.allowed_opts.

MPTCP currently does not support TLS, so the four TLS-related options
(NVMF_OPT_TLS, NVMF_OPT_KEYRING, NVMF_OPT_TLS_KEY, and NVMF_OPT_CONCAT)
are omitted from allowed_opts. They will be added back once MPTCP TLS
is supported.

The new transport is registered in nvme_tcp_init_module() and
unregistered in nvme_tcp_cleanup_module().

A separate nvme_mptcp_ctrl_ops structure with .name = "mptcp" is defined
and used for MPTCP controllers.

A MODULE_ALIAS("nvme-mptcp") declaration is added at the end of the
file, alongside the other module metadata.

v2:
 - use 'trtype' instead of '--mptcp' (Hannes)

v3:
 - check mptcp protocol from opts->transport instead of passing a
   parameter (Hannes).

v4:
 - check CONFIG_MPTCP.

Cc: Hannes Reinecke
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 drivers/nvme/host/tcp.c | 44 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 43 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index f0ce09b0c142..c137fce4d358 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2880,6 +2880,24 @@ static const struct nvme_ctrl_ops nvme_tcp_ctrl_ops = {
 	.get_virt_boundary = nvmf_get_virt_boundary,
 };
 
+#ifdef CONFIG_MPTCP
+static const struct nvme_ctrl_ops nvme_mptcp_ctrl_ops = {
+	.name = "mptcp",
+	.module = THIS_MODULE,
+	.flags = NVME_F_FABRICS | NVME_F_BLOCKING,
+	.reg_read32 = nvmf_reg_read32,
+	.reg_read64 = nvmf_reg_read64,
+	.reg_write32 = nvmf_reg_write32,
+	.subsystem_reset = nvmf_subsystem_reset,
+	.free_ctrl = nvme_tcp_free_ctrl,
+	.submit_async_event = nvme_tcp_submit_async_event,
+	.delete_ctrl = nvme_tcp_delete_ctrl,
+	.get_address = nvme_tcp_get_address,
+	.stop_ctrl = nvme_tcp_stop_ctrl,
+	.get_virt_boundary = nvmf_get_virt_boundary,
+};
+#endif
+
 static bool
 nvme_tcp_existing_controller(struct nvmf_ctrl_options *opts)
 {
@@ -2920,6 +2938,7 @@ static const struct nvme_tcp_sockops nvme_mptcp_sockops = {
 static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 		struct nvmf_ctrl_options *opts)
 {
+	const struct nvme_ctrl_ops *ops;
 	struct nvme_tcp_ctrl *ctrl;
 	int ret;
 
@@ -2983,9 +3002,11 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 
 	if (!strcmp(ctrl->ctrl.opts->transport, "tcp")) {
 		ctrl->sockops = &nvme_tcp_sockops;
+		ops = &nvme_tcp_ctrl_ops;
 #ifdef CONFIG_MPTCP
 	} else if (!strcmp(ctrl->ctrl.opts->transport, "mptcp")) {
 		ctrl->sockops = &nvme_mptcp_sockops;
+		ops = &nvme_mptcp_ctrl_ops;
 #endif
 	} else {
 		ret = -EINVAL;
@@ -2998,7 +3019,7 @@ static struct nvme_tcp_ctrl *nvme_tcp_alloc_ctrl(struct device *dev,
 		goto out_free_ctrl;
 	}
 
-	ret = nvme_init_ctrl(&ctrl->ctrl, dev, &nvme_tcp_ctrl_ops, 0);
+	ret = nvme_init_ctrl(&ctrl->ctrl, dev, ops, 0);
 	if (ret)
 		goto out_kfree_queues;
 
@@ -3065,6 +3086,20 @@ static struct nvmf_transport_ops nvme_tcp_transport = {
 	.create_ctrl = nvme_tcp_create_ctrl,
 };
 
+#ifdef CONFIG_MPTCP
+static struct nvmf_transport_ops nvme_mptcp_transport = {
+	.name = "mptcp",
+	.module = THIS_MODULE,
+	.required_opts = NVMF_OPT_TRADDR,
+	.allowed_opts = NVMF_OPT_TRSVCID | NVMF_OPT_RECONNECT_DELAY |
+			NVMF_OPT_HOST_TRADDR | NVMF_OPT_CTRL_LOSS_TMO |
+			NVMF_OPT_HDR_DIGEST | NVMF_OPT_DATA_DIGEST |
+			NVMF_OPT_NR_WRITE_QUEUES | NVMF_OPT_NR_POLL_QUEUES |
+			NVMF_OPT_TOS | NVMF_OPT_HOST_IFACE,
+	.create_ctrl = nvme_tcp_create_ctrl,
+};
+#endif
+
 static int __init nvme_tcp_init_module(void)
 {
 	unsigned int wq_flags = WQ_MEM_RECLAIM | WQ_HIGHPRI | WQ_SYSFS;
@@ -3090,6 +3125,9 @@ static int __init nvme_tcp_init_module(void)
 		atomic_set(&nvme_tcp_cpu_queues[cpu], 0);
 
 	nvmf_register_transport(&nvme_tcp_transport);
+#ifdef CONFIG_MPTCP
+	nvmf_register_transport(&nvme_mptcp_transport);
+#endif
 	return 0;
 }
 
@@ -3097,6 +3135,9 @@ static void __exit nvme_tcp_cleanup_module(void)
 {
 	struct nvme_tcp_ctrl *ctrl;
 
+#ifdef CONFIG_MPTCP
+	nvmf_unregister_transport(&nvme_mptcp_transport);
+#endif
 	nvmf_unregister_transport(&nvme_tcp_transport);
 
 	mutex_lock(&nvme_tcp_ctrl_mutex);
@@ -3113,3 +3154,4 @@ module_exit(nvme_tcp_cleanup_module);
 
 MODULE_DESCRIPTION("NVMe host TCP transport driver");
 MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("nvme-mptcp");
-- 
2.51.0

From nobody Wed Apr 1 23:46:40 2026
From: Geliang Tang
To: mptcp@lists.linux.dev
Cc: Geliang Tang, Hannes Reinecke, Nilay Shroff, Ming Lei, zhenwei pi, Hui Zhu, Gang Yan
Subject: [RFC mptcp-next v7 7/7] selftests: mptcp: add NVMe over MPTCP test
Date: Tue, 31 Mar 2026
 18:28:31 +0800
Message-ID: <5c4f1cddacc3edaceefbec551cc878490484dd17.1774952107.git.tanggeliang@kylinos.cn>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Geliang Tang

This patch adds a test case for NVMe over MPTCP. It verifies that the
nvme discover, connect, list, and disconnect commands work over the new
transport, and measures read/write throughput with fio.

The test simulates four NICs on both the target and host sides, each
rate-limited to 1000 Mbit/s (about 125 MB/s) with netem. It shows that
'NVMe over MPTCP' delivers close to four times the bandwidth of standard
TCP (about 3.8x for reads and 3.5x for writes in the run below):

 # ./mptcp_nvme.sh tcp
   READ: bw=112MiB/s (118MB/s), 112MiB/s-112MiB/s (118MB/s-118MB/s),
         io=1123MiB (1177MB), run=10018-10018msec
  WRITE: bw=112MiB/s (117MB/s), 112MiB/s-112MiB/s (117MB/s-117MB/s),
         io=1118MiB (1173MB), run=10018-10018msec

 # ./mptcp_nvme.sh mptcp
   READ: bw=427MiB/s (448MB/s), 427MiB/s-427MiB/s (448MB/s-448MB/s),
         io=4286MiB (4494MB), run=10039-10039msec
  WRITE: bw=387MiB/s (406MB/s), 387MiB/s-387MiB/s (406MB/s-406MB/s),
         io=3885MiB (4073MB), run=10043-10043msec

Also add NVMe iopolicy testing to mptcp_nvme.sh. The iopolicy is taken
from the second argument and defaults to "numa". It can also be set to
"round-robin" or "queue-depth":

 # ./mptcp_nvme.sh mptcp round-robin

Cc: Hannes Reinecke
Cc: Nilay Shroff
Cc: Ming Lei
Co-developed-by: zhenwei pi
Signed-off-by: zhenwei pi
Co-developed-by: Hui Zhu
Signed-off-by: Hui Zhu
Co-developed-by: Gang Yan
Signed-off-by: Gang Yan
Signed-off-by: Geliang Tang
---
 tools/testing/selftests/net/mptcp/Makefile    |   1 +
 tools/testing/selftests/net/mptcp/config      |   7 +
 .../testing/selftests/net/mptcp/mptcp_lib.sh  |  12 +
 .../testing/selftests/net/mptcp/mptcp_nvme.sh | 220 ++++++++++++++++++
 4 files changed, 240 insertions(+)
 create mode 100755 tools/testing/selftests/net/mptcp/mptcp_nvme.sh

diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index 22ba0da2adb8..7b308447a58b 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -13,6 +13,7 @@ TEST_PROGS := \
 	mptcp_connect_sendfile.sh \
 	mptcp_connect_splice.sh \
 	mptcp_join.sh \
+	mptcp_nvme.sh \
 	mptcp_sockopt.sh \
 	pm_netlink.sh \
 	simult_flows.sh \
diff --git a/tools/testing/selftests/net/mptcp/config b/tools/testing/selftests/net/mptcp/config
index 59051ee2a986..0eee348eff8b 100644
--- a/tools/testing/selftests/net/mptcp/config
+++ b/tools/testing/selftests/net/mptcp/config
@@ -34,3 +34,10 @@ CONFIG_NFT_SOCKET=m
 CONFIG_NFT_TPROXY=m
 CONFIG_SYN_COOKIES=y
 CONFIG_VETH=y
+CONFIG_CONFIGFS_FS=y
+CONFIG_NVME_CORE=y
+CONFIG_NVME_FABRICS=y
+CONFIG_NVME_TCP=y
+CONFIG_NVME_TARGET=y
+CONFIG_NVME_TARGET_TCP=y
+CONFIG_NVME_MULTIPATH=y
diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
index 5fea7e7df628..53a155b9e9a7 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh
@@ -526,6 +526,18 @@ mptcp_lib_check_tools() {
 				exit ${KSFT_SKIP}
 			fi
 			;;
+		"nvme")
+			if ! nvme -h &> /dev/null; then
+				mptcp_lib_pr_skip "Could not run all tests without ${tool}"
+				exit ${KSFT_SKIP}
+			fi
+			;;
+		"fio")
+			if ! fio -h &> /dev/null; then
+				mptcp_lib_pr_skip "Could not run all tests without ${tool}"
+				exit ${KSFT_SKIP}
+			fi
+			;;
 		*)
 			mptcp_lib_pr_fail "Internal error: unsupported tool: ${tool}"
 			exit ${KSFT_FAIL}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_nvme.sh b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
new file mode 100755
index 000000000000..9c5c60d38928
--- /dev/null
+++ b/tools/testing/selftests/net/mptcp/mptcp_nvme.sh
@@ -0,0 +1,220 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(dirname "$0")/mptcp_lib.sh"
+
+ret=0
+trtype="${1:-mptcp}"
+iopolicy=${2:-"numa"} # round-robin, queue-depth
+nqn=nqn.2014-08.org.nvmexpress.${trtype}dev
+ns=1
+port=1234
+trsvcid=4420
+ns1=""
+ns2=""
+
+ns1_cleanup()
+{
+	mount -t configfs none /sys/kernel/config
+
+	pushd /sys/kernel/config/nvmet || exit 1
+	rm -rf ports/"${port}"/subsystems/"${trtype}"subsys
+	rmdir ports/"${port}"
+	echo 0 > subsystems/"${nqn}"/namespaces/"${ns}"/enable
+	echo -n 0 > subsystems/"${nqn}"/namespaces/"${ns}"/device_path
+	rmdir subsystems/"${nqn}"/namespaces/"${ns}"
+	rmdir subsystems/"${nqn}"
+	popd || exit 1
+}
+
+ns2_cleanup()
+{
+	nvme disconnect -n "${nqn}" || true
+}
+
+cleanup()
+{
+	ip netns exec "$ns2" bash <<- EOF
+		$(declare -f ns2_cleanup)
+		ns2_cleanup
+	EOF
+
+	sleep 1
+
+	ip netns exec "$ns1" bash <<- EOF
+		$(declare -f ns1_cleanup)
+		ns1_cleanup
+	EOF
+
+	losetup -d /dev/loop100
+	rm -rf /tmp/test.raw
+
+	mptcp_lib_ns_exit "$ns1" "$ns2"
+
+	kill "$monitor_pid_ns1" 2>/dev/null
+	wait "$monitor_pid_ns1" 2>/dev/null
+
+	kill "$monitor_pid_ns2" 2>/dev/null
+	wait "$monitor_pid_ns2" 2>/dev/null
+
+	unset -v trtype nqn ns port trsvcid
+}
+
+init()
+{
+	mptcp_lib_ns_init ns1 ns2
+
+	#    ns1         ns2
+	# 10.1.1.1    10.1.1.2
+	# 10.1.2.1    10.1.2.2
+	# 10.1.3.1    10.1.3.2
+	# 10.1.4.1    10.1.4.2
+	for i in {1..4}; do
+		ip link add ns1eth"$i" netns "$ns1" type veth peer \
+			name ns2eth"$i" netns "$ns2"
+		ip -net "$ns1" addr add 10.1."$i".1/24 dev ns1eth"$i"
+		ip -net "$ns1" addr add dead:beef:"$i"::1/64 \
+			dev ns1eth"$i" nodad
+		ip -net "$ns1" link set ns1eth"$i" up
+		ip -net "$ns2" addr add 10.1."$i".2/24 dev ns2eth"$i"
+		ip -net "$ns2" addr add dead:beef:"$i"::2/64 \
+			dev ns2eth"$i" nodad
+		ip -net "$ns2" link set ns2eth"$i" up
+		ip -net "$ns2" route add default via 10.1."$i".1 \
+			dev ns2eth"$i" metric 10"$i"
+		ip -net "$ns2" route add default via dead:beef:"$i"::1 \
+			dev ns2eth"$i" metric 10"$i"
+
+		# Add tc qdisc to both namespaces for bandwidth limiting
+		tc -n "$ns1" qdisc add dev ns1eth"$i" root netem rate 1000mbit
+		tc -n "$ns2" qdisc add dev ns2eth"$i" root netem rate 1000mbit
+	done
+
+	mptcp_lib_pm_nl_set_limits "${ns1}" 8 8
+
+	mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.2.1 flags signal
+	mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.3.1 flags signal
+	mptcp_lib_pm_nl_add_endpoint "$ns1" 10.1.4.1 flags signal
+
+	mptcp_lib_pm_nl_set_limits "${ns2}" 8 8
+
+	mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.2.2 flags subflow
+	mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.3.2 flags subflow
+	mptcp_lib_pm_nl_add_endpoint "$ns2" 10.1.4.2 flags subflow
+
+	ip -n "${ns1}" mptcp monitor &
+	monitor_pid_ns1=$!
+	ip -n "${ns2}" mptcp monitor &
+	monitor_pid_ns2=$!
+}
+
+run_target()
+{
+	mount -t configfs none /sys/kernel/config
+
+	cd /sys/kernel/config/nvmet/subsystems || exit
+	mkdir -p "${nqn}"
+	cd "${nqn}" || exit
+	echo 1 > attr_allow_any_host
+	mkdir -p namespaces/"${ns}"
+	echo /dev/loop100 > namespaces/"${ns}"/device_path
+	echo 1 > namespaces/"${ns}"/enable
+
+	cd /sys/kernel/config/nvmet/ports || exit
+	mkdir -p "${port}"
+	cd "${port}" || exit
+	echo "${trtype}" > addr_trtype
+	echo ipv4 > addr_adrfam
+	echo 0.0.0.0 > addr_traddr
+	echo "${trsvcid}" > addr_trsvcid
+
+	cd subsystems || exit
+	ln -sf ../../../subsystems/"${nqn}" "${trtype}"subsys
+}
+
+run_host()
+{
+	local traddr=10.1.1.1
+
+	echo "nvme discover -a ${traddr}"
+	nvme discover -t "${trtype}" -a "${traddr}" -s "${trsvcid}"
+	if [ $? -ne 0 ]; then
+		return 1
+	fi
+
+	echo "nvme connect"
+	devname=$(nvme connect -t "${trtype}" -a "${traddr}" \
+		  -s "${trsvcid}" -n "${nqn}" |
+		  awk '{print $NF}')
+	if [ -z "$devname" ]; then
+		return 1
+	fi
+
+	sleep 1
+
+	echo "nvme list"
+	nvme list
+
+	subname=$(nvme list-subsys /dev/"${devname}"n1 |
+		  grep -o 'nvme-subsys[0-9]*' | head -1)
+
+	echo "${iopolicy}" > /sys/class/nvme-subsystem/"${subname}"/iopolicy
+	cat /sys/class/nvme-subsystem/"${subname}"/iopolicy
+
+	echo "fio randread /dev/${devname}n1"
+	fio --name=global --direct=1 --norandommap --randrepeat=0 \
+		--ioengine=libaio --thread=1 --blocksize=4k --runtime=10 \
+		--time_based --rw=randread --numjobs=4 --iodepth=256 \
+		--group_reporting --size=100% --name=libaio_4_256_4k_randread \
+		--size=4m --filename=/dev/"${devname}"n1
+
+	sleep 1
+
+	echo "fio randwrite /dev/${devname}n1"
+	fio --name=global --direct=1 --norandommap --randrepeat=0 \
+		--ioengine=libaio --thread=1 --blocksize=4k --runtime=10 \
+		--time_based --rw=randwrite --numjobs=4 --iodepth=256 \
+		--group_reporting --size=100% --name=libaio_4_256_4k_randwrite \
+		--size=4m --filename=/dev/"${devname}"n1
+
+	nvme flush /dev/"${devname}"n1
+}
+
+mptcp_lib_check_tools nvme fio
+
+init
+trap cleanup EXIT
+
+dd if=/dev/zero of=/tmp/test.raw bs=1M count=0 seek=512
+losetup /dev/loop100 /tmp/test.raw
+
+run_test()
+{
+	export trtype nqn ns port trsvcid
+	export iopolicy
+
+	if ! ip netns exec "$ns1" bash <<- EOF
+		$(declare -f run_target)
+		run_target
+		exit \$?
+	EOF
+	then
+		ret="${KSFT_FAIL}"
+	fi
+
+	if ! ip netns exec "$ns2" bash <<- EOF
+		$(declare -f run_host)
+		run_host
+		exit \$?
+	EOF
+	then
+		ret="${KSFT_FAIL}"
+	fi
+
+	sleep 1
+}
+
+run_test "$@"
+
+mptcp_lib_result_print_all_tap
+exit "$ret"
-- 
2.51.0
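
For anyone reproducing the 7/7 numbers, the TCP vs. MPTCP speedup can be derived directly from the fio group summary lines quoted in the commit message. The following is only a sketch under stated assumptions: fio_bw_mib and bw_ratio are hypothetical helper names that are not part of mptcp_nvme.sh, and the parser assumes fio reports the aggregate bandwidth in MiB/s, as it does in the output above (other units would need extra handling).

```shell
#!/bin/bash
# Hypothetical helpers, not part of this series: compare the fio group
# summary lines printed by "./mptcp_nvme.sh tcp" and "./mptcp_nvme.sh mptcp".

# Print the integer MiB/s value from a fio summary line such as
# "READ: bw=427MiB/s (448MB/s), ...". Prints nothing when the line does
# not contain a "bw=<N>MiB/s" field.
fio_bw_mib()
{
	sed -n 's/.*bw=\([0-9][0-9]*\)MiB\/s.*/\1/p' <<< "$1"
}

# Print the ratio of two bandwidth values, rounded to two decimals.
bw_ratio()
{
	awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Summary lines copied from the commit message of patch 7/7.
tcp_read='READ: bw=112MiB/s (118MB/s), 112MiB/s-112MiB/s (118MB/s-118MB/s), io=1123MiB (1177MB), run=10018-10018msec'
mptcp_read='READ: bw=427MiB/s (448MB/s), 427MiB/s-427MiB/s (448MB/s-448MB/s), io=4286MiB (4494MB), run=10039-10039msec'

echo "read speedup: $(bw_ratio "$(fio_bw_mib "$mptcp_read")" "$(fio_bw_mib "$tcp_read")")x"
```

With the figures quoted above this prints "read speedup: 3.81x"; the same calculation for the WRITE lines (387 vs. 112 MiB/s) gives 3.46x. Both sit below the 4x ceiling implied by four 1000 Mbit/s links.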