net/mptcp/protocol.c | 29 +++++++---------------------- net/mptcp/protocol.h | 1 - 2 files changed, 7 insertions(+), 23 deletions(-)
Ondrej reported a functional issue WRT timeout handling on connect
with a nice reproducer.
The problem is that the current mptcp connect waits for both the
MPTCP socket level timeout, and the first subflow socket timeout.
The latter is not influenced/touched by the exposed setsockopt().
Overall the above makes the SO_SNDTIMEO a no-op on connect.
Since mptcp_connect is invoked via inet_stream_connect and the
latter properly handle the MPTCP level timeout, we can address the
issue making the nested subflow level connect always unblocking.
This also allow simplifying a bit the code, dropping an ugly hack
to handle the fastopen and custom proto_ops connect.
The issues predates the blamed commit below, but the current resolution
requires the infrastructure introduced there.
Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: there is needed follow-up correctly propagating the SYN_SENT
-> CLOSE state transition from the first subflow to the msk socket in
case of connect error, but that is somewhat independed/pre-existent.
Sending out early for testing.
---
net/mptcp/protocol.c | 29 +++++++----------------------
net/mptcp/protocol.h | 1 -
2 files changed, 7 insertions(+), 23 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f5842f93c891..4747058a8b6b 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1734,7 +1734,6 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
lock_sock(ssk);
msg->msg_flags |= MSG_DONTWAIT;
- msk->connect_flags = O_NONBLOCK;
msk->fastopening = 1;
ret = tcp_sendmsg_fastopen(ssk, msg, copied_syn, len, NULL);
msk->fastopening = 0;
@@ -3672,9 +3671,9 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
* acquired the subflow socket lock, too.
*/
if (msk->fastopening)
- err = __inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags, 1);
+ err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
else
- err = inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags);
+ err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
/* on successful connect, the msk state will be moved to established by
@@ -3687,12 +3686,10 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
mptcp_copy_inaddrs(sk, ssock->sk);
- /* unblocking connect, mptcp-level inet_stream_connect will error out
- * without changing the socket state, update it here.
+ /* silence EINPROGRESS and let the caller inet_stream_connect
+ * handle the connection in progress
*/
- if (err == -EINPROGRESS)
- sk->sk_socket->state = ssock->state;
- return err;
+ return 0;
}
static struct proto mptcp_prot = {
@@ -3751,18 +3748,6 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
return err;
}
-static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
- int addr_len, int flags)
-{
- int ret;
-
- lock_sock(sock->sk);
- mptcp_sk(sock->sk)->connect_flags = flags;
- ret = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
- release_sock(sock->sk);
- return ret;
-}
-
static int mptcp_listen(struct socket *sock, int backlog)
{
struct mptcp_sock *msk = mptcp_sk(sock->sk);
@@ -3917,7 +3902,7 @@ static const struct proto_ops mptcp_stream_ops = {
.owner = THIS_MODULE,
.release = inet_release,
.bind = mptcp_bind,
- .connect = mptcp_stream_connect,
+ .connect = inet_stream_connect,
.socketpair = sock_no_socketpair,
.accept = mptcp_stream_accept,
.getname = inet_getname,
@@ -4012,7 +3997,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
.owner = THIS_MODULE,
.release = inet6_release,
.bind = mptcp_bind,
- .connect = mptcp_stream_connect,
+ .connect = inet_stream_connect,
.socketpair = sock_no_socketpair,
.accept = mptcp_stream_accept,
.getname = inet6_getname,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 8d9d87e2f840..39613bba1c46 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -300,7 +300,6 @@ struct mptcp_sock {
nodelay:1,
fastopening:1,
in_accept_queue:1;
- int connect_flags;
struct work_struct work;
struct sk_buff *ooo_last_skb;
struct rb_root out_of_order_queue;
--
2.40.0
Hi Paolo, Mat,
On 05/05/2023 21:34, Paolo Abeni wrote:
> Ondrej reported a functional issue WRT timeout handling on connect
> with a nice reproducer.
>
> The problem is that the current mptcp connect waits for both the
> MPTCP socket level timeout, and the first subflow socket timeout.
> The latter is not influenced/touched by the exposed setsockopt().
>
> Overall the above makes the SO_SNDTIMEO a no-op on connect.
>
> Since mptcp_connect is invoked via inet_stream_connect and the
> latter properly handle the MPTCP level timeout, we can address the
> issue making the nested subflow level connect always unblocking.
>
> This also allow simplifying a bit the code, dropping an ugly hack
> to handle the fastopen and custom proto_ops connect.
>
> The issues predates the blamed commit below, but the current resolution
> requires the infrastructure introduced there.
>
> Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Thank you for the fix and the review!
I just applied it in our repo with Mat's RvB tag. I also added Ondrej's
Reported-by tag.
New patches for t/upstream-net and t/upstream:
- 59d33b72ee59: mptcp: fix connect timeout handling
- 8d8fd087a52c: tg:msg: add Ondrej's Reported-by tag
- Results: 8ecd3b5ba105..fa4a5780caa4 (export-net)
- Results: d9d25c574c4e..b8e056f78785 (export)
Tests are now in progress:
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20230515T155446
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20230515T155446
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
On Fri, 5 May 2023, Paolo Abeni wrote:
> Ondrej reported a functional issue WRT timeout handling on connect
> with a nice reproducer.
>
> The problem is that the current mptcp connect waits for both the
> MPTCP socket level timeout, and the first subflow socket timeout.
> The latter is not influenced/touched by the exposed setsockopt().
>
> Overall the above makes the SO_SNDTIMEO a no-op on connect.
>
> Since mptcp_connect is invoked via inet_stream_connect and the
> latter properly handle the MPTCP level timeout, we can address the
> issue making the nested subflow level connect always unblocking.
>
> This also allow simplifying a bit the code, dropping an ugly hack
> to handle the fastopen and custom proto_ops connect.
>
> The issues predates the blamed commit below, but the current resolution
> requires the infrastructure introduced there.
>
> Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note: there is needed follow-up correctly propagating the SYN_SENT
> -> CLOSE state transition from the first subflow to the msk socket in
> case of connect error, but that is somewhat independed/pre-existent.
>
> Sending out early for testing.
Hi Paolo -
Patch LGTM, and appreciate the cleanup of msk->connect_flags:
Reviewed-by: Mat Martineau <martineau@kernel.org>
I think it would be good to also add a selftest in net-next to check the
connect timeout (set to a very short time), but that doesn't need to block
this -net change.
Thanks,
Mat
> ---
> net/mptcp/protocol.c | 29 +++++++----------------------
> net/mptcp/protocol.h | 1 -
> 2 files changed, 7 insertions(+), 23 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index f5842f93c891..4747058a8b6b 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1734,7 +1734,6 @@ static int mptcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
>
> lock_sock(ssk);
> msg->msg_flags |= MSG_DONTWAIT;
> - msk->connect_flags = O_NONBLOCK;
> msk->fastopening = 1;
> ret = tcp_sendmsg_fastopen(ssk, msg, copied_syn, len, NULL);
> msk->fastopening = 0;
> @@ -3672,9 +3671,9 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> * acquired the subflow socket lock, too.
> */
> if (msk->fastopening)
> - err = __inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags, 1);
> + err = __inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK, 1);
> else
> - err = inet_stream_connect(ssock, uaddr, addr_len, msk->connect_flags);
> + err = inet_stream_connect(ssock, uaddr, addr_len, O_NONBLOCK);
> inet_sk(sk)->defer_connect = inet_sk(ssock->sk)->defer_connect;
>
> /* on successful connect, the msk state will be moved to established by
> @@ -3687,12 +3686,10 @@ static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
>
> mptcp_copy_inaddrs(sk, ssock->sk);
>
> - /* unblocking connect, mptcp-level inet_stream_connect will error out
> - * without changing the socket state, update it here.
> + /* silence EINPROGRESS and let the caller inet_stream_connect
> + * handle the connection in progress
> */
> - if (err == -EINPROGRESS)
> - sk->sk_socket->state = ssock->state;
> - return err;
> + return 0;
> }
>
> static struct proto mptcp_prot = {
> @@ -3751,18 +3748,6 @@ static int mptcp_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> return err;
> }
>
> -static int mptcp_stream_connect(struct socket *sock, struct sockaddr *uaddr,
> - int addr_len, int flags)
> -{
> - int ret;
> -
> - lock_sock(sock->sk);
> - mptcp_sk(sock->sk)->connect_flags = flags;
> - ret = __inet_stream_connect(sock, uaddr, addr_len, flags, 0);
> - release_sock(sock->sk);
> - return ret;
> -}
> -
> static int mptcp_listen(struct socket *sock, int backlog)
> {
> struct mptcp_sock *msk = mptcp_sk(sock->sk);
> @@ -3917,7 +3902,7 @@ static const struct proto_ops mptcp_stream_ops = {
> .owner = THIS_MODULE,
> .release = inet_release,
> .bind = mptcp_bind,
> - .connect = mptcp_stream_connect,
> + .connect = inet_stream_connect,
> .socketpair = sock_no_socketpair,
> .accept = mptcp_stream_accept,
> .getname = inet_getname,
> @@ -4012,7 +3997,7 @@ static const struct proto_ops mptcp_v6_stream_ops = {
> .owner = THIS_MODULE,
> .release = inet6_release,
> .bind = mptcp_bind,
> - .connect = mptcp_stream_connect,
> + .connect = inet_stream_connect,
> .socketpair = sock_no_socketpair,
> .accept = mptcp_stream_accept,
> .getname = inet6_getname,
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 8d9d87e2f840..39613bba1c46 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -300,7 +300,6 @@ struct mptcp_sock {
> nodelay:1,
> fastopening:1,
> in_accept_queue:1;
> - int connect_flags;
> struct work_struct work;
> struct sk_buff *ooo_last_skb;
> struct rb_root out_of_order_queue;
> --
> 2.40.0
>
>
>
On Fri, 2023-05-12 at 12:24 -0700, Mat Martineau wrote:
> On Fri, 5 May 2023, Paolo Abeni wrote:
>
> > Ondrej reported a functional issue WRT timeout handling on connect
> > with a nice reproducer.
> >
> > The problem is that the current mptcp connect waits for both the
> > MPTCP socket level timeout, and the first subflow socket timeout.
> > The latter is not influenced/touched by the exposed setsockopt().
> >
> > Overall the above makes the SO_SNDTIMEO a no-op on connect.
> >
> > Since mptcp_connect is invoked via inet_stream_connect and the
> > latter properly handle the MPTCP level timeout, we can address the
> > issue making the nested subflow level connect always unblocking.
> >
> > This also allow simplifying a bit the code, dropping an ugly hack
> > to handle the fastopen and custom proto_ops connect.
> >
> > The issues predates the blamed commit below, but the current resolution
> > requires the infrastructure introduced there.
> >
> > Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Note: there is needed follow-up correctly propagating the SYN_SENT
> > -> CLOSE state transition from the first subflow to the msk socket in
> > case of connect error, but that is somewhat independed/pre-existent.
> >
> > Sending out early for testing.
>
> Hi Paolo -
>
> Patch LGTM, and appreciate the cleanup of msk->connect_flags:
>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
>
> I think it would be good to also add a selftest in net-next to check the
> connect timeout (set to a very short time), but that doesn't need to block
> this -net change.
Agreed. Perhaps a pktdrill script would could be simpler option?!?
Thanks!
Paolo
Hi Paolo, Mat,
On 15/05/2023 10:16, Paolo Abeni wrote:
> On Fri, 2023-05-12 at 12:24 -0700, Mat Martineau wrote:
>> On Fri, 5 May 2023, Paolo Abeni wrote:
>>
>>> Ondrej reported a functional issue WRT timeout handling on connect
>>> with a nice reproducer.
>>>
>>> The problem is that the current mptcp connect waits for both the
>>> MPTCP socket level timeout, and the first subflow socket timeout.
>>> The latter is not influenced/touched by the exposed setsockopt().
>>>
>>> Overall the above makes the SO_SNDTIMEO a no-op on connect.
>>>
>>> Since mptcp_connect is invoked via inet_stream_connect and the
>>> latter properly handle the MPTCP level timeout, we can address the
>>> issue making the nested subflow level connect always unblocking.
>>>
>>> This also allow simplifying a bit the code, dropping an ugly hack
>>> to handle the fastopen and custom proto_ops connect.
>>>
>>> The issues predates the blamed commit below, but the current resolution
>>> requires the infrastructure introduced there.
>>>
>>> Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
>>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>> ---
>>> Note: there is needed follow-up correctly propagating the SYN_SENT
>>> -> CLOSE state transition from the first subflow to the msk socket in
>>> case of connect error, but that is somewhat independed/pre-existent.
>>>
>>> Sending out early for testing.
>>
>> Hi Paolo -
>>
>> Patch LGTM, and appreciate the cleanup of msk->connect_flags:
>>
>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>>
>> I think it would be good to also add a selftest in net-next to check the
>> connect timeout (set to a very short time), but that doesn't need to block
>> this -net change.
>
> Agreed. Perhaps a pktdrill script would could be simpler option?!?
Good idea, that seems more appropriated, especially with the timeout part.
Just to avoid deadlocks: is there someone working on that? :)
But as Mat said, this is not blocking, I can apply the patch.
Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
On Mon, 2023-05-15 at 17:46 +0200, Matthieu Baerts wrote:
> Hi Paolo, Mat,
>
> On 15/05/2023 10:16, Paolo Abeni wrote:
> > On Fri, 2023-05-12 at 12:24 -0700, Mat Martineau wrote:
> > > On Fri, 5 May 2023, Paolo Abeni wrote:
> > >
> > > > Ondrej reported a functional issue WRT timeout handling on connect
> > > > with a nice reproducer.
> > > >
> > > > The problem is that the current mptcp connect waits for both the
> > > > MPTCP socket level timeout, and the first subflow socket timeout.
> > > > The latter is not influenced/touched by the exposed setsockopt().
> > > >
> > > > Overall the above makes the SO_SNDTIMEO a no-op on connect.
> > > >
> > > > Since mptcp_connect is invoked via inet_stream_connect and the
> > > > latter properly handle the MPTCP level timeout, we can address the
> > > > issue making the nested subflow level connect always unblocking.
> > > >
> > > > This also allow simplifying a bit the code, dropping an ugly hack
> > > > to handle the fastopen and custom proto_ops connect.
> > > >
> > > > The issues predates the blamed commit below, but the current resolution
> > > > requires the infrastructure introduced there.
> > > >
> > > > Fixes: 54f1944ed6d2 ("mptcp: factor out mptcp_connect()")
> > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/399
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > > Note: there is needed follow-up correctly propagating the SYN_SENT
> > > > -> CLOSE state transition from the first subflow to the msk socket in
> > > > case of connect error, but that is somewhat independed/pre-existent.
> > > >
> > > > Sending out early for testing.
> > >
> > > Hi Paolo -
> > >
> > > Patch LGTM, and appreciate the cleanup of msk->connect_flags:
> > >
> > > Reviewed-by: Mat Martineau <martineau@kernel.org>
> > >
> > > I think it would be good to also add a selftest in net-next to check the
> > > connect timeout (set to a very short time), but that doesn't need to block
> > > this -net change.
> >
> > Agreed. Perhaps a pktdrill script would could be simpler option?!?
>
> Good idea, that seems more appropriated, especially with the timeout part.
>
> Just to avoid deadlocks: is there someone working on that? :)
Yep, just opended a PR:
https://github.com/multipath-tcp/packetdrill/pull/115
/P
On Fri, 2023-05-05 at 21:34 +0200, Paolo Abeni wrote:
> Note: there is needed follow-up correctly propagating the SYN_SENT
> -> CLOSE state transition from the first subflow to the msk socket in
> case of connect error, but that is somewhat independed/pre-existent.
Re-reading the relevant code, there is no need for such follow-up:
commit 1249db44a102 ("mptcp: be careful on subflow status propagation
on errors") already take care of it.
Cheers,
Paolo
© 2016 - 2026 Red Hat, Inc.