When the MPTCP DATA FIN have been ACKed, there is no more MPTCP related
metadata to exchange, and all subflows can be safely shutdown.
Before this patch, the subflows were actually terminated at 'close()'
time. That's certainly fine most of the time, but not when the userspace
'shutdown()' a connection, without close()ing it. When doing so, the
subflows were staying in LAST_ACK state on one side -- and consequently
in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP
socket.
Now, when the DATA FIN have been ACKed, all subflows are shutdown. A
consequence of this is that the TCP 'FIN' flag can be set earlier now,
but the end result is the same. This affects the packetdrill tests
looking at the end of the MPTCP connections, but for a good reason.
Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers")
Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
net/mptcp/protocol.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1c26acf6c4145896ecbb5c7f7004121c66a20649..9feb9f437c2bdfff392249e7ad00edd09ba67ac3 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -372,6 +372,20 @@ static void mptcp_close_wake_up(struct sock *sk)
sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
}
+static void mptcp_shutdown_subflows(struct mptcp_sock *msk)
+{
+ struct mptcp_subflow_context *subflow;
+
+ mptcp_for_each_subflow(msk, subflow) {
+ struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+ bool slow;
+
+ slow = lock_sock_fast(ssk);
+ tcp_shutdown(ssk, SEND_SHUTDOWN);
+ unlock_sock_fast(ssk, slow);
+ }
+}
+
/* called under the msk socket lock */
static bool mptcp_pending_data_fin_ack(struct sock *sk)
{
@@ -396,6 +410,7 @@ static void mptcp_check_data_fin_ack(struct sock *sk)
break;
case TCP_CLOSING:
case TCP_LAST_ACK:
+ mptcp_shutdown_subflows(msk);
mptcp_set_state(sk, TCP_CLOSE);
break;
}
@@ -564,6 +579,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
mptcp_set_state(sk, TCP_CLOSING);
break;
case TCP_FIN_WAIT2:
+ mptcp_shutdown_subflows(msk);
mptcp_set_state(sk, TCP_CLOSE);
break;
default:
--
2.51.0
On Fri, 2025-09-05 at 20:18 +0200, Matthieu Baerts (NGI0) wrote: > When the MPTCP DATA FIN have been ACKed, there is no more MPTCP > related > metadata to exchange, and all subflows can be safely shutdown. > > Before this patch, the subflows were actually terminated at 'close()' > time. That's certainly fine most of the time, but not when the > userspace > 'shutdown()' a connection, without close()ing it. When doing so, the > subflows were staying in LAST_ACK state on one side -- and > consequently > in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP > socket. > > Now, when the DATA FIN have been ACKed, all subflows are shutdown. A > consequence of this is that the TCP 'FIN' flag can be set earlier > now, > but the end result is the same. This affects the packetdrill tests > looking at the end of the MPTCP connections, but for a good reason. > > Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number > and add related helpers") > Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN") > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > --- > net/mptcp/protocol.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index > 1c26acf6c4145896ecbb5c7f7004121c66a20649..9feb9f437c2bdfff392249e7ad0 > 0edd09ba67ac3 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -372,6 +372,20 @@ static void mptcp_close_wake_up(struct sock *sk) > sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN); > } > > +static void mptcp_shutdown_subflows(struct mptcp_sock *msk) > +{ > + struct mptcp_subflow_context *subflow; > + > + mptcp_for_each_subflow(msk, subflow) { > + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); > + bool slow; > + > + slow = lock_sock_fast(ssk); > + tcp_shutdown(ssk, SEND_SHUTDOWN); > + unlock_sock_fast(ssk, slow); > + } > +} > + > /* called under the msk socket lock */ > static bool mptcp_pending_data_fin_ack(struct sock *sk) > { > @@ -396,6 +410,7 @@ static void mptcp_check_data_fin_ack(struct sock > *sk) > break; > case TCP_CLOSING: > case TCP_LAST_ACK: > + mptcp_shutdown_subflows(msk); > mptcp_set_state(sk, TCP_CLOSE); > break; > } > @@ -564,6 +579,7 @@ static bool mptcp_check_data_fin(struct sock *sk) > mptcp_set_state(sk, TCP_CLOSING); > break; > case TCP_FIN_WAIT2: > + mptcp_shutdown_subflows(msk); I think we should not directly call mptcp_shutdown_subflows() within mptcp_check_data_fin() and mptcp_check_data_fin_ack(), but should instead call it after these functions return and the sk state is TCP_CLOSE. This ensures that the original purpose of these two functions remains unchanged. WDYT? Thanks, -Geliang > mptcp_set_state(sk, TCP_CLOSE); > break; > default:
On Fri, 2025-09-05 at 20:18 +0200, Matthieu Baerts (NGI0) wrote: > When the MPTCP DATA FIN have been ACKed, there is no more MPTCP > related > metadata to exchange, and all subflows can be safely shutdown. > > Before this patch, the subflows were actually terminated at 'close()' > time. That's certainly fine most of the time, but not when the > userspace > 'shutdown()' a connection, without close()ing it. When doing so, the > subflows were staying in LAST_ACK state on one side -- and > consequently > in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP > socket. > > Now, when the DATA FIN have been ACKed, all subflows are shutdown. A > consequence of this is that the TCP 'FIN' flag can be set earlier > now, > but the end result is the same. This affects the packetdrill tests > looking at the end of the MPTCP connections, but for a good reason. > > Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number > and add related helpers") > Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN") > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > --- > net/mptcp/protocol.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > index > 1c26acf6c4145896ecbb5c7f7004121c66a20649..9feb9f437c2bdfff392249e7ad0 > 0edd09ba67ac3 100644 > --- a/net/mptcp/protocol.c > +++ b/net/mptcp/protocol.c > @@ -372,6 +372,20 @@ static void mptcp_close_wake_up(struct sock *sk) > sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN); > } > > +static void mptcp_shutdown_subflows(struct mptcp_sock *msk) > +{ > + struct mptcp_subflow_context *subflow; > + > + mptcp_for_each_subflow(msk, subflow) { > + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); > + bool slow; > + > + slow = lock_sock_fast(ssk); > + tcp_shutdown(ssk, SEND_SHUTDOWN); > + unlock_sock_fast(ssk, slow); > + } > +} > + > /* called under the msk socket lock */ > static bool mptcp_pending_data_fin_ack(struct sock *sk) > { > @@ -396,6 +410,7 @@ static void mptcp_check_data_fin_ack(struct sock > *sk) > break; > case TCP_CLOSING: > case TCP_LAST_ACK: > + mptcp_shutdown_subflows(msk); > mptcp_set_state(sk, TCP_CLOSE); > break; > } > @@ -564,6 +579,7 @@ static bool mptcp_check_data_fin(struct sock *sk) > mptcp_set_state(sk, TCP_CLOSING); > break; > case TCP_FIN_WAIT2: > + mptcp_shutdown_subflows(msk); I think we should not directly call mptcp_shutdown_subflows() within mptcp_check_data_fin() and mptcp_check_data_fin_ack(), but should instead call it after these functions return and the sk state is TCP_CLOSE. This ensures that the original purpose of these two functions remains unchanged. WDYT? Thanks, -Geliang > mptcp_set_state(sk, TCP_CLOSE); > break; > default:
Hi Geliang, On 06/09/2025 02:45, Geliang Tang wrote: > On Fri, 2025-09-05 at 20:18 +0200, Matthieu Baerts (NGI0) wrote: >> When the MPTCP DATA FIN have been ACKed, there is no more MPTCP >> related >> metadata to exchange, and all subflows can be safely shutdown. >> >> Before this patch, the subflows were actually terminated at 'close()' >> time. That's certainly fine most of the time, but not when the >> userspace >> 'shutdown()' a connection, without close()ing it. When doing so, the >> subflows were staying in LAST_ACK state on one side -- and >> consequently >> in FIN_WAIT2 on the other side -- until the 'close()' of the MPTCP >> socket. >> >> Now, when the DATA FIN have been ACKed, all subflows are shutdown. A >> consequence of this is that the TCP 'FIN' flag can be set earlier >> now, >> but the end result is the same. This affects the packetdrill tests >> looking at the end of the MPTCP connections, but for a good reason. >> >> Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number >> and add related helpers") >> Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of DATA_FIN") >> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> >> --- >> net/mptcp/protocol.c | 16 ++++++++++++++++ >> 1 file changed, 16 insertions(+) >> >> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c >> index >> 1c26acf6c4145896ecbb5c7f7004121c66a20649..9feb9f437c2bdfff392249e7ad0 >> 0edd09ba67ac3 100644 >> --- a/net/mptcp/protocol.c >> +++ b/net/mptcp/protocol.c >> @@ -372,6 +372,20 @@ static void mptcp_close_wake_up(struct sock *sk) >> sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN); >> } >> >> +static void mptcp_shutdown_subflows(struct mptcp_sock *msk) >> +{ >> + struct mptcp_subflow_context *subflow; >> + >> + mptcp_for_each_subflow(msk, subflow) { >> + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); >> + bool slow; >> + >> + slow = lock_sock_fast(ssk); >> + tcp_shutdown(ssk, SEND_SHUTDOWN); >> + unlock_sock_fast(ssk, slow); >> + } >> +} >> + >> /* called under the msk socket lock */ >> static bool mptcp_pending_data_fin_ack(struct sock *sk) >> { >> @@ -396,6 +410,7 @@ static void mptcp_check_data_fin_ack(struct sock >> *sk) >> break; >> case TCP_CLOSING: >> case TCP_LAST_ACK: >> + mptcp_shutdown_subflows(msk); >> mptcp_set_state(sk, TCP_CLOSE); >> break; >> } >> @@ -564,6 +579,7 @@ static bool mptcp_check_data_fin(struct sock *sk) >> mptcp_set_state(sk, TCP_CLOSING); >> break; >> case TCP_FIN_WAIT2: >> + mptcp_shutdown_subflows(msk); > > I think we should not directly call mptcp_shutdown_subflows() within > mptcp_check_data_fin() and mptcp_check_data_fin_ack(), but should > instead call it after these functions return and the sk state is > TCP_CLOSE. This ensures that the original purpose of these two > functions remains unchanged. Thank you for the review. I'm not sure to understand what you mean about the "original purpose": they both set the appropriated actions when DATA_FIN have been received. Here, I do want to shut the subflows down when the DATA_FIN ACK are received, so that's the good spot, no? Note that commit 3721b9b64676 ("mptcp: Track received DATA_FIN sequence number and add related helpers") already had a hint of what should have been done: see the "Close subflows now?" comment, later removed in commit e16163b6e2b7 ("mptcp: refactor shutdown and close"). Or maybe do you mean their original name "check_data_fin(_ack)" is confusing? The "check" here is not just to check if the DATA_FIN (ACK) was set, but to check actions related to them. Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Matt, On Sat, 2025-09-06 at 15:42 +0200, Matthieu Baerts wrote: > Hi Geliang, > > On 06/09/2025 02:45, Geliang Tang wrote: > > On Fri, 2025-09-05 at 20:18 +0200, Matthieu Baerts (NGI0) wrote: > > > When the MPTCP DATA FIN have been ACKed, there is no more MPTCP > > > related > > > metadata to exchange, and all subflows can be safely shutdown. > > > > > > Before this patch, the subflows were actually terminated at > > > 'close()' > > > time. That's certainly fine most of the time, but not when the > > > userspace > > > 'shutdown()' a connection, without close()ing it. When doing so, > > > the > > > subflows were staying in LAST_ACK state on one side -- and > > > consequently > > > in FIN_WAIT2 on the other side -- until the 'close()' of the > > > MPTCP > > > socket. > > > > > > Now, when the DATA FIN have been ACKed, all subflows are > > > shutdown. A > > > consequence of this is that the TCP 'FIN' flag can be set earlier > > > now, > > > but the end result is the same. This affects the packetdrill > > > tests > > > looking at the end of the MPTCP connections, but for a good > > > reason. > > > > > > Fixes: 3721b9b64676 ("mptcp: Track received DATA_FIN sequence > > > number > > > and add related helpers") > > > Fixes: 16a9a9da1723 ("mptcp: Add helper to process acks of > > > DATA_FIN") > > > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > > > --- > > > net/mptcp/protocol.c | 16 ++++++++++++++++ > > > 1 file changed, 16 insertions(+) > > > > > > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c > > > index > > > 1c26acf6c4145896ecbb5c7f7004121c66a20649..9feb9f437c2bdfff392249e > > > 7ad0 > > > 0edd09ba67ac3 100644 > > > --- a/net/mptcp/protocol.c > > > +++ b/net/mptcp/protocol.c > > > @@ -372,6 +372,20 @@ static void mptcp_close_wake_up(struct sock > > > *sk) > > > sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN); > > > } > > > > > > +static void mptcp_shutdown_subflows(struct mptcp_sock *msk) > > > +{ > > > + struct mptcp_subflow_context *subflow; > > > + > > > + mptcp_for_each_subflow(msk, subflow) { > > > + struct sock *ssk = > > > mptcp_subflow_tcp_sock(subflow); > > > + bool slow; > > > + > > > + slow = lock_sock_fast(ssk); > > > + tcp_shutdown(ssk, SEND_SHUTDOWN); > > > + unlock_sock_fast(ssk, slow); > > > + } > > > +} > > > + > > > /* called under the msk socket lock */ > > > static bool mptcp_pending_data_fin_ack(struct sock *sk) > > > { > > > @@ -396,6 +410,7 @@ static void mptcp_check_data_fin_ack(struct > > > sock > > > *sk) > > > break; > > > case TCP_CLOSING: > > > case TCP_LAST_ACK: > > > + mptcp_shutdown_subflows(msk); > > > mptcp_set_state(sk, TCP_CLOSE); > > > break; > > > } > > > @@ -564,6 +579,7 @@ static bool mptcp_check_data_fin(struct sock > > > *sk) > > > mptcp_set_state(sk, TCP_CLOSING); > > > break; > > > case TCP_FIN_WAIT2: > > > + mptcp_shutdown_subflows(msk); > > > > I think we should not directly call mptcp_shutdown_subflows() > > within > > mptcp_check_data_fin() and mptcp_check_data_fin_ack(), but should > > instead call it after these functions return and the sk state is > > TCP_CLOSE. This ensures that the original purpose of these two > > functions remains unchanged. > > Thank you for the review. I'm not sure to understand what you mean > about > the "original purpose": they both set the appropriated actions when > DATA_FIN have been received. Here, I do want to shut the subflows > down > when the DATA_FIN ACK are received, so that's the good spot, no? > > Note that commit 3721b9b64676 ("mptcp: Track received DATA_FIN > sequence > number and add related helpers") already had a hint of what should > have > been done: see the "Close subflows now?" comment, later removed in > commit e16163b6e2b7 ("mptcp: refactor shutdown and close"). > > Or maybe do you mean their original name "check_data_fin(_ack)" is > confusing? The "check" here is not just to check if the DATA_FIN > (ACK) > was set, but to check actions related to them. I mean not to modify mptcp_check_data_fin and mptcp_check_data_fin_ack, but to check TCP_CLOSE in mptcp_worker(), something like: @@ -2749,6 +2749,9 @@ static void mptcp_worker(struct work_struct *work) mptcp_check_data_fin_ack(sk); mptcp_check_data_fin(sk); + if (sk->sk_state == TCP_CLOSE) + mptcp_shutdown_subflows(msk); + if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) __mptcp_close_subflow(sk); Another option is to call mptcp_shutdown_subflows in mptcp_close_wake_up, which one do you think is better? Thanks, -Geliang > > Cheers, > Matt
© 2016 - 2025 Red Hat, Inc.