When a peer decides to close one subflow in the middle of a connection
having multiple subflows, the receiver of the first FIN should accept
that, and close the subflow on its side as well. If not, the subflow
will stay half closed, and would even continue to be used until the end
of the MPTCP connection or a reset from the network.

The issue has not been seen before, probably because the in-kernel
path-manager always sends a RM_ADDR before closing the subflow. Upon the
reception of this RM_ADDR, the other peer will initiate the closure on
its side as well. On the other hand, if the RM_ADDR is lost, or if the
path-manager of the other peer only closes the subflow without sending a
RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it,
leaving the subflow half-closed.

So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if
the MPTCP connection has not been closed before with a DATA_FIN, the
kernel owning the subflow schedules its worker to initiate the closure
on its side as well.

This issue can be easily reproduced with packetdrill, as visible in [1],
by creating an additional subflow, injecting a FIN+ACK before sending
the DATA_FIN, and expecting a FIN+ACK in return.

Fixes: 40947e13997a ("mptcp: schedule worker when subflow is closed")
Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/mptcp/protocol.c | 5 ++++-
 net/mptcp/subflow.c  | 8 ++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 13777c35496c..609d684135dc 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2533,8 +2533,11 @@ static void __mptcp_close_subflow(struct sock *sk)

 	mptcp_for_each_subflow_safe(msk, subflow, tmp) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		int ssk_state = inet_sk_state_load(ssk);

-		if (inet_sk_state_load(ssk) != TCP_CLOSE)
+		if (ssk_state != TCP_CLOSE &&
+		    (ssk->sk_state != TCP_CLOSE_WAIT ||
+		     inet_sk_state_load(sk) != TCP_ESTABLISHED))
 			continue;

 		/* 'subflow_data_ready' will re-sched once rx queue is empty */
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a7fb4d46e024..723cd3fbba32 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1255,12 +1255,16 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
 /* sched mptcp worker to remove the subflow if no more data is pending */
 static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk)
 {
-	if (likely(ssk->sk_state != TCP_CLOSE))
+	struct sock *sk = (struct sock *)msk;
+
+	if (likely(ssk->sk_state != TCP_CLOSE &&
+		   (ssk->sk_state != TCP_CLOSE_WAIT ||
+		    inet_sk_state_load(sk) != TCP_ESTABLISHED)))
 		return;

 	if (skb_queue_empty(&ssk->sk_receive_queue) &&
 	    !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
-		mptcp_schedule_work((struct sock *)msk);
+		mptcp_schedule_work(sk);
 }

 static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
--
2.45.2

On Fri, 2 Aug 2024, Matthieu Baerts (NGI0) wrote:

> When a peer decides to close one subflow in the middle of a connection
> having multiple subflows, the receiver of the first FIN should accept
> that, and close the subflow on its side as well. If not, the subflow
> will stay half closed, and would even continue to be used until the end
> of the MPTCP connection or a reset from the network.
>
> The issue has not been seen before, probably because the in-kernel
> path-manager always sends a RM_ADDR before closing the subflow. Upon the
> reception of this RM_ADDR, the other peer will initiate the closure on
> its side as well. On the other hand, if the RM_ADDR is lost, or if the
> path-manager of the other peer only closes the subflow without sending a
> RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it,
> leaving the subflow half-closed.
>
> So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if
> the MPTCP connection has not been closed before with a DATA_FIN, the
> kernel owning the subflow schedules its worker to initiate the closure
> on its side as well.
>
> This issue can be easily reproduced with packetdrill, as visible in [1],
> by creating an additional subflow, injecting a FIN+ACK before sending
> the DATA_FIN, and expecting a FIN+ACK in return.
>
> Fixes: 40947e13997a ("mptcp: schedule worker when subflow is closed")
> Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1]
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  net/mptcp/protocol.c | 5 ++++-
>  net/mptcp/subflow.c  | 8 ++++++--
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 13777c35496c..609d684135dc 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2533,8 +2533,11 @@ static void __mptcp_close_subflow(struct sock *sk)
>
>  	mptcp_for_each_subflow_safe(msk, subflow, tmp) {
>  		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
> +		int ssk_state = inet_sk_state_load(ssk);
>
> -		if (inet_sk_state_load(ssk) != TCP_CLOSE)
> +		if (ssk_state != TCP_CLOSE &&
> +		    (ssk->sk_state != TCP_CLOSE_WAIT ||

Hi Matthieu -

Looks like both those ssk checks should use ssk_state, but otherwise the
patch looks good.

- Mat

> +		     inet_sk_state_load(sk) != TCP_ESTABLISHED))
>  			continue;
>
>  		/* 'subflow_data_ready' will re-sched once rx queue is empty */
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index a7fb4d46e024..723cd3fbba32 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1255,12 +1255,16 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
>  /* sched mptcp worker to remove the subflow if no more data is pending */
>  static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk)
>  {
> -	if (likely(ssk->sk_state != TCP_CLOSE))
> +	struct sock *sk = (struct sock *)msk;
> +
> +	if (likely(ssk->sk_state != TCP_CLOSE &&
> +		   (ssk->sk_state != TCP_CLOSE_WAIT ||
> +		    inet_sk_state_load(sk) != TCP_ESTABLISHED)))
>  		return;
>
>  	if (skb_queue_empty(&ssk->sk_receive_queue) &&
>  	    !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
> -		mptcp_schedule_work((struct sock *)msk);
> +		mptcp_schedule_work(sk);
>  }
>
>  static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
>
> --
> 2.45.2

Hi Mat,

On 09/08/2024 02:28, Mat Martineau wrote:
> On Fri, 2 Aug 2024, Matthieu Baerts (NGI0) wrote:
>
>> When a peer decides to close one subflow in the middle of a connection
>> having multiple subflows, the receiver of the first FIN should accept
>> that, and close the subflow on its side as well. If not, the subflow
>> will stay half closed, and would even continue to be used until the end
>> of the MPTCP connection or a reset from the network.
>>
>> The issue has not been seen before, probably because the in-kernel
>> path-manager always sends a RM_ADDR before closing the subflow. Upon the
>> reception of this RM_ADDR, the other peer will initiate the closure on
>> its side as well. On the other hand, if the RM_ADDR is lost, or if the
>> path-manager of the other peer only closes the subflow without sending a
>> RM_ADDR, the subflow would switch to TCP_CLOSE_WAIT, but that's it,
>> leaving the subflow half-closed.
>>
>> So now, when the subflow switches to the TCP_CLOSE_WAIT state, and if
>> the MPTCP connection has not been closed before with a DATA_FIN, the
>> kernel owning the subflow schedules its worker to initiate the closure
>> on its side as well.
>>
>> This issue can be easily reproduced with packetdrill, as visible in [1],
>> by creating an additional subflow, injecting a FIN+ACK before sending
>> the DATA_FIN, and expecting a FIN+ACK in return.
>>
>> Fixes: 40947e13997a ("mptcp: schedule worker when subflow is closed")
>> Link: https://github.com/multipath-tcp/packetdrill/pull/154 [1]
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>>  net/mptcp/protocol.c | 5 ++++-
>>  net/mptcp/subflow.c  | 8 ++++++--
>>  2 files changed, 10 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 13777c35496c..609d684135dc 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -2533,8 +2533,11 @@ static void __mptcp_close_subflow(struct sock *sk)
>>
>>  	mptcp_for_each_subflow_safe(msk, subflow, tmp) {
>>  		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
>> +		int ssk_state = inet_sk_state_load(ssk);
>>
>> -		if (inet_sk_state_load(ssk) != TCP_CLOSE)
>> +		if (ssk_state != TCP_CLOSE &&
>> +		    (ssk->sk_state != TCP_CLOSE_WAIT ||
>
> Hi Matthieu -
>
> Looks like both those ssk checks should use ssk_state, but otherwise the
> patch looks good.

Good catch! Will fix that in the v7.

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
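
For readers following the thread, the condition under review can be
modeled in user space as a small predicate. This is only a sketch, not
kernel code: the enum and the function name are hypothetical stand-ins,
and the subflow state is taken from a single snapshot for both
comparisons, which is an assumption about what the review comment
suggests for __mptcp_close_subflow(). The predicate is the logical
negation of the "continue" / "return" condition in the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's TCP_* socket states. */
enum model_state {
	MODEL_ESTABLISHED,
	MODEL_CLOSE_WAIT,
	MODEL_CLOSE,
};

/*
 * Returns true when the worker should close the subflow.
 *
 * ssk_state: one snapshot of the subflow state, used for both checks.
 * msk_state: state of the MPTCP-level socket; anything other than
 *            ESTABLISHED is assumed here to mean the connection is
 *            already being closed (e.g. after a DATA_FIN).
 */
static bool model_should_close_subflow(enum model_state ssk_state,
				       enum model_state msk_state)
{
	/* A fully closed subflow is always cleaned up. */
	if (ssk_state == MODEL_CLOSE)
		return true;

	/* The peer sent a FIN on this subflow only: close our side too. */
	return ssk_state == MODEL_CLOSE_WAIT &&
	       msk_state == MODEL_ESTABLISHED;
}
```

With this model, a subflow in CLOSE_WAIT is closed only while the MPTCP
socket is still ESTABLISHED; once the MPTCP socket has left that state,
the regular connection shutdown path is expected to handle it.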