From: Menglong Dong <imagedong@tencent.com>
All of the subflows of an msk are orphaned in mptcp_close(), which
means the subflows end up in the DEAD state. After that, a DATA_FIN is
sent, and the other side responds with a DATA_ACK for this DATA_FIN.

However, if the other side still has pending data, the data received on
these subflows will not be passed to the msk: the subflows are DEAD, so
subflow_data_ready() will not be called from tcp_data_ready(). Therefore,
this data can't be acked, and it will be retransmitted again and again,
until the timeout expires.

Fix this by not orphaning the subflows in mptcp_close(). I think that's
ok, as they can still be orphaned in __mptcp_close_ssk(). Maybe we could
also fix this in mptcp_incoming_options(), by checking the status of the
ssk and msk?
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
net/mptcp/protocol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3796d1bfef6b..5ac1584baf61 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -815,7 +815,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 
 	/* Wake-up the reader only for in-sequence data */
 	mptcp_data_lock(sk);
-	if (move_skbs_to_msk(msk, ssk))
+	if (move_skbs_to_msk(msk, ssk) && !sock_flag(sk, SOCK_DEAD))
 		sk->sk_data_ready(sk);
 	mptcp_data_unlock(sk);
@@ -2940,7 +2940,6 @@ bool __mptcp_close(struct sock *sk, long timeout)
 		if (ssk == msk->first)
 			subflow->fail_tout = 0;
 
-		sock_orphan(ssk);
 		unlock_sock_fast(ssk, slow);
 	}
 	sock_orphan(sk);
--
2.37.2
Hello,

On Fri, 2022-11-18 at 15:20 +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> All of the subflows of a msk will be orphaned in mptcp_close(), which
> means the subflows are in DEAD state. After then, DATA_FIN will be sent,
> and the other side will response with a DATA_ACK for this DATA_FIN.
>
> However, if the other side still has pending data, the data that received
> on these subflows will not be passed to the msk, as they are DEAD and
> subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
> these data can't be acked, and they will be retransmitted again and again,
> until timeout.
>
> Fix this by not orphaning the subflows in mptcp_close(). I think that's
> ok, as they can still be orphaned in __mptcp_close_ssk(). Maybe we can
> also fix this in mptcp_incoming_options() by check the status of ssk and
> msk?

Thank you for clearly debugging this scenario!

As noted by the CI, this fix causes an UaF, as mptcp_close() frees the
msk->sk_socket struct, which is also referenced by ssk->sk_socket.

Without orphaning the subflows, later tcp code will use ssk->sk_socket,
causing the reported splat. Note that there is the same risk with
ssk->sk_wq.

Since:
* all the network code carefully checks for a non-NULL value of
  ssk->sk_socket before accessing it, or accesses it directly in
  contexts where it can't be NULL
* the same policies are used to access 'sk_wq'
* it looks like the SOCK_DONE flag can be set independently from
  sock_orphan()

I think mptcp_close() could instead directly clear ssk->sk_socket and
ssk->sk_wq - just ' = NULL', no sock_orphan() call - and
__mptcp_close_ssk() could be changed accordingly:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8679eafeeca8..6b9f390966f7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2340,11 +2340,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		goto out;
 	}
 
-	/* if we are invoked by the msk cleanup code, the subflow is
-	 * already orphaned
-	 */
-	if (ssk->sk_socket)
-		sock_orphan(ssk);
+	sock_orphan(ssk);
 
 	subflow->disposable = 1;

> Reviewed-by: Biao Jiang <benbjiang@tencent.com>
> Reviewed-by: Mengen Sun <mengensun@tencent.com>
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
>  net/mptcp/protocol.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 3796d1bfef6b..5ac1584baf61 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -815,7 +815,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
>
>  	/* Wake-up the reader only for in-sequence data */
>  	mptcp_data_lock(sk);
> -	if (move_skbs_to_msk(msk, ssk))
> +	if (move_skbs_to_msk(msk, ssk) && !sock_flag(sk, SOCK_DEAD))
>  		sk->sk_data_ready(sk);

I think this is not needed, right? sk_data_ready == sock_def_readable,
which will carefully check sk->sk_wq, and that should be NULL after
close.

Thanks!

Paolo
Hello,

On Fri, Nov 18, 2022 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
[......]
> Thank you for clearly debugging this scenario!
>
> As noted by the CI, this fix causes an UaF, as mptcp_close() frees the
> msk->sk_socket struct, which is also referenced by ssk->sk_socket.
>
> Without orphaning the subflows, later tcp code will use ssk->sk_socket,
> causing the reported splat. Note that there is the same risk with
> ssk->sk_wq.
>
> Since:
> * all the network code carefully checks for not NULL value of
>   ssk->sk_socket before accessing it
> * access it directly in context where it can't be NULL
> * uses the same policies to access 'sk_wq'
> * it looks like the SOCK_DONE flag can be set independently from the
>   sock_orphan()
>
> I think mptcp_close() could instead directly clearing ssk->sk_socket
> and ssk->sk_wq - just ' = NULL', no sock_orphan() call - and
> __mptcp_close_ssk() could be changed accordingly:
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 8679eafeeca8..6b9f390966f7 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2340,11 +2340,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  		goto out;
>  	}
>
> -	/* if we are invoked by the msk cleanup code, the subflow is
> -	 * already orphaned
> -	 */
> -	if (ssk->sk_socket)
> -		sock_orphan(ssk);
> +	sock_orphan(ssk);
>
>  	subflow->disposable = 1;
>

Thanks for your detailed explanation, and I'll send the v2 according to it!

> > Reviewed-by: Biao Jiang <benbjiang@tencent.com>
> > Reviewed-by: Mengen Sun <mengensun@tencent.com>
> > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > ---
> >  net/mptcp/protocol.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 3796d1bfef6b..5ac1584baf61 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -815,7 +815,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> >
> >  	/* Wake-up the reader only for in-sequence data */
> >  	mptcp_data_lock(sk);
> > -	if (move_skbs_to_msk(msk, ssk))
> > +	if (move_skbs_to_msk(msk, ssk) && !sock_flag(sk, SOCK_DEAD))
> >  		sk->sk_data_ready(sk);
>
> I think this is not need right? sk_data_ready == sock_def_readable,
> will check carefully sk->sk_wq which should be NULL after close.
>

Yeah, this check seems unnecessary, I'll remove it.

> Thank!
>
> Paolo
>
On Fri, 2022-11-18 at 11:34 +0100, Paolo Abeni wrote:
> As noted by the CI, this fix causes an UaF, as mptcp_close() frees the
> msk->sk_socket struct, which is also referenced by ssk->sk_socket.

I almost forgot: see mptcp_sock_graft()

> Without orphaning the subflows, later tcp code will use ssk->sk_socket,
> causing the reported splat. Note that there is the same risk with
> ssk->sk_wq.
>
> Since:
> * all the network code carefully checks for not NULL value of
>   ssk->sk_socket before accessing it
> * access it directly in context where it can't be NULL
> * uses the same policies to access 'sk_wq'
> * it looks like the SOCK_DONE flag can be set independently from the
>   sock_orphan()
>
> I think mptcp_close() could instead directly clearing ssk->sk_socket
> and ssk->sk_wq - just ' = NULL', no sock_orphan() call

To be explicit, I mean something like this:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8679eafeeca8..0d5109073e44 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2938,7 +2938,11 @@ bool __mptcp_close(struct sock *sk, long timeout)
 		if (ssk == msk->first)
 			subflow->fail_tout = 0;
 
-		sock_orphan(ssk);
+		/* detach from the parent socket, but allow data_ready to
+		 * push incoming data into the mptcp stack, to properly ack it
+		 */
+		ssk->sk_socket = NULL;
+		ssk->sk_wq = NULL;
 		unlock_sock_fast(ssk, slow);
 	}
 	sock_orphan(sk);

Cheers,

Paolo
Hi Menglong,

On 18/11/2022 08:20, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> All of the subflows of a msk will be orphaned in mptcp_close(), which
> means the subflows are in DEAD state. After then, DATA_FIN will be sent,
> and the other side will response with a DATA_ACK for this DATA_FIN.
>
> However, if the other side still has pending data, the data that received
> on these subflows will not be passed to the msk, as they are DEAD and
> subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
> these data can't be acked, and they will be retransmitted again and again,
> until timeout.

Thank you for the patch.

I don't know if you noticed but the CI found an issue with it:

- KVM Validation: debug:
  - Unstable: 1 failed test(s): packetdrill_add_addr
  - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/6018056502116352
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6018056502116352/summary/summary.txt

Do you mind looking at that please?

Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
On Fri, Nov 18, 2022 at 6:10 PM Matthieu Baerts
<matthieu.baerts@tessares.net> wrote:
>
> Hi Menglong,
>
> On 18/11/2022 08:20, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > All of the subflows of a msk will be orphaned in mptcp_close(), which
> > means the subflows are in DEAD state. After then, DATA_FIN will be sent,
> > and the other side will response with a DATA_ACK for this DATA_FIN.
> >
> > However, if the other side still has pending data, the data that received
> > on these subflows will not be passed to the msk, as they are DEAD and
> > subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
> > these data can't be acked, and they will be retransmitted again and again,
> > until timeout.
>
> Thank you for the patch.
>
> I don't know if you noticed but the CI found an issue with it:
>
> - KVM Validation: debug:
>   - Unstable: 1 failed test(s): packetdrill_add_addr
>   - Critical: 1 Call Trace(s) ❌:
>   - Task: https://cirrus-ci.com/task/6018056502116352
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/6018056502116352/summary/summary.txt
>
> Do you mind looking at that please?
>

Yeah, I noticed it. Seems the ssk is using msk->sk_socket, which will
be freed after mptcp_close(). I'm still thinking :/

Thanks!
Menglong Dong

> Cheers,
> Matt
> --
> Tessares | Belgium | Hybrid Access Solutions
> www.tessares.net
Hi Menglong,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/4892156595273728
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4892156595273728/summary/summary.txt
- KVM Validation: debug:
  - Unstable: 1 failed test(s): packetdrill_add_addr
  - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/6018056502116352
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6018056502116352/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/0061c26274eb

If there are some issues, you can reproduce them using the same
environment as the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker

Please note that despite all the efforts that have been already done to
have a stable tests suite when executed on a public CI like here, it is
possible some reported issues are not due to your modifications. Still,
do not hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)