From: Menglong Dong <imagedong@tencent.com>
All of the subflows of an msk are orphaned in mptcp_close(), which
means the subflows end up in the DEAD state. After that, a DATA_FIN is
sent, and the other side responds with a DATA_ACK for this DATA_FIN.

However, if the other side still has pending data, the data received on
these subflows will not be passed to the msk: the subflows are DEAD, so
subflow_data_ready() will not be called from tcp_data_ready(). Therefore,
this data can't be acked, and it will be retransmitted again and again,
until the timeout expires.

Fix this by not orphaning the subflows in mptcp_close(). I think that's
ok, as they can still be orphaned in __mptcp_close_ssk(). Maybe we could
also fix this in mptcp_incoming_options(), by checking the status of the
ssk and msk?
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
net/mptcp/protocol.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3796d1bfef6b..5ac1584baf61 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -815,7 +815,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 
 	/* Wake-up the reader only for in-sequence data */
 	mptcp_data_lock(sk);
-	if (move_skbs_to_msk(msk, ssk))
+	if (move_skbs_to_msk(msk, ssk) && !sock_flag(sk, SOCK_DEAD))
 		sk->sk_data_ready(sk);
 	mptcp_data_unlock(sk);
@@ -2940,7 +2940,6 @@ bool __mptcp_close(struct sock *sk, long timeout)
 		if (ssk == msk->first)
 			subflow->fail_tout = 0;
 
-		sock_orphan(ssk);
 		unlock_sock_fast(ssk, slow);
 	}
 	sock_orphan(sk);
--
2.37.2
Hello,

On Fri, 2022-11-18 at 15:20 +0800, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> All of the subflows of a msk will be orphaned in mptcp_close(), which
> means the subflows are in DEAD state. After then, DATA_FIN will be sent,
> and the other side will response with a DATA_ACK for this DATA_FIN.
>
> However, if the other side still has pending data, the data that received
> on these subflows will not be passed to the msk, as they are DEAD and
> subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
> these data can't be acked, and they will be retransmitted again and again,
> until timeout.
>
> Fix this by not orphaning the subflows in mptcp_close(). I think that's
> ok, as they can still be orphaned in __mptcp_close_ssk(). Maybe we can
> also fix this in mptcp_incoming_options() by check the status of ssk and
> msk?

Thank you for clearly debugging this scenario!

As noted by the CI, this fix causes an UaF, as mptcp_close() frees the
msk->sk_socket struct, which is also referenced by ssk->sk_socket.

Without orphaning the subflows, later tcp code will use ssk->sk_socket,
causing the reported splat. Note that there is the same risk with
ssk->sk_wq.

Since:
* all the network code carefully checks for a non-NULL value of
  ssk->sk_socket before accessing it, or accesses it directly in
  contexts where it can't be NULL
* the same policies are used to access 'sk_wq'
* it looks like the SOCK_DONE flag can be set independently from
  sock_orphan()

I think mptcp_close() could instead directly clear ssk->sk_socket and
ssk->sk_wq - just ' = NULL', no sock_orphan() call - and
__mptcp_close_ssk() could be changed accordingly:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8679eafeeca8..6b9f390966f7 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2340,11 +2340,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		goto out;
 	}
 
-	/* if we are invoked by the msk cleanup code, the subflow is
-	 * already orphaned
-	 */
-	if (ssk->sk_socket)
-		sock_orphan(ssk);
+	sock_orphan(ssk);
 
 	subflow->disposable = 1;

> Reviewed-by: Biao Jiang <benbjiang@tencent.com>
> Reviewed-by: Mengen Sun <mengensun@tencent.com>
> Signed-off-by: Menglong Dong <imagedong@tencent.com>
> ---
>  net/mptcp/protocol.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 3796d1bfef6b..5ac1584baf61 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -815,7 +815,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
>
>  	/* Wake-up the reader only for in-sequence data */
>  	mptcp_data_lock(sk);
> -	if (move_skbs_to_msk(msk, ssk))
> +	if (move_skbs_to_msk(msk, ssk) && !sock_flag(sk, SOCK_DEAD))
>  		sk->sk_data_ready(sk);

I think this is not needed, right? sk_data_ready == sock_def_readable,
which will carefully check sk->sk_wq, and that should be NULL after
close.

Thanks!

Paolo
Hello,

On Fri, Nov 18, 2022 at 6:34 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
[......]
> Thank you for clearly debugging this scenario!
>
> As noted by the CI, this fix causes an UaF, as mptcp_close() frees the
> msk->sk_socket struct, which is also referenced by ssk->sk_socket.
>
> Without orphaning the subflows, later tcp code will use ssk->sk_socket,
> causing the reported splat. Note that there is the same risk with
> ssk->sk_wq.
>
> Since:
> * all the network code carefully checks for not NULL value of
>   ssk->sk_socket before accessing it
> * access it directly in context where it can't be NULL
> * uses the same policies to access 'sk_wq'
> * it looks like the SOCK_DONE flag can be set independently from the
>   sock_orphan()
>
> I think mptcp_close() could instead directly clearing ssk->sk_socket
> and ssk->sk_wq - just ' = NULL', no sock_orphan() call - and
> __mptcp_close_ssk() could be changed accordingly:
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 8679eafeeca8..6b9f390966f7 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2340,11 +2340,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
>  		goto out;
>  	}
>
> -	/* if we are invoked by the msk cleanup code, the subflow is
> -	 * already orphaned
> -	 */
> -	if (ssk->sk_socket)
> -		sock_orphan(ssk);
> +	sock_orphan(ssk);
>
>  	subflow->disposable = 1;
>

Thanks for your detailed explanation, and I'll send the v2 according to it!

> > Reviewed-by: Biao Jiang <benbjiang@tencent.com>
> > Reviewed-by: Mengen Sun <mengensun@tencent.com>
> > Signed-off-by: Menglong Dong <imagedong@tencent.com>
> > ---
> >  net/mptcp/protocol.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 3796d1bfef6b..5ac1584baf61 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -815,7 +815,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
> >
> >  	/* Wake-up the reader only for in-sequence data */
> >  	mptcp_data_lock(sk);
> > -	if (move_skbs_to_msk(msk, ssk))
> > +	if (move_skbs_to_msk(msk, ssk) && !sock_flag(sk, SOCK_DEAD))
> >  		sk->sk_data_ready(sk);
>
> I think this is not need right? sk_data_ready == sock_def_readable,
> will check carefully sk->sk_wq which should be NULL after close.
>

Yeah, this check seems unnecessary, I'll remove it.

> Thank!
>
> Paolo
>
On Fri, 2022-11-18 at 11:34 +0100, Paolo Abeni wrote:
> As noted by the CI, this fix causes an UaF, as mptcp_close() frees the
> msk->sk_socket struct, which is also referenced by ssk->sk_socket.

I almost forgot: see mptcp_sock_graft()

> Without orphaning the subflows, later tcp code will use ssk->sk_socket,
> causing the reported splat. Note that there is the same risk with
> ssk->sk_wq.
>
> Since:
> * all the network code carefully checks for not NULL value of
>   ssk->sk_socket before accessing it
> * access it directly in context where it can't be NULL
> * uses the same policies to access 'sk_wq'
> * it looks like the SOCK_DONE flag can be set independently from the
>   sock_orphan()
>
> I think mptcp_close() could instead directly clearing ssk->sk_socket
> and ssk->sk_wq - just ' = NULL', no sock_orphan() call

To be explicit, I mean something like this:

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 8679eafeeca8..0d5109073e44 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2938,7 +2938,11 @@ bool __mptcp_close(struct sock *sk, long timeout)
 		if (ssk == msk->first)
 			subflow->fail_tout = 0;
 
-		sock_orphan(ssk);
+		/* detach from the parent socket, but allow data_ready to
+		 * push incoming data into the mptcp stack, to properly ack it
+		 */
+		ssk->sk_socket = NULL;
+		ssk->sk_wq = NULL;
 		unlock_sock_fast(ssk, slow);
 	}
 	sock_orphan(sk);

Cheers,

Paolo
Hi Menglong,

On 18/11/2022 08:20, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
>
> All of the subflows of a msk will be orphaned in mptcp_close(), which
> means the subflows are in DEAD state. After then, DATA_FIN will be sent,
> and the other side will response with a DATA_ACK for this DATA_FIN.
>
> However, if the other side still has pending data, the data that received
> on these subflows will not be passed to the msk, as they are DEAD and
> subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
> these data can't be acked, and they will be retransmitted again and again,
> until timeout.

Thank you for the patch.

I don't know if you noticed but the CI found an issue with it:

- KVM Validation: debug:
  - Unstable: 1 failed test(s): packetdrill_add_addr
  - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/6018056502116352
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6018056502116352/summary/summary.txt

Do you mind looking at that please?

Cheers,
Matt
--
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
On Fri, Nov 18, 2022 at 6:10 PM Matthieu Baerts
<matthieu.baerts@tessares.net> wrote:
>
> Hi Menglong,
>
> On 18/11/2022 08:20, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > All of the subflows of a msk will be orphaned in mptcp_close(), which
> > means the subflows are in DEAD state. After then, DATA_FIN will be sent,
> > and the other side will response with a DATA_ACK for this DATA_FIN.
> >
> > However, if the other side still has pending data, the data that received
> > on these subflows will not be passed to the msk, as they are DEAD and
> > subflow_data_ready() will not be called in tcp_data_ready(). Therefore,
> > these data can't be acked, and they will be retransmitted again and again,
> > until timeout.
>
> Thank you for the patch.
>
> I don't know if you noticed but the CI found an issue with it:
>
> - KVM Validation: debug:
>   - Unstable: 1 failed test(s): packetdrill_add_addr
>   - Critical: 1 Call Trace(s) ❌:
>   - Task: https://cirrus-ci.com/task/6018056502116352
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/6018056502116352/summary/summary.txt
>
> Do you mind looking at that please?
>

Yeah, I noticed it. Seems the ssk is using msk->sk_socket, which will
be freed after mptcp_close(). I'm still thinking :/

Thanks!
Menglong Dong

> Cheers,
> Matt
> --
> Tessares | Belgium | Hybrid Access Solutions
> www.tessares.net
Hi Menglong,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/4892156595273728
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4892156595273728/summary/summary.txt
- KVM Validation: debug:
  - Unstable: 1 failed test(s): packetdrill_add_addr
  - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/6018056502116352
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6018056502116352/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/0061c26274eb

If there are some issues, you can reproduce them using the same
environment as the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker

Please note that despite all the efforts that have been already done to
have a stable tests suite when executed on a public CI like here, it is
possible some reported issues are not due to your modifications. Still,
do not hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)