[PATCH net v3] net: mptcp: fix unreleased socket in accept queue

menglong8.dong@gmail.com posted 1 patch 3 weeks ago
Failed in applying to current master (apply log)
There is a newer version of this series
net/mptcp/protocol.c |  2 +-
net/mptcp/protocol.h |  1 +
net/mptcp/subflow.c  | 31 ++++++-------------------------
3 files changed, 8 insertions(+), 26 deletions(-)
[PATCH net v3] net: mptcp: fix unreleased socket in accept queue
Posted by menglong8.dong@gmail.com 3 weeks ago
From: Menglong Dong <imagedong@tencent.com>

The mptcp socket and its subflow sockets in accept queue can't be
released after the process exit.

While the release of a mptcp socket in listening state, the
corresponding tcp socket will be released too. Meanwhile, the tcp
socket in the unaccept queue will be released too. However, only init
subflow is in the unaccept queue, and the joined subflow is not in the
unaccept queue, which makes the joined subflow won't be released, and
therefore the corresponding unaccepted mptcp socket will not be released
to.

This can be reproduced easily with following steps:

1. create 2 namespace and veth:
   $ ip netns add mptcp-client
   $ ip netns add mptcp-server
   $ sysctl -w net.ipv4.conf.all.rp_filter=0
   $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1
   $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1
   $ ip link add red-client netns mptcp-client type veth peer red-server \
     netns mptcp-server
   $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server
   $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server
   $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client
   $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client
   $ ip -n mptcp-server link set red-server up
   $ ip -n mptcp-client link set red-client up

2. configure the endpoint and limit for client and server:
   $ ip -n mptcp-server mptcp endpoint flush
   $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2
   $ ip -n mptcp-client mptcp endpoint flush
   $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2
   $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \
     1 subflow

3. listen and accept on a port, such as 9999. The nc command we used
   here is modified, which makes it use mptcp protocol by default.
   $ ip netns exec mptcp-server nc -l -k -p 9999

4. open another *two* terminal and use each of them to connect to the
   server with the following command:
   $ ip netns exec mptcp-client nc 10.0.0.1 9999
   Input something after connect to triger the connection of the second
   subflow. So that there are two established mptcp connections, with the
   second one still unaccepted.

5. exit all the nc command, and check the tcp socket in server namespace.
   And you will find that there is one tcp socket in CLOSE_WAIT state
   and can't release forever.

Fix this by closing all of the unaccepted mptcp socket in
mptcp_subflow_queue_clean() with mptcp_close(). As the mptcp_cancel_work()
is called inside mptcp_close(), we can't introduce a mptcp_close_nolock()
and call it here, which will cause deadlock.

Now, we can ensure that all unaccepted mptcp sockets will be cleaned by
mptcp_close() before they are released, so mptcp_sock_destruct(), which is
used to clean the unaccepted mptcp socket, is not needed anymore.

The selftests for mptcp is ran for this commit, and no new failures.
However, there are some failures before this commit for mptcp_join.sh:

 5 failure(s) has(ve) been detected:
        - 27: invalid address, ADD_ADDR timeout
        - 34: remove invalid addresses
        - 102: userspace pm no echo w/o daemon
        - 103: userspace pm type rejects join
        - 105: userspace pm type prevents mp_prio

Fixes: f296234c98a8 ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Mengen Sun <mengensun@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v3:
- remove mptcp_close_nolock() and call mptcp_close() directly in
  mptcp_subflow_queue_clean(), as mptcp_close_nolock() will cause
  dead lock.

v2:
- remove mptcp_sock_destruct()
- introduce mptcp_close_nolock() and replace mptcp_close() with it in
  mptcp_subflow_queue_clean()
---
 net/mptcp/protocol.c |  2 +-
 net/mptcp/protocol.h |  1 +
 net/mptcp/subflow.c  | 31 ++++++-------------------------
 3 files changed, 8 insertions(+), 26 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index d398f3810662..2de33626b73f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2796,7 +2796,7 @@ static void __mptcp_destroy_sock(struct sock *sk)
 	sock_put(sk);
 }
 
-static void mptcp_close(struct sock *sk, long timeout)
+void mptcp_close(struct sock *sk, long timeout)
 {
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 132d50833df1..102692b581dd 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -612,6 +612,7 @@ void mptcp_subflow_reset(struct sock *ssk);
 void mptcp_subflow_queue_clean(struct sock *ssk);
 void mptcp_sock_graft(struct sock *sk, struct socket *parent);
 struct socket *__mptcp_nmpc_socket(const struct mptcp_sock *msk);
+void mptcp_close(struct sock *sk, long timeout);
 
 bool mptcp_addresses_equal(const struct mptcp_addr_info *a,
 			   const struct mptcp_addr_info *b, bool use_port);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index c7d49fb6e7bd..45315a57185a 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -602,30 +602,6 @@ static bool subflow_hmac_valid(const struct request_sock *req,
 	return !crypto_memneq(hmac, mp_opt->hmac, MPTCPOPT_HMAC_LEN);
 }
 
-static void mptcp_sock_destruct(struct sock *sk)
-{
-	/* if new mptcp socket isn't accepted, it is free'd
-	 * from the tcp listener sockets request queue, linked
-	 * from req->sk.  The tcp socket is released.
-	 * This calls the ULP release function which will
-	 * also remove the mptcp socket, via
-	 * sock_put(ctx->conn).
-	 *
-	 * Problem is that the mptcp socket will be in
-	 * ESTABLISHED state and will not have the SOCK_DEAD flag.
-	 * Both result in warnings from inet_sock_destruct.
-	 */
-	if ((1 << sk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)) {
-		sk->sk_state = TCP_CLOSE;
-		WARN_ON_ONCE(sk->sk_socket);
-		sock_orphan(sk);
-	}
-
-	/* We don't need to clear msk->subflow, as it's still NULL at this point */
-	mptcp_destroy_common(mptcp_sk(sk), 0);
-	inet_sock_destruct(sk);
-}
-
 static void mptcp_force_close(struct sock *sk)
 {
 	/* the msk is not yet exposed to user-space */
@@ -768,7 +744,6 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 			/* new mpc subflow takes ownership of the newly
 			 * created mptcp socket
 			 */
-			new_msk->sk_destruct = mptcp_sock_destruct;
 			mptcp_sk(new_msk)->setsockopt_seq = ctx->setsockopt_seq;
 			mptcp_pm_new_connection(mptcp_sk(new_msk), child, 1);
 			mptcp_token_accept(subflow_req, mptcp_sk(new_msk));
@@ -1770,6 +1745,12 @@ void mptcp_subflow_queue_clean(struct sock *listener_ssk)
 		msk->first = NULL;
 		msk->dl_next = NULL;
 		unlock_sock_fast(sk, slow);
+
+		/* mptcp_close() will put a extra reference on sk,
+		 * so we hold one here.
+		 */
+		sock_hold(sk);
+		mptcp_close(sk, 0);
 	}
 
 	/* we are still under the listener msk socket lock */
-- 
2.37.2
Re: [PATCH net v3] net: mptcp: fix unreleased socket in accept queue
Posted by Matthieu Baerts 2 weeks, 6 days ago
Hi Menglong,

On 07/09/2022 13:11, menglong8.dong@gmail.com wrote:
> From: Menglong Dong <imagedong@tencent.com>
> 
> The mptcp socket and its subflow sockets in accept queue can't be
> released after the process exit.
> 
> While the release of a mptcp socket in listening state, the
> corresponding tcp socket will be released too. Meanwhile, the tcp
> socket in the unaccept queue will be released too. However, only init
> subflow is in the unaccept queue, and the joined subflow is not in the
> unaccept queue, which makes the joined subflow won't be released, and
> therefore the corresponding unaccepted mptcp socket will not be released
> to.

Thank you for the v3.

Unfortunately, our CI found a possible recursive locking:

> - KVM Validation: debug:
>   - Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 1 Call Trace(s) ❌:
>   - Task: https://cirrus-ci.com/task/5418283233968128
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/5418283233968128/summary/summary.txt

https://lore.kernel.org/mptcp/4e6d3d9e-1f1a-23ae-cb56-2d4f043f17ae@gmail.com/T/#u

Do you mind looking at it please?

Also, because it is not just a simple fix, may you send any new versions
only to MPTCP mailing list please? So without the other mailing lists
and netdev maintainers to reduce the audience during the development.

Once the patch is ready, we will apply it in MPTCP tree and send it to
netdev. That's what we usually for MPTCP related patches.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: [PATCH net v3] net: mptcp: fix unreleased socket in accept queue
Posted by Paolo Abeni 2 weeks, 6 days ago
On Thu, 2022-09-08 at 15:56 +0200, Matthieu Baerts wrote:
> Hi Menglong,
> 
> On 07/09/2022 13:11, menglong8.dong@gmail.com wrote:
> > From: Menglong Dong <imagedong@tencent.com>
> > 
> > The mptcp socket and its subflow sockets in accept queue can't be
> > released after the process exit.
> > 
> > While the release of a mptcp socket in listening state, the
> > corresponding tcp socket will be released too. Meanwhile, the tcp
> > socket in the unaccept queue will be released too. However, only init
> > subflow is in the unaccept queue, and the joined subflow is not in the
> > unaccept queue, which makes the joined subflow won't be released, and
> > therefore the corresponding unaccepted mptcp socket will not be released
> > to.
> 
> Thank you for the v3.
> 
> Unfortunately, our CI found a possible recursive locking:
> 
> > - KVM Validation: debug:
> >   - Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 1 Call Trace(s) ❌:
> >   - Task: https://cirrus-ci.com/task/5418283233968128
> >   - Summary: https://api.cirrus-ci.com/v1/artifact/task/5418283233968128/summary/summary.txt
> 
> https://lore.kernel.org/mptcp/4e6d3d9e-1f1a-23ae-cb56-2d4f043f17ae@gmail.com/T/#u
> 
> Do you mind looking at it please?

Ah, that is actually a false positive, but we must silence it. The main
point is that the lock_sock() in mptcp_close() rightfully lacks the
_nested annotation.

Instead of adding such annotation only for this call site, which would
be both ugly and dangerous, I suggest to factor_out from mptcp_close()
all the code the run under the socket lock, say in:
 
bool __mptcp_close(struct sock *sk, long timeout)
	// return true if the caller need to cancel the mptcp worker
	// (outside the socket lock)

and then in mptcp_subflow_queue_clean():

	sock_hold(sk);

	slow = lock_sock_fast_nested(sk);
        next = msk->dl_next;
        msk->first = NULL;
        msk->dl_next = NULL;
	do_cancel_work = __mptcp_close(sk, 0);
	unlock_sock_fast(sk, slow);

	if (do_cancel_work)
		mptcp_cancel_work(sk);
	sock_put(sk);

All the above could require 2 different patches, 1 to factor-out the
helper, and 1 to actually implement the fix.

Cheers,

Paolo


Re: [PATCH net v3] net: mptcp: fix unreleased socket in accept queue
Posted by Menglong Dong 2 weeks, 6 days ago
On Thu, Sep 8, 2022 at 10:45 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2022-09-08 at 15:56 +0200, Matthieu Baerts wrote:
> > Hi Menglong,
> >
> > On 07/09/2022 13:11, menglong8.dong@gmail.com wrote:
> > > From: Menglong Dong <imagedong@tencent.com>
> > >
> > > The mptcp socket and its subflow sockets in accept queue can't be
> > > released after the process exit.
> > >
> > > While the release of a mptcp socket in listening state, the
> > > corresponding tcp socket will be released too. Meanwhile, the tcp
> > > socket in the unaccept queue will be released too. However, only init
> > > subflow is in the unaccept queue, and the joined subflow is not in the
> > > unaccept queue, which makes the joined subflow won't be released, and
> > > therefore the corresponding unaccepted mptcp socket will not be released
> > > to.
> >
> > Thank you for the v3.
> >
> > Unfortunately, our CI found a possible recursive locking:
> >
> > > - KVM Validation: debug:
> > >   - Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 1 Call Trace(s) ❌:
> > >   - Task: https://cirrus-ci.com/task/5418283233968128
> > >   - Summary: https://api.cirrus-ci.com/v1/artifact/task/5418283233968128/summary/summary.txt
> >
> > https://lore.kernel.org/mptcp/4e6d3d9e-1f1a-23ae-cb56-2d4f043f17ae@gmail.com/T/#u
> >
> > Do you mind looking at it please?
>
> Ah, that is actually a false positive, but we must silence it. The main
> point is that the lock_sock() in mptcp_close() rightfully lacks the
> _nested annotation.
>
> Instead of adding such annotation only for this call site, which would
> be both ugly and dangerous, I suggest to factor_out from mptcp_close()
> all the code the run under the socket lock, say in:
>
> bool __mptcp_close(struct sock *sk, long timeout)
>         // return true if the caller need to cancel the mptcp worker
>         // (outside the socket lock)
>
> and then in mptcp_subflow_queue_clean():
>
>         sock_hold(sk);
>
>         slow = lock_sock_fast_nested(sk);
>         next = msk->dl_next;
>         msk->first = NULL;
>         msk->dl_next = NULL;
>         do_cancel_work = __mptcp_close(sk, 0);
>         unlock_sock_fast(sk, slow);
>
>         if (do_cancel_work)
>                 mptcp_cancel_work(sk);
>         sock_put(sk);
>
> All the above could require 2 different patches, 1 to factor-out the
> helper, and 1 to actually implement the fix.
>

Thanks for your explanation! As Matthieu said, I'll send the next
version to the MPTCP mailing list only.

Thanks!
Menglong Dong

> Cheers,
>
> Paolo
>

Re: net: mptcp: fix unreleased socket in accept queue: Tests Results
Posted by MPTCP CI 2 weeks, 6 days ago
Hi Menglong,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Unstable: 1 failed test(s): selftest_mptcp_join 🔴:
  - Task: https://cirrus-ci.com/task/5981233187389440
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5981233187389440/summary/summary.txt

- KVM Validation: debug:
  - Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 1 Call Trace(s) ❌:
  - Task: https://cirrus-ci.com/task/5418283233968128
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5418283233968128/summary/summary.txt

Initiator: Matthieu Baerts
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/aefc08a013fb


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)