[PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()

Posted by Paolo Abeni 2 months, 2 weeks ago
If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
eBPF program, the MPTCP protocol ends up leaking all the subflows:
the related cleanup happens in __mptcp_destroy_sock(), which is not
invoked in that code path.

Address the issue by moving the subflow sockets cleanup into the
mptcp_destroy_common() helper, which is invoked in every msk cleanup
path.

Additionally, get rid of the intermediate list_splice_init() step,
which is an unneeded relic from the past.

The issue has been present since before the reported root-cause commit,
but any attempt to backport the fix to before that hash would require a
complete rewrite.

Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
The tags are provisional, waiting for Nguyen's feedback/preferences.

Sending out early to trigger the CI.

This will likely need some staging period.

"mptcp: add mptcp_for_each_subflow_safe helper" needs to be rebased
on top of this (we can replace another list_for_each_safe instance)
---
 net/mptcp/protocol.c | 39 +++++++++++++++------------------------
 net/mptcp/protocol.h |  2 +-
 net/mptcp/subflow.c  |  3 ++-
 3 files changed, 18 insertions(+), 26 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 21a3ed64226e..6f3324955355 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2804,30 +2804,16 @@ static void __mptcp_wr_shutdown(struct sock *sk)
 
 static void __mptcp_destroy_sock(struct sock *sk)
 {
-	struct mptcp_subflow_context *subflow, *tmp;
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	LIST_HEAD(conn_list);
 
 	pr_debug("msk=%p", msk);
 
 	might_sleep();
 
-	/* join list will be eventually flushed (with rst) at sock lock release time*/
-	list_splice_init(&msk->conn_list, &conn_list);
-
 	mptcp_stop_timer(sk);
 	sk_stop_timer(sk, &sk->sk_timer);
 	msk->pm.status = 0;
 
-	/* clears msk->subflow, allowing the following loop to close
-	 * even the initial subflow
-	 */
-	mptcp_dispose_initial_subflow(msk);
-	list_for_each_entry_safe(subflow, tmp, &conn_list, node) {
-		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-		__mptcp_close_ssk(sk, ssk, subflow, 0);
-	}
-
 	sk->sk_prot->destroy(sk);
 
 	WARN_ON_ONCE(msk->rmem_fwd_alloc);
@@ -2919,24 +2905,20 @@ static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
 
 static int mptcp_disconnect(struct sock *sk, int flags)
 {
-	struct mptcp_subflow_context *subflow, *tmp;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	inet_sk_state_store(sk, TCP_CLOSE);
 
-	list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node) {
-		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
-		__mptcp_close_ssk(sk, ssk, subflow, MPTCP_CF_FASTCLOSE);
-	}
-
 	mptcp_stop_timer(sk);
 	sk_stop_timer(sk, &sk->sk_timer);
 
 	if (mptcp_sk(sk)->token)
 		mptcp_event(MPTCP_EVENT_CLOSED, mptcp_sk(sk), NULL, GFP_KERNEL);
 
-	mptcp_destroy_common(msk);
+	/* msk->subflow is still intact, the following will not free the first
+	 * subflow
+	 */
+	mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
 	msk->last_snd = NULL;
 	WRITE_ONCE(msk->flags, 0);
 	msk->cb_flags = 0;
@@ -3086,12 +3068,17 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 	return newsk;
 }
 
-void mptcp_destroy_common(struct mptcp_sock *msk)
+void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 {
+	struct mptcp_subflow_context *subflow, *tmp;
 	struct sock *sk = (struct sock *)msk;
 
 	__mptcp_clear_xmit(sk);
 
+	/* join list will be eventually flushed (with rst) at sock lock release time */
+	list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node)
+		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
+
 	/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
 	mptcp_data_lock(sk);
 	skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
@@ -3113,7 +3100,11 @@ static void mptcp_destroy(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
-	mptcp_destroy_common(msk);
+	/* clears msk->subflow, allowing the following to close
+	 * even the initial subflow
+	 */
+	mptcp_dispose_initial_subflow(msk);
+	mptcp_destroy_common(msk, 0);
 	sk_sockets_allocated_dec(sk);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 480c5320b86e..78c8c471b22e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -718,7 +718,7 @@ static inline void mptcp_write_space(struct sock *sk)
 	}
 }
 
-void mptcp_destroy_common(struct mptcp_sock *msk);
+void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
 
 #define MPTCP_TOKEN_MAX_RETRIES	4
 
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 63e8892ec807..fc4ceb02328d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -621,7 +621,8 @@ static void mptcp_sock_destruct(struct sock *sk)
 		sock_orphan(sk);
 	}
 
-	mptcp_destroy_common(mptcp_sk(sk));
+	/* We don't need to clear msk->subflow, as it's still NULL at this point */
+	mptcp_destroy_common(mptcp_sk(sk), 0);
 	inet_sock_destruct(sk);
 }
 
-- 
2.35.3


Re: [PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()
Posted by Matthieu Baerts 2 months, 1 week ago
Hi Paolo, Mat,

On 15/07/2022 13:10, Paolo Abeni wrote:
> If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
> eBPF program, the MPTCP protocol ends-up leaking all the subflows:
> the related cleanup happens in __mptcp_destroy_sock() that is not
> invoked in such code path.
> 
> Address the issue moving the subflow sockets cleanup in the
> mptcp_destroy_common() helper, which is invoked in every msk cleanup
> path.
> 
> Additionally get rid of the intermediate list_splice_init step, which
> is an unneeded relic from the past.
> 
> The issue is present since before the reported root cause commit, but
> any attempt to backport the fix before that hash will require a complete
> rewrite.
Thank you for the patch and the review!

Now in our tree (fixes for -net):

New patches for t/upstream-net:
- 4b5a18b64fca: mptcp: move subflow cleanup in mptcp_destroy_common()
- Results: b0fec3f01957..7023536b34d1 (export-net)

New patches for t/upstream:
- 4b5a18b64fca: mptcp: move subflow cleanup in mptcp_destroy_common()
- Results: fc8aa8591c3b..04e53d6af29a (export)


Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20220720T210003
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20220720T210003

> ---
> The tags are provisional, waiting for Nguyen feedback/preferences.
> 
> Sending out early to trigger the CI.
> 
> This will need likely some staging period.
> 
> "mptcp: add mptcp_for_each_subflow_safe helper" needs to be rebased
> on top of this (we can replace another list_for_each_safe instance)

Done! Thanks for the note!

- de34e44fab5e: conflict in
t/mptcp-add-mptcp_for_each_subflow_safe-helper (+ use the new helper in
mptcp_destroy_common())

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: [PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()
Posted by Mat Martineau 2 months, 2 weeks ago
On Fri, 15 Jul 2022, Paolo Abeni wrote:

> If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE
> eBPF program, the MPTCP protocol ends-up leaking all the subflows:
> the related cleanup happens in __mptcp_destroy_sock() that is not
> invoked in such code path.
>
> Address the issue moving the subflow sockets cleanup in the
> mptcp_destroy_common() helper, which is invoked in every msk cleanup
> path.
>
> Additionally get rid of the intermediate list_splice_init step, which
> is an unneeded relic from the past.
>
> The issue is present since before the reported root cause commit, but
> any attempt to backport the fix before that hash will require a complete
> rewrite.
>
> Fixes: e16163b6e2 ("mptcp: refactor shutdown and close")
> Reported-by: Nguyen Dinh Phi <phind.uet@gmail.com>
> Co-developed-by: Nguyen Dinh Phi <phind.uet@gmail.com>
> Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> The tags are provisional, waiting for Nguyen feedback/preferences.
>
> Sending out early to trigger the CI.

For some reason patchew didn't tag this commit, so the CI didn't trigger. 
Patchew did tag my patch about a half hour ago, so the importer doesn't 
seem to be completely offline.

I manually pushed a tag: patchew/20220715-pabeni-subflow-cleanup

That has started the github and cirrus-ci builds:

https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2678690859
https://cirrus-ci.com/build/4905612499746816


I also noticed that this patch conflicts with Matthieu's "mptcp: add 
mptcp_for_each_subflow_safe helper" patch, but that's only in the export 
branch currently so we can easily wait on that until your -net fix 
propagates to net-next.


One more question below (about mptcp_dispose_initial_subflow), but 
assuming the CI results are ok I think this would be good to apply to 
export-net and squash further fixes if needed:

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

> @@ -3113,7 +3100,11 @@ static void mptcp_destroy(struct sock *sk)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
>
> -	mptcp_destroy_common(msk);
> +	/* clears msk->subflow, allowing the following to close
> +	 * even the initial subflow
> +	 */
> +	mptcp_dispose_initial_subflow(msk);

Was the missing call to mptcp_dispose_initial_subflow() also causing a 
problem? Or did this cleanup end up happening through a different path?

This seems different from the "leaking all subflows" case mentioned in the 
commit message.

- Mat

> +	mptcp_destroy_common(msk, 0);
> 	sk_sockets_allocated_dec(sk);
> }
>

--
Mat Martineau
Intel

Re: [PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()
Posted by Paolo Abeni 2 months, 1 week ago
On Fri, 2022-07-15 at 12:13 -0700, Mat Martineau wrote:
> On Fri, 15 Jul 2022, Paolo Abeni wrote:
> 
> > @@ -3113,7 +3100,11 @@ static void mptcp_destroy(struct sock *sk)
> > {
> > 	struct mptcp_sock *msk = mptcp_sk(sk);
> > 
> > -	mptcp_destroy_common(msk);
> > +	/* clears msk->subflow, allowing the following to close
> > +	 * even the initial subflow
> > +	 */
> > +	mptcp_dispose_initial_subflow(msk);
> 
> Was the missing call to mptcp_dispose_initial_subflow() also causing a 
> problem? Or did this cleanup end up happening through a different path?
> 
> This seems different from the "leaking all subflows" case mentioned in the 
> commit message.

I'm unsure I read the question correctly.

This patch does not add a new mptcp_dispose_initial_subflow() call in
the cleanup/free code path. Such call was already there in
__mptcp_destroy_sock(), and is needed to ensure even the initial
subflow is deleted.

mptcp_dispose_initial_subflow() is placed outside
mptcp_destroy_common() to allow mptcp_disconnect() to re-use the latter
helper while preserving the initial subflow.

Please let me know if the above solves your doubts.

Paolo


Re: [PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()
Posted by Mat Martineau 2 months, 1 week ago
On Mon, 18 Jul 2022, Paolo Abeni wrote:

> On Fri, 2022-07-15 at 12:13 -0700, Mat Martineau wrote:
>> On Fri, 15 Jul 2022, Paolo Abeni wrote:
>>
>> Was the missing call to mptcp_dispose_initial_subflow() also causing a
>> problem? Or did this cleanup end up happening through a different path?
>>
>> This seems different from the "leaking all subflows" case mentioned in the
>> commit message.
>
> I'm unsure I read the question correctly.
>
> This patch does not add a new mptcp_dispose_initial_subflow() call in
> the cleanup/free code path. Such call was already there in
> __mptcp_destroy_sock(), and is needed to ensure even the initial
> subflow is deleted.
>
> mptcp_dispose_initial_subflow() is placed outside
> mptcp_destroy_common() to allow mptcp_disconnect() to re-use the latter
> helper while preserving the initial subflow
>

Right, it was clear that the old call in __mptcp_destroy_sock() was moved 
to mptcp_destroy().

But the commit message states that not every cleanup/free code path was 
calling __mptcp_destroy_sock(). That means not every code path was calling 
mptcp_dispose_initial_subflow(), and now it is consistently called by 
mptcp_destroy().

Does this mean the msk->subflow inode was leaked as part of "leaking all 
the subflows"? Or was that inode cleaned up some other way when 
__mptcp_destroy_sock() was not called?

--
Mat Martineau
Intel

Re: [PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()
Posted by Paolo Abeni 2 months, 1 week ago
On Mon, 2022-07-18 at 11:31 -0700, Mat Martineau wrote:
> On Mon, 18 Jul 2022, Paolo Abeni wrote:
> 
> > On Fri, 2022-07-15 at 12:13 -0700, Mat Martineau wrote:
> > > On Fri, 15 Jul 2022, Paolo Abeni wrote:
> > > 
> > > > 	struct sock *sk = (struct sock *)msk;
> > > > 
> > > > 	__mptcp_clear_xmit(sk);
> > > > 
> > > > +	/* join list will be eventually flushed (with rst) at sock lock release time */
> > > > +	list_for_each_entry_safe(subflow, tmp, &msk->conn_list, node)
> > > > +		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
> > > > +
> > > > 	/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
> > > > 	mptcp_data_lock(sk);
> > > > 	skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
> > > > @@ -3113,7 +3100,11 @@ static void mptcp_destroy(struct sock *sk)
> > > > {
> > > > 	struct mptcp_sock *msk = mptcp_sk(sk);
> > > > 
> > > > -	mptcp_destroy_common(msk);
> > > > +	/* clears msk->subflow, allowing the following to close
> > > > +	 * even the initial subflow
> > > > +	 */
> > > > +	mptcp_dispose_initial_subflow(msk);
> > > 
> > > Was the missing call to mptcp_dispose_initial_subflow() also causing a
> > > problem? Or did this cleanup end up happening through a different path?
> > > 
> > > This seems different from the "leaking all subflows" case mentioned in the
> > > commit message.
> > 
> > I'm unsure I read the question correctly.
> > 
> > This patch does not add a new mptcp_dispose_initial_subflow() call in
> > the cleanup/free code path. Such call was already there in
> > __mptcp_destroy_sock(), and is needed to ensure even the initial
> > subflow is deleted.
> > 
> > mptcp_dispose_initial_subflow() is placed outside
> > mptcp_destroy_common() to allow mptcp_disconnect() to re-use the latter
> > helper while preserving the initial subflow
> > 
> 
> Right, it was clear that the old call in __mptcp_destroy_sock() was moved 
> to mptcp_destroy().
> 
> But the commit message states that not every cleanup/free code path was 
> calling __mptcp_destroy_sock(). That means not every code path was calling 
> mptcp_dispose_initial_subflow(), and now it is consistently called by 
> mptcp_destroy().
> 
> Does this mean the msk->subflow inode was leaked as part of "leaking all 
> the subflows"? 

AFAICS, yes.


Note that this patch should also address a different leak
scenario:

After the MPC handshake, an MPTCP listener creates the MPC subflow and
the paired msk socket. Later the client creates additional subflows,
which are added to the msk conn_list, as the msk is already fully
established.

The MPTCP listener is then closed before the child msk is accepted. The
MPC subflow is deleted, but prior to this patch mptcp_sock_destruct()
did not free the MPJ subflows, leaking them.

In that scenario the child msk->subflow struct is not leaked - because
it's not even allocated ;)

/P


Re: [PATCH mptcp-net] mptcp: move subflow cleanup in mptcp_destroy_common()
Posted by Mat Martineau 2 months, 1 week ago
On Tue, 19 Jul 2022, Paolo Abeni wrote:

> On Mon, 2022-07-18 at 11:31 -0700, Mat Martineau wrote:
>>
>> Right, it was clear that the old call in __mptcp_destroy_sock() was moved
>> to mptcp_destroy().
>>
>> But the commit message states that not every cleanup/free code path was
>> calling __mptcp_destroy_sock(). That means not every code path was calling
>> mptcp_dispose_initial_subflow(), and now it is consistently called by
>> mptcp_destroy().
>>
>> Does this mean the msk->subflow inode was leaked as part of "leaking all
>> the subflows"?
>
> AFAICS, yes.
>
>
> Note that this patch should also address a different leak
> scenario:
>
> After the MPC handshake, an MPTCP listener creates the MPC subflow and
> the paired msk socket. Later the client creates additional subflows,
> which are added to the msk conn_list, as the msk is already fully
> established.
>
> The MPTCP listener is then closed before the child msk is accepted. The
> MPC subflow is deleted, but prior to this patch mptcp_sock_destruct()
> did not free the MPJ subflows, leaking them.
>
> In that scenario the child msk->subflow struct is not leaked - because
> it's not even allocated ;)

Ok, thanks for the details. I've updated the patch status in patchwork.

--
Mat Martineau
Intel