[v1] mptcp: improve subflow creation on errors

[PATCH mptcp-next 2/3] mptcp: do not block subflows creation on errors

Posted by Paolo Abeni 3 years, 9 months ago

If the MPTCP configuration allows for multiple subflows
creation, and the first additional subflows never reach
the fully established status - e.g. due to packets drop or
reset - the in kernel path manager do not move to the
next subflow.

Let's trigger subflow creation on both the above event.
Note that we don't need an additional timer to catch timeout
and/or long delay in connection completion: we just need to
measure the time elapsed since the subflow creation every
time we emit an MPJ sub-option.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c  | 19 ++++++++++++++++---
 net/mptcp/protocol.h |  1 +
 net/mptcp/subflow.c  | 12 +++++++++++-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 7a6a39b71633..d6c4f5c82d09 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1257,10 +1257,10 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
-	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
-		const struct sock *ssk = (const struct sock *)tp;
-		struct mptcp_subflow_context *subflow;
+	const struct sock *ssk = (const struct sock *)tp;
+	struct mptcp_subflow_context *subflow;
 
+	if (unlikely(OPTION_MPTCP_FAIL & opts->suboptions)) {
 		subflow = mptcp_subflow_ctx(ssk);
 		subflow->send_mp_fail = 0;
 
@@ -1382,6 +1382,19 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 		/* MPC is additionally mutually exclusive with MP_PRIO */
 		goto mp_capable_done;
 	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
+		if (ssk) {
+			subflow = mptcp_subflow_ctx(ssk);
+			if (subflow->start_stamp &&
+			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ/10))) {
+				/* we are not established yet and "a lot" of time passed
+				 * e.g. due to syn retranmissions. We can attempt next
+				 * subflow creation
+				 */
+				mptcp_pm_subflow_established(mptcp_sk(subflow->conn));
+				subflow->start_stamp = 0;
+			}
+		}
+
 		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
 			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
 					      TCPOLEN_MPTCP_MPJ_SYN,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a6a4bd7de5b4..d5df04d5b3f0 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -432,6 +432,7 @@ struct mptcp_subflow_context {
 		stale : 1;	    /* unable to snd/rcv data, do not use for xmit */
 	enum mptcp_data_avail data_avail;
 	u32	remote_nonce;
+	u32	start_stamp;
 	u64	thmac;
 	u32	local_nonce;
 	u32	remote_token;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 0f90bd61de01..85986f3b58ed 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1273,7 +1273,16 @@ void __mptcp_error_report(struct sock *sk)
 
 static void subflow_error_report(struct sock *ssk)
 {
-	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct sock *sk = subflow->conn;
+
+	/* subflow aborted before reaching the fully_established status
+	 * attempt the creation of the next subflow
+	 */
+	if (unlikely(!subflow->fully_established && subflow->start_stamp)) {
+		mptcp_pm_subflow_established(mptcp_sk(sk));
+		subflow->start_stamp = 0;
+	}
 
 	mptcp_data_lock(sk);
 	if (!sock_owned_by_user(sk))
@@ -1444,6 +1453,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	subflow->remote_id = remote_id;
 	subflow->request_join = 1;
 	subflow->request_bkup = !!(flags & MPTCP_PM_ADDR_FLAG_BACKUP);
+	subflow->start_stamp = tcp_jiffies32;
 	mptcp_info2sockaddr(remote, &addr, ssk->sk_family);
 
 	mptcp_add_pending_subflow(msk, subflow);
-- 
2.33.1

Re: [PATCH mptcp-next 2/3] mptcp: do not block subflows creation on errors

Posted by Matthieu Baerts 3 years, 9 months ago

Hi Paolo,

On 23/11/2021 17:37, Paolo Abeni wrote:
> If the MPTCP configuration allows for multiple subflows
> creation, and the first additional subflows never reach
> the fully established status - e.g. due to packets drop or
> reset - the in kernel path manager do not move to the
> next subflow.

Thank you for looking at that!

Why the PM could not try to establish all possible paths "in parallel"?
Is it because we have this limit of allowed subflows and we wait to know
if a path is established to count it as used subflow? But if here we
mark a path as established if it has an issue, does it still make sense?
I probably missed something here :)

> Let's trigger subflow creation on both the above event.
> Note that we don't need an additional timer to catch timeout
> and/or long delay in connection completion: we just need to
> measure the time elapsed since the subflow creation every
> time we emit an MPJ sub-option.

Just to be sure: can we do that because we always expect a retransmission?
> @@ -1382,6 +1382,19 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
>  		/* MPC is additionally mutually exclusive with MP_PRIO */
>  		goto mp_capable_done;
>  	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
> +		if (ssk) {
> +			subflow = mptcp_subflow_ctx(ssk);
> +			if (subflow->start_stamp &&
> +			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ/10))) {

Small detail: checkpatch asks to have space around "/". I can add them
when applying the patch if only this needs to be modified.

> +				/* we are not established yet and "a lot" of time passed
> +				 * e.g. due to syn retranmissions. We can attempt next
> +				 * subflow creation

That's strange to tell the PM "yes, the subflow is established now"
while there is potentially (or clearly in case of error (RST) below) an
issue :)

Should we eventually have another function name doing the same and
resetting the stamp variable?

> +				 */
> +				mptcp_pm_subflow_established(mptcp_sk(subflow->conn));
> +				subflow->start_stamp = 0;
> +			}
> +		}
> +
>  		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
>  			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
>  					      TCPOLEN_MPTCP_MPJ_SYN,
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 0f90bd61de01..85986f3b58ed 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -1273,7 +1273,16 @@ void __mptcp_error_report(struct sock *sk)
>  
>  static void subflow_error_report(struct sock *ssk)
>  {
> -	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> +	struct sock *sk = subflow->conn;
> +
> +	/* subflow aborted before reaching the fully_established status
> +	 * attempt the creation of the next subflow
> +	 */
> +	if (unlikely(!subflow->fully_established && subflow->start_stamp)) {
> +		mptcp_pm_subflow_established(mptcp_sk(sk));
> +		subflow->start_stamp = 0;

Detail: do you need to set it to 0 here?

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: [PATCH mptcp-next 2/3] mptcp: do not block subflows creation on errors

Posted by Paolo Abeni 3 years, 9 months ago

On Wed, 2021-11-24 at 10:55 +0100, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 23/11/2021 17:37, Paolo Abeni wrote:
> > If the MPTCP configuration allows for multiple subflows
> > creation, and the first additional subflows never reach
> > the fully established status - e.g. due to packets drop or
> > reset - the in kernel path manager do not move to the
> > next subflow.
> 
> Thank you for looking at that!
> 
> Why the PM could not try to establish all possible paths "in parallel"?
> Is it because we have this limit of allowed subflows and we wait to know
> if a path is established to count it as used subflow?

IIRC, that design choice is to avoid that an MPTCP endpoint could
easily become a DDoS bot, generating "a lot" of syn in response to
action from possibly untrusted peer.

>  But if here we
> mark a path as established if it has an issue, does it still make sense?
> I probably missed something here :)

With this patch we don't mark the path as astablished, we try to pick
the next one. 

> > Let's trigger subflow creation on both the above event.
> > Note that we don't need an additional timer to catch timeout
> > and/or long delay in connection completion: we just need to
> > measure the time elapsed since the subflow creation every
> > time we emit an MPJ sub-option.
> 
> Just to be sure: can we do that because we always expect a retransmission?
> > @@ -1382,6 +1382,19 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
> >  		/* MPC is additionally mutually exclusive with MP_PRIO */
> >  		goto mp_capable_done;
> >  	} else if (OPTIONS_MPTCP_MPJ & opts->suboptions) {
> > +		if (ssk) {
> > +			subflow = mptcp_subflow_ctx(ssk);
> > +			if (subflow->start_stamp &&
> > +			    unlikely(after(tcp_jiffies32, subflow->start_stamp + HZ/10))) {
> 
> Small detail: checkpatch asks to have space around "/". I can add them
> when applying the patch if only this needs to be modified.
> 
> > +				/* we are not established yet and "a lot" of time passed
> > +				 * e.g. due to syn retranmissions. We can attempt next
> > +				 * subflow creation
> 
> That's strange to tell the PM "yes, the subflow is established now"
> while there is potentially (or clearly in case of error (RST) below) an
> issue :)
> 
> Should we eventually have another function name doing the same and
> resetting the stamp variable?

Yes, in v2 I'll add another pm helper for that.

Side note: do we still need the pm -> pm_netlink additional layering?
That was initially introduced to allow multiple in-kernel PM, but
afaics we are now considering a single in kernel one and e.v. an user-
space one. Perhpas we can remove some intermediate abstraction?

(not in this series!)

> > +				 */
> > +				mptcp_pm_subflow_established(mptcp_sk(subflow->conn));
> > +				subflow->start_stamp = 0;
> > +			}
> > +		}
> > +
> >  		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
> >  			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
> >  					      TCPOLEN_MPTCP_MPJ_SYN,
> > diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> > index 0f90bd61de01..85986f3b58ed 100644
> > --- a/net/mptcp/subflow.c
> > +++ b/net/mptcp/subflow.c
> > @@ -1273,7 +1273,16 @@ void __mptcp_error_report(struct sock *sk)
> >  
> >  static void subflow_error_report(struct sock *ssk)
> >  {
> > -	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
> > +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
> > +	struct sock *sk = subflow->conn;
> > +
> > +	/* subflow aborted before reaching the fully_established status
> > +	 * attempt the creation of the next subflow
> > +	 */
> > +	if (unlikely(!subflow->fully_established && subflow->start_stamp)) {
> > +		mptcp_pm_subflow_established(mptcp_sk(sk));
> > +		subflow->start_stamp = 0;
> 
> Detail: do you need to set it to 0 here?

So we call mptcp_pm_subflow_established() at most once due to subflow
timeout or reset. I feared a malicius peer could trigger a lot of
activity with a e.g. a burst of icmp unreachable.

Thanks!

Paolo

Re: [PATCH mptcp-next 2/3] mptcp: do not block subflows creation on errors

Posted by Matthieu Baerts 3 years, 9 months ago

On 24/11/2021 11:09, Paolo Abeni wrote:
> On Wed, 2021-11-24 at 10:55 +0100, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 23/11/2021 17:37, Paolo Abeni wrote:
>>> If the MPTCP configuration allows for multiple subflows
>>> creation, and the first additional subflows never reach
>>> the fully established status - e.g. due to packets drop or
>>> reset - the in kernel path manager do not move to the
>>> next subflow.
>>
>> Thank you for looking at that!
>>
>> Why the PM could not try to establish all possible paths "in parallel"?
>> Is it because we have this limit of allowed subflows and we wait to know
>> if a path is established to count it as used subflow?
> 
> IIRC, that design choice is to avoid that an MPTCP endpoint could
> easily become a DDoS bot, generating "a lot" of syn in response to
> action from possibly untrusted peer.

The endpoint will only be able to produce a very few SYN+MPJ depending
on the limit that is in place.

Should we maybe do a distinction between local addresses and remote
ones? I mean when the MPTCP connection is established, the PM could open
all paths using different local addresses and then the remote ones one
at a time?
>>> +				/* we are not established yet and "a lot" of time passed
>>> +				 * e.g. due to syn retranmissions. We can attempt next
>>> +				 * subflow creation
>>
>> That's strange to tell the PM "yes, the subflow is established now"
>> while there is potentially (or clearly in case of error (RST) below) an
>> issue :)
>>
>> Should we eventually have another function name doing the same and
>> resetting the stamp variable?
> 
> Yes, in v2 I'll add another pm helper for that.

Thank you!

> Side note: do we still need the pm -> pm_netlink additional layering?
> That was initially introduced to allow multiple in-kernel PM, but
> afaics we are now considering a single in kernel one and e.v. an user-
> space one. Perhpas we can remove some intermediate abstraction?

I think we should keep it. It still makes sense to have a PM controlled
with eBPF for servers having to support a lot of connections and always
doing the same action. Maybe the in-kernel PM can already cover all use
cases for a server, mainly announcing addresses. Or maybe more might be
needed, e.g. announcing an IP or another depending on the initial one,
etc. and in this case, a controlled with eBPF would be interesting (but
could be done as an extension of the in-kernel PM one?)

> (not in this series!)

Of course!

>>> +				 */
>>> +				mptcp_pm_subflow_established(mptcp_sk(subflow->conn));
>>> +				subflow->start_stamp = 0;
>>> +			}
>>> +		}
>>> +
>>>  		if (OPTION_MPTCP_MPJ_SYN & opts->suboptions) {
>>>  			*ptr++ = mptcp_option(MPTCPOPT_MP_JOIN,
>>>  					      TCPOLEN_MPTCP_MPJ_SYN,
>>> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
>>> index 0f90bd61de01..85986f3b58ed 100644
>>> --- a/net/mptcp/subflow.c
>>> +++ b/net/mptcp/subflow.c
>>> @@ -1273,7 +1273,16 @@ void __mptcp_error_report(struct sock *sk)
>>>  
>>>  static void subflow_error_report(struct sock *ssk)
>>>  {
>>> -	struct sock *sk = mptcp_subflow_ctx(ssk)->conn;
>>> +	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
>>> +	struct sock *sk = subflow->conn;
>>> +
>>> +	/* subflow aborted before reaching the fully_established status
>>> +	 * attempt the creation of the next subflow
>>> +	 */
>>> +	if (unlikely(!subflow->fully_established && subflow->start_stamp)) {
>>> +		mptcp_pm_subflow_established(mptcp_sk(sk));
>>> +		subflow->start_stamp = 0;
>>
>> Detail: do you need to set it to 0 here?
> 
> So we call mptcp_pm_subflow_established() at most once due to subflow
> timeout or reset. I feared a malicius peer could trigger a lot of
> activity with a e.g. a burst of icmp unreachable.

Good point, thank you for your reply. I thought subflow_error_report()
would be called only once per ssk but then yes, safer to add this check.

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: [PATCH mptcp-next 2/3] mptcp: do not block subflows creation on errors

Posted by Paolo Abeni 3 years, 9 months ago

On Wed, 2021-11-24 at 13:23 +0100, Matthieu Baerts wrote:
> On 24/11/2021 11:09, Paolo Abeni wrote:
> > On Wed, 2021-11-24 at 10:55 +0100, Matthieu Baerts wrote:
> > > Hi Paolo,
> > > 
> > > On 23/11/2021 17:37, Paolo Abeni wrote:
> > > > If the MPTCP configuration allows for multiple subflows
> > > > creation, and the first additional subflows never reach
> > > > the fully established status - e.g. due to packets drop or
> > > > reset - the in kernel path manager do not move to the
> > > > next subflow.
> > > 
> > > Thank you for looking at that!
> > > 
> > > Why the PM could not try to establish all possible paths "in parallel"?
> > > Is it because we have this limit of allowed subflows and we wait to know
> > > if a path is established to count it as used subflow?
> > 
> > IIRC, that design choice is to avoid that an MPTCP endpoint could
> > easily become a DDoS bot, generating "a lot" of syn in response to
> > action from possibly untrusted peer.
> 
> The endpoint will only be able to produce a very few SYN+MPJ depending
> on the limit that is in place.
> 
> Should we maybe do a distinction between local addresses and remote
> ones? I mean when the MPTCP connection is established, the PM could open
> all paths using different local addresses and then the remote ones one
> at a time?
> > > 

I *think* that kind of change will be more invasive - we would have to
revisit the non trivial mptcp_pm_create_subflow_or_signal_addr() and
AFAICS we would still need to cope with subflow creation failure.

All the above IMHO ! 

Cheers,

Paolo

[PATCH mptcp-next 1/3] mptcp: clean-up MPJ option writing.
[PATCH mptcp-next 2/3] mptcp: do not block subflows creation on errors
[PATCH mptcp-next 3/3] selftests: mptcp: add tests for subflow creation failure