[PATCH mptcp-next] mptcp: add a new sysctl for make after break timeout

Paolo Abeni posted 1 patch 8 months ago
Failed in applying to current master (apply log)
There is a newer version of this series
net/mptcp/ctrl.c     | 16 ++++++++++++++++
net/mptcp/protocol.c |  5 +++--
net/mptcp/protocol.h |  1 +
3 files changed, 20 insertions(+), 2 deletions(-)
[PATCH mptcp-next] mptcp: add a new sysctl for make after break timeout
Posted by Paolo Abeni 8 months ago
The MPTCP protocol allows sockets with no alive subflows to stay
in ESTABLISHED status for and user-defined timeout, to allow for
later subflows creation.

Currently such timeout is constant - TCP_TIMEWAIT_LEN. Let the
user-space configure them via a newly added sysctl, to better cope
with busy servers and simplify (make them faster) the relevant
pktdrill tests.

Note that the new know does not apply to orphaned MPTCP socket
waiting for the data_fin handshake completion: they always wait
TCP_TIMEWAIT_LEN.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/ctrl.c     | 16 ++++++++++++++++
 net/mptcp/protocol.c |  5 +++--
 net/mptcp/protocol.h |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index c46c22a84d23..b22d41729076 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -27,6 +27,7 @@ struct mptcp_pernet {
 #endif
 
 	unsigned int add_addr_timeout;
+	unsigned int close_timeout;
 	unsigned int stale_loss_cnt;
 	u8 mptcp_enabled;
 	u8 checksum_enabled;
@@ -65,6 +66,13 @@ unsigned int mptcp_stale_loss_cnt(const struct net *net)
 	return mptcp_get_pernet(net)->stale_loss_cnt;
 }
 
+unsigned int mptcp_close_timeout(const struct sock *sk)
+{
+	if (sock_flag(sk, SOCK_DEAD))
+		return TCP_TIMEWAIT_LEN;
+	return mptcp_get_pernet(sock_net(sk))->close_timeout;
+}
+
 int mptcp_get_pm_type(const struct net *net)
 {
 	return mptcp_get_pernet(net)->pm_type;
@@ -79,6 +87,7 @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
 {
 	pernet->mptcp_enabled = 1;
 	pernet->add_addr_timeout = TCP_RTO_MAX;
+	pernet->close_timeout = TCP_TIMEWAIT_LEN;
 	pernet->checksum_enabled = 0;
 	pernet->allow_join_initial_addr_port = 1;
 	pernet->stale_loss_cnt = 4;
@@ -140,6 +149,12 @@ static struct ctl_table mptcp_sysctl_table[] = {
 		.maxlen	= MPTCP_SCHED_NAME_MAX,
 		.mode = 0644,
 		.proc_handler = proc_dostring,
+	},
+		{
+		.procname = "close_timeout",
+		.maxlen = sizeof(unsigned int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_jiffies,
 	},
 	{}
 };
@@ -163,6 +178,7 @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
 	table[4].data = &pernet->stale_loss_cnt;
 	table[5].data = &pernet->pm_type;
 	table[6].data = &pernet->scheduler;
+	table[7].data = &pernet->close_timeout;
 
 	hdr = register_net_sysctl(net, MPTCP_SYSCTL_PATH, table);
 	if (!hdr)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 80d06cab8bc7..8022f00190ad 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2501,7 +2501,7 @@ static bool mptcp_close_tout_expired(const struct sock *sk)
 		return false;
 
 	return time_after32(tcp_jiffies32,
-		  inet_csk(sk)->icsk_mtup.probe_timestamp + TCP_TIMEWAIT_LEN);
+	    inet_csk(sk)->icsk_mtup.probe_timestamp + mptcp_close_timeout(sk));
 }
 
 static void mptcp_check_fastclose(struct mptcp_sock *msk)
@@ -2643,7 +2643,8 @@ void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout)
 	if (!fail_tout && !inet_csk(sk)->icsk_mtup.probe_timestamp)
 		return;
 
-	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
+	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies +
+			mptcp_close_timeout(sk);
 
 	/* the close timeout takes precedence on the fail one, and here at least one of
 	 * them is active
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 392c2f247034..1cade49205bd 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -624,6 +624,7 @@ unsigned int mptcp_get_add_addr_timeout(const struct net *net);
 int mptcp_is_checksum_enabled(const struct net *net);
 int mptcp_allow_join_id0(const struct net *net);
 unsigned int mptcp_stale_loss_cnt(const struct net *net);
+unsigned int mptcp_close_timeout(const struct sock *sk);
 int mptcp_get_pm_type(const struct net *net);
 const char *mptcp_get_scheduler(const struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
-- 
2.41.0
Re: [PATCH mptcp-next] mptcp: add a new sysctl for make after break timeout
Posted by Matthieu Baerts 8 months ago
Hi Paolo,

On 04/09/2023 16:08, Paolo Abeni wrote:
> The MPTCP protocol allows sockets with no alive subflows to stay
> in ESTABLISHED status for and user-defined timeout, to allow for
> later subflows creation.
> 
> Currently such timeout is constant - TCP_TIMEWAIT_LEN. Let the
> user-space configure them via a newly added sysctl, to better cope
> with busy servers and simplify (make them faster) the relevant
> pktdrill tests.
> 
> Note that the new know does not apply to orphaned MPTCP socket
> waiting for the data_fin handshake completion: they always wait
> TCP_TIMEWAIT_LEN.

Thank you for having shared this patch!

It looks good, I have a few comments below if you don't mind:

(...)

> diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
> index c46c22a84d23..b22d41729076 100644
> --- a/net/mptcp/ctrl.c
> +++ b/net/mptcp/ctrl.c
> @@ -27,6 +27,7 @@ struct mptcp_pernet {
>  #endif
>  
>  	unsigned int add_addr_timeout;
> +	unsigned int close_timeout;

Sometimes, it is good to add a suffix indicating the unit being used but
I guess here in jiffies, we don't need that because the time is usually
in jiffies, right?

>  	unsigned int stale_loss_cnt;
>  	u8 mptcp_enabled;
>  	u8 checksum_enabled;
> @@ -65,6 +66,13 @@ unsigned int mptcp_stale_loss_cnt(const struct net *net)
>  	return mptcp_get_pernet(net)->stale_loss_cnt;
>  }
>  
> +unsigned int mptcp_close_timeout(const struct sock *sk)
> +{
> +	if (sock_flag(sk, SOCK_DEAD))
> +		return TCP_TIMEWAIT_LEN;
> +	return mptcp_get_pernet(sock_net(sk))->close_timeout;
> +}
> +
>  int mptcp_get_pm_type(const struct net *net)
>  {
>  	return mptcp_get_pernet(net)->pm_type;
> @@ -79,6 +87,7 @@ static void mptcp_pernet_set_defaults(struct mptcp_pernet *pernet)
>  {
>  	pernet->mptcp_enabled = 1;
>  	pernet->add_addr_timeout = TCP_RTO_MAX;
> +	pernet->close_timeout = TCP_TIMEWAIT_LEN;
>  	pernet->checksum_enabled = 0;
>  	pernet->allow_join_initial_addr_port = 1;
>  	pernet->stale_loss_cnt = 4;
> @@ -140,6 +149,12 @@ static struct ctl_table mptcp_sysctl_table[] = {
>  		.maxlen	= MPTCP_SCHED_NAME_MAX,
>  		.mode = 0644,
>  		.proc_handler = proc_dostring,
> +	},
> +		{

(nit) it looks like the indentation is not OK?

> +		.procname = "close_timeout",

Do you mind documenting this new entry in
Documentation/networking/mptcp-sysctl.rst please?

Also, here as well I guess we don't need to add a suffix "_s" or "_sec"
for the unit because usually it is in seconds, right?

> +		.maxlen = sizeof(unsigned int),

Just an idea: would it make sense to accept -1 as special case for "no
timeout"? (maybe not, we can always put a huge value? also I'm not sure
we can use proc_dointvec_jiffies() with -1 and I wonder if they handle
UINT_MAX with the conversion but that's a detail)

> +		.mode = 0644,
> +		.proc_handler = proc_dointvec_jiffies,
>  	},
>  	{}
>  };
> @@ -163,6 +178,7 @@ static int mptcp_pernet_new_table(struct net *net, struct mptcp_pernet *pernet)
>  	table[4].data = &pernet->stale_loss_cnt;
>  	table[5].data = &pernet->pm_type;
>  	table[6].data = &pernet->scheduler;
> +	table[7].data = &pernet->close_timeout;
>  
>  	hdr = register_net_sysctl(net, MPTCP_SYSCTL_PATH, table);
>  	if (!hdr)
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 80d06cab8bc7..8022f00190ad 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2501,7 +2501,7 @@ static bool mptcp_close_tout_expired(const struct sock *sk)
>  		return false;
>  
>  	return time_after32(tcp_jiffies32,
> -		  inet_csk(sk)->icsk_mtup.probe_timestamp + TCP_TIMEWAIT_LEN);
> +	    inet_csk(sk)->icsk_mtup.probe_timestamp + mptcp_close_timeout(sk));

(nit: the indentation looks strange but maybe what you wanted?)

>  }
>  
>  static void mptcp_check_fastclose(struct mptcp_sock *msk)
> @@ -2643,7 +2643,8 @@ void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout)
>  	if (!fail_tout && !inet_csk(sk)->icsk_mtup.probe_timestamp)
>  		return;
>  
> -	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
> +	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies +
> +			mptcp_close_timeout(sk);

Do we need to do something special if mptcp_close_timeout(sk) returns 0,
AKA close the MPTCP connection immediately?

>  	/* the close timeout takes precedence on the fail one, and here at least one of
>  	 * them is active

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
Re: [PATCH mptcp-next] mptcp: add a new sysctl for make after break timeout
Posted by Paolo Abeni 8 months ago
On Mon, 2023-09-04 at 16:35 +0200, Matthieu Baerts wrote:
> > diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
> > index c46c22a84d23..b22d41729076 100644
> > --- a/net/mptcp/ctrl.c
> > +++ b/net/mptcp/ctrl.c
> > @@ -27,6 +27,7 @@ struct mptcp_pernet {
> >  #endif
> >  
> >  	unsigned int add_addr_timeout;
> > +	unsigned int close_timeout;
> 
> Sometimes, it is good to add a suffix indicating the unit being used but
> I guess here in jiffies, we don't need that because the time is usually
> in jiffies, right?

I guess so. Also there is an hopefully self-documenting
proc_dointvec_jiffies() below.

[...]

> > @@ -140,6 +149,12 @@ static struct ctl_table mptcp_sysctl_table[] = {
> >  		.maxlen	= MPTCP_SCHED_NAME_MAX,
> >  		.mode = 0644,
> >  		.proc_handler = proc_dostring,
> > +	},
> > +		{
> 
> (nit) it looks like the indentation is not OK?

c&p error. I'll fix that in v2.

> > +		.procname = "close_timeout",
> 
> Do you mind documenting this new entry in
> Documentation/networking/mptcp-sysctl.rst please?

I will in v2.

> Also, here as well I guess we don't need to add a suffix "_s" or "_sec"
> for the unit because usually it is in seconds, right?

AFAICS almost all timeouts under net have no such suffix, I guess that
is fine?!?

> > +		.maxlen = sizeof(unsigned int),
> 
> Just an idea: would it make sense to accept -1 as special case for "no
> timeout"? (maybe not, we can always put a huge value? also I'm not sure
> we can use proc_dointvec_jiffies() with -1 and I wonder if they handle
> UINT_MAX with the conversion but that's a detail)

I'm unsure if that above is needed. A 0 timeout will fire (almost)
immediately, should be a reasonable enough approximation.
> 
> > diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> > index 80d06cab8bc7..8022f00190ad 100644
> > --- a/net/mptcp/protocol.c
> > +++ b/net/mptcp/protocol.c
> > @@ -2501,7 +2501,7 @@ static bool mptcp_close_tout_expired(const struct sock *sk)
> >  		return false;
> >  
> >  	return time_after32(tcp_jiffies32,
> > -		  inet_csk(sk)->icsk_mtup.probe_timestamp + TCP_TIMEWAIT_LEN);
> > +	    inet_csk(sk)->icsk_mtup.probe_timestamp + mptcp_close_timeout(sk));
> 
> (nit: the indentation looks strange but maybe what you wanted?)

Just trying to fit 80 chars, probably it's better to align to the above
'('

> > @@ -2643,7 +2643,8 @@ void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout)
> >  	if (!fail_tout && !inet_csk(sk)->icsk_mtup.probe_timestamp)
> >  		return;
> >  
> > -	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
> > +	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies +
> > +			mptcp_close_timeout(sk);
> 
> Do we need to do something special if mptcp_close_timeout(sk) returns 0,
> AKA close the MPTCP connection immediately?

AFAIK, not needed. mod_timer() will be fine to schedule it ASAP.

Cheers,

Paolo
Re: [PATCH mptcp-next] mptcp: add a new sysctl for make after break timeout
Posted by Matthieu Baerts 8 months ago
Hi Paolo,

Thank you for your reply!

On 04/09/2023 17:04, Paolo Abeni wrote:
> On Mon, 2023-09-04 at 16:35 +0200, Matthieu Baerts wrote:
>>> diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
>>> index c46c22a84d23..b22d41729076 100644
>>> --- a/net/mptcp/ctrl.c
>>> +++ b/net/mptcp/ctrl.c
>>> @@ -27,6 +27,7 @@ struct mptcp_pernet {
>>>  #endif
>>>  
>>>  	unsigned int add_addr_timeout;
>>> +	unsigned int close_timeout;
>>
>> Sometimes, it is good to add a suffix indicating the unit being used but
>> I guess here in jiffies, we don't need that because the time is usually
>> in jiffies, right?
> 
> I guess so. Also there is an hopefully self-documenting
> proc_dointvec_jiffies() below.

I'm fine with that. At the end, it's the same behaviour as the one above
('add_addr_timeout').

(I was thinking about all the "_us" (srtt_us, etc.) variables, mostly in
TCP but that's different: here it is maybe less common than jiffies and
probably needed when doing the conversion from ms to µs)

> 
> [...]
> 
>>> @@ -140,6 +149,12 @@ static struct ctl_table mptcp_sysctl_table[] = {
>>>  		.maxlen	= MPTCP_SCHED_NAME_MAX,
>>>  		.mode = 0644,
>>>  		.proc_handler = proc_dostring,
>>> +	},
>>> +		{
>>
>> (nit) it looks like the indentation is not OK?
> 
> c&p error. I'll fix that in v2.
> 
>>> +		.procname = "close_timeout",
>>
>> Do you mind documenting this new entry in
>> Documentation/networking/mptcp-sysctl.rst please?
> 
> I will in v2.
> 
>> Also, here as well I guess we don't need to add a suffix "_s" or "_sec"
>> for the unit because usually it is in seconds, right?
> 
> AFAICS almost all timeouts under net have no such suffix, I guess that
> is fine?!?

Yes indeed, no need to change.

> 
>>> +		.maxlen = sizeof(unsigned int),
>>
>> Just an idea: would it make sense to accept -1 as special case for "no
>> timeout"? (maybe not, we can always put a huge value? also I'm not sure
>> we can use proc_dointvec_jiffies() with -1 and I wonder if they handle
>> UINT_MAX with the conversion but that's a detail)
> 
> I'm unsure if that above is needed. A 0 timeout will fire (almost)
> immediately, should be a reasonable enough approximation.

Sorry, I wanted to suggest such special case: "never close the MPTCP
connection if there is no more subflows" (value of "-1", not "0"). But
that's maybe not needed if we can give a huge number?

>>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>>> index 80d06cab8bc7..8022f00190ad 100644
>>> --- a/net/mptcp/protocol.c
>>> +++ b/net/mptcp/protocol.c
>>> @@ -2501,7 +2501,7 @@ static bool mptcp_close_tout_expired(const struct sock *sk)
>>>  		return false;
>>>  
>>>  	return time_after32(tcp_jiffies32,
>>> -		  inet_csk(sk)->icsk_mtup.probe_timestamp + TCP_TIMEWAIT_LEN);
>>> +	    inet_csk(sk)->icsk_mtup.probe_timestamp + mptcp_close_timeout(sk));
>>
>> (nit: the indentation looks strange but maybe what you wanted?)
> 
> Just trying to fit 80 chars, probably it's better to align to the above
> '('

Maybe yes if you don't mind :)

>>> @@ -2643,7 +2643,8 @@ void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout)
>>>  	if (!fail_tout && !inet_csk(sk)->icsk_mtup.probe_timestamp)
>>>  		return;
>>>  
>>> -	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
>>> +	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies +
>>> +			mptcp_close_timeout(sk);
>>
>> Do we need to do something special if mptcp_close_timeout(sk) returns 0,
>> AKA close the MPTCP connection immediately?
> 
> AFAIK, not needed. mod_timer() will be fine to schedule it ASAP.

OK, easier to maintain then! :)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net