[PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"

Geliang Tang posted 8 patches 11 months, 1 week ago
Maintainers: Matthieu Baerts <matttbe@kernel.org>, Mat Martineau <martineau@kernel.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Shuah Khan <shuah@kernel.org>, Kishen Maloor <kishen.maloor@intel.com>
There is a newer version of this series
[PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Geliang Tang 11 months, 1 week ago
Update __mptcp_has_initial_subflow().

Signed-off-by: Geliang Tang <geliang.tang@suse.com>
---
 net/mptcp/protocol.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 6508179e94a6..1fb4ac3727c4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -1081,7 +1081,7 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
 {
 	struct sock *ssk = READ_ONCE(msk->first);
 
-	return ssk && inet_sk_state_load(ssk) != TCP_CLOSE;
+	return ssk && inet_sk_state_load(ssk) == TCP_ESTABLISHED;
 }
 
 static inline void mptcp_do_fallback(struct sock *ssk)
-- 
2.35.3
Re: [PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Paolo Abeni 11 months, 1 week ago
Hi,

On Fri, 2023-10-13 at 13:46 +0800, Geliang Tang wrote:
> Update __mptcp_has_initial_subflow().
> 
> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> ---
>  net/mptcp/protocol.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 6508179e94a6..1fb4ac3727c4 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -1081,7 +1081,7 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
>  {
>  	struct sock *ssk = READ_ONCE(msk->first);
>  
> -	return ssk && inet_sk_state_load(ssk) != TCP_CLOSE;
> +	return ssk && inet_sk_state_load(ssk) == TCP_ESTABLISHED;

I think the above is not correct, __mptcp_has_initial_subflow() will
return false before connect completes and/or for listener sockets.

You can list explicitly add the valid states with something alike:

	(1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED |
TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_LISTEN | TCPF_CLOSE_WAIT)

I'm unsure if we should include CLOSE_WAIT here: the remote has shut
down, but this end can still send data...

Side important note: you are too fast :) There are a lot of in-flight
patches, and it's difficult to follow each series consistently. I
suggest to focus on a small subset - possibly on a single series at the
time.

e.g. The first 2 patches in this series are IMHO ready to be merged
[*]. If Mat could apply them, you could follow-up with the remaining
bits of this series.

Cheers,

Paolo

[*] modulo some expansion to the changelog of patch 1, but that could
happen even after merging IMHO.
Re: [PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Matthieu Baerts 11 months, 1 week ago
Hi Paolo,

On 13/10/2023 12:32, Paolo Abeni wrote:
> Hi,
> 
> On Fri, 2023-10-13 at 13:46 +0800, Geliang Tang wrote:
>> Update __mptcp_has_initial_subflow().
>>
>> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
>> ---
>>  net/mptcp/protocol.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>> index 6508179e94a6..1fb4ac3727c4 100644
>> --- a/net/mptcp/protocol.h
>> +++ b/net/mptcp/protocol.h
>> @@ -1081,7 +1081,7 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
>>  {
>>  	struct sock *ssk = READ_ONCE(msk->first);
>>  
>> -	return ssk && inet_sk_state_load(ssk) != TCP_CLOSE;
>> +	return ssk && inet_sk_state_load(ssk) == TCP_ESTABLISHED;
> 
> I think the above is not correct, __mptcp_has_initial_subflow() will
> return false before connect completes and/or for listener sockets.

Please note that __mptcp_has_initial_subflow() is there just to count
the number of subflows. It is being used with 'msk->pm.subflows' and it
is supposed to have the same "behaviour". Then I don't think we should
increment the subflow counter for listener sockets, no?

> You can list explicitly add the valid states with something alike:
> 
> 	(1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED |
> TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_LISTEN | TCPF_CLOSE_WAIT)
> 
> I'm unsure if we should include CLOSE_WAIT here: the remote has shut
> down, but this end can still send data...

Maybe better, no? As long as the behaviour is similar to the one with
'msk->pm.subflows'.

> Side important note: you are too fast :) There are a lot of in-flight
> patches, and it's difficult to follow each series consistently. I
> suggest to focus on a small subset - possibly on a single series at the
> time.
> 
> e.g. The first 2 patches in this series are IMHO ready to be merged
> [*]. If Mat could apply them, you could follow-up with the remaining
> bits of this series.

Sure, I can do that.

Regarding patch 1/8, do you think we should send that to "-net"? The
patch looks OK to me but on the other hand, it is not a big issue to
reset the initial subflow (but not ideal) if we fear regressions due to
this patch. WDYT?

> [*] modulo some expansion to the changelog of patch 1, but that could
> happen even after merging IMHO.

Indeed. But we might forget :)
So if you have any suggestions, do not hesitate to share them :-)

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
Re: [PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Paolo Abeni 11 months, 1 week ago
On Fri, 2023-10-13 at 12:46 +0200, Matthieu Baerts wrote:
> On 13/10/2023 12:32, Paolo Abeni wrote:
> > Hi,
> > 
> > On Fri, 2023-10-13 at 13:46 +0800, Geliang Tang wrote:
> > > Update __mptcp_has_initial_subflow().
> > > 
> > > Signed-off-by: Geliang Tang <geliang.tang@suse.com>
> > > ---
> > >  net/mptcp/protocol.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> > > index 6508179e94a6..1fb4ac3727c4 100644
> > > --- a/net/mptcp/protocol.h
> > > +++ b/net/mptcp/protocol.h
> > > @@ -1081,7 +1081,7 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
> > >  {
> > >  	struct sock *ssk = READ_ONCE(msk->first);
> > >  
> > > -	return ssk && inet_sk_state_load(ssk) != TCP_CLOSE;
> > > +	return ssk && inet_sk_state_load(ssk) == TCP_ESTABLISHED;
> > 
> > I think the above is not correct, __mptcp_has_initial_subflow() will
> > return false before connect completes and/or for listener sockets.
> 
> Please note that __mptcp_has_initial_subflow() is there just to count
> the number of subflows. It is being used with 'msk->pm.subflows' and it
> is supposed to have the same "behaviour". Then I don't think we should
> increment the subflow counter for listener sockets, no?

If the goal is giving an accurate count of the total number of
subflows, I think we should: the msk listener has 1 subflow: the tcp
listener.

> > You can list explicitly add the valid states with something alike:
> > 
> > 	(1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED |
> > TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_LISTEN | TCPF_CLOSE_WAIT)
> > 
> > I'm unsure if we should include CLOSE_WAIT here: the remote has shut
> > down, but this end can still send data...
> 
> Maybe better, no? As long as the behaviour is similar to the one with
> 'msk->pm.subflows'.

If we keep the way we account for MPJ subflows as a reference
CLOSE_WAIT status must be excluded. I agree/now see it's the better
option.

> > Side important note: you are too fast :) There are a lot of in-flight
> > patches, and it's difficult to follow each series consistently. I
> > suggest to focus on a small subset - possibly on a single series at the
> > time.
> > 
> > e.g. The first 2 patches in this series are IMHO ready to be merged
> > [*]. If Mat could apply them, you could follow-up with the remaining
> > bits of this series.
> 
> Sure, I can do that.
> 
> Regarding patch 1/8, do you think we should send that to "-net"? The
> patch looks OK to me but on the other hand, it is not a big issue to
> reset the initial subflow (but not ideal) if we fear regressions due to
> this patch. WDYT?

I think both patch 1 & 2 should go via -net, but I'm not 110% sure
there will be not regressions free. Perhaps we can let stage a bit in
our tree?

> > [*] modulo some expansion to the changelog of patch 1, but that could
> > happen even after merging IMHO.
> 
> Indeed. But we might forget :)
> So if you have any suggestions, do not hesitate to share them :-)

I would re-phrase the commit message roughly as follow:

"""
When closing the first subflow, the MPTCP protocol unconditionally
calls tcp_disconnect(), which in turn generates a reset if the subflow
is established. 

That is unexpected and different from what MPTCP does with MPJ
subflows, where resets are generated only on FASTCLOSE and other edge
scenarios.

We can't reuse for the first subflow the same code in place for MPJ
subflows, as MPTCP clean them up completely via a tcp_close() call,
while must keep the first subflow socket alive for later re-usage, due
to implementation constraints.

This patch adds a new helper __mptcp_subflow_disconnect() that
encapsulates, a logic similar to tcp_close, issuing a reset only when
the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
otherwise.
"""

Cheers,

Paolo
Re: [PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Matthieu Baerts 11 months, 1 week ago
Hi Paolo,

Thank you for your replies!

On 13/10/2023 17:35, Paolo Abeni wrote:
> On Fri, 2023-10-13 at 12:46 +0200, Matthieu Baerts wrote:
>> On 13/10/2023 12:32, Paolo Abeni wrote:
>>> Hi,
>>>
>>> On Fri, 2023-10-13 at 13:46 +0800, Geliang Tang wrote:
>>>> Update __mptcp_has_initial_subflow().
>>>>
>>>> Signed-off-by: Geliang Tang <geliang.tang@suse.com>
>>>> ---
>>>>  net/mptcp/protocol.h | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
>>>> index 6508179e94a6..1fb4ac3727c4 100644
>>>> --- a/net/mptcp/protocol.h
>>>> +++ b/net/mptcp/protocol.h
>>>> @@ -1081,7 +1081,7 @@ static inline bool __mptcp_has_initial_subflow(const struct mptcp_sock *msk)
>>>>  {
>>>>  	struct sock *ssk = READ_ONCE(msk->first);
>>>>  
>>>> -	return ssk && inet_sk_state_load(ssk) != TCP_CLOSE;
>>>> +	return ssk && inet_sk_state_load(ssk) == TCP_ESTABLISHED;
>>>
>>> I think the above is not correct, __mptcp_has_initial_subflow() will
>>> return false before connect completes and/or for listener sockets.
>>
>> Please note that __mptcp_has_initial_subflow() is there just to count
>> the number of subflows. It is being used with 'msk->pm.subflows' and it
>> is supposed to have the same "behaviour". Then I don't think we should
>> increment the subflow counter for listener sockets, no?
> 
> If the goal is giving an accurate count of the total number of
> subflows, I think we should: the msk listener has 1 subflow: the tcp
> listener.

Good point! I was thinking about subflows that were visible on the wire,
but yes, there is a subflow there.

If we look at 'ss' output, we will also see one TCP subflow (with ulp
mptcp) for this listener socket, so displaying 1 for the total number of
subflows would make more sense.

So yes, we should add TCPF_LISTEN.

>>> You can list explicitly add the valid states with something alike:
>>>
>>> 	(1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED |
>>> TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_LISTEN | TCPF_CLOSE_WAIT)
>>>
>>> I'm unsure if we should include CLOSE_WAIT here: the remote has shut
>>> down, but this end can still send data...
>>
>> Maybe better, no? As long as the behaviour is similar to the one with
>> 'msk->pm.subflows'.
> 
> If we keep the way we account for MPJ subflows as a reference
> CLOSE_WAIT status must be excluded. I agree/now see it's the better
> option.

OK, so no CLOSE_WAIT, right?

>>> Side important note: you are too fast :) There are a lot of in-flight
>>> patches, and it's difficult to follow each series consistently. I
>>> suggest to focus on a small subset - possibly on a single series at the
>>> time.
>>>
>>> e.g. The first 2 patches in this series are IMHO ready to be merged
>>> [*]. If Mat could apply them, you could follow-up with the remaining
>>> bits of this series.
>>
>> Sure, I can do that.
>>
>> Regarding patch 1/8, do you think we should send that to "-net"? The
>> patch looks OK to me but on the other hand, it is not a big issue to
>> reset the initial subflow (but not ideal) if we fear regressions due to
>> this patch. WDYT?
> 
> I think both patch 1 & 2 should go via -net, but I'm not 110% sure
> there will be not regressions free. Perhaps we can let stage a bit in
> our tree?

Sure, good idea!

I can apply these patches now.

>>> [*] modulo some expansion to the changelog of patch 1, but that could
>>> happen even after merging IMHO.
>>
>> Indeed. But we might forget :)
>> So if you have any suggestions, do not hesitate to share them :-)
> 
> I would re-phrase the commit message roughly as follow:
> 
> """
> When closing the first subflow, the MPTCP protocol unconditionally
> calls tcp_disconnect(), which in turn generates a reset if the subflow
> is established. 
> 
> That is unexpected and different from what MPTCP does with MPJ
> subflows, where resets are generated only on FASTCLOSE and other edge
> scenarios.
> 
> We can't reuse for the first subflow the same code in place for MPJ
> subflows, as MPTCP clean them up completely via a tcp_close() call,
> while must keep the first subflow socket alive for later re-usage, due
> to implementation constraints.
> 
> This patch adds a new helper __mptcp_subflow_disconnect() that
> encapsulates, a logic similar to tcp_close, issuing a reset only when
> the MPTCP_CF_FASTCLOSE flag is set, and performing a clean shutdown
> otherwise.
> """

I suggest to also modify the title, something like:

    mptcp: avoid sending RST when closing the initial subflow

And this Fixes tag:

    Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
Re: [PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Paolo Abeni 11 months, 1 week ago
On Mon, 2023-10-16 at 13:53 +0200, Matthieu Baerts wrote:
> On 13/10/2023 17:35, Paolo Abeni wrote:
> > On Fri, 2023-10-13 at 12:46 +0200, Matthieu Baerts wrote:
> > > On 13/10/2023 12:32, Paolo Abeni wrote:
> > > 
> > > > You can list explicitly add the valid states with something alike:
> > > > 
> > > > 	(1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED |
> > > > TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_LISTEN | TCPF_CLOSE_WAIT)
> > > > 
> > > > I'm unsure if we should include CLOSE_WAIT here: the remote has shut
> > > > down, but this end can still send data...
> > > 
> > > Maybe better, no? As long as the behaviour is similar to the one with
> > > 'msk->pm.subflows'.
> > 
> > If we keep the way we account for MPJ subflows as a reference
> > CLOSE_WAIT status must be excluded. I agree/now see it's the better
> > option.
> 
> OK, so no CLOSE_WAIT, right?

Exactly, no CLOSE_WAIT

> I suggest to also modify the title, something like:
> 
>     mptcp: avoid sending RST when closing the initial subflow
> 
> And this Fixes tag:
> 
>     Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")

Fine by me.

Thanks!

Paolo
Re: [PATCH mptcp-next v16 3/8] Squash to "mptcp: add mptcpi_subflows_total counter"
Posted by Matthieu Baerts 11 months, 1 week ago
Hi Paolo, Geliang,

On 16/10/2023 15:50, Paolo Abeni wrote:
> On Mon, 2023-10-16 at 13:53 +0200, Matthieu Baerts wrote:
>> On 13/10/2023 17:35, Paolo Abeni wrote:
>>> On Fri, 2023-10-13 at 12:46 +0200, Matthieu Baerts wrote:
>>>> On 13/10/2023 12:32, Paolo Abeni wrote:
>>>>
>>>>> You can list explicitly add the valid states with something alike:
>>>>>
>>>>> 	(1 << inet_sk_state_load(ssk)) & (TCPF_ESTABLISHED |
>>>>> TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_LISTEN | TCPF_CLOSE_WAIT)
>>>>>
>>>>> I'm unsure if we should include CLOSE_WAIT here: the remote has shut
>>>>> down, but this end can still send data...
>>>>
>>>> Maybe better, no? As long as the behaviour is similar to the one with
>>>> 'msk->pm.subflows'.
>>>
>>> If we keep the way we account for MPJ subflows as a reference
>>> CLOSE_WAIT status must be excluded. I agree/now see it's the better
>>> option.
>>
>> OK, so no CLOSE_WAIT, right?
> 
> Exactly, no CLOSE_WAIT

@Geliang: do you mind including what Paolo suggested -- without
CLOSE_WAIT -- in the next version?

>> I suggest to also modify the title, something like:
>>
>>     mptcp: avoid sending RST when closing the initial subflow
>>
>> And this Fixes tag:
>>
>>     Fixes: c2b2ae3925b6 ("mptcp: handle correctly disconnect() failures")
> 
> Fine by me.

Great, just applied the patches in our tree (fixes for -net):

New patches for t/upstream-net and t/upstream:
- 6c7131fb5cc7: mptcp: avoid sending RST when closing the initial subflow
- 1f39dcb567ca: selftests: mptcp: join: no RST when rm subflow/addr
- Results: 83a52a5f95c9..2fa503488095 (export-net)
- Results: c0f977cdc194..92a5724f0df6 (export)

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20231016T201914
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20231016T201914

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net