mptcp: process MP_JOIN's 4th ACK

[PATCH RFC mptcp-next fixes] mptcp: process MP_JOIN's 4th ACK

Posted by Matthieu Baerts (NGI0) 3 months, 2 weeks ago

The 'Fixes' commit recently changed the behaviour for MPTCP Join
requests for listening sockets: when receiving the 3rd ACK of a request
adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the
MPTCP sock that has been created when the MPTCP connection got created
before with the first path. The newly added 'goto' will then skip the
process of the segment text (step 7) and not go through tcp_data_queue()
where the MPTCP options are validated, and some actions are triggered,
e.g. sending the MPJ 4th ACK [1].

This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
delayed, but it looks like it affects the MPTFO feature as well --
probably in case of retransmissions I suppose -- and being the reason
why the selftests started to be unstable the last few days [2].

TODO: it is not clear to me if we need to fix something for MPTCP. The
following packetdrill test doesn't seem to be failing with the following
patch, we can see a fallback to TCP, which seems OK in this case, no?

  `../common/defaults.sh`

   0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
  +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)

  +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
  +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>

  +0 write(3, ..., 100) = 100
  +0 >  . 1:1(0)     ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>
  +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700>

Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [1]
Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [2]
Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ff9ab3d01ced..ff981d7776c3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6820,7 +6820,7 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 		if (sk->sk_shutdown & SEND_SHUTDOWN)
 			tcp_shutdown(sk, SEND_SHUTDOWN);
 
-		if (sk->sk_socket)
+		if (sk->sk_socket && !sk_is_mptcp(sk))
 			goto consume;
 		break;
 

---
base-commit: 6cad12b229c1b550b9be002f364632fd0f120b24
change-id: 20240715-mptcp-process-4th-ack-mpj-f508b50d4068

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>

Re: [PATCH RFC mptcp-next fixes] mptcp: process MP_JOIN's 4th ACK

Posted by Paolo Abeni 3 months, 2 weeks ago


On 7/15/24 21:09, Matthieu Baerts (NGI0) wrote:
> The 'Fixes' commit recently changed the behaviour for MPTCP Join
> requests for listening sockets: when receiving the 3rd ACK of a request
> adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the
> MPTCP sock that has been created when the MPTCP connection got created
> before with the first path. The newly added 'goto' will then skip the
> process of the segment text (step 7) and not go through tcp_data_queue()
> where the MPTCP options are validated, and some actions are triggered,
> e.g. sending the MPJ 4th ACK [1].
> 
> This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
> delayed, but it looks like it affects the MPTFO feature as well --
> probably in case of retransmissions I suppose -- and being the reason
> why the selftests started to be unstable the last few days [2].
> 
> TODO: it is not clear to me if we need to fix something for MPTCP. The
> following packetdrill test doesn't seem to be failing with the following
> patch, we can see a fallback to TCP, which seems OK in this case, no?
> 
>    `../common/defaults.sh`
> 
>     0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
>    +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
> 
>    +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>    +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>    +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>    +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
> 
>    +0 write(3, ..., 100) = 100
>    +0 >  . 1:1(0)     ack 1 <nop, nop, TS val 845707014 ecr 700, nop, nop, sack 0:1>
>    +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700>
> 
> Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [1]
> Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh [2]
> Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous connect().")
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>   net/ipv4/tcp_input.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index ff9ab3d01ced..ff981d7776c3 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -6820,7 +6820,7 @@ tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
>   		if (sk->sk_shutdown & SEND_SHUTDOWN)
>   			tcp_shutdown(sk, SEND_SHUTDOWN);
>   
> -		if (sk->sk_socket)
> +		if (sk->sk_socket && !sk_is_mptcp(sk))
>   			goto consume;
>   		break;
>   
> 
> ---

patch LGTM, but I'm wondering if a similar problem can be reproduced 
with plain TCP and  TFO?


syn + TFO	->
		<- syn ack
			[accept]
ack		->
		NO 3rd ack????

If so possibly the 'if (sk->sk_socket ...' check should be refined even 
more?!??

Thanks,

Paolo

Re: [PATCH RFC mptcp-next fixes] mptcp: process MP_JOIN's 4th ACK

Posted by Matthieu Baerts 3 months, 2 weeks ago

Hi Paolo,

On 16/07/2024 12:35, Paolo Abeni wrote:
> 
> 
> On 7/15/24 21:09, Matthieu Baerts (NGI0) wrote:
>> The 'Fixes' commit recently changed the behaviour for MPTCP Join
>> requests for listening sockets: when receiving the 3rd ACK of a request
>> adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the
>> MPTCP sock that has been created when the MPTCP connection got created
>> before with the first path. The newly added 'goto' will then skip the
>> process of the segment text (step 7) and not go through tcp_data_queue()
>> where the MPTCP options are validated, and some actions are triggered,
>> e.g. sending the MPJ 4th ACK [1].
>>
>> This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
>> delayed, but it looks like it affects the MPTFO feature as well --
>> probably in case of retransmissions I suppose -- and being the reason
>> why the selftests started to be unstable the last few days [2].
>>
>> TODO: it is not clear to me if we need to fix something for MPTCP. The
>> following packetdrill test doesn't seem to be failing with the following
>> patch, we can see a fallback to TCP, which seems OK in this case, no?
>>
>>    `../common/defaults.sh`
>>
>>     0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
>>    +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
>>
>>    +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr
>> 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>>    +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr
>> 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>>    +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr
>> 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>>    +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr
>> 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
>>
>>    +0 write(3, ..., 100) = 100
>>    +0 >  . 1:1(0)     ack 1 <nop, nop, TS val 845707014 ecr 700, nop,
>> nop, sack 0:1>
>>    +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700>
>>
>> Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [1]
>> Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-
>> mptcp-dbg&test=mptcp-connect-sh [2]
>> Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous
>> connect().")
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>>   net/ipv4/tcp_input.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>> index ff9ab3d01ced..ff981d7776c3 100644
>> --- a/net/ipv4/tcp_input.c
>> +++ b/net/ipv4/tcp_input.c
>> @@ -6820,7 +6820,7 @@ tcp_rcv_state_process(struct sock *sk, struct
>> sk_buff *skb)
>>           if (sk->sk_shutdown & SEND_SHUTDOWN)
>>               tcp_shutdown(sk, SEND_SHUTDOWN);
>>   -        if (sk->sk_socket)
>> +        if (sk->sk_socket && !sk_is_mptcp(sk))
>>               goto consume;
>>           break;
>>  
>> ---
> 
> patch LGTM

Thank you for having looked!

Should I already send the patch -- without the TODO, but mentioning the
fallback to TCP instead -- to have it accepted before sending the pull
request to Linus if we still have time?

> but I'm wondering if a similar problem can be reproduced
> with plain TCP and  TFO?
> 
> 
> syn + TFO    ->
>         <- syn ack
>             [accept]
> ack        ->
>         NO 3rd ack????
> 
> If so possibly the 'if (sk->sk_socket ...' check should be refined even
> more?!??

Good point!

I can mention that in the comment of my patch, and try to reproduce that
locally. WDYT?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH RFC mptcp-next fixes] mptcp: process MP_JOIN's 4th ACK

Posted by Paolo Abeni 3 months, 2 weeks ago


On 7/16/24 12:44, Matthieu Baerts wrote:
> On 16/07/2024 12:35, Paolo Abeni wrote:
>> On 7/15/24 21:09, Matthieu Baerts (NGI0) wrote:
>>> The 'Fixes' commit recently changed the behaviour for MPTCP Join
>>> requests for listening sockets: when receiving the 3rd ACK of a request
>>> adding a new path (MP_JOIN), sk->sk_socket will be set, and point to the
>>> MPTCP sock that has been created when the MPTCP connection got created
>>> before with the first path. The newly added 'goto' will then skip the
>>> process of the segment text (step 7) and not go through tcp_data_queue()
>>> where the MPTCP options are validated, and some actions are triggered,
>>> e.g. sending the MPJ 4th ACK [1].
>>>
>>> This doesn't fully break MPTCP, mainly the 4th MPJ ACK that will be
>>> delayed, but it looks like it affects the MPTFO feature as well --
>>> probably in case of retransmissions I suppose -- and being the reason
>>> why the selftests started to be unstable the last few days [2].
>>>
>>> TODO: it is not clear to me if we need to fix something for MPTCP. The
>>> following packetdrill test doesn't seem to be failing with the following
>>> patch, we can see a fallback to TCP, which seems OK in this case, no?
>>>
>>>     `../common/defaults.sh`
>>>
>>>      0 socket(..., SOCK_STREAM|SOCK_NONBLOCK, IPPROTO_MPTCP) = 3
>>>     +0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
>>>
>>>     +0 > S  0:0(0)                 <mss 1460, sackOK, TS val 100 ecr
>>> 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>>>     +0 < S  0:0(0) win 1000        <mss 1460, sackOK, TS val 407 ecr
>>> 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>>>     +0 > S. 0:0(0) ack 1           <mss 1460, sackOK, TS val 330 ecr
>>> 0,   nop, wscale 8, mpcapable v1 flags[flag_h] nokey>
>>>     +0 < S. 0:0(0) ack 1 win 65535 <mss 1460, sackOK, TS val 700 ecr
>>> 100, nop, wscale 8, mpcapable v1 flags[flag_h] key[skey=2]>
>>>
>>>     +0 write(3, ..., 100) = 100
>>>     +0 >  . 1:1(0)     ack 1 <nop, nop, TS val 845707014 ecr 700, nop,
>>> nop, sack 0:1>
>>>     +0 > P. 1:101(100) ack 1 <nop, nop, TS val 845958933 ecr 700>
>>>
>>> Link: https://datatracker.ietf.org/doc/html/rfc8684#fig_tokens [1]
>>> Link: https://netdev.bots.linux.dev/contest.html?executor=vmksft-
>>> mptcp-dbg&test=mptcp-connect-sh [2]
>>> Fixes: 23e89e8ee7be ("tcp: Don't drop SYN+ACK for simultaneous
>>> connect().")
>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>> ---
>>>    net/ipv4/tcp_input.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
>>> index ff9ab3d01ced..ff981d7776c3 100644
>>> --- a/net/ipv4/tcp_input.c
>>> +++ b/net/ipv4/tcp_input.c
>>> @@ -6820,7 +6820,7 @@ tcp_rcv_state_process(struct sock *sk, struct
>>> sk_buff *skb)
>>>            if (sk->sk_shutdown & SEND_SHUTDOWN)
>>>                tcp_shutdown(sk, SEND_SHUTDOWN);
>>>    -        if (sk->sk_socket)
>>> +        if (sk->sk_socket && !sk_is_mptcp(sk))
>>>                goto consume;
>>>            break;
>>>   
>>> ---
>>
>> patch LGTM
> 
> Thank you for having looked!
> 
> Should I already send the patch -- without the TODO, but mentioning the
> fallback to TCP instead -- to have it accepted before sending the pull
> request to Linus if we still have time?
> 
>> but I'm wondering if a similar problem can be reproduced
>> with plain TCP and  TFO?
>>
>>
>> syn + TFO    ->
>>          <- syn ack
>>              [accept]
>> ack        ->
>>          NO 3rd ack????
>>
>> If so possibly the 'if (sk->sk_socket ...' check should be refined even
>> more?!??
> 
> Good point!
> 
> I can mention that in the comment of my patch, and try to reproduce that
> locally. WDYT?

Would be great, especially the reproducer part. pktdrill should be handy 
here.

Thanks!

Paolo