[PATCH mptcp-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR

Matthieu Baerts (NGI0) posted 3 patches 2 weeks ago
There is a newer version of this series
[PATCH mptcp-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR
Posted by Matthieu Baerts (NGI0) 2 weeks ago
ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.

To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received.

An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to a few tests, it seems better to increase the tolerance.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 21 +++++++--------------
 1 file changed, 7 insertions(+), 14 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..1028f3f931612d98e828707a9f55caa69be4a6ca 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -358,6 +358,7 @@ reset_with_add_addr_timeout()
 		tables="${ip6tables}"
 	fi
 
+	# set a maximum, to avoid too long timeout with exponential backoff
 	ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
 
 	if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
@@ -1669,7 +1670,6 @@ chk_add_nr()
 	local tx=""
 	local rx=""
 	local count
-	local timeout
 
 	if [[ $ns_invert = "invert" ]]; then
 		ns_tx=$ns2
@@ -1678,16 +1678,13 @@ chk_add_nr()
 		rx=" server"
 	fi
 
-	timeout=$(ip netns exec ${ns_tx} sysctl -n net.mptcp.add_addr_timeout)
-
 	print_check "add addr rx${rx}"
 	count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr")
 	if [ -z "$count" ]; then
 		print_skip
-	# if the test configured a short timeout tolerate greater then expected
-	# add addrs options, due to retransmissions
-	elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
-		fail_test "got $count ADD_ADDR[s] expected $add_nr"
+	# Tolerate more ADD_ADDR then expected, due to retransmissions
+	elif [ "$count" -lt "$add_nr" ]; then
+		fail_test "got $count ADD_ADDR[s] expected $add_nr or more"
 	else
 		print_ok
 	fi
@@ -1774,19 +1771,15 @@ chk_add_tx_nr()
 {
 	local add_tx_nr=$1
 	local echo_tx_nr=$2
-	local timeout
 	local count
 
-	timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
-
 	print_check "add addr tx"
 	count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
 	if [ -z "$count" ]; then
 		print_skip
-	# if the test configured a short timeout tolerate greater then expected
-	# add addrs options, due to retransmissions
-	elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
-		fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr"
+	# Tolerate more ADD_ADDR then expected, due to retransmissions
+	elif [ "$count" -lt "$add_tx_nr" ]; then
+		fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr or more"
 	else
 		print_ok
 	fi

-- 
2.51.0
Re: [PATCH mptcp-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR
Posted by Geliang Tang 1 week, 6 days ago
Hi Matt,

On Tue, 2025-09-02 at 21:01 +0200, Matthieu Baerts (NGI0) wrote:
> ADD_ADDR can be retransmitted, and with, the parent commit, these
> retransmissions can be sent quicker: from 2 minutes to less than one
> second.
> 
> To avoid false positives where retransmitted ADD_ADDR causes higher
> counters than expected, it is required to be more tolerant. Errors
> are
> now only reported when fewer ADD_ADDRs have been sent/received.
> 
> An alternative could be to disable the ADD_ADDR retransmissions by
> default, but that's changing the default kernel behaviour. Plus,
> ADD_ADDR retransmissions can be required for some tests. To avoid
> adding
> exceptions to a few tests, it seems better to increase the tolerance.

I think adding "tolerate more ADD_ADDR" to the two general helpers
chk_add_nr() and chk_add_tx_nr() is not very accurate, because we don't
know which test sends/receives more ADD_ADDR. Because chk_add_nr() and
chk_add_tx_nr() of most tests are normal, only a few tests will be
affected and need to add "tolerate".

I think it can be handled like chk_csum_nr(), accepting a number with
"+", such as "+2" to allow more than two ADD_ADDR. In this way, we only
need to add this "+" to a few failed tests, so the control is more
accurate. What do you think?

Thanks,
-Geliang

> 
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
>  tools/testing/selftests/net/mptcp/mptcp_join.sh | 21 +++++++--------
> ------
>  1 file changed, 7 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> index
> 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..1028f3f931612d98e828707a9f5
> 5caa69be4a6ca 100755
> --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
> +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
> @@ -358,6 +358,7 @@ reset_with_add_addr_timeout()
>  		tables="${ip6tables}"
>  	fi
>  
> +	# set a maximum, to avoid too long timeout with exponential
> backoff
>  	ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
>  
>  	if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
> @@ -1669,7 +1670,6 @@ chk_add_nr()
>  	local tx=""
>  	local rx=""
>  	local count
> -	local timeout
>  
>  	if [[ $ns_invert = "invert" ]]; then
>  		ns_tx=$ns2
> @@ -1678,16 +1678,13 @@ chk_add_nr()
>  		rx=" server"
>  	fi
>  
> -	timeout=$(ip netns exec ${ns_tx} sysctl -n
> net.mptcp.add_addr_timeout)
> -
>  	print_check "add addr rx${rx}"
>  	count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr")
>  	if [ -z "$count" ]; then
>  		print_skip
> -	# if the test configured a short timeout tolerate greater
> then expected
> -	# add addrs options, due to retransmissions
> -	elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] ||
> [ "$count" -lt "$add_nr" ]; }; then
> -		fail_test "got $count ADD_ADDR[s] expected $add_nr"
> +	# Tolerate more ADD_ADDR then expected, due to
> retransmissions
> +	elif [ "$count" -lt "$add_nr" ]; then
> +		fail_test "got $count ADD_ADDR[s] expected $add_nr
> or more"
>  	else
>  		print_ok
>  	fi
> @@ -1774,19 +1771,15 @@ chk_add_tx_nr()
>  {
>  	local add_tx_nr=$1
>  	local echo_tx_nr=$2
> -	local timeout
>  	local count
>  
> -	timeout=$(ip netns exec $ns1 sysctl -n
> net.mptcp.add_addr_timeout)
> -
>  	print_check "add addr tx"
>  	count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
>  	if [ -z "$count" ]; then
>  		print_skip
> -	# if the test configured a short timeout tolerate greater
> then expected
> -	# add addrs options, due to retransmissions
> -	elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ]
> || [ "$count" -lt "$add_tx_nr" ]; }; then
> -		fail_test "got $count ADD_ADDR[s] TX, expected
> $add_tx_nr"
> +	# Tolerate more ADD_ADDR then expected, due to
> retransmissions
> +	elif [ "$count" -lt "$add_tx_nr" ]; then
> +		fail_test "got $count ADD_ADDR[s] TX, expected
> $add_tx_nr or more"
>  	else
>  		print_ok
>  	fi
Re: [PATCH mptcp-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR
Posted by Matthieu Baerts 1 week, 6 days ago
Hi Geliang,

On 04/09/2025 11:26, Geliang Tang wrote:
> Hi Matt,
> 
> On Tue, 2025-09-02 at 21:01 +0200, Matthieu Baerts (NGI0) wrote:
>> ADD_ADDR can be retransmitted, and with, the parent commit, these
>> retransmissions can be sent quicker: from 2 minutes to less than one
>> second.
>>
>> To avoid false positives where retransmitted ADD_ADDR causes higher
>> counters than expected, it is required to be more tolerant. Errors
>> are
>> now only reported when fewer ADD_ADDRs have been sent/received.
>>
>> An alternative could be to disable the ADD_ADDR retransmissions by
>> default, but that's changing the default kernel behaviour. Plus,
>> ADD_ADDR retransmissions can be required for some tests. To avoid
>> adding
>> exceptions to a few tests, it seems better to increase the tolerance.
> 
> I think adding "tolerate more ADD_ADDR" to the two general helpers
> chk_add_nr() and chk_add_tx_nr() is not very accurate, because we don't
> know which test sends/receives more ADD_ADDR. Because chk_add_nr() and
> chk_add_tx_nr() of most tests are normal, only a few tests will be
> affected and need to add "tolerate".
> 
> I think it can be handled like chk_csum_nr(), accepting a number with
> "+", such as "+2" to allow more than two ADD_ADDR. In this way, we only
> need to add this "+" to a few failed tests, so the control is more
> accurate. What do you think?

I initially did something similar: marking the tests where we can be
more tolerant, but the modifications were quite important. But that was
not it: before, there were no retransmissions (only after 2 minutes, so
after the end of each subtest), and we had the tolerance only when
setting the timeout to 1 second. Now, we always have retransmissions
(and we don't control the timeout any more), so to be aligned with what
we had before, we should probably always have this tolerance. We might
have unexpected retransmissions because suddenly, the host got busy, and
the ADD_ADDR retransmission timer fires. We don't want to be impacted by
these unexpected events. I think that's valid for all tests were we
check the ADD_ADDR.

Maybe what is missing is an exception for the expected counter 0: if we
expect no ADD_ADDR, we should fail if some are seen. But this issue was
already there, we would need another patch for that, probably. (I can look).

Also, what would be certainly better here, is to add a new MIB counters
for the retransmissions, so we can check the actual number instead of
having the tolerance. Again, that's orthogonal to this fix I think, so
this can be done later on.

So in short:
- this patch simply align with what was already there
- we should add an exception: no tolerance when expecting no ADD_ADDR
- we might add a new MIB counter (sent retransmitted ADD_ADDR) to avoid
the tolerance (on newer kernels only: when this counter is available)

WDYT?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next 2/3] selftests: mptcp: join: tolerate more ADD_ADDR
Posted by Matthieu Baerts 1 week, 6 days ago
Hi Geliang,

On 04/09/2025 12:47, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 04/09/2025 11:26, Geliang Tang wrote:
>> Hi Matt,
>>
>> On Tue, 2025-09-02 at 21:01 +0200, Matthieu Baerts (NGI0) wrote:
>>> ADD_ADDR can be retransmitted, and with, the parent commit, these
>>> retransmissions can be sent quicker: from 2 minutes to less than one
>>> second.
>>>
>>> To avoid false positives where retransmitted ADD_ADDR causes higher
>>> counters than expected, it is required to be more tolerant. Errors
>>> are
>>> now only reported when fewer ADD_ADDRs have been sent/received.
>>>
>>> An alternative could be to disable the ADD_ADDR retransmissions by
>>> default, but that's changing the default kernel behaviour. Plus,
>>> ADD_ADDR retransmissions can be required for some tests. To avoid
>>> adding
>>> exceptions to a few tests, it seems better to increase the tolerance.
>>
>> I think adding "tolerate more ADD_ADDR" to the two general helpers
>> chk_add_nr() and chk_add_tx_nr() is not very accurate, because we don't
>> know which test sends/receives more ADD_ADDR. Because chk_add_nr() and
>> chk_add_tx_nr() of most tests are normal, only a few tests will be
>> affected and need to add "tolerate".
>>
>> I think it can be handled like chk_csum_nr(), accepting a number with
>> "+", such as "+2" to allow more than two ADD_ADDR. In this way, we only
>> need to add this "+" to a few failed tests, so the control is more
>> accurate. What do you think?
> 
> I initially did something similar: marking the tests where we can be
> more tolerant, but the modifications were quite important. But that was
> not it: before, there were no retransmissions (only after 2 minutes, so
> after the end of each subtest), and we had the tolerance only when
> setting the timeout to 1 second. Now, we always have retransmissions
> (and we don't control the timeout any more), so to be aligned with what
> we had before, we should probably always have this tolerance. We might
> have unexpected retransmissions because suddenly, the host got busy, and
> the ADD_ADDR retransmission timer fires. We don't want to be impacted by
> these unexpected events. I think that's valid for all tests were we
> check the ADD_ADDR.
> 
> Maybe what is missing is an exception for the expected counter 0: if we
> expect no ADD_ADDR, we should fail if some are seen. But this issue was
> already there, we would need another patch for that, probably. (I can look).

To be more correct here: the issue was already there before, but we
couldn't hit it because "net.mptcp.add_addr_timeout" was only set to 1
(check with tolerance) when we expected to see ADD_ADDR. In other words,
no need to have an exception for 0 before this patch. I will then modify
this one.
> Also, what would be certainly better here, is to add a new MIB counters
> for the retransmissions, so we can check the actual number instead of
> having the tolerance. Again, that's orthogonal to this fix I think, so
> this can be done later on.

(Note that this might be nice, but I'm currently not planning to add this)

> 
> So in short:
> - this patch simply align with what was already there
> - we should add an exception: no tolerance when expecting no ADD_ADDR
> - we might add a new MIB counter (sent retransmitted ADD_ADDR) to avoid
> the tolerance (on newer kernels only: when this counter is available)
> 
> WDYT?
> 
> Cheers,
> Matt

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.