ADD_ADDR can be retransmitted, and with, the parent commit, these
retransmissions can be sent quicker: from 2 minutes to less than one
second.
To avoid false positives where retransmitted ADD_ADDR causes higher
counters than expected, it is required to be more tolerant. Errors are
now only reported when fewer ADD_ADDRs have been sent/received.
An alternative could be to disable the ADD_ADDR retransmissions by
default, but that's changing the default kernel behaviour. Plus,
ADD_ADDR retransmissions can be required for some tests. To avoid adding
exceptions to a few tests, it seems better to increase the tolerance.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..1028f3f931612d98e828707a9f55caa69be4a6ca 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -358,6 +358,7 @@ reset_with_add_addr_timeout()
tables="${ip6tables}"
fi
+ # set a maximum, to avoid too long timeout with exponential backoff
ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1
if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \
@@ -1669,7 +1670,6 @@ chk_add_nr()
local tx=""
local rx=""
local count
- local timeout
if [[ $ns_invert = "invert" ]]; then
ns_tx=$ns2
@@ -1678,16 +1678,13 @@ chk_add_nr()
rx=" server"
fi
- timeout=$(ip netns exec ${ns_tx} sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr rx${rx}"
count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then
- fail_test "got $count ADD_ADDR[s] expected $add_nr"
+ # Tolerate more ADD_ADDR then expected, due to retransmissions
+ elif [ "$count" -lt "$add_nr" ]; then
+ fail_test "got $count ADD_ADDR[s] expected $add_nr or more"
else
print_ok
fi
@@ -1774,19 +1771,15 @@ chk_add_tx_nr()
{
local add_tx_nr=$1
local echo_tx_nr=$2
- local timeout
local count
- timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout)
-
print_check "add addr tx"
count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx")
if [ -z "$count" ]; then
print_skip
- # if the test configured a short timeout tolerate greater then expected
- # add addrs options, due to retransmissions
- elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_tx_nr" ]; }; then
- fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr"
+ # Tolerate more ADD_ADDR then expected, due to retransmissions
+ elif [ "$count" -lt "$add_tx_nr" ]; then
+ fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr or more"
else
print_ok
fi
--
2.51.0
Hi Matt, On Tue, 2025-09-02 at 21:01 +0200, Matthieu Baerts (NGI0) wrote: > ADD_ADDR can be retransmitted, and with, the parent commit, these > retransmissions can be sent quicker: from 2 minutes to less than one > second. > > To avoid false positives where retransmitted ADD_ADDR causes higher > counters than expected, it is required to be more tolerant. Errors > are > now only reported when fewer ADD_ADDRs have been sent/received. > > An alternative could be to disable the ADD_ADDR retransmissions by > default, but that's changing the default kernel behaviour. Plus, > ADD_ADDR retransmissions can be required for some tests. To avoid > adding > exceptions to a few tests, it seems better to increase the tolerance. I think adding "tolerate more ADD_ADDR" to the two general helpers chk_add_nr() and chk_add_tx_nr() is not very accurate, because we don't know which test sends/receives more ADD_ADDR. Because chk_add_nr() and chk_add_tx_nr() of most tests are normal, only a few tests will be affected and need to add "tolerate". I think it can be handled like chk_csum_nr(), accepting a number with "+", such as "+2" to allow more than two ADD_ADDR. In this way, we only need to add this "+" to a few failed tests, so the control is more accurate. What do you think? Thanks, -Geliang > > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > --- > tools/testing/selftests/net/mptcp/mptcp_join.sh | 21 +++++++-------- > ------ > 1 file changed, 7 insertions(+), 14 deletions(-) > > diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh > b/tools/testing/selftests/net/mptcp/mptcp_join.sh > index > 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..1028f3f931612d98e828707a9f5 > 5caa69be4a6ca 100755 > --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh > +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh > @@ -358,6 +358,7 @@ reset_with_add_addr_timeout() > tables="${ip6tables}" > fi > > + # set a maximum, to avoid too long timeout with exponential > backoff > ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1 > > if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \ > @@ -1669,7 +1670,6 @@ chk_add_nr() > local tx="" > local rx="" > local count > - local timeout > > if [[ $ns_invert = "invert" ]]; then > ns_tx=$ns2 > @@ -1678,16 +1678,13 @@ chk_add_nr() > rx=" server" > fi > > - timeout=$(ip netns exec ${ns_tx} sysctl -n > net.mptcp.add_addr_timeout) > - > print_check "add addr rx${rx}" > count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr") > if [ -z "$count" ]; then > print_skip > - # if the test configured a short timeout tolerate greater > then expected > - # add addrs options, due to retransmissions > - elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || > [ "$count" -lt "$add_nr" ]; }; then > - fail_test "got $count ADD_ADDR[s] expected $add_nr" > + # Tolerate more ADD_ADDR then expected, due to > retransmissions > + elif [ "$count" -lt "$add_nr" ]; then > + fail_test "got $count ADD_ADDR[s] expected $add_nr > or more" > else > print_ok > fi > @@ -1774,19 +1771,15 @@ chk_add_tx_nr() > { > local add_tx_nr=$1 > local echo_tx_nr=$2 > - local timeout > local count > > - timeout=$(ip netns exec $ns1 sysctl -n > net.mptcp.add_addr_timeout) > - > print_check "add addr tx" > count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx") > if [ -z "$count" ]; then > print_skip > - # if the test configured a short timeout tolerate greater > then expected > - # add addrs options, due to retransmissions > - elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] > || [ "$count" -lt "$add_tx_nr" ]; }; then > - fail_test "got $count ADD_ADDR[s] TX, expected > $add_tx_nr" > + # Tolerate more ADD_ADDR then expected, due to > retransmissions > + elif [ "$count" -lt "$add_tx_nr" ]; then > + fail_test "got $count ADD_ADDR[s] TX, expected > $add_tx_nr or more" > else > print_ok > fi
Hi Geliang, On 04/09/2025 11:26, Geliang Tang wrote: > Hi Matt, > > On Tue, 2025-09-02 at 21:01 +0200, Matthieu Baerts (NGI0) wrote: >> ADD_ADDR can be retransmitted, and with, the parent commit, these >> retransmissions can be sent quicker: from 2 minutes to less than one >> second. >> >> To avoid false positives where retransmitted ADD_ADDR causes higher >> counters than expected, it is required to be more tolerant. Errors >> are >> now only reported when fewer ADD_ADDRs have been sent/received. >> >> An alternative could be to disable the ADD_ADDR retransmissions by >> default, but that's changing the default kernel behaviour. Plus, >> ADD_ADDR retransmissions can be required for some tests. To avoid >> adding >> exceptions to a few tests, it seems better to increase the tolerance. > > I think adding "tolerate more ADD_ADDR" to the two general helpers > chk_add_nr() and chk_add_tx_nr() is not very accurate, because we don't > know which test sends/receives more ADD_ADDR. Because chk_add_nr() and > chk_add_tx_nr() of most tests are normal, only a few tests will be > affected and need to add "tolerate". > > I think it can be handled like chk_csum_nr(), accepting a number with > "+", such as "+2" to allow more than two ADD_ADDR. In this way, we only > need to add this "+" to a few failed tests, so the control is more > accurate. What do you think? I initially did something similar: marking the tests where we can be more tolerant, but the modifications were quite important. But that was not it: before, there were no retransmissions (only after 2 minutes, so after the end of each subtest), and we had the tolerance only when setting the timeout to 1 second. Now, we always have retransmissions (and we don't control the timeout any more), so to be aligned with what we had before, we should probably always have this tolerance. We might have unexpected retransmissions because suddenly, the host got busy, and the ADD_ADDR retransmission timer fires. We don't want to be impacted by these unexpected events. I think that's valid for all tests were we check the ADD_ADDR. Maybe what is missing is an exception for the expected counter 0: if we expect no ADD_ADDR, we should fail if some are seen. But this issue was already there, we would need another patch for that, probably. (I can look). Also, what would be certainly better here, is to add a new MIB counters for the retransmissions, so we can check the actual number instead of having the tolerance. Again, that's orthogonal to this fix I think, so this can be done later on. So in short: - this patch simply align with what was already there - we should add an exception: no tolerance when expecting no ADD_ADDR - we might add a new MIB counter (sent retransmitted ADD_ADDR) to avoid the tolerance (on newer kernels only: when this counter is available) WDYT? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Geliang, On 04/09/2025 12:47, Matthieu Baerts wrote: > Hi Geliang, > > On 04/09/2025 11:26, Geliang Tang wrote: >> Hi Matt, >> >> On Tue, 2025-09-02 at 21:01 +0200, Matthieu Baerts (NGI0) wrote: >>> ADD_ADDR can be retransmitted, and with, the parent commit, these >>> retransmissions can be sent quicker: from 2 minutes to less than one >>> second. >>> >>> To avoid false positives where retransmitted ADD_ADDR causes higher >>> counters than expected, it is required to be more tolerant. Errors >>> are >>> now only reported when fewer ADD_ADDRs have been sent/received. >>> >>> An alternative could be to disable the ADD_ADDR retransmissions by >>> default, but that's changing the default kernel behaviour. Plus, >>> ADD_ADDR retransmissions can be required for some tests. To avoid >>> adding >>> exceptions to a few tests, it seems better to increase the tolerance. >> >> I think adding "tolerate more ADD_ADDR" to the two general helpers >> chk_add_nr() and chk_add_tx_nr() is not very accurate, because we don't >> know which test sends/receives more ADD_ADDR. Because chk_add_nr() and >> chk_add_tx_nr() of most tests are normal, only a few tests will be >> affected and need to add "tolerate". >> >> I think it can be handled like chk_csum_nr(), accepting a number with >> "+", such as "+2" to allow more than two ADD_ADDR. In this way, we only >> need to add this "+" to a few failed tests, so the control is more >> accurate. What do you think? > > I initially did something similar: marking the tests where we can be > more tolerant, but the modifications were quite important. But that was > not it: before, there were no retransmissions (only after 2 minutes, so > after the end of each subtest), and we had the tolerance only when > setting the timeout to 1 second. Now, we always have retransmissions > (and we don't control the timeout any more), so to be aligned with what > we had before, we should probably always have this tolerance. We might > have unexpected retransmissions because suddenly, the host got busy, and > the ADD_ADDR retransmission timer fires. We don't want to be impacted by > these unexpected events. I think that's valid for all tests were we > check the ADD_ADDR. > > Maybe what is missing is an exception for the expected counter 0: if we > expect no ADD_ADDR, we should fail if some are seen. But this issue was > already there, we would need another patch for that, probably. (I can look). To be more correct here: the issue was already there before, but we couldn't hit it because "net.mptcp.add_addr_timeout" was only set to 1 (check with tolerance) when we expected to see ADD_ADDR. In other words, no need to have an exception for 0 before this patch. I will then modify this one. > Also, what would be certainly better here, is to add a new MIB counters > for the retransmissions, so we can check the actual number instead of > having the tolerance. Again, that's orthogonal to this fix I think, so > this can be done later on. (Note that this might be nice, but I'm currently not planning to add this) > > So in short: > - this patch simply align with what was already there > - we should add an exception: no tolerance when expecting no ADD_ADDR > - we might add a new MIB counter (sent retransmitted ADD_ADDR) to avoid > the tolerance (on newer kernels only: when this counter is available) > > WDYT? > > Cheers, > Matt Cheers, Matt -- Sponsored by the NGI0 Core fund.
© 2016 - 2025 Red Hat, Inc.