[PATCH] selftests: mptcp: diag: avoid extra waiting

Matthieu Baerts (NGI0) posted 1 patch 2 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20240301163846.1370114-2-matttbe@kernel.org
tools/testing/selftests/net/mptcp/diag.sh | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
[PATCH] selftests: mptcp: diag: avoid extra waiting
Posted by Matthieu Baerts (NGI0) 2 months, 1 week ago
When creating a lot of listener sockets, it is enough to wait only for
the last one, like we are doing before in diag.sh for other subtests.

If we do a check for each listener sockets, each time listing all
available sockets, it can take a very long time in very slow
environments, at the point we can reach some timeout.

When using the debug kconfig, the waiting time switches from more than
8 sec to 0.1 sec on my side. In slow/busy environments, and with a poll
timeout set to 30 ms, the waiting time could go up to ~100 sec because
the listener socket would timeout and stop, while the script would still
be checking one by one if all sockets are ready. The result is that
after having waited for everything to be ready, all sockets have been
stopped due to a timeout, and it is too late for the script to check how
many there were.

While at it, also removed ss options we don't need: we only need the
filtering options, to count how many listener sockets have been created.
We don't need to ask ss to display internal TCP information, and the
memory if the output is dropped by the 'wc -l' command anyway.

Fixes: b4b51d36bbaa3 ("selftests: mptcp: explicitly trigger the listener diag code-path")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/r/20240301063754.2ecefecf@kernel.org
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/diag.sh | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index e87cf76b3e4ab..38cacfc3de218 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -96,8 +96,8 @@ chk_listener_nr()
 	local expected=$1
 	local msg="$2"
 
-	__chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
-	__chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
+	__chk_nr "ss -nlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+	__chk_nr "ss -nlHtON $ns | wc -l" "$expected" "$msg - subflows"
 }
 
 wait_msk_nr()
@@ -304,10 +304,7 @@ for I in $(seq 1 $NR_SERVERS); do
 	ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
 		-t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
 done
-
-for I in $(seq 1 $NR_SERVERS); do
-	mptcp_lib_wait_local_port_listen $ns $((I + 20001))
-done
+mptcp_lib_wait_local_port_listen $ns $((NR_SERVERS + 20001))
 
 chk_listener_nr $NR_SERVERS "many listener sockets"
 
-- 
2.43.0
Re: selftests: mptcp: diag: avoid extra waiting: Tests Results
Posted by MPTCP CI 2 months, 1 week ago
Hi Matthieu,

Thank you for your modifications, that's great!

Our CI (Cirrus) did some validations with a debug kernel and here is its report:

- KVM Validation: debug (except selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/4921225164619776
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4921225164619776/summary/summary.txt

- KVM Validation: debug (only selftest_mptcp_join):
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/6047125071462400
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6047125071462400/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/4ee0d83081ea


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Re: selftests: mptcp: diag: avoid extra waiting: Tests Results
Posted by MPTCP CI 2 months, 1 week ago
Hi Matthieu,

Thank you for your modifications, that's great!

Our CI (GitHub Action) did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/8114524834

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/4ee0d83081ea


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Re: [PATCH] selftests: mptcp: diag: avoid extra waiting
Posted by Matthieu Baerts 2 months, 1 week ago

On 01/03/2024 17:38, Matthieu Baerts (NGI0) wrote:
> When creating a lot of listener sockets, it is enough to wait only for
> the last one, like we are doing before in diag.sh for other subtests.
> 
> If we do a check for each listener sockets, each time listing all
> available sockets, it can take a very long time in very slow
> environments, at the point we can reach some timeout.
> 
> When using the debug kconfig, the waiting time switches from more than
> 8 sec to 0.1 sec on my side. In slow/busy environments, and with a poll
> timeout set to 30 ms, the waiting time could go up to ~100 sec because
> the listener socket would timeout and stop, while the script would still
> be checking one by one if all sockets are ready. The result is that
> after having waited for everything to be ready, all sockets have been
> stopped due to a timeout, and it is too late for the script to check how
> many there were.
> 
> While at it, also removed ss options we don't need: we only need the
> filtering options, to count how many listener sockets have been created.
> We don't need to ask ss to display internal TCP information, and the
> memory if the output is dropped by the 'wc -l' command anyway.
> 
> Fixes: b4b51d36bbaa3 ("selftests: mptcp: explicitly trigger the listener diag code-path")

(Note: mmh, I start to have too many commits in my repo, I need to force
'abbrev = 12' instead of using the automatic value)


I suggest already applying this patch and sending it to netdev ASAP, to
stop all the unstable diag.sh tests when using the debug kconfig:

https://netdev.bots.linux.dev/flakes.html?br-cnt=88&tn-needle=mptcp/diag-sh

New patches for t/upstream-net and t/upstream:
- 195a689388c8: selftests: mptcp: diag: avoid extra waiting
- Results: 7b9cbda58a89..cd1cba94145c (export-net)
- Results: 00e04e87933b..bb1b366ebd4b (export)

Tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export-net/20240301T164913
https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20240301T164913

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.