From: Paolo Abeni <pabeni@redhat.com>
The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.
Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.
Be careful to cope with very slow environments.
Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 0a58ebb8b04c..f300f4e1eb59 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -20,7 +20,7 @@ flush_pids()
ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null
- for _ in $(seq 10); do
+ for _ in $(seq $((timeout_poll * 10))); do
[ -z "$(ip netns pids "${ns}")" ] && break
sleep 0.1
done
@@ -91,6 +91,15 @@ chk_msk_nr()
__chk_msk_nr "grep -c token:" "$@"
}
+chk_listener_nr()
+{
+ local expected=$1
+ local msg="$2"
+
+ __chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+ __chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
wait_msk_nr()
{
local condition="grep -c token:"
@@ -289,5 +298,24 @@ flush_pids
chk_msk_inuse 0 "many->0"
chk_msk_cestab 0 "many->0"
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+ ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
+ -t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
+done
+
+for I in $(seq 1 $NR_SERVERS); do
+ mptcp_lib_wait_local_port_listen $ns $((I + 20001))
+done
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# graceful termination
+for I in $(seq 1 $NR_SERVERS); do
+ echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 &
+done
+flush_pids
+
mptcp_lib_result_print_all_tap
exit $ret
--
2.43.0
On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote: > From: Paolo Abeni <pabeni@redhat.com> > > The mptcp diag interface already experienced a few locking bugs > that lockdep and appropriate coverage have detected in advance. > > Let's add a test-case triggering the relevant code path, to prevent > similar issues in the future. > > Be careful to cope with very slow environments. > > Note that we don't need an explicit timeout on the mptcp_connect > subprocess to cope with eventual bug/hang-up as the final cleanup > terminating the child processes will take care of that. Hi! There's a failure in CI under debug after merging net and net-next in diag.sh. Maybe because of the patch which lowered timeout? https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
Hi Jakub, On 01/03/2024 15:37, Jakub Kicinski wrote: > On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote: >> From: Paolo Abeni <pabeni@redhat.com> >> >> The mptcp diag interface already experienced a few locking bugs >> that lockdep and appropriate coverage have detected in advance. >> >> Let's add a test-case triggering the relevant code path, to prevent >> similar issues in the future. >> >> Be careful to cope with very slow environments. >> >> Note that we don't need an explicit timeout on the mptcp_connect >> subprocess to cope with eventual bug/hang-up as the final cleanup >> terminating the child processes will take care of that. > > Hi! > > There's a failure in CI under debug after merging net and net-next > in diag.sh. Maybe because of the patch which lowered timeout? > https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/ Thank you for this message! I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we do on top of the debug kconfig, but I see I can reproduce it on slower environments. Indeed, it looks like it can be caused by that modification. I will send a fix ASAP! Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Jakub, On 01/03/2024 16:10, Matthieu Baerts wrote: > On 01/03/2024 15:37, Jakub Kicinski wrote: >> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote: >>> From: Paolo Abeni <pabeni@redhat.com> >>> >>> The mptcp diag interface already experienced a few locking bugs >>> that lockdep and appropriate coverage have detected in advance. >>> >>> Let's add a test-case triggering the relevant code path, to prevent >>> similar issues in the future. >>> >>> Be careful to cope with very slow environments. >>> >>> Note that we don't need an explicit timeout on the mptcp_connect >>> subprocess to cope with eventual bug/hang-up as the final cleanup >>> terminating the child processes will take care of that. >> >> Hi! >> >> There's a failure in CI under debug after merging net and net-next >> in diag.sh. Maybe because of the patch which lowered timeout? >> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/ > > Thank you for this message! > > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we > do on top of the debug kconfig, but I see I can reproduce it on slower > environments. Indeed, it looks like it can be caused by that > modification. I will send a fix ASAP! The following patch fixes the issue on my side (when using 'renice 20' and stress-ng in parallel :) ): https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/ I queued it for -net, but the issue should only be visible in net-next due to the reduction of the timeout, as you suggested. This patch can also be applied in net-next if preferred, it will not cause any conflicts between net and net-next. Cheers, Matt -- Sponsored by the NGI0 Core fund.
On Fri, 1 Mar 2024 19:24:53 +0100 Matthieu Baerts wrote: > > Thank you for this message! > > > > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we > > do on top of the debug kconfig, but I see I can reproduce it on slower > > environments. Indeed, it looks like it can be caused by that > > modification. I will send a fix ASAP! > > The following patch fixes the issue on my side (when using 'renice 20' > and stress-ng in parallel :) ): > > https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/ Nice! Note only a fix but also lowers runtime, if I'm reading right! > I queued it for -net, but the issue should only be visible in net-next > due to the reduction of the timeout, as you suggested. This patch can > also be applied in net-next if preferred, it will not cause any > conflicts between net and net-next. No strong preference, net sounds good. Thanks for a quick turnaround!
On Fri, Feb 23, 2024 at 05:14:20PM +0100, Matthieu Baerts (NGI0) wrote: > From: Paolo Abeni <pabeni@redhat.com> > > The mptcp diag interface already experienced a few locking bugs > that lockdep and appropriate coverage have detected in advance. > > Let's add a test-case triggering the relevant code path, to prevent > similar issues in the future. > > Be careful to cope with very slow environments. > > Note that we don't need an explicit timeout on the mptcp_connect > subprocess to cope with eventual bug/hang-up as the final cleanup > terminating the child processes will take care of that. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Simon Horman <horms@kernel.org>
© 2016 - 2026 Red Hat, Inc.