mptcp: more misc. fixes for v6.8

[PATCH net 10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Posted by Matthieu Baerts (NGI0) 1 year, 11 months ago

From: Paolo Abeni <pabeni@redhat.com>

The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.

Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.

Be careful to cope with very slow environments.

Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 0a58ebb8b04c..f300f4e1eb59 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -20,7 +20,7 @@ flush_pids()
 
 	ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null
 
-	for _ in $(seq 10); do
+	for _ in $(seq $((timeout_poll * 10))); do
 		[ -z "$(ip netns pids "${ns}")" ] && break
 		sleep 0.1
 	done
@@ -91,6 +91,15 @@ chk_msk_nr()
 	__chk_msk_nr "grep -c token:" "$@"
 }
 
+chk_listener_nr()
+{
+	local expected=$1
+	local msg="$2"
+
+	__chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+	__chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
 wait_msk_nr()
 {
 	local condition="grep -c token:"
@@ -289,5 +298,24 @@ flush_pids
 chk_msk_inuse 0 "many->0"
 chk_msk_cestab 0 "many->0"
 
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+	ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
+		-t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
+done
+
+for I in $(seq 1 $NR_SERVERS); do
+	mptcp_lib_wait_local_port_listen $ns $((I + 20001))
+done
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# graceful termination
+for I in $(seq 1 $NR_SERVERS); do
+	echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 &
+done
+flush_pids
+
 mptcp_lib_result_print_all_tap
 exit $ret

-- 
2.43.0

Re: [PATCH net 10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Posted by Jakub Kicinski 1 year, 11 months ago

On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> 
> The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
> 
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
> 
> Be careful to cope with very slow environments.
> 
> Note that we don't need an explicit timeout on the mptcp_connect
> subprocess to cope with eventual bug/hang-up as the final cleanup
> terminating the child processes will take care of that.

Hi!

There's a failure in CI under debug after merging net and net-next 
in diag.sh. Maybe because of the patch which lowered timeout?
https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/

Re: [PATCH net 10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Posted by Matthieu Baerts 1 year, 11 months ago

Hi Jakub,

On 01/03/2024 15:37, Jakub Kicinski wrote:
> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
>> From: Paolo Abeni <pabeni@redhat.com>
>>
>> The mptcp diag interface already experienced a few locking bugs
>> that lockdep and appropriate coverage have detected in advance.
>>
>> Let's add a test-case triggering the relevant code path, to prevent
>> similar issues in the future.
>>
>> Be careful to cope with very slow environments.
>>
>> Note that we don't need an explicit timeout on the mptcp_connect
>> subprocess to cope with eventual bug/hang-up as the final cleanup
>> terminating the child processes will take care of that.
> 
> Hi!
> 
> There's a failure in CI under debug after merging net and net-next 
> in diag.sh. Maybe because of the patch which lowered timeout?
> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/

Thank you for this message!

I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
do on top of the debug kconfig, but I see I can reproduce it on slower
environments. Indeed, it looks like it can be caused by that
modification. I will send a fix ASAP!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net 10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Posted by Matthieu Baerts 1 year, 11 months ago

Hi Jakub,

On 01/03/2024 16:10, Matthieu Baerts wrote:
> On 01/03/2024 15:37, Jakub Kicinski wrote:
>> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
>>> From: Paolo Abeni <pabeni@redhat.com>
>>>
>>> The mptcp diag interface already experienced a few locking bugs
>>> that lockdep and appropriate coverage have detected in advance.
>>>
>>> Let's add a test-case triggering the relevant code path, to prevent
>>> similar issues in the future.
>>>
>>> Be careful to cope with very slow environments.
>>>
>>> Note that we don't need an explicit timeout on the mptcp_connect
>>> subprocess to cope with eventual bug/hang-up as the final cleanup
>>> terminating the child processes will take care of that.
>>
>> Hi!
>>
>> There's a failure in CI under debug after merging net and net-next 
>> in diag.sh. Maybe because of the patch which lowered timeout?
>> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
> 
> Thank you for this message!
> 
> I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
> do on top of the debug kconfig, but I see I can reproduce it on slower
> environments. Indeed, it looks like it can be caused by that
> modification. I will send a fix ASAP!

The following patch fixes the issue on my side (when using 'renice 20'
and stress-ng in parallel :) ):

https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/

I queued it for -net, but the issue should only be visible in net-next
due to the reduction of the timeout, as you suggested. This patch can
also be applied in net-next if preferred, it will not cause any
conflicts between net and net-next.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net 10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Posted by Jakub Kicinski 1 year, 11 months ago

On Fri, 1 Mar 2024 19:24:53 +0100 Matthieu Baerts wrote:
> > Thank you for this message!
> > 
> > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
> > do on top of the debug kconfig, but I see I can reproduce it on slower
> > environments. Indeed, it looks like it can be caused by that
> > modification. I will send a fix ASAP!  
> 
> The following patch fixes the issue on my side (when using 'renice 20'
> and stress-ng in parallel :) ):
> 
> https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/

Nice! Note only a fix but also lowers runtime, if I'm reading right!

> I queued it for -net, but the issue should only be visible in net-next
> due to the reduction of the timeout, as you suggested. This patch can
> also be applied in net-next if preferred, it will not cause any
> conflicts between net and net-next.

No strong preference, net sounds good.

Thanks for a quick turnaround!

Re: [PATCH net 10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Posted by Simon Horman 1 year, 11 months ago

On Fri, Feb 23, 2024 at 05:14:20PM +0100, Matthieu Baerts (NGI0) wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> 
> The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
> 
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
> 
> Be careful to cope with very slow environments.
> 
> Note that we don't need an explicit timeout on the mptcp_connect
> subprocess to cope with eventual bug/hang-up as the final cleanup
> terminating the child processes will take care of that.
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

Reviewed-by: Simon Horman <horms@kernel.org>