[PATCH mptcp-next v6 4/4] selftest: mptcp: add test for mptcp socket in use

menglong8.dong@gmail.com posted 4 patches 2 months, 4 weeks ago
There is a newer version of this series
Maintainers: Mat Martineau <mathew.j.martineau@linux.intel.com>, Matthieu Baerts <matthieu.baerts@tessares.net>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, Shuah Khan <shuah@kernel.org>
[PATCH mptcp-next v6 4/4] selftest: mptcp: add test for mptcp socket in use
Posted by menglong8.dong@gmail.com 2 months, 4 weeks ago
From: Menglong Dong <imagedong@tencent.com>

Add the function chk_msk_inuse() to diag.sh, which is used to check the
statistics of mptcp socket in use. As mptcp socket in listen state will
be closed randomly after 'accept', we need to get the count of listening
mptcp socket through 'ss' command.

All tests pass.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
---
v6:
- reuse __chk_nr to check the number of msk in use
---
 tools/testing/selftests/net/mptcp/diag.sh | 47 ++++++++++++++++++++---
 1 file changed, 42 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 515859a5168b..9994cb23f8a0 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -36,15 +36,20 @@ if [ $? -ne 0 ];then
 	exit $ksft_skip
 fi
 
+get_msk_inuse()
+{
+	ip netns exec $ns cat /proc/net/protocols | awk '$1~/^MPTCP$/{print $3}'
+}
+
 __chk_nr()
 {
-	local condition="$1"
+	local command="$1"
 	local expected=$2
 	local msg nr
 
 	shift 2
 	msg=$*
-	nr=$(ss -inmHMN $ns | $condition)
+	nr=$(eval $command)
 
 	printf "%-50s" "$msg"
 	if [ $nr != $expected ]; then
@@ -56,9 +61,17 @@ __chk_nr()
 	test_cnt=$((test_cnt+1))
 }
 
+__chk_msk_nr()
+{
+	local condition=$1
+	shift 1
+
+	__chk_nr "ss -inmHMN $ns | $condition" $*
+}
+
 chk_msk_nr()
 {
-	__chk_nr "grep -c token:" $*
+	__chk_msk_nr "grep -c token:" $*
 }
 
 wait_msk_nr()
@@ -96,12 +109,12 @@ wait_msk_nr()
 
 chk_msk_fallback_nr()
 {
-		__chk_nr "grep -c fallback" $*
+		__chk_msk_nr "grep -c fallback" $*
 }
 
 chk_msk_remote_key_nr()
 {
-		__chk_nr "grep -c remote_key" $*
+		__chk_msk_nr "grep -c remote_key" $*
 }
 
 __chk_listen()
@@ -141,6 +154,25 @@ chk_msk_listen()
 	nr=$(ss -Ml $filter | wc -l)
 }
 
+chk_msk_inuse()
+{
+	local expected=$1
+	local listen_nr
+
+	listen_nr=$(ss -N $ns -Ml | grep -c LISTEN)
+	expected=$(($expected+$listen_nr))
+	shift 1
+
+	for i in $(seq 10); do
+		if [ $(get_msk_inuse) -eq $expected ];then
+			break
+		fi
+		sleep 0.1
+	done
+
+	__chk_nr get_msk_inuse $expected $*
+}
+
 # $1: ns, $2: port
 wait_local_port_listen()
 {
@@ -194,8 +226,10 @@ wait_connected $ns 10000
 chk_msk_nr 2 "after MPC handshake "
 chk_msk_remote_key_nr 2 "....chk remote_key"
 chk_msk_fallback_nr 0 "....chk no fallback"
+chk_msk_inuse 2 "chk 2 msk in use"
 flush_pids
 
+chk_msk_inuse 0 "chk 0 msk in use after flush"
 
 echo "a" | \
 	timeout ${timeout_test} \
@@ -231,6 +265,9 @@ for I in `seq 1 $NR_CLIENTS`; do
 done
 
 wait_msk_nr $((NR_CLIENTS*2)) "many msk socket present"
+chk_msk_inuse $((NR_CLIENTS*2)) "chk many msk in use"
 flush_pids
 
+chk_msk_inuse 0 "chk 0 msk in use after flush"
+
 exit $ret
-- 
2.37.2
Re: selftest: mptcp: add test for mptcp socket in use: Tests Results
Posted by MPTCP CI 2 months, 4 weeks ago
Hi Menglong,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/4909154062565376
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/4909154062565376/summary/summary.txt

- KVM Validation: debug:
  - Unstable: 1 failed test(s): selftest_diag 🔴:
  - Task: https://cirrus-ci.com/task/6035053969408000
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/6035053969408000/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/aa327877c6f9


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)
Re: selftest: mptcp: add test for mptcp socket in use: Tests Results
Posted by Matthieu Baerts 2 months, 4 weeks ago
Hi Menglong,

Thank you for the v6!

It looks like the CI is not happy with it:

On 03/11/2022 14:21, MPTCP CI wrote:
> Hi Menglong,
> 
> Thank you for your modifications, that's great!
> 
> Our CI did some validations and here is its report:
> 
> - KVM Validation: normal:
>   - Success! ✅:
>   - Task: https://cirrus-ci.com/task/4909154062565376
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/4909154062565376/summary/summary.txt
> 
> - KVM Validation: debug:
>   - Unstable: 1 failed test(s): selftest_diag 🔴:
>   - Task: https://cirrus-ci.com/task/6035053969408000
>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/6035053969408000/summary/summary.txt

As you can see:

----------------------------
(...)
# all listen sockets                                [  ok  ]
# after MPC handshake                               [  ok  ]
# ....chk remote_key                                [  ok  ]
# ....chk no fallback                               [  ok  ]
# chk 2 msk in use                                  [  ok  ]
# chk 0 msk in use after flush                      [  ok  ]
# check fallback                                    [  ok  ]
# many msk socket present                           [ fail ] timeout
while expecting 200 max 201 last 1
# chk many msk in use                               [ fail ] expected
200 found 0
# chk 0 msk in use after flush                      [  ok  ]
----------------------------

I guess one socket is still present after the 'check fallback': you
probably need to modify flush_pids() to wait for the processes to be
over, as suggested on a comment in your v5, no?

https://lore.kernel.org/all/b3f3c01e-4010-d5ce-970d-394711bcd0e1@tessares.net/

I don't think it is a good idea to wait for >= 200 except if it takes a
very long time to have the previous socket terminated. If it does, maybe
we should re-order the test or re-create the netns instead of re-using it.


About the patch 3/4, note that the SIGUSR1 is probably stopping the test
earlier than expected because the interrupt will cause some actions to
stop but still good to check for the 'quit' variable.


Also, one small detail for patch 4/4: can you add "...." at the
beginning of the new lines you print in the selftest, similar to
"....chk no fallback"?

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net
Re: [Internet]Re: selftest: mptcp: add test for mptcp socket in use: Tests Results
Posted by imagedong(董梦龙) 2 months, 4 weeks ago

> On 2022/11/4 01:34,“Matthieu Baerts”<matthieu.baerts@tessares.net> write:
> 
> Hi Menglong,
> 
> Thank you for the v6!
> 
> It looks like the CI is not happy with it:
> 
> On 03/11/2022 14:21, MPTCP CI wrote:
> > Hi Menglong,
> > 
> > Thank you for your modifications, that's great!
> > 
> > Our CI did some validations and here is its report:
> > 
> > - KVM Validation: normal:
> >   - Success! ✅:
> >   - Task: https://cirrus-ci.com/task/4909154062565376
> >   - Summary: https://api.cirrus-ci.com/v1/artifact/task/4909154062565376/summary/summary.txt
> > 
> > - KVM Validation: debug:
> >   - Unstable: 1 failed test(s): selftest_diag 🔴:
> >   - Task: https://cirrus-ci.com/task/6035053969408000
> >   - Summary: https://api.cirrus-ci.com/v1/artifact/task/6035053969408000/summary/summary.txt
> 
> As you can see:
> 
> ----------------------------
> (...)
> # all listen sockets                                [  ok  ]
> # after MPC handshake                               [  ok  ]
> # ....chk remote_key                                [  ok  ]
> # ....chk no fallback                               [  ok  ]
> # chk 2 msk in use                                  [  ok  ]
> # chk 0 msk in use after flush                      [  ok  ]
> # check fallback                                    [  ok  ]
> # many msk socket present                           [ fail ] timeout
> while expecting 200 max 201 last 1
> # chk many msk in use                               [ fail ] expected
> 200 found 0
> # chk 0 msk in use after flush                      [  ok  ]
> ----------------------------
> 
> I guess one socket is still present after the 'check fallback': you
> probably need to modify flush_pids() to wait for the processes to be
> over, as suggested on a comment in your v5, no?
> 

In fact, I suspect that it's something else. the wait_msk_nr()
has already wait long enough for the processes to be over.
I suspect that the msk is not released after the processes
exit.

Now, I can reproduce it in my computer, about once in 10 times.
And I'll try to figure out what's happening.

> b3f3c01e-4010-d5ce-970d-394711bcd0e1@tessares.net <https://lore.kernel.org/all/<a href=>/">https://lore.kernel.org/all/b3f3c01e-4010-d5ce-970d-394711bcd0e1@tessares.net/
> 
> I don't think it is a good idea to wait for >= 200 except if it takes a
> very long time to have the previous socket terminated. If it does, maybe
> we should re-order the test or re-create the netns instead of re-using it.
> 

It can happen in a very slow qemu machine, but not possible
in a physical machine.

> 
> About the patch 3/4, note that the SIGUSR1 is probably stopping the test
> earlier than expected because the interrupt will cause some actions to
> stop but still good to check for the 'quit' variable.
> 
> 
> Also, one small detail for patch 4/4: can you add "...." at the
> beginning of the new lines you print in the selftest, similar to
> "....chk no fallback"?
> 

Okay!

Thanks!
Menglong Dong

> Cheers,
> Matt
> -- 
> Tessares | Belgium | Hybrid Access Solutions
> www.tessares.net


Re: selftest: mptcp: add test for mptcp socket in use: Tests Results
Posted by Mat Martineau 2 months, 4 weeks ago
On Thu, 3 Nov 2022, Matthieu Baerts wrote:

> Hi Menglong,
>
> Thank you for the v6!
>
> It looks like the CI is not happy with it:
>
> On 03/11/2022 14:21, MPTCP CI wrote:
>> Hi Menglong,
>>
>> Thank you for your modifications, that's great!
>>
>> Our CI did some validations and here is its report:
>>
>> - KVM Validation: normal:
>>   - Success! ✅:
>>   - Task: https://cirrus-ci.com/task/4909154062565376
>>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/4909154062565376/summary/summary.txt
>>
>> - KVM Validation: debug:
>>   - Unstable: 1 failed test(s): selftest_diag 🔴:
>>   - Task: https://cirrus-ci.com/task/6035053969408000
>>   - Summary: https://api.cirrus-ci.com/v1/artifact/task/6035053969408000/summary/summary.txt
>
> As you can see:
>
> ----------------------------
> (...)
> # all listen sockets                                [  ok  ]
> # after MPC handshake                               [  ok  ]
> # ....chk remote_key                                [  ok  ]
> # ....chk no fallback                               [  ok  ]
> # chk 2 msk in use                                  [  ok  ]
> # chk 0 msk in use after flush                      [  ok  ]
> # check fallback                                    [  ok  ]
> # many msk socket present                           [ fail ] timeout
> while expecting 200 max 201 last 1
> # chk many msk in use                               [ fail ] expected
> 200 found 0
> # chk 0 msk in use after flush                      [  ok  ]
> ----------------------------
>
> I guess one socket is still present after the 'check fallback': you
> probably need to modify flush_pids() to wait for the processes to be
> over, as suggested on a comment in your v5, no?
>
> https://lore.kernel.org/all/b3f3c01e-4010-d5ce-970d-394711bcd0e1@tessares.net/
>
> I don't think it is a good idea to wait for >= 200 except if it takes a
> very long time to have the previous socket terminated. If it does, maybe
> we should re-order the test or re-create the netns instead of re-using it.
>
>
> About the patch 3/4, note that the SIGUSR1 is probably stopping the test
> earlier than expected because the interrupt will cause some actions to
> stop but still good to check for the 'quit' variable.
>
>
> Also, one small detail for patch 4/4: can you add "...." at the
> beginning of the new lines you print in the selftest, similar to
> "....chk no fallback"?
>

Menglong -

Thanks for the updated patches. The test ran ok on my local system, but 
the CI is slow on the debug build which makes the timing trickier. I don't 
have anything to add to Matthieu's comments above, seems like his 
suggestions will resolve the CI issue.

--
Mat Martineau
Intel