[PATCH mptcp-next] Squash-to: "selftests: mptcp: tweak simult_flows for debug kernels"

Mat Martineau posted 1 patch 1 year, 9 months ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20220627214438.17887-1-mathew.j.martineau@linux.intel.com
Maintainers: "David S. Miller" <davem@davemloft.net>, Mat Martineau <mathew.j.martineau@linux.intel.com>, Paolo Abeni <pabeni@redhat.com>, Jakub Kicinski <kuba@kernel.org>, Shuah Khan <shuah@kernel.org>, Matthieu Baerts <matthieu.baerts@tessares.net>, Eric Dumazet <edumazet@google.com>
There is a newer version of this series
tools/testing/selftests/net/mptcp/simult_flows.sh | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
[PATCH mptcp-next] Squash-to: "selftests: mptcp: tweak simult_flows for debug kernels"
Posted by Mat Martineau 1 year, 9 months ago
kbuild is still seeing intermittent failures in the simult_flows.sh
test. It uses a kernel config without kmemleak, but with other
performance-affecting debug options like lockdep and kasan.

Example failures:
kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_with_unbalanced_delay_transfer_slower_than_expected!_runtime_4339_ms_expected_4005_ms_max_4005.fail
kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4285_ms_expected_4005_ms_max_4005.fail
kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4346_ms_expected_4005_ms_max_4005.fail

Adjust the debug detection to loosen the simult_flows timing constraints
if either kmemleak or lockdep is configured.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
---
 tools/testing/selftests/net/mptcp/simult_flows.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index e266b26a4274..fa4f0bf55049 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -110,9 +110,10 @@ setup()
 	# debug build can slow down measurably the test program
 	# we use quite tight time limit on the run-time, to ensure
 	# maximum B/W usage.
-	# Use the kmemleak file presence as a rough estimate for this being
-	# a debug kernel and increase the maximum run-time accordingly
-	[ -f /sys/kernel/debug/kmemleak ] && slack=$((slack+200))
+	# Use the kmemleak or lockdep file presence as a rough estimate
+	# for this being a debug kernel and increase the maximum
+	# run-time accordingly
+	[ -f /sys/kernel/debug/kmemleak -o -f /proc/lockdep ] && slack=$((slack+200))
 }
 
 # $1: ns, $2: port
-- 
2.36.1


Re: [PATCH mptcp-next] Squash-to: "selftests: mptcp: tweak simult_flows for debug kernels"
Posted by Matthieu Baerts 1 year, 9 months ago
Hi Mat,

On 27/06/2022 23:44, Mat Martineau wrote:
> kbuild is still seeing intermittent failures in the simult_flows.sh
> test. It uses a kernel config without kmemleak, but with other
> performance-affecting debug options like lockdep and kasan.
> 
> Example failures:
> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_with_unbalanced_delay_transfer_slower_than_expected!_runtime_4339_ms_expected_4005_ms_max_4005.fail
> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4285_ms_expected_4005_ms_max_4005.fail
> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4346_ms_expected_4005_ms_max_4005.fail

If I'm not mistaken, adding 200ms would not prevent these failures if
you got 4346ms instead of 4005ms, right? It looks like we need to extend
the time to something around 350ms.
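
As a quick check of the numbers above, here is a small shell sketch that
computes how far each reported runtime exceeds the 4005 ms limit. The
runtimes and the limit are copied from the failure names; the helper name
overshoot_ms is hypothetical and only used for illustration.

```shell
#!/bin/sh
# Illustration only: runtimes and the 4005 ms limit come from the failure
# names quoted above; overshoot_ms is a hypothetical helper.

max_ms=4005

# Print how far a measured runtime (in ms) exceeds the allowed maximum.
overshoot_ms() {
	echo $(( $1 - max_ms ))
}

for runtime in 4339 4285 4346; do
	echo "runtime ${runtime} ms: over the limit by $(overshoot_ms "$runtime") ms"
done
```

The worst case is 341 ms over the limit, so an extra 200 ms would not
cover it, while roughly 350 ms would.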

> Adjust the debug detection to loosen the simult_flows timing constraints
> if either kmemleak or lockdep are configured.

Good idea!
I didn't find any "safe" ways to easily check that KASAN is used.

Checking dmesg doesn't seem to be a safe way for all environments.

But maybe we could do this? (with '-q')

  $ grep mm/kasan /sys/devices/system/cpu/hotplug/states
  /sys/devices/system/cpu/hotplug/states:70:214: mm/kasan:online

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: [PATCH mptcp-next] Squash-to: "selftests: mptcp: tweak simult_flows for debug kernels"
Posted by Mat Martineau 1 year, 9 months ago
On Tue, 28 Jun 2022, Matthieu Baerts wrote:

> Hi Mat,
>
> On 27/06/2022 23:44, Mat Martineau wrote:
>> kbuild is still seeing intermittent failures in the simult_flows.sh
>> test. It uses a kernel config without kmemleak, but with other
>> performance-affecting debug options like lockdep and kasan.
>>
>> Example failures:
>> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_with_unbalanced_delay_transfer_slower_than_expected!_runtime_4339_ms_expected_4005_ms_max_4005.fail
>> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4285_ms_expected_4005_ms_max_4005.fail
>> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4346_ms_expected_4005_ms_max_4005.fail
>
> If I'm not mistaken, adding 200ms would not prevent these failures if
> you got 4346ms instead of 4005ms, right? It looks like we need to extend
> the time to something around 350ms.
>

I had thought the "slack" was calculated differently, but I think you're 
correct here. I am a little reluctant to increase the limit too far, since 
the whole point is to detect when the transfers become slower - and we 
seem to instead keep finding slower CI systems!

What do you think about this approach: make simult_flows.sh 'SKIP' when 
debug kernel features are detected, unless a "-f" flag forces it to run? 
That way we could run it with debug features where we know the system 
performance, like our CI, but not show bogus failures on random 
debug-enabled systems.


>> Adjust the debug detection to loosen the simult_flows timing constraints
>> if either kmemleak or lockdep are configured.
>
> Good idea!
> I didn't find any "safe" ways to easily check that KASAN is used.
>
> Checking dmesg doesn't seem to be a safe way for all environments.
>
> But maybe we could do this? (with '-q')
>
>  $ grep mm/kasan /sys/devices/system/cpu/hotplug/states
>  /sys/devices/system/cpu/hotplug/states:70:214: mm/kasan:online
>

How about:

grep -q ' kmemleak_init$\| lockdep_init$\| kasan_init$\| prove_locking$' /proc/kallsyms

?

That detects the compiled-in features, rather than what's enabled at 
runtime, but it's simple and may be good enough.
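
To make the two ideas above concrete, here is a rough sketch of how the
kallsyms check could drive a SKIP. It is not the final patch: the function
names, the "force" variable and the optional file argument are
hypothetical, and the exit code 4 is the value the kselftest framework
treats as "skipped".

```shell
#!/bin/sh
# Sketch only: is_debug_kernel, should_skip and "force" are hypothetical
# names; the grep pattern is the one proposed above.

KSFT_SKIP=4	# exit code kselftest treats as "skipped"

# Return 0 if the symbol list ($1, default /proc/kallsyms) contains one
# of the debug-related symbols.
is_debug_kernel() {
	grep -q ' kmemleak_init$\| lockdep_init$\| kasan_init$\| prove_locking$' \
		"${1:-/proc/kallsyms}"
}

# Return 0 if the test should be skipped: debug kernel and not forced.
should_skip() {
	[ "${force:-0}" -eq 0 ] && is_debug_kernel "$@"
}
```

The script could then run something like 'should_skip && exit $KSFT_SKIP'
early on, with force=1 set when "-f" is passed.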

--
Mat Martineau
Intel

Re: [PATCH mptcp-next] Squash-to: "selftests: mptcp: tweak simult_flows for debug kernels"
Posted by Matthieu Baerts 1 year, 9 months ago
Hi Mat,

On 28/06/2022 19:55, Mat Martineau wrote:
> On Tue, 28 Jun 2022, Matthieu Baerts wrote:
> 
>> Hi Mat,
>>
>> On 27/06/2022 23:44, Mat Martineau wrote:
>>> kbuild is still seeing intermittent failures in the simult_flows.sh
>>> test. It uses a kernel config without kmemleak, but with other
>>> performance-affecting debug options like lockdep and kasan.
>>>
>>> Example failures:
>>> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_with_unbalanced_delay_transfer_slower_than_expected!_runtime_4339_ms_expected_4005_ms_max_4005.fail
>>> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4285_ms_expected_4005_ms_max_4005.fail
>>> kernel-selftests.net/mptcp.simult_flows.sh.unbalanced_bwidth_transfer_slower_than_expected!_runtime_4346_ms_expected_4005_ms_max_4005.fail
>>>
>>
>> If I'm not mistaken, adding 200ms would not prevent these failures if
>> you got 4346ms instead of 4005ms, right? It looks like we need to extend
>> the time to something around 350ms.
>>
> 
> I had thought the "slack" was calculated differently, but I think you're
> correct here. I am a little reluctant to increase the limit too far,
> since the whole point is to detect when the transfers become slower -
> and we seem to instead keep finding slower CI systems!

Indeed.
In a recent build with a non debug kernel, I also got one issue:

>  # unbalanced bwidth with opposed, unbalanced delay - reverse directiontransfer slower than expected! runtime 4097 ms, expected 4005 ms max 4005 [ fail ]

It seems to be quite rare and is probably due to other jobs running in
parallel. I will monitor that.

> What do you think about this approach: make simult_flows.sh 'SKIP' when
> debug kernel features are detected, unless a "-f" flag forces it to run?
> That way we could run it with debug features where we know the system
> performance, like our CI, but not show bogus failures on random
> debug-enabled systems.

Yes, that was my suggestion in the GitHub issue I opened. In "debug"
mode, we are going to be slowed down by the extra processing the kernel
has to do, while in this test we mainly focus on the network delay. The
limit allows a bit extra for the processing, but not much, because I guess
the "slack" is also there for the "slow start" at the beginning of the
connection.

https://github.com/multipath-tcp/mptcp_net-next/issues/282

If we add a "-f" flag, it may be good to also add the possibility to
change the default "slack" value, e.g.

  ./simult_flows.sh -f 400


Or maybe clearer with:

  ./simult_flows.sh -f -s 400


(slack would be 400 instead of 50 then)
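
A minimal sketch of how these two options could be parsed with getopts.
The function name parse_args and the variable names are hypothetical; the
default of 50 is the slack value mentioned above.

```shell
#!/bin/sh
# Sketch only: parse_args, "force" and the usage text are hypothetical;
# 50 is the default slack mentioned in this thread.

slack=50
force=0

# Parse "-f" (force the run) and "-s <ms>" (override the slack).
parse_args() {
	OPTIND=1
	while getopts "fs:" opt; do
		case "$opt" in
		f) force=1 ;;
		s) slack=$OPTARG ;;
		*) echo "Usage: $0 [-f] [-s slack_ms]" >&2; return 1 ;;
		esac
	done
}

parse_args -f -s 400
echo "force=${force} slack=${slack}"
```

Keeping "-s" separate from "-f" means the slack can be tuned without
also forcing the test to run on a debug kernel.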


@Paolo: would it be OK for you if we skip this test in debug mode?


>>> Adjust the debug detection to loosen the simult_flows timing constraints
>>> if either kmemleak or lockdep are configured.
>>
>> Good idea!
>> I didn't find any "safe" ways to easily check that KASAN is used.
>>
>> Checking dmesg doesn't seem to be a safe way for all environments.
>>
>> But maybe we could do this? (with '-q')
>>
>>  $ grep mm/kasan /sys/devices/system/cpu/hotplug/states
>>  /sys/devices/system/cpu/hotplug/states:70:214: mm/kasan:online
>>
> 
> How about:
> 
> grep -q ' kmemleak_init$\| lockdep_init$\| kasan_init$\| prove_locking$' /proc/kallsyms
> 
> ?
> 
> That detects the compiled-in features, rather than what's enabled at
> runtime, but it's simple and may be good enough.

Good idea, seems OK on my side:

  # grep ' kmemleak_init$\| lockdep_init$\| kasan_init$\| prove_locking$' /proc/kallsyms
  ffffffff9ae5b420 d prove_locking
  ffffffff9c51b592 T kasan_init
  ffffffff9c525857 T lockdep_init
  ffffffff9c542072 T kmemleak_init

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net

Re: Squash-to: "selftests: mptcp: tweak simult_flows for debug kernels": Tests Results
Posted by MPTCP CI 1 year, 9 months ago
Hi Mat,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal:
  - Success! ✅:
  - Task: https://cirrus-ci.com/task/5662712175788032
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5662712175788032/summary/summary.txt

- KVM Validation: debug:
  - Unstable: 2 failed test(s): packetdrill_add_addr selftest_mptcp_join 🔴:
  - Task: https://cirrus-ci.com/task/5099762222366720
  - Summary: https://api.cirrus-ci.com/v1/artifact/task/5099762222366720/summary/summary.txt

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/4fa12bd9968b


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-debug

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to keep the test
suite stable when executed on a public CI like this one, it is possible that
some reported issues are not due to your modifications. Still, do not hesitate
to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (Tessares)