mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
make sure everything works as expected when using "mmap" and "sendfile"
modes instead of "poll", and with the MPTCP checksum support.

These modes should be validated, but they are not when the selftests are
executed via the kselftest helpers. It means that most CIs validating
these selftests, like NIPA for the net development trees and LKFT for
the stable ones, are not covering these modes.

To fix that, new test programs have been added, simply calling
mptcp_connect.sh with the right parameters.

The first patch can be backported up to v5.6, and the second one up to
v5.14.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Changes in v2:
- force using a different prefix in the subtests to avoid having the
  same test names in all mptcp_connect*.sh selftests.
- Link to v1: https://lore.kernel.org/r/20250714-net-mptcp-sft-connect-alt-v1-0-bf1c5abbe575@kernel.org

---
Matthieu Baerts (NGI0) (2):
      selftests: mptcp: connect: also cover alt modes
      selftests: mptcp: connect: also cover checksum

 tools/testing/selftests/net/mptcp/Makefile                  | 3 ++-
 tools/testing/selftests/net/mptcp/mptcp_connect_checksum.sh | 5 +++++
 tools/testing/selftests/net/mptcp/mptcp_connect_mmap.sh     | 5 +++++
 tools/testing/selftests/net/mptcp/mptcp_connect_sendfile.sh | 5 +++++
 4 files changed, 17 insertions(+), 1 deletion(-)
---
base-commit: b640daa2822a39ff76e70200cb2b7b892b896dce
change-id: 20250714-net-mptcp-sft-connect-alt-c1aaf073ef4e

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>
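For readers who have not opened the patches, the wrapper idea described in the cover letter can be sketched roughly as below. This is an illustration only, not the code from the series: the helper name connect_args_for and the variant names are hypothetical, and only the "-m <MODE>" and "-C" options and the "mmap" and "sendfile" modes are taken from the cover letter.

```shell
#!/bin/bash
# Hypothetical helper (not from the patches): map a variant name to the
# mptcp_connect.sh options mentioned in the cover letter.
connect_args_for() {
	case "$1" in
	mmap)     echo "-m mmap" ;;
	sendfile) echo "-m sendfile" ;;
	checksum) echo "-C" ;;
	*)        echo "unknown variant: $1" >&2; return 1 ;;
	esac
}

# A wrapper for one variant would then only forward to the main script,
# for example (assuming mptcp_connect.sh sits next to the wrapper):
#   "$(dirname "$0")/mptcp_connect.sh" $(connect_args_for mmap) "$@"
```

Keeping each variant as its own small program is what lets the kselftest helpers, and therefore the CIs, list and run it as a separate test.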
Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 15 Jul 2025 20:43:27 +0200 you wrote:
> mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
> make sure everything works as expected when using "mmap" and "sendfile"
> modes instead of "poll", and with the MPTCP checksum support.
>
> These modes should be validated, but they are not when the selftests are
> executed via the kselftest helpers. It means that most CIs validating
> these selftests, like NIPA for the net development trees and LKFT for
> the stable ones, are not covering these modes.
>
> [...]

Here is the summary with links:
  - [net,v2,1/2] selftests: mptcp: connect: also cover alt modes
    https://git.kernel.org/netdev/net/c/37848a456fc3
  - [net,v2,2/2] selftests: mptcp: connect: also cover checksum
    https://git.kernel.org/netdev/net/c/fdf0f60a2bb0

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
On Tue, 15 Jul 2025 20:43:27 +0200 Matthieu Baerts (NGI0) wrote:
> mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
> make sure everything works as expected when using "mmap" and "sendfile"
> modes instead of "poll", and with the MPTCP checksum support.
>
> These modes should be validated, but they are not when the selftests are
> executed via the kselftest helpers. It means that most CIs validating
> these selftests, like NIPA for the net development trees and LKFT for
> the stable ones, are not covering these modes.
>
> To fix that, new test programs have been added, simply calling
> mptcp_connect.sh with the right parameters.
>
> The first patch can be backported up to v5.6, and the second one up to
> v5.14.

Looks like the failures that Paolo flagged yesterday:

https://lore.kernel.org/all/a7a89aa2-7354-42c7-8219-99a3cafd3b33@redhat.com/

are back as soon as this hit NIPA :(

https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-07-16--00-00&executor=vmksft-mptcp&pw-n=0&pass=0

No idea why TBH, the tests run sequentially and connect.sh run before
any of the new ones.

I'm gonna leave it in patchwork in case the next run is clean,
please use pw-bot to discard them if they keep failing.
On Tue, 15 Jul 2025 18:53:08 -0700 Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 20:43:27 +0200 Matthieu Baerts (NGI0) wrote:
> > mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
> > make sure everything works as expected when using "mmap" and "sendfile"
> > modes instead of "poll", and with the MPTCP checksum support.
> >
> > These modes should be validated, but they are not when the selftests are
> > executed via the kselftest helpers. It means that most CIs validating
> > these selftests, like NIPA for the net development trees and LKFT for
> > the stable ones, are not covering these modes.
> >
> > To fix that, new test programs have been added, simply calling
> > mptcp_connect.sh with the right parameters.
> >
> > The first patch can be backported up to v5.6, and the second one up to
> > v5.14.
>
> Looks like the failures that Paolo flagged yesterday:
>
> https://lore.kernel.org/all/a7a89aa2-7354-42c7-8219-99a3cafd3b33@redhat.com/
>
> are back as soon as this hit NIPA :(
>
> https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-07-16--00-00&executor=vmksft-mptcp&pw-n=0&pass=0
>
> No idea why TBH, the tests run sequentially and connect.sh run before
> any of the new ones.
>
> I'm gonna leave it in patchwork in case the next run is clean,
> please use pw-bot to discard them if they keep failing.

It failed again on the latest run, in a somewhat more concerning way :(

# (duration 30279ms) [FAIL] file received by server does not match (in, out):
# -rw------- 1 root root 5171914 Jul 16 05:24 /tmp/tmp.W2c96hxSIz
# Trailing bytes are:
# w,ѐ)-rw------- 1 root root 5166208 Jul 16 05:24 /tmp/tmp.s33PNcrN6M
# Trailing bytes are:
# (<v /&^<ֱrnFsaC7INFO: with peek mode: saveAfterPeek

https://netdev-3.bots.linux.dev/vmksft-mptcp/results/211121/4-mptcp-connect-sh/stdout

BTW feeding the random data into hexdump-like formatter seems
advisable? :P
Hi Jakub,

On 16/07/2025 16:26, Jakub Kicinski wrote:
> On Tue, 15 Jul 2025 18:53:08 -0700 Jakub Kicinski wrote:
>> On Tue, 15 Jul 2025 20:43:27 +0200 Matthieu Baerts (NGI0) wrote:
>>> mptcp_connect.sh can be executed manually with "-m <MODE>" and "-C" to
>>> make sure everything works as expected when using "mmap" and "sendfile"
>>> modes instead of "poll", and with the MPTCP checksum support.
>>>
>>> These modes should be validated, but they are not when the selftests are
>>> executed via the kselftest helpers. It means that most CIs validating
>>> these selftests, like NIPA for the net development trees and LKFT for
>>> the stable ones, are not covering these modes.
>>>
>>> To fix that, new test programs have been added, simply calling
>>> mptcp_connect.sh with the right parameters.
>>>
>>> The first patch can be backported up to v5.6, and the second one up to
>>> v5.14.
>>
>> Looks like the failures that Paolo flagged yesterday:
>>
>> https://lore.kernel.org/all/a7a89aa2-7354-42c7-8219-99a3cafd3b33@redhat.com/
>>
>> are back as soon as this hit NIPA :(
>>
>> https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-07-16--00-00&executor=vmksft-mptcp&pw-n=0&pass=0
>>
>> No idea why TBH, the tests run sequentially and connect.sh run before
>> any of the new ones.

And just to be sure, no CPU or IO overload at that moment? I didn't see
such errors reported by our CI, but I can try to reproduce them locally
in different conditions.

>> I'm gonna leave it in patchwork in case the next run is clean,
>> please use pw-bot to discard them if they keep failing.

Oops, sorry I forgot to reply: when I checked in the morning, the last
two builds were clean. I wanted to check the next one, then I forgot :)

> It failed again on the latest run, in a somewhat more concerning way :(
>
> # (duration 30279ms) [FAIL] file received by server does not match (in, out):
> # -rw------- 1 root root 5171914 Jul 16 05:24 /tmp/tmp.W2c96hxSIz
> # Trailing bytes are:
> # w,ѐ)-rw------- 1 root root 5166208 Jul 16 05:24 /tmp/tmp.s33PNcrN6M
> # Trailing bytes are:
> # (<v /&^<rnFsaC7INFO: with peek mode: saveAfterPeek
>
> https://netdev-3.bots.linux.dev/vmksft-mptcp/results/211121/4-mptcp-connect-sh/stdout

I see, the error can be a bit scary :)

If I'm not mistaken, there was a poll timeout error before. When it is
detected, the test is stopped. After each test, even in case of errors,
the received file is compared with the sending one. So here, this
concerning error is expected.

Anyway, even if the errors are not caused by this series, I think it is
better to delay these patches while we are investigating that:

pw-bot: cr

> BTW feeding the random data into hexdump-like formatter seems
> advisable? :P

It is just to check that the CIs can correctly parse random data :-D

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
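As a side note on the hexdump suggestion above, a formatter of that kind could look roughly like the sketch below. It is an assumption-laden illustration, not what the selftest does today: the function name print_trailing_bytes and the default of 32 bytes are hypothetical, and it only relies on the standard tail and od utilities.

```shell
#!/bin/bash
# Hypothetical helper (not part of the selftests): show the last bytes of
# a file as hex, so that random data cannot garble the test log.
print_trailing_bytes() {
	local file="$1" count="${2:-32}"

	# tail -c keeps the last <count> bytes; od prints them as
	# space-separated hex bytes without the address column.
	tail -c "${count}" "${file}" | od -An -tx1
}
```

Called as print_trailing_bytes /path/to/received/file 16, it would print one line of hex bytes, which also ends with a newline so the next log message is not glued to the dump.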
On Wed, 16 Jul 2025 16:55:21 +0200 Matthieu Baerts wrote:
> >> Looks like the failures that Paolo flagged yesterday:
> >>
> >> https://lore.kernel.org/all/a7a89aa2-7354-42c7-8219-99a3cafd3b33@redhat.com/
> >>
> >> are back as soon as this hit NIPA :(
> >>
> >> https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-07-16--00-00&executor=vmksft-mptcp&pw-n=0&pass=0
> >>
> >> No idea why TBH, the tests run sequentially and connect.sh run before
> >> any of the new ones.
>
> And just to be sure, no CPU or IO overload at that moment? I didn't see
> such errors reported by our CI, but I can try to reproduce them locally
> in different conditions.

None that I can see. The test run ~10min after all the builds completed,
and we wait now for the CPU load to die down and writeback to finish
before we kick off VMs. The VMs for various tests are running at that
point, the CPU util averaged across cores is 66%.
On 16/07/2025 17:36, Jakub Kicinski wrote:
> On Wed, 16 Jul 2025 16:55:21 +0200 Matthieu Baerts wrote:
>>>> Looks like the failures that Paolo flagged yesterday:
>>>>
>>>> https://lore.kernel.org/all/a7a89aa2-7354-42c7-8219-99a3cafd3b33@redhat.com/
>>>>
>>>> are back as soon as this hit NIPA :(
>>>>
>>>> https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-07-16--00-00&executor=vmksft-mptcp&pw-n=0&pass=0
>>>>
>>>> No idea why TBH, the tests run sequentially and connect.sh run before
>>>> any of the new ones.
>>
>> And just to be sure, no CPU or IO overload at that moment? I didn't see
>> such errors reported by our CI, but I can try to reproduce them locally
>> in different conditions.
>
> None that I can see. The test run ~10min after all the builds completed,
> and we wait now for the CPU load to die down and writeback to finish
> before we kick off VMs. The VMs for various tests are running at that
> point, the CPU util averaged across cores is 66%.

Thank you for having checked, and for the explanations!

OK, so maybe running stress-ng in parallel to be able to reproduce the
issue might not help. We will investigate.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
On Wed, 16 Jul 2025 18:35:11 +0200 Matthieu Baerts wrote:
> >> And just to be sure, no CPU or IO overload at that moment? I didn't see
> >> such errors reported by our CI, but I can try to reproduce them locally
> >> in different conditions.
> >
> > None that I can see. The test run ~10min after all the builds completed,
> > and we wait now for the CPU load to die down and writeback to finish
> > before we kick off VMs. The VMs for various tests are running at that
> > point, the CPU util averaged across cores is 66%.
>
> Thank you for having checked, and for the explanations!
>
> OK, so maybe running stress-ng in parallel to be able to reproduce the
> issue might not help. We will investigate.

connect tests failed again overnight. Now I see why Paolo was
responding on Eric's series, that seems like a more likely culprit..
Hi Jakub,

On 17/07/2025 16:42, Jakub Kicinski wrote:
> On Wed, 16 Jul 2025 18:35:11 +0200 Matthieu Baerts wrote:
>>>> And just to be sure, no CPU or IO overload at that moment? I didn't see
>>>> such errors reported by our CI, but I can try to reproduce them locally
>>>> in different conditions.
>>>
>>> None that I can see. The test run ~10min after all the builds completed,
>>> and we wait now for the CPU load to die down and writeback to finish
>>> before we kick off VMs. The VMs for various tests are running at that
>>> point, the CPU util averaged across cores is 66%.
>>
>> Thank you for having checked, and for the explanations!
>>
>> OK, so maybe running stress-ng in parallel to be able to reproduce the
>> issue might not help. We will investigate.
>
> connect tests failed again overnight. Now I see why Paolo was
> responding on Eric's series, that seems like a more likely culprit..

Good point, Paolo was certainly right, as always :)

We do need to investigate. Note that it might be hard for me to do that
the next few days as I'm travelling for work, but we are tracking the
issue:

https://github.com/multipath-tcp/mptcp_net-next/issues/574

I see that you already marked the mptcp-connect-sh selftest as ignored,
so I guess we are not causing other troubles with the CI. (We could then
also apply this series here and ignore the new tests, but it is also
fine for me to wait.)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
On Fri, 18 Jul 2025 01:49:24 +0200 Matthieu Baerts wrote:
> I see that you already marked the mptcp-connect-sh selftest as ignored,
> so I guess we are not causing other troubles with the CI. (We could then
> also apply this series here and ignore the new tests, but it is also
> fine for me to wait.)

If you're okay either way I'd rather wait. From our perspective the new
tests would go straight into the ignore bucket.
On Thu, 17 Jul 2025 18:33:46 -0700 Jakub Kicinski wrote:
> On Fri, 18 Jul 2025 01:49:24 +0200 Matthieu Baerts wrote:
> > I see that you already marked the mptcp-connect-sh selftest as ignored,
> > so I guess we are not causing other troubles with the CI. (We could then
> > also apply this series here and ignore the new tests, but it is also
> > fine for me to wait.)
>
> If you're okay either way I'd rather wait. From our perspective the new
> tests would go straight into the ignore bucket.

Restoring now, given Paolo's fixes.