.../testing/selftests/net/tcp_ao/key-management.c | 46 ++++--- tools/testing/selftests/net/tcp_ao/lib/sock.c | 12 +- tools/testing/selftests/net/tcp_ao/rst.c | 138 ++++++++++++++------- 3 files changed, 124 insertions(+), 72 deletions(-)
Changes in v2:
- Dropped "selftests/net: Clean-up double assignment", going to send it
to net-next with other changes (Simon)
- Added a patch to rectify RST selftests.
- Link to v1: https://lore.kernel.org/r/20240118-tcp-ao-test-key-mgmt-v1-0-3583ca147113@arista.com
Two typo fixes, noticed by Mohammad's review.
And a fix for an issue that got uncovered.
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
Dmitry Safonov (2):
selftests/net: Rectify key counters checks
selftests/net: Repair RST passive reset selftest
Mohammad Nassiri (1):
selftests/net: Argument value mismatch when calling verify_counters()
.../testing/selftests/net/tcp_ao/key-management.c | 46 ++++---
tools/testing/selftests/net/tcp_ao/lib/sock.c | 12 +-
tools/testing/selftests/net/tcp_ao/rst.c | 138 ++++++++++++++-------
3 files changed, 124 insertions(+), 72 deletions(-)
---
base-commit: ecb1b8288dc7ccbdcb3b9df005fa1c0e0c0388a7
change-id: 20240118-tcp-ao-test-key-mgmt-bb51a5fe15a2
Best regards,
--
Dmitry Safonov <dima@arista.com>
On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote: > Two typo fixes, noticed by Mohammad's review. > And a fix for an issue that got uncovered. I can confirm that all tests pass now :) Thank you!
On 2/1/24 00:36, Jakub Kicinski wrote:
> On Tue, 30 Jan 2024 03:51:51 +0000 Dmitry Safonov wrote:
>> Two typo fixes, noticed by Mohammad's review.
>> And a fix for an issue that got uncovered.
>
> I can confirm that all tests pass now :)
> Thank you!
Thanks Jakub!
Please, let me know if there will be other issues with tcp-ao tests :)
Going to work on tracepoints and some other TCP-AO stuff for net-next.
Thanks,
Dmitry
On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote: > Please, let me know if there will be other issues with tcp-ao tests :) > > Going to work on tracepoints and some other TCP-AO stuff for net-next. Since you're being nice and helpful I figured I'll try testing TCP-AO with debug options enabled :) (kernel/configs/debug.config and kernel/configs/x86_debug.config included), that slows things down and causes a bit of flakiness in unsigned-md5-* tests: https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao This has links to outputs: https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0 If it's a timing thing - FWIW we started exporting KSFT_MACHINE_SLOW=yes on the slow runners.
Hi Jakub,
On 2/1/24 21:21, Jakub Kicinski wrote:
> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>> Please, let me know if there will be other issues with tcp-ao tests :)
>>
>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>
> Since you're being nice and helpful I figured I'll try testing TCP-AO
> with debug options enabled :) (kernel/configs/debug.config and
> kernel/configs/x86_debug.config included),
Haha :)
> that slows things down
> and causes a bit of flakiness in unsigned-md5-* tests:
>
> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>
> This has links to outputs:
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>
> If it's a timing thing - FWIW we started exporting
> KSFT_MACHINE_SLOW=yes on the slow runners.
I think, I know what happens here:
# ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
=> 6
# ok 9 AO server (AO_REQUIRED): unsigned client
# ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
increased 1 => 2
# not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
netns_ao_good was not expected to increase 7 => 8
for each of tests the server listens at a new port, but re-uses the same
namespaces+veth. If the node/machine is quite slow, I guess a segment
might have been retransmitted and the test that initiated it had already
finished.
And as result, the per-namespace counters are incremented, which makes
the test fail (IOW, the test expects all segments in ns being dropped).
So, I should do one of the options:
1. relax per-namespace checks (the per-socket and per-key counters are
checked)
2. unshare(net) + veth setup for each test
3. split the selftest on smaller ones (as they create new net-ns in
initialization)
I'd probably prefer (2), albeit it slows down that slow machine even
more, but I don't think creating 2 net-ns + veth pair per each test
would add a lot more overhead even on some rpi board. But let's see,
maybe I'll just go with (1) as that's really easy.
I'll cook a patch this week.
Thanks,
Dmitry
On 2/1/24 22:25, Dmitry Safonov wrote:
> Hi Jakub,
>
> On 2/1/24 21:21, Jakub Kicinski wrote:
>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>
>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>
>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>> with debug options enabled :) (kernel/configs/debug.config and
>> kernel/configs/x86_debug.config included),
>
> Haha :)
>
>> that slows things down
>> and causes a bit of flakiness in unsigned-md5-* tests:
>>
>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>
>> This has links to outputs:
>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>
>> If it's a timing thing - FWIW we started exporting
>> KSFT_MACHINE_SLOW=yes on the slow runners.
>
> I think, I know what happens here:
>
> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
> => 6
> # ok 9 AO server (AO_REQUIRED): unsigned client
> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
> increased 1 => 2
> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
> netns_ao_good was not expected to increase 7 => 8
>
> for each of tests the server listens at a new port, but re-uses the same
> namespaces+veth. If the node/machine is quite slow, I guess a segment
> might have been retransmitted and the test that initiated it had already
> finished.
> And as result, the per-namespace counters are incremented, which makes
> the test fail (IOW, the test expects all segments in ns being dropped).
>
> So, I should do one of the options:
>
> 1. relax per-namespace checks (the per-socket and per-key counters are
> checked)
> 2. unshare(net) + veth setup for each test
> 3. split the selftest on smaller ones (as they create new net-ns in
> initialization)
Actually, I think there may be an easier fix:
4. Make sure that client close()s TCP-AO first, making it twsk.
And also make sure that net-ns counters read post server's close().
Will do this, let's see if this fixes the flakiness on the netdev bot :)
> I'd probably prefer (2), albeit it slows down that slow machine even
> more, but I don't think creating 2 net-ns + veth pair per each test
> would add a lot more overhead even on some rpi board. But let's see,
> maybe I'll just go with (1) as that's really easy.
>
> I'll cook a patch this week.
Thanks,
Dmitry
On 2/1/24 23:37, Dmitry Safonov wrote:
> On 2/1/24 22:25, Dmitry Safonov wrote:
>> Hi Jakub,
>>
>> On 2/1/24 21:21, Jakub Kicinski wrote:
>>> On Thu, 1 Feb 2024 00:50:46 +0000 Dmitry Safonov wrote:
>>>> Please, let me know if there will be other issues with tcp-ao tests :)
>>>>
>>>> Going to work on tracepoints and some other TCP-AO stuff for net-next.
>>>
>>> Since you're being nice and helpful I figured I'll try testing TCP-AO
>>> with debug options enabled :) (kernel/configs/debug.config and
>>> kernel/configs/x86_debug.config included),
>>
>> Haha :)
>>
>>> that slows things down
>>> and causes a bit of flakiness in unsigned-md5-* tests:
>>>
>>> https://netdev.bots.linux.dev/flakes.html?br-cnt=75&tn-needle=tcp-ao
>>>
>>> This has links to outputs:
>>> https://netdev.bots.linux.dev/contest.html?executor=vmksft-tcp-ao-dbg&pass=0
>>>
>>> If it's a timing thing - FWIW we started exporting
>>> KSFT_MACHINE_SLOW=yes on the slow runners.
>>
>> I think, I know what happens here:
>>
>> # ok 8 AO server (AO_REQUIRED): AO client: counter TCPAOGood increased 4
>> => 6
>> # ok 9 AO server (AO_REQUIRED): unsigned client
>> # ok 10 AO server (AO_REQUIRED): unsigned client: counter TCPAORequired
>> increased 1 => 2
>> # not ok 11 AO server (AO_REQUIRED): unsigned client: Counter
>> netns_ao_good was not expected to increase 7 => 8
>>
>> for each of tests the server listens at a new port, but re-uses the same
>> namespaces+veth. If the node/machine is quite slow, I guess a segment
>> might have been retransmitted and the test that initiated it had already
>> finished.
>> And as result, the per-namespace counters are incremented, which makes
>> the test fail (IOW, the test expects all segments in ns being dropped).
>>
>> So, I should do one of the options:
>>
>> 1. relax per-namespace checks (the per-socket and per-key counters are
>> checked)
>> 2. unshare(net) + veth setup for each test
>> 3. split the selftest on smaller ones (as they create new net-ns in
>> initialization)
>
> Actually, I think there may be an easier fix:
>
> 4. Make sure that client close()s TCP-AO first, making it twsk.
> And also make sure that net-ns counters read post server's close().
>
> Will do this, let's see if this fixes the flakiness on the netdev bot :)
FWIW, I ended up with this:
https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/
I reproduced the issue once, running unsigned-md5* in a loop, while in
another terminal building linux-next with all cores.
With the patch above, it survived 77 iterations of both ipv4/ipv6 tests
so far. So, there is a chance it fixes the issue :)
Thanks,
Dmitry
On Fri, 2 Feb 2024 02:30:52 +0000 Dmitry Safonov wrote: > > Actually, I think there may be an easier fix: > > > > 4. Make sure that client close()s TCP-AO first, making it twsk. > > And also make sure that net-ns counters read post server's close(). > > > > Will do this, let's see if this fixes the flakiness on the netdev bot :) > > FWIW, I ended up with this: > https://lore.kernel.org/all/20240202-unsigned-md5-netns-counters-v1-1-8b90c37c0566@arista.com/ > > I reproduced the issue once, running unsigned-md5* in a loop, while in > another terminal building linux-next with all cores. > With the patch above, it survived 77 iterations of both ipv4/ipv6 tests > so far. So, there is a chance it fixes the issue :) That was quick! Fingers crossed :)
© 2016 - 2025 Red Hat, Inc.