[PATCH] selftests/net: deflake GRO tests and fix return value and output

Kevin Krakauer posted 1 patch 10 months ago
There is a newer version of this series
tools/testing/selftests/net/gro.c         | 8 +++++---
tools/testing/selftests/net/gro.sh        | 5 +++--
tools/testing/selftests/net/setup_veth.sh | 1 +
3 files changed, 9 insertions(+), 5 deletions(-)
[PATCH] selftests/net: deflake GRO tests and fix return value and output
Posted by Kevin Krakauer 10 months ago
GRO tests are timing dependent and can easily flake. This is partially
mitigated in gro.sh by giving each subtest 3 chances to pass. However,
this still flakes on some machines.

Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely
to immediately flush. This already happened in setup_loopback.sh, but
wasn't added to setup_veth.sh. This accounts for most of the reduction
in flakiness.

We also increase the number of chances for success from 3 to 6.

`gro.sh -t <test>` now returns a passing/failing exit code as expected.

gro.c:main no longer erroneously claims a test passes when running as a
server.

Tested: Ran `gro.sh -t large` 100 times with and without this change.
Passed 100/100 with and 64/100 without. Ran inside strace to increase
flakiness.

Signed-off-by: Kevin Krakauer <krakauer@google.com>
---
 tools/testing/selftests/net/gro.c         | 8 +++++---
 tools/testing/selftests/net/gro.sh        | 5 +++--
 tools/testing/selftests/net/setup_veth.sh | 1 +
 3 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index b2184847e388..d5824eadea10 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -1318,11 +1318,13 @@ int main(int argc, char **argv)
 	read_MAC(src_mac, smac);
 	read_MAC(dst_mac, dmac);
 
-	if (tx_socket)
+	if (tx_socket) {
 		gro_sender();
-	else
+	} else {
+		/* Only the receiver exit status determines test success. */
 		gro_receiver();
+		fprintf(stderr, "Gro::%s test passed.\n", testname);
+	}
 
-	fprintf(stderr, "Gro::%s test passed.\n", testname);
 	return 0;
 }
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 02c21ff4ca81..703173f8c8a9 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -21,7 +21,7 @@ run_test() {
   # Each test is run 3 times to deflake, because given the receive timing,
   # not all packets that should coalesce will be considered in the same flow
   # on every try.
-  for tries in {1..3}; do
+  for tries in {1..6}; do
     # Actual test starts here
     ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
       1>>log.txt &
@@ -100,5 +100,6 @@ trap cleanup EXIT
 if [[ "${test}" == "all" ]]; then
   run_all_tests
 else
-  run_test "${proto}" "${test}"
+  exit_code=$(run_test "${proto}" "${test}")
+  exit $exit_code
 fi;
diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh
index 1f78a87f6f37..9882ad730c24 100644
--- a/tools/testing/selftests/net/setup_veth.sh
+++ b/tools/testing/selftests/net/setup_veth.sh
@@ -12,6 +12,7 @@ setup_veth_ns() {
 
 	[[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}"
 	echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
+	echo 50 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs"
 	ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535
 	ip -netns "${ns_name}" link set dev "${ns_dev}" up
 
-- 
2.48.1
Re: [PATCH] selftests/net: deflake GRO tests and fix return value and output
Posted by Petr Machata 9 months, 3 weeks ago
Kevin Krakauer <krakauer@google.com> writes:

> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
> index 02c21ff4ca81..703173f8c8a9 100755
> --- a/tools/testing/selftests/net/gro.sh
> +++ b/tools/testing/selftests/net/gro.sh
> @@ -21,7 +21,7 @@ run_test() {
>    # Each test is run 3 times to deflake, because given the receive timing,

-    # Each test is run 3 times to deflake, because given the receive timing,
+    # Each test is run 6 times to deflake, because given the receive timing,

>    # not all packets that should coalesce will be considered in the same flow
>    # on every try.
> -  for tries in {1..3}; do
> +  for tries in {1..6}; do
>      # Actual test starts here
>      ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
>        1>>log.txt &
Re: [PATCH] selftests/net: deflake GRO tests and fix return value and output
Posted by Jakub Kicinski 10 months ago
On Tue, 18 Feb 2025 08:45:55 -0800 Kevin Krakauer wrote:
> GRO tests are timing dependent and can easily flake. This is partially
> mitigated in gro.sh by giving each subtest 3 chances to pass. However,
> this still flakes on some machines.

To be clear - are you running this over veth or a real device?

> Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely
> to immediately flush. This already happened in setup_loopback.sh, but
> wasn't added to setup_veth.sh. This accounts for most of the reduction
> in flakiness.

That doesn't make intuitive sense to me. If we already defer flushes
why do we need to also defer IRQs?

> We also increase the number of chances for success from 3 to 6.
> 
> `gro.sh -t <test>` now returns a passing/failing exit code as expected.
> 
> gro.c:main no longer erroneously claims a test passes when running as a
> server.
> 
> Tested: Ran `gro.sh -t large` 100 times with and without this change.
> Passed 100/100 with and 64/100 without. Ran inside strace to increase
> flakiness.
> 
> Signed-off-by: Kevin Krakauer <krakauer@google.com>
> ---
>  tools/testing/selftests/net/gro.c         | 8 +++++---
>  tools/testing/selftests/net/gro.sh        | 5 +++--
>  tools/testing/selftests/net/setup_veth.sh | 1 +
>  3 files changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
> index b2184847e388..d5824eadea10 100644
> --- a/tools/testing/selftests/net/gro.c
> +++ b/tools/testing/selftests/net/gro.c
> @@ -1318,11 +1318,13 @@ int main(int argc, char **argv)
>  	read_MAC(src_mac, smac);
>  	read_MAC(dst_mac, dmac);
>  
> -	if (tx_socket)
> +	if (tx_socket) {
>  		gro_sender();
> -	else
> +	} else {
> +		/* Only the receiver exit status determines test success. */
>  		gro_receiver();
> +		fprintf(stderr, "Gro::%s test passed.\n", testname);
> +	}
>  
> -	fprintf(stderr, "Gro::%s test passed.\n", testname);


That seems quite separate to the stability fix?

>  	return 0;
>  }
> diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
> index 02c21ff4ca81..703173f8c8a9 100755
> --- a/tools/testing/selftests/net/gro.sh
> +++ b/tools/testing/selftests/net/gro.sh
> @@ -21,7 +21,7 @@ run_test() {
>    # Each test is run 3 times to deflake, because given the receive timing,
>    # not all packets that should coalesce will be considered in the same flow
>    # on every try.
> -  for tries in {1..3}; do
> +  for tries in {1..6}; do
>      # Actual test starts here
>      ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
>        1>>log.txt &  
> @@ -100,5 +100,6 @@ trap cleanup EXIT
>  if [[ "${test}" == "all" ]]; then
>    run_all_tests
>  else
> -  run_test "${proto}" "${test}"
> +  exit_code=$(run_test "${proto}" "${test}")
> +  exit $exit_code

Also separate from stability?

Let's split the patch up into logically separate changes.

>  fi;
> diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh
> index 1f78a87f6f37..9882ad730c24 100644
> --- a/tools/testing/selftests/net/setup_veth.sh
> +++ b/tools/testing/selftests/net/setup_veth.sh
> @@ -12,6 +12,7 @@ setup_veth_ns() {
>  
>  	[[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}"
>  	echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
> +	echo 50 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs"
>  	ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535
>  	ip -netns "${ns_name}" link set dev "${ns_dev}" up
>  
-- 
pw-bot: cr
[PATCH] selftests/net: deflake GRO tests and fix return value and output
Posted by Kevin Krakauer 9 months, 4 weeks ago
Thanks for the review! I'll split this up. Do you think it's better as two
patchsets -- one for stability/deflaking, one for return value and output
cleanup -- or as a single patchset with several commits?

> To be clear - are you running this over veth or a real device?

Over a veth.

>> Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely
>> to immediately flush. This already happened in setup_loopback.sh, but
>> wasn't added to setup_veth.sh. This accounts for most of the reduction
>> in flakiness.
>
>That doesn't make intuitive sense to me. If we already defer flushes
>why do we need to also defer IRQs?

Yep, the behavior here is weird. I ran `gro.sh -t large` 1000 times with each of
the following setups (all inside strace to increase flakiness):

- gro_flush_timeout=1ms, napi_defer_hard_irqs=0  --> failed to GRO 29 times
- gro_flush_timeout=5ms, napi_defer_hard_irqs=0  --> failed to GRO 45 times
- gro_flush_timeout=50ms, napi_defer_hard_irqs=0 --> failed to GRO 35 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=1  --> failed to GRO 0 times
- gro_flush_timeout=1ms, napi_defer_hard_irqs=50 --> failed to GRO 0 times

napi_defer_hard_irqs is clearly having an effect. And deferring once is enough.
I believe that deferring IRQs prevents anything else from causing a GRO flush
before gro_flush_timeout expires. While waiting for the timeout to expire, an
incoming packet can cause napi_complete_done and thus napi_gro_flush to run.
Outgoing packets from the veth can also cause this: veth_xmit calls
__veth_xdp_flush, which only actually does anything when IRQs are enabled.

So napi_defer_hard_irqs=1 seems sufficient to allow the full gro_flush_timeout
to expire before flushing GRO.
Re: [PATCH] selftests/net: deflake GRO tests and fix return value and output
Posted by Jakub Kicinski 9 months, 3 weeks ago
On Sun, 23 Feb 2025 07:19:49 -0800 Kevin Krakauer wrote:
> Thanks for the review! I'll split this up. Do you think it's better as two
> patchsets -- one for stability/deflaking, one for return value and output
> cleanup -- or as a single patchset with several commits?

Should be fine either way, they will both end up in net-next.
One patchset may be easier to merge, as we can't CI-test two
conflicting series on the list.

> > To be clear - are you running this over veth or a real device?  
> 
> Over a veth.
> 
> >> Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely
> >> to immediately flush. This already happened in setup_loopback.sh, but
> >> wasn't added to setup_veth.sh. This accounts for most of the reduction
> >> in flakiness.  
> >
> >That doesn't make intuitive sense to me. If we already defer flushes
> >why do we need to also defer IRQs?  
> 
> Yep, the behavior here is weird. I ran `gro.sh -t large` 1000 times with each of
> the following setups (all inside strace to increase flakiness):
> 
> - gro_flush_timeout=1ms, napi_defer_hard_irqs=0  --> failed to GRO 29 times
> - gro_flush_timeout=5ms, napi_defer_hard_irqs=0  --> failed to GRO 45 times
> - gro_flush_timeout=50ms, napi_defer_hard_irqs=0 --> failed to GRO 35 times
> - gro_flush_timeout=1ms, napi_defer_hard_irqs=1  --> failed to GRO 0 times
> - gro_flush_timeout=1ms, napi_defer_hard_irqs=50 --> failed to GRO 0 times
> 
> napi_defer_hard_irqs is clearly having an effect. And deferring once is enough.
> I believe that deferring IRQs prevents anything else from causing a GRO flush
> before gro_flush_timeout expires. While waiting for the timeout to expire, an
> incoming packet can cause napi_complete_done and thus napi_gro_flush to run.
> Outgoing packets from the veth can also cause this: veth_xmit calls
> __veth_xdp_flush, which only actually does anything when IRQs are enabled.
> 
> So napi_defer_hard_irqs=1 seems sufficient to allow the full gro_flush_timeout
> to expire before flushing GRO.

With msec-long deferrals we'll flush due to jiffies change. At least
that explains a bit. Could you maybe try lower timeouts than 1msec?
Previously we'd just keep partially-completed packets in GRO for up 
to 1msec, now we'll delay all packet processing for 1msec, that's a lot.
Re: [PATCH] selftests/net: deflake GRO tests and fix return value and output
Posted by Kevin Krakauer 9 months, 3 weeks ago
On Mon, Feb 24, 2025 at 12:48:30PM -0800, Jakub Kicinski wrote:
> With msec-long deferrals we'll flush due to jiffies change. At least
> that explains a bit. Could you maybe try lower timeouts than 1msec?
> Previously we'd just keep partially-completed packets in GRO for up 
> to 1msec, now we'll delay all packet processing for 1msec, that's a lot.

Results again with each test run 1000 times:

gro_flush_timeout=50us  napi_defer_hard_irqs=1 --> failed to GRO 0 times
gro_flush_timeout=100us napi_defer_hard_irqs=1 --> failed to GRO 0 times

gro_flush_timeout=50us  napi_defer_hard_irqs=0 --> failed to GRO 36 times
gro_flush_timeout=100us napi_defer_hard_irqs=0 --> failed to GRO 46 times

100us with 1 defer seems to work fine and is well below the duration of
a jiffy. So we'll usually be testing the "default" GRO path and only
occasionally the jiffy-update path. I'll make these the numbers in the
revised patch unless someone thinks otherwise.