[PATCH bpf-next v6 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework

Bastien Curutchet (eBPF Foundation) posted 15 patches 3 months, 1 week ago
There is a newer version of this series
[PATCH bpf-next v6 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework
Posted by Bastien Curutchet (eBPF Foundation) 3 months, 1 week ago
Hi all,

The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xskxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.

I've looked into what could improve the speed in the CI:
- some tests are skipped when run on veth peers in a VM (because they
  rely on huge page allocation or HW rings). This skipping logic still
  takes some time and can be easily avoided.
- the TEARDOWN test is quite long (several seconds on its own) because
  it runs the same test 10 times in a row to ensure the teardown process
  works properly

With these tests fully skipped in the CI and the veth setup done only
once for each mode (DRV / SKB), the execution time is reduced to about 5
seconds on my setup.
```
$ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
[...]
real    0m 5.04s
user    0m 0.38s
sys     0m 1.61s
```

It still feels a bit long, but there are 24 tests run in both DRV and
SKB modes which means around 100ms for each one. I'm not sure I can make
it much faster without randomizing the tests so that not all of them run
in every CI execution.
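
As a quick check of that per-test estimate, here is a minimal sketch of the arithmetic. The helper name `sketch_avg_ms_per_subtest` is hypothetical and is not part of the series; the inputs are the figures quoted above (5.04 s total, 24 tests, 2 modes).

```c
/* Hypothetical helper, for illustration only: average wall-clock
 * time per subtest in milliseconds. */
static double sketch_avg_ms_per_subtest(double total_seconds,
					int tests_per_mode, int modes)
{
	int subtests = tests_per_mode * modes;

	/* Guard against a zero or negative subtest count. */
	if (subtests <= 0)
		return 0.0;

	return total_seconds * 1000.0 / subtests;
}
```

Calling `sketch_avg_ms_per_subtest(5.04, 24, 2)` divides 5040 ms across 48 subtests, which comes to roughly 105 ms each and matches the "around 100ms" figure.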

PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI
PATCH 15 integrates the CI tests to the test_progs framework
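
The change described for PATCH 8 to 13 (returning an error and releasing resources instead of calling exit()) can be illustrated with a generic C clean-up pattern. This is only a sketch under assumptions: every name below (`sketch_run_test`, `sketch_alloc`, `SKETCH_PASS`, ...) is hypothetical and none of it is taken from the series itself.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical status codes; the series' real names may differ. */
enum { SKETCH_PASS = 0, SKETCH_FAIL = -1 };

/* Number of buffers currently allocated, so the sketch can show
 * that nothing is leaked on the error path. */
static int sketch_live_buffers;

static void *sketch_alloc(size_t len)
{
	void *p = calloc(1, len);

	if (p)
		sketch_live_buffers++;
	return p;
}

static void sketch_free(void *p)
{
	if (p) {
		free(p);
		sketch_live_buffers--;
	}
}

/* simulate_error != 0 forces the failure path. */
static int sketch_run_test(int simulate_error)
{
	int ret = SKETCH_FAIL;
	void *umem, *pkts;

	umem = sketch_alloc(4096);
	if (!umem)
		return SKETCH_FAIL;

	pkts = sketch_alloc(256);
	if (!pkts)
		goto free_umem;

	if (simulate_error) {
		/* An exit() here would skip the clean-up below and
		 * would also terminate the whole test runner. */
		fprintf(stderr, "simulated failure\n");
		goto free_pkts;
	}

	ret = SKETCH_PASS;

free_pkts:
	sketch_free(pkts);
free_umem:
	sketch_free(umem);
	return ret;
}
```

With this shape, a failing test reports its status to the caller, and `sketch_live_buffers` is back to 0 whether the test passed or failed. That matters when many tests share one long-lived process.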

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
---
Changes in v6:
- Set up the veth peers once for each mode instead of once for each subtest
- Rename the 'flaky' table to 'skip-ci' and move the automatically
  skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/20251016-xsk-v5-0-662c95eb8005@bootlin.com

Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com

Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes
  testapp_stats_rx_dropped(), the second one fixes
  testapp_xdp_shared_umem(). The unnecessary frees (in
  testapp_stats_rx_full() and testapp_stats_fill_empty()) are removed.
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com

Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237
  ("selftests/bpf: Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com

Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
  to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xskxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
  7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com

---
Bastien Curutchet (eBPF Foundation) (15):
      selftests/bpf: test_xsk: Split xskxceiver
      selftests/bpf: test_xsk: Initialize bitmap before use
      selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
      selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
      selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
      selftests/bpf: test_xsk: Wrap test clean-up in functions
      selftests/bpf: test_xsk: Release resources when swap fails
      selftests/bpf: test_xsk: Add return value to init_iface()
      selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
      selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
      selftests/bpf: test_xsk: Don't exit immediately when workers fail
      selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
      selftests/bpf: test_xsk: Don't exit immediately on allocation failures
      selftests/bpf: test_xsk: Isolate non-CI tests
      selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework

 tools/testing/selftests/bpf/Makefile              |   11 +-
 tools/testing/selftests/bpf/prog_tests/test_xsk.c | 2595 ++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/test_xsk.h |  298 +++
 tools/testing/selftests/bpf/prog_tests/xsk.c      |  151 ++
 tools/testing/selftests/bpf/xskxceiver.c          | 2696 +--------------------
 tools/testing/selftests/bpf/xskxceiver.h          |  156 --
 6 files changed, 3183 insertions(+), 2724 deletions(-)
---
base-commit: 4481a8590725400f37d3015f0ee0d53a2cdc1bd6
change-id: 20250218-xsk-0cf90e975d14

Best regards,
-- 
Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Re: [PATCH bpf-next v6 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework
Posted by Alexei Starovoitov 3 months, 1 week ago
On Wed, Oct 29, 2025 at 6:52 AM Bastien Curutchet (eBPF Foundation)
<bastien.curutchet@bootlin.com> wrote:
>
> Hi all,
>
> The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
> are defined in xskxceiver.c. Since this script is used to test real
> hardware, the goal here is to leave it as it is, and only integrate the
> tests that run on veth peers into the test_progs framework.
>
> I've looked into what could improve the speed in the CI:
> - some tests are skipped when run on veth peers in a VM (because they
>   rely on huge page allocation or HW rings). This skipping logic still
>   takes some time and can be easily avoided.
> - the TEARDOWN test is quite long (several seconds on its own) because
>   it runs the same test 10 times in a row to ensure the teardown process
>   works properly
>
> With these tests fully skipped in the CI and the veth setup done only
> once for each mode (DRV / SKB), the execution time is reduced to about 5
> seconds on my setup.
> ```
> $ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
> [...]
> real    0m 5.04s
> user    0m 0.38s
> sys     0m 1.61s

This is fine. I see
Summary: 2/48 PASSED, 0 SKIPPED, 0 FAILED

real    0m8.165s
user    0m1.795s
sys     0m4.740s

on debug kernel with kasan which is ok.

But it conflicts with itself :(

$ test_progs -j -t xsk

All error logs:
setup_veth:FAIL:ip link add veth0 numtxqueues 4 numrxqueues 4 type
veth peer name veth1 numtxqueues 4 numrxqueues 4 unexpected error: 512
(errno 2)
test_xsk_drv:FAIL:setup veth unexpected error: -1 (errno 2)
#664     xsk_drv:FAIL
Summary: 1/24 PASSED, 0 SKIPPED, 1 FAILED

Pls fix the parallel run and not by adding "_serial", of course.

pw-bot: cr
Re: [PATCH bpf-next v6 00/15] selftests/bpf: Integrate test_xsk.c to test_progs framework
Posted by Bastien Curutchet 3 months, 1 week ago
Hi,

On 10/29/25 7:54 PM, Alexei Starovoitov wrote:
> On Wed, Oct 29, 2025 at 6:52 AM Bastien Curutchet (eBPF Foundation)
> <bastien.curutchet@bootlin.com> wrote:
>>
>> Hi all,
>>
>> The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
>> are defined in xskxceiver.c. Since this script is used to test real
>> hardware, the goal here is to leave it as it is, and only integrate the
>> tests that run on veth peers into the test_progs framework.
>>
>> I've looked into what could improve the speed in the CI:
>> - some tests are skipped when run on veth peers in a VM (because they
>>    rely on huge page allocation or HW rings). This skipping logic still
>>    takes some time and can be easily avoided.
>> - the TEARDOWN test is quite long (several seconds on its own) because
>>    it runs the same test 10 times in a row to ensure the teardown process
>>    works properly
>>
>> With these tests fully skipped in the CI and the veth setup done only
>> once for each mode (DRV / SKB), the execution time is reduced to about 5
>> seconds on my setup.
>> ```
>> $ tools/testing/selftests/bpf/vmtest.sh -d $HOME/ebpf/output-regular/ -- time ./test_progs -t xsk
>> [...]
>> real    0m 5.04s
>> user    0m 0.38s
>> sys     0m 1.61s
> 
> This is fine. I see
> Summary: 2/48 PASSED, 0 SKIPPED, 0 FAILED
> 
> real    0m8.165s
> user    0m1.795s
> sys     0m4.740s
> 
> on debug kernel with kasan which is ok.
> 
> But it conflicts with itself :(
> 
> $ test_progs -j -t xsk
> 
> All error logs:
> setup_veth:FAIL:ip link add veth0 numtxqueues 4 numrxqueues 4 type
> veth peer name veth1 numtxqueues 4 numrxqueues 4 unexpected error: 512
> (errno 2)
> test_xsk_drv:FAIL:setup veth unexpected error: -1 (errno 2)
> #664     xsk_drv:FAIL
> Summary: 1/24 PASSED, 0 SKIPPED, 1 FAILED
> 
> Pls fix the parallel run and not by adding "_serial", of course.
Oops, in my quest for speed I removed the 'test_ns' prefix. It didn't
seem necessary since all tests are run at once, but I forgot about
parallel execution between the DRV and SKB modes.

Sorry about this, I'll put back the 'test_ns' prefix.
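
For readers unfamiliar with the mechanism, the sketch below shows one way a test runner could decide, from a test's name, whether to run it in a private network namespace. The failure above presumably comes from the DRV and SKB tests both creating veth0/veth1 in the same namespace when run in parallel. The `ns_` prefix string and the helper name are assumptions made for illustration; they are not quoted from test_progs.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed marker; the real runner's prefix handling may differ. */
#define SKETCH_NS_PREFIX "ns_"

/* Hypothetical check: does this test ask for its own namespace? */
static bool sketch_wants_private_netns(const char *test_name)
{
	return strncmp(test_name, SKETCH_NS_PREFIX,
		       strlen(SKETCH_NS_PREFIX)) == 0;
}
```

Under that assumption, a test named `ns_xsk_drv` would get its own namespace and could create veth0/veth1 without clashing with a parallel `ns_xsk_skb`, while a plain `xsk_drv` would share the host namespace.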

It will be a good opportunity to address some of the AI feedback I received.


Best regards,
Bastien