[PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver

Marc Harvey posted 10 patches 2 months, 1 week ago
There is a newer version of this series
[PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months, 1 week ago
Allow independent control over receive and transmit enablement states
for aggregated ports in the team driver.

The motivation is that IEEE 802.3ad LACP "independent control" can't
be implemented for the team driver currently. This was added to the
bonding driver in commit 240fd405528b ("bonding: Add independent
control state machine").

This series also has a few patches that add tests to show that the old
coupled enablement still works and that the new decoupled enablement
works as intended (4, 5, and 10).

There are three patches with small fixes as well, with the goal of
making the final decoupling patch clearer (1, 2, and 3).

Signed-off-by: Marc Harvey <marcharvey@google.com>
---
Changes in v5:
- Change teamd activebackup selftest in patch 5 to try graceful teamd
  teardown before using SIGKILL.
- Make the teamd activebackup selftest in patch 5 delete leftover
  teamd files during teardown.
- Reorder function calls in team_port_enable function in patch 7,
  since the enablement behavior shouldn't change.
- Make selftests use tcpdump instead of checking rx counters.
- Fix minor typos in patch 10.
- Link to v4: https://lore.kernel.org/r/20260403-teaming-driver-internal-v4-0-d3032f33ca25@google.com

Changes in v4:
- Split the large v3 patch "net: team: Decouple rx and tx enablement
  in the team driver" into 4 smaller patches.
- Link to v3: https://lore.kernel.org/r/20260402-teaming-driver-internal-v3-0-e8cfdec3b5c2@google.com

Changes in v3:
- Patch 5: In test cleanup, kill teamd to fix timeout.
- Link to v2: https://lore.kernel.org/r/20260401-teaming-driver-internal-v2-0-f80c1291727b@google.com

Changes in v2:
- Patch 4 and 5: Fix shellcheck errors and warnings, use iperf3
  instead of netcat+pv, fix dependency checking.
- Patch 7: Fix shellcheck errors and warnings, fix dependency
  checking.
- Link to v1: https://lore.kernel.org/all/20260331053353.2504254-1-marcharvey@google.com/

---
Marc Harvey (10):
      net: team: Annotate reads and writes for mixed lock accessed values
      net: team: Remove unused team_mode_op, port_enabled
      net: team: Rename port_disabled team mode op to port_tx_disabled
      selftests: net: Add tests for failover of team-aggregated ports
      selftests: net: Add test for enablement of ports with teamd
      net: team: Rename enablement functions and struct members to tx
      net: team: Track rx enablement separately from tx enablement
      net: team: Add new rx_enabled team port option
      net: team: Add new tx_enabled team port option
      selftests: net: Add tests for team driver decoupled tx and rx control

 drivers/net/team/team_core.c                       | 237 +++++++++++++++----
 drivers/net/team/team_mode_loadbalance.c           |   8 +-
 drivers/net/team/team_mode_random.c                |   4 +-
 drivers/net/team/team_mode_roundrobin.c            |   2 +-
 include/linux/if_team.h                            |  63 ++++--
 tools/testing/selftests/drivers/net/team/Makefile  |   4 +
 tools/testing/selftests/drivers/net/team/config    |   4 +
 .../drivers/net/team/decoupled_enablement.sh       | 249 ++++++++++++++++++++
 .../testing/selftests/drivers/net/team/options.sh  |  99 +++++++-
 .../testing/selftests/drivers/net/team/team_lib.sh | 172 ++++++++++++++
 .../drivers/net/team/teamd_activebackup.sh         | 251 +++++++++++++++++++++
 .../drivers/net/team/transmit_failover.sh          | 158 +++++++++++++
 tools/testing/selftests/net/forwarding/lib.sh      |   7 +-
 tools/testing/selftests/net/lib.sh                 |  13 ++
 14 files changed, 1199 insertions(+), 72 deletions(-)
---
base-commit: 3741f8fa004bf598cd5032b0ff240984332d6f05
change-id: 20260401-teaming-driver-internal-83f2f0074d68

Best regards,
-- 
Marc Harvey <marcharvey@google.com>
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Jiri Pirko 2 months, 1 week ago
Mon, Apr 06, 2026 at 05:03:36AM +0200, marcharvey@google.com wrote:
>Allow independent control over receive and transmit enablement states
>for aggregated ports in the team driver.
>
>The motivation is that IEEE 802.3ad LACP "independent control" can't
>be implemented for the team driver currently. This was added to the
>bonding driver in commit 240fd405528b ("bonding: Add independent
>control state machine").
>
>This series also has a few patches that add tests to show that the old
>coupled enablement still works and that the new decoupled enablement
>works as intended (4, 5, and 10).
>
>There are three patches with small fixes as well, with the goal of
>making the final decoupling patch clearer (1, 2, and 3).

Looks fine to me now. Do you have libteam/teamd counterpart?
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months, 1 week ago
On Tue, Apr 7, 2026 at 4:55 AM Jiri Pirko <jiri@resnulli.us> wrote:
>
> Mon, Apr 06, 2026 at 05:03:36AM +0200, marcharvey@google.com wrote:
> >Allow independent control over receive and transmit enablement states
> >for aggregated ports in the team driver.
> >
> >The motivation is that IEEE 802.3ad LACP "independent control" can't
> >be implemented for the team driver currently. This was added to the
> >bonding driver in commit 240fd405528b ("bonding: Add independent
> >control state machine").
> >
> >This series also has a few patches that add tests to show that the old
> >coupled enablement still works and that the new decoupled enablement
> >works as intended (4, 5, and 10).
> >
> >There are three patches with small fixes as well, with the goal of
> >making the final decoupling patch clearer (1, 2, and 3).
>
> Looks fine to me now. Do you have libteam/teamd counterpart?

I don't see a need for this to be used in any of the teamd runners.
Libteam should support this out of the box, since the options are
identified over netlink by their string names. The options.sh test
uses teamnl, which uses libteam, to set the new options.
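
As a rough illustration of the point about string names, the sketch
below builds the teamnl command lines that would toggle the two new
per-port options. The option names rx_enabled and tx_enabled are taken
from the patch titles; the device name team0, the port name eth0, and
the helper team_port_opt_cmd are hypothetical. The helper only prints
the command instead of running it, since actually setting the options
needs root and a kernel with this series applied.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a teamnl command that sets a
# per-port option, addressed purely by its string name.
team_port_opt_cmd() {
	team_dev=$1; port=$2; opt=$3; val=$4
	printf 'teamnl -p %s %s setoption %s %s\n' \
		"$port" "$team_dev" "$opt" "$val"
}

# Assumed names: team0 (team device), eth0 (member port).
team_port_opt_cmd team0 eth0 rx_enabled true
team_port_opt_cmd team0 eth0 tx_enabled false
```

Nothing in the command refers to a numeric option ID, which is why a
new option name should not need a libteam change to be settable.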
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Jiri Pirko 2 months, 1 week ago
Wed, Apr 08, 2026 at 02:12:35AM +0200, marcharvey@google.com wrote:
>On Tue, Apr 7, 2026 at 4:55 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Mon, Apr 06, 2026 at 05:03:36AM +0200, marcharvey@google.com wrote:
>> >Allow independent control over receive and transmit enablement states
>> >for aggregated ports in the team driver.
>> >
>> >The motivation is that IEEE 802.3ad LACP "independent control" can't
>> >be implemented for the team driver currently. This was added to the
>> >bonding driver in commit 240fd405528b ("bonding: Add independent
>> >control state machine").
>> >
>> >This series also has a few patches that add tests to show that the old
>> >coupled enablement still works and that the new decoupled enablement
>> >works as intended (4, 5, and 10).
>> >
>> >There are three patches with small fixes as well, with the goal of
>> >making the final decoupling patch clearer (1, 2, and 3).
>>
>> Looks fine to me now. Do you have libteam/teamd counterpart?
>
>I don't see a need for this to be used in any of the teamd runners.

Why do you need this then?
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months ago
On Wed, Apr 8, 2026 at 2:00 AM Jiri Pirko <jiri@resnulli.us> wrote:
> Wed, Apr 08, 2026 at 02:12:35AM +0200, marcharvey@google.com wrote:
> >On Tue, Apr 7, 2026 at 4:55 AM Jiri Pirko <jiri@resnulli.us> wrote:
> >>
> >> Looks fine to me now. Do you have libteam/teamd counterpart?
> >
> >I don't see a need for this to be used in any of the teamd runners.
>
> Why do you need this then?

Initially, we plan to use a non-teamd userspace component for teaming
control due to several non-standard requirements, such as
synchronization with unrelated software. It is probably worth
converting the teamd lacp runner to independent control at some point,
because according to the spec: "It is recommended that the independent
control state diagram be implemented in preference to the coupled
control state diagram."
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Jakub Kicinski 2 months, 1 week ago
On Mon, 06 Apr 2026 03:03:36 +0000 Marc Harvey wrote:
> Allow independent control over receive and transmit enablement states
> for aggregated ports in the team driver.
> 
> The motivation is that IEEE 802.3ad LACP "independent control" can't
> be implemented for the team driver currently. This was added to the
> bonding driver in commit 240fd405528b ("bonding: Add independent
> control state machine").
> 
> This series also has a few patches that add tests to show that the old
> coupled enablement still works and that the new decoupled enablement
> works as intended (4, 5, and 10).
> 
> There are three patches with small fixes as well, with the goal of
> making the final decoupling patch clearer (1, 2, and 3).

activebackup:

TAP version 13
1..1
# overriding timeout to 2400
# selftests: drivers/net/team: teamd_activebackup.sh
# Setting up two-link aggregation for runner activebackup
# Teamd version is: teamd 1.32
# Conf files are /tmp/tmp.ydjNK9Um7H and /tmp/tmp.xZuc3cWbN0
# This program is not intended to be run as root.
# This program is not intended to be run as root.
# Created team devices
# Teamd PIDs are 21457 and 21461
# exec of "ip link set eth0 up" failed: No such file or directory
# exec of "ip link set eth0 up" failed: No such file or directory
# exec of "ip link set eth1 up" failed: No such file or directory
# exec of "ip link set eth1 up" failed: No such file or directory
# PING fd00::2 (fd00::2) 56 data bytes
# 64 bytes from fd00::2: icmp_seq=1 ttl=64 time=0.753 ms
# 
# --- fd00::2 ping statistics ---
# 1 packets transmitted, 1 received, 0% packet loss, time 0ms
# rtt min/avg/max/mdev = 0.753/0.753/0.753/0.000 msPacket count for test_team2 was 0
# Waiting for eth0 in ns2-lZ0gqd to stop receiving
# Packet count for eth0 was 0Packet count for eth0 was 0
# Packet count for eth1 was 0
# Waiting for eth1 in ns2-lZ0gqd to stop receiving
# Packet count for eth1 was 0Packet count for eth0 was 0
# Packet count for eth1 was 0
# TEST: teamd active backup runner test                               [FAIL]
# Traffic did not reach team interface in NS2.
# Tearing down two-link aggregation
# Failed to kill daemon: Timer expired
# Failed to kill daemon: Timer expired
# Sending sigkill to teamd for test_team1
# rm: cannot remove '/var/run/teamd/test_team1.pid': No such file or directory
# rm: cannot remove '/var/run/teamd/test_team1.sock': No such file or directory
# Sending sigkill to teamd for test_team2
# rm: cannot remove '/var/run/teamd/test_team2.pid': No such file or directory
# rm: cannot remove '/var/run/teamd/test_team2.sock': No such file or directory
not ok 1 selftests: drivers/net/team: teamd_activebackup.sh # exit=1


transmit_failover:

TAP version 13
1..1
# overriding timeout to 2400
# selftests: drivers/net/team: transmit_failover.sh
# Error: ipv6: address not found.
# Setting team in ns2-yxjiUo to mode roundrobin
# Error: ipv6: address not found.
# Setting team in ns1-Jht6kA to mode broadcast
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# TEST: Failover of 'broadcast' test                                  [FAIL]
# eth0 not transmitting when both links enabled
# Setting team in ns1-Jht6kA to mode roundrobin
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# TEST: Failover of 'roundrobin' test                                 [FAIL]
# eth0 not transmitting when both links enabled
# Setting team in ns1-Jht6kA to mode random
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# Packet count for eth0 was 0
# Packet count for eth1 was 0
# TEST: Failover of 'random' test                                     [FAIL]
# eth0 not transmitting when both links enabled
not ok 1 selftests: drivers/net/team: transmit_failover.sh # exit=1
-- 
pw-bot: cr
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months, 1 week ago
Apologies for all of the test failures. Before sending this revision,
I ran each test thousands of times and observed no failures, so I
thought the flakiness would be resolved.

No matter what I try, I can't recreate either issue on my end. I've
tried building with the exact config from one of the test runs
(https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding/results/590921/).
I've tried stressing the VM according to
https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#reproducing-unstable-tests
(this makes the tests time out, but I can still see traffic). I've
tried using the netdev-testing/net-next-2026-04-06--09-00 kernel
source. I've tried in nested and unnested virtual machines. I've also
tried running multiple test instances in parallel, but nothing
recreates the issues. The issues seem related to tcpdump, but without
reproducing them, I can only guess. Any suggestions for running the
tests exactly as the CI does would be greatly appreciated.

- Marc
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months, 1 week ago
Thank you very much to kuniyu@google.com, who figured out how to
recreate the issue on Fedora. Fedora's /etc/services maps TCP port
1234 to the "search-agent" service (which is normal), and tcpdump uses
that mapping to replace port numbers with service names in its output.
So the tests were looking for ${ip_address}.1234, but tcpdump was
spitting out ${ip_address}.search-agent. What is strange is that the
test already uses tcpdump's "-n" option: "Don't convert addresses
(i.e., host addresses, port numbers, etc.) to names."
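
A minimal sketch of the mismatch, assuming the test greps for the
literal "<address>.<port>" string. The two sample lines, the address
fd00::2, and the helper count_matches are made up for illustration and
are not copied from a real capture or from the selftest code.

```shell
#!/bin/sh
# Hypothetical sample lines in the "addr.port" style.
ip_address="fd00::2"
numeric_line="IP6 fd00::1.40000 > ${ip_address}.1234: tcp 0"
named_line="IP6 fd00::1.40000 > ${ip_address}.search-agent: tcp 0"

# Count the lines that contain the literal "<address>.1234" string.
count_matches() {
	printf '%s\n' "$1" | grep -c -F -- "${ip_address}.1234"
}

echo "numeric output matches: $(count_matches "$numeric_line")"
echo "named output matches:   $(count_matches "$named_line")"
```

With the port printed as a service name, the pattern never matches, so
a counter built on it stays at zero even though traffic is flowing.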

It turns out that Fedora has a patched version of tcpdump that
separates the normal "-n" option into two options! "-n" handles host
addresses, and "-nn" handles port and protocol numbers. The tcpdump
invocation used by the selftests only uses "-n". What's stranger is
that passing "-nn" to tcpdump is actually portable, because under the
hood it is treated as a counter, with or without the Fedora patch:
https://github.com/the-tcpdump-group/tcpdump/blob/master/tcpdump.c#L1915
(thanks again to Kuniyuki for discovering this).
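
To illustrate why repeating the flag is harmless, here is a minimal
shell sketch of the counter pattern: each occurrence of -n bumps a
counter, so "-nn" and "-n -n" both give 2, and code that only checks
for a nonzero value behaves the same either way. This is not tcpdump's
code; count_n_flags is a hypothetical helper.

```shell
#!/bin/sh
# Sketch of the "option as a counter" pattern (not tcpdump's code).
count_n_flags() {
	nflag=0
	OPTIND=1	# restart option parsing on every call
	while getopts "n" opt "$@"; do
		case "$opt" in
		n) nflag=$((nflag + 1)) ;;	# each -n increments the counter
		*) return 1 ;;
		esac
	done
	echo "$nflag"
}

count_n_flags -n	# single flag
count_n_flags -nn	# grouped form
count_n_flags -n -n	# separate flags, same count as -nn
```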

For v6, I will just change the TCP port to one that is not used by a
service, and will make the tcpdump helper function in the
net/forwarding lib use "-nn" instead of "-n".

- Marc
Re: [PATCH net-next v5 00/10] Decouple receive and transmit enablement in team driver
Posted by Jakub Kicinski 2 months, 1 week ago
On Tue, 7 Apr 2026 16:06:02 -0700 Marc Harvey wrote:
> Thank you very much to kuniyu@google.com, who figured out how to
> recreate the issue on Fedora. Fedora's /etc/services maps TCP port
> 1234 to the "search-agent" service (normal), which tcpdump then uses
> to text-replace port numbers in its output. So the tests were looking
> for ${ip_address}.1234, but tcpdump was spitting out
> ${ip_address}.search-agent. What is strange is that the test already
> uses tcpdump's "-n" option: "Don't convert addresses (i.e., host
> addresses, port numbers, etc.) to names."
> 
> It turns out that Fedora has a patched version of tcpdump that
> separates the normal "-n" option into two options! "-n" handles host
> addresses, and "-nn" handles port and protocol numbers. The tcpdump
> invocation used by the selftests only uses "-n". What's stranger is
> that passing "-nn" to tcpdump is actually portable, because under the
> hood it is treated as a counter, with or without the Fedora patch:
> https://github.com/the-tcpdump-group/tcpdump/blob/master/tcpdump.c#L1915
> (thanks again to Kuniyuki for discovering this).

Oh wow! Thanks to both of you for not giving up and getting to the
bottom of this :)