[PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver

Marc Harvey posted 10 patches 2 months, 1 week ago
There is a newer version of this series
[PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months, 1 week ago
Allow independent control over receive and transmit enablement states
for aggregated ports in the team driver.

The motivation is that IEEE 802.3ad LACP "independent control" can't
be implemented for the team driver currently. This was added to the
bonding driver in commit 240fd405528b ("bonding: Add independent
control state machine").

This series also has a few patches that add tests to show that the old
coupled enablement still works and that the new decoupled enablement
works as intended (4, 5, and 10).

There are three patches with small fixes as well, with the goal of
making the final decoupling patch clearer (1, 2, and 3).
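
As a rough illustration of what decoupled control could look like from
userspace, the sketch below sets the two per-port options through teamnl.
The option names rx_enabled and tx_enabled are taken from the titles of
patches 8 and 9; the helper name, the device names, and the true/false
value format (assumed to match the existing "enabled" option) are
assumptions, not something this cover letter specifies.

```shell
#!/bin/sh
# Hypothetical helper: set one per-port option on a team device via teamnl.
#   $1 = team device, $2 = port device, $3 = option name, $4 = value
# The rx_enabled/tx_enabled names come from the patch titles; the boolean
# value format is an assumption.
set_port_option() {
	teamnl -p "$2" "$1" setoption "$3" "$4"
}

# Example (hypothetical devices): keep receiving on eth0, stop transmitting.
# set_port_option team0 eth0 rx_enabled true
# set_port_option team0 eth0 tx_enabled false
```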

Signed-off-by: Marc Harvey <marcharvey@google.com>
---
Changes in v6:
- Make selftests use a TCP port with no associated service.
- Make selftests pass the -nn flag to tcpdump so that it does not
  convert port numbers to service names.
- Link to v5: https://lore.kernel.org/r/20260406-teaming-driver-internal-v5-0-e8a3f348a1c5@google.com
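
To illustrate why the -nn flag matters when a test counts packets from
tcpdump output, here is a minimal sketch. It is not the code from the
selftests; the helper names and the port number are hypothetical. With
-nn the destination port is always printed numerically, so a simple
pattern on the port number is enough.

```shell
#!/bin/sh
# Hypothetical helper: count tcpdump text lines whose destination port is $1.
# Relies on the numeric "addr.port" form that tcpdump prints with -nn.
count_dst_port_lines() {
	# grep -c exits non-zero when the count is 0; do not treat that as an error.
	grep -c -E "> [^ ]+\.$1: " || true
}

# Hypothetical helper: capture on interface $1 for $3 seconds and count
# packets sent to TCP port $2.
capture_dst_port_count() {
	timeout "$3" tcpdump -nn -l -i "$1" "tcp dst port $2" 2>/dev/null |
		count_dst_port_lines "$2"
}

# Example (needs root and tcpdump): capture_dst_port_count eth0 1234 2
```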

Changes in v5:
- Change teamd activebackup selftest in patch 5 to try graceful teamd
  teardown before using SIGKILL.
- Make the teamd activebackup selftest in patch 5 delete leftover
  teamd files during teardown.
- Reorder function calls in team_port_enable function in patch 7,
  since the enablement behavior shouldn't change.
- Make selftests use tcpdump instead of checking rx counters.
- Fix minor typos in patch 10.
- Link to v4: https://lore.kernel.org/r/20260403-teaming-driver-internal-v4-0-d3032f33ca25@google.com

Changes in v4:
- Split the large v3 patch "net: team: Decouple rx and tx enablement
  in the team driver" into 4 smaller patches.
- Link to v3: https://lore.kernel.org/r/20260402-teaming-driver-internal-v3-0-e8cfdec3b5c2@google.com

Changes in v3:
- Patch 5: In test cleanup, kill teamd to fix timeout.
- Link to v2: https://lore.kernel.org/r/20260401-teaming-driver-internal-v2-0-f80c1291727b@google.com

Changes in v2:
- Patch 4 and 5: Fix shellcheck errors and warnings, use iperf3
  instead of netcat+pv, fix dependency checking.
- Patch 7: Fix shellcheck errors and warnings, fix dependency
  checking.
- Link to v1: https://lore.kernel.org/all/20260331053353.2504254-1-marcharvey@google.com/

---
Marc Harvey (10):
      net: team: Annotate reads and writes for mixed lock accessed values
      net: team: Remove unused team_mode_op, port_enabled
      net: team: Rename port_disabled team mode op to port_tx_disabled
      selftests: net: Add tests for failover of team-aggregated ports
      selftests: net: Add test for enablement of ports with teamd
      net: team: Rename enablement functions and struct members to tx
      net: team: Track rx enablement separately from tx enablement
      net: team: Add new rx_enabled team port option
      net: team: Add new tx_enabled team port option
      selftests: net: Add tests for team driver decoupled tx and rx control

 drivers/net/team/team_core.c                       | 237 ++++++++++++++++----
 drivers/net/team/team_mode_loadbalance.c           |   8 +-
 drivers/net/team/team_mode_random.c                |   4 +-
 drivers/net/team/team_mode_roundrobin.c            |   2 +-
 include/linux/if_team.h                            |  63 +++---
 tools/testing/selftests/drivers/net/team/Makefile  |   4 +
 tools/testing/selftests/drivers/net/team/config    |   4 +
 .../drivers/net/team/decoupled_enablement.sh       | 249 +++++++++++++++++++++
 .../testing/selftests/drivers/net/team/options.sh  |  99 +++++++-
 .../testing/selftests/drivers/net/team/team_lib.sh | 174 ++++++++++++++
 .../drivers/net/team/teamd_activebackup.sh         | 246 ++++++++++++++++++++
 .../drivers/net/team/transmit_failover.sh          | 158 +++++++++++++
 tools/testing/selftests/net/forwarding/lib.sh      |   9 +-
 tools/testing/selftests/net/lib.sh                 |  13 ++
 14 files changed, 1197 insertions(+), 73 deletions(-)
---
base-commit: 2ce8a41113eda1adddc1e6dc43cf89383ec6dc22
change-id: 20260401-teaming-driver-internal-83f2f0074d68

Best regards,
-- 
Marc Harvey <marcharvey@google.com>
Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
Posted by Jakub Kicinski 2 months, 1 week ago
On Wed, 08 Apr 2026 02:52:19 +0000 Marc Harvey wrote:
> Allow independent control over receive and transmit enablement states
> for aggregated ports in the team driver.
> 
> The motivation is that IEEE 802.3ad LACP "independent control" can't
> be implemented for the team driver currently. This was added to the
> bonding driver in commit 240fd405528b ("bonding: Add independent
> control state machine").
> 
> This series also has a few patches that add tests to show that the old
> coupled enablement still works and that the new decoupled enablement
> works as intended (4, 5, and 10).
> 
> There are three patches with small fixes as well, with the goal of
> making the final decoupling patch clearer (1, 2, and 3).

It pains me to report on non-debug kernels:

make: Entering directory '/srv/vmksft/testing/wt-9/tools/testing/selftests'
make[1]: Nothing to be done for 'all'.
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: teamd_activebackup.sh
# Setting up two-link aggregation for runner activebackup
# Teamd version is: teamd 1.32
# Conf files are /tmp/tmp.ZeEAwlX4kB and /tmp/tmp.Q8XVmtXmXY
# This program is not intended to be run as root.
# This program is not intended to be run as root.
# Created team devices
# Teamd PIDs are 30274 and 30278
# PING fd00::2 (fd00::2) 56 data bytes
# 64 bytes from fd00::2: icmp_seq=1 ttl=64 time=0.037 ms
# 
# --- fd00::2 ping statistics ---
# 1 packets transmitted, 1 received, 0% packet loss, time 0ms
# rtt min/avg/max/mdev = 0.037/0.037/0.037/0.000 msPacket count for test_team2 was 121
# Waiting for eth0 in ns2-yYZzD5 to stop receiving
# Packet count for eth0 was 0Packet count for eth0 was 0
# Packet count for eth1 was 243
# Waiting for eth1 in ns2-yYZzD5 to stop receiving
# Packet count for eth1 was 0Packet count for eth0 was 365
# Packet count for eth1 was 0
# TEST: teamd active backup runner test                               [ OK ]
# Tearing down two-link aggregation
# Failed to kill daemon: Timer expired
#
not ok 1 selftests: drivers/net/team: teamd_activebackup.sh # TIMEOUT 45 seconds


Retry:

make: Entering directory '/srv/vmksft/testing/wt-9/tools/testing/selftests'
make[1]: Nothing to be done for 'all'.
TAP version 13
1..1
# timeout set to 45
# selftests: drivers/net/team: teamd_activebackup.sh
# Setting up two-link aggregation for runner activebackup
# Teamd version is: teamd 1.32
# Conf files are /tmp/tmp.0pmbsXgdH5 and /tmp/tmp.ehbGB6jJTZ
# This program is not intended to be run as root.
# This program is not intended to be run as root.
# Created team devices
# Teamd PIDs are 1314 and 1318
# PING fd00::2 (fd00::2) 56 data bytes
# 64 bytes from fd00::2: icmp_seq=1 ttl=64 time=0.032 ms
# 
# --- fd00::2 ping statistics ---
# 1 packets transmitted, 1 received, 0% packet loss, time 0ms
# rtt min/avg/max/mdev = 0.032/0.032/0.032/0.000 msPacket count for test_team2 was 121
# Waiting for eth0 in ns2-H0Yrq8 to stop receiving
# Packet count for eth0 was 0Packet count for eth0 was 0
# Packet count for eth1 was 243
# Waiting for eth1 in ns2-H0Yrq8 to stop receiving
# Packet count for eth1 was 0Packet count for eth0 was 366
# Packet count for eth1 was 0
# TEST: teamd active backup runner test                               [ OK ]
# Tearing down two-link aggregation
# Failed to kill daemon: Timer expired
#
not ok 1 selftests: drivers/net/team: teamd_activebackup.sh # TIMEOUT 45 seconds
Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
Posted by Marc Harvey 2 months, 1 week ago
On Wed, Apr 8, 2026 at 9:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> It pains me to report on non-debug kernels:

I'm sorry to have pained you. Despite my best efforts to run with the
exact same environment and conditions as your CI, my teamd can be
killed with "teamd -k" but yours hangs (both are version 1.32 on
Fedora with the same kernel config). For v7, I’ll invoke "teamd -k"
using the timeout utility, or just increase the test timeout.
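
A minimal sketch of the first option, assuming the test knows the team
device name and the teamd PID (the function name and arguments are
hypothetical): bound "teamd -k" with the timeout utility and fall back
to SIGKILL if it does not return in time.

```shell
#!/bin/sh
# Hypothetical helper: try a graceful "teamd -k", bounded by timeout(1),
# then fall back to SIGKILL.
#   $1 = team device, $2 = teamd PID, $3 = seconds to wait for "teamd -k"
stop_teamd() {
	if ! timeout "$3" teamd -k -t "$1"; then
		# Graceful kill failed or timed out; force the daemon down.
		kill -KILL "$2" 2>/dev/null
	fi
}

# Example: stop_teamd test_team1 "$teamd_pid" 10
```

Note that the fallback only removes the daemon; any team device it left
behind still has to be cleaned up separately.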
Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
Posted by Kuniyuki Iwashima 2 months, 1 week ago
From: Marc Harvey <marcharvey@google.com>
Date: Wed, 8 Apr 2026 17:10:05 -0700
> On Wed, Apr 8, 2026 at 9:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > It pains me to report on non-debug kernels:
> 
> I'm sorry to have pained you. Despite my best efforts to run with the
> exact same environment and conditions as your CI, my teamd can be
> killed with "teamd -k" but yours hangs (both are version 1.32 on
> Fedora with the same kernel config).

Considering the subsequent "kill" works on the dbg instance (thanks
to the 2400s timeout), I guess teamd is somehow stuck in its SIGTERM
handling, removing team devices in teamd_port_remove_all().  (SIGTERM
being masked sounds unlikely)

https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding-dbg/results/593802/4-teamd-activebackup-sh/stdout
https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding-dbg/results/593802/4-teamd-activebackup-sh/stderr
---8<---
[  759.819815][T21724] test_team1: Port device eth1 removed
[  759.822323][T21724] test_team1: Port device eth0 removed
[  790.615687][T21728] test_team2: Port device eth1 removed
[  790.617445][T21728] test_team2: Port device eth0 removed
---8<---

Adding -N and letting "ip netns del" release the last netns refcnt
and defer device destruction to cleanup_net() may help.
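
A sketch of that idea, with hypothetical helper names and arguments:
start teamd with -N (--no-quit-destroy) so it leaves the team device in
place when it exits, then delete the namespace and let the kernel tear
the devices down. Whether this actually avoids the hang is untested here.

```shell
#!/bin/sh
# Hypothetical helper: start teamd as a daemon in netns $1 with config
# file $2, without destroying the team device on exit (-N).
start_teamd_keep_dev() {
	ip netns exec "$1" teamd -d -N -f "$2"
}

# Hypothetical helper: stop teamd for device $2 in netns $1, then delete
# the namespace so device destruction is left to the kernel.
teardown_by_netns() {
	ip netns exec "$1" teamd -k -t "$2" || true
	ip netns del "$1"
}
```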


> For v7, I’ll invoke "teamd -k"
> using the timeout utility, or just increase the test timeout.

+1 for the latter, maybe set timeout=300.

daemon_pid_file_kill_wait(SIGTERM, 30) * 2 = 120s, but just in case.

See these files for howto:

  $ find tools/testing/selftests/net/ -name settings
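
For reference, a kselftest settings file is a plain key=value file in
the test directory. A minimal sketch of creating one follows; the helper
name is hypothetical, and the directory in the example is assumed from
the diffstat of this series.

```shell
#!/bin/sh
# Hypothetical helper: write a kselftest "settings" file that raises the
# per-test timeout.
#   $1 = selftest directory, $2 = timeout in seconds
write_timeout_setting() {
	printf 'timeout=%s\n' "$2" > "$1/settings"
}

# Example:
# write_timeout_setting tools/testing/selftests/drivers/net/team 300
```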
Re: [PATCH net-next v6 00/10] Decouple receive and transmit enablement in team driver
Posted by Jakub Kicinski 2 months, 1 week ago
On Wed, 8 Apr 2026 17:10:05 -0700 Marc Harvey wrote:
> On Wed, Apr 8, 2026 at 9:40 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > It pains me to report on non-debug kernels:  
> 
> I'm sorry to have pained you.

To be clear, it's a compassionate pain on your behalf, I don't care :)

> Despite my best efforts to run with the exact same environment and
> conditions as your CI, my teamd can be killed with "teamd -k" but
> yours hangs (both are version 1.32 on Fedora with the same kernel
> config). For v7, I’ll invoke "teamd -k" using the timeout utility, or
> just increase the test timeout.