drivers/net/bonding/bond_main.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
This change addresses a MAC address conflict issue in failover scenarios,
similar to the problem described in commit a951bc1e6ba5 ("bonding: correct
the MAC address for 'follow' fail_over_mac policy").
In fail_over_mac=follow mode, the bonding driver expects the formerly active
slave to swap MAC addresses with the newly active slave during failover.
However, under certain conditions, two slaves may end up with the same MAC
address, which breaks this policy:
1) ip link set eth0 master bond0
-> bond0 adopts eth0's MAC address (MAC0).
2) ip link set eth1 master bond0
-> eth1 is added as a backup with its own MAC (MAC1).
3) ip link set eth0 nomaster
-> eth0 is released and restores its MAC (MAC0).
-> eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
4) ip link set eth0 master bond0
-> eth0 is re-added to bond0, now both eth0 and eth1 have MAC0.
This results in a MAC address conflict and violates the expected behavior
of the failover policy.
To fix this, we assign a random MAC address to any newly added slave if
its current MAC address matches that of the bond. The original (permanent)
MAC address is saved and will be restored when the device is released
from the bond.
This ensures that each slave has a unique MAC address during failover
transitions, preserving the integrity of the fail_over_mac=follow policy.
Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v3: set random MAC address for the new added link (Jakub Kicinski)
change the MAC address during enslave, not failover (Jay Vosburgh)
v2: use memcmp directly instead of adding a redundant helper (Jakub Kicinski)
---
drivers/net/bonding/bond_main.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 8ea183da8d53..b91ed8eb7eb7 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2118,15 +2118,26 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
* set the master's mac address to that of the first slave
*/
memcpy(ss.__data, bond_dev->dev_addr, bond_dev->addr_len);
- ss.ss_family = slave_dev->type;
- res = dev_set_mac_address(slave_dev, (struct sockaddr *)&ss,
- extack);
- if (res) {
- slave_err(bond_dev, slave_dev, "Error %d calling set_mac_address\n", res);
- goto err_restore_mtu;
- }
+ } else if (bond->params.fail_over_mac == BOND_FOM_FOLLOW &&
+ BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
+ memcmp(slave_dev->dev_addr, bond_dev->dev_addr, bond_dev->addr_len) == 0) {
+ /* Set slave to random address to avoid duplicate mac
+ * address in later fail over.
+ */
+ eth_random_addr(ss.__data);
+ } else {
+ goto skip_mac_set;
}
+ ss.ss_family = slave_dev->type;
+ res = dev_set_mac_address(slave_dev, (struct sockaddr *)&ss, extack);
+ if (res) {
+ slave_err(bond_dev, slave_dev, "Error %d calling set_mac_address\n", res);
+ goto err_restore_mtu;
+ }
+
+skip_mac_set:
+
/* set no_addrconf flag before open to prevent IPv6 addrconf */
slave_dev->priv_flags |= IFF_NO_ADDRCONF;
--
2.46.0
Hangbin Liu <liuhangbin@gmail.com> wrote:
>This change addresses a MAC address conflict issue in failover scenarios,
>similar to the problem described in commit a951bc1e6ba5 ("bonding: correct
>the MAC address for 'follow' fail_over_mac policy").
>
>In fail_over_mac=follow mode, the bonding driver expects the formerly active
>slave to swap MAC addresses with the newly active slave during failover.
>However, under certain conditions, two slaves may end up with the same MAC
>address, which breaks this policy:
>
>1) ip link set eth0 master bond0
> -> bond0 adopts eth0's MAC address (MAC0).
>
>2) ip link set eth1 master bond0
> -> eth1 is added as a backup with its own MAC (MAC1).
>
>3) ip link set eth0 nomaster
> -> eth0 is released and restores its MAC (MAC0).
> -> eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
>
>4) ip link set eth0 master bond0
> -> eth0 is re-added to bond0, now both eth0 and eth1 have MAC0.
>
>This results in a MAC address conflict and violates the expected behavior
>of the failover policy.
>
>To fix this, we assign a random MAC address to any newly added slave if
>its current MAC address matches that of the bond. The original (permanent)
>MAC address is saved and will be restored when the device is released
>from the bond.
>
>This ensures that each slave has a unique MAC address during failover
>transitions, preserving the integrity of the fail_over_mac=follow policy.
>
>Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
The code flow is a little clunky in the "if (situation one) else
if (situation two) else goto skip_mac_set" bit, but I don't really have
a better suggestion that isn't clunky in some other way.
This implementation does keep the already complicated failover
logic from becoming more complicated for this corner case.
-J
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
>---
>v3: set random MAC address for the new added link (Jakub Kicinski)
> change the MAC address during enslave, not failover (Jay Vosburgh)
>v2: use memcmp directly instead of adding a redundant helper (Jakub Kicinski)
>---
> drivers/net/bonding/bond_main.c | 25 ++++++++++++++++++-------
> 1 file changed, 18 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 8ea183da8d53..b91ed8eb7eb7 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2118,15 +2118,26 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
> * set the master's mac address to that of the first slave
> */
> memcpy(ss.__data, bond_dev->dev_addr, bond_dev->addr_len);
>- ss.ss_family = slave_dev->type;
>- res = dev_set_mac_address(slave_dev, (struct sockaddr *)&ss,
>- extack);
>- if (res) {
>- slave_err(bond_dev, slave_dev, "Error %d calling set_mac_address\n", res);
>- goto err_restore_mtu;
>- }
>+ } else if (bond->params.fail_over_mac == BOND_FOM_FOLLOW &&
>+ BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
>+ memcmp(slave_dev->dev_addr, bond_dev->dev_addr, bond_dev->addr_len) == 0) {
>+ /* Set slave to random address to avoid duplicate mac
>+ * address in later fail over.
>+ */
>+ eth_random_addr(ss.__data);
>+ } else {
>+ goto skip_mac_set;
> }
>
>+ ss.ss_family = slave_dev->type;
>+ res = dev_set_mac_address(slave_dev, (struct sockaddr *)&ss, extack);
>+ if (res) {
>+ slave_err(bond_dev, slave_dev, "Error %d calling set_mac_address\n", res);
>+ goto err_restore_mtu;
>+ }
>+
>+skip_mac_set:
>+
> /* set no_addrconf flag before open to prevent IPv6 addrconf */
> slave_dev->priv_flags |= IFF_NO_ADDRCONF;
>
>--
>2.46.0
>
---
-Jay Vosburgh, jv@jvosburgh.net
On Thu, 24 Apr 2025 16:44:52 -0700 Jay Vosburgh wrote: > The code flow is a little clunky in the "if (situation one) else > if (situation two) else goto skip_mac_set" bit, but I don't really have > a better suggestion that isn't clunky in some other way. > > This implementation does keep the already complicated failover > logic from becoming more complicated for this corner case. Any thoughts on whether we should route this as a fix or as a -next improvement? The commit under Fixes is almost old enough to drink.
Jakub Kicinski <kuba@kernel.org> wrote: >On Thu, 24 Apr 2025 16:44:52 -0700 Jay Vosburgh wrote: >> The code flow is a little clunky in the "if (situation one) else >> if (situation two) else goto skip_mac_set" bit, but I don't really have >> a better suggestion that isn't clunky in some other way. >> >> This implementation does keep the already complicated failover >> logic from becoming more complicated for this corner case. > >Any thoughts on whether we should route this as a fix or as a -next >improvement? The commit under Fixes is almost old enough to drink. I'm fine with -next, the hardware this option was originally intended for was uncommon even then (IBM POWER ehea). I'm not aware of any recent-ish devices with the issue this was solving (that multiple ports of the NIC programmed with the same MAC made the hardware cranky), so it's more of a correctness exercise in my mind. -J --- -Jay Vosburgh, jv@jvosburgh.net
© 2016 - 2026 Red Hat, Inc.