[PATCHv2 net] bonding: use permanent address for MAC swapping if device address is same

Hangbin Liu posted 1 patch 1 week, 2 days ago
drivers/net/bonding/bond_main.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
[PATCHv2 net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 1 week, 2 days ago
Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
active slave to swap MAC addresses with the newly active slave during
failover. However, the slave's MAC address can be same under certain
conditions:

1) ip link set eth0 master bond0
   bond0 adopts eth0's MAC address (MAC0).

1) ip link set eth1 master bond0
   eth1 is added as a backup with its own MAC (MAC1).

3) ip link set eth0 nomaster
   eth0 is released and restores its MAC (MAC0).
   eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

4) ip link set eth0 master bond0
   eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
   breaking the follow policy.

To resolve this issue, we need to swap the new active slave’s permanent
MAC address with the old one. The new active slave then uses the old
dev_addr, ensuring that it matches the bond address. After the fix:

5) ip link set bond0 type bond active_slave eth0
   dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
   Swap new active eth0's permanent MAC (MAC0) to eth1.
   MAC addresses remain unchanged.

6) ip link set bond0 type bond active_slave eth1
   dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
   Swap new active eth1's permanent MAC (MAC1) to eth0.
   The MAC addresses are now correctly differentiated.

Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v2: use memcmp directly instead of adding a redundant helper (Jakub Kicinski)
---
 drivers/net/bonding/bond_main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e45bba240cbc..1e343d8fafa0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
 			old_active = bond_get_old_active(bond, new_active);
 
 		if (old_active) {
-			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
-					  new_active->dev->addr_len);
+			if (memcmp(old_active->dev->dev_addr, new_active->dev->dev_addr,
+				   new_active->dev->addr_len) == 0)
+				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
+						  new_active->dev->addr_len);
+			else
+				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
+						  new_active->dev->addr_len);
 			bond_hw_addr_copy(ss.__data,
 					  old_active->dev->dev_addr,
 					  old_active->dev->addr_len);
-- 
2.46.0

Re: [PATCHv2 net] bonding: use permanent address for MAC swapping if device address is same
Posted by Jay Vosburgh 5 days, 21 hours ago
Hangbin Liu <liuhangbin@gmail.com> wrote:

>Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
>fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
>active slave to swap MAC addresses with the newly active slave during
>failover. However, the slave's MAC address can be same under certain
>conditions:
>
>1) ip link set eth0 master bond0
>   bond0 adopts eth0's MAC address (MAC0).
>
>1) ip link set eth1 master bond0
>   eth1 is added as a backup with its own MAC (MAC1).
>
>3) ip link set eth0 nomaster
>   eth0 is released and restores its MAC (MAC0).
>   eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

	This step leaves both the bond+eth1 and the independent eth0
using the same MAC address.  There is a warning printed for this, and
allowing the duplicated MAC address assignment has been the behavior for
a very long time, and to my knowledge hasn't caused issues (I presume
because swapping interfaces in and out of a bond willy nilly doesn't
happen much outside of test cases).

[ pasting in Paolo's comment from the other thread ]

Paolo says:
>It was not immediately clear to me that the mac-dance in the code below
>happens only at failover time.

	"Failover" may be a misnomer here; the dance happens when the
active interface changes, which for this scenario occurs at either a
failover (link went down, etc) or when the active interface is removed
from the bond.

Paolo says:
>I second Jakub's doubt, I think it would be better to change eth0 mac
>address here (possibly to permanent eth1 mac, to preserve some consistency?)

	This would cause the MAC of the bond itself to change (for the
interface removal case at issue here), which is not the current behavior
for fail_over_mac=follow.  I'm not sure how often that's a dependency in
practice, but it would be a change of long-standing behavior.

Paolo says:
>Doing that in ndo_del_slave() should allow bonding to change the mac
>while still owning the old slave and avoid races with user-space.
>
>WDYT?

[ end of Paolo's comment ]

>4) ip link set eth0 master bond0
>   eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
>   breaking the follow policy.
>
>To resolve this issue, we need to swap the new active slave’s permanent
>MAC address with the old one. The new active slave then uses the old
>dev_addr, ensuring that it matches the bond address. After the fix:

	Which interface is the "new active" in this situation?  Adding
eth0 back into the bond should not cause a change of active, eth0 would
be added as a backup.

>5) ip link set bond0 type bond active_slave eth0
>   dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
>   Swap new active eth0's permanent MAC (MAC0) to eth1.
>   MAC addresses remain unchanged.

	So this patch's change wouldn't actually resolve the MAC
conflict until a failover takes place?  I.e., if we only do step 4 but
not step 5 or 6, eth0 and eth1 will both have the same MAC address.  Am
I understanding correctly?

	-J

>6) ip link set bond0 type bond active_slave eth1
>   dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
>   Swap new active eth1's permanent MAC (MAC1) to eth0.
>   The MAC addresses are now correctly differentiated.
>
>Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>---
>v2: use memcmp directly instead of adding a redundant helper (Jakub Kicinski)
>---
> drivers/net/bonding/bond_main.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index e45bba240cbc..1e343d8fafa0 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
> 			old_active = bond_get_old_active(bond, new_active);
> 
> 		if (old_active) {
>-			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
>-					  new_active->dev->addr_len);
>+			if (memcmp(old_active->dev->dev_addr, new_active->dev->dev_addr,
>+				   new_active->dev->addr_len) == 0)
>+				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
>+						  new_active->dev->addr_len);
>+			else
>+				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
>+						  new_active->dev->addr_len);
> 			bond_hw_addr_copy(ss.__data,
> 					  old_active->dev->dev_addr,
> 					  old_active->dev->addr_len);
>-- 
>2.46.0
>

---
	-Jay Vosburgh, jv@jvosburgh.net
Re: [PATCHv2 net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 3 days, 9 hours ago
On Fri, Apr 04, 2025 at 02:36:39PM -0700, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@gmail.com> wrote:
> 
> >Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
> >fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
> >active slave to swap MAC addresses with the newly active slave during
> >failover. However, the slave's MAC address can be same under certain
> >conditions:
> >
> >1) ip link set eth0 master bond0
> >   bond0 adopts eth0's MAC address (MAC0).
> >
> >1) ip link set eth1 master bond0
> >   eth1 is added as a backup with its own MAC (MAC1).
> >
> >3) ip link set eth0 nomaster
> >   eth0 is released and restores its MAC (MAC0).
> >   eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
> 
> 	This step leaves both the bond+eth1 and the independent eth0
> using the same MAC address.  There is a warning printed for this, and
> allowing the duplicated MAC address assignment has been the behavior for
> a very long time, and to my knowledge hasn't caused issues (I presume
> because swapping interfaces in and out of a bond willy nilly doesn't
> happen much outside of test cases).

Yes, until the NetworkManager become the default interface configuration tool
on some release. When set a slave down, the nmcli will remove the interface
from bond... This causes the issue to be triggered more often.

> >4) ip link set eth0 master bond0
> >   eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
> >   breaking the follow policy.
> >
> >To resolve this issue, we need to swap the new active slave’s permanent
> >MAC address with the old one. The new active slave then uses the old
> >dev_addr, ensuring that it matches the bond address. After the fix:
> 
> 	Which interface is the "new active" in this situation?  Adding
> eth0 back into the bond should not cause a change of active, eth0 would
> be added as a backup.

When do fail-over, the "new active" literally. E.g.

> >5) ip link set bond0 type bond active_slave eth0
> >   dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
> >   Swap new active eth0's permanent MAC (MAC0) to eth1.
> >   MAC addresses remain unchanged.

The new active slave is eth0 here.
> 
> 
> 	So this patch's change wouldn't actually resolve the MAC
> conflict until a failover takes place?  I.e., if we only do step 4 but
> not step 5 or 6, eth0 and eth1 will both have the same MAC address.  Am
> I understanding correctly?

Yes, you are right. At step 4, there is no failover, so eth0 is still using
it's own mac address. How about set the mac at enslave time, with this we
can get correct mac directly. e.g.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 950d8e4d86f8..0d4e1ddd900d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2120,6 +2120,24 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 			slave_err(bond_dev, slave_dev, "Error %d calling set_mac_address\n", res);
 			goto err_restore_mtu;
 		}
+	} else if (bond->params.fail_over_mac == BOND_FOM_FOLLOW &&
+		   BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP &&
+		   memcmp(slave_dev->dev_addr, bond_dev->dev_addr, bond_dev->addr_len) == 0) {
+		/* Set slave to current active slave's permanent mac address to
+		 * avoid duplicate mac address.
+		 */
+		curr_active_slave = rcu_dereference(bond->curr_active_slave);
+		if (curr_active_slave) {
+			memcpy(ss.__data, curr_active_slave->perm_hwaddr,
+			       curr_active_slave->dev->addr_len);
+			ss.ss_family = slave_dev->type;
+			res = dev_set_mac_address(slave_dev, (struct sockaddr *)&ss,
+					extack);
+			if (res) {
+				slave_err(bond_dev, slave_dev, "Error %d calling set_mac_address\n", res);
+				goto err_restore_mtu;
+			}
+		}
 	}

Thanks
Hangbin