[PATCH net] bonding: use permanent address for MAC swapping if device address is same

Hangbin Liu posted 1 patch 9 months ago
There is a newer version of this series
drivers/net/bonding/bond_main.c | 9 +++++++--
include/net/bonding.h           | 8 ++++++++
2 files changed, 15 insertions(+), 2 deletions(-)
[PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 9 months ago
Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
active slave to swap MAC addresses with the newly active slave during
failover. However, the slave's MAC address can be same under certain
conditions:

1) ip link set eth0 master bond0
   bond0 adopts eth0's MAC address (MAC0).

1) ip link set eth1 master bond0
   eth1 is added as a backup with its own MAC (MAC1).

3) ip link set eth0 nomaster
   eth0 is released and restores its MAC (MAC0).
   eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

4) ip link set eth0 master bond0
   eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
   breaking the follow policy.

To resolve this issue, we need to swap the new active slave’s permanent
MAC address with the old one. The new active slave then uses the old
dev_addr, ensuring that it matches the bond address. After the fix:

5) ip link set bond0 type bond active_slave eth0
   dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
   Swap new active eth0's permanent MAC (MAC0) to eth1.
   MAC addresses remain unchanged.

6) ip link set bond0 type bond active_slave eth1
   dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
   Swap new active eth1's permanent MAC (MAC1) to eth0.
   The MAC addresses are now correctly differentiated.

Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
Reported-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_main.c | 9 +++++++--
 include/net/bonding.h           | 8 ++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index e45bba240cbc..9cc2348d4ee9 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
 			old_active = bond_get_old_active(bond, new_active);
 
 		if (old_active) {
-			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
-					  new_active->dev->addr_len);
+			if (bond_hw_addr_equal(old_active->dev->dev_addr, new_active->dev->dev_addr,
+					       new_active->dev->addr_len))
+				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
+						  new_active->dev->addr_len);
+			else
+				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
+						  new_active->dev->addr_len);
 			bond_hw_addr_copy(ss.__data,
 					  old_active->dev->dev_addr,
 					  old_active->dev->addr_len);
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 8bb5f016969f..de965c24dde0 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -463,6 +463,14 @@ static inline void bond_hw_addr_copy(u8 *dst, const u8 *src, unsigned int len)
 	memcpy(dst, src, len);
 }
 
+static inline bool bond_hw_addr_equal(const u8 *dst, const u8 *src, unsigned int len)
+{
+	if (len == ETH_ALEN)
+		return ether_addr_equal(dst, src);
+	else
+		return (memcmp(dst, src, len) == 0);
+}
+
 #define BOND_PRI_RESELECT_ALWAYS	0
 #define BOND_PRI_RESELECT_BETTER	1
 #define BOND_PRI_RESELECT_FAILURE	2
-- 
2.46.0

Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Paolo Abeni 8 months, 2 weeks ago
On 3/19/25 9:09 AM, Hangbin Liu wrote:
> Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
> fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
> active slave to swap MAC addresses with the newly active slave during
> failover. However, the slave's MAC address can be same under certain
> conditions:
> 
> 1) ip link set eth0 master bond0
>    bond0 adopts eth0's MAC address (MAC0).
> 
> 1) ip link set eth1 master bond0
>    eth1 is added as a backup with its own MAC (MAC1).
> 
> 3) ip link set eth0 nomaster
>    eth0 is released and restores its MAC (MAC0).
>    eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

It was not immediately clear to me that the mac-dance in the code below
happens only at failover time.

I second Jakub's doubt, I think it would be better to change eth0 mac
address here (possibly to permanent eth1 mac, to preserve some consistency?)

Doing that in ndo_del_slave() should allow bonding to change the mac
while still owning the old slave and avoid races with user-space.

WDYT?

Thanks,

Paolo
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 8 months, 2 weeks ago
On Thu, Apr 03, 2025 at 04:02:33PM +0200, Paolo Abeni wrote:
> On 3/19/25 9:09 AM, Hangbin Liu wrote:
> > Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
> > fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
> > active slave to swap MAC addresses with the newly active slave during
> > failover. However, the slave's MAC address can be same under certain
> > conditions:
> > 
> > 1) ip link set eth0 master bond0
> >    bond0 adopts eth0's MAC address (MAC0).
> > 
> > 1) ip link set eth1 master bond0
> >    eth1 is added as a backup with its own MAC (MAC1).
> > 
> > 3) ip link set eth0 nomaster
> >    eth0 is released and restores its MAC (MAC0).
> >    eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
> 
> It was not immediately clear to me that the mac-dance in the code below
> happens only at failover time.
> 
> I second Jakub's doubt, I think it would be better to change eth0 mac
> address here (possibly to permanent eth1 mac, to preserve some consistency?)

I have talked about one of the duplicate mac issue with Jay before [1]. We
decided to print a warning for that. I will discuss with Jay for this one
in one new patch thread.

[1] https://lore.kernel.org/netdev/Z49yXz1dx2ZzqhC1@fedora

Thanks
Hangbin
> 
> Doing that in ndo_del_slave() should allow bonding to change the mac
> while still owning the old slave and avoid races with user-space.
> 
> WDYT?
> 
> Thanks,
> 
> Paolo
>
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Jakub Kicinski 8 months, 3 weeks ago
On Wed, 19 Mar 2025 08:09:47 +0000 Hangbin Liu wrote:
> Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
> fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
> active slave to swap MAC addresses with the newly active slave during
> failover. However, the slave's MAC address can be same under certain
> conditions:
> 
> 1) ip link set eth0 master bond0
>    bond0 adopts eth0's MAC address (MAC0).
> 
> 1) ip link set eth1 master bond0

nit: 2)

>    eth1 is added as a backup with its own MAC (MAC1).
> 
> 3) ip link set eth0 nomaster
>    eth0 is released and restores its MAC (MAC0).
>    eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.

I don't know much about bonding, but this seems like a problem already
to me. Assuming both eth0 and eth1 are on the same segment we now have
two interfaces with the same MAC on the network. Shouldn't we override
the address of eth0 to a random one when it leaves?

> 4) ip link set eth0 master bond0
>    eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
>    breaking the follow policy.
> 
> To resolve this issue, we need to swap the new active slave’s permanent
> MAC address with the old one. The new active slave then uses the old
> dev_addr, ensuring that it matches the bond address. After the fix:
> 
> 5) ip link set bond0 type bond active_slave eth0
>    dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
>    Swap new active eth0's permanent MAC (MAC0) to eth1.
>    MAC addresses remain unchanged.
> 
> 6) ip link set bond0 type bond active_slave eth1
>    dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
>    Swap new active eth1's permanent MAC (MAC1) to eth0.
>    The MAC addresses are now correctly differentiated.
> 
> Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
> Reported-by: Liang Li <liali@redhat.com>
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
>  drivers/net/bonding/bond_main.c | 9 +++++++--
>  include/net/bonding.h           | 8 ++++++++
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e45bba240cbc..9cc2348d4ee9 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
>  			old_active = bond_get_old_active(bond, new_active);
>  
>  		if (old_active) {
> -			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
> -					  new_active->dev->addr_len);
> +			if (bond_hw_addr_equal(old_active->dev->dev_addr, new_active->dev->dev_addr,
> +					       new_active->dev->addr_len))
> +				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
> +						  new_active->dev->addr_len);
> +			else
> +				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
> +						  new_active->dev->addr_len);
>  			bond_hw_addr_copy(ss.__data,
>  					  old_active->dev->dev_addr,
>  					  old_active->dev->addr_len);
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index 8bb5f016969f..de965c24dde0 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -463,6 +463,14 @@ static inline void bond_hw_addr_copy(u8 *dst, const u8 *src, unsigned int len)
>  	memcpy(dst, src, len);
>  }
>  
> +static inline bool bond_hw_addr_equal(const u8 *dst, const u8 *src, unsigned int len)
> +{
> +	if (len == ETH_ALEN)
> +		return ether_addr_equal(dst, src);
> +	else
> +		return (memcmp(dst, src, len) == 0);

looks like this is on ctrl path, just always use memcmp directly ?
not sure if this helper actually.. helps.

> +}
> +
>  #define BOND_PRI_RESELECT_ALWAYS	0
>  #define BOND_PRI_RESELECT_BETTER	1
>  #define BOND_PRI_RESELECT_FAILURE	2
-- 
pw-bot: cr
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 8 months, 3 weeks ago
On Tue, Mar 25, 2025 at 06:24:16AM -0700, Jakub Kicinski wrote:
> > 1) ip link set eth1 master bond0
> 
> nit: 2)
> 
> >    eth1 is added as a backup with its own MAC (MAC1).
> > 
> > 3) ip link set eth0 nomaster
> >    eth0 is released and restores its MAC (MAC0).
> >    eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
> 
> I don't know much about bonding, but this seems like a problem already
> to me. Assuming both eth0 and eth1 are on the same segment we now have
> two interfaces with the same MAC on the network. Shouldn't we override
> the address of eth0 to a random one when it leaves?

Can we change an interface mac to random value after leaving bond's control?
It looks may break user's other configures.

> 
> > 4) ip link set eth0 master bond0
> >    eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
> >    breaking the follow policy.
> > 
> > To resolve this issue, we need to swap the new active slave’s permanent
> > MAC address with the old one. The new active slave then uses the old
> > dev_addr, ensuring that it matches the bond address. After the fix:
> > 
> > 5) ip link set bond0 type bond active_slave eth0
> >    dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
> >    Swap new active eth0's permanent MAC (MAC0) to eth1.
> >    MAC addresses remain unchanged.
> > 
> > 6) ip link set bond0 type bond active_slave eth1
> >    dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
> >    Swap new active eth1's permanent MAC (MAC1) to eth0.
> >    The MAC addresses are now correctly differentiated.
> > 
> > Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
> > Reported-by: Liang Li <liali@redhat.com>
> > Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> > ---
> >  drivers/net/bonding/bond_main.c | 9 +++++++--
> >  include/net/bonding.h           | 8 ++++++++
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index e45bba240cbc..9cc2348d4ee9 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
> >  			old_active = bond_get_old_active(bond, new_active);
> >  
> >  		if (old_active) {
> > -			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
> > -					  new_active->dev->addr_len);
> > +			if (bond_hw_addr_equal(old_active->dev->dev_addr, new_active->dev->dev_addr,
> > +					       new_active->dev->addr_len))
> > +				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
> > +						  new_active->dev->addr_len);
> > +			else
> > +				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
> > +						  new_active->dev->addr_len);
> >  			bond_hw_addr_copy(ss.__data,
> >  					  old_active->dev->dev_addr,
> >  					  old_active->dev->addr_len);
> > diff --git a/include/net/bonding.h b/include/net/bonding.h
> > index 8bb5f016969f..de965c24dde0 100644
> > --- a/include/net/bonding.h
> > +++ b/include/net/bonding.h
> > @@ -463,6 +463,14 @@ static inline void bond_hw_addr_copy(u8 *dst, const u8 *src, unsigned int len)
> >  	memcpy(dst, src, len);
> >  }
> >  
> > +static inline bool bond_hw_addr_equal(const u8 *dst, const u8 *src, unsigned int len)
> > +{
> > +	if (len == ETH_ALEN)
> > +		return ether_addr_equal(dst, src);
> > +	else
> > +		return (memcmp(dst, src, len) == 0);
> 
> looks like this is on ctrl path, just always use memcmp directly ?
> not sure if this helper actually.. helps.

This is just to align with bond_hw_addr_copy(). If you think it's not help.
I can use memcmp() directly.

Thanks
Hangbin
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Jakub Kicinski 8 months, 3 weeks ago
On Wed, 26 Mar 2025 10:22:58 +0000 Hangbin Liu wrote:
> > I don't know much about bonding, but this seems like a problem already
> > to me. Assuming both eth0 and eth1 are on the same segment we now have
> > two interfaces with the same MAC on the network. Shouldn't we override
> > the address of eth0 to a random one when it leaves?  
> 
> Can we change an interface mac to random value after leaving bond's control?
> It looks may break user's other configures.

Hard to speculate but leaving two interfaces with the same MAC is even
worse? I guess nobody hit this problem in practice.

> > looks like this is on ctrl path, just always use memcmp directly ?
> > not sure if this helper actually.. helps.  
> 
> This is just to align with bond_hw_addr_copy(). If you think it's not help.
> I can use memcmp() directly.

Yes, I don't think it helps.
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 8 months, 3 weeks ago
On Wed, Mar 26, 2025 at 04:32:13AM -0700, Jakub Kicinski wrote:
> On Wed, 26 Mar 2025 10:22:58 +0000 Hangbin Liu wrote:
> > > I don't know much about bonding, but this seems like a problem already
> > > to me. Assuming both eth0 and eth1 are on the same segment we now have
> > > two interfaces with the same MAC on the network. Shouldn't we override
> > > the address of eth0 to a random one when it leaves?  
> > 
> > Can we change an interface mac to random value after leaving bond's control?
> > It looks may break user's other configures.
> 
> Hard to speculate but leaving two interfaces with the same MAC is even
> worse? I guess nobody hit this problem in practice.

It's the default behavior for bonding as bond will take the first link's
MAC address. If release the first link, then we will have 2 interfaces (bond
and first link) has same MAC address.

Usually, People doesn't release the link after adding to bond, but it do happens.

Hi Jay, any comments?

Thanks
Hangbin
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Jay Vosburgh 9 months ago
Hangbin Liu <liuhangbin@gmail.com> wrote:

>Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
>fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
>active slave to swap MAC addresses with the newly active slave during
>failover. However, the slave's MAC address can be same under certain
>conditions:
>
>1) ip link set eth0 master bond0
>   bond0 adopts eth0's MAC address (MAC0).
>
>1) ip link set eth1 master bond0
>   eth1 is added as a backup with its own MAC (MAC1).
>
>3) ip link set eth0 nomaster
>   eth0 is released and restores its MAC (MAC0).
>   eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
>
>4) ip link set eth0 master bond0
>   eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
>   breaking the follow policy.

	Are all of these steps necessary, or does the issue happen if a
new interface (not previously part of the bond) is added to the bond
with its MAC set to whatever the bond's MAC is?

	Did this come up in practise somewhere, or through inspection /
testing?  I'm curious as I'd expect usage of this option today would be
rare, as I hope that current hardware wouldn't have the "MAC assigned to
multiple ports" issues that led to the "follow" logic.  If memory
serves, the issue arose originally in the ehea network device (on IBM
POWER), which I believe is out of production now for some years.

>To resolve this issue, we need to swap the new active slave’s permanent
>MAC address with the old one. The new active slave then uses the old
>dev_addr, ensuring that it matches the bond address. After the fix:
>
>5) ip link set bond0 type bond active_slave eth0
>   dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
>   Swap new active eth0's permanent MAC (MAC0) to eth1.
>   MAC addresses remain unchanged.
>
>6) ip link set bond0 type bond active_slave eth1
>   dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
>   Swap new active eth1's permanent MAC (MAC1) to eth0.
>   The MAC addresses are now correctly differentiated.

	An alternative solution could be to disallow adding a new
interface in "follow" mode if its MAC matches the active interface of
the bond.  If this patch is more of an correctness exercise rather than
something found out in the world impacting production deployments, it
might be better to keep the MAC swapping logic in the failover code
simpler.

	-J

>Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
>Reported-by: Liang Li <liali@redhat.com>
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 9 +++++++--
> include/net/bonding.h           | 8 ++++++++
> 2 files changed, 15 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index e45bba240cbc..9cc2348d4ee9 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
> 			old_active = bond_get_old_active(bond, new_active);
> 
> 		if (old_active) {
>-			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
>-					  new_active->dev->addr_len);
>+			if (bond_hw_addr_equal(old_active->dev->dev_addr, new_active->dev->dev_addr,
>+					       new_active->dev->addr_len))
>+				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
>+						  new_active->dev->addr_len);
>+			else
>+				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
>+						  new_active->dev->addr_len);
> 			bond_hw_addr_copy(ss.__data,
> 					  old_active->dev->dev_addr,
> 					  old_active->dev->addr_len);
>diff --git a/include/net/bonding.h b/include/net/bonding.h
>index 8bb5f016969f..de965c24dde0 100644
>--- a/include/net/bonding.h
>+++ b/include/net/bonding.h
>@@ -463,6 +463,14 @@ static inline void bond_hw_addr_copy(u8 *dst, const u8 *src, unsigned int len)
> 	memcpy(dst, src, len);
> }
> 
>+static inline bool bond_hw_addr_equal(const u8 *dst, const u8 *src, unsigned int len)
>+{
>+	if (len == ETH_ALEN)
>+		return ether_addr_equal(dst, src);
>+	else
>+		return (memcmp(dst, src, len) == 0);
>+}
>+
> #define BOND_PRI_RESELECT_ALWAYS	0
> #define BOND_PRI_RESELECT_BETTER	1
> #define BOND_PRI_RESELECT_FAILURE	2
>-- 
>2.46.0

---
	-Jay Vosburgh, jv@jvosburgh.net
Re: [PATCH net] bonding: use permanent address for MAC swapping if device address is same
Posted by Hangbin Liu 8 months, 4 weeks ago
On Thu, Mar 20, 2025 at 12:13:14PM -0700, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@gmail.com> wrote:
> 
> >Similar with a951bc1e6ba5 ("bonding: correct the MAC address for "follow"
> >fail_over_mac policy"). The fail_over_mac follow mode requires the formerly
> >active slave to swap MAC addresses with the newly active slave during
> >failover. However, the slave's MAC address can be same under certain
> >conditions:
> >
> >1) ip link set eth0 master bond0
> >   bond0 adopts eth0's MAC address (MAC0).
> >
> >1) ip link set eth1 master bond0
> >   eth1 is added as a backup with its own MAC (MAC1).
> >
> >3) ip link set eth0 nomaster
> >   eth0 is released and restores its MAC (MAC0).
> >   eth1 becomes the active slave, and bond0 assigns MAC0 to eth1.
> >
> >4) ip link set eth0 master bond0
> >   eth0 is re-added to bond0, but both eth0 and eth1 now have MAC0,
> >   breaking the follow policy.
> 
> 	Are all of these steps necessary, or does the issue happen if a
> new interface (not previously part of the bond) is added to the bond
> with its MAC set to whatever the bond's MAC is?

Yes, any new interface(usually the previous first bond link) with MAC
same to bond will has this issue.

> 
> 	Did this come up in practise somewhere, or through inspection /
> testing?  I'm curious as I'd expect usage of this option today would be
> rare, as I hope that current hardware wouldn't have the "MAC assigned to
> multiple ports" issues that led to the "follow" logic.  If memory
> serves, the issue arose originally in the ehea network device (on IBM
> POWER), which I believe is out of production now for some years.

This issue has also been reported by OCP users[1] with NetworkManager setups.
When the bond is controlled by NetworkManager, using nmcli to set a slave
link down will remove the slave from the bond and bring the interface down.
Similarly, using nmcli to bring the link up will add the slave back to the bond.

This is the default link operation logic of NetworkManager, and there is
little possibility of changing it. However, this behavior makes the MAC
address duplication issue much easier to trigger.

[1] https://github.com/openshift/machine-config-operator/pull/4609

> >To resolve this issue, we need to swap the new active slave’s permanent
> >MAC address with the old one. The new active slave then uses the old
> >dev_addr, ensuring that it matches the bond address. After the fix:
> >
> >5) ip link set bond0 type bond active_slave eth0
> >   dev_addr is the same, swap old active eth1's MAC (MAC0) with eth0.
> >   Swap new active eth0's permanent MAC (MAC0) to eth1.
> >   MAC addresses remain unchanged.
> >
> >6) ip link set bond0 type bond active_slave eth1
> >   dev_addr is the same, swap the old active eth0's MAC (MAC0) with eth1.
> >   Swap new active eth1's permanent MAC (MAC1) to eth0.
> >   The MAC addresses are now correctly differentiated.
> 
> 	An alternative solution could be to disallow adding a new
> interface in "follow" mode if its MAC matches the active interface of
> the bond.  If this patch is more of an correctness exercise rather than
> something found out in the world impacting production deployments, it
> might be better to keep the MAC swapping logic in the failover code
> simpler.

With the NetworkManager operations situation. It looks we can't simply
disallow adding the interface back to bond if its MAC the same with active
interface of bond.

Thanks
Hangbin
> 
> 	-J
> 
> >Fixes: 3915c1e8634a ("bonding: Add "follow" option to fail_over_mac")
> >Reported-by: Liang Li <liali@redhat.com>
> >Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> >---
> > drivers/net/bonding/bond_main.c | 9 +++++++--
> > include/net/bonding.h           | 8 ++++++++
> > 2 files changed, 15 insertions(+), 2 deletions(-)
> >
> >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> >index e45bba240cbc..9cc2348d4ee9 100644
> >--- a/drivers/net/bonding/bond_main.c
> >+++ b/drivers/net/bonding/bond_main.c
> >@@ -1107,8 +1107,13 @@ static void bond_do_fail_over_mac(struct bonding *bond,
> > 			old_active = bond_get_old_active(bond, new_active);
> > 
> > 		if (old_active) {
> >-			bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
> >-					  new_active->dev->addr_len);
> >+			if (bond_hw_addr_equal(old_active->dev->dev_addr, new_active->dev->dev_addr,
> >+					       new_active->dev->addr_len))
> >+				bond_hw_addr_copy(tmp_mac, new_active->perm_hwaddr,
> >+						  new_active->dev->addr_len);
> >+			else
> >+				bond_hw_addr_copy(tmp_mac, new_active->dev->dev_addr,
> >+						  new_active->dev->addr_len);
> > 			bond_hw_addr_copy(ss.__data,
> > 					  old_active->dev->dev_addr,
> > 					  old_active->dev->addr_len);
> >diff --git a/include/net/bonding.h b/include/net/bonding.h
> >index 8bb5f016969f..de965c24dde0 100644
> >--- a/include/net/bonding.h
> >+++ b/include/net/bonding.h
> >@@ -463,6 +463,14 @@ static inline void bond_hw_addr_copy(u8 *dst, const u8 *src, unsigned int len)
> > 	memcpy(dst, src, len);
> > }
> > 
> >+static inline bool bond_hw_addr_equal(const u8 *dst, const u8 *src, unsigned int len)
> >+{
> >+	if (len == ETH_ALEN)
> >+		return ether_addr_equal(dst, src);
> >+	else
> >+		return (memcmp(dst, src, len) == 0);
> >+}
> >+
> > #define BOND_PRI_RESELECT_ALWAYS	0
> > #define BOND_PRI_RESELECT_BETTER	1
> > #define BOND_PRI_RESELECT_FAILURE	2
> >-- 
> >2.46.0
> 
> ---
> 	-Jay Vosburgh, jv@jvosburgh.net
>