For the shared MDIO bus use case, multiple MACs share the same MDIO
bus, so all of these MACs depend on it. If the shared MDIO bus is
removed, all the PHY devices attached to it are removed as well. After
that, the MAC driver must not access the PHY device: the corresponding
phydev and mii_bus have been freed, so the pointers to them are invalid
and dereferencing them can crash.

For example, Abhishek reported a crash that occurred when the MDIO bus
driver was removed first, followed by the MAC driver. The crash log is
below.
Call trace:
__list_del_entry_valid_or_report+0xa8/0xe0
__device_link_del+0x40/0xf0
device_link_put_kref+0xb4/0xc8
device_link_del+0x38/0x58
phy_detach+0x2c/0x170
phy_disconnect+0x4c/0x70
phylink_disconnect_phy+0x6c/0xc0 [phylink]
stmmac_release+0x60/0x358 [stmmac]
Another example is the i.MX95-15x15 platform, which has two ENETC ports.
When all the external PHYs are managed by the EMDIO (the MDIO
controller) and the enetc driver is removed after the EMDIO driver,
users see the crash log below and the console hangs.
Call trace:
_phy_state_machine+0x230/0x36c (P)
phy_stop+0x74/0x190
phylink_stop+0x28/0xb8
enetc_close+0x28/0x8c
__dev_close_many+0xb4/0x1d8
netif_close_many+0x8c/0x13c
enetc4_pf_remove+0x2c/0x84
pci_device_remove+0x44/0xe8
To address this issue, Sarosh Hasan tried to change the devlink flag to
DL_FLAG_AUTOREMOVE_SUPPLIER [1], so that the MAC driver is removed
along with the PHY driver. However, that solution does not take
hot-swappable PHY devices (SFP PHYs) into account: when such a PHY is
unplugged, the MAC driver is automatically removed, which is not the
expected behavior. This issue should not exist for SFP PHYs, so, based
on Sarosh's patch, the flag is changed to DL_FLAG_AUTOREMOVE_SUPPLIER
only for non-SFP PHYs.
Reported-by: Abhishek Chauhan (ABC) <quic_abchauha@quicinc.com>
Closes: https://lore.kernel.org/all/d696a426-40bb-4c1a-b42d-990fb690de5e@quicinc.com/
Link: https://lore.kernel.org/imx/20250703090041.23137-1-quic_sarohasa@quicinc.com/ # [1]
Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
Suggested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
v2:
1. Change the subject and update the commit message
2. Based on Maxime's suggestion, only set DL_FLAG_AUTOREMOVE_SUPPLIER
flag for non-SFP PHYs.
v1 link: https://lore.kernel.org/imx/20260126104409.1070403-1-wei.fang@nxp.com/
---
drivers/net/phy/phy_device.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 81984d4ebb7c..0494ab58ceaf 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1771,9 +1771,17 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 	 * another mac interface, so we should create a device link between
 	 * phy dev and mac dev.
 	 */
-	if (dev && phydev->mdio.bus->parent && dev->dev.parent != phydev->mdio.bus->parent)
-		phydev->devlink = device_link_add(dev->dev.parent, &phydev->mdio.dev,
-						  DL_FLAG_PM_RUNTIME | DL_FLAG_STATELESS);
+	if (dev && bus->parent && dev->dev.parent != bus->parent) {
+		if (phy_on_sfp(phydev))
+			phydev->devlink = device_link_add(dev->dev.parent,
+							  &phydev->mdio.dev,
+							  DL_FLAG_PM_RUNTIME |
+							  DL_FLAG_STATELESS);
+		else
+			device_link_add(dev->dev.parent, &phydev->mdio.dev,
+					DL_FLAG_PM_RUNTIME |
+					DL_FLAG_AUTOREMOVE_SUPPLIER);
+	}
 	return err;
--
2.34.1
Hi Wei,
On 02/02/2026 06:45, Wei Fang wrote:
> [...]
I gave that patch a test, with the following cases:
- On Macchiatobin (we have PHYs that share an mdiobus).
When unbinding a PHY, the MAC disappears as well:
# before:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 2048
    link/ether 00:51:82:42:42:00 brd ff:ff:ff:ff:ff:ff
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 2048
    link/ether 00:51:82:42:42:01 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 2048
    link/ether 00:51:82:42:42:02 brd ff:ff:ff:ff:ff:ff
5: eth3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 2048
    link/ether 00:51:82:42:42:03 brd ff:ff:ff:ff:ff:ff
echo f212a600.mdio-mii:08 > /sys/devices/platform/cp0-bus/cp0-bus:bus@f2000000/f212a600.mdio/mdio_bus/f212a600.mdio-mii/f212a600.mdio-mii:08/driver/unbind
The MAC interface correctly disappears, but for some reason a lot of
other interfaces disappeared as well (only eth0 is left, where I used to
have 4 different interfaces)
# after:
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 2048
    link/ether 00:51:82:42:42:00 brd ff:ff:ff:ff:ff:ff
- I also tested the SFP PHY setup on a Cyclone V platform,
and it worked as expected (i.e. MAC didn't disappear under my
feet when removing a Copper SFP, but the devlink was still created when
the module was present) :
# ls /sys/class/devlink/
mdio_bus:i2c:sfp:16--platform:ff702000.ethernet
I don't have time to investigate why my interfaces are disappearing
on mcbin, but OTOH unbinding the devices manually isn't
something I do very often... It may or may not be related to this patch.
I'll let Russell and Andrew comment more on that as I may
still miss other cases, but as far as I can tell, this looks
OK.
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Maxime
On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
> Hi Wei,
>
> On 02/02/2026 06:45, Wei Fang wrote:
> > [...]
>
> I gave that patch a test, with the following cases :
>
> - On Macchiatobin (we have PHYs that share an mdiobus).
> When unbinding a PHY, the MAC disappears as well :
Correct, this is why these band-aids are harmful. One "device" can
correspond with *multiple* network interfaces, and the loss of one
PHY can have a *very* detrimental effect.
Consider the case where root-NFS is being used, and removing a PHY
on another interface takes out the interface that root-NFS is
using. Your machine is now dead in the water.
In my opinion, we should be concentrating more on the issue behind
the oops.
Given that this problem is because of the bus being removed, one
thing that would help would be for the MDIO bus to be properly
refcounted, and when the bus is unbound, to replace the bus ops
with versions that return -ENXIO or similar under the MII bus
lock. This would be easier if the MDIO bus ops were a separate struct
to struct mii_bus.
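
A rough userspace sketch of the idea in the paragraph above, not kernel code: the accessors sit in their own ops struct, every access takes the bus lock, and unbind swaps in ops that return -ENXIO while a reference count keeps the struct alive for remaining users. All toy_* names are hypothetical, and a pthread mutex merely stands in for the MII bus lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_bus;

/* Accessors live in their own struct so they can be swapped as a unit. */
struct toy_bus_ops {
	int (*read)(struct toy_bus *bus, int addr, int reg);
};

struct toy_bus {
	pthread_mutex_t lock;           /* stands in for the MII bus lock */
	const struct toy_bus_ops *ops;  /* live ops, or dead ops after unbind */
	int refcount;                   /* protected by lock in this sketch */
};

/* Pretend hardware access: fabricate a value from address and register. */
static int toy_live_read(struct toy_bus *bus, int addr, int reg)
{
	(void)bus;
	return (addr << 8) | reg;
}

/* Used once the bus driver is gone: fail cleanly, touch no hardware. */
static int toy_dead_read(struct toy_bus *bus, int addr, int reg)
{
	(void)bus;
	(void)addr;
	(void)reg;
	return -ENXIO;
}

static const struct toy_bus_ops toy_live_ops = { .read = toy_live_read };
static const struct toy_bus_ops toy_dead_ops = { .read = toy_dead_read };

/* Allocate a bus holding one reference on behalf of the bus driver. */
struct toy_bus *toy_bus_alloc(void)
{
	struct toy_bus *bus = calloc(1, sizeof(*bus));

	if (!bus)
		return NULL;
	pthread_mutex_init(&bus->lock, NULL);
	bus->ops = &toy_live_ops;
	bus->refcount = 1;
	return bus;
}

/* A user of the bus (for example an attached PHY) takes a reference. */
void toy_bus_get(struct toy_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->refcount++;
	pthread_mutex_unlock(&bus->lock);
}

/* Drop a reference; returns 1 if it was the last one and the bus was freed. */
int toy_bus_put(struct toy_bus *bus)
{
	int last;

	pthread_mutex_lock(&bus->lock);
	assert(bus->refcount > 0);
	last = (--bus->refcount == 0);
	pthread_mutex_unlock(&bus->lock);
	if (last) {
		pthread_mutex_destroy(&bus->lock);
		free(bus);
	}
	return last;
}

/* All accesses go through the current ops, under the bus lock. */
int toy_bus_read(struct toy_bus *bus, int addr, int reg)
{
	int ret;

	pthread_mutex_lock(&bus->lock);
	ret = bus->ops->read(bus, addr, reg);
	pthread_mutex_unlock(&bus->lock);
	return ret;
}

/* Bus driver unbind: swap in the dead ops, then drop the driver's reference. */
int toy_bus_unbind(struct toy_bus *bus)
{
	pthread_mutex_lock(&bus->lock);
	bus->ops = &toy_dead_ops;
	pthread_mutex_unlock(&bus->lock);
	return toy_bus_put(bus);
}
```

In this model, a user that still holds a reference after toy_bus_unbind() gets -ENXIO from toy_bus_read() instead of touching freed memory, and the struct is released only on the last toy_bus_put().
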
Similar with the PHY itself - if the PHY is in-use, it should be
refcounted to stop the struct phy_device from going away, and
should we have the situation where the PHY driver is unbound,
phydev->drv should be set to a set of dummy ops (under the phydev
mutex and probably rtnl.)
It seems to me that throwing devlinks at this problem is giving us
more problems than it's solving.
A graceful way to handle a MAC losing its PHY is for phylib to
indicate that the PHY has gone down, rather than removing the
network interface (and potentially a whole host of other network
interfaces in the case of one struct device being associated
with many interfaces.)
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On 02/02/2026 15:25, Russell King (Oracle) wrote:
> On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
>> Hi Wei,
>>
>> On 02/02/2026 06:45, Wei Fang wrote:
>>> [...]
>>
>> I gave that patch a test, with the following cases :
>>
>> - On Macchiatobin (we have PHYs that share an mdiobus).
>> When unbinding a PHY, the MAC disappears as well :
>
> Correct, this is why these band-aids are harmful. One "device" can
> correspond with *multiple* network interfaces, and the loss of one
> PHY can have a *very* detrimental effect.
>
> Consider the case where root-NFS is being used, and removing a PHY
> on another interface takes out the interface that root-NFS is
> using. Your machine is now dead in the water.
That's what I've been seeing. I unbound one PHY, it took out 3 netdevs
and I don't have any logs regarding "why". I guess there are devlink debug
knobs for that, but they don't seem to be enabled by default.
However, we seem to have the issue even without this patch.
On MCBin, if I unbind eth1 for example, all 3 interfaces that are on CP1
are gone :
cd /sys/class/net/eth1/device/driver
echo f4000000.ethernet > unbind
only eth0 is now left. This is on net-next/main :(
For Wei's case, where unbinding netdev 1 brings down the mdio bus used
by the PHY on netdev 2, we'd also be dead in the water no matter what,
no?
> In my opinion, we should be concentrating more on the issue behind
> the oops.
>
> Given that this problem is because of the bus being removed, one
> thing that would help would be for the MDIO bus to be properly
> refcounted, and when the bus is unbound, to replace the bus ops
> with versions that return -ENXIO or similar under the MII bus
> lock. This would be easier if the MDIO bus ops were a separate struct
> to struct mii_bus.
>
> Similar with the PHY itself - if the PHY is in-use, it should be
> refcounted to stop the struct phy_device from going away, and
> should we have the situation where the PHY driver is unbound,
> phydev->drv should be set to a set of dummy ops (under the phydev
> mutex and probably rtnl.)
>
> It seems to me that throwing devlinks at this problem is giving us
> more problems than it's solving.
>
> A graceful way to handle a MAC losing its PHY is for phylib to
> indicate that the PHY has gone down, rather than removing the
> network interface (and potentially a whole host of other network
> interfaces in the case of one struct device being associated
> with many interfaces.)
>
Agreed, that's quite the can of worms though I suspect :(
Maxime
On Mon, Feb 02, 2026 at 06:38:48PM +0100, Maxime Chevallier wrote:
>
>
> On 02/02/2026 15:25, Russell King (Oracle) wrote:
> > On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
> >> Hi Wei,
> >>
> >> On 02/02/2026 06:45, Wei Fang wrote:
> >>> [...]
> >>
> >> I gave that patch a test, with the following cases :
> >>
> >> - On Macchiatobin (we have PHYs that share an mdiobus).
> >> When unbinding a PHY, the MAC disappears as well :
> >
> > Correct, this is why these band-aids are harmful. One "device" can
> > correspond with *multiple* network interfaces, and the loss of one
> > PHY can have a *very* detrimental effect.
> >
> > Consider the case where root-NFS is being used, and removing a PHY
> > on another interface takes out the interface that root-NFS is
> > using. Your machine is now dead in the water.
>
> That's what I've been seeing. I unbound one PHY, it took out 3 netdevs
> and I don't have any logs regarding "why". I guess there are devlink debug
> knobs for that, but they don't seem to be enabled by default.
See what I said above. "One "device" can correspond with *multiple*
network interfaces".
On Armada 8040, one network "device" has multiple ports - they all
share the same packet infrastructure. Each port is a separate
interface in the kernel.
Consequently, the "struct device" is common across all ports on one
of the CP110 dies (there are two dies.) If one triggers an unbind
of that struct device, then you lose *all* ports on that CP110 die
whether or not the others _could_ remain functional.
Consider a DSA switch, which has external PHYs connected. Should
unbinding one port's PHYs take out the entire switch - and in the
case of multiple switches, cause the entire switch tree to be taken
out?
This is why devlinks is a bad idea. It's too heavy handed for cases
beyond the simple "one network device per struct device" model that
doesn't exist everywhere. For simple cases, yes, maybe, but not
where it means that taking out one minor part of the system destroys
the entire system because it chose to unbind a multi-interface device.
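
To make the difference concrete, here is a small userspace model, not kernel code, of one struct device backing several interfaces. It contrasts the heavy-handed outcome (losing a PHY unbinds the whole device, so every port goes) with the graceful one described earlier in the thread (only the affected port reports link down). All toy_* names and the one-PHY-per-port layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PORTS 4

struct toy_port {
	bool registered;  /* the network interface exists */
	bool carrier;     /* the link is reported as up */
	int phy_id;       /* PHY serving this port */
};

/* One driver-model device that backs several network interfaces. */
struct toy_mac_dev {
	bool bound;
	size_t nports;
	struct toy_port ports[TOY_MAX_PORTS];
};

/* Bind the device with nports interfaces; port i uses PHY i. */
void toy_mac_bind(struct toy_mac_dev *mac, size_t nports)
{
	assert(nports <= TOY_MAX_PORTS);
	mac->bound = true;
	mac->nports = nports;
	for (size_t i = 0; i < nports; i++) {
		mac->ports[i].registered = true;
		mac->ports[i].carrier = true;
		mac->ports[i].phy_id = (int)i;
	}
}

/* Count the interfaces that still exist. */
size_t toy_mac_registered(const struct toy_mac_dev *mac)
{
	size_t n = 0;

	for (size_t i = 0; i < mac->nports; i++)
		if (mac->ports[i].registered)
			n++;
	return n;
}

/*
 * Policy A: the PHY is treated as a supplier of the whole device, so
 * losing it unbinds the device and every port disappears with it.
 */
void toy_phy_lost_unbind(struct toy_mac_dev *mac, int phy_id)
{
	bool used = false;

	for (size_t i = 0; i < mac->nports; i++)
		if (mac->ports[i].phy_id == phy_id)
			used = true;
	if (!used)
		return;

	mac->bound = false;
	for (size_t i = 0; i < mac->nports; i++) {
		mac->ports[i].registered = false;
		mac->ports[i].carrier = false;
	}
}

/* Policy B: only the port that lost its PHY reports link down. */
void toy_phy_lost_link_down(struct toy_mac_dev *mac, int phy_id)
{
	for (size_t i = 0; i < mac->nports; i++)
		if (mac->ports[i].phy_id == phy_id)
			mac->ports[i].carrier = false;
}
```

With three ports, losing PHY 1 under policy A leaves zero interfaces, while under policy B all three remain registered and only port 1 loses carrier.
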
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On 02/02/2026 19:00, Russell King (Oracle) wrote:
> On Mon, Feb 02, 2026 at 06:38:48PM +0100, Maxime Chevallier wrote:
>>
>>
>> On 02/02/2026 15:25, Russell King (Oracle) wrote:
>>> On Mon, Feb 02, 2026 at 12:10:41PM +0100, Maxime Chevallier wrote:
>>>> Hi Wei,
>>>>
>>>> On 02/02/2026 06:45, Wei Fang wrote:
>>>>> [...]
>>>>
>>>> I gave that patch a test, with the following cases :
>>>>
>>>> - On Macchiatobin (we have PHYs that share an mdiobus).
>>>> When unbinding a PHY, the MAC disappears as well :
>>>
>>> Correct, this is why these band-aids are harmful. One "device" can
>>> correspond with *multiple* network interfaces, and the loss of one
>>> PHY can have a *very* detrimental effect.
>>>
>>> Consider the case where root-NFS is being used, and removing a PHY
>>> on another interface takes out the interface that root-NFS is
>>> using. Your machine is now dead in the water.
>>
>> That's what I've been seeing. I unbound one PHY, it took out 3 netdevs
>> and I don't have any logs regarding "why". I guess there are devlink debug
>> knobs for that, but they don't seem to be enabled by default.
>
> See what I said above. "One "device" can correspond with *multiple*
> network interfaces".
>
> On Armada 8040, one network "device" has multiple ports - they all
> share the same packet infrastructure. Each port is a separate
> interface in the kernel.
>
> Consequently, the "struct device" is common across all ports on one
> of the CP110 dies (there are two dies.) If one triggers an unbind
> of that struct device, then you lose *all* ports on that CP110 die
> whether or not the others _could_ remain functional.
>
> Consider a DSA switch, which has external PHYs connected. Should
> unbinding one port's PHYs take out the entire switch - and in the
> case of multiple switches, cause the entire switch tree to be taken
> out?
Don't get me wrong, I completely agree with you on that, it's pretty bad
to lose all these interfaces in one go, and the debugging experience to
figure this out on an unknown system doesn't sound great.
> This is why devlinks is a bad idea. It's too heavy handed for cases
> beyond the simple "one network device per struct device" model that
> doesn't exist everywhere. For simple cases, yes, maybe, but not
> where it means that taking out one minor part of the system destroys
> the entire system because it chose to unbind a multi-interface device.
>
Fair, fair.
I think you gave enough pointers on a way forward then.
Maxime