drivers/net/phy/phy_device.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
There is a potential crash issue when disabling and re-enabling the
network port. When disabling the network port, phy_detach() calls
device_link_del() to remove the device link, but it does not clear
phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
network port is re-enabled, but if phy_attach_direct() fails before
calling device_link_add(), the code jumps to the "error" label and
calls phy_detach(). Since phydev->devlink retains the old value from
the previous attach/detach cycle, device_link_del() uses the old value,
which accesses a NULL pointer and causes a crash. The simplified crash
log is as follows.
[ 24.702421] Call trace:
[ 24.704856] device_link_put_kref+0x20/0x120
[ 24.709124] device_link_del+0x30/0x48
[ 24.712864] phy_detach+0x24/0x168
[ 24.716261] phy_attach_direct+0x168/0x3a4
[ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c
[ 24.725140] phylink_of_phy_connect+0x1c/0x34
Therefore, phydev->devlink needs to be cleared when the device link is
deleted.
Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
---
v2: Improve the commit message.
---
drivers/net/phy/phy_device.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index cc1bfd22fb81..7d5e76a3db0e 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1727,8 +1727,10 @@ void phy_detach(struct phy_device *phydev)
struct module *ndev_owner = NULL;
struct mii_bus *bus;
- if (phydev->devlink)
+ if (phydev->devlink) {
device_link_del(phydev->devlink);
+ phydev->devlink = NULL;
+ }
if (phydev->sysfs_links) {
if (dev)
--
2.34.1
On 5/23/2025 1:37 AM, Wei Fang wrote:
> There is a potential crash issue when disabling and re-enabling the
> network port. When disabling the network port, phy_detach() calls
> device_link_del() to remove the device link, but it does not clear
> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
> network port is re-enabled, but if phy_attach_direct() fails before
> calling device_link_add(), the code jumps to the "error" label and
> calls phy_detach(). Since phydev->devlink retains the old value from
> the previous attach/detach cycle, device_link_del() uses the old value,
> which accesses a NULL pointer and causes a crash. The simplified crash
> log is as follows.
>
> [ 24.702421] Call trace:
> [ 24.704856] device_link_put_kref+0x20/0x120
> [ 24.709124] device_link_del+0x30/0x48
> [ 24.712864] phy_detach+0x24/0x168
> [ 24.716261] phy_attach_direct+0x168/0x3a4
> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c
> [ 24.725140] phylink_of_phy_connect+0x1c/0x34
>
> Therefore, phydev->devlink needs to be cleared when the device link is
> deleted.
>
> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
--
Florian
On 5/23/2025 8:19 AM, Florian Fainelli wrote:
>
>
> On 5/23/2025 1:37 AM, Wei Fang wrote:
>> There is a potential crash issue when disabling and re-enabling the
>> network port. When disabling the network port, phy_detach() calls
>> device_link_del() to remove the device link, but it does not clear
>> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
>> network port is re-enabled, but if phy_attach_direct() fails before
>> calling device_link_add(), the code jumps to the "error" label and
>> calls phy_detach(). Since phydev->devlink retains the old value from
>> the previous attach/detach cycle, device_link_del() uses the old value,
>> which accesses a NULL pointer and causes a crash. The simplified crash
>> log is as follows.
>>
>> [ 24.702421] Call trace:
>> [ 24.704856] device_link_put_kref+0x20/0x120
>> [ 24.709124] device_link_del+0x30/0x48
>> [ 24.712864] phy_detach+0x24/0x168
>> [ 24.716261] phy_attach_direct+0x168/0x3a4
>> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c
>> [ 24.725140] phylink_of_phy_connect+0x1c/0x34
>>
>> Therefore, phydev->devlink needs to be cleared when the device link is
>> deleted.
>>
>> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
>
@Wei
What happens in case of shared mdio ?
1. Device 23040000 has the mdio node of both the ethernet phy and device 23000000 references the phy-handle present in the Device 23040000
2. When rmmod of the driver happens
3. the parent devlink is already deleted.
4. This cause the child mdio to access an entry causing a corruption.
5. Thought this fix would help but i see that its not helping the case.
Wondering if this is a legacy issue with shared mdio framework.
43369.232799: qcom-ethqos 23040000.ethernet eth1: stmmac_dvr_remove: removing driver
43369.233782: stmmac_pcs: Link Down
43369.258337: qcom-ethqos 23040000.ethernet eth1: FPE workqueue stop
43369.258522: br1: port 1(eth1) entered disabled state
43369.758779: qcom-ethqos 23040000.ethernet eth1: Timeout accessing MAC_VLAN_Tag_Filter
43369.758789: qcom-ethqos 23040000.ethernet eth1: failed to kill vid 0081/0
43369.759270: qcom-ethqos 23040000.ethernet eth1 (unregistering): left allmulticast mode
43369.759275: qcom-ethqos 23040000.ethernet eth1 (unregistering): left promiscuous mode
43369.759301: br1: port 1(eth1) entered disabled state
43370.259618: qcom-ethqos 23040000.ethernet eth1 (unregistering): Timeout accessing MAC_VLAN_Tag_Filter
43370.309863: qcom-ethqos 23000000.ethernet eth0: stmmac_dvr_remove: removing driver
43370.310019: list_del corruption, ffffff80c6ec9408->prev is LIST_POISON2 (dead000000000122)
43370.310034: ------------[ cut here ]------------
43370.310035: kernel BUG at lib/list_debug.c:59!
43370.310119: CPU: 4 PID: 3067767 Comm: rmmod Tainted: G W OE 6.6.65-rt47-debug #1
43370.310122: Hardware name: Qualcomm Technologies, Inc. SA8775P Ride (DT)
43370.310165: Call trace:
43370.310166: __list_del_entry_valid_or_report+0xa8/0xe0
43370.310168: __device_link_del+0x40/0xf0
43370.310172: device_link_put_kref+0xb4/0xc8
43370.310174: device_link_del+0x38/0x58
43370.310176: phy_detach+0x2c/0x170
43370.310181: phy_disconnect+0x4c/0x70
43370.310184: phylink_disconnect_phy+0x6c/0xc0 [phylink]
43370.310194: stmmac_release+0x60/0x358 [stmmac]
43370.310210: __dev_close_many+0xb4/0x160
43370.310213: dev_close_many+0xbc/0x1a0
43370.310215: unregister_netdevice_many_notify+0x178/0x870
43370.310218: unregister_netdevice_queue+0xf8/0x140
43370.310221: unregister_netdev+0x2c/0x48
43370.310223: stmmac_dvr_remove+0xd0/0x1b0 [stmmac]
43370.310233: devm_stmmac_pltfr_remove+0x2c/0x58 [stmmac_platform]
> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
On Tue, Jun 03, 2025 at 01:39:47PM -0700, Abhishek Chauhan (ABC) wrote:
>
>
> On 5/23/2025 8:19 AM, Florian Fainelli wrote:
> >
> >
> > On 5/23/2025 1:37 AM, Wei Fang wrote:
> >> There is a potential crash issue when disabling and re-enabling the
> >> network port. When disabling the network port, phy_detach() calls
> >> device_link_del() to remove the device link, but it does not clear
> >> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
> >> network port is re-enabled, but if phy_attach_direct() fails before
> >> calling device_link_add(), the code jumps to the "error" label and
> >> calls phy_detach(). Since phydev->devlink retains the old value from
> >> the previous attach/detach cycle, device_link_del() uses the old value,
> >> which accesses a NULL pointer and causes a crash. The simplified crash
> >> log is as follows.
> >>
> >> [ 24.702421] Call trace:
> >> [ 24.704856] device_link_put_kref+0x20/0x120
> >> [ 24.709124] device_link_del+0x30/0x48
> >> [ 24.712864] phy_detach+0x24/0x168
> >> [ 24.716261] phy_attach_direct+0x168/0x3a4
> >> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c
> >> [ 24.725140] phylink_of_phy_connect+0x1c/0x34
> >>
> >> Therefore, phydev->devlink needs to be cleared when the device link is
> >> deleted.
> >>
> >> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
> >> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> >
> @Wei
> What happens in case of shared mdio ?
>
> 1. Device 23040000 has the mdio node of both the ethernet phy and device 23000000 references the phy-handle present in the Device 23040000
> 2. When rmmod of the driver happens
> 3. the parent devlink is already deleted.
> 4. This cause the child mdio to access an entry causing a corruption.
> 5. Thought this fix would help but i see that its not helping the case.
>
> Wondering if this is a legacy issue with shared mdio framework.
The device link does nothing for this as it has DL_FLAG_STATELESS set,
which only affects suspend/resume/shutdown ordering, and with
DL_FLAG_PM_RUNTIME also set, runtime PM.
The device probe/removal ordering is unaffected. Maybe that's a
problem, but it needs careful consideration to change.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Fri, May 23, 2025 at 04:37:59PM +0800, Wei Fang wrote:
> There is a potential crash issue when disabling and re-enabling the
> network port. When disabling the network port, phy_detach() calls
> device_link_del() to remove the device link, but it does not clear
> phydev->devlink, so phydev->devlink is not a NULL pointer. Then the
> network port is re-enabled, but if phy_attach_direct() fails before
> calling device_link_add(), the code jumps to the "error" label and
> calls phy_detach(). Since phydev->devlink retains the old value from
> the previous attach/detach cycle, device_link_del() uses the old value,
> which accesses a NULL pointer and causes a crash. The simplified crash
> log is as follows.
>
> [ 24.702421] Call trace:
> [ 24.704856] device_link_put_kref+0x20/0x120
> [ 24.709124] device_link_del+0x30/0x48
> [ 24.712864] phy_detach+0x24/0x168
> [ 24.716261] phy_attach_direct+0x168/0x3a4
> [ 24.720352] phylink_fwnode_phy_connect+0xc8/0x14c
> [ 24.725140] phylink_of_phy_connect+0x1c/0x34
>
> Therefore, phydev->devlink needs to be cleared when the device link is
> deleted.
>
> Fixes: bc66fa87d4fd ("net: phy: Add link between phy dev and mac dev")
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Andrew
© 2016 - 2025 Red Hat, Inc.