Disable interrupt handling for the LAN87xx PHY to prevent the network
interface from entering a corrupted state after rapid configuration
changes.
When the link configuration is changed quickly, the PHY can get stuck in
a non-functional state. In this state, 'ethtool' reports that a link is
present, but 'ip link' shows NO-CARRIER, and the interface is unable to
transfer data.
Fixes: 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Lukas Wunner <lukas@wunner.de>
---
drivers/net/phy/smsc.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index b6489da5cfcd..dac6bf156d15 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -746,10 +746,6 @@ static struct phy_driver smsc_phy_driver[] = {
.soft_reset = smsc_phy_reset,
.config_aneg = lan87xx_config_aneg,
- /* IRQ related */
- .config_intr = smsc_phy_config_intr,
- .handle_interrupt = smsc_phy_handle_interrupt,
-
/* Statistics */
.get_sset_count = smsc_get_sset_count,
.get_strings = smsc_get_strings,
--
2.39.5
On Tue, Jul 01, 2025 at 02:21:46PM +0200, Oleksij Rempel wrote: > Disable interrupt handling for the LAN87xx PHY to prevent the network > interface from entering a corrupted state after rapid configuration > changes. > > When the link configuration is changed quickly, the PHY can get stuck in > a non-functional state. In this state, 'ethtool' reports that a link is > present, but 'ip link' shows NO-CARRIER, and the interface is unable to > transfer data. [...] > --- a/drivers/net/phy/smsc.c > +++ b/drivers/net/phy/smsc.c > @@ -746,10 +746,6 @@ static struct phy_driver smsc_phy_driver[] = { > .soft_reset = smsc_phy_reset, > .config_aneg = lan87xx_config_aneg, > > - /* IRQ related */ > - .config_intr = smsc_phy_config_intr, > - .handle_interrupt = smsc_phy_handle_interrupt, > - Well, that's not good. I guess this means that the interrupt is polled again, so we basically go back to the suboptimal behavior prior to 1ce8b37241ed? Without support for interrupt handling, we can't take advantage of the GPIOs on the chip for interrupt generation. Nor can we properly support runtime PM if no cable is attached. What's the actual root cause? Is it the issue described in this paragraph of 1ce8b37241ed's commit message? Normally the PHY interrupt should be masked until the PHY driver has cleared it. However masking requires a (sleeping) USB transaction and interrupts are received in (non-sleepable) softirq context. I decided not to mask the interrupt at all (by using the dummy_irq_chip's noop ->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec intervals and normally that's sufficient to wake the PHY driver's IRQ thread and have it clear the interrupt. If it does take longer, worst thing that can happen is the IRQ thread is woken again. No big deal. There must be better options than going back to polling. E.g. inserting delays to avoid the PHY getting wedged. TBH I did test this thoroughly back in the day and never witnessed the issue. Thanks, Lukas
On 7/1/25 2:58 PM, Lukas Wunner wrote: > On Tue, Jul 01, 2025 at 02:21:46PM +0200, Oleksij Rempel wrote: >> Disable interrupt handling for the LAN87xx PHY to prevent the network >> interface from entering a corrupted state after rapid configuration >> changes. >> >> When the link configuration is changed quickly, the PHY can get stuck in >> a non-functional state. In this state, 'ethtool' reports that a link is >> present, but 'ip link' shows NO-CARRIER, and the interface is unable to >> transfer data. > [...] >> --- a/drivers/net/phy/smsc.c >> +++ b/drivers/net/phy/smsc.c >> @@ -746,10 +746,6 @@ static struct phy_driver smsc_phy_driver[] = { >> .soft_reset = smsc_phy_reset, >> .config_aneg = lan87xx_config_aneg, >> >> - /* IRQ related */ >> - .config_intr = smsc_phy_config_intr, >> - .handle_interrupt = smsc_phy_handle_interrupt, >> - > > Well, that's not good. I guess this means that the interrupt is > polled again, so we basically go back to the suboptimal behavior > prior to 1ce8b37241ed? > > Without support for interrupt handling, we can't take advantage > of the GPIOs on the chip for interrupt generation. Nor can we > properly support runtime PM if no cable is attached. > > What's the actual root cause? Is it the issue described in this > paragraph of 1ce8b37241ed's commit message? > > Normally the PHY interrupt should be masked until the PHY driver has > cleared it. However masking requires a (sleeping) USB transaction and > interrupts are received in (non-sleepable) softirq context. I decided > not to mask the interrupt at all (by using the dummy_irq_chip's noop > ->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec > intervals and normally that's sufficient to wake the PHY driver's IRQ > thread and have it clear the interrupt. If it does take longer, worst > thing that can happen is the IRQ thread is woken again. No big deal. > > There must be better options than going back to polling. > E.g. inserting delays to avoid the PHY getting wedged. I agree the solution proposed by this patch looks too rough. I think more effort should be invested to at least understand why the phy got stuck. @Oleksij: possibly you could re-submit patch 1-3 only while you keep investigating the issue addressed here. /P
Hi Lukas, On Tue, Jul 01, 2025 at 02:58:19PM +0200, Lukas Wunner wrote: > On Tue, Jul 01, 2025 at 02:21:46PM +0200, Oleksij Rempel wrote: > > Disable interrupt handling for the LAN87xx PHY to prevent the network > > interface from entering a corrupted state after rapid configuration > > changes. > > > > When the link configuration is changed quickly, the PHY can get stuck in > > a non-functional state. In this state, 'ethtool' reports that a link is > > present, but 'ip link' shows NO-CARRIER, and the interface is unable to > > transfer data. > [...] > > --- a/drivers/net/phy/smsc.c > > +++ b/drivers/net/phy/smsc.c > > @@ -746,10 +746,6 @@ static struct phy_driver smsc_phy_driver[] = { > > .soft_reset = smsc_phy_reset, > > .config_aneg = lan87xx_config_aneg, > > > > - /* IRQ related */ > > - .config_intr = smsc_phy_config_intr, > > - .handle_interrupt = smsc_phy_handle_interrupt, > > - > > Well, that's not good. I guess this means that the interrupt is > polled again, so we basically go back to the suboptimal behavior > prior to 1ce8b37241ed? Not fully. It will disable interrupt support only for the embedded PHY, other types of interrupts should work as expected. > Without support for interrupt handling, we can't take advantage > of the GPIOs on the chip for interrupt generation. Nor can we > properly support runtime PM if no cable is attached. Hm... the PHY smsc driver is not using EDPD mode by default if PHY interrupts are enabled. Or do you mean other kind of PM? > What's the actual root cause? Is it the issue described in this > paragraph of 1ce8b37241ed's commit message? > > Normally the PHY interrupt should be masked until the PHY driver has > cleared it. However masking requires a (sleeping) USB transaction and > interrupts are received in (non-sleepable) softirq context. I decided > not to mask the interrupt at all (by using the dummy_irq_chip's noop > ->irq_mask() callback): The USB interrupt endpoint is polled in 1 msec > intervals and normally that's sufficient to wake the PHY driver's IRQ > thread and have it clear the interrupt. If it does take longer, worst > thing that can happen is the IRQ thread is woken again. No big deal. I'm not sure. It seems to be not the problem. > There must be better options than going back to polling. > E.g. inserting delays to avoid the PHY getting wedged. > > TBH I did test this thoroughly back in the day and never > witnessed the issue. I did some testing back in time too. It worked and still works normally in the autoneg mode. What is not working as expected is the fixed mode, especially 10 mbit fixed mode. Here are my current testing results: # configure 10 mbit forced mode: ethtool -s eth0 autoneg off speed 10 duplex half # attach cable (can be done wothout reataching cable) [10174.585150] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [10174.586760] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [10174.594636] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [10174.602777] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [10174.841458] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [10174.843017] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [10174.850619] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [10174.857026] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [10175.425513] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [10175.427046] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [10175.434871] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [10175.441332] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off At this point no more interrupts will come and link up state will not be detected. Replugging cable will have same result. The worst part - unplugging the cable may trigger an endless interrupt storm (which is some times reproducible in the 10Mbit forced mode): [ 1584.132799] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.134220] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.389134] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.390591] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.644757] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.646177] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.900781] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1584.902305] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 1585.158416] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 With latest kernel we can use adaptive polling, wich I added now for testing. Here are the results: [ 2200.702427] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2200.948552] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2200.949640] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2200.950182] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2200.951374] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2200.953186] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2200.959234] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2201.204270] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.205284] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.207139] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.208825] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.216406] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2201.460548] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.461618] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.462181] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.463273] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.464764] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.471066] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2201.716547] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.717607] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.718235] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.719267] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.721035] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.727488] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2201.972542] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.973614] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.974176] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2201.975321] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.977078] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2201.983500] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2202.228538] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2202.229615] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2202.230174] smsc95xx 1-1.1:1.0 enu1u1: intdata: 0x00008000 [ 2202.231292] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2202.233038] smsc_phy_handle_interrupt: MII_LAN83C185_ISF = 0x0098 [ 2202.238972] lan87xx_read_status: link: no, speed: 10, duplex: half, autoneg: off [ 2202.239018] smsc_phy_get_next_update: next update in 250 jiffies [ 2203.258566] lan87xx_read_status: link: yes, speed: 10, duplex: half, autoneg: off [ 2203.258756] smsc95xx 1-1.1:1.0 enu1u1: Link is Up - 10Mbps/Half - flow control off With adaptive polling we can use both. Since IRQ down interrupt works as expected, we can use low frequency polling (one per 30 seconds) in link up state. On link down state, after last interrupt poll one time per second for 30 seconds, then switch to low frequency polling (one per 30 seconds). I need to figure out haw to handle an interrupt storm. -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
On Wed, Jul 02, 2025 at 11:46:05AM +0200, Oleksij Rempel wrote: > On Tue, Jul 01, 2025 at 02:58:19PM +0200, Lukas Wunner wrote: > > Without support for interrupt handling, we can't take advantage > > of the GPIOs on the chip for interrupt generation. Nor can we > > properly support runtime PM if no cable is attached. > > Hm... the PHY smsc driver is not using EDPD mode by default if PHY > interrupts are enabled. Or do you mean other kind of PM? See commit 2642cc6c3bbe ("net: phy: smsc: Disable Energy Detect Power-Down in interrupt mode"): The LAN9514 used on older RasPi boards only supports SUSPEND0, SUSPEND1, SUSPEND2. Other incarnations of the LAN95xx family such as LAN9500A additionally support SUSPEND3. The smsc95xx.c driver enables suspend only on SUSPEND3-capable chips. I was planning to add suspend support for LAN9514 but never got around to finish it. Enabling SUSPEND1 if no cable is attached saves quite a bit of power, so RasPis without a network connection consume less. Back in the day I did some tests and noticed that while EDPD wasn't working at all outside of SUSPEND modes, it did work at least sometimes to wake up from SUSPEND1. The driver fiddles with EDPD settings upon entering suspend. I concluded that more testing is necessary to enable EDPD before SUSPEND1 and that's when I had to put this project aside. :( Thanks, Lukas
© 2016 - 2025 Red Hat, Inc.