[PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up

Oleksij Rempel posted 3 patches 2 months, 3 weeks ago
drivers/net/phy/phy.c  | 27 ++++++++++++++++++++-------
drivers/net/phy/smsc.c | 40 ++++++++++++++++++++++++++++++++++++++++
include/linux/phy.h    | 17 +++++++++++++++--
3 files changed, 75 insertions(+), 9 deletions(-)
[PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up
Posted by Oleksij Rempel 2 months, 3 weeks ago
This series makes the SMSC LAN8700 (as used in LAN9512 and similar USB
adapters) reliable again in configurations where it is forced to 10 Mb/s
and the link partner still advertises autonegotiation.

In this scenario, the PHY may miss the final link-up interrupt, causing
the network interface to remain down even though a valid link is
present.

To address this:

Patch 1 – phylib: Enable polling if the driver implements
get_next_update_time(). This ensures the state machine is active even
without update_stats().

Patch 2 – phylib: Allow drivers to return PHY_STATE_IRQ to explicitly
disable polling.

Patch 3 – smsc: Implement get_next_update_time() with adaptive 1 Hz
polling for up to 30 seconds after the last interrupt in the affected
10M autoneg-off mode.  All other configurations rely on IRQs only.

Testing:

The LAN9512 (LAN8700 core) was tested against an Intel I350 NIC using
baseline, parallel-detection, and advertisement test suites. All
relevant tests passed.

Changes in v4:
- address -Wformat-security for WARN_ONCE()

Changes in v3:
- handle conflicting configuration if update_stats are supported

Changes in v2:
- Introduced explicit disable polling via PHY_STATE_IRQ
- Changed the workaround logic to apply 1 Hz polling only for 30 seconds
  after the last IRQ
- Dropped relaxed 30s polling while link is up
- Reworded commit messages and comments to reflect updated logic
- Split core changes into two separate patches for clarity

Thanks,
Oleksij Rempel

Oleksij Rempel (3):
  net: phy: enable polling when driver implements get_next_update_time
  net: phy: allow drivers to disable polling via get_next_update_time()
  net: phy: smsc: recover missed link-up IRQs on LAN8700 with adaptive
    polling

 drivers/net/phy/phy.c  | 27 ++++++++++++++++++++-------
 drivers/net/phy/smsc.c | 40 ++++++++++++++++++++++++++++++++++++++++
 include/linux/phy.h    | 17 +++++++++++++++--
 3 files changed, 75 insertions(+), 9 deletions(-)

--
2.39.5

Re: [PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up
Posted by Andrew Lunn 2 months, 2 weeks ago
On Mon, Jul 14, 2025 at 11:52:37AM +0200, Oleksij Rempel wrote:
> This series makes the SMSC LAN8700 (as used in LAN9512 and similar USB
> adapters) reliable again in configurations where it is forced to 10 Mb/s
> and the link partner still advertises autonegotiation.

I've seen a comment from another Maintainer that thinks this is rather
hackish. I tend to agree, you are adding complexity to the core to
handle one broken PHY, and a corner case in that PHY. It would be
better to hide as much of this in the PHY driver.

I'm wondering if there is a much simpler solution, which does not need
the core changing. Have the driver dynamically flip between interrupts
and polling, depending on the link mode.

Start up in the usual way. If the platform supports interrupts, let
the core get the interrupt, install the handler and use
interrupts. Otherwise do polling.

If .config_aneg() puts the PHY into the broken state, forced to 10
Mb/s, and interrupts are used, set phydev->irq = PHY_POLL, and call
phy_trigger_machine() to kick off polling.

If .config_aneg() is called to take it out of the broken state,
restore phydev->irq. An additional poll up to one second later should
not cause any issues.

I don't think this needs any core code changes.

Maybe there is an issue with phy_free_interrupt() being called while
irq has been set to polling? You might be able to use the
phy_driver.remove() to handle that?

	Andrew
Re: [PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up
Posted by Oleksij Rempel 2 weeks, 5 days ago
Hi Andrew,

On Fri, Jul 18, 2025 at 03:58:56PM +0200, Andrew Lunn wrote:
> On Mon, Jul 14, 2025 at 11:52:37AM +0200, Oleksij Rempel wrote:
> > This series makes the SMSC LAN8700 (as used in LAN9512 and similar USB
> > adapters) reliable again in configurations where it is forced to 10 Mb/s
> > and the link partner still advertises autonegotiation.
> 
> I've seen a comment from another Maintainer that thinks this is rather
> hackish. I tend to agree, you are adding complexity to the core to
> handle one broken PHY, and a corner case in that PHY. It would be
> better to hide as much of this in the PHY driver.
> 
> I'm wondering if there is a much simpler solution, which does not need
> the core changing. Have the driver dynamically flip between interrupts
> and polling, depending on the link mode.
> 
> Start up in the usual way. If the platform supports interrupts, let
> the core get the interrupt, install the handler and use
> interrupts. Otherwise do polling.
> 
> If .config_aneg() puts the PHY into the broken state, forced to 10
> Mb/s, and interrupts are used, set phydev->irq = PHY_POLL, and call
> phy_trigger_machine() to kick off polling.
> 
> If .config_aneg() is called to take it out of the broken state,
> restore phydev->irq. An additional poll up to one second later should
> not cause any issues.
> 
> I don't think this needs any core code changes.
> 
> Maybe there is an issue with phy_free_interrupt() being called while
> irq has been set to polling? You might be able to use the
> phy_driver.remove() to handle that?

I tried to go this way, but it feels even dirtier. The driver would need
to overwrite phydev->irq and phydev->interrupts, and also care about
proper interrupt (re)configuration. On disconnect, phy_disconnect()
unconditionally calls phy_free_interrupt() if phy_interrupt_is_valid(),
but phy_driver.remove() is invoked too late. This leads to warnings
like:
removing non-empty directory 'irq/210', leaking at least 'usb-001:003:01'

So the driver ends up fighting with core assumptions about IRQ lifetime.

How about a minimal change instead: conditionally call
phy_queue_state_machine() from lan87xx_config_aneg()? That would trigger
a poll in the broken mode without touching phydev->irq or core teardown
paths. Seems less intrusive than rewriting IRQ handling.

Best Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Re: [PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up
Posted by Andrew Lunn 2 weeks, 4 days ago
> How about a minimal change instead: conditionally call
> phy_queue_state_machine() from lan87xx_config_aneg()? That would trigger
> a poll in the broken mode without touching phydev->irq or core teardown
> paths. Seems less intrusive than rewriting IRQ handling.

It is currently a static function, so that would have to change.

Or it might be better to add phy_trigger_machine_soon(), using the
default 1 second delay? And i would document it as only to be used by
broken PHYs, to try to stop it being abused. Anybody using it needs to
acknowledge their PHY is broken.

	Andrew
Re: [PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up
Posted by Russell King (Oracle) 2 weeks, 4 days ago
On Thu, Sep 18, 2025 at 09:48:33PM +0200, Andrew Lunn wrote:
> > How about a minimal change instead: conditionally call
> > phy_queue_state_machine() from lan87xx_config_aneg()? That would trigger
> > a poll in the broken mode without touching phydev->irq or core teardown
> > paths. Seems less intrusive than rewriting IRQ handling.
> 
> It is currently a static function, so that would have to change.
> 
> Or it might be better to add phy_trigger_machine_soon(), using the
> default 1 second delay? And i would document it as only to be used by
> broken PHYs, to try to stop it being abused. Anybody using it needs to
> acknowledge their PHY is broken.

Couldn't this be even simpler? If the problem is the interrupt
isn't raised, then how about the following.

(This assumes there is no issue with calling phy_trigger_machine()
from IRQ context.)

When lan87xx_config_aneg() detects that we're configuring into the
broken mode, start a timer (which has been pre-initialised at probe
time.) When the timer fires, call phy_trigger_machine(), and modify
the timer for the next poll interval.

When lan87xx_config_aneg() detects that we aren't in the broken
mode, delete the timer synchronously. Also delete the timer when
the PHY is suspended or unbound.

This requires no changes necessary to the core phylib code, and no
fiddling with fragile state either. Does it matter if the
interrupt does fire? Not really.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH net v4 0/3] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up
Posted by Jakub Kicinski 2 months, 3 weeks ago
On Mon, 14 Jul 2025 11:52:37 +0200 Oleksij Rempel wrote:
> This series makes the SMSC LAN8700 (as used in LAN9512 and similar USB
> adapters) reliable again in configurations where it is forced to 10 Mb/s
> and the link partner still advertises autonegotiation.
> 
> In this scenario, the PHY may miss the final link-up interrupt, causing
> the network interface to remain down even though a valid link is
> present.

Could we get a PHY maintainer ack on these (especially patch 2)?