[v1] net: phy: smsc: use IRQ + relaxed polling to fix missed link-up

[PATCH net v1 2/2] net: phy: smsc: add adaptive polling to recover missed link-up on LAN8700

Posted by Oleksij Rempel 7 months, 1 week ago

Fixe unreliable link detection on LAN8700 configured for 10 Mbit / half-
or full-duplex against an autonegotiating partner and similar scenarios.

The LAN8700 PHY (build in to LAN9512 and similar adapters) can fail to
report a link-up event when it is forced to a fixed speed/duplex and the
link partner still advertises autonegotiation. During link establishment
the PHY raises several interrupts while the link is not yet up; once the
link finally comes up no further interrupt is generated, so phylib never
observes the transition and the kernel keeps the interface down even
though ethtool shows the link as up.

Mitigate this by combining interrupts with adaptive polling:

- When the driver is running in pure polling mode it continues to poll
  once per second (unchanged).
- With an IRQ present we now
  - poll every 30 s while the link is up (low overhead);
  - switch to a 1 s poll for up to 30 s after the last IRQ and while the
    link is down, ensuring we catch the final silent link-up.

This patch depends on:
- commit 8bf47e4d7b87 ("net: phy: Add support for driver-specific next
  update time")
- part of this patch set ("net: phy: enable polling when driver
  implements get_next_update_time")

Fixes: 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling")
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Lukas Wunner <lukas@wunner.de>
---
 drivers/net/phy/smsc.c | 43 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index b6489da5cfcd..118aee834094 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -39,6 +39,17 @@
 /* interval between phylib state machine runs in ms */
 #define PHY_STATE_MACH_MS		1000
 
+/* Poll every 1 s when we have no IRQ line.
+ * Matches the default phylib state-machine cadence.
+ */
+#define SMSC_NOIRQ_POLLING_INTERVAL	secs_to_jiffies(1)
+/* When IRQs are present and the link is already up,
+ * fall back to a light 30 s poll:
+ *  – avoids needless wake-ups for power-management purposes
+ *  – still short enough to recover if the final link-up IRQ was lost
+ */
+#define SMSC_IRQ_POLLING_INTERVAL	secs_to_jiffies(30)
+
 struct smsc_hw_stat {
 	const char *string;
 	u8 reg;
@@ -54,6 +65,7 @@ struct smsc_phy_priv {
 	unsigned int edpd_mode_set_by_user:1;
 	unsigned int edpd_max_wait_ms;
 	bool wol_arp;
+	unsigned long last_irq;
 };
 
 static int smsc_phy_ack_interrupt(struct phy_device *phydev)
@@ -100,6 +112,7 @@ static int smsc_phy_config_edpd(struct phy_device *phydev)
 
 irqreturn_t smsc_phy_handle_interrupt(struct phy_device *phydev)
 {
+	struct smsc_phy_priv *priv = phydev->priv;
 	int irq_status;
 
 	irq_status = phy_read(phydev, MII_LAN83C185_ISF);
@@ -113,6 +126,8 @@ irqreturn_t smsc_phy_handle_interrupt(struct phy_device *phydev)
 	if (!(irq_status & MII_LAN83C185_ISF_INT_PHYLIB_EVENTS))
 		return IRQ_NONE;
 
+	priv->last_irq = jiffies;
+
 	phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
@@ -684,6 +699,33 @@ int smsc_phy_probe(struct phy_device *phydev)
 }
 EXPORT_SYMBOL_GPL(smsc_phy_probe);
 
+static unsigned int smsc_phy_get_next_update(struct phy_device *phydev)
+{
+	struct smsc_phy_priv *priv = phydev->priv;
+
+	/* If interrupts are disabled, fall back to default polling */
+	if (phydev->irq == PHY_POLL)
+		return SMSC_NOIRQ_POLLING_INTERVAL;
+
+	/* The PHY sometimes drops the *final* link-up IRQ when we run
+	 * with autoneg OFF (10 Mbps HD/FD) against an autonegotiating
+	 * partner: we see several “link down” IRQs, none for “link up”.
+	 *
+	 * Work-around philosophy:
+	 *   - If the link is already up, the hazard is past, so we
+	 *     revert to a relaxed 30 s poll to save power.
+	 *   - Otherwise we stay in a tighter polling loop for up to one
+	 *     full interval after the last IRQ in case the crucial
+	 *     link-up IRQ was lost.  Empirically 5 s is enough but we
+	 *     keep 30 s to be extra conservative.
+	 */
+	if (!priv->last_irq || phydev->link ||
+	    time_is_before_jiffies(priv->last_irq + SMSC_IRQ_POLLING_INTERVAL))
+		return SMSC_IRQ_POLLING_INTERVAL;
+
+	return SMSC_NOIRQ_POLLING_INTERVAL;
+}
+
 static struct phy_driver smsc_phy_driver[] = {
 {
 	.phy_id		= 0x0007c0a0, /* OUI=0x00800f, Model#=0x0a */
@@ -749,6 +791,7 @@ static struct phy_driver smsc_phy_driver[] = {
 	/* IRQ related */
 	.config_intr	= smsc_phy_config_intr,
 	.handle_interrupt = smsc_phy_handle_interrupt,
+	.get_next_update_time = smsc_phy_get_next_update,
 
 	/* Statistics */
 	.get_sset_count = smsc_get_sset_count,
-- 
2.39.5

Re: [PATCH net v1 2/2] net: phy: smsc: add adaptive polling to recover missed link-up on LAN8700

Posted by Lukas Wunner 7 months, 1 week ago

On Mon, Jul 07, 2025 at 05:32:32PM +0200, Oleksij Rempel wrote:
> Fixe unreliable link detection on LAN8700 configured for 10 Mbit / half-

s/Fixe/Fix/

> or full-duplex against an autonegotiating partner and similar scenarios.
> 
> The LAN8700 PHY (build in to LAN9512 and similar adapters) can fail to

s/build/built/

> report a link-up event when it is forced to a fixed speed/duplex and the
> link partner still advertises autonegotiation. During link establishment
> the PHY raises several interrupts while the link is not yet up; once the
> link finally comes up no further interrupt is generated, so phylib never
> observes the transition and the kernel keeps the interface down even
> though ethtool shows the link as up.
> 
> Mitigate this by combining interrupts with adaptive polling:
> 
> - When the driver is running in pure polling mode it continues to poll
>   once per second (unchanged).
> - With an IRQ present we now
>   - poll every 30 s while the link is up (low overhead);
>   - switch to a 1 s poll for up to 30 s after the last IRQ and while the
>     link is down, ensuring we catch the final silent link-up.

I think this begs the question, if we *know* that the link may come up
belatedly without an interrupt, why poll?  Would it work to schedule a
one-off link check after a reasonable delay to catch the link change?

I'm also wondering if enabling EDPD changes the behavior?

> +static unsigned int smsc_phy_get_next_update(struct phy_device *phydev)
> +{
> +	struct smsc_phy_priv *priv = phydev->priv;
> +
> +	/* If interrupts are disabled, fall back to default polling */
> +	if (phydev->irq == PHY_POLL)
> +		return SMSC_NOIRQ_POLLING_INTERVAL;
> +
> +	/* The PHY sometimes drops the *final* link-up IRQ when we run
> +	 * with autoneg OFF (10 Mbps HD/FD) against an autonegotiating
> +	 * partner: we see several "link down" IRQs, none for "link up".

Hm, I'm not seeing a check for all those conditions (e.g. autoneg off) here?

Also, you seem to be using UTF-8 characters instead of US-ASCII single or
double quotes around "link down" and "link up".

> +	 *
> +	 * Work-around philosophy:
> +	 *   - If the link is already up, the hazard is past, so we
> +	 *     revert to a relaxed 30 s poll to save power.
> +	 *   - Otherwise we stay in a tighter polling loop for up to one
> +	 *     full interval after the last IRQ in case the crucial
> +	 *     link-up IRQ was lost.  Empirically 5 s is enough but we
> +	 *     keep 30 s to be extra conservative.
> +	 */
> +	if (!priv->last_irq || phydev->link ||
> +	    time_is_before_jiffies(priv->last_irq + SMSC_IRQ_POLLING_INTERVAL))
> +		return SMSC_IRQ_POLLING_INTERVAL;
> +
> +	return SMSC_NOIRQ_POLLING_INTERVAL;
> +}

Thanks,

Lukas

Re: [PATCH net v1 2/2] net: phy: smsc: add adaptive polling to recover missed link-up on LAN8700

Posted by Oleksij Rempel 7 months ago

On Mon, Jul 07, 2025 at 08:10:36PM +0200, Lukas Wunner wrote:
> > Mitigate this by combining interrupts with adaptive polling:
> > 
> > - When the driver is running in pure polling mode it continues to poll
> >   once per second (unchanged).
> > - With an IRQ present we now
> >   - poll every 30 s while the link is up (low overhead);
> >   - switch to a 1 s poll for up to 30 s after the last IRQ and while the
> >     link is down, ensuring we catch the final silent link-up.
> 
> I think this begs the question, if we *know* that the link may come up
> belatedly without an interrupt, why poll?  Would it work to schedule a
> one-off link check after a reasonable delay to catch the link change?

The exact timing of the link-up depends on the link partner. For
example, when testing with an Intel i350, the longest observed link-up
time was 3.85 seconds. But I cannot measure all possible combinations of
PHYs and link partners, and some might take even longer.

If we use a one-shot delayed check, we would have to wait at least 5
seconds to be safe-maybe more. This would increase the link-up time for
most users, even in cases where the link is already up earlier. So the
user experience would get worse, without guaranteeing we catch all
cases.

That’s why I decided to go with a conservative 1 Hz polling for 30
seconds after the last interrupt. This keeps the link-up time short when
possible, and also helps detect obscure link changes that may otherwise
go unnoticed.

> I'm also wondering if enabling EDPD changes the behavior?

This is not straightforward to test. The PHY driver explicitly requires
polling mode for EDPD support, so when EDPD is enabled, polling is
already active. In that case, the missing IRQ issue is avoided by
design, because the PHY state machine runs regularly regardless of
interrupts. So enabling EDPD indirectly works around the problem, but
not because it changes the PHY behavior-just because it activates
polling.

> > +static unsigned int smsc_phy_get_next_update(struct phy_device *phydev)
> > +{
> > +	struct smsc_phy_priv *priv = phydev->priv;
> > +
> > +	/* If interrupts are disabled, fall back to default polling */
> > +	if (phydev->irq == PHY_POLL)
> > +		return SMSC_NOIRQ_POLLING_INTERVAL;
> > +
> > +	/* The PHY sometimes drops the *final* link-up IRQ when we run
> > +	 * with autoneg OFF (10 Mbps HD/FD) against an autonegotiating
> > +	 * partner: we see several "link down" IRQs, none for "link up".
> 
> Hm, I'm not seeing a check for all those conditions (e.g. autoneg off) here?

You're right, I'm not checking explicitly for autoneg == off or speed ==
10 in the code. I chose not to limit it to just that case, because if
IRQ behavior is unreliable in one configuration, it's possible that
other cases may also be affected in less obvious ways.

Polling once per second for 30 seconds is a small cost in terms of
power, but it greatly reduces the risk of missing a link-up event. In my
opinion, it's a safer default and avoids user-visible issues without
significantly impacting energy usage.

Best Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

[PATCH net v1 1/2] net: phy: enable polling when driver implements get_next_update_time
[PATCH net v1 2/2] net: phy: smsc: add adaptive polling to recover missed link-up on LAN8700