drivers/net/phy/phy_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
There are possibilities for phy_resume to be executed when the ethernet
interface is initially taken UP after bootup. This is harmless in most
cases, but the respective PHY driver`s resume callback cannot have any
logic that should only be executed if it was previously suspended.
In stmmac for instance, 2 entry points of phy_resume:
1. stmmac_open->phylink_of_phy_connect->phy_attach_direct->phy_resume
commit 1211ce530771 ("net: phy: resume/suspend PHYs on attach/detach")
This is not needed at the initial interface UP but required if the PHY
may suspend when the interface is taken DOWN.
2. stmmac_open->phylink_start->phy_start->__phy_resume
commit 9e573cfc35c6 ("net: phy: start interrupts in phy_start")
This patch does not introduce the __phy_resume in phy_start but removes
it from being conditional to PHY_HALTED. Now it fails to ensure if it
really needs to resume.
Prevent these duplicate access and provide logic exclusivity for resume
callback in a PHY driver.
Signed-off-by: Abid Ali <dev.nuvorolabs@gmail.com>
---
drivers/net/phy/phy_device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 73f9cb2e2844..68583bb74aec 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1846,7 +1846,7 @@ int __phy_resume(struct phy_device *phydev)
lockdep_assert_held(&phydev->lock);
- if (!phydrv || !phydrv->resume)
+ if (!phydrv || !phydrv->resume && phydev->suspended)
return 0;
ret = phydrv->resume(phydev);
---
base-commit: 347e9f5043c89695b01e66b3ed111755afcf1911
change-id: 20250718-phy_resume-cc86109396cc
Best regards,
--
Abid Ali <dev.nuvorolabs@gmail.com>
Hi Abid, kernel test robot noticed the following build warnings: url: https://github.com/intel-lab-lkp/linux/commits/Abid-Ali/net-phy-Fix-premature-resume-by-a-PHY-driver/20250718-234858 base: 347e9f5043c89695b01e66b3ed111755afcf1911 patch link: https://lore.kernel.org/r/20250718-phy_resume-v1-1-9c6b59580bee%40gmail.com patch subject: [PATCH] net: phy: Fix premature resume by a PHY driver config: sparc-randconfig-r071-20250719 (https://download.01.org/0day-ci/archive/20250720/202507200229.iOChX4Rp-lkp@intel.com/config) compiler: sparc-linux-gcc (GCC) 14.3.0 If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Reported-by: Dan Carpenter <dan.carpenter@linaro.org> | Closes: https://lore.kernel.org/r/202507200229.iOChX4Rp-lkp@intel.com/ smatch warnings: drivers/net/phy/phy_device.c:1852 __phy_resume() error: we previously assumed 'phydrv->resume' could be null (see line 1849) vim +1852 drivers/net/phy/phy_device.c 9c2c2e62df3fa3 Andrew Lunn 2018-02-27 1842 int __phy_resume(struct phy_device *phydev) 481b5d938b4a60 Sebastian Hesselbarth 2013-12-13 1843 { 0bd199fd9c19aa Russell King (Oracle 2024-02-02 1844) const struct phy_driver *phydrv = phydev->drv; 8a8f8281e7e7a8 Heiner Kallweit 2020-03-26 1845 int ret; 481b5d938b4a60 Sebastian Hesselbarth 2013-12-13 1846 e6e918d4eb93f4 Heiner Kallweit 2021-01-06 1847 lockdep_assert_held(&phydev->lock); f5e64032a799d4 Russell King 2017-12-12 1848 9421d84b1b3e16 Abid Ali 2025-07-18 @1849 if (!phydrv || !phydrv->resume && phydev->suspended) ^^^^^^^^^^^^^^^ This checks for if the resume pointer is NULL, but if the resume is NULL but suspend is also NULL. I'm surprised that the compiler allows us to write that comparison without adding parenthesis. I thought that it would complain about && having higher precedence. 8a8f8281e7e7a8 Heiner Kallweit 2020-03-26 1850 return 0; 8a477a6fb6a336 Florian Fainelli 2015-01-26 1851 8a8f8281e7e7a8 Heiner Kallweit 2020-03-26 @1852 ret = phydrv->resume(phydev); ^^^^^^^^^^^^^^ Then this will crash. 8a8f8281e7e7a8 Heiner Kallweit 2020-03-26 1853 if (!ret) 8a477a6fb6a336 Florian Fainelli 2015-01-26 1854 phydev->suspended = false; 8a477a6fb6a336 Florian Fainelli 2015-01-26 1855 8a477a6fb6a336 Florian Fainelli 2015-01-26 1856 return ret; 481b5d938b4a60 Sebastian Hesselbarth 2013-12-13 1857 } -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Hi Dan, On Sun, Jul 20, 2025 at 21:09:19 +0300, Dan Carpenter wrote: > smatch warnings: > drivers/net/phy/phy_device.c:1852 __phy_resume() error: we previously assumed 'phydrv->resume' could be null (see line 1849) > > vim +1852 drivers/net/phy/phy_device.c > > 9421d84b1b3e16 Abid Ali 2025-07-18 @1849 if (!phydrv || !phydrv->resume && phydev->suspended) > ^^^^^^^^^^^^^^^ > This checks for if the resume pointer is NULL, but if the resume is > NULL but suspend is also NULL. I'm surprised that the compiler allows > us to write that comparison without adding parenthesis. I thought that > it would complain about && having higher precedence. > > 8a8f8281e7e7a8 Heiner Kallweit 2020-03-26 @1852 ret = phydrv->resume(phydev); > ^^^^^^^^^^^^^^ > Then this will crash. yea, there is an obvious error here and have pointed it in the thread. I have not gone forward with the V2 for now as we there are other concerns thats not easy to address. LINK: https://lore.kernel.org/netdev/aHu6kzOpaoDFR8BM@shell.armlinux.org.uk/ For now the idea is to deal with issue pointed in the commit in the driver`s resume callback itself.
Hi Abid, kernel test robot noticed the following build warnings: [auto build test WARNING on 347e9f5043c89695b01e66b3ed111755afcf1911] url: https://github.com/intel-lab-lkp/linux/commits/Abid-Ali/net-phy-Fix-premature-resume-by-a-PHY-driver/20250718-234858 base: 347e9f5043c89695b01e66b3ed111755afcf1911 patch link: https://lore.kernel.org/r/20250718-phy_resume-v1-1-9c6b59580bee%40gmail.com patch subject: [PATCH] net: phy: Fix premature resume by a PHY driver config: i386-buildonly-randconfig-004-20250719 (https://download.01.org/0day-ci/archive/20250719/202507191322.YG6cNwF6-lkp@intel.com/config) compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250719/202507191322.YG6cNwF6-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202507191322.YG6cNwF6-lkp@intel.com/ All warnings (new ones prefixed by >>): >> drivers/net/phy/phy_device.c:1849:33: warning: '&&' within '||' [-Wlogical-op-parentheses] 1849 | if (!phydrv || !phydrv->resume && phydev->suspended) | ~~ ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~ drivers/net/phy/phy_device.c:1849:33: note: place parentheses around the '&&' expression to silence this warning 1849 | if (!phydrv || !phydrv->resume && phydev->suspended) | ^ | ( ) 1 warning generated. vim +1849 drivers/net/phy/phy_device.c 1841 1842 int __phy_resume(struct phy_device *phydev) 1843 { 1844 const struct phy_driver *phydrv = phydev->drv; 1845 int ret; 1846 1847 lockdep_assert_held(&phydev->lock); 1848 > 1849 if (!phydrv || !phydrv->resume && phydev->suspended) 1850 return 0; 1851 1852 ret = phydrv->resume(phydev); 1853 if (!ret) 1854 phydev->suspended = false; 1855 1856 return ret; 1857 } 1858 EXPORT_SYMBOL(__phy_resume); 1859 -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
On Fri, Jul 18, 2025 at 03:42:22PM +0000, Abid Ali wrote: > There are possibilities for phy_resume to be executed when the ethernet > interface is initially taken UP after bootup. This is harmless in most > cases, but the respective PHY driver`s resume callback cannot have any > logic that should only be executed if it was previously suspended. Sorry but no. The PHY will be "resumed" from boot, even if it wasn't "suspended". So the idea that resume should only be called if it was previously suspended is incorrect. E.g. .ndo_open -> ... -> phy_attach_direct() -> phy_resume() -> phydrv->resume() During this path, the PHY may or may not be suspended, depending on the state of the hardware when control was passed to the kernel, which includes kexec(). PHY drivers must cope with an already functional PHY when their resume() method is called. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Fri, Jul 18, 2025 at 05:10:54 PM +0100, Russell King (Oracle) wrote: > Sorry but no. The PHY will be "resumed" from boot, even if it wasn't > "suspended". So the idea that resume should only be called if it was > previously suspended is incorrect. > E.g. .ndo_open -> ... -> phy_attach_direct() -> phy_resume() -> > phydrv->resume() I do point this path out and there is also a second call (2) .ndo_open -> phylink_start -> phy_start -> __phy_resume This would mean 2 calls to the PHY resume every time an interface is taken UP is expected behaviour?. > During this path, the PHY may or may not be suspended, depending on > the state of the hardware when control was passed to the kernel, > which includes kexec(). > PHY drivers must cope with an already functional PHY when their > resume() method is called. This is not what I expected, but if it is by design I do not see a need to fight it. Just to make it clear, if we need to reset a PHY after it returns from suspend(or any code thats dependant), the driver`s callback should provide this guarantee?. if yes, this was just a misconception of relating it to resume callbacks provided by the PM subsystem where such behaviour is not exepcted Obvious logic error in the patch but this patch is not required as it stands. - if (!phydrv || !phydrv->resume && phydev->suspended) + if (!phydrv || !phydrv->resume || !phydev->suspended)
On Sat, Jul 19, 2025 at 06:25:47AM +0000, Abid Ali wrote: > On Fri, Jul 18, 2025 at 05:10:54 PM +0100, Russell King (Oracle) wrote: > > Sorry but no. The PHY will be "resumed" from boot, even if it wasn't > > "suspended". So the idea that resume should only be called if it was > > previously suspended is incorrect. > > > E.g. .ndo_open -> ... -> phy_attach_direct() -> phy_resume() -> > > phydrv->resume() > > I do point this path out and there is also a second call > (2) .ndo_open -> phylink_start -> phy_start -> __phy_resume > This would mean 2 calls to the PHY resume every time an interface is > taken UP is expected behaviour?. The whole point is this: > > During this path, the PHY may or may not be suspended, depending on > > the state of the hardware when control was passed to the kernel, > > which includes kexec(). Thus, the resume function *must* cope with an already resumed PHY, and thus adding extra complexity to try to ignore calling the resume function if it wasn't previously suspended is likely to cause regressions - phydrv->suspended will be clear for the initial call to ->resume(). Thus, if the PHY was suspended at boot time, it won't be resumed when one attempts to bring up the interface initially. Just fix your driver's resume function. > > PHY drivers must cope with an already functional PHY when their > > resume() method is called. > > This is not what I expected, but if it is by design I do not see a > need to fight it. Just to make it clear, if we need to reset a PHY > after it returns from suspend(or any code thats dependant), the driver`s > callback should provide this guarantee?. Hardware or software reset? How much a software reset disrupts the PHY is PHY dependent. E.g. there are PHYs that need to be software reset for configuration and advertisement changes, but all the software configuration is preserved over such a reset. So I can't answer your question. It depends how disruptive the reset you are talking about is to your PHY implementation. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
On Sat, Jul 19, 2025 at 08:48:20 AM +0100, Russell King (Oracle) wrote: > > I do point this path out and there is also a second call > > (2) .ndo_open -> phylink_start -> phy_start -> __phy_resume > > This would mean 2 calls to the PHY resume every time an interface is > > taken UP is expected behaviour?. > > The whole point is this: > > > > During this path, the PHY may or may not be suspended, depending on > > > the state of the hardware when control was passed to the kernel, > > > which includes kexec(). > > Thus, the resume function *must* cope with an already resumed PHY, > and thus adding extra complexity to try to ignore calling the resume > function if it wasn't previously suspended is likely to cause > regressions - phydrv->suspended will be clear for the initial call > to ->resume(). Thus, if the PHY was suspended at boot time, it won't > be resumed when one attempts to bring up the interface initially. yea, I get your point. > Hardware or software reset? > > How much a software reset disrupts the PHY is PHY dependent. E.g. there > are PHYs that need to be software reset for configuration and > advertisement changes, but all the software configuration is preserved > over such a reset. The PHY we have loses power when the kernel PM goes to suspend and we need have a hardware reset upon its bootup in resume. As an unintentional consequence this ended with 2 additional resets (reset-delay-us in dts + 2 PHY resume) at boot->interface-UP. In the end the "phydev->state" in the driver`s resume callback was used to prevent it and checking further, it was evident that there were 2 intentional calls for phy_resume from .ndo_open which didnt look obvious. This particular scenario was not the point of the commit but rather having some protection for phy_resume but I guess its not possible. To keep it simple, these would be my present understanding. 1. Should the PHY driver be able handle consecutive resume callbacks? a. yes. It would have to be taken care in the driver. 2. Why does phy_resume exec twice in .ndo_open with PHYLINK API? a. can happen but still dont have clarity on why .ndo_open does this.
On Sat, Jul 19, 2025 at 11:34:50AM +0000, Abid Ali wrote: > The PHY we have loses power when the kernel PM goes to suspend and we > need have a hardware reset upon its bootup in resume. Other PHY drivers are fine with this. > As an unintentional consequence this ended with 2 additional > resets (reset-delay-us in dts + 2 PHY resume) at boot->interface-UP. Presumably, the resets occur because your PHY driver (which phy driver, and are the patches for this merged?) is causing the hardware reset to occur? > In the end the "phydev->state" in the driver`s resume callback was used to > prevent it and checking further, it was evident that there were 2 > intentional calls for phy_resume from .ndo_open which didnt look obvious. > > This particular scenario was not the point of the commit but rather > having some protection for phy_resume but I guess its not possible. > To keep it simple, these would be my present understanding. > > 1. Should the PHY driver be able handle consecutive resume callbacks? > a. yes. It would have to be taken care in the driver. Correct, but I think we need full details. > 2. Why does phy_resume exec twice in .ndo_open with PHYLINK API? > a. can happen but still dont have clarity on why .ndo_open does this. Nothing to do with phylink. Exactly the same would happen with drivers that use the phylib API, attaching the PHY in .ndo_open and then calling phy_start(). Phylink is just a wrapper around these. The problem I have with the idea of changing the behaviour in core code is that we can't test every driver that is making use of phylib (whether directly or via phylink.) It would be a monumental task to do that level of testing. So, if there's another clean way to solve the issue, I would much rather that approach was taken. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
© 2016 - 2025 Red Hat, Inc.