[PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported

Vladimir Oltean posted 1 patch 1 week, 1 day ago
drivers/net/phy/phylink.c | 5 +++++
1 file changed, 5 insertions(+)
[PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Vladimir Oltean 1 week, 1 day ago
It seems that phylink does not support driving PHYs in SFP modules using
the Generic PHY or Generic Clause 45 PHY driver. I've come to this
conclusion after analyzing these facts:

- sfp_sm_probe_phy(), who is our caller here, first calls
  phy_device_register() and then sfp_add_phy() -> ... ->
  phylink_sfp_connect_phy().

- phydev->supported is populated by phy_probe()

- phy_probe() is usually called synchronously from phy_device_register()
  via phy_bus_match(), if a precise device driver is found for the PHY.
  In that case, phydev->supported has a good chance of being set to a
  non-zero mask.

- There is an exceptional case for the PHYs for which phy_bus_match()
  didn't find a driver. Those devices sit for a while without a driver,
  then phy_attach_direct() force-binds the genphy_c45_driver or
  genphy_driver to them. Again, this triggers phy_probe() and renders
  a good chance of phydev->supported being populated, assuming
  compatibility with genphy_read_abilities() or
  genphy_c45_pma_read_abilities().

- phylink_sfp_config_phy() does not support the exceptional case of
  retrieving phydev->supported from the Generic PHY driver, due to its
  code flow. It expects the phydev->supported mask to already be
  non-empty, because it first calls phylink_validate() on it, and only
  calls phylink_attach_phy() if that succeeds. Thus, phylink_attach_phy()
  -> phy_attach_direct() has no chance of running.

It is not my wish to change the state of affairs by altering the code
flow, but merely to document the limitation rather than have the current
unspecific error:

[   61.800079] mv88e6085 d0032004.mdio-mii:12 sfp: validation with support 00,00000000,00000000,00000000 failed: -EINVAL
[   61.820743] sfp sfp: sfp_add_phy failed: -EINVAL

On the premise that an empty phydev->supported is going to make
phylink_validate() fail anyway, it would be more informative to single
out that case, undercut the phylink_validate() call, and print a more
specific message:

[   64.738270] mv88e6085 d0032004.mdio-mii:12 sfp: PHY i2c:sfp:16 (id 0x01410cc2) supports no link modes. Maybe its specific PHY driver not loaded?
[   64.769731] sfp sfp: sfp_add_phy failed: -EINVAL

Of course, there may be other reasons due to which phydev->supported is
empty, thus the use of the word "maybe", but I think the lack of a
driver would be the most common.

Link: https://lore.kernel.org/netdev/20241113144229.3ff4bgsalvj7spb7@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/phy/phylink.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index b1e828a4286d..efeff8733a52 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3219,6 +3219,11 @@ static int phylink_sfp_config_phy(struct phylink *pl, u8 mode,
 	int ret;
 
 	linkmode_copy(support, phy->supported);
+	if (linkmode_empty(support)) {
+		phylink_err(pl, "PHY %s (id 0x%.8lx) supports no link modes. Maybe its specific PHY driver not loaded?\n",
+			    phydev_name(phy), (unsigned long)phy->phy_id);
+		return -EINVAL;
+	}
 
 	memset(&config, 0, sizeof(config));
 	linkmode_copy(config.advertising, phy->advertising);
-- 
2.43.0
Re: [PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Andrew Lunn 1 week, 1 day ago
On Thu, Nov 14, 2024 at 06:53:48PM +0200, Vladimir Oltean wrote:
> It seems that phylink does not support driving PHYs in SFP modules using
> the Generic PHY or Generic Clause 45 PHY driver.

Somewhat related, i wounder if we should add a phydev_info() message
in genphy_probe() which prints a message that you probably want to
swap to using a PHY driver specific to the hardware?

The less genphy is used, the better.

	Andrew
Re: [PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Andrew Lunn 1 week, 1 day ago
> [   64.738270] mv88e6085 d0032004.mdio-mii:12 sfp: PHY i2c:sfp:16 (id 0x01410cc2) supports no link modes. Maybe its specific PHY driver not loaded?
> [   64.769731] sfp sfp: sfp_add_phy failed: -EINVAL
> 
> Of course, there may be other reasons due to which phydev->supported is
> empty, thus the use of the word "maybe", but I think the lack of a
> driver would be the most common.

I think this is useful.

I only have a minor nitpick, maybe in the commit message mention which
PHY drivers are typically used by SFPs, to point somebody who gets
this message in the right direction. The Marvell driver is one. at803x
i think is also used. Are then any others?

	Andrew
Re: [PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Russell King (Oracle) 1 week, 1 day ago
On Thu, Nov 14, 2024 at 06:38:13PM +0100, Andrew Lunn wrote:
> > [   64.738270] mv88e6085 d0032004.mdio-mii:12 sfp: PHY i2c:sfp:16 (id 0x01410cc2) supports no link modes. Maybe its specific PHY driver not loaded?
> > [   64.769731] sfp sfp: sfp_add_phy failed: -EINVAL
> > 
> > Of course, there may be other reasons due to which phydev->supported is
> > empty, thus the use of the word "maybe", but I think the lack of a
> > driver would be the most common.
> 
> I think this is useful.
> 
> I only have a minor nitpick, maybe in the commit message mention which
> PHY drivers are typically used by SFPs, to point somebody who gets
> this message in the right direction. The Marvell driver is one. at803x
> i think is also used. Are then any others?

bcm84881 too. Not sure about at803x - the only SFP I know that uses
that PHY doesn't make the PHY available to the host.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Vladimir Oltean 1 week ago
On Thu, Nov 14, 2024 at 06:33:00PM +0000, Russell King (Oracle) wrote:
> On Thu, Nov 14, 2024 at 06:38:13PM +0100, Andrew Lunn wrote:
> > > [   64.738270] mv88e6085 d0032004.mdio-mii:12 sfp: PHY i2c:sfp:16 (id 0x01410cc2) supports no link modes. Maybe its specific PHY driver not loaded?
> > > [   64.769731] sfp sfp: sfp_add_phy failed: -EINVAL
> > > 
> > > Of course, there may be other reasons due to which phydev->supported is
> > > empty, thus the use of the word "maybe", but I think the lack of a
> > > driver would be the most common.
> > 
> > I think this is useful.
> > 
> > I only have a minor nitpick, maybe in the commit message mention which
> > PHY drivers are typically used by SFPs, to point somebody who gets
> > this message in the right direction. The Marvell driver is one. at803x
> > i think is also used. Are then any others?
> 
> bcm84881 too. Not sure about at803x - the only SFP I know that uses
> that PHY doesn't make the PHY available to the host.

So which Kconfig options should I put down for v2? CONFIG_BCM84881_PHY
and CONFIG_MARVELL_PHY?

To avoid this "Please insert the name of your sound card" situation
reminiscent of the 90s, another thing which might be interesting to
explore would be for each PHY driver to have a stub portion always built
into the kernel, keeping an association between the phy_id/phy_id_mask
and the Kconfig information associated with it (Kconfig option, and
whether it was enabled or not).

This way we could eliminate the guesswork and the kernel would always
know and print to the user which driver, if any, could handle that PHY
ID, rather than rely on the user to know that there is a portion of the
PHY ID which needs to be masked out when searching the kernel sources
for a compatible driver.

Please see the attached patch an inelegant way in which I've actually
implemented this for the Marvell and BCM84881 drivers. My idea is to
move each driver's struct mdio_device_id to a separate stub file.
Ugly but definitely low in complexity. It produces this output:

[   64.515123] mv88e6085 d0032004.mdio-mii:12 sfp: PHY i2c:sfp:16 supports no link modes. Its driver, CONFIG_MARVELL_PHY, is compiled as module.
[   64.532233] sfp sfp: sfp_add_phy failed: -EINVAL

Let me know if something like this would be interesting to submit to
mainline.
Re: [PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Andrew Lunn 1 week ago
On Fri, Nov 15, 2024 at 06:14:01PM +0200, Vladimir Oltean wrote:
> On Thu, Nov 14, 2024 at 06:33:00PM +0000, Russell King (Oracle) wrote:
> > On Thu, Nov 14, 2024 at 06:38:13PM +0100, Andrew Lunn wrote:
> > > > [   64.738270] mv88e6085 d0032004.mdio-mii:12 sfp: PHY i2c:sfp:16 (id 0x01410cc2) supports no link modes. Maybe its specific PHY driver not loaded?
> > > > [   64.769731] sfp sfp: sfp_add_phy failed: -EINVAL
> > > > 
> > > > Of course, there may be other reasons due to which phydev->supported is
> > > > empty, thus the use of the word "maybe", but I think the lack of a
> > > > driver would be the most common.
> > > 
> > > I think this is useful.
> > > 
> > > I only have a minor nitpick, maybe in the commit message mention which
> > > PHY drivers are typically used by SFPs, to point somebody who gets
> > > this message in the right direction. The Marvell driver is one. at803x
> > > i think is also used. Are then any others?
> > 
> > bcm84881 too. Not sure about at803x - the only SFP I know that uses
> > that PHY doesn't make the PHY available to the host.
> 
> So which Kconfig options should I put down for v2? CONFIG_BCM84881_PHY
> and CONFIG_MARVELL_PHY?
> 
> To avoid this "Please insert the name of your sound card" situation
> reminiscent of the 90s, another thing which might be interesting to
> explore would be for each PHY driver to have a stub portion always built
> into the kernel, keeping an association between the phy_id/phy_id_mask
> and the Kconfig information associated with it (Kconfig option, and
> whether it was enabled or not).

This might be useful in other ways, if we can make it work for every
driver. genphy somewhat breaks the usual device model, and that causes
us pain at times. fw_devlink gets confused by genphy, and users as
well. We have the issue of not knowing if genphy is to be used, or we
should wait around longer for the correct driver to load.

So i can see three use cases:

1) There is a driver for this hardware, it is just not being built

2) There is a driver for this hardware, it is being built, it has not
loaded yet.

3) There is no driver for this hardware, genphy is the fallback.

I would actually say 1) is not something we should solve at the PHY
driver layer, it is a generic problem for all drivers. We want some
Makefile support for extracting the MODULE_DEVICE_TABLE() for modules
which are not enabled, and some way to create a modules.disabled.alias
which module loading can look at and issue a warning. 2) i also think
is a generic problem. 3) is probably PHY specific, because i don't
know of any other case where there is a fallback driver.

	Andrew
Re: [PATCH net-next] net: phylink: improve phylink_sfp_config_phy() error message with empty supported
Posted by Vladimir Oltean 6 days, 22 hours ago
On Fri, Nov 15, 2024 at 06:51:00PM +0100, Andrew Lunn wrote:
> We want some Makefile support for extracting the MODULE_DEVICE_TABLE()
> for modules which are not enabled, and some way to create a
> modules.disabled.alias which module loading can look at and issue a
> warning.

Sadly, for me, automatically extracting MODULE_DEVICE_TABLE() from files
which are not compiled in C is science fiction.

Also, I need to sleep on the idea that creating an association between
device and missing driver might be useful kernel-wide. I'm not prepared
to handle the possibility where we make that happen, but it gets push
back and dropped.