[RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID

Bartosz Golaszewski posted 4 patches 1 year, 5 months ago
[RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID
Posted by Bartosz Golaszewski 1 year, 5 months ago
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Checking the firmware register before it complete the boot process makes
no sense, it will report 0 even if FW is available from internal memory.
Always wait for FW to boot before continuing or we'll unnecessarily try
to load it from nvmem/filesystem and fail.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
 drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index 0c9640ef153b..524627a36c6f 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
 {
 	int ret;
 
+	ret = aqr_wait_reset_complete(phydev);
+	if (ret)
+		return ret;
+
 	/* Check if the firmware is not already loaded by pooling
 	 * the current version returned by the PHY. If 0 is returned,
 	 * no firmware is loaded.
-- 
2.43.0
Re: [RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID
Posted by Jon Hunter 1 year, 4 months ago
Hi Bartosz,

On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> Checking the firmware register before it complete the boot process makes
> no sense, it will report 0 even if FW is available from internal memory.
> Always wait for FW to boot before continuing or we'll unnecessarily try
> to load it from nvmem/filesystem and fail.
> 
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
>   drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> index 0c9640ef153b..524627a36c6f 100644
> --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
>   {
>   	int ret;
>   
> +	ret = aqr_wait_reset_complete(phydev);
> +	if (ret)
> +		return ret;
> +
>   	/* Check if the firmware is not already loaded by pooling
>   	 * the current version returned by the PHY. If 0 is returned,
>   	 * no firmware is loaded.


Although this fixed another issue we were seeing with this driver, we 
have been reviewing this change and have a question about it.

According to the description for the function aqr_wait_reset_complete() 
this function is intended to give the device time to load firmware and 
check there is a valid firmware ID.

If a valid firmware ID (non-zero) is detected, then 
aqr_wait_reset_complete() will return 0 (because 
phy_read_mmd_poll_timeout() returns 0 on success and -ETIMEDOUT upon a 
timeout).

If it times out, then it would appear that with the above code we don't 
attempt to load the firmware by any other means?

Hence, I was wondering if we want this ...

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c 
b/drivers/net/phy/aquantia/aquantia_firmware.c
index 524627a36c6f..a167f42ae36b 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev)
  {
         int ret;

-       ret = aqr_wait_reset_complete(phydev);
-       if (ret)
-               return ret;
-
-       /* Check if the firmware is not already loaded by pooling
+       /* Check if the firmware is not already loaded by polling
          * the current version returned by the PHY. If 0 is returned,
-        * no firmware is loaded.
+        * firmware is loaded.
          */
-       ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
-       if (ret > 0)
+       ret = aqr_wait_reset_complete(phydev);
+       if (!ret)
                 goto exit;

         ret = aqr_firmware_load_nvmem(phydev);


Our Aquantia PHY has a SPI-NOR and so we don't to test the other 
firmware loading cases.

Jon

-- 
nvpublic
Re: [RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID
Posted by Vladimir Oltean 1 year, 4 months ago
Hi Jon,

On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote:
> Hi Bartosz,
> 
> On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > 
> > Checking the firmware register before it complete the boot process makes
> > no sense, it will report 0 even if FW is available from internal memory.
> > Always wait for FW to boot before continuing or we'll unnecessarily try
> > to load it from nvmem/filesystem and fail.
> > 
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
> >   drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> > index 0c9640ef153b..524627a36c6f 100644
> > --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
> >   {
> >   	int ret;
> > +	ret = aqr_wait_reset_complete(phydev);
> > +	if (ret)
> > +		return ret;
> > +
> >   	/* Check if the firmware is not already loaded by pooling
> >   	 * the current version returned by the PHY. If 0 is returned,
> >   	 * no firmware is loaded.
> 
> 
> Although this fixed another issue we were seeing with this driver, we have
> been reviewing this change and have a question about it.
> 
> According to the description for the function aqr_wait_reset_complete() this
> function is intended to give the device time to load firmware and check
> there is a valid firmware ID.
> 
> If a valid firmware ID (non-zero) is detected, then
> aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout()
> returns 0 on success and -ETIMEDOUT upon a timeout).
> 
> If it times out, then it would appear that with the above code we don't
> attempt to load the firmware by any other means?
> 
> Hence, I was wondering if we want this ...
> 
> diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c
> b/drivers/net/phy/aquantia/aquantia_firmware.c
> index 524627a36c6f..a167f42ae36b 100644
> --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> @@ -353,16 +353,12 @@ int aqr_firmware_load(struct phy_device *phydev)
>  {
>         int ret;
> 
> -       ret = aqr_wait_reset_complete(phydev);
> -       if (ret)
> -               return ret;
> -
> -       /* Check if the firmware is not already loaded by pooling
> +       /* Check if the firmware is not already loaded by polling
>          * the current version returned by the PHY. If 0 is returned,
> -        * no firmware is loaded.
> +        * firmware is loaded.
>          */
> -       ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
> -       if (ret > 0)
> +       ret = aqr_wait_reset_complete(phydev);
> +       if (!ret)
>                 goto exit;
> 
>         ret = aqr_firmware_load_nvmem(phydev);

I agree with your analysis and we also noticed this.

But actually, you wouldn't want to ignore other return codes from
phy_read_mmd_poll_timeout() like real errors from phy_read_mmd():
-ENODEV, -ENXIO etc.

I found that the logic is more readable with a switch/case statement as below.

diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
index 524627a36c6f..d839f64471bd 100644
--- a/drivers/net/phy/aquantia/aquantia_firmware.c
+++ b/drivers/net/phy/aquantia/aquantia_firmware.c
@@ -353,26 +353,33 @@ int aqr_firmware_load(struct phy_device *phydev)
 {
 	int ret;
 
-	ret = aqr_wait_reset_complete(phydev);
-	if (ret)
-		return ret;
-
-	/* Check if the firmware is not already loaded by pooling
-	 * the current version returned by the PHY. If 0 is returned,
-	 * no firmware is loaded.
+	/* Check if the firmware is not already loaded by polling
+	 * the current version returned by the PHY.
 	 */
-	ret = phy_read_mmd(phydev, MDIO_MMD_VEND1, VEND1_GLOBAL_FW_ID);
-	if (ret > 0)
-		goto exit;
+	ret = aqr_wait_reset_complete(phydev);
+	switch (ret) {
+	case 0:
+		/* Some firmware is loaded => do nothing */
+		return 0;
+	case -ETIMEDOUT:
+		/* VEND1_GLOBAL_FW_ID still reads 0 after 2 seconds of polling.
+		 * We don't have full confidence that no firmware is loaded (in
+		 * theory it might just not have loaded yet), but we will
+		 * assume that, and load a new image.
+		 */
+		ret = aqr_firmware_load_nvmem(phydev);
+		if (!ret)
+			goto exit;
 
-	ret = aqr_firmware_load_nvmem(phydev);
-	if (!ret)
-		goto exit;
+		ret = aqr_firmware_load_fs(phydev);
+		if (ret)
+			return ret;
 
-	ret = aqr_firmware_load_fs(phydev);
-	if (ret)
+		break;
+	default:
+		/* PHY read error, propagate it to the caller */
 		return ret;
+	}
 
-exit:
 	return 0;
 }
Re: [RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID
Posted by Russell King (Oracle) 1 year, 4 months ago
On Tue, Jul 30, 2024 at 10:59:59AM +0100, Jon Hunter wrote:
> Hi Bartosz,
> 
> On 08/07/2024 08:50, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > 
> > Checking the firmware register before it complete the boot process makes
> > no sense, it will report 0 even if FW is available from internal memory.
> > Always wait for FW to boot before continuing or we'll unnecessarily try
> > to load it from nvmem/filesystem and fail.
> > 
> > Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > ---
> >   drivers/net/phy/aquantia/aquantia_firmware.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/net/phy/aquantia/aquantia_firmware.c b/drivers/net/phy/aquantia/aquantia_firmware.c
> > index 0c9640ef153b..524627a36c6f 100644
> > --- a/drivers/net/phy/aquantia/aquantia_firmware.c
> > +++ b/drivers/net/phy/aquantia/aquantia_firmware.c
> > @@ -353,6 +353,10 @@ int aqr_firmware_load(struct phy_device *phydev)
> >   {
> >   	int ret;
> > +	ret = aqr_wait_reset_complete(phydev);
> > +	if (ret)
> > +		return ret;
> > +
> >   	/* Check if the firmware is not already loaded by pooling
> >   	 * the current version returned by the PHY. If 0 is returned,
> >   	 * no firmware is loaded.
> 
> 
> Although this fixed another issue we were seeing with this driver, we have
> been reviewing this change and have a question about it.
> 
> According to the description for the function aqr_wait_reset_complete() this
> function is intended to give the device time to load firmware and check
> there is a valid firmware ID.
> 
> If a valid firmware ID (non-zero) is detected, then
> aqr_wait_reset_complete() will return 0 (because phy_read_mmd_poll_timeout()
> returns 0 on success and -ETIMEDOUT upon a timeout).
> 
> If it times out, then it would appear that with the above code we don't
> attempt to load the firmware by any other means?

I'm also wondering about aqr_wait_reset_complete(). It uses
phy_read_mmd_poll_timeout(), which prints an error message if it times
out (which means no firmware has been loaded.) If we're then going on to
attempt to load firmware, the error is not an error at all. So, I think
while phy_read_poll_timeout() is nice and convenient, we need something
like:

#define phy_read_poll_timeout_quiet(phydev, regnum, val, cond, sleep_us, \
                                    timeout_us, sleep_before_read) \
({ \
        int __ret, __val; \
        __ret = read_poll_timeout(__val = phy_read, val, \
                                  __val < 0 || (cond), \
                sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
        if (__val < 0) \
                __ret = __val; \
        __ret; \
})

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
                                timeout_us, sleep_before_read) \
({ \
        int __ret = phy_read_poll_timeout_quiet(phydev, regnum, val, cond, \
						sleep_us, timeout_us, \
						sleep_before_read); \
        if (__ret) \
                phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
        __ret; \
})

and aqr_wait_reset_complete() needs to use phy_read_poll_timeout_quiet().

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
Re: [RESEND PATCH net-next v3 2/4] net: phy: aquantia: wait for FW reset before checking the vendor ID
Posted by Vladimir Oltean 1 year, 4 months ago
Hi Russell,

On Tue, Jul 30, 2024 at 12:23:45PM +0100, Russell King (Oracle) wrote:
> > If it times out, then it would appear that with the above code we don't
> > attempt to load the firmware by any other means?
> 
> I'm also wondering about aqr_wait_reset_complete(). It uses
> phy_read_mmd_poll_timeout(), which prints an error message if it times
> out (which means no firmware has been loaded.) If we're then going on to
> attempt to load firmware, the error is not an error at all. So, I think
> while phy_read_poll_timeout() is nice and convenient, we need something
> like:
> 
> #define phy_read_poll_timeout_quiet(phydev, regnum, val, cond, sleep_us, \
>                                     timeout_us, sleep_before_read) \
> ({ \
>         int __ret, __val; \
>         __ret = read_poll_timeout(__val = phy_read, val, \
>                                   __val < 0 || (cond), \
>                 sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
>         if (__val < 0) \
>                 __ret = __val; \
>         __ret; \
> })
> 
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
>                                 timeout_us, sleep_before_read) \
> ({ \
>         int __ret = phy_read_poll_timeout_quiet(phydev, regnum, val, cond, \
> 						sleep_us, timeout_us, \
> 						sleep_before_read); \
>         if (__ret) \
>                 phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
>         __ret; \
> })
> 
> and aqr_wait_reset_complete() needs to use phy_read_poll_timeout_quiet().

I agree that aqr_wait_reset_complete() shouldn't have built-in prints in it,
as long as failures are also expected. Maybe an alternative option would
be for aqr_wait_reset_complete() to manually roll a call to read_poll_timeout(),
considering how it would be nice for _actual_ errors (not -ETIMEDOUT)
from phy_read_mmd() to still be logged.

But it seems strange that the driver has to time out on a 2 second poll,
and then it's still not sure why VEND1_GLOBAL_FW_ID still reads 0?
Is it because there's no firmware, or because there is, but it hasn't
waited for long enough?

I haven't followed the development of AQR firmware loading. Isn't there
a faster and more reliable way of determining whether there is firmware
in the first place? It could give the driver a 2 second boot-time speedup,
plus more confidence.