[PATCH] net: sfp: reset fault retry counter on successful reinitialisation

Stefan Mahr posted 1 patch 3 years, 8 months ago
drivers/net/phy/sfp.c | 3 +++
1 file changed, 3 insertions(+)
[PATCH] net: sfp: reset fault retry counter on successful reinitialisation
Posted by Stefan Mahr 3 years, 8 months ago
This patch resets the fault retry counter to the default value, if the
module reinitialisation was successful. Otherwise without resetting
the counter, five (N_FAULT/N_FAULT_INIT) single TX_FAULT events will
deactivate the module persistently.

In case the reinitialisation was not successful after five retries,
the module is still being deactivated.

Signed-off-by: Stefan Mahr <dac922@gmx.de>
---
 drivers/net/phy/sfp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 63f90fe9a4d2..a8d7a713222a 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -2263,6 +2263,9 @@ static void sfp_sm_main(struct sfp *sfp, unsigned int event)
 		} else if (event == SFP_E_TIMEOUT || event == SFP_E_TX_CLEAR) {
 			dev_info(sfp->dev, "module transmit fault recovered\n");
 			sfp_sm_link_check_los(sfp);
+
+			/* Reset the fault retry count */
+			sfp->sm_fault_retries = N_FAULT;
 		}
 		break;

--
2.25.1
Re: [PATCH] net: sfp: reset fault retry counter on successful reinitialisation
Posted by Russell King (Oracle) 3 years, 8 months ago
On Fri, Aug 12, 2022 at 03:04:38PM +0200, Stefan Mahr wrote:
> This patch resets the fault retry counter to the default value, if the
> module reinitialisation was successful. Otherwise without resetting
> the counter, five (N_FAULT/N_FAULT_INIT) single TX_FAULT events will
> deactivate the module persistently.

This is intentional - if a module keeps asserting its TX_FAULT status,
then there is something wrong with it, and for an optical module it
means that the laser is overloading (producing more output than it
should.) That is a safety issue.

So, if the module keeps indicating that its laser is misbehaving the
only right thing to do is to shut it down permanently, and have
someone investigate.

What issue are you seeing that needs a different behaviour?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
Re: [PATCH] net: sfp: reset fault retry counter on successful reinitialisation
Posted by Stefan Mahr 3 years, 8 months ago
> On Fri, Aug 12, 2022 at 03:04:38PM +0200, Stefan Mahr wrote:
>> This patch resets the fault retry counter to the default value, if the
>> module reinitialisation was successful. Otherwise without resetting
>> the counter, five (N_FAULT/N_FAULT_INIT) single TX_FAULT events will
>> deactivate the module persistently.
>
> This is intentional - if a module keeps asserting its TX_FAULT status,
> then there is something wrong with it, and for an optical module it
> means that the laser is overloading (producing more output than it
> should.) That is a safety issue.

Yes, this behaviour is not changed by the patch. When TX_FAULT is true
persistently for 5 retries, the module will still be disabled.

>
> So, if the module keeps indicating that its laser is misbehaving the
> only right thing to do is to shut it down permanently, and have
> someone investigate.
>
> What issue are you seeing that needs a different behaviour?
>

In my case two different SFP modules (1000Base-BX) indicate short
TX_FAULT events for less than one second and only one per week -
sometimes more, sometimes less. So the idea was to reset the counter if
reinitialisation was successful, so rare single TX_FAULT events can be
ignored.

However, you are right: If a module reports TX_FAULT several times,
something serious may be wrong.