[RFC PATCH net-next] net: dsa: sja1105: let phylink help with the replay of link callbacks

Vladimir Oltean posted 1 patch 1 month, 3 weeks ago
drivers/net/dsa/sja1105/sja1105_main.c | 58 ++++----------------------
drivers/net/phy/phylink.c              | 51 +++++++++++++++++++++-
include/linux/phylink.h                |  5 +++
3 files changed, 63 insertions(+), 51 deletions(-)
[RFC PATCH net-next] net: dsa: sja1105: let phylink help with the replay of link callbacks
Posted by Vladimir Oltean 1 month, 3 weeks ago
sja1105_static_config_reload() changes major settings in the switch and
it requires a reset. A use case is to change things like Qdiscs (but see
sja1105_reset_reasons[] for full list) while PTP synchronization is
running, and the servo loop must not exit the locked state (s2).
Therefore, stopping and restarting the phylink instances of all ports is
not desirable, because that also stops the phylib state machine, and
retriggers a seconds-long auto-negotiation process that breaks PTP.
Thus, saving and restoring the link management settings is handled
privately by the driver.

The method got progressively more complex as SGMII support got added,
because this is handled through the xpcs phylink_pcs component, to which
we don't have unfettered access. Nonetheless, the switch reset line is
hardwired to also reset the XPCS, creating a situation where it loses
state and needs to be reprogrammed at a moment in time outside phylink's
control.

Although commits 907476c66d73 ("net: dsa: sja1105: call PCS
config/link_up via pcs_ops structure") and 41bf58314b17 ("net: dsa:
sja1105: use phylink_pcs internally") made the sja1105 <-> xpcs
interaction slightly prettier, we still depend heavily on the PCS being
"XPCS-like", because to back up its settings, we read the MII_BMCR
register, through a mdiobus_c45_read() operation, breaking all layering
separation.

But the phylink instance already has all that state, and more. It's just
that it's private. In this proposal, phylink offers 2 helpers for
walking the MAC and PCS drivers again through the callbacks required
during a destructive reset operation: mac_link_down() -> pcs_link_down()
-> mac_config() -> pcs_config() -> mac_link_up() -> pcs_link_up().

This creates the unique opportunity to simplify away even more code than
just the xpcs handling from sja1105_static_config_reload().
The sja1105_set_port_config() method is also invoked from
sja1105_mac_link_up(). And since that is now called directly by
phylink.. we can just remove it from sja1105_static_config_reload().
This makes it possible to re-merge sja1105_set_port_speed() and
sja1105_set_port_config() in a later change.

Note that my only setups with sja1105 where the xpcs is used is with the
xpcs on the CPU-facing port (fixed-link). Thus, I cannot test xpcs + PHY.
But the replay procedure walks through all ports, and I did test a
regular RGMII user port + a PHY.

ptp4l[54.552]: master offset          5 s2 freq    -931 path delay       764
ptp4l[55.551]: master offset         22 s2 freq    -913 path delay       764
ptp4l[56.551]: master offset         13 s2 freq    -915 path delay       765
ptp4l[57.552]: master offset          5 s2 freq    -919 path delay       765
ptp4l[58.553]: master offset         13 s2 freq    -910 path delay       765
ptp4l[59.553]: master offset         13 s2 freq    -906 path delay       765
ptp4l[60.553]: master offset          6 s2 freq    -909 path delay       765
ptp4l[61.553]: master offset          6 s2 freq    -907 path delay       765
ptp4l[62.553]: master offset          6 s2 freq    -906 path delay       765
ptp4l[63.553]: master offset         14 s2 freq    -896 path delay       765
$ ip link set br0 type bridge vlan_filtering 1
[   63.983283] sja1105 spi2.0 sw0p0: Link is Down
[   63.991913] sja1105 spi2.0: Link is Down
[   64.009784] sja1105 spi2.0: Reset switch and programmed static config. Reason: VLAN filtering
[   64.020217] sja1105 spi2.0 sw0p0: Link is Up - 1Gbps/Full - flow control off
[   64.030683] sja1105 spi2.0: Link is Up - 1Gbps/Full - flow control off
ptp4l[64.554]: master offset       7397 s2 freq   +6491 path delay       765
ptp4l[65.554]: master offset         38 s2 freq   +1352 path delay       765
ptp4l[66.554]: master offset      -2225 s2 freq    -900 path delay       764
ptp4l[67.555]: master offset      -2226 s2 freq   -1569 path delay       765
ptp4l[68.555]: master offset      -1553 s2 freq   -1563 path delay       765
ptp4l[69.555]: master offset       -865 s2 freq   -1341 path delay       765
ptp4l[70.555]: master offset       -401 s2 freq   -1137 path delay       765
ptp4l[71.556]: master offset       -145 s2 freq   -1001 path delay       765
ptp4l[72.558]: master offset        -26 s2 freq    -926 path delay       765
ptp4l[73.557]: master offset         30 s2 freq    -877 path delay       765
ptp4l[74.557]: master offset         47 s2 freq    -851 path delay       765
ptp4l[75.557]: master offset         29 s2 freq    -855 path delay       765

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
This was not what was discussed in
https://lore.kernel.org/netdev/E1ssjcz-005Ns9-D5@rmk-PC.armlinux.org.uk/,
but I will approach that perhaps differently, depending on the feedback here.

 drivers/net/dsa/sja1105/sja1105_main.c | 58 ++++----------------------
 drivers/net/phy/phylink.c              | 51 +++++++++++++++++++++-
 include/linux/phylink.h                |  5 +++
 3 files changed, 63 insertions(+), 51 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index af38b8959d8d..864f697105d8 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -2291,14 +2291,12 @@ int sja1105_static_config_reload(struct sja1105_private *priv,
 {
 	struct ptp_system_timestamp ptp_sts_before;
 	struct ptp_system_timestamp ptp_sts_after;
-	u16 bmcr[SJA1105_MAX_NUM_PORTS] = {0};
-	u64 mac_speed[SJA1105_MAX_NUM_PORTS];
 	struct sja1105_mac_config_entry *mac;
 	struct dsa_switch *ds = priv->ds;
+	struct dsa_port *dp;
 	s64 t1, t2, t3, t4;
-	s64 t12, t34;
-	int rc, i;
-	s64 now;
+	s64 t12, t34, now;
+	int rc;
 
 	mutex_lock(&priv->fdb_lock);
 	mutex_lock(&priv->mgmt_lock);
@@ -2310,13 +2308,9 @@ int sja1105_static_config_reload(struct sja1105_private *priv,
 	 * switch wants to see in the static config in order to allow us to
 	 * change it through the dynamic interface later.
 	 */
-	for (i = 0; i < ds->num_ports; i++) {
-		mac_speed[i] = mac[i].speed;
-		mac[i].speed = priv->info->port_speed[SJA1105_SPEED_AUTO];
-
-		if (priv->pcs[i])
-			bmcr[i] = mdiobus_c45_read(priv->mdio_pcs, i,
-						   MDIO_MMD_VEND2, MDIO_CTRL1);
+	dsa_switch_for_each_available_port(dp, ds) {
+		phylink_replay_link_begin(dp->pl);
+		mac[dp->index].speed = priv->info->port_speed[SJA1105_SPEED_AUTO];
 	}
 
 	/* No PTP operations can run right now */
@@ -2370,44 +2364,8 @@ int sja1105_static_config_reload(struct sja1105_private *priv,
 			goto out;
 	}
 
-	for (i = 0; i < ds->num_ports; i++) {
-		struct phylink_pcs *pcs = priv->pcs[i];
-		unsigned int neg_mode;
-
-		mac[i].speed = mac_speed[i];
-		rc = sja1105_set_port_config(priv, i);
-		if (rc < 0)
-			goto out;
-
-		if (!pcs)
-			continue;
-
-		if (bmcr[i] & BMCR_ANENABLE)
-			neg_mode = PHYLINK_PCS_NEG_INBAND_ENABLED;
-		else
-			neg_mode = PHYLINK_PCS_NEG_OUTBAND;
-
-		rc = pcs->ops->pcs_config(pcs, neg_mode, priv->phy_mode[i],
-					  NULL, true);
-		if (rc < 0)
-			goto out;
-
-		if (neg_mode == PHYLINK_PCS_NEG_OUTBAND) {
-			int speed = SPEED_UNKNOWN;
-
-			if (priv->phy_mode[i] == PHY_INTERFACE_MODE_2500BASEX)
-				speed = SPEED_2500;
-			else if (bmcr[i] & BMCR_SPEED1000)
-				speed = SPEED_1000;
-			else if (bmcr[i] & BMCR_SPEED100)
-				speed = SPEED_100;
-			else
-				speed = SPEED_10;
-
-			pcs->ops->pcs_link_up(pcs, neg_mode, priv->phy_mode[i],
-					      speed, DUPLEX_FULL);
-		}
-	}
+	dsa_switch_for_each_available_port(dp, ds)
+		 phylink_replay_link_end(dp->pl);
 
 	rc = sja1105_reload_cbs(priv);
 	if (rc < 0)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 4309317de3d1..ba13f80ed45f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -80,6 +80,7 @@ struct phylink {
 
 	bool mac_link_dropped;
 	bool using_mac_select_pcs;
+	bool force_major_config;
 
 	struct sfp_bus *sfp_bus;
 	bool sfp_may_have_phy;
@@ -1551,7 +1552,8 @@ static void phylink_resolve(struct work_struct *w)
 	}
 
 	if (mac_config) {
-		if (link_state.interface != pl->link_config.interface) {
+		if (link_state.interface != pl->link_config.interface ||
+		    pl->force_major_config) {
 			/* The interface has changed, force the link down and
 			 * then reconfigure.
 			 */
@@ -1561,6 +1563,7 @@ static void phylink_resolve(struct work_struct *w)
 			}
 			phylink_major_config(pl, false, &link_state);
 			pl->link_config.interface = link_state.interface;
+			pl->force_major_config = false;
 		}
 	}
 
@@ -3870,6 +3873,52 @@ void phylink_mii_c45_pcs_get_state(struct mdio_device *pcs,
 }
 EXPORT_SYMBOL_GPL(phylink_mii_c45_pcs_get_state);
 
+/**
+ * phylink_replay_link_begin() - begin replay of link callbacks for driver
+ *				 which loses state
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * Helper for MAC drivers which may perform a destructive reset at runtime.
+ * Both the own driver's mac_link_down() method is called, as well as the
+ * pcs_link_down() method of the split PCS (if any).
+ *
+ * This is similar to phylink_stop(), except it does not alter the state of
+ * the phylib PHY (it is assumed that it is not affected by the MAC destructive
+ * reset).
+ */
+void phylink_replay_link_begin(struct phylink *pl)
+{
+	ASSERT_RTNL();
+
+	phylink_run_resolve_and_disable(pl, PHYLINK_DISABLE_STOPPED);
+}
+EXPORT_SYMBOL_GPL(phylink_replay_link_begin);
+
+/**
+ * phylink_replay_link_end() - end replay of link callbacks for driver
+ *			       which lost state
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * Helper for MAC drivers which may perform a destructive reset at runtime.
+ * Both the own driver's mac_config() and mac_link_up() methods, as well as the
+ * pcs_config() and pcs_link_up() method of the split PCS (if any), are called.
+ *
+ * This is similar to phylink_start(), except it does not alter the state of
+ * the phylib PHY.
+ *
+ * One must call this method only within the same rtnl_lock() critical section
+ * as a previous phylink_replay_link_start().
+ */
+void phylink_replay_link_end(struct phylink *pl)
+{
+	ASSERT_RTNL();
+
+	pl->force_major_config = true;
+	phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
+	flush_work(&pl->resolve);
+}
+EXPORT_SYMBOL_GPL(phylink_replay_link_end);
+
 static int __init phylink_init(void)
 {
 	for (int i = 0; i < ARRAY_SIZE(phylink_sfp_interface_preference); ++i)
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 5c01048860c4..d978daafd0e9 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -687,4 +687,9 @@ void phylink_mii_c45_pcs_get_state(struct mdio_device *pcs,
 
 void phylink_decode_usxgmii_word(struct phylink_link_state *state,
 				 uint16_t lpa);
+
+void phylink_replay_link_begin(struct phylink *pl);
+
+void phylink_replay_link_end(struct phylink *pl);
+
 #endif
-- 
2.43.0