[net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling

Christian Marangi posted 11 patches 4 months ago
[net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Christian Marangi 4 months ago
Introduce internal handling of PCS for phylink. This is an alternative
to .mac_select_pcs that moves the selection logic of the PCS entirely to
phylink with the usage of the supported_interface value in the PCS
struct.

MAC should now provide an array of available PCS in phylink_config in
.available_pcs and fill the .num_available_pcs with the number of
elements in the array. MAC should also define a new bitmap,
pcs_interfaces, in phylink_config to define for what interface mode a
dedicated PCS is required.

On phylink_create() this array is parsed and a linked list of PCS is
created based on the PCS passed in phylink_config.
Also the supported_interface value in phylink struct is updated with the
new supported_interface from the provided PCS.

On phylink_start() every PCS in phylink PCS list gets attached to the
phylink instance. This is done by setting the phylink value in
phylink_pcs struct to the phylink instance.

On phylink_stop(), every PCS in phylink PCS list is detached from the
phylink instance. This is done by setting the phylink value in
phylink_pcs struct to NULL.

phylink_validate_mac_and_pcs(), phylink_major_config() and
phylink_inband_caps() are updated to support this new implementation
with the PCS list stored in phylink.

They will make use of phylink_validate_pcs_interface() that will loop
for every PCS in the phylink PCS available list and find one that supports
the passed interface.

phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
where if a supported_interface value is not set for the PCS struct, then
it's assumed every interface is supported.

A MAC is required to implement either a .mac_select_pcs or make use of
the PCS list implementation. Implementing both will result in a fail
on MAC/PCS validation.

phylink value in phylink_pcs struct with this implementation is used to
track from PCS side when it's attached to a phylink instance. PCS driver
will make use of this information to correctly detach from a phylink
instance if needed.

The .mac_select_pcs implementation is not changed but it's expected that
every MAC driver migrates to the new implementation to later deprecate
and remove .mac_select_pcs.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
 drivers/net/phy/phylink.c | 147 +++++++++++++++++++++++++++++++++-----
 include/linux/phylink.h   |  10 +++
 2 files changed, 139 insertions(+), 18 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index ec42fd278604..95d7e06dee56 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -59,6 +59,9 @@ struct phylink {
 	/* The link configuration settings */
 	struct phylink_link_state link_config;
 
+	/* List of available PCS */
+	struct list_head pcs_list;
+
 	/* What interface are supported by the current link.
 	 * Can change on removal or addition of new PCS.
 	 */
@@ -144,6 +147,8 @@ static const phy_interface_t phylink_sfp_interface_preference[] = {
 
 static DECLARE_PHY_INTERFACE_MASK(phylink_sfp_interfaces);
 
+static void phylink_run_resolve(struct phylink *pl);
+
 /**
  * phylink_set_port_modes() - set the port type modes in the ethtool mask
  * @mask: ethtool link mode mask
@@ -499,22 +504,59 @@ static void phylink_validate_mask_caps(unsigned long *supported,
 	linkmode_and(state->advertising, state->advertising, mask);
 }
 
+static int phylink_validate_pcs_interface(struct phylink_pcs *pcs,
+					  phy_interface_t interface)
+{
+	/* If PCS define an empty supported_interfaces value, assume
+	 * all interface are supported.
+	 */
+	if (phy_interface_empty(pcs->supported_interfaces))
+		return 0;
+
+	/* Ensure that this PCS supports the interface mode */
+	if (!test_bit(interface, pcs->supported_interfaces))
+		return -EINVAL;
+
+	return 0;
+}
+
 static int phylink_validate_mac_and_pcs(struct phylink *pl,
 					unsigned long *supported,
 					struct phylink_link_state *state)
 {
-	struct phylink_pcs *pcs = NULL;
 	unsigned long capabilities;
+	struct phylink_pcs *pcs;
+	bool pcs_found = false;
 	int ret;
 
 	/* Get the PCS for this interface mode */
 	if (pl->mac_ops->mac_select_pcs) {
+		/* Make sure either PCS internal validation or .mac_select_pcs
+		 * is used. Return error if both are defined.
+		 */
+		if (!list_empty(&pl->pcs_list)) {
+			phylink_err(pl, "either phylink_pcs_add() or .mac_select_pcs must be used\n");
+			return -EINVAL;
+		}
+
 		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
 		if (IS_ERR(pcs))
 			return PTR_ERR(pcs);
+
+		pcs_found = !!pcs;
+	} else {
+		/* Check every assigned PCS and search for one that supports
+		 * the interface.
+		 */
+		list_for_each_entry(pcs, &pl->pcs_list, list) {
+			if (!phylink_validate_pcs_interface(pcs, state->interface)) {
+				pcs_found = true;
+				break;
+			}
+		}
 	}
 
-	if (pcs) {
+	if (pcs_found) {
 		/* The PCS, if present, must be setup before phylink_create()
 		 * has been called. If the ops is not initialised, print an
 		 * error and backtrace rather than oopsing the kernel.
@@ -526,13 +568,10 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
 			return -EINVAL;
 		}
 
-		/* Ensure that this PCS supports the interface which the MAC
-		 * returned it for. It is an error for the MAC to return a PCS
-		 * that does not support the interface mode.
-		 */
-		if (!phy_interface_empty(pcs->supported_interfaces) &&
-		    !test_bit(state->interface, pcs->supported_interfaces)) {
-			phylink_err(pl, "MAC returned PCS which does not support %s\n",
+		/* Recheck PCS to handle legacy way for .mac_select_pcs */
+		ret = phylink_validate_pcs_interface(pcs, state->interface);
+		if (ret) {
+			phylink_err(pl, "selected PCS does not support %s\n",
 				    phy_modes(state->interface));
 			return -EINVAL;
 		}
@@ -937,12 +976,22 @@ static unsigned int phylink_inband_caps(struct phylink *pl,
 					 phy_interface_t interface)
 {
 	struct phylink_pcs *pcs;
+	bool pcs_found = false;
 
-	if (!pl->mac_ops->mac_select_pcs)
-		return 0;
+	if (pl->mac_ops->mac_select_pcs) {
+		pcs = pl->mac_ops->mac_select_pcs(pl->config,
+						  interface);
+		pcs_found = !!pcs;
+	} else {
+		list_for_each_entry(pcs, &pl->pcs_list, list) {
+			if (!phylink_validate_pcs_interface(pcs, interface)) {
+				pcs_found = true;
+				break;
+			}
+		}
+	}
 
-	pcs = pl->mac_ops->mac_select_pcs(pl->config, interface);
-	if (!pcs)
+	if (!pcs_found)
 		return 0;
 
 	return phylink_pcs_inband_caps(pcs, interface);
@@ -1228,10 +1277,36 @@ static void phylink_major_config(struct phylink *pl, bool restart,
 			pl->major_config_failed = true;
 			return;
 		}
+	/* Find a PCS in available PCS list for the requested interface.
+	 * This doesn't overwrite the previous .mac_select_pcs as either
+	 * .mac_select_pcs or PCS list implementation are permitted.
+	 *
+	 * Skip searching if the MAC doesn't require a dedicaed PCS for
+	 * the requested interface.
+	 */
+	} else if (test_bit(state->interface, pl->config->pcs_interfaces)) {
+		bool pcs_found = false;
+
+		list_for_each_entry(pcs, &pl->pcs_list, list) {
+			if (!phylink_validate_pcs_interface(pcs,
+							    state->interface)) {
+				pcs_found = true;
+				break;
+			}
+		}
+
+		if (!pcs_found) {
+			phylink_err(pl,
+				    "couldn't find a PCS for %s\n",
+				    phy_modes(state->interface));
 
-		pcs_changed = pl->pcs != pcs;
+			pl->major_config_failed = true;
+			return;
+		}
 	}
 
+	pcs_changed = pl->pcs != pcs;
+
 	phylink_pcs_neg_mode(pl, pcs, state->interface, state->advertising);
 
 	phylink_dbg(pl, "major config, active %s/%s/%s\n",
@@ -1258,10 +1333,12 @@ static void phylink_major_config(struct phylink *pl, bool restart,
 	if (pcs_changed) {
 		phylink_pcs_disable(pl->pcs);
 
-		if (pl->pcs)
-			pl->pcs->phylink = NULL;
+		if (pl->mac_ops->mac_select_pcs) {
+			if (pl->pcs)
+				pl->pcs->phylink = NULL;
 
-		pcs->phylink = pl;
+			pcs->phylink = pl;
+		}
 
 		pl->pcs = pcs;
 	}
@@ -1797,8 +1874,9 @@ struct phylink *phylink_create(struct phylink_config *config,
 			       phy_interface_t iface,
 			       const struct phylink_mac_ops *mac_ops)
 {
+	struct phylink_pcs *pcs;
 	struct phylink *pl;
-	int ret;
+	int i, ret;
 
 	/* Validate the supplied configuration */
 	if (phy_interface_empty(config->supported_interfaces)) {
@@ -1813,9 +1891,21 @@ struct phylink *phylink_create(struct phylink_config *config,
 
 	mutex_init(&pl->state_mutex);
 	INIT_WORK(&pl->resolve, phylink_resolve);
+	INIT_LIST_HEAD(&pl->pcs_list);
+
+	/* Fill the PCS list with available PCS from phylink config */
+	for (i = 0; i < config->num_available_pcs; i++) {
+		pcs = config->available_pcs[i];
+
+		list_add(&pcs->list, &pl->pcs_list);
+	}
 
 	phy_interface_copy(pl->supported_interfaces,
 			   config->supported_interfaces);
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		phy_interface_or(pl->supported_interfaces,
+				 pl->supported_interfaces,
+				 pcs->supported_interfaces);
 
 	pl->config = config;
 	if (config->type == PHYLINK_NETDEV) {
@@ -1894,10 +1984,16 @@ EXPORT_SYMBOL_GPL(phylink_create);
  */
 void phylink_destroy(struct phylink *pl)
 {
+	struct phylink_pcs *pcs, *tmp;
+
 	sfp_bus_del_upstream(pl->sfp_bus);
 	if (pl->link_gpio)
 		gpiod_put(pl->link_gpio);
 
+	/* Remove every PCS from phylink PCS list */
+	list_for_each_entry_safe(pcs, tmp, &pl->pcs_list, list)
+		list_del(&pcs->list);
+
 	cancel_work_sync(&pl->resolve);
 	kfree(pl);
 }
@@ -2374,6 +2470,7 @@ static irqreturn_t phylink_link_handler(int irq, void *data)
  */
 void phylink_start(struct phylink *pl)
 {
+	struct phylink_pcs *pcs;
 	bool poll = false;
 
 	ASSERT_RTNL();
@@ -2400,6 +2497,10 @@ void phylink_start(struct phylink *pl)
 
 	pl->pcs_state = PCS_STATE_STARTED;
 
+	/* link available PCS to phylink struct */
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		pcs->phylink = pl;
+
 	phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
 
 	if (pl->cfg_link_an_mode == MLO_AN_FIXED && pl->link_gpio) {
@@ -2444,6 +2545,8 @@ EXPORT_SYMBOL_GPL(phylink_start);
  */
 void phylink_stop(struct phylink *pl)
 {
+	struct phylink_pcs *pcs;
+
 	ASSERT_RTNL();
 
 	if (pl->sfp_bus)
@@ -2461,6 +2564,14 @@ void phylink_stop(struct phylink *pl)
 	pl->pcs_state = PCS_STATE_DOWN;
 
 	phylink_pcs_disable(pl->pcs);
+
+	/* Drop link between phylink and PCS */
+	list_for_each_entry(pcs, &pl->pcs_list, list)
+		pcs->phylink = NULL;
+
+	/* Restore original supported interfaces */
+	phy_interface_copy(pl->supported_interfaces,
+			   pl->config->supported_interfaces);
 }
 EXPORT_SYMBOL_GPL(phylink_stop);
 
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 30659b615fca..ef0b5a0729c8 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -150,12 +150,16 @@ enum phylink_op_type {
  *		     if MAC link is at %MLO_AN_FIXED mode.
  * @supported_interfaces: bitmap describing which PHY_INTERFACE_MODE_xxx
  *                        are supported by the MAC/PCS.
+ * @pcs_interfaces: bitmap describing for which PHY_INTERFACE_MODE_xxx a
+ *		    dedicated PCS is required.
  * @lpi_interfaces: bitmap describing which PHY interface modes can support
  *		    LPI signalling.
  * @mac_capabilities: MAC pause/speed/duplex capabilities.
  * @lpi_capabilities: MAC speeds which can support LPI signalling
  * @lpi_timer_default: Default EEE LPI timer setting.
  * @eee_enabled_default: If set, EEE will be enabled by phylink at creation time
+ * @available_pcs: array of available phylink_pcs PCS
+ * @num_available_pcs: num of available phylink_pcs PCS
  */
 struct phylink_config {
 	struct device *dev;
@@ -168,11 +172,14 @@ struct phylink_config {
 	void (*get_fixed_state)(struct phylink_config *config,
 				struct phylink_link_state *state);
 	DECLARE_PHY_INTERFACE_MASK(supported_interfaces);
+	DECLARE_PHY_INTERFACE_MASK(pcs_interfaces);
 	DECLARE_PHY_INTERFACE_MASK(lpi_interfaces);
 	unsigned long mac_capabilities;
 	unsigned long lpi_capabilities;
 	u32 lpi_timer_default;
 	bool eee_enabled_default;
+	struct phylink_pcs **available_pcs;
+	unsigned int num_available_pcs;
 };
 
 void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed);
@@ -469,6 +476,9 @@ struct phylink_pcs {
 	struct phylink *phylink;
 	bool poll;
 	bool rxc_always_on;
+
+	/* private: */
+	struct list_head list;
 };
 
 /**
-- 
2.48.1
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Sean Anderson 4 months ago
On 5/11/25 16:12, Christian Marangi wrote:
> Introduce internal handling of PCS for phylink. This is an alternative
> to .mac_select_pcs that moves the selection logic of the PCS entirely to
> phylink with the usage of the supported_interface value in the PCS
> struct.
> 
> MAC should now provide an array of available PCS in phylink_config in
> .available_pcs and fill the .num_available_pcs with the number of
> elements in the array. MAC should also define a new bitmap,
> pcs_interfaces, in phylink_config to define for what interface mode a
> dedicated PCS is required.
> 
> On phylink_create() this array is parsed and a linked list of PCS is
> created based on the PCS passed in phylink_config.
> Also the supported_interface value in phylink struct is updated with the
> new supported_interface from the provided PCS.
> 
> On phylink_start() every PCS in phylink PCS list gets attached to the
> phylink instance. This is done by setting the phylink value in
> phylink_pcs struct to the phylink instance.
> 
> On phylink_stop(), every PCS in phylink PCS list is detached from the
> phylink instance. This is done by setting the phylink value in
> phylink_pcs struct to NULL.
> 
> phylink_validate_mac_and_pcs(), phylink_major_config() and
> phylink_inband_caps() are updated to support this new implementation
> with the PCS list stored in phylink.
> 
> They will make use of phylink_validate_pcs_interface() that will loop
> for every PCS in the phylink PCS available list and find one that supports
> the passed interface.
> 
> phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
> where if a supported_interface value is not set for the PCS struct, then
> it's assumed every interface is supported.
> 
> A MAC is required to implement either a .mac_select_pcs or make use of
> the PCS list implementation. Implementing both will result in a fail
> on MAC/PCS validation.
> 
> phylink value in phylink_pcs struct with this implementation is used to
> track from PCS side when it's attached to a phylink instance. PCS driver
> will make use of this information to correctly detach from a phylink
> instance if needed.
> 
> The .mac_select_pcs implementation is not changed but it's expected that
> every MAC driver migrates to the new implementation to later deprecate
> and remove .mac_select_pcs.

This introduces a completely parallel PCS selection system used by a
single driver. I don't think we want the maintenance burden and
complexity of two systems in perpetuity. So what is your plan for
conversion of existing drivers to your new system?

DSA drivers typically have different PCSs for each port. At the moment
these are selected with mac_select_pcs, allowing the use of a single
phylink_config for each port. If you need to pass PCSs through
phylink_config then each port will needs its own config. This may prove
difficult to integrate with the existing API.

> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>  drivers/net/phy/phylink.c | 147 +++++++++++++++++++++++++++++++++-----
>  include/linux/phylink.h   |  10 +++
>  2 files changed, 139 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index ec42fd278604..95d7e06dee56 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -59,6 +59,9 @@ struct phylink {
>  	/* The link configuration settings */
>  	struct phylink_link_state link_config;
>  
> +	/* List of available PCS */
> +	struct list_head pcs_list;
> +
>  	/* What interface are supported by the current link.
>  	 * Can change on removal or addition of new PCS.
>  	 */
> @@ -144,6 +147,8 @@ static const phy_interface_t phylink_sfp_interface_preference[] = {
>  
>  static DECLARE_PHY_INTERFACE_MASK(phylink_sfp_interfaces);
>  
> +static void phylink_run_resolve(struct phylink *pl);
> +
>  /**
>   * phylink_set_port_modes() - set the port type modes in the ethtool mask
>   * @mask: ethtool link mode mask
> @@ -499,22 +504,59 @@ static void phylink_validate_mask_caps(unsigned long *supported,
>  	linkmode_and(state->advertising, state->advertising, mask);
>  }
>  
> +static int phylink_validate_pcs_interface(struct phylink_pcs *pcs,
> +					  phy_interface_t interface)
> +{
> +	/* If PCS define an empty supported_interfaces value, assume
> +	 * all interface are supported.
> +	 */
> +	if (phy_interface_empty(pcs->supported_interfaces))
> +		return 0;
> +
> +	/* Ensure that this PCS supports the interface mode */
> +	if (!test_bit(interface, pcs->supported_interfaces))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
>  static int phylink_validate_mac_and_pcs(struct phylink *pl,
>  					unsigned long *supported,
>  					struct phylink_link_state *state)
>  {
> -	struct phylink_pcs *pcs = NULL;
>  	unsigned long capabilities;
> +	struct phylink_pcs *pcs;
> +	bool pcs_found = false;
>  	int ret;
>  
>  	/* Get the PCS for this interface mode */
>  	if (pl->mac_ops->mac_select_pcs) {
> +		/* Make sure either PCS internal validation or .mac_select_pcs
> +		 * is used. Return error if both are defined.
> +		 */
> +		if (!list_empty(&pl->pcs_list)) {
> +			phylink_err(pl, "either phylink_pcs_add() or .mac_select_pcs must be used\n");
> +			return -EINVAL;
> +		}
> +

This validation should be done in phylink_create.

>  		pcs = pl->mac_ops->mac_select_pcs(pl->config, state->interface);
>  		if (IS_ERR(pcs))
>  			return PTR_ERR(pcs);
> +
> +		pcs_found = !!pcs;
> +	} else {
> +		/* Check every assigned PCS and search for one that supports
> +		 * the interface.
> +		 */
> +		list_for_each_entry(pcs, &pl->pcs_list, list) {
> +			if (!phylink_validate_pcs_interface(pcs, state->interface)) {
> +				pcs_found = true;
> +				break;
> +			}
> +		}
>  	}
>  
> -	if (pcs) {
> +	if (pcs_found) {
>  		/* The PCS, if present, must be setup before phylink_create()
>  		 * has been called. If the ops is not initialised, print an
>  		 * error and backtrace rather than oopsing the kernel.
> @@ -526,13 +568,10 @@ static int phylink_validate_mac_and_pcs(struct phylink *pl,
>  			return -EINVAL;
>  		}
>  
> -		/* Ensure that this PCS supports the interface which the MAC
> -		 * returned it for. It is an error for the MAC to return a PCS
> -		 * that does not support the interface mode.
> -		 */
> -		if (!phy_interface_empty(pcs->supported_interfaces) &&
> -		    !test_bit(state->interface, pcs->supported_interfaces)) {
> -			phylink_err(pl, "MAC returned PCS which does not support %s\n",
> +		/* Recheck PCS to handle legacy way for .mac_select_pcs */
> +		ret = phylink_validate_pcs_interface(pcs, state->interface);
> +		if (ret) {
> +			phylink_err(pl, "selected PCS does not support %s\n",
>  				    phy_modes(state->interface));
>  			return -EINVAL;
>  		}
> @@ -937,12 +976,22 @@ static unsigned int phylink_inband_caps(struct phylink *pl,
>  					 phy_interface_t interface)
>  {
>  	struct phylink_pcs *pcs;
> +	bool pcs_found = false;
>  
> -	if (!pl->mac_ops->mac_select_pcs)
> -		return 0;
> +	if (pl->mac_ops->mac_select_pcs) {
> +		pcs = pl->mac_ops->mac_select_pcs(pl->config,
> +						  interface);
> +		pcs_found = !!pcs;
> +	} else {
> +		list_for_each_entry(pcs, &pl->pcs_list, list) {
> +			if (!phylink_validate_pcs_interface(pcs, interface)) {
> +				pcs_found = true;
> +				break;
> +			}
> +		}
> +	}
>  
> -	pcs = pl->mac_ops->mac_select_pcs(pl->config, interface);
> -	if (!pcs)
> +	if (!pcs_found)
>  		return 0;
>  
>  	return phylink_pcs_inband_caps(pcs, interface);
> @@ -1228,10 +1277,36 @@ static void phylink_major_config(struct phylink *pl, bool restart,
>  			pl->major_config_failed = true;
>  			return;
>  		}
> +	/* Find a PCS in available PCS list for the requested interface.
> +	 * This doesn't overwrite the previous .mac_select_pcs as either
> +	 * .mac_select_pcs or PCS list implementation are permitted.
> +	 *
> +	 * Skip searching if the MAC doesn't require a dedicaed PCS for

dedicated

> +	 * the requested interface.
> +	 */
> +	} else if (test_bit(state->interface, pl->config->pcs_interfaces)) {
> +		bool pcs_found = false;
> +
> +		list_for_each_entry(pcs, &pl->pcs_list, list) {
> +			if (!phylink_validate_pcs_interface(pcs,
> +							    state->interface)) {
> +				pcs_found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!pcs_found) {
> +			phylink_err(pl,
> +				    "couldn't find a PCS for %s\n",
> +				    phy_modes(state->interface));
>  
> -		pcs_changed = pl->pcs != pcs;
> +			pl->major_config_failed = true;
> +			return;
> +		}
>  	}
>  
> +	pcs_changed = pl->pcs != pcs;
> +
>  	phylink_pcs_neg_mode(pl, pcs, state->interface, state->advertising);
>  
>  	phylink_dbg(pl, "major config, active %s/%s/%s\n",
> @@ -1258,10 +1333,12 @@ static void phylink_major_config(struct phylink *pl, bool restart,
>  	if (pcs_changed) {
>  		phylink_pcs_disable(pl->pcs);
>  
> -		if (pl->pcs)
> -			pl->pcs->phylink = NULL;
> +		if (pl->mac_ops->mac_select_pcs) {
> +			if (pl->pcs)
> +				pl->pcs->phylink = NULL;
>  
> -		pcs->phylink = pl;
> +			pcs->phylink = pl;
> +		}
>  
>  		pl->pcs = pcs;
>  	}
> @@ -1797,8 +1874,9 @@ struct phylink *phylink_create(struct phylink_config *config,
>  			       phy_interface_t iface,
>  			       const struct phylink_mac_ops *mac_ops)
>  {
> +	struct phylink_pcs *pcs;
>  	struct phylink *pl;
> -	int ret;
> +	int i, ret;
>  
>  	/* Validate the supplied configuration */
>  	if (phy_interface_empty(config->supported_interfaces)) {
> @@ -1813,9 +1891,21 @@ struct phylink *phylink_create(struct phylink_config *config,
>  
>  	mutex_init(&pl->state_mutex);
>  	INIT_WORK(&pl->resolve, phylink_resolve);
> +	INIT_LIST_HEAD(&pl->pcs_list);
> +
> +	/* Fill the PCS list with available PCS from phylink config */
> +	for (i = 0; i < config->num_available_pcs; i++) {
> +		pcs = config->available_pcs[i];
> +
> +		list_add(&pcs->list, &pl->pcs_list);
> +	}

Why do we have a separate linked list if we are getting all the PCSs
in an array at configuration time? From what I can tell there is no
dynamic configuration of PCSs.

>  
>  	phy_interface_copy(pl->supported_interfaces,
>  			   config->supported_interfaces);
> +	list_for_each_entry(pcs, &pl->pcs_list, list)
> +		phy_interface_or(pl->supported_interfaces,
> +				 pl->supported_interfaces,
> +				 pcs->supported_interfaces);
>  
>  	pl->config = config;
>  	if (config->type == PHYLINK_NETDEV) {
> @@ -1894,10 +1984,16 @@ EXPORT_SYMBOL_GPL(phylink_create);
>   */
>  void phylink_destroy(struct phylink *pl)
>  {
> +	struct phylink_pcs *pcs, *tmp;
> +
>  	sfp_bus_del_upstream(pl->sfp_bus);
>  	if (pl->link_gpio)
>  		gpiod_put(pl->link_gpio);
>  
> +	/* Remove every PCS from phylink PCS list */
> +	list_for_each_entry_safe(pcs, tmp, &pl->pcs_list, list)
> +		list_del(&pcs->list);
> +
>  	cancel_work_sync(&pl->resolve);
>  	kfree(pl);
>  }
> @@ -2374,6 +2470,7 @@ static irqreturn_t phylink_link_handler(int irq, void *data)
>   */
>  void phylink_start(struct phylink *pl)
>  {
> +	struct phylink_pcs *pcs;
>  	bool poll = false;
>  
>  	ASSERT_RTNL();
> @@ -2400,6 +2497,10 @@ void phylink_start(struct phylink *pl)
>  
>  	pl->pcs_state = PCS_STATE_STARTED;
>  
> +	/* link available PCS to phylink struct */
> +	list_for_each_entry(pcs, &pl->pcs_list, list)
> +		pcs->phylink = pl;
> +
>  	phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_STOPPED);
>  
>  	if (pl->cfg_link_an_mode == MLO_AN_FIXED && pl->link_gpio) {
> @@ -2444,6 +2545,8 @@ EXPORT_SYMBOL_GPL(phylink_start);
>   */
>  void phylink_stop(struct phylink *pl)
>  {
> +	struct phylink_pcs *pcs;
> +
>  	ASSERT_RTNL();
>  
>  	if (pl->sfp_bus)
> @@ -2461,6 +2564,14 @@ void phylink_stop(struct phylink *pl)
>  	pl->pcs_state = PCS_STATE_DOWN;
>  
>  	phylink_pcs_disable(pl->pcs);
> +
> +	/* Drop link between phylink and PCS */
> +	list_for_each_entry(pcs, &pl->pcs_list, list)
> +		pcs->phylink = NULL;
> +
> +	/* Restore original supported interfaces */
> +	phy_interface_copy(pl->supported_interfaces,
> +			   pl->config->supported_interfaces);
>  }
>  EXPORT_SYMBOL_GPL(phylink_stop);
>  
> diff --git a/include/linux/phylink.h b/include/linux/phylink.h
> index 30659b615fca..ef0b5a0729c8 100644
> --- a/include/linux/phylink.h
> +++ b/include/linux/phylink.h
> @@ -150,12 +150,16 @@ enum phylink_op_type {
>   *		     if MAC link is at %MLO_AN_FIXED mode.
>   * @supported_interfaces: bitmap describing which PHY_INTERFACE_MODE_xxx
>   *                        are supported by the MAC/PCS.
> + * @pcs_interfaces: bitmap describing for which PHY_INTERFACE_MODE_xxx a
> + *		    dedicated PCS is required.
>   * @lpi_interfaces: bitmap describing which PHY interface modes can support
>   *		    LPI signalling.
>   * @mac_capabilities: MAC pause/speed/duplex capabilities.
>   * @lpi_capabilities: MAC speeds which can support LPI signalling
>   * @lpi_timer_default: Default EEE LPI timer setting.
>   * @eee_enabled_default: If set, EEE will be enabled by phylink at creation time
> + * @available_pcs: array of available phylink_pcs PCS
> + * @num_available_pcs: num of available phylink_pcs PCS
>   */
>  struct phylink_config {
>  	struct device *dev;
> @@ -168,11 +172,14 @@ struct phylink_config {
>  	void (*get_fixed_state)(struct phylink_config *config,
>  				struct phylink_link_state *state);
>  	DECLARE_PHY_INTERFACE_MASK(supported_interfaces);
> +	DECLARE_PHY_INTERFACE_MASK(pcs_interfaces);
>  	DECLARE_PHY_INTERFACE_MASK(lpi_interfaces);
>  	unsigned long mac_capabilities;
>  	unsigned long lpi_capabilities;
>  	u32 lpi_timer_default;
>  	bool eee_enabled_default;
> +	struct phylink_pcs **available_pcs;
> +	unsigned int num_available_pcs;
>  };
>  
>  void phylink_limit_mac_speed(struct phylink_config *config, u32 max_speed);
> @@ -469,6 +476,9 @@ struct phylink_pcs {
>  	struct phylink *phylink;
>  	bool poll;
>  	bool rxc_always_on;
> +
> +	/* private: */
> +	struct list_head list;
>  };
>  
>  /**
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Daniel Golle 4 months ago
On Tue, May 13, 2025 at 02:18:02PM -0400, Sean Anderson wrote:
> On 5/11/25 16:12, Christian Marangi wrote:
> > Introduce internal handling of PCS for phylink. This is an alternative
> > to .mac_select_pcs that moves the selection logic of the PCS entirely to
> > phylink with the usage of the supported_interface value in the PCS
> > struct.
> > 
> > MAC should now provide an array of available PCS in phylink_config in
> > .available_pcs and fill the .num_available_pcs with the number of
> > elements in the array. MAC should also define a new bitmap,
> > pcs_interfaces, in phylink_config to define for what interface mode a
> > dedicated PCS is required.
> > 
> > On phylink_create() this array is parsed and a linked list of PCS is
> > created based on the PCS passed in phylink_config.
> > Also the supported_interface value in phylink struct is updated with the
> > new supported_interface from the provided PCS.
> > 
> > On phylink_start() every PCS in phylink PCS list gets attached to the
> > phylink instance. This is done by setting the phylink value in
> > phylink_pcs struct to the phylink instance.
> > 
> > On phylink_stop(), every PCS in phylink PCS list is detached from the
> > phylink instance. This is done by setting the phylink value in
> > phylink_pcs struct to NULL.
> > 
> > phylink_validate_mac_and_pcs(), phylink_major_config() and
> > phylink_inband_caps() are updated to support this new implementation
> > with the PCS list stored in phylink.
> > 
> > They will make use of phylink_validate_pcs_interface() that will loop
> > for every PCS in the phylink PCS available list and find one that supports
> > the passed interface.
> > 
> > phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
> > where if a supported_interface value is not set for the PCS struct, then
> > it's assumed every interface is supported.
> > 
> > A MAC is required to implement either a .mac_select_pcs or make use of
> > the PCS list implementation. Implementing both will result in a fail
> > on MAC/PCS validation.
> > 
> > phylink value in phylink_pcs struct with this implementation is used to
> > track from PCS side when it's attached to a phylink instance. PCS driver
> > will make use of this information to correctly detach from a phylink
> > instance if needed.
> > 
> > The .mac_select_pcs implementation is not changed but it's expected that
> > every MAC driver migrates to the new implementation to later deprecate
> > and remove .mac_select_pcs.
> 
> This introduces a completely parallel PCS selection system used by a
> single driver. I don't think we want the maintenance burden and
> complexity of two systems in perpetuity. So what is your plan for
> conversion of existing drivers to your new system?

Moving functionality duplicated in many drivers to a common shared
implementation is nothing unsual.

While this series proposes the new mechanism for Airoha SoC, they are
immediately useful (and long awaited) also for MediaTek and Qualcomm
SoCs.

Also in the series you posted at least the macb driver (in "[net-next
PATCH v4 09/11] net: macb: Move most of mac_config to  mac_prepare")
would benefit from that shared implementation, as all it does in it's
mac_select_pcs is selecting the PCS by a given phy_interface_t, which is
what most Ethernet drivers which use more than one PCS are doing in
their implementatio of mac_select_pcs().

Also axienet_mac_select_pcs() from "[net-next PATCH v4 08/11] net:
axienet: Convert to use PCS subsystem" could obviously very easily be
mirated to use the phylink-internal handling of PCS selection.


> 
> DSA drivers typically have different PCSs for each port. At the moment
> these are selected with mac_select_pcs, allowing the use of a single
> phylink_config for each port. If you need to pass PCSs through
> phylink_config then each port will needs its own config. This may prove
> difficult to integrate with the existing API.

This might be a misunderstanding. Also here there is only a single
phylink_config for each MAC or DSA port, just instead of having many
more or less identical implementations of .mac_select_pcs, this
functionality is moved into phylink. As a nice side-effect that also
makes managing the life-cycle of the PCS more easy, so we won't need all
the wrappers for all the PCS OPs.
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Sean Anderson 4 months ago
On 5/13/25 15:03, Daniel Golle wrote:
> On Tue, May 13, 2025 at 02:18:02PM -0400, Sean Anderson wrote:
>> On 5/11/25 16:12, Christian Marangi wrote:
>> > Introduce internal handling of PCS for phylink. This is an alternative
>> > to .mac_select_pcs that moves the selection logic of the PCS entirely to
>> > phylink with the usage of the supported_interface value in the PCS
>> > struct.
>> > 
>> > MAC should now provide an array of available PCS in phylink_config in
>> > .available_pcs and fill the .num_available_pcs with the number of
>> > elements in the array. MAC should also define a new bitmap,
>> > pcs_interfaces, in phylink_config to define for what interface mode a
>> > dedicated PCS is required.
>> > 
>> > On phylink_create() this array is parsed and a linked list of PCS is
>> > created based on the PCS passed in phylink_config.
>> > Also the supported_interface value in phylink struct is updated with the
>> > new supported_interface from the provided PCS.
>> > 
>> > On phylink_start() every PCS in phylink PCS list gets attached to the
>> > phylink instance. This is done by setting the phylink value in
>> > phylink_pcs struct to the phylink instance.
>> > 
>> > On phylink_stop(), every PCS in phylink PCS list is detached from the
>> > phylink instance. This is done by setting the phylink value in
>> > phylink_pcs struct to NULL.
>> > 
>> > phylink_validate_mac_and_pcs(), phylink_major_config() and
>> > phylink_inband_caps() are updated to support this new implementation
>> > with the PCS list stored in phylink.
>> > 
>> > They will make use of phylink_validate_pcs_interface() that will loop
>> > for every PCS in the phylink PCS available list and find one that supports
>> > the passed interface.
>> > 
>> > phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
>> > where if a supported_interface value is not set for the PCS struct, then
>> > it's assumed every interface is supported.
>> > 
>> > A MAC is required to implement either a .mac_select_pcs or make use of
>> > the PCS list implementation. Implementing both will result in a fail
>> > on MAC/PCS validation.
>> > 
>> > phylink value in phylink_pcs struct with this implementation is used to
>> > track from PCS side when it's attached to a phylink instance. PCS driver
>> > will make use of this information to correctly detach from a phylink
>> > instance if needed.
>> > 
>> > The .mac_select_pcs implementation is not changed but it's expected that
>> > every MAC driver migrates to the new implementation to later deprecate
>> > and remove .mac_select_pcs.
>> 
>> This introduces a completely parallel PCS selection system used by a
>> single driver. I don't think we want the maintenance burden and
>> complexity of two systems in perpetuity. So what is your plan for
>> conversion of existing drivers to your new system?
> 
> Moving functionality duplicated in many drivers to a common shared
> implementation is nothing unsual.
> 
> While this series proposes the new mechanism for Airoha SoC, they are
> immediately useful (and long awaited) also for MediaTek and Qualcomm
> SoCs.
>
> Also in the series you posted at least the macb driver (in "[net-next
> PATCH v4 09/11] net: macb: Move most of mac_config to  mac_prepare")
> would benefit from that shared implementation, as all it does in it's
> mac_select_pcs is selecting the PCS by a given phy_interface_t, which is
> what most Ethernet drivers which use more than one PCS are doing in
> their implementatio of mac_select_pcs().

I generally agree. The vast majority of select_pcs implementations just
determine the PCS based on the interface. But I don't think this is the
right approach. This touches a lot of other areas of the code and
reimplements much of the existing phylink machinery.

I would prefer something more self-contained like

phylink_generic_select_pcs(phylink, interface)
{
	for_each(pcs : phylink->available_pcss) {
		if (test_bit(pcs->supported, interface))
			return pcs;
	}

	return NULL;
}

which could be dropped into existing implementations in an incremental
manner.

I think the inclusion of PCS lifetime management in this patch
complicates things significantly.

> Also axienet_mac_select_pcs() from "[net-next PATCH v4 08/11] net:
> axienet: Convert to use PCS subsystem" could obviously very easily be
> mirated to use the phylink-internal handling of PCS selection.

But the bulk of the patch remains. We still have to add the external PCS
lookup. There still needs to be another case in mac_config to mux the
PCS correctly. And we would need to add some code to create a list of
PCSs. I don't know whether we win on the balance.

>> 
>> DSA drivers typically have different PCSs for each port. At the moment
>> these are selected with mac_select_pcs, allowing the use of a single
>> phylink_config for each port. If you need to pass PCSs through
>> phylink_config then each port will needs its own config. This may prove
>> difficult to integrate with the existing API.
> 
> This might be a misunderstanding. Also here there is only a single
> phylink_config for each MAC or DSA port,

My point is that while this is the case at the moment, it would not be
the case with a "generic" select pcs. You would need to modify the
config for each port to ensure the right PCSs are passed in.

> just instead of having many
> more or less identical implementations of .mac_select_pcs, this
> functionality is moved into phylink. As a nice side-effect that also
> makes managing the life-cycle of the PCS more easy, so we won't need all
> the wrappers for all the PCS OPs.

I think the wrapper approach is very obviously correct. This way has me
worried about exciting new concurrency bugs.

--Sean
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Daniel Golle 4 months ago
On Tue, May 13, 2025 at 03:23:32PM -0400, Sean Anderson wrote:
> On 5/13/25 15:03, Daniel Golle wrote:
> > just instead of having many
> > more or less identical implementations of .mac_select_pcs, this
> > functionality is moved into phylink. As a nice side-effect that also
> > makes managing the life-cycle of the PCS more easy, so we won't need all
> > the wrappers for all the PCS OPs.
> 
> I think the wrapper approach is very obviously correct. This way has me
> worried about exciting new concurrency bugs.

You may not be surprised to read that this was also our starting point 2
months ago, I had implemented support for standalone PCS very similar to
the approach you have published now, using refcnt'ed instances and
locked wrapper functions for all OPs. My approach, like yours, was to
create a new subsystem for standalone PCS drivers which is orthogonal to
phylink and only requires very few very small changes to phylink itself.
It was a draft and not as complete and well-documented like your series
now, of course.

I've then shared that implementation with Christian and some other
experienced OpenWrt developers and we concluded that having phylink handle
the PCS lifecycle and PCS selection would be the better and more elegant
approach for multiple reasons:
 - The lifetime management of the wrapper instances becomes tricky:
   We would either have to live with them being allocated by the
   MAC-driver (imagine test-case doing unbind and then bind in a loop
   for a while -- we would end up oom). Or we need some kind of garbage
   collecting mechanism which frees the wrapper once refcnt is zero --
   and as .select_pcs would 'get' the PCS (ie. bump refcnt) we'd need a
   'put' equivalent (eg. a .pcs_destroy() OP) in phylink.

   Russell repeatedly pointed me to the possibility of a PCS
   "disappearing" (and potentially "reappearing" some time later), and
   in this case it is unclear who would then ever call pcs_put(), or
   even notify the Ethernet driver or phylink about the PCS now being
   available (again). Using device_link_add(), like it is done in
   pcs-rzn1-miic.c, prevents the worst (ie. use-after-free), but also
   impacts all other netdevs exposed by the same Ethernet driver
   instance, and has a few other rather ugly implications.

 - phylink currently expects .mac_select_pcs to never fail. But we may
   need a mechanism similar to probe deferral in case the PCS is not
   yet available.
   Your series partially solves this in patch 11/11 "of: property: Add
   device link support for PCS", but also that still won't make the link
   come back in case of a PCS showing up late to the party, eg. due to
   constraints such as phy drivers (drivers/phy, not drivers/net/phy)
   waiting for nvmem providers, or PCS instances "going away" and
   "coming back" later.

 - removal of a PCS instance (eg. via sysfs unbind) would still
   require changes to phylink. there is no phylink function to
   impair the link in this case, and using dev_close() is a bit ugly,
   and also won't bring the link back up once the PCS (re-)appears.

 - phylink anyway is the only user of PCS drivers, and will very likely
   always be. So why create another subsystem?

All that being said I also see potential problems with Christians
current implementation as it doesn't prevent the Ethernet driver to
still store a pointer to struct phylink_pcs (returned eg. from
fwnode_pcs_get()).

Hence I would like to see an even more tight integration with phylink,
in the sense that pointers to 'struct phylink_pcs' should never be
exposed to the MAC driver, as only in that way we can be sure that
phylink, and only phylink, is responsible for reacting to a PCS "going
away".

Ie. instead of fwnode_phylink_pcs_parse() handing pointers to struct
phylink_pcs to the Ethernet driver, so it can use it to populate struct
phylink_config available_pcs member, this should be the responsibility
of phylink alltogether, directly populating the list of available PCS in
phylink's private structure.

Similarly, there should not be fwnode_pcs_get() but rather phylink
providing a function fwnode_phylink_pcs_register(phylink, fwnode) which
directly adds the PCS referenced to the internal list of available PCS.

I hope we can pick the best of all the suggested implementations, and
together come up with something even better.
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Sean Anderson 3 months, 3 weeks ago
On 5/13/25 23:00, Daniel Golle wrote:
> On Tue, May 13, 2025 at 03:23:32PM -0400, Sean Anderson wrote:
>> On 5/13/25 15:03, Daniel Golle wrote:
>> > just instead of having many
>> > more or less identical implementations of .mac_select_pcs, this
>> > functionality is moved into phylink. As a nice side-effect that also
>> > makes managing the life-cycle of the PCS more easy, so we won't need all
>> > the wrappers for all the PCS OPs.
>> 
>> I think the wrapper approach is very obviously correct. This way has me
>> worried about exciting new concurrency bugs.
> 
> You may not be surprised to read that this was also our starting point 2
> months ago, I had implemented support for standalone PCS very similar to
> the approach you have published now, using refcnt'ed instances and
> locked wrapper functions for all OPs. My approach, like yours, was to
> create a new subsystem for standalone PCS drivers which is orthogonal to
> phylink and only requires very few very small changes to phylink itself.
> It was a draft and not as complete and well-documented like your series
> now, of course.
> 
> I've then shared that implementation with Christian and some other
> experienced OpenWrt developers and we concluded that having phylink handle
> the PCS lifecycle and PCS selection would be the better and more elegant
> approach for multiple reasons:
>  - The lifetime management of the wrapper instances becomes tricky:
>    We would either have to live with them being allocated by the
>    MAC-driver (imagine test-case doing unbind and then bind in a loop
>    for a while -- we would end up oom). Or we need some kind of garbage
>    collecting mechanism which frees the wrapper once refcnt is zero --
>    and as .select_pcs would 'get' the PCS (ie. bump refcnt) we'd need a
>    'put' equivalent (eg. a .pcs_destroy() OP) in phylink.
> 
>    Russell repeatedly pointed me to the possibility of a PCS
>    "disappearing" (and potentially "reappearing" some time later), and
>    in this case it is unclear who would then ever call pcs_put(), or
>    even notify the Ethernet driver or phylink about the PCS now being
>    available (again). Using device_link_add(), like it is done in
>    pcs-rzn1-miic.c, prevents the worst (ie. use-after-free), but also
>    impacts all other netdevs exposed by the same Ethernet driver
>    instance, and has a few other rather ugly implications.

SRCU neatly solves the lifetime management issues. The wrapper lives as
long as anyone (provider or user) holds a reference. A PCS can disappear
at any point and everything still works (although the link goes down).
Device links are only an optimization; they cannot be relied on for
correctness.

>  - phylink currently expects .mac_select_pcs to never fail. But we may
>    need a mechanism similar to probe deferral in case the PCS is not
>    yet available.

Which is why you grab the PCS in probe. If you want to be more dynamic,
you can do it in netdev open like is done for PHYs.

>    Your series partially solves this in patch 11/11 "of: property: Add
>    device link support for PCS", but also that still won't make the link
>    come back in case of a PCS showing up late to the party, eg. due to
>    constraints such as phy drivers (drivers/phy, not drivers/net/phy)
>    waiting for nvmem providers, or PCS instances "going away" and
>    "coming back" later.

This all works correctly due to device links. The only case that doesn't
work automatically is something like

MAC built-in
  MDIO built-in
    PCS module

where the PCS module gets loaded late. In that case you have to manually
re-probe the MAC. I think the best way to address this would be to grab
the PCS in netdev open so that the MAC can probe without the PCS.

>  - removal of a PCS instance (eg. via sysfs unbind) would still
>    require changes to phylink. there is no phylink function to
>    impair the link in this case, and using dev_close() is a bit ugly,
>    and also won't bring the link back up once the PCS (re-)appears.

This works just fine. There are two cases:

- If the PCS has an IRQ, we notify phylink and then it polls the PCS
  (see below).
- If the PCS is polled, phylink will call pcs_get_state and see that the
  link is down.

Either way, the link goes down. But bringing the link back up is pretty
unusual anyway. Unlike PHYs (which theoretically can be on removable
busses) PCSs are generally permanently attached to their MACs. The only
removable scenario I can think of is if the PCS is on an FPGA and the
MAC is not.

So if the PCS goes away, the MAC is likely to follow shortly after
(since the whole thing is on a removable bus). Or someone has manually
removed the PCS, in which case I think it's reasonable to have them
manually remove the MAC as well. If you really want to support this,
then just grab the PCS in netdev open.

>  - phylink anyway is the only user of PCS drivers, and will very likely
>    always be. So why create another subsystem?

To avoid adding overhead for the majority of PCSs where the PCS is built
into the MAC and literally can't be removed. We only pay the price for
dynamicism on the drivers where it matters.

> All that being said I also see potential problems with Christians
> current implementation as it doesn't prevent the Ethernet driver to
> still store a pointer to struct phylink_pcs (returned eg. from
> fwnode_pcs_get()).
> 
> Hence I would like to see an even more tight integration with phylink,
> in the sense that pointers to 'struct phylink_pcs' should never be
> exposed to the MAC driver, as only in that way we can be sure that
> phylink, and only phylink, is responsible for reacting to a PCS "going
> away".

OK, but then how does the MAC select the PCS? If there are multiple PCSs
then ultimately someone has to configure a mux somewhere.

> Ie. instead of fwnode_phylink_pcs_parse() handing pointers to struct
> phylink_pcs to the Ethernet driver, so it can use it to populate struct
> phylink_config available_pcs member, this should be the responsibility
> of phylink alltogether, directly populating the list of available PCS in
> phylink's private structure.
> 
> Similarly, there should not be fwnode_pcs_get() but rather phylink
> providing a function fwnode_phylink_pcs_register(phylink, fwnode) which
> directly adds the PCS referenced to the internal list of available PCS.

This is difficult to work with for existing drivers. Many of them have
non-standard ways of looking up their PCS that they need to support for
backwards-compatibility. And some of them create the PCS themselves
(such as if they are PCI devices with internal MDIO busses). It's much
easier for the MAC to create or look up the PCS itself and then hand it
off to phylink.

> I hope we can pick the best of all the suggested implementations, and
> together come up with something even better.

Sure. And I think we were starting from a clean slate then this would be
the obvious way to do things. But we must support existing drivers and
provide an upgrade path for them. This is why I favor an incremental
approach.

--Sean
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Sean Anderson 3 months, 3 weeks ago
On 5/19/25 14:10, Sean Anderson wrote:
> On 5/13/25 23:00, Daniel Golle wrote:
>> On Tue, May 13, 2025 at 03:23:32PM -0400, Sean Anderson wrote:
>>> On 5/13/25 15:03, Daniel Golle wrote:
>>> > just instead of having many
>>> > more or less identical implementations of .mac_select_pcs, this
>>> > functionality is moved into phylink. As a nice side-effect that also
>>> > makes managing the life-cycle of the PCS more easy, so we won't need all
>>> > the wrappers for all the PCS OPs.
>>> 
>>> I think the wrapper approach is very obviously correct. This way has me
>>> worried about exciting new concurrency bugs.
>> 
>> You may not be surprised to read that this was also our starting point 2
>> months ago, I had implemented support for standalone PCS very similar to
>> the approach you have published now, using refcnt'ed instances and
>> locked wrapper functions for all OPs. My approach, like yours, was to
>> create a new subsystem for standalone PCS drivers which is orthogonal to
>> phylink and only requires very few very small changes to phylink itself.
>> It was a draft and not as complete and well-documented like your series
>> now, of course.
>> 
>> I've then shared that implementation with Christian and some other
>> experienced OpenWrt developers and we concluded that having phylink handle
>> the PCS lifecycle and PCS selection would be the better and more elegant
>> approach for multiple reasons:
>>  - The lifetime management of the wrapper instances becomes tricky:
>>    We would either have to live with them being allocated by the
>>    MAC-driver (imagine test-case doing unbind and then bind in a loop
>>    for a while -- we would end up oom). Or we need some kind of garbage
>>    collecting mechanism which frees the wrapper once refcnt is zero --
>>    and as .select_pcs would 'get' the PCS (ie. bump refcnt) we'd need a
>>    'put' equivalent (eg. a .pcs_destroy() OP) in phylink.
>> 
>>    Russell repeatedly pointed me to the possibility of a PCS
>>    "disappearing" (and potentially "reappearing" some time later), and
>>    in this case it is unclear who would then ever call pcs_put(), or
>>    even notify the Ethernet driver or phylink about the PCS now being
>>    available (again). Using device_link_add(), like it is done in
>>    pcs-rzn1-miic.c, prevents the worst (ie. use-after-free), but also
>>    impacts all other netdevs exposed by the same Ethernet driver
>>    instance, and has a few other rather ugly implications.
> 
> SRCU neatly solves the lifetime management issues. The wrapper lives as
> long as anyone (provider or user) holds a reference. A PCS can disappear
> at any point and everything still works (although the link goes down).
> Device links are only an optimization; they cannot be relied on for
> correctness.
> 
>>  - phylink currently expects .mac_select_pcs to never fail. But we may
>>    need a mechanism similar to probe deferral in case the PCS is not
>>    yet available.
> 
> Which is why you grab the PCS in probe. If you want to be more dynamic,
> you can do it in netdev open like is done for PHYs.
> 
>>    Your series partially solves this in patch 11/11 "of: property: Add
>>    device link support for PCS", but also that still won't make the link
>>    come back in case of a PCS showing up late to the party, eg. due to
>>    constraints such as phy drivers (drivers/phy, not drivers/net/phy)
>>    waiting for nvmem providers, or PCS instances "going away" and
>>    "coming back" later.
> 
> This all works correctly due to device links. The only case that doesn't
> work automatically is something like
> 
> MAC built-in
>   MDIO built-in
>     PCS module
> 
> where the PCS module gets loaded late. In that case you have to manually
> re-probe the MAC. I think the best way to address this would be to grab
> the PCS in netdev open so that the MAC can probe without the PCS.
> 
>>  - removal of a PCS instance (eg. via sysfs unbind) would still
>>    require changes to phylink. there is no phylink function to
>>    impair the link in this case, and using dev_close() is a bit ugly,
>>    and also won't bring the link back up once the PCS (re-)appears.
> 
> This works just fine. There are two cases:
> 
> - If the PCS has an IRQ, we notify phylink and then it polls the PCS
>   (see below).
> - If the PCS is polled, phylink will call pcs_get_state and see that the
>   link is down.
> 
> Either way, the link goes down. But bringing the link back up is pretty
> unusual anyway. Unlike PHYs (which theoretically can be on removable
> busses) PCSs are generally permanently attached to their MACs. The only
> removable scenario I can think of is if the PCS is on an FPGA and the
> MAC is not.
> 
> So if the PCS goes away, the MAC is likely to follow shortly after
> (since the whole thing is on a removable bus). Or someone has manually
> removed the PCS, in which case I think it's reasonable to have them
> manually remove the MAC as well. If you really want to support this,
> then just grab the PCS in netdev open.

So I had a closer look at this and unfortunately it isn't as easy as
just grabbing the PCS in ndo_open. The problem is that we need to
know the supported interfaces before phylink_create. The interfaces are
validated and are visible to userspace as soon as the netdev is
registered. And we can't just defer phylink_create to ndo_open because a
lot of the ethtool ops are implemented with phylink. So this would
probably need something like phylink_update_supported_interfaces().

But TBH I don't think this use case is very relevant. As I said above,
it only affects FPGA reconfiguration and people manually unbinding
drivers. Either way I think they are savvy enough to reprobe the netdev.

--Sean

>>  - phylink anyway is the only user of PCS drivers, and will very likely
>>    always be. So why create another subsystem?
> 
> To avoid adding overhead for the majority of PCSs where the PCS is built
> into the MAC and literally can't be removed. We only pay the price for
> dynamicism on the drivers where it matters.
> 
>> All that being said I also see potential problems with Christians
>> current implementation as it doesn't prevent the Ethernet driver to
>> still store a pointer to struct phylink_pcs (returned eg. from
>> fwnode_pcs_get()).
>> 
>> Hence I would like to see an even more tight integration with phylink,
>> in the sense that pointers to 'struct phylink_pcs' should never be
>> exposed to the MAC driver, as only in that way we can be sure that
>> phylink, and only phylink, is responsible for reacting to a PCS "going
>> away".
> 
> OK, but then how does the MAC select the PCS? If there are multiple PCSs
> then ultimately someone has to configure a mux somewhere.
> 
>> Ie. instead of fwnode_phylink_pcs_parse() handing pointers to struct
>> phylink_pcs to the Ethernet driver, so it can use it to populate struct
>> phylink_config available_pcs member, this should be the responsibility
>> of phylink alltogether, directly populating the list of available PCS in
>> phylink's private structure.
>> 
>> Similarly, there should not be fwnode_pcs_get() but rather phylink
>> providing a function fwnode_phylink_pcs_register(phylink, fwnode) which
>> directly adds the PCS referenced to the internal list of available PCS.
> 
> This is difficult to work with for existing drivers. Many of them have
> non-standard ways of looking up their PCS that they need to support for
> backwards-compatibility. And some of them create the PCS themselves
> (such as if they are PCI devices with internal MDIO busses). It's much
> easier for the MAC to create or look up the PCS itself and then hand it
> off to phylink.
> 
>> I hope we can pick the best of all the suggested implementations, and
>> together come up with something even better.
> 
> Sure. And I think we were starting from a clean slate then this would be
> the obvious way to do things. But we must support existing drivers and
> provide an upgrade path for them. This is why I favor an incremental
> approach.
> 
> --Sean
Re: [net-next PATCH v4 03/11] net: phylink: introduce internal phylink PCS handling
Posted by Sean Anderson 4 months ago
On 5/13/25 15:23, Sean Anderson wrote:
> On 5/13/25 15:03, Daniel Golle wrote:
>> On Tue, May 13, 2025 at 02:18:02PM -0400, Sean Anderson wrote:
>>> DSA drivers typically have different PCSs for each port. At the moment
>>> these are selected with mac_select_pcs, allowing the use of a single
>>> phylink_config for each port. If you need to pass PCSs through
>>> phylink_config then each port will needs its own config. This may prove
>>> difficult to integrate with the existing API.
>> 
>> This might be a misunderstanding. Also here there is only a single
>> phylink_config for each MAC or DSA port,
> 
> My point is that while this is the case at the moment, it would not be
> the case with a "generic" select pcs. You would need to modify the
> config for each port to ensure the right PCSs are passed in.

I reread dsa_port_phylink_create and I think I missed the ds/dp
distinction the first time around.

--Sean