[PATCH net-next] net/mlx5: Support devlink port state for host PF

Tariq Toukan posted 1 patch 4 days, 5 hours ago
.../net/ethernet/mellanox/mlx5/core/ecpf.c    |  5 +-
.../mellanox/mlx5/core/esw/devlink_port.c     |  2 +
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 48 ++++++++++++----
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 10 ++++
.../mellanox/mlx5/core/eswitch_offloads.c     | 55 +++++++++++++++++++
5 files changed, 108 insertions(+), 12 deletions(-)
Posted by Tariq Toukan 4 days, 5 hours ago
From: Moshe Shemesh <moshe@nvidia.com>

Add support for devlink port function state get/set operations for the
host physical function (PF). Until now, mlx5 only allowed state get/set
for subfunction (SF) ports. This change enables an administrator with
eSwitch manager privileges to query or modify the host PF’s function
state, allowing it to be explicitly inactivated or activated. While the
PF is inactivated, the administrator can modify the function's
attributes, such as enabling or disabling RoCE.

$ devlink port show pci/0000:03:00.0/196608
pci/0000:03:00.0/196608: type eth netdev eth1 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr a0:88:c2:45:17:7c state active opstate attached roce enable max_io_eqs 120
$ devlink port function set pci/0000:03:00.0/196608 state inactive
$ devlink port show pci/0000:03:00.0/196608
pci/0000:03:00.0/196608: type eth netdev eth1 flavour pcipf controller 1 pfnum 0 external true splittable false
  function:
    hw_addr a0:88:c2:45:17:7c state inactive opstate detached roce enable max_io_eqs 120

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../net/ethernet/mellanox/mlx5/core/ecpf.c    |  5 +-
 .../mellanox/mlx5/core/esw/devlink_port.c     |  2 +
 .../net/ethernet/mellanox/mlx5/core/eswitch.c | 48 ++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/eswitch.h | 10 ++++
 .../mellanox/mlx5/core/eswitch_offloads.c     | 55 +++++++++++++++++++
 5 files changed, 108 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
index d000236ddbac..15cb27aea2c9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ecpf.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2019 Mellanox Technologies. */
 
 #include "ecpf.h"
+#include "eswitch.h"
 
 bool mlx5_read_embedded_cpu(struct mlx5_core_dev *dev)
 {
@@ -49,7 +50,7 @@ static int mlx5_host_pf_init(struct mlx5_core_dev *dev)
 	/* ECPF shall enable HCA for host PF in the same way a PF
 	 * does this for its VFs when ECPF is not a eswitch manager.
 	 */
-	err = mlx5_cmd_host_pf_enable_hca(dev);
+	err = mlx5_esw_host_pf_enable_hca(dev);
 	if (err)
 		mlx5_core_err(dev, "Failed to enable external host PF HCA err(%d)\n", err);
 
@@ -63,7 +64,7 @@ static void mlx5_host_pf_cleanup(struct mlx5_core_dev *dev)
 	if (mlx5_ecpf_esw_admins_host_pf(dev))
 		return;
 
-	err = mlx5_cmd_host_pf_disable_hca(dev);
+	err = mlx5_esw_host_pf_disable_hca(dev);
 	if (err) {
 		mlx5_core_err(dev, "Failed to disable external host PF HCA err(%d)\n", err);
 		return;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
index 89a58dee50b3..cd60bc500ec5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/devlink_port.c
@@ -99,6 +99,8 @@ static const struct devlink_port_ops mlx5_esw_pf_vf_dl_port_ops = {
 	.port_fn_roce_set = mlx5_devlink_port_fn_roce_set,
 	.port_fn_migratable_get = mlx5_devlink_port_fn_migratable_get,
 	.port_fn_migratable_set = mlx5_devlink_port_fn_migratable_set,
+	.port_fn_state_get = mlx5_devlink_pf_port_fn_state_get,
+	.port_fn_state_set = mlx5_devlink_pf_port_fn_state_set,
 #ifdef CONFIG_XFRM_OFFLOAD
 	.port_fn_ipsec_crypto_get = mlx5_devlink_port_fn_ipsec_crypto_get,
 	.port_fn_ipsec_crypto_set = mlx5_devlink_port_fn_ipsec_crypto_set,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
index 4b7a1ce7f406..5fbfabe28bdb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
@@ -1304,24 +1304,52 @@ static int mlx5_eswitch_load_ec_vf_vports(struct mlx5_eswitch *esw, u16 num_ec_v
 	return err;
 }
 
-static int host_pf_enable_hca(struct mlx5_core_dev *dev)
+int mlx5_esw_host_pf_enable_hca(struct mlx5_core_dev *dev)
 {
-	if (!mlx5_core_is_ecpf(dev))
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	struct mlx5_vport *vport;
+	int err;
+
+	if (!mlx5_core_is_ecpf(dev) || !mlx5_esw_allowed(esw))
 		return 0;
 
+	vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
+	if (IS_ERR(vport))
+		return PTR_ERR(vport);
+
 	/* Once vport and representor are ready, take out the external host PF
 	 * out of initializing state. Enabling HCA clears the iser->initializing
 	 * bit and host PF driver loading can progress.
 	 */
-	return mlx5_cmd_host_pf_enable_hca(dev);
+	err = mlx5_cmd_host_pf_enable_hca(dev);
+	if (err)
+		return err;
+
+	vport->pf_activated = true;
+
+	return 0;
 }
 
-static void host_pf_disable_hca(struct mlx5_core_dev *dev)
+int mlx5_esw_host_pf_disable_hca(struct mlx5_core_dev *dev)
 {
-	if (!mlx5_core_is_ecpf(dev))
-		return;
+	struct mlx5_eswitch *esw = dev->priv.eswitch;
+	struct mlx5_vport *vport;
+	int err;
 
-	mlx5_cmd_host_pf_disable_hca(dev);
+	if (!mlx5_core_is_ecpf(dev) || !mlx5_esw_allowed(esw))
+		return 0;
+
+	vport = mlx5_eswitch_get_vport(esw, MLX5_VPORT_PF);
+	if (IS_ERR(vport))
+		return PTR_ERR(vport);
+
+	err = mlx5_cmd_host_pf_disable_hca(dev);
+	if (err)
+		return err;
+
+	vport->pf_activated = false;
+
+	return 0;
 }
 
 /* mlx5_eswitch_enable_pf_vf_vports() enables vports of PF, ECPF and VFs
@@ -1347,7 +1375,7 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 
 	if (mlx5_esw_host_functions_enabled(esw->dev)) {
 		/* Enable external host PF HCA */
-		ret = host_pf_enable_hca(esw->dev);
+		ret = mlx5_esw_host_pf_enable_hca(esw->dev);
 		if (ret)
 			goto pf_hca_err;
 	}
@@ -1391,7 +1419,7 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
 		mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_ECPF);
 ecpf_err:
 	if (mlx5_esw_host_functions_enabled(esw->dev))
-		host_pf_disable_hca(esw->dev);
+		mlx5_esw_host_pf_disable_hca(esw->dev);
 pf_hca_err:
 	if (pf_needed && mlx5_esw_host_functions_enabled(esw->dev))
 		mlx5_eswitch_unload_pf_vf_vport(esw, MLX5_VPORT_PF);
@@ -1416,7 +1444,7 @@ void mlx5_eswitch_disable_pf_vf_vports(struct mlx5_eswitch *esw)
 	}
 
 	if (mlx5_esw_host_functions_enabled(esw->dev))
-		host_pf_disable_hca(esw->dev);
+		mlx5_esw_host_pf_disable_hca(esw->dev);
 
 	if ((mlx5_core_is_ecpf_esw_manager(esw->dev) ||
 	     esw->mode == MLX5_ESWITCH_LEGACY) &&
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 714ad28e8445..6841caef02d1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -243,6 +243,7 @@ struct mlx5_vport {
 	u16 vport;
 	bool                    enabled;
 	bool max_eqs_set;
+	bool pf_activated;
 	enum mlx5_eswitch_vport_event enabled_events;
 	int index;
 	struct mlx5_devlink_port *dl_port;
@@ -587,6 +588,13 @@ int mlx5_devlink_port_fn_migratable_get(struct devlink_port *port, bool *is_enab
 					struct netlink_ext_ack *extack);
 int mlx5_devlink_port_fn_migratable_set(struct devlink_port *port, bool enable,
 					struct netlink_ext_ack *extack);
+int mlx5_devlink_pf_port_fn_state_get(struct devlink_port *port,
+				      enum devlink_port_fn_state *state,
+				      enum devlink_port_fn_opstate *opstate,
+				      struct netlink_ext_ack *extack);
+int mlx5_devlink_pf_port_fn_state_set(struct devlink_port *port,
+				      enum devlink_port_fn_state state,
+				      struct netlink_ext_ack *extack);
 #ifdef CONFIG_XFRM_OFFLOAD
 int mlx5_devlink_port_fn_ipsec_crypto_get(struct devlink_port *port, bool *is_enabled,
 					  struct netlink_ext_ack *extack);
@@ -634,6 +642,8 @@ bool mlx5_esw_multipath_prereq(struct mlx5_core_dev *dev0,
 			       struct mlx5_core_dev *dev1);
 
 const u32 *mlx5_esw_query_functions(struct mlx5_core_dev *dev);
+int mlx5_esw_host_pf_enable_hca(struct mlx5_core_dev *dev);
+int mlx5_esw_host_pf_disable_hca(struct mlx5_core_dev *dev);
 
 void mlx5_esw_adjacent_vhcas_setup(struct mlx5_eswitch *esw);
 void mlx5_esw_adjacent_vhcas_cleanup(struct mlx5_eswitch *esw);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
index 02b7e474586d..1b439cef3719 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
@@ -4696,6 +4696,61 @@ int mlx5_devlink_port_fn_roce_set(struct devlink_port *port, bool enable,
 	return err;
 }
 
+int mlx5_devlink_pf_port_fn_state_get(struct devlink_port *port,
+				      enum devlink_port_fn_state *state,
+				      enum devlink_port_fn_opstate *opstate,
+				      struct netlink_ext_ack *extack)
+{
+	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
+	const u32 *query_out;
+	bool pf_disabled;
+
+	if (vport->vport != MLX5_VPORT_PF) {
+		NL_SET_ERR_MSG_MOD(extack, "State get is not supported for VF");
+		return -EOPNOTSUPP;
+	}
+
+	*state = vport->pf_activated ?
+		 DEVLINK_PORT_FN_STATE_ACTIVE : DEVLINK_PORT_FN_STATE_INACTIVE;
+
+	query_out = mlx5_esw_query_functions(vport->dev);
+	if (IS_ERR(query_out))
+		return PTR_ERR(query_out);
+
+	pf_disabled = MLX5_GET(query_esw_functions_out, query_out,
+			       host_params_context.host_pf_disabled);
+
+	*opstate = pf_disabled ? DEVLINK_PORT_FN_OPSTATE_DETACHED :
+				 DEVLINK_PORT_FN_OPSTATE_ATTACHED;
+
+	kvfree(query_out);
+	return 0;
+}
+
+int mlx5_devlink_pf_port_fn_state_set(struct devlink_port *port,
+				      enum devlink_port_fn_state state,
+				      struct netlink_ext_ack *extack)
+{
+	struct mlx5_vport *vport = mlx5_devlink_port_vport_get(port);
+	struct mlx5_core_dev *dev;
+
+	if (vport->vport != MLX5_VPORT_PF) {
+		NL_SET_ERR_MSG_MOD(extack, "State set is not supported for VF");
+		return -EOPNOTSUPP;
+	}
+
+	dev = vport->dev;
+
+	switch (state) {
+	case DEVLINK_PORT_FN_STATE_ACTIVE:
+		return mlx5_esw_host_pf_enable_hca(dev);
+	case DEVLINK_PORT_FN_STATE_INACTIVE:
+		return mlx5_esw_host_pf_disable_hca(dev);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 int
 mlx5_eswitch_restore_ipsec_rule(struct mlx5_eswitch *esw, struct mlx5_flow_handle *rule,
 				struct mlx5_esw_flow_attr *esw_attr, int attr_idx)

base-commit: fae1c659d7bd5640012be21b5b5d6490b83c0df8
-- 
2.44.0

Re: [PATCH net-next] net/mlx5: Support devlink port state for host PF
Posted by Paolo Abeni 1 day, 23 hours ago
On 2/3/26 11:24 AM, Tariq Toukan wrote:
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> index 4b7a1ce7f406..5fbfabe28bdb 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -1304,24 +1304,52 @@ static int mlx5_eswitch_load_ec_vf_vports(struct mlx5_eswitch *esw, u16 num_ec_v
>  	return err;
>  }
>  
> -static int host_pf_enable_hca(struct mlx5_core_dev *dev)
> +int mlx5_esw_host_pf_enable_hca(struct mlx5_core_dev *dev)
>  {
> -	if (!mlx5_core_is_ecpf(dev))
> +	struct mlx5_eswitch *esw = dev->priv.eswitch;
> +	struct mlx5_vport *vport;
> +	int err;
> +
> +	if (!mlx5_core_is_ecpf(dev) || !mlx5_esw_allowed(esw))
>  		return 0;

I managed to miss the AI feedback here:

---
The old host_pf_enable_hca() only checked mlx5_core_is_ecpf(dev) before
calling mlx5_cmd_host_pf_enable_hca(). The new function adds a check for
mlx5_esw_allowed(esw), which returns false when esw is NULL or when the
device is not an eswitch manager.

When called from mlx5_host_pf_init() in ecpf.c on an ECPF device that is
not an eswitch manager (the path when mlx5_ecpf_esw_admins_host_pf()
returns false), this new condition will cause the function to return 0
without enabling the HCA.

Is this behavior change intentional? The old code would enable the HCA
in this configuration, but the new code skips it.

The same concern applies to mlx5_esw_host_pf_disable_hca() below.
---

and indeed it looks relevant. I think you have to follow up or send a
revert, whichever is easier/faster.
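
For illustration only, the guard change described in the feedback above can
be modelled in standalone C. This is a sketch, not mlx5 code: struct fake_dev
and both functions are hypothetical stand-ins, with one boolean per predicate
named in the patch. It shows only that the two guards differ for an ECPF that
is not allowed to manage the eswitch; whether that configuration actually
reaches the new function is the open question in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state that the guards inspect. */
struct fake_dev {
	bool is_ecpf;      /* models mlx5_core_is_ecpf(dev) */
	bool esw_allowed;  /* models mlx5_esw_allowed(dev->priv.eswitch) */
	bool hca_enabled;  /* set when the modelled enable command runs */
};

/* Guard as in the removed host_pf_enable_hca(): ECPF check only. */
static int old_enable_hca(struct fake_dev *dev)
{
	if (!dev->is_ecpf)
		return 0;

	dev->hca_enabled = true;	/* stands in for the enable command */
	return 0;
}

/* Guard as in the new mlx5_esw_host_pf_enable_hca(): both checks. */
static int new_enable_hca(struct fake_dev *dev)
{
	if (!dev->is_ecpf || !dev->esw_allowed)
		return 0;		/* silently skips the enable command */

	dev->hca_enabled = true;
	return 0;
}
```

Both functions return 0 in every case modelled here, so a caller cannot tell
from the return value alone whether the enable command was issued.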

> @@ -1347,7 +1375,7 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
>  
>  	if (mlx5_esw_host_functions_enabled(esw->dev)) {
>  		/* Enable external host PF HCA */
> -		ret = host_pf_enable_hca(esw->dev);
> +		ret = mlx5_esw_host_pf_enable_hca(esw->dev);

Just FTR, more AI feedback here:

---
The old host_pf_disable_hca() was a void function. The new
mlx5_esw_host_pf_disable_hca() returns int and can fail, but the return
value is not checked here in the error path.

If mlx5_esw_host_pf_disable_hca() fails, it returns without setting
vport->pf_activated = false. This leaves pf_activated set to true even
though the HCA state may be inconsistent.

Later, mlx5_devlink_pf_port_fn_state_get() reads vport->pf_activated to
report state to userspace, which could then report incorrect state.

Should the return value be checked, or should the pf_activated flag be
updated unconditionally to reflect the intended state?

The same pattern appears in mlx5_eswitch_disable_pf_vf_vports().
---
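
As a rough standalone model of the second point, the sketch below shows how a
cached flag stays set when a disable command fails and the caller drops the
return value. Every name in it (struct fake_vport, fake_cmd_disable_hca() and
the two callers) is hypothetical; the cmd_fails parameter only simulates a
firmware command failure and does not correspond to anything in the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vport field added by the patch. */
struct fake_vport {
	bool pf_activated;
};

/* Simulated firmware command: fails on request. */
static int fake_cmd_disable_hca(bool cmd_fails)
{
	return cmd_fails ? -EIO : 0;
}

/* Same shape as the new disable helper: clear the flag only on success. */
static int fake_disable_hca(struct fake_vport *vport, bool cmd_fails)
{
	int err = fake_cmd_disable_hca(cmd_fails);

	if (err)
		return err;

	vport->pf_activated = false;
	return 0;
}

/* Teardown-style caller: result dropped, so a failure leaves the flag set. */
static void fake_teardown(struct fake_vport *vport, bool cmd_fails)
{
	(void)fake_disable_hca(vport, cmd_fails);
}

/* devlink-style caller: the error is propagated to the requester. */
static int fake_state_set_inactive(struct fake_vport *vport, bool cmd_fails)
{
	return fake_disable_hca(vport, cmd_fails);
}
```

In this model, a failing fake_teardown() leaves pf_activated true, while
fake_state_set_inactive() at least reports the error to its caller.
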
Re: [PATCH net-next] net/mlx5: Support devlink port state for host PF
Posted by Moshe Shemesh 1 day, 22 hours ago

On 2/5/2026 5:57 PM, Paolo Abeni wrote:
> 
> On 2/3/26 11:24 AM, Tariq Toukan wrote:
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> index 4b7a1ce7f406..5fbfabe28bdb 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
>> @@ -1304,24 +1304,52 @@ static int mlx5_eswitch_load_ec_vf_vports(struct mlx5_eswitch *esw, u16 num_ec_v
>>        return err;
>>   }
>>
>> -static int host_pf_enable_hca(struct mlx5_core_dev *dev)
>> +int mlx5_esw_host_pf_enable_hca(struct mlx5_core_dev *dev)
>>   {
>> -     if (!mlx5_core_is_ecpf(dev))
>> +     struct mlx5_eswitch *esw = dev->priv.eswitch;
>> +     struct mlx5_vport *vport;
>> +     int err;
>> +
>> +     if (!mlx5_core_is_ecpf(dev) || !mlx5_esw_allowed(esw))
>>                return 0;
> 
> I managed to miss the AI feedback here:
> 
> ---
> The old host_pf_enable_hca() only checked mlx5_core_is_ecpf(dev) before
> calling mlx5_cmd_host_pf_enable_hca(). The new function adds a check for
> mlx5_esw_allowed(esw), which returns false when esw is NULL or when the
> device is not an eswitch manager.
> 
> When called from mlx5_host_pf_init() in ecpf.c on an ECPF device that is
> not an eswitch manager (the path when mlx5_ecpf_esw_admins_host_pf()
> returns false), this new condition will cause the function to return 0
> without enabling the HCA.

The additional check I added here is actually redundant: whether we get
here from the old code path or from the new caller, both call it only as
eswitch manager.
So there is no concern about a behavior change in the old code, but I can
follow up with a patch to remove the redundant check, though it is not
critical.

> 
> Is this behavior change intentional? The old code would enable the HCA
> in this configuration, but the new code skips it.
> 
> The same concern applies to mlx5_esw_host_pf_disable_hca() below.
> ---
> 
> and indeed it looks relevant. I think you have to follow up or send a
> revert, whichever is easier/faster.
> 
>> @@ -1347,7 +1375,7 @@ mlx5_eswitch_enable_pf_vf_vports(struct mlx5_eswitch *esw,
>>
>>        if (mlx5_esw_host_functions_enabled(esw->dev)) {
>>                /* Enable external host PF HCA */
>> -             ret = host_pf_enable_hca(esw->dev);
>> +             ret = mlx5_esw_host_pf_enable_hca(esw->dev);
> 
> Just FTR, more AI feedback here:
> 
> ---
> The old host_pf_disable_hca() was a void function. The new
> mlx5_esw_host_pf_disable_hca() returns int and can fail, but the return
> value is not checked here in the error path.
> 
> If mlx5_esw_host_pf_disable_hca() fails, it returns without setting
> vport->pf_activated = false. This leaves pf_activated set to true even
> though the HCA state may be inconsistent.
> 
> Later, mlx5_devlink_pf_port_fn_state_get() reads vport->pf_activated to
> report state to userspace, which could then report incorrect state.
> 
> Should the return value be checked, or should the pf_activated flag be
> updated unconditionally to reflect the intended state?

When the disable function is called as part of the unload/teardown/error
flows, we don't check the result. The function is no longer void only
because of the new use case, where it is called from devlink and the
return value is checked there.
Thanks, Moshe.

> 
> The same pattern appears in mlx5_eswitch_disable_pf_vf_vports().
> ---
>