[PATCH net-next 10/10] net/mlx5e: Use the 'num_doorbells' devlink param

Tariq Toukan posted 10 patches 3 weeks, 1 day ago
There is a newer version of this series
[PATCH net-next 10/10] net/mlx5e: Use the 'num_doorbells' devlink param
Posted by Tariq Toukan 3 weeks, 1 day ago
From: Cosmin Ratiu <cratiu@nvidia.com>

Use the new devlink param to control how many doorbells mlx5e devices
allocate and use. The maximum number of doorbells configurable is capped
to the maximum number of channels. This only applies to the Ethernet
part, the RDMA devices using mlx5 manage their own doorbells.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 Documentation/networking/devlink/mlx5.rst     |  8 ++++++
 .../net/ethernet/mellanox/mlx5/core/devlink.c | 26 +++++++++++++++++++
 .../ethernet/mellanox/mlx5/core/en_common.c   | 15 ++++++++++-
 3 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 60cc9fedf1ef..0650462b3eae 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -45,6 +45,14 @@ Parameters
      - The range is between 1 and a device-specific max.
      - Applies to each physical function (PF) independently, if the device
        supports it. Otherwise, it applies symmetrically to all PFs.
+   * - ``num_doorbells``
+     - driverinit
+     - This controls the number of channel doorbells used by the netdev. In all
+       cases, an additional doorbell is allocated and used for non-channel
+       communication (e.g. for PTP, HWS, etc.). Supported values are:
+       - 0: No channel-specific doorbells, use the global one for everything.
+       - [1, max_num_channels]: Spread netdev channels equally across these
+         doorbells.
 
 Note: permanent parameters such as ``enable_sriov`` and ``total_vfs`` require FW reset to take effect
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index a0b68321355a..50b8cc9bc12b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -535,6 +535,25 @@ mlx5_devlink_hairpin_queue_size_validate(struct devlink *devlink, u32 id,
 	return 0;
 }
 
+static int mlx5_devlink_num_doorbells_validate(struct devlink *devlink, u32 id,
+					       union devlink_param_value val,
+					       struct netlink_ext_ack *extack)
+{
+	struct mlx5_core_dev *mdev = devlink_priv(devlink);
+	u32 val32 = val.vu32;
+	u32 max_num_channels;
+
+	max_num_channels = mlx5e_get_max_num_channels(mdev);
+	if (val32 > max_num_channels) {
+		NL_SET_ERR_MSG_FMT_MOD(extack,
+				       "Requested num_doorbells (%u) exceeds maximum number of channels (%u)\n",
+				       val32, max_num_channels);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static void mlx5_devlink_hairpin_params_init_values(struct devlink *devlink)
 {
 	struct mlx5_core_dev *dev = devlink_priv(devlink);
@@ -614,6 +633,9 @@ static const struct devlink_param mlx5_devlink_eth_params[] = {
 			     "hairpin_queue_size", DEVLINK_PARAM_TYPE_U32,
 			     BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
 			     mlx5_devlink_hairpin_queue_size_validate),
+	DEVLINK_PARAM_GENERIC(NUM_DOORBELLS,
+			      BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+			      mlx5_devlink_num_doorbells_validate),
 };
 
 static int mlx5_devlink_eth_params_register(struct devlink *devlink)
@@ -637,6 +659,10 @@ static int mlx5_devlink_eth_params_register(struct devlink *devlink)
 
 	mlx5_devlink_hairpin_params_init_values(devlink);
 
+	value.vu32 = MLX5_DEFAULT_NUM_DOORBELLS;
+	devl_param_driverinit_value_set(devlink,
+					DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,
+					value);
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index d13cebbc763a..96b744ceaf13 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -30,6 +30,7 @@
  * SOFTWARE.
  */
 
+#include "devlink.h"
 #include "en.h"
 #include "lib/crypto.h"
 
@@ -140,6 +141,18 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT
 	return err;
 }
 
+static unsigned int
+mlx5e_get_devlink_param_num_doorbells(struct mlx5_core_dev *dev)
+{
+	const u32 param_id = DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS;
+	struct devlink *devlink = priv_to_devlink(dev);
+	union devlink_param_value val;
+	int err;
+
+	err = devl_param_driverinit_value_get(devlink, param_id, &val);
+	return err ? MLX5_DEFAULT_NUM_DOORBELLS : val.vu32;
+}
+
 int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
 {
 	struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs;
@@ -164,7 +177,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
 		goto err_dealloc_transport_domain;
 	}
 
-	num_doorbells = min(MLX5_DEFAULT_NUM_DOORBELLS,
+	num_doorbells = min(mlx5e_get_devlink_param_num_doorbells(mdev),
 			    mlx5e_get_max_num_channels(mdev));
 	res->bfregs = kcalloc(num_doorbells, sizeof(*res->bfregs), GFP_KERNEL);
 	if (!res->bfregs) {
-- 
2.31.1
Re: [PATCH net-next 10/10] net/mlx5e: Use the 'num_doorbells' devlink param
Posted by Jakub Kicinski 3 weeks, 1 day ago
On Wed, 10 Sep 2025 13:24:51 +0300 Tariq Toukan wrote:
> @@ -45,6 +45,14 @@ Parameters
>       - The range is between 1 and a device-specific max.
>       - Applies to each physical function (PF) independently, if the device
>         supports it. Otherwise, it applies symmetrically to all PFs.
> +   * - ``num_doorbells``
> +     - driverinit
> +     - This controls the number of channel doorbells used by the netdev. In all
> +       cases, an additional doorbell is allocated and used for non-channel
> +       communication (e.g. for PTP, HWS, etc.). Supported values are:
> +       - 0: No channel-specific doorbells, use the global one for everything.
> +       - [1, max_num_channels]: Spread netdev channels equally across these
> +         doorbells.

This is not vibing with the changes we merged yesterday (or I fumbled
the merge):

Documentation/networking/devlink/mlx5.rst:13: ERROR: Error parsing content block for the "list-table" directive: uniform two-level bullet list expected, but row 8 does not contain the same number of items as row 1 (3 vs 4).

Also in this series:

drivers/net/ethernet/mellanox/mlx5/core/devlink.c:549:11-83: WARNING avoid newline at end of message in NL_SET_ERR_MSG_FMT_MOD
Re: [PATCH net-next 10/10] net/mlx5e: Use the 'num_doorbells' devlink param
Posted by Stanislav Fomichev 3 weeks, 1 day ago
On 09/10, Tariq Toukan wrote:
> From: Cosmin Ratiu <cratiu@nvidia.com>
> 
> Use the new devlink param to control how many doorbells mlx5e devices
> allocate and use. The maximum number of doorbells configurable is capped
> to the maximum number of channels. This only applies to the Ethernet
> part, the RDMA devices using mlx5 manage their own doorbells.
> 
> Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  Documentation/networking/devlink/mlx5.rst     |  8 ++++++
>  .../net/ethernet/mellanox/mlx5/core/devlink.c | 26 +++++++++++++++++++
>  .../ethernet/mellanox/mlx5/core/en_common.c   | 15 ++++++++++-
>  3 files changed, 48 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
> index 60cc9fedf1ef..0650462b3eae 100644
> --- a/Documentation/networking/devlink/mlx5.rst
> +++ b/Documentation/networking/devlink/mlx5.rst
> @@ -45,6 +45,14 @@ Parameters
>       - The range is between 1 and a device-specific max.
>       - Applies to each physical function (PF) independently, if the device
>         supports it. Otherwise, it applies symmetrically to all PFs.
> +   * - ``num_doorbells``
> +     - driverinit
> +     - This controls the number of channel doorbells used by the netdev. In all
> +       cases, an additional doorbell is allocated and used for non-channel
> +       communication (e.g. for PTP, HWS, etc.). Supported values are:
> +       - 0: No channel-specific doorbells, use the global one for everything.
> +       - [1, max_num_channels]: Spread netdev channels equally across these
> +         doorbells.

Do you have any guidance on this number? Why would the user want
`num_doorbells < num_doorbells` vs `num_doorbells == num_channels`?

IOW, why not allocate the same number of doorbells as the number of
channels and do it unconditionally without devlink param? Are extra
doorbells causing any overhead in the non-contended case?
Re: [PATCH net-next 10/10] net/mlx5e: Use the 'num_doorbells' devlink param
Posted by Cosmin Ratiu 2 weeks, 2 days ago
On Wed, 2025-09-10 at 09:16 -0700, Stanislav Fomichev wrote:
> > +       - 0: No channel-specific doorbells, use the global one for
> > everything.
> > +       - [1, max_num_channels]: Spread netdev channels equally
> > across these
> > +         doorbells.
> 
> Do you have any guidance on this number? Why would the user want
> `num_doorbells < num_doorbells` vs `num_doorbells == num_channels`?
> 
> IOW, why not allocate the same number of doorbells as the number of
> channels and do it unconditionally without devlink param? Are extra
> doorbells causing any overhead in the non-contended case?

In most cases, additional doorbells are an overhead and not required.
For the last 10+ years, mlx5 has been running with a single doorbell
for all channels. But as the number of cores and channels grew,
bottlenecks were discovered on some platforms. Thus the need for this
series.

This series proposes 8 as the new default and we expect nobody would
need to touch this knob except in extreme cases, but I think it's nice
for it to exist. Regression testing showed little to no impact on most
platforms where 1 doorbell was enough, and a significant improvement on
platforms that showed MMIO bottlenecks.

I'll add some numbers in the next version's cover letter.

Cosmin.
Re: [PATCH net-next 10/10] net/mlx5e: Use the 'num_doorbells' devlink param
Posted by Jason Gunthorpe 3 weeks, 1 day ago
On Wed, Sep 10, 2025 at 09:16:40AM -0700, Stanislav Fomichev wrote:

> > +   * - ``num_doorbells``
> > +     - driverinit
> > +     - This controls the number of channel doorbells used by the netdev. In all
> > +       cases, an additional doorbell is allocated and used for non-channel
> > +       communication (e.g. for PTP, HWS, etc.). Supported values are:
> > +       - 0: No channel-specific doorbells, use the global one for everything.
> > +       - [1, max_num_channels]: Spread netdev channels equally across these
> > +         doorbells.
> 
> Do you have any guidance on this number? Why would the user want
> `num_doorbells < num_doorbells` vs `num_doorbells == num_channels`?

I expect it to be common that most deployment should continue to use
the historical value of num_doorbells = 0.

Certain systems with troubled CPUs will need to increase this, I don't
know if we yet fully understand what values these CPUs will need.

Nor do I think I'm permitted to say what CPUs are troubled :\

> IOW, why not allocate the same number of doorbells as the number of
> channels and do it unconditionally without devlink param? Are extra
> doorbells causing any overhead in the non-contended case?

It has a cost that should be minimized to not harm the current
majority of users.

Jason