From: Cosmin Ratiu <cratiu@nvidia.com>
Use the new devlink param to control how many doorbells mlx5e devices
allocate and use. The maximum number of doorbells configurable is capped
to the maximum number of channels. This only applies to the Ethernet
part, the RDMA devices using mlx5 manage their own doorbells.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
Documentation/networking/devlink/mlx5.rst | 8 ++++++
.../net/ethernet/mellanox/mlx5/core/devlink.c | 26 +++++++++++++++++++
.../ethernet/mellanox/mlx5/core/en_common.c | 15 ++++++++++-
3 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst
index 60cc9fedf1ef..0650462b3eae 100644
--- a/Documentation/networking/devlink/mlx5.rst
+++ b/Documentation/networking/devlink/mlx5.rst
@@ -45,6 +45,14 @@ Parameters
- The range is between 1 and a device-specific max.
- Applies to each physical function (PF) independently, if the device
supports it. Otherwise, it applies symmetrically to all PFs.
+ * - ``num_doorbells``
+ - driverinit
+ - This controls the number of channel doorbells used by the netdev. In all
+ cases, an additional doorbell is allocated and used for non-channel
+ communication (e.g. for PTP, HWS, etc.). Supported values are:
+ - 0: No channel-specific doorbells, use the global one for everything.
+ - [1, max_num_channels]: Spread netdev channels equally across these
+ doorbells.
Note: permanent parameters such as ``enable_sriov`` and ``total_vfs`` require FW reset to take effect
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index a0b68321355a..50b8cc9bc12b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -535,6 +535,25 @@ mlx5_devlink_hairpin_queue_size_validate(struct devlink *devlink, u32 id,
return 0;
}
+static int mlx5_devlink_num_doorbells_validate(struct devlink *devlink, u32 id,
+ union devlink_param_value val,
+ struct netlink_ext_ack *extack)
+{
+ struct mlx5_core_dev *mdev = devlink_priv(devlink);
+ u32 val32 = val.vu32;
+ u32 max_num_channels;
+
+ max_num_channels = mlx5e_get_max_num_channels(mdev);
+ if (val32 > max_num_channels) {
+ NL_SET_ERR_MSG_FMT_MOD(extack,
+ "Requested num_doorbells (%u) exceeds maximum number of channels (%u)\n",
+ val32, max_num_channels);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
static void mlx5_devlink_hairpin_params_init_values(struct devlink *devlink)
{
struct mlx5_core_dev *dev = devlink_priv(devlink);
@@ -614,6 +633,9 @@ static const struct devlink_param mlx5_devlink_eth_params[] = {
"hairpin_queue_size", DEVLINK_PARAM_TYPE_U32,
BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
mlx5_devlink_hairpin_queue_size_validate),
+ DEVLINK_PARAM_GENERIC(NUM_DOORBELLS,
+ BIT(DEVLINK_PARAM_CMODE_DRIVERINIT), NULL, NULL,
+ mlx5_devlink_num_doorbells_validate),
};
static int mlx5_devlink_eth_params_register(struct devlink *devlink)
@@ -637,6 +659,10 @@ static int mlx5_devlink_eth_params_register(struct devlink *devlink)
mlx5_devlink_hairpin_params_init_values(devlink);
+ value.vu32 = MLX5_DEFAULT_NUM_DOORBELLS;
+ devl_param_driverinit_value_set(devlink,
+ DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS,
+ value);
return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index d13cebbc763a..96b744ceaf13 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -30,6 +30,7 @@
* SOFTWARE.
*/
+#include "devlink.h"
#include "en.h"
#include "lib/crypto.h"
@@ -140,6 +141,18 @@ static int mlx5e_create_tises(struct mlx5_core_dev *mdev, u32 tisn[MLX5_MAX_PORT
return err;
}
+static unsigned int
+mlx5e_get_devlink_param_num_doorbells(struct mlx5_core_dev *dev)
+{
+ const u32 param_id = DEVLINK_PARAM_GENERIC_ID_NUM_DOORBELLS;
+ struct devlink *devlink = priv_to_devlink(dev);
+ union devlink_param_value val;
+ int err;
+
+ err = devl_param_driverinit_value_get(devlink, param_id, &val);
+ return err ? MLX5_DEFAULT_NUM_DOORBELLS : val.vu32;
+}
+
int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
{
struct mlx5e_hw_objs *res = &mdev->mlx5e_res.hw_objs;
@@ -164,7 +177,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev, bool create_tises)
goto err_dealloc_transport_domain;
}
- num_doorbells = min(MLX5_DEFAULT_NUM_DOORBELLS,
+ num_doorbells = min(mlx5e_get_devlink_param_num_doorbells(mdev),
mlx5e_get_max_num_channels(mdev));
res->bfregs = kcalloc(num_doorbells, sizeof(*res->bfregs), GFP_KERNEL);
if (!res->bfregs) {
--
2.31.1
On Wed, 10 Sep 2025 13:24:51 +0300 Tariq Toukan wrote: > @@ -45,6 +45,14 @@ Parameters > - The range is between 1 and a device-specific max. > - Applies to each physical function (PF) independently, if the device > supports it. Otherwise, it applies symmetrically to all PFs. > + * - ``num_doorbells`` > + - driverinit > + - This controls the number of channel doorbells used by the netdev. In all > + cases, an additional doorbell is allocated and used for non-channel > + communication (e.g. for PTP, HWS, etc.). Supported values are: > + - 0: No channel-specific doorbells, use the global one for everything. > + - [1, max_num_channels]: Spread netdev channels equally across these > + doorbells. This is not vibing with the changes we merged yesterday (or I fumbled the merge): Documentation/networking/devlink/mlx5.rst:13: ERROR: Error parsing content block for the "list-table" directive: uniform two-level bullet list expected, but row 8 does not contain the same number of items as row 1 (3 vs 4). Also in this series: drivers/net/ethernet/mellanox/mlx5/core/devlink.c:549:11-83: WARNING avoid newline at end of message in NL_SET_ERR_MSG_FMT_MOD
On 09/10, Tariq Toukan wrote: > From: Cosmin Ratiu <cratiu@nvidia.com> > > Use the new devlink param to control how many doorbells mlx5e devices > allocate and use. The maximum number of doorbells configurable is capped > to the maximum number of channels. This only applies to the Ethernet > part, the RDMA devices using mlx5 manage their own doorbells. > > Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> > Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> > Signed-off-by: Tariq Toukan <tariqt@nvidia.com> > --- > Documentation/networking/devlink/mlx5.rst | 8 ++++++ > .../net/ethernet/mellanox/mlx5/core/devlink.c | 26 +++++++++++++++++++ > .../ethernet/mellanox/mlx5/core/en_common.c | 15 ++++++++++- > 3 files changed, 48 insertions(+), 1 deletion(-) > > diff --git a/Documentation/networking/devlink/mlx5.rst b/Documentation/networking/devlink/mlx5.rst > index 60cc9fedf1ef..0650462b3eae 100644 > --- a/Documentation/networking/devlink/mlx5.rst > +++ b/Documentation/networking/devlink/mlx5.rst > @@ -45,6 +45,14 @@ Parameters > - The range is between 1 and a device-specific max. > - Applies to each physical function (PF) independently, if the device > supports it. Otherwise, it applies symmetrically to all PFs. > + * - ``num_doorbells`` > + - driverinit > + - This controls the number of channel doorbells used by the netdev. In all > + cases, an additional doorbell is allocated and used for non-channel > + communication (e.g. for PTP, HWS, etc.). Supported values are: > + - 0: No channel-specific doorbells, use the global one for everything. > + - [1, max_num_channels]: Spread netdev channels equally across these > + doorbells. Do you have any guidance on this number? Why would the user want `num_doorbells < num_doorbells` vs `num_doorbells == num_channels`? IOW, why not allocate the same number of doorbells as the number of channels and do it unconditionally without devlink param? Are extra doorbells causing any overhead in the non-contended case?
On Wed, 2025-09-10 at 09:16 -0700, Stanislav Fomichev wrote: > > + - 0: No channel-specific doorbells, use the global one for > > everything. > > + - [1, max_num_channels]: Spread netdev channels equally > > across these > > + doorbells. > > Do you have any guidance on this number? Why would the user want > `num_doorbells < num_doorbells` vs `num_doorbells == num_channels`? > > IOW, why not allocate the same number of doorbells as the number of > channels and do it unconditionally without devlink param? Are extra > doorbells causing any overhead in the non-contended case? In most cases, additional doorbells are an overhead and not required. For the last 10+ years, mlx5 has been running with a single doorbell for all channels. But as the number of cores and channels grew, bottlenecks were discovered on some platforms. Thus the need for this series. This series proposes 8 as the new default and we expect nobody would need to touch this knob except in extreme cases, but I think it's nice for it to exist. Regression testing showed little to no impact on most platforms where 1 doorbell was enough, and a significant improvement on platforms that showed MMIO bottlenecks. I'll add some numbers in the next version's cover letter. Cosmin.
On Wed, Sep 10, 2025 at 09:16:40AM -0700, Stanislav Fomichev wrote: > > + * - ``num_doorbells`` > > + - driverinit > > + - This controls the number of channel doorbells used by the netdev. In all > > + cases, an additional doorbell is allocated and used for non-channel > > + communication (e.g. for PTP, HWS, etc.). Supported values are: > > + - 0: No channel-specific doorbells, use the global one for everything. > > + - [1, max_num_channels]: Spread netdev channels equally across these > > + doorbells. > > Do you have any guidance on this number? Why would the user want > `num_doorbells < num_doorbells` vs `num_doorbells == num_channels`? I expect it to be common that most deployment should continue to use the historical value of num_doorbells = 0. Certain systems with troubled CPUs will need to increase this, I don't know if we yet fully understand what values these CPUs will need. Nor do I think I'm permitted to say what CPUs are troubled :\ > IOW, why not allocate the same number of doorbells as the number of > channels and do it unconditionally without devlink param? Are extra > doorbells causing any overhead in the non-contended case? It has a cost that should be minimized to not harm the current majority of users. Jason
© 2016 - 2025 Red Hat, Inc.