[PATCH net-next 00/14] devlink and mlx5: Support cross-function rate scheduling

Tariq Toukan posted 14 patches 1 week, 4 days ago
There is a newer version of this series
Documentation/netlink/specs/devlink.yaml      |  22 +-
.../networking/devlink/devlink-port.rst       |   2 +
.../networking/devlink/devlink-shared.rst     |  66 ++++
Documentation/networking/devlink/index.rst    |   3 +
Documentation/networking/devlink/mlx5.rst     |  33 ++
.../net/ethernet/mellanox/mlx5/core/Makefile  |   5 +-
.../net/ethernet/mellanox/mlx5/core/devlink.c |   1 +
.../mellanox/mlx5/core/esw/devlink_port.c     |   2 +-
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 324 ++++++++----------
.../net/ethernet/mellanox/mlx5/core/esw/qos.h |   3 -
.../net/ethernet/mellanox/mlx5/core/eswitch.c |   9 +-
.../net/ethernet/mellanox/mlx5/core/eswitch.h |  14 +-
.../net/ethernet/mellanox/mlx5/core/main.c    |  18 +
.../ethernet/mellanox/mlx5/core/sh_devlink.c  | 183 ++++++++++
.../ethernet/mellanox/mlx5/core/sh_devlink.h  |  16 +
include/linux/mlx5/driver.h                   |   5 +
include/net/devlink.h                         |   7 +
include/uapi/linux/devlink.h                  |   2 +
net/devlink/core.c                            |  48 ++-
net/devlink/dev.c                             |   7 +-
net/devlink/devl_internal.h                   |  11 +-
net/devlink/netlink.c                         |  67 +++-
net/devlink/netlink_gen.c                     |  23 +-
net/devlink/netlink_gen.h                     |   8 +
net/devlink/rate.c                            | 287 ++++++++++++----
25 files changed, 873 insertions(+), 293 deletions(-)
create mode 100644 Documentation/networking/devlink/devlink-shared.rst
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/sh_devlink.h
[PATCH net-next 00/14] devlink and mlx5: Support cross-function rate scheduling
Posted by Tariq Toukan 1 week, 4 days ago
Hi,

This series by Cosmin and Jiri adds support for cross-function rate
scheduling in devlink and mlx5.
This is a different approach from the series discussed in [2] earlier
this year. See the detailed feature description by Cosmin below [1].

Code dependency:
This series should apply cleanly after the pulling of
'net-2025_11_19_05_03', specifically commit f94c1a114ac2 ("devlink:
rate: Unset parent pointer in devl_rate_nodes_destroy").

Regards,
Tariq


[1]
devlink objects support rate management for TX scheduling, which
involves maintaining a tree of rate nodes that corresponds to TX
schedulers in hardware. 'man devlink-rate' has the full details.

The tree of rate nodes is maintained per devlink object, protected by
the devlink lock.

There exists hardware capable of instantiating TX scheduling trees
spanning multiple functions of the same physical device (and thus
devlink objects) and therefore the current API and locking scheme are
insufficient.

This patch series changes the devlink rate implementation and API to
allow supporting such hardware and managing TX scheduling trees across
multiple functions of a physical device.

Modeling this requires having devlink rate nodes with parents in other
devlink objects. A naive approach that relies on the current
one-lock-per-devlink model is impossible, as it would require in some
cases acquiring multiple devlink locks in the correct order.

The solution proposed in this patch series consists of two parts:

1. Modeling the underlying physical NIC as a shared devlink object on
   the faux bus and nesting all its PF devlink instances in it.

2. Changing the devlink rate implementation to store rates in this
   shared devlink object, if it exists, and use its lock to protect
   against concurrent changes of the scheduling tree.

With these in place, cross-esw scheduling support is added to mlx5.  The
neat part about this approach is that it works for SFs as well, which
are already nested in their parent PF instances.
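
To make the two parts above concrete, here is a minimal userspace sketch
of the idea. It is not the code from this series: every name in it
(toy_devlink, toy_rate_owner, toy_set_parent) is hypothetical, and a
pthread mutex stands in for the devlink instance lock. It assumes that
the rate tree is owned by the top-most instance in the nesting chain,
so a cross-instance parent change needs only that one lock.

```c
/* Toy userspace model. All names are hypothetical, not kernel API. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_devlink {
	const char *name;
	struct toy_devlink *nested_in;	/* NULL for the top-most instance */
	pthread_mutex_t lock;		/* stands in for the instance lock */
};

/* Walk up the nesting chain to the instance assumed to own the rate tree. */
static struct toy_devlink *toy_rate_owner(struct toy_devlink *dl)
{
	while (dl->nested_in)
		dl = dl->nested_in;
	return dl;
}

/*
 * Try to give a rate node living in @child a parent living in @parent.
 * Only the owner's lock is taken, so no ordering between the locks of
 * sibling instances is ever needed. Returns false when the two
 * instances do not share an owner.
 */
static bool toy_set_parent(struct toy_devlink *child,
			   struct toy_devlink *parent)
{
	struct toy_devlink *owner = toy_rate_owner(child);
	bool ok;

	pthread_mutex_lock(&owner->lock);
	ok = toy_rate_owner(parent) == owner;
	/* A real implementation would relink the rate node here. */
	pthread_mutex_unlock(&owner->lock);

	return ok;
}
```

In this model an SF nested in PF0 and a node on PF1 both resolve to the
same shared instance, so toy_set_parent() succeeds for them, while an
instance outside that nesting chain is rejected.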

V1 of this patch series was sent a long time ago [2], using a different
approach of storing rates in a shared rate domain with special locking
rules. This new approach uses standard devlink instances and nesting.

Patches:

devlink rate changes for cross-device TX scheduling:
devlink: Reverse locking order for nested instances
documentation: networking: add shared devlink documentation
devlink: Add helpers to lock nested-in instances
devlink: Refactor devlink_rate_nodes_check
devlink: Decouple rate storage from associated devlink object
devlink: Add parent dev to devlink API
devlink: Allow parent dev for rate-set and rate-new
devlink: Allow rate node parents from other devlinks

mlx5 support for cross-device TX scheduling:
net/mlx5: Introduce shared devlink instance for PFs on same chip
net/mlx5: Expose a function to clear a vport's parent
net/mlx5: Store QoS sched nodes in the sh_devlink
net/mlx5: qos: Support cross-device tx scheduling
net/mlx5: qos: Enable cross-device scheduling
net/mlx5: Document devlink rates

[2] https://lore.kernel.org/netdev/20250213180134.323929-1-tariqt@nvidia.com/


Cosmin Ratiu (12):
  devlink: Reverse locking order for nested instances
  devlink: Add helpers to lock nested-in instances
  devlink: Refactor devlink_rate_nodes_check
  devlink: Decouple rate storage from associated devlink object
  devlink: Add parent dev to devlink API
  devlink: Allow parent dev for rate-set and rate-new
  devlink: Allow rate node parents from other devlinks
  net/mlx5: Expose a function to clear a vport's parent
  net/mlx5: Store QoS sched nodes in the sh_devlink
  net/mlx5: qos: Support cross-device tx scheduling
  net/mlx5: qos: Enable cross-device scheduling
  net/mlx5: Document devlink rates

Jiri Pirko (2):
  documentation: networking: add shared devlink documentation
  net/mlx5: Introduce shared devlink instance for PFs on same chip

-- 
2.31.1
Re: [PATCH net-next 00/14] devlink and mlx5: Support cross-function rate scheduling
Posted by Jakub Kicinski 1 week, 3 days ago
On Thu, 20 Nov 2025 15:09:12 +0200 Tariq Toukan wrote:
> Code dependency:
> This series should apply cleanly after the pulling of
> 'net-2025_11_19_05_03', specifically commit f94c1a114ac2 ("devlink:
> rate: Unset parent pointer in devl_rate_nodes_destroy").

repost please, we don't do dependencies
Re: [PATCH net-next 00/14] devlink and mlx5: Support cross-function rate scheduling
Posted by Tariq Toukan 1 week, 1 day ago

On 21/11/2025 5:39, Jakub Kicinski wrote:
> On Thu, 20 Nov 2025 15:09:12 +0200 Tariq Toukan wrote:
>> Code dependency:
>> This series should apply cleanly after the pulling of
>> 'net-2025_11_19_05_03', specifically commit f94c1a114ac2 ("devlink:
>> rate: Unset parent pointer in devl_rate_nodes_destroy").
> 
> repost please, we don't do dependencies
> 

Hi,

I submitted the code before my weekend as we have a gap of ~1.5 working 
days (timezones + Friday). It could be utilized for collecting feedback 
on the proposed solution, or even get it accepted.

I referred to a net-* tag from the net branch, part of your regular 
process, that was about to get merged any minute. Btw it was indeed 
pulled before this response, so our series would in fact apply cleanly.

Anyway, not a big deal, I'm re-posting the series now.

Regards,
Tariq
Re: [PATCH net-next 00/14] devlink and mlx5: Support cross-function rate scheduling
Posted by Jakub Kicinski 6 days, 22 hours ago
On Sun, 23 Nov 2025 08:57:58 +0200 Tariq Toukan wrote:
> I submitted the code before my weekend as we have a gap of ~1.5 working 
> days (timezones + Friday). It could be utilized for collecting feedback 
> on the proposed solution, or even get it accepted.

Makes sense, our recommendation is to throw in an RFC in the subjects
in this case. Saves the back and forth.