[PATCH mlx5-next V2 0/9] mlx5-next updates 2026-03-09

Tariq Toukan posted 9 patches 1 month ago
Posted by Tariq Toukan 1 month ago
Hi,

This series contains mlx5 shared updates as preparation for upcoming
features.

First patch by Alex contains IFC changes as preparation for an upcoming
feature.
Last patch does definition movement to expose a HW constant so it could
be used later also by core and Eth drivers.

Patches 2 to 8 by Shay introduce mlx5 infrastructure for SD switchdev
and LAG support.
Detailed description by Shay below.

Regards,
Tariq

This series adds shared infrastructure to enable Socket Direct (SD)
single-netdev switchdev transition and LAG support in subsequent patches.

Currently, LAG is not supported in Socket Direct configurations, and
BlueField-3/4 utilizing SD for North-South traffic operates with two
distinct eSwitches per physical port. This forces the use of separate
IPs and MAC addresses for each NUMA node, complicating network
configuration and requiring firmware to handle MPFS with different
inner and outer packets for communication.

The goal is to expose a single external IP address (single MAC address)
per physical port while maintaining SD's bandwidth and latency benefits.
This means having a single eswitch per physical port, managing all
physical ports via a merged eswitch with multiple vports. This enables
creation of a single FDB, which in turn results in a single RDMA device
to be used by DOCA/HWS/OVS.

To achieve this, the LAG infrastructure needs changes since the current
implementation assumes a fixed mapping between device indices and LAG
ports, which breaks with SD's multi-device-per-port model.

This series prepares the groundwork by:

1. Adding IFC bits for silent mode query and VHCA RX destination type,
   needed for SD device coordination and cross-VHCA traffic steering.

2. Converting the LAG pf array to xarray and using xa_alloc for dynamic
   index management. This decouples LAG indexing from physical device
   indices, allowing flexible device membership.

3. Converting the peer_miss_rule array to an xarray keyed by vhca_id.

4. Introducing LAG variants of the device index helpers that produce unique
   identifiers even when multiple devices share the same physical port.

5. Adding VHCA RX flow destination support for steering traffic to a
   specific VHCA's receive path.

6. Moving LAG demux table ownership to the LAG layer with APIs for
   SW-only LAG modes where firmware cannot create the demux table.
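
As an illustration of items 2 and 4 above, here is a minimal userspace
sketch of "lowest free index" allocation, the general behaviour that
xa_alloc-style ID management provides. It is not the kernel xarray API
and not the code from these patches: the table size, struct lag_dev,
lag_dev_add() and lag_dev_del() are hypothetical names made up for this
example. The point it tries to show is that two devices behind the same
physical port still get distinct indices, because the index no longer
derives from the port.

```c
#include <stddef.h>

#define MAX_LAG_DEVS 4	/* hypothetical table size for the sketch */

/* Hypothetical per-device record; not a structure from the driver. */
struct lag_dev {
	int in_use;
	int port;	/* physical port the device sits behind */
	int vhca_id;	/* per-device identifier */
};

static struct lag_dev lag_devs[MAX_LAG_DEVS];

/*
 * Store the device in the lowest free slot and return that slot's
 * index, or -1 when the table is full. The index is independent of
 * the physical port number.
 */
static int lag_dev_add(int port, int vhca_id)
{
	for (size_t i = 0; i < MAX_LAG_DEVS; i++) {
		if (!lag_devs[i].in_use) {
			lag_devs[i] = (struct lag_dev){
				.in_use = 1,
				.port = port,
				.vhca_id = vhca_id,
			};
			return (int)i;
		}
	}
	return -1;
}

/* Release a slot so that a later lag_dev_add() can reuse its index. */
static void lag_dev_del(int idx)
{
	if (idx >= 0 && idx < MAX_LAG_DEVS)
		lag_devs[idx].in_use = 0;
}
```

In this sketch, adding two devices that both sit behind port 0 yields
indices 0 and 1, and an index freed with lag_dev_del() is handed out
again by the next lag_dev_add().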

A follow-up series will build on this infrastructure to implement:
- SD single-netdev switchdev mode transition with a shared FDB
  corresponding to the SD group.
- LAG support enabling bonding of SD groups.

Since the follow-up series is large (~20 patches), the shared code
between RDMA and net is sent in advance to avoid overloading the
shared branch tree.

V2:
- Add one more patch (#9).
- Use kvfree() instead of kfree() in mlx5_esw_lag_demux_rule_create()
- Fix a condition check to > instead of >= in
  mlx5_ib_set_vport_rep().
- Fix author of patch #4.
- Link to V1: https://lore.kernel.org/all/20260308065559.1837449-1-tariqt@nvidia.com/

Alexei Lazar (1):
  net/mlx5: Add IFC bits for shared headroom pool PBMC support

Shay Drory (7):
  net/mlx5: Add silent mode set/query and VHCA RX IFC bits
  net/mlx5: LAG, replace pf array with xarray
  net/mlx5: LAG, use xa_alloc to manage LAG device indices
  net/mlx5: E-switch, modify peer miss rule index to vhca_id
  net/mlx5: LAG, replace mlx5_get_dev_index with LAG sequence number
  net/mlx5: Add VHCA RX flow destination support for FW steering
  {net/RDMA}/mlx5: Add LAG demux table API and vport demux rules

Tariq Toukan (1):
  net/mlx5: Expose MLX5_UMR_ALIGN definition

 drivers/infiniband/hw/mlx5/ib_rep.c           |  24 +-
 drivers/infiniband/hw/mlx5/main.c             |  21 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   1 -
 drivers/infiniband/hw/mlx5/mr.c               |   1 -
 .../mellanox/mlx5/core/diag/fs_tracepoint.c   |   3 +
 .../net/ethernet/mellanox/mlx5/core/en_tc.c   |   9 +-
 .../net/ethernet/mellanox/mlx5/core/eswitch.h |  14 +-
 .../mellanox/mlx5/core/eswitch_offloads.c     | 103 ++-
 .../net/ethernet/mellanox/mlx5/core/fs_cmd.c  |   6 +-
 .../net/ethernet/mellanox/mlx5/core/fs_core.c |  17 +-
 .../ethernet/mellanox/mlx5/core/lag/debugfs.c |   3 +-
 .../net/ethernet/mellanox/mlx5/core/lag/lag.c | 684 ++++++++++++++----
 .../net/ethernet/mellanox/mlx5/core/lag/lag.h |  49 +-
 .../net/ethernet/mellanox/mlx5/core/lag/mp.c  |  20 +-
 .../ethernet/mellanox/mlx5/core/lag/mpesw.c   |  15 +-
 .../mellanox/mlx5/core/lag/port_sel.c         |  28 +-
 .../net/ethernet/mellanox/mlx5/core/lib/sd.c  |   2 +-
 include/linux/mlx5/device.h                   |   1 +
 include/linux/mlx5/fs.h                       |  10 +-
 include/linux/mlx5/lag.h                      |  21 +
 include/linux/mlx5/mlx5_ifc.h                 |  26 +-
 21 files changed, 850 insertions(+), 208 deletions(-)
 create mode 100644 include/linux/mlx5/lag.h


base-commit: 385a06f74ff7a03e3fb0b15fb87cfeb052d75073
-- 
2.44.0
Re: [PATCH mlx5-next V2 0/9] mlx5-next updates 2026-03-09
Posted by Leon Romanovsky 3 weeks, 6 days ago
On Mon, 09 Mar 2026 11:34:26 +0200, Tariq Toukan wrote:
> This series contains mlx5 shared updates as preparation for upcoming
> features.
> 
> First patch by Alex contains IFC changes as preparation for an upcoming
> feature.
> Last patch does definition movement to expose a HW constant so it could
> be used later also by core and Eth drivers.
> 
> [...]

Applied, thanks!

[1/9] net/mlx5: Add IFC bits for shared headroom pool PBMC support
      https://git.kernel.org/rdma/rdma/c/f8e761655997cc
[2/9] net/mlx5: Add silent mode set/query and VHCA RX IFC bits
      https://git.kernel.org/rdma/rdma/c/691dffc7255e74
[3/9] net/mlx5: LAG, replace pf array with xarray
      https://git.kernel.org/rdma/rdma/c/91e9f3e7b62657
[4/9] net/mlx5: LAG, use xa_alloc to manage LAG device indices
      https://git.kernel.org/rdma/rdma/c/2b204cdb12068c
[5/9] net/mlx5: E-switch, modify peer miss rule index to vhca_id
      https://git.kernel.org/rdma/rdma/c/da0349d0ffc7b8
[6/9] net/mlx5: LAG, replace mlx5_get_dev_index with LAG sequence number
      https://git.kernel.org/rdma/rdma/c/971b28accc0943
[7/9] net/mlx5: Add VHCA RX flow destination support for FW steering
      https://git.kernel.org/rdma/rdma/c/0bc9059fab6365
[8/9] {net/RDMA}/mlx5: Add LAG demux table API and vport demux rules
      https://git.kernel.org/rdma/rdma/c/d6c9b4de8109a3
[9/9] net/mlx5: Expose MLX5_UMR_ALIGN definition
      https://git.kernel.org/rdma/rdma/c/4dd2115f43594d

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>
Re: [PATCH mlx5-next V2 0/9] mlx5-next updates 2026-03-09
Posted by Tariq Toukan 4 weeks, 1 day ago

On 09/03/2026 11:34, Tariq Toukan wrote:
> Hi,
> 
> This series contains mlx5 shared updates as preparation for upcoming
> features.
> 
> First patch by Alex contains IFC changes as preparation for an upcoming
> feature.
> Last patch does definition movement to expose a HW constant so it could
> be used later also by core and Eth drivers.
> 
> Patches 2 to 8 by Shay introduce mlx5 infrastructure for SD switchdev
> and LAG support.
> Detailed description by Shay below.
> 
> Regards,
> Tariq
> 

Hi Leon, no comments for a while, let's merge it please.