This series adds support for Transaction Layer Packet (TLP) emulation
response gateway regions, enabling userspace device emulation software
to write TLP responses directly to lower layers without kernel driver
involvement.
Currently, the mlx5 driver exposes VirtIO emulation access regions via
the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
ioctl to also support allocating TLP response gateway channels for
PCI device emulation use cases.
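For context, userspace reaches VAR regions through rdma-core. Below is a rough, untested sketch of how a VAR allocation and mapping looks with the existing mlx5dv_alloc_var() API; the MLX5DV_VAR_FLAGS_TLP flag shown in the comment is hypothetical, only illustrating where a TLP variant could plug in:

```c
/* Untested sketch: allocate a VAR region via rdma-core and map it.
 * mlx5dv_alloc_var() is the existing rdma-core entry point behind
 * MLX5_IB_METHOD_VAR_OBJ_ALLOC. A TLP gateway variant would be
 * selected by a new flag (name here is hypothetical).
 */
#include <sys/mman.h>
#include <infiniband/mlx5dv.h>

static void *map_var(struct ibv_context *ctx)
{
	struct mlx5dv_var *var;
	void *region;

	var = mlx5dv_alloc_var(ctx, 0 /* or a future MLX5DV_VAR_FLAGS_TLP */);
	if (!var)
		return NULL;

	/* The kernel hands back an mmap cookie; mapping it gives the
	 * process direct write access to the device region. */
	region = mmap(NULL, var->length, PROT_READ | PROT_WRITE,
		      MAP_SHARED, ctx->cmd_fd, var->mmap_off);
	if (region == MAP_FAILED) {
		mlx5dv_free_var(var);
		return NULL;
	}
	return region;
}
```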
Thanks
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Maher Sanalla (6):
net/mlx5: Add TLP emulation device capabilities
net/mlx5: Expose TLP emulation capabilities
RDMA/mlx5: Refactor VAR table to use region abstraction
RDMA/mlx5: Add TLP VAR region support and infrastructure
RDMA/mlx5: Add support for TLP VAR allocation
RDMA/mlx5: Add VAR object query method for cross-process sharing
drivers/infiniband/hw/mlx5/main.c | 196 ++++++++++++++++++++-----
drivers/infiniband/hw/mlx5/mlx5_ib.h | 8 +-
drivers/net/ethernet/mellanox/mlx5/core/fw.c | 6 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
include/linux/mlx5/device.h | 9 ++
include/linux/mlx5/mlx5_ifc.h | 23 ++-
include/uapi/rdma/mlx5_user_ioctl_cmds.h | 9 ++
include/uapi/rdma/mlx5_user_ioctl_verbs.h | 4 +
8 files changed, 218 insertions(+), 38 deletions(-)
---
base-commit: 58409f0d4dd3f9e987214064e49b088823934304
change-id: 20260225-var-tlp-93de10adedb8
Best regards,
--
Leon Romanovsky <leonro@nvidia.com>
On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.

Sorry if this is obvious to people in the know, but could you possibly
give a quick high level description of the use case behind this feature?
I'm just curious what emulation needs are enabled by having access to
this packet level. Thanks!
On Fri, Feb 27, 2026 at 02:37:05PM -0700, Keith Busch wrote:
> On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> > This series adds support for Transaction Layer Packet (TLP) emulation
> > response gateway regions, enabling userspace device emulation software
> > to write TLP responses directly to lower layers without kernel driver
> > involvement.
> >
> > Currently, the mlx5 driver exposes VirtIO emulation access regions via
> > the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> > ioctl to also support allocating TLP response gateway channels for
> > PCI device emulation use cases.
>
> Sorry if this is obvious to people in the know, but could you possibly
> give a quick high level description of the use case behind this feature?
> I'm just curious what emulation needs are enabled by having access to
> this packet level. Thanks!

These days the DPU world supports what I think of as "software defined
PCI functions". Meaning when the DPU receives a PCIe TLP on its PCI
interface it may invoke software to generate a response packet for that
TLP.

At least the Mellanox DPU can route the TLPs to software in many
different places: various on-device processors, or on the ARM cores
running Linux.

So, for example, using this basic capability you can write some software
to have the DPU create a PCI function that conforms to the virtio-net
specification. Or NVMe. Or whatever else you dream up.

The peculiar thing is that this is all tightly coupled to RDMA. Eg if
you want your TLP to trigger a DMA from the PCI function then RDMA QPs
and MRs have to be used to execute the DMA.

Jason
On Wed, 25 Feb 2026 16:19:30 +0200 Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.

Why is this an RDMA thing if it's a PCIe feature intended for VirtIO?
On Thu, Feb 26, 2026 at 05:34:34PM -0800, Jakub Kicinski wrote:
> On Wed, 25 Feb 2026 16:19:30 +0200 Leon Romanovsky wrote:
> > This series adds support for Transaction Layer Packet (TLP) emulation
> > response gateway regions, enabling userspace device emulation software
> > to write TLP responses directly to lower layers without kernel driver
> > involvement.
> >
> > Currently, the mlx5 driver exposes VirtIO emulation access regions via
> > the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> > ioctl to also support allocating TLP response gateway channels for
> > PCI device emulation use cases.
>
> Why is this an RDMA thing if it's a PCIe feature intended for VirtIO?

This is the result of a long path of evolution.

Early on, we had VDPA emulation implemented entirely within the RDMA
stack. The idea was to build something similar to a tun/tap pair, where
a native RDMA QP could be connected to RDMA QPs carrying WQEs formatted
in the VirtIO layout. With some QEMU-side handling, this produced a
virtio-net device.

Later, this model was adapted for a DPU configuration. In that setup,
the DPU's RDMA block held the native QPs, while the x86 host exposed
the VirtIO-formatted QPs, still with QEMU involved. The DPU controlled
the x86-side "tun/tap" through RDMA-linked operations on the associated
objects.

Next, the DPU evolved to instantiate a full VirtIO PCI function on its
own, removing the need for x86 to run QEMU. The DPU continued to manage
the tun/tap via RDMA operations, with some extensions to cover
PCI-related details.

Eventually, the DPU gained general-purpose programmable co-processors
capable of executing various RDMA and non-RDMA operations. As a result,
the RDMA subsystem also became responsible for loading programs onto
these co-processors and managing them within RDMA context and PD
security constraints.

Now we have reached a stage where these co-processors can manage a much
larger portion of the PCI-side behavior, including delegating some
responsibilities back to the host CPU. This produces an odd situation
where a privileged RDMA user can:

- Claim an "emulation" PCI function
- Load a co-processor program associated with that PCI function
- Use RDMA-mediated queues and security controls to interact with the
  co-processor program
- Use the co-processor and related mechanisms to capture and respond to
  TLPs directed to that PCI function

There are many tightly coupled components in this design, but the TLP
handling cannot be separated from the RDMA-related logic that enables
it.

Thanks
On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
>
> Thanks
>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Maher Sanalla (6):
> net/mlx5: Add TLP emulation device capabilities
> net/mlx5: Expose TLP emulation capabilities
> RDMA/mlx5: Refactor VAR table to use region abstraction
> RDMA/mlx5: Add TLP VAR region support and infrastructure
> RDMA/mlx5: Add support for TLP VAR allocation
> RDMA/mlx5: Add VAR object query method for cross-process sharing

There is no need for this last patch. There is a way to implement it
purely in userspace.

Thanks
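One userspace-only approach to the cross-process sharing, sketched here as an untested outline (the var_handle struct and the fd-passing helpers are hypothetical): the allocating process sends its verbs device fd over a unix socket with SCM_RIGHTS together with the VAR's mmap cookie, and the peer maps the same offset:

```c
/* Untested sketch: share an already-allocated VAR across processes
 * without a kernel query method. The owner passes the ibv_context
 * cmd_fd (via SCM_RIGHTS) plus the VAR's mmap_off/length; the peer
 * then maps the same cookie. Transport of the fd and the handle over
 * the unix socket is assumed and not shown.
 */
#include <stdint.h>
#include <sys/mman.h>

struct var_handle {		/* hypothetical wire format */
	uint64_t mmap_off;	/* cookie from struct mlx5dv_var */
	uint32_t length;
};

/* Peer side: cmd_fd and h arrived over the unix socket. */
static void *map_shared_var(int cmd_fd, const struct var_handle *h)
{
	return mmap(NULL, h->length, PROT_READ | PROT_WRITE,
		    MAP_SHARED, cmd_fd, h->mmap_off);
}
```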
On Wed, 25 Feb 2026 16:19:30 +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
>
> [...]
Applied, thanks!
[1/6] net/mlx5: Add TLP emulation device capabilities
(no commit info)
[2/6] net/mlx5: Expose TLP emulation capabilities
(no commit info)
Best regards,
--
Leon Romanovsky <leon@kernel.org>
On Wed, 25 Feb 2026 16:19:30 +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
>
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
>
> [...]
Applied, thanks!
[3/6] RDMA/mlx5: Refactor VAR table to use region abstraction
(no commit info)
[4/6] RDMA/mlx5: Add TLP VAR region support and infrastructure
(no commit info)
[5/6] RDMA/mlx5: Add support for TLP VAR allocation
(no commit info)
[6/6] RDMA/mlx5: Add VAR object query method for cross-process sharing
(no commit info)
Best regards,
--
Leon Romanovsky <leon@kernel.org>