[PATCH rdma-next 0/6] Add support for TLP emulation

Leon Romanovsky posted 6 patches 1 month, 1 week ago
[PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Leon Romanovsky 1 month, 1 week ago
This series adds support for Transaction Layer Packet (TLP) emulation
response gateway regions, enabling userspace device emulation software
to write TLP responses directly to lower layers without kernel driver
involvement.

Currently, the mlx5 driver exposes VirtIO emulation access regions via
the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
ioctl to also support allocating TLP response gateway channels for
PCI device emulation use cases.

Thanks

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
Maher Sanalla (6):
      net/mlx5: Add TLP emulation device capabilities
      net/mlx5: Expose TLP emulation capabilities
      RDMA/mlx5: Refactor VAR table to use region abstraction
      RDMA/mlx5: Add TLP VAR region support and infrastructure
      RDMA/mlx5: Add support for TLP VAR allocation
      RDMA/mlx5: Add VAR object query method for cross-process sharing

 drivers/infiniband/hw/mlx5/main.c              | 196 ++++++++++++++++++++-----
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/fw.c   |   6 +
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   1 +
 include/linux/mlx5/device.h                    |   9 ++
 include/linux/mlx5/mlx5_ifc.h                  |  23 ++-
 include/uapi/rdma/mlx5_user_ioctl_cmds.h       |   9 ++
 include/uapi/rdma/mlx5_user_ioctl_verbs.h      |   4 +
 8 files changed, 218 insertions(+), 38 deletions(-)
---
base-commit: 58409f0d4dd3f9e987214064e49b088823934304
change-id: 20260225-var-tlp-93de10adedb8

Best regards,
--  
Leon Romanovsky <leonro@nvidia.com>
Re: [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Keith Busch 1 month, 1 week ago
On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
> 
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.

Sorry if this is obvious to people in the know, but could you possibly
give a quick high level description of the use case behind this feature?
I'm just curious what emulation needs are enabled by having access to
this packet level. Thanks!
Re: [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Jason Gunthorpe 1 month ago
On Fri, Feb 27, 2026 at 02:37:05PM -0700, Keith Busch wrote:
> On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> > This series adds support for Transaction Layer Packet (TLP) emulation
> > response gateway regions, enabling userspace device emulation software
> > to write TLP responses directly to lower layers without kernel driver
> > involvement.
> > 
> > Currently, the mlx5 driver exposes VirtIO emulation access regions via
> > the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> > ioctl to also support allocating TLP response gateway channels for
> > PCI device emulation use cases.
> 
> Sorry if this is obvious to people in the know, but could you possibly
> give a quick high level description of the use case behind this feature?
> I'm just curious what emulation needs are enabled by having access to
> this packet level. Thanks!

These days the DPU world supports what I think of as "software-defined
PCI functions": when the DPU receives a PCIe TLP on its PCI interface,
it may invoke software to generate a response packet for that TLP.

At least the Mellanox DPU can route the TLPs to software in many
different places: various on-device processors, or the ARM cores
running Linux.

So, for example, using this basic capability you can write some
software to have the DPU create a PCI function that conforms to the
virtio-net specification. Or NVMe. Or whatever else you dream up.

The peculiar thing is that this is all tightly coupled to RDMA. E.g., if
you want your TLP to trigger a DMA from the PCI function, then RDMA QPs
and MRs have to be used to execute the DMA.
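
To make the "software generates the response" idea concrete, here is a
toy sketch in plain C. Everything in it is an assumption made for
illustration: the struct layouts, the names (emu_function,
emu_handle_tlp, EMU_BAR_REGS) and the four-register BAR are
hypothetical, and it models neither the PCIe wire format nor any mlx5
or rdma-core interface. It only shows the shape of the logic: a memory
write is posted and gets no completion, while a memory read gets a
completion that echoes the requester ID and tag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified TLP model; NOT the PCIe wire format. */
enum tlp_type { TLP_MEM_READ, TLP_MEM_WRITE };
enum cpl_status { CPL_SUCCESS, CPL_UNSUPPORTED };

struct tlp_request {
	enum tlp_type type;
	uint16_t requester_id;	/* who sent the request */
	uint8_t tag;		/* matches a completion to its request */
	uint32_t addr;		/* byte offset into the emulated BAR */
	uint32_t data;		/* payload, used by writes only */
};

struct tlp_completion {
	enum cpl_status status;
	uint16_t requester_id;
	uint8_t tag;
	uint32_t data;		/* payload, valid on successful reads */
};

#define EMU_BAR_REGS 4

/* Hypothetical emulated PCI function backed by a tiny register file. */
struct emu_function {
	uint32_t regs[EMU_BAR_REGS];
};

/*
 * Handle one request aimed at the emulated function.
 * Returns 1 if *cpl was filled in and should be sent back,
 * 0 if the request was a posted write that needs no completion.
 */
static int emu_handle_tlp(struct emu_function *fn,
			  const struct tlp_request *req,
			  struct tlp_completion *cpl)
{
	uint32_t idx;
	int in_range;

	assert(fn && req && cpl);

	idx = req->addr / sizeof(uint32_t);
	in_range = (req->addr % sizeof(uint32_t)) == 0 && idx < EMU_BAR_REGS;

	if (req->type == TLP_MEM_WRITE) {
		/* Posted: update state if the address is valid, no reply. */
		if (in_range)
			fn->regs[idx] = req->data;
		return 0;
	}

	/* Non-posted read: always answer, echoing requester ID and tag. */
	cpl->requester_id = req->requester_id;
	cpl->tag = req->tag;
	cpl->status = in_range ? CPL_SUCCESS : CPL_UNSUPPORTED;
	cpl->data = in_range ? fn->regs[idx] : 0;
	return 1;
}
```

In this model, whatever sits in the handler defines the device: swap
the register file for virtio-net or NVMe register semantics and the
host sees a different PCI function.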

Jason
Re: [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Jakub Kicinski 1 month, 1 week ago
On Wed, 25 Feb 2026 16:19:30 +0200 Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
> 
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.

Why is this an RDMA thing if it's a PCIe feature intended for VirtIO?
Re: [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Leon Romanovsky 1 month ago
On Thu, Feb 26, 2026 at 05:34:34PM -0800, Jakub Kicinski wrote:
> On Wed, 25 Feb 2026 16:19:30 +0200 Leon Romanovsky wrote:
> > This series adds support for Transaction Layer Packet (TLP) emulation
> > response gateway regions, enabling userspace device emulation software
> > to write TLP responses directly to lower layers without kernel driver
> > involvement.
> > 
> > Currently, the mlx5 driver exposes VirtIO emulation access regions via
> > the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> > ioctl to also support allocating TLP response gateway channels for
> > PCI device emulation use cases.
> 
> > Why is this an RDMA thing if it's a PCIe feature intended for VirtIO?

This is the result of a long path of evolution.

Early on, we had VDPA emulation implemented entirely within the RDMA
stack. The idea was to build something similar to a tun/tap pair, where
a native RDMA QP could be connected to RDMA QPs carrying WQEs formatted
in the VirtIO layout. With some QEMU-side handling, this produced a
virtio-net device.

Later, this model was adapted for a DPU configuration. In that setup,
the DPU's RDMA block held the native QPs, while the x86 host exposed the
VirtIO-formatted QPs, still with QEMU involved. The DPU controlled the
x86-side "tun/tap" through RDMA-linked operations on the associated
objects.

Next, the DPU evolved to instantiate a full VirtIO PCI function on its
own, removing the need for x86 to run QEMU. The DPU continued to manage
the tun/tap via RDMA operations, with some extensions to cover
PCI-related details.

Eventually, the DPU gained general-purpose programmable co-processors
capable of executing various RDMA and non-RDMA operations. As a result,
the RDMA subsystem also became responsible for loading programs onto
these co-processors and managing them within RDMA context and PD
security constraints.

Now we have reached a stage where these co-processors can manage a much
larger portion of the PCI-side behavior, including delegating some
responsibilities back to the host CPU. This produces an odd situation
where a privileged RDMA user can:

- Claim an "emulation" PCI function
- Load a co-processor program associated with that PCI function
- Use RDMA-mediated queues and security controls to interact with the
  co-processor program
- Use the co-processor and related mechanisms to capture and respond to
  TLPs directed to that PCI function

There are many tightly coupled components in this design, but the TLP
handling cannot be separated from the RDMA-related logic that enables
it.

Thanks
Re: [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Leon Romanovsky 1 month, 1 week ago
On Wed, Feb 25, 2026 at 04:19:30PM +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
> 
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
> 
> Thanks
> 
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
> Maher Sanalla (6):
>       net/mlx5: Add TLP emulation device capabilities
>       net/mlx5: Expose TLP emulation capabilities
>       RDMA/mlx5: Refactor VAR table to use region abstraction
>       RDMA/mlx5: Add TLP VAR region support and infrastructure
>       RDMA/mlx5: Add support for TLP VAR allocation

>       RDMA/mlx5: Add VAR object query method for cross-process sharing

This last patch is not needed. The same thing can be implemented
purely in userspace.

Thanks
Re: (subset) [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Leon Romanovsky 1 month ago
On Wed, 25 Feb 2026 16:19:30 +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
> 
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
> 
> [...]

Applied, thanks!

[1/6] net/mlx5: Add TLP emulation device capabilities
      (no commit info)
[2/6] net/mlx5: Expose TLP emulation capabilities
      (no commit info)

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>
Re: (subset) [PATCH rdma-next 0/6] Add support for TLP emulation
Posted by Leon Romanovsky 1 month ago
On Wed, 25 Feb 2026 16:19:30 +0200, Leon Romanovsky wrote:
> This series adds support for Transaction Layer Packet (TLP) emulation
> response gateway regions, enabling userspace device emulation software
> to write TLP responses directly to lower layers without kernel driver
> involvement.
> 
> Currently, the mlx5 driver exposes VirtIO emulation access regions via
> the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
> ioctl to also support allocating TLP response gateway channels for
> PCI device emulation use cases.
> 
> [...]

Applied, thanks!

[3/6] RDMA/mlx5: Refactor VAR table to use region abstraction
      (no commit info)
[4/6] RDMA/mlx5: Add TLP VAR region support and infrastructure
      (no commit info)
[5/6] RDMA/mlx5: Add support for TLP VAR allocation
      (no commit info)
[6/6] RDMA/mlx5: Add VAR object query method for cross-process sharing
      (no commit info)

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>