[PATCH net-next 00/11] net: devmem: support devmem with netkit devices

Bobby Eshleman posted 11 patches 1 month, 2 weeks ago
There is a newer version of this series
Posted by Bobby Eshleman 1 month, 2 weeks ago
This series enables TCP devmem TX through netkit devices.

Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc.). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or IP
forwarding) to the physical NIC for DMA.

Two things in the existing devmem TX path prevent this from working:

1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
   forward a dmabuf-backed (unreadable) skb. This protects skbs from
   landing on devices that don't have the IOMMU mappings for the backing
   dmabuf or that don't speak netmem. Netkit, however, neither performs
   DMA nor attempts to read unreadable skb pages, so it does not break
   netmem (it is pure skb routing and redirection). It is functionally
   capable of routing unreadable skbs, but the TX validation path has no
   way to distinguish a device that will actually attempt to DMA the skb
   from a device (like netkit) that does not DMA but also does not break
   netmem.

2. bind_tx_doit uses the bound device as the DMA device.  When the user
   binds devmem TX to the netkit guest, the bind handler attempts to
   create DMA mappings against netkit, which has no DMA capability and
   no IOMMU mappings.

This series solves these problems as follows:

1. Extend netmem_tx to two bits, assigned to one of three values:

   NETMEM_TX_NONE   - netmem not supported
   NETMEM_TX_DMA    - netmem supported and performs DMA
   NETMEM_TX_NO_DMA - netmem supported, but does not DMA

   With these bits, phys devices can set NETMEM_TX_DMA and devices like
   netkit set NETMEM_TX_NO_DMA. The validation TX path ensures that any
   DMA-capable netdev exactly matches the bound device, guaranteeing the
   correct mapping of the bound dmabuf. The validation TX path also
   allows devices with NETMEM_TX_NO_DMA to pass, knowing these devices
   will not misuse netmem or run into IOMMU faults. After redirection or
   routing, when the skb finally makes its way through the stack to a
   physical device's TX path, the above NETMEM_TX_DMA check is performed
   again to guarantee the device has the appropriate binding/mappings.

2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices and
   finds the phys TX device and binds to that instead. For the netkit
   case, if it has been leased a queue from a DMA-capable device
   already, then the bind action is performed on the DMA-capable device
   instead and the dmabuf is mapped correctly.
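
The two-part fix above can be modelled in a few lines of userspace C.
This is only a sketch, not the kernel code from the series: the enum
names come from this cover letter, while the numeric values,
struct model_dev, its lease_src field, and both helper functions are
hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mode names are from the cover letter; the numeric values are assumed. */
enum netmem_tx_mode {
	NETMEM_TX_NONE   = 0,	/* netmem not supported */
	NETMEM_TX_DMA    = 1,	/* netmem supported and performs DMA */
	NETMEM_TX_NO_DMA = 2,	/* netmem supported, but does not DMA */
};

/* Hypothetical stand-in for the relevant parts of a net_device. */
struct model_dev {
	const char *name;
	unsigned int netmem_tx : 2;	/* two bits hold the three modes */
	struct model_dev *lease_src;	/* assumed: phys device that leased a queue, or NULL */
};

/*
 * Model of TX validation for an unreadable skb whose dmabuf is bound to
 * 'bound'. A DMA-capable device must be the bound device; a NO_DMA
 * device only passes the skb along; anything else rejects it.
 */
static bool model_xmit_allowed(const struct model_dev *dev,
			       const struct model_dev *bound)
{
	switch (dev->netmem_tx) {
	case NETMEM_TX_DMA:
		return dev == bound;
	case NETMEM_TX_NO_DMA:
		return true;
	default:
		return false;
	}
}

/*
 * Model of picking the device to create DMA mappings against on TX
 * bind. Returns NULL when no DMA-capable device can be found.
 */
static struct model_dev *model_resolve_bind_dev(struct model_dev *dev)
{
	if (dev->netmem_tx == NETMEM_TX_DMA)
		return dev;
	if (dev->netmem_tx == NETMEM_TX_NO_DMA)
		return dev->lease_src;
	return NULL;
}
```

In this model a netkit guest leased from NIC A passes validation for any
binding and resolves its bind to A, while an unrelated DMA-capable NIC B
rejects an skb whose dmabuf is bound to A.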

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Bobby Eshleman (11):
      net: add netmem_tx modes that indicate dma capability
      net: bnxt: convert netmem_tx from bool to NETMEM_TX_DMA enum
      gve: convert netmem_tx from bool to NETMEM_TX_DMA enum
      net/mlx5e: convert netmem_tx from bool to NETMEM_TX_DMA enum
      eth: fbnic: convert netmem_tx from bool to NETMEM_TX_DMA enum
      netkit: set NETMEM_TX_NO_DMA for unreadable skb passthrough
      net: devmem: support TX over NETMEM_TX_NO_DMA devices
      selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
      selftests: drv-net: refactor devmem command builders into lib module
      selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
      selftests: drv-net: add netkit devmem tests

 .../networking/net_cachelines/net_device.rst       |   2 +-
 Documentation/networking/netmem.rst                |   8 +-
 .../translations/zh_CN/networking/netmem.rst       |   7 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   2 +-
 drivers/net/ethernet/google/gve/gve_main.c         |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   2 +-
 drivers/net/netkit.c                               |   1 +
 include/linux/netdevice.h                          |  11 +-
 net/core/dev.c                                     |  24 ++-
 net/core/devmem.c                                  |   6 +-
 net/core/devmem.h                                  |   9 +-
 net/core/netdev-genl.c                             |  53 ++++-
 tools/testing/selftests/drivers/net/hw/devmem.py   |  73 +------
 .../selftests/drivers/net/hw/lib/py/devmem.py      | 215 +++++++++++++++++++++
 tools/testing/selftests/drivers/net/hw/ncdevmem.c  |  58 +++---
 .../testing/selftests/drivers/net/hw/nk_devmem.py  |  40 ++++
 .../drivers/net/hw/nk_primary_rx_redirect.bpf.c    |  41 ++++
 tools/testing/selftests/drivers/net/lib/py/env.py  |  67 +++++--
 19 files changed, 498 insertions(+), 125 deletions(-)
---
base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
change-id: 20260423-tcp-dm-netkit-2bd78b638d30

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>
Re: [PATCH net-next 00/11] net: devmem: support devmem with netkit devices
Posted by Jakub Kicinski 1 month, 2 weeks ago
On Tue, 28 Apr 2026 15:41:57 -0700 Bobby Eshleman wrote:
>       net: add netmem_tx modes that indicate dma capability
>       net: bnxt: convert netmem_tx from bool to NETMEM_TX_DMA enum
>       gve: convert netmem_tx from bool to NETMEM_TX_DMA enum
>       net/mlx5e: convert netmem_tx from bool to NETMEM_TX_DMA enum
>       eth: fbnic: convert netmem_tx from bool to NETMEM_TX_DMA enum
>       netkit: set NETMEM_TX_NO_DMA for unreadable skb passthrough
>       net: devmem: support TX over NETMEM_TX_NO_DMA devices

I think it looks reasonable overall, but the assumption that rx lease
implies tx queue does not seem great. Sounds like Daniel has that part
covered tho :)

When you post v2 - you can squash the driver patches into patch 1.
Re: [PATCH net-next 00/11] net: devmem: support devmem with netkit devices
Posted by Bobby Eshleman 1 month, 2 weeks ago
On Thu, Apr 30, 2026 at 05:59:45PM -0700, Jakub Kicinski wrote:
> On Tue, 28 Apr 2026 15:41:57 -0700 Bobby Eshleman wrote:
> >       net: add netmem_tx modes that indicate dma capability
> >       net: bnxt: convert netmem_tx from bool to NETMEM_TX_DMA enum
> >       gve: convert netmem_tx from bool to NETMEM_TX_DMA enum
> >       net/mlx5e: convert netmem_tx from bool to NETMEM_TX_DMA enum
> >       eth: fbnic: convert netmem_tx from bool to NETMEM_TX_DMA enum
> >       netkit: set NETMEM_TX_NO_DMA for unreadable skb passthrough
> >       net: devmem: support TX over NETMEM_TX_NO_DMA devices
> 
> I think it looks reasonable overall, but the assumption that rx lease
> implies tx queue does not seem great. Sounds like Daniel has that part
> covered tho :)

Indeed, with TX leasing this becomes much nicer.

> 
> When you post v2 - you can squash the driver patches into patch 1.

Will do!

Best,
Bobby
Re: [PATCH net-next 00/11] net: devmem: support devmem with netkit devices
Posted by Daniel Borkmann 1 month, 2 weeks ago
Hi Bobby,

On 4/29/26 12:41 AM, Bobby Eshleman wrote:
> This series enables TCP devmem TX through netkit devices.
> 
> Netkit now supports queue leasing. A physical NIC's RX queue can be
> leased to a netkit guest interface inside a container namespace. This
> gives the container a devmem-capable data path on the RX side (bind-rx,
> etc...). On the TX side, the container process binds to its netkit guest
> interface and sends traffic that netkit redirects (via BPF or ip
> forwarding) to the physical NIC for DMA.
[...]
Thanks for working on this, after the RX queue leasing got merged, I've
been looking into the same actually. :)

I think the NETMEM_TX_* enum approach seems reasonable.

What I have a PoC on is to build out TX queue leasing as first-class
symmetric infrastructure to complement the RX queue leasing - basically
I implemented an equivalent to the latter in netdev_nl_queue_create_doit
et al, so you can have independent RX and TX leases and per-queue
accountability, such that ynl queue-get op shows the full picture, and
lastly we could also enable AF_XDP TX-only support through this infra.

Would you be open to collab on integrating both and migrating the devmem
code to work off a TX queue object? Next week is LSF/MM/BPF, are you
there by any chance to catch up in person?

Thanks a lot,
Daniel
Re: [PATCH net-next 00/11] net: devmem: support devmem with netkit devices
Posted by Bobby Eshleman 1 month, 2 weeks ago
On Wed, Apr 29, 2026 at 02:08:31PM +0200, Daniel Borkmann wrote:
> Hi Bobby,
> 
> On 4/29/26 12:41 AM, Bobby Eshleman wrote:
> > This series enables TCP devmem TX through netkit devices.
> > 
> > Netkit now supports queue leasing. A physical NIC's RX queue can be
> > leased to a netkit guest interface inside a container namespace. This
> > gives the container a devmem-capable data path on the RX side (bind-rx,
> > etc...). On the TX side, the container process binds to its netkit guest
> > interface and sends traffic that netkit redirects (via BPF or ip
> > forwarding) to the physical NIC for DMA.
> [...]
> Thanks for working on this, after the RX queue leasing got merged, I've
> been looking into the same actually. :)
> 
> I think the NETMEM_TX_* enum approach seems reasonable.
> 
> What I have a PoC on is to build out TX queue leasing as first-class
> symmetric infrastructure to complement the RX queue leasing - basically
> I implemented an equivalent to the latter in netdev_nl_queue_create_doit
> et al, so you can have independent RX and TX leases and per-queue
> accountability, such that ynl queue-get op shows the full picture, and
> lastly we could also enable AF_XDP TX-only support through this infra.
> 
> Would you be open to collab on integrating both and migrating the devmem
> code to work off a TX queue object? Next week is LSF/MM/BPF, are you
> there by any chance to catch up in person?
> 

Hey Daniel,

Definitely am open to that. I will unfortunately not be at LSF/MM/BPF,
but maybe we can schedule a meeting offline to sync up?

On the approach, explicit TX queue leasing sounds like a better way to
permit devmem's tx binding than implicitly via RX lease.

Best,
Bobby
Re: [PATCH net-next 00/11] net: devmem: support devmem with netkit devices
Posted by Daniel Borkmann 1 month, 2 weeks ago
On 4/29/26 5:18 PM, Bobby Eshleman wrote:
> On Wed, Apr 29, 2026 at 02:08:31PM +0200, Daniel Borkmann wrote:
>> On 4/29/26 12:41 AM, Bobby Eshleman wrote:
>>> This series enables TCP devmem TX through netkit devices.
>>>
>>> Netkit now supports queue leasing. A physical NIC's RX queue can be
>>> leased to a netkit guest interface inside a container namespace. This
>>> gives the container a devmem-capable data path on the RX side (bind-rx,
>>> etc...). On the TX side, the container process binds to its netkit guest
>>> interface and sends traffic that netkit redirects (via BPF or ip
>>> forwarding) to the physical NIC for DMA.
>> [...]
>> Thanks for working on this, after the RX queue leasing got merged, I've
>> been looking into the same actually. :)
>>
>> I think the NETMEM_TX_* enum approach seems reasonable.
>>
>> What I have a PoC on is to build out TX queue leasing as first-class
>> symmetric infrastructure to complement the RX queue leasing - basically
>> I implemented an equivalent to the latter in netdev_nl_queue_create_doit
>> et al, so you can have independent RX and TX leases and per-queue
>> accountability, such that ynl queue-get op shows the full picture, and
>> lastly we could also enable AF_XDP TX-only support through this infra.
>>
>> Would you be open to collab on integrating both and migrating the devmem
>> code to work off a TX queue object? Next week is LSF/MM/BPF, are you
>> there by any chance to catch up in person?
> 
> Definitely am open to that. I will unfortunately not be at LSF/MM/BPF,
> but maybe we can schedule a meeting offline to sync up?

Ack, absolutely, will reach out offline to find sth after LSF/MM/BPF week.
If anyone else wants to join, just DM me.

Cheers,
Daniel