[PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices

Bobby Eshleman posted 8 patches 1 month ago
There is a newer version of this series
[PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
Posted by Bobby Eshleman 1 month ago
This series enables TCP devmem TX through netkit devices.

Netkit now supports queue leasing. A physical NIC's RX queue can be
leased to a netkit guest interface inside a container namespace. This
gives the container a devmem-capable data path on the RX side (bind-rx,
etc.). On the TX side, the container process binds to its netkit guest
interface and sends traffic that netkit redirects (via BPF or IP
forwarding) to the physical NIC for DMA.

Two things in the existing devmem TX path prevent this from working:

1. validate_xmit_unreadable_skb() requires dev->netmem_tx before it will
   forward a dmabuf-backed (unreadable) skb. This protects skbs from
   landing on devices that don't have the IOMMU mappings for the backing
   dmabuf or that don't speak netmem. Netkit does not DMA and never
   reads unreadable skb pages, so it does not break netmem (it is pure
   skb routing and redirection). It is functionally capable of routing
   unreadable skbs, but the TX validation path has no way to distinguish
   a device that will actually DMA the skb from a device (like netkit)
   that does not DMA but also does not break netmem.

2. bind_tx_doit uses the bound device as the DMA device.  When the user
   binds devmem TX to the netkit guest, the bind handler attempts to
   create DMA mappings against netkit, which has no DMA capability and
   no IOMMU mappings.

This series solves these problems as follows:

1. Extend netmem_tx to two bits, assigned to one of three values:

   NETMEM_TX_NONE   - netmem not supported
   NETMEM_TX_DMA    - netmem supported and performs DMA
   NETMEM_TX_NO_DMA - netmem supported, but does not DMA

   With these values, physical devices set NETMEM_TX_DMA and devices
   like netkit set NETMEM_TX_NO_DMA. The TX validation path ensures that
   any DMA-capable netdev exactly matches the bound device, guaranteeing
   that the bound dmabuf is mapped for that device. It also lets
   NETMEM_TX_NO_DMA devices pass, since these devices will not misuse
   netmem or trigger IOMMU faults. After redirection or routing, when
   the skb finally reaches a physical device's TX path, the
   NETMEM_TX_DMA check above is performed again to guarantee that the
   device has the appropriate binding/mappings.

2. On TX bind, the bind handler recognizes NETMEM_TX_NO_DMA devices,
   finds the physical TX device, and binds to that instead. For netkit,
   if it has already been leased a queue from a DMA-capable device, the
   bind is performed on that DMA-capable device and the dmabuf is
   mapped correctly.

---
Changes in v3:
- Fix validate_xmit_unreadable_skb() logic for non-devmem
  unreadable niovs (should not be dropped) (Sashiko)
- Simplify lock handling in bind_tx, no premature release (Jakub)
- Split NO_DMA changes into separate patch (Jakub)
- Fixed some pylint issues; one required an additional patch ("selftests:
  drv-net: make attr _nk_guest_ifname public") to rename a variable from
  private to public
- See per-patch changelist for more detailed changes
- Link to v2: https://lore.kernel.org/r/20260504-tcp-dm-netkit-v2-0-56d52ac72fd4@meta.com

Changes in v2:
- Squash driver conversion patches (2-5) into patch 1 (Jakub)
- Check netmem_tx mode in validate_xmit_unreadable_skb() before inspecting
  frags (Jakub)
- Lock bind_dev around netdev_queue_get_dma_dev() when bind_dev != netdev to
  fix lockdep (Sashiko)
- Move require_devmem() into individual test functions so KsftSkipEx
  propagates up to ksft_run() (Sashiko)
- Add nk_devmem.py to TEST_PROGS in Makefile (Sashiko)
- Link to v1:
  https://lore.kernel.org/all/20260428-tcp-dm-netkit-v1-0-719280eba4d2@meta.com/

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

---
Bobby Eshleman (8):
      net: convert netmem_tx flag to enum
      net: netkit: declare NETMEM_TX_NO_DMA mode
      net: devmem: support TX over NETMEM_TX_NO_DMA devices
      selftests: drv-net: ncdevmem: add -n flag to skip NIC configuration
      selftests: drv-net: make attr _nk_guest_ifname public
      selftests: drv-net: refactor devmem command builders into lib module
      selftests: drv-net: add primary_rx_redirect support to NetDrvContEnv
      selftests: drv-net: add netkit devmem tests

 .../networking/net_cachelines/net_device.rst       |   2 +-
 Documentation/networking/netmem.rst                |   8 +-
 .../translations/zh_CN/networking/netmem.rst       |   7 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c          |   2 +-
 drivers/net/ethernet/google/gve/gve_main.c         |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |   2 +-
 drivers/net/ethernet/meta/fbnic/fbnic_netdev.c     |   2 +-
 drivers/net/netkit.c                               |   1 +
 include/linux/netdevice.h                          |  11 +-
 net/core/dev.c                                     |   5 +-
 net/core/devmem.c                                  |   6 +-
 net/core/devmem.h                                  |   9 +-
 net/core/netdev-genl.c                             |  65 +++++-
 tools/testing/selftests/drivers/net/hw/Makefile    |   1 +
 tools/testing/selftests/drivers/net/hw/devmem.py   |  77 ++------
 .../selftests/drivers/net/hw/lib/py/devmem.py      | 218 +++++++++++++++++++++
 tools/testing/selftests/drivers/net/hw/ncdevmem.c  |  58 +++---
 .../testing/selftests/drivers/net/hw/nk_devmem.py  |  55 ++++++
 .../drivers/net/hw/nk_primary_rx_redirect.bpf.c    |  39 ++++
 .../testing/selftests/drivers/net/hw/nk_qlease.py  |   8 +-
 tools/testing/selftests/drivers/net/lib/py/env.py  | 109 ++++++++---
 21 files changed, 549 insertions(+), 138 deletions(-)
---
base-commit: 790ead9394860e7d70c5e0e50a35b243e909a618
change-id: 20260423-tcp-dm-netkit-2bd78b638d30

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>
Re: [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
Posted by Zhu Yanjun 1 month ago
On 2026/5/7 19:27, Bobby Eshleman wrote:
> Bobby Eshleman (8):
>        net: convert netmem_tx flag to enum
>        net: netkit: declare NETMEM_TX_NO_DMA mode
>        net: devmem: support TX over NETMEM_TX_NO_DMA devices

I applied this patchset to my local kernel tree, built a new kernel
image, and loaded it in my test environment. All the test cases appear
to pass.

I don't think this patchset causes any regressions in my test
environment.

Zhu Yanjun

Re: [PATCH net-next v3 0/8] net: devmem: support devmem with netkit devices
Posted by Bobby Eshleman 1 month ago
On Sun, May 10, 2026 at 01:33:18PM -0700, Zhu Yanjun wrote:
> I applied this patchset to my local kernel tree, built a new kernel
> image, and loaded it in my test environment. All the test cases appear
> to pass.
> 
> I don't think this patchset causes any regressions in my test
> environment.
> 
> Zhu Yanjun

Thanks for testing!

Best,
Bobby