[PATCH net-next v8 0/5] net: devmem: improve cpu cost of RX token management

Bobby Eshleman posted 5 patches 1 month ago
There is a newer version of this series
Documentation/netlink/specs/netdev.yaml           |  12 +++
Documentation/networking/devmem.rst               |  70 +++++++++++++
include/net/netmem.h                              |   1 +
include/net/sock.h                                |   7 +-
include/uapi/linux/netdev.h                       |   1 +
net/core/devmem.c                                 | 114 ++++++++++++++++++----
net/core/devmem.h                                 |  13 ++-
net/core/netdev-genl-gen.c                        |   5 +-
net/core/netdev-genl.c                            |  10 +-
net/core/sock.c                                   | 103 ++++++++++++++-----
net/ipv4/tcp.c                                    |  76 ++++++++++++---
net/ipv4/tcp_ipv4.c                               |  11 ++-
net/ipv4/tcp_minisocks.c                          |   3 +-
tools/include/uapi/linux/netdev.h                 |   1 +
tools/testing/selftests/drivers/net/hw/devmem.py  |  21 +++-
tools/testing/selftests/drivers/net/hw/ncdevmem.c |  19 ++--
16 files changed, 389 insertions(+), 78 deletions(-)
[PATCH net-next v8 0/5] net: devmem: improve cpu cost of RX token management
Posted by Bobby Eshleman 1 month ago
This series improves the CPU cost of RX token management by adding an
attribute to NETDEV_CMD_BIND_RX that configures sockets using the
binding to avoid the xarray allocator and instead use a per-binding niov
array and a uref field in niov.

Improvement is ~13% cpu util per RX user thread.

Using kperf, the following results were observed:

Before:
	Average RX worker idle %: 13.13, flows 4, test runs 11
After:
	Average RX worker idle %: 26.32, flows 4, test runs 11

Two other approaches were tested, but with no improvement. Namely, 1)
using a hashmap for tokens and 2) keeping an xarray of atomic counters
but using RCU so that the hotpath could be mostly lockless. Neither of
these approaches proved better than the simple array in terms of CPU.

The attribute NETDEV_A_DMABUF_AUTORELEASE is added to toggle the
optimization. It is an optional attribute and defaults to 0 (i.e.,
optimization on).

To: David S. Miller <davem@davemloft.net>
To: Eric Dumazet <edumazet@google.com>
To: Jakub Kicinski <kuba@kernel.org>
To: Paolo Abeni <pabeni@redhat.com>
To: Simon Horman <horms@kernel.org>
To: Kuniyuki Iwashima <kuniyu@google.com>
To: Willem de Bruijn <willemb@google.com>
To: Neal Cardwell <ncardwell@google.com>
To: David Ahern <dsahern@kernel.org>
To: Mina Almasry <almasrymina@google.com>
To: Arnd Bergmann <arnd@arndb.de>
To: Jonathan Corbet <corbet@lwn.net>
To: Andrew Lunn <andrew+netdev@lunn.ch>
To: Shuah Khan <shuah@kernel.org>
Cc: Stanislav Fomichev <sdf@fomichev.me>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: asml.silence@gmail.com
Cc: matttbe@kernel.org
Cc: skhawaja@google.com

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>

Changes in v8:
- change static branch logic (only set when enabled, otherwise just
  always revert back to disabled)
- fix missing tests
- Link to v7: https://lore.kernel.org/r/20251119-scratch-bobbyeshleman-devmem-tcp-token-upstream-v7-0-1abc8467354c@meta.com

Changes in v7:
- use netlink instead of sockopt (Stan)
- restrict system to only one mode, dmabuf bindings can not co-exist
  with different modes (Stan)
- use static branching to enforce single system-wide mode (Stan)
- Link to v6: https://lore.kernel.org/r/20251104-scratch-bobbyeshleman-devmem-tcp-token-upstream-v6-0-ea98cf4d40b3@meta.com

Changes in v6:
- renamed 'net: devmem: use niov array for token management' to refer to
  optionality of new config
- added documentation and tests
- make autorelease flag per-socket sockopt instead of binding
  field / sysctl
- many per-patch changes (see Changes sections per-patch)
- Link to v5: https://lore.kernel.org/r/20251023-scratch-bobbyeshleman-devmem-tcp-token-upstream-v5-0-47cb85f5259e@meta.com

Changes in v5:
- add sysctl to opt-out of performance benefit, back to old token release
- Link to v4: https://lore.kernel.org/all/20250926-scratch-bobbyeshleman-devmem-tcp-token-upstream-v4-0-39156563c3ea@meta.com

Changes in v4:
- rebase to net-next
- Link to v3: https://lore.kernel.org/r/20250926-scratch-bobbyeshleman-devmem-tcp-token-upstream-v3-0-084b46bda88f@meta.com

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
  footprint
- fallback to cleaning up references in dmabuf unbind if socket
  leaked tokens
- drop ethtool patch
- Link to v2: https://lore.kernel.org/r/20250911-scratch-bobbyeshleman-devmem-tcp-token-upstream-v2-0-c80d735bd453@meta.com

Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule
  (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-upstream-v1-0-d946169b5550@meta.com

---
Bobby Eshleman (5):
      net: devmem: rename tx_vec to vec in dmabuf binding
      net: devmem: refactor sock_devmem_dontneed for autorelease split
      net: devmem: implement autorelease token management
      net: devmem: document NETDEV_A_DMABUF_AUTORELEASE netlink attribute
      selftests: drv-net: devmem: add autorelease test

 Documentation/netlink/specs/netdev.yaml           |  12 +++
 Documentation/networking/devmem.rst               |  70 +++++++++++++
 include/net/netmem.h                              |   1 +
 include/net/sock.h                                |   7 +-
 include/uapi/linux/netdev.h                       |   1 +
 net/core/devmem.c                                 | 114 ++++++++++++++++++----
 net/core/devmem.h                                 |  13 ++-
 net/core/netdev-genl-gen.c                        |   5 +-
 net/core/netdev-genl.c                            |  10 +-
 net/core/sock.c                                   | 103 ++++++++++++++-----
 net/ipv4/tcp.c                                    |  76 ++++++++++++---
 net/ipv4/tcp_ipv4.c                               |  11 ++-
 net/ipv4/tcp_minisocks.c                          |   3 +-
 tools/include/uapi/linux/netdev.h                 |   1 +
 tools/testing/selftests/drivers/net/hw/devmem.py  |  21 +++-
 tools/testing/selftests/drivers/net/hw/ncdevmem.c |  19 ++--
 16 files changed, 389 insertions(+), 78 deletions(-)
---
base-commit: 627f8a2588139ec699cda5d548c6d4733d2682ca
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>
Re: [PATCH net-next v8 0/5] net: devmem: improve cpu cost of RX token management
Posted by Jakub Kicinski 1 month ago
On Wed, 07 Jan 2026 16:57:34 -0800 Bobby Eshleman wrote:
> This series improves the CPU cost of RX token management by adding an
> attribute to NETDEV_CMD_BIND_RX that configures sockets using the
> binding to avoid the xarray allocator and instead use a per-binding niov
> array and a uref field in niov.

net/ipv4/tcp.c:2600:41: error: implicit declaration of function ‘net_devmem_dmabuf_binding_get’; did you mean ‘net_devmem_dmabuf_binding_put’? [-Wimplicit-function-declaration]
 2600 |                                         net_devmem_dmabuf_binding_get(binding);
      |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                         net_devmem_dmabuf_binding_put
-- 
pw-bot: cr
Re: [PATCH net-next v8 0/5] net: devmem: improve cpu cost of RX token management
Posted by Bobby Eshleman 1 month ago
On Wed, Jan 07, 2026 at 07:30:13PM -0800, Jakub Kicinski wrote:
> On Wed, 07 Jan 2026 16:57:34 -0800 Bobby Eshleman wrote:
> > This series improves the CPU cost of RX token management by adding an
> > attribute to NETDEV_CMD_BIND_RX that configures sockets using the
> > binding to avoid the xarray allocator and instead use a per-binding niov
> > array and a uref field in niov.
> 
> net/ipv4/tcp.c:2600:41: error: implicit declaration of function ‘net_devmem_dmabuf_binding_get’; did you mean ‘net_devmem_dmabuf_binding_put’? [-Wimplicit-function-declaration]
>  2600 |                                         net_devmem_dmabuf_binding_get(binding);
>       |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>       |                                         net_devmem_dmabuf_binding_put
> -- 
> pw-bot: cr

I see that net_devmem_dmabuf_binding_get() is lacking a
stub for CONFIG_NET_DEVMEM=n ...

Just curious how pw works... is this a randconfig catch? I ask because
all of the build targets pass for this series (build_allmodconfig_warn,
build_clang, etc.. locally and on patchwork.kernel.org), and if there is
a config that pw uses that I'm missing in my local checks I'd like to
add it.

Best,
Bobby
Re: [PATCH net-next v8 0/5] net: devmem: improve cpu cost of RX token management
Posted by Jakub Kicinski 1 month ago
On Wed, 7 Jan 2026 20:37:32 -0800 Bobby Eshleman wrote:
> On Wed, Jan 07, 2026 at 07:30:13PM -0800, Jakub Kicinski wrote:
> > On Wed, 07 Jan 2026 16:57:34 -0800 Bobby Eshleman wrote:  
> > > This series improves the CPU cost of RX token management by adding an
> > > attribute to NETDEV_CMD_BIND_RX that configures sockets using the
> > > binding to avoid the xarray allocator and instead use a per-binding niov
> > > array and a uref field in niov.  
> > 
> > net/ipv4/tcp.c:2600:41: error: implicit declaration of function ‘net_devmem_dmabuf_binding_get’; did you mean ‘net_devmem_dmabuf_binding_put’? [-Wimplicit-function-declaration]
> >  2600 |                                         net_devmem_dmabuf_binding_get(binding);
> >       |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >       |                                         net_devmem_dmabuf_binding_put
> 
> I see that net_devmem_dmabuf_binding_get() is lacking a
> stub for CONFIG_NET_DEVMEM=n ...
> 
> Just curious how pw works... is this a randconfig catch? I ask because
> all of the build targets pass for this series (build_allmodconfig_warn,
> build_clang, etc.. locally and on patchwork.kernel.org), and if there is
> a config that pw uses that I'm missing in my local checks I'd like to
> add it.

kunit hit it on our end
Re: [PATCH net-next v8 0/5] net: devmem: improve cpu cost of RX token management
Posted by Bobby Eshleman 1 month ago
On Thu, Jan 08, 2026 at 06:42:00AM -0800, Jakub Kicinski wrote:
> On Wed, 7 Jan 2026 20:37:32 -0800 Bobby Eshleman wrote:
> > On Wed, Jan 07, 2026 at 07:30:13PM -0800, Jakub Kicinski wrote:
> > > On Wed, 07 Jan 2026 16:57:34 -0800 Bobby Eshleman wrote:  
> > > > This series improves the CPU cost of RX token management by adding an
> > > > attribute to NETDEV_CMD_BIND_RX that configures sockets using the
> > > > binding to avoid the xarray allocator and instead use a per-binding niov
> > > > array and a uref field in niov.  
> > > 
> > > net/ipv4/tcp.c:2600:41: error: implicit declaration of function ‘net_devmem_dmabuf_binding_get’; did you mean ‘net_devmem_dmabuf_binding_put’? [-Wimplicit-function-declaration]
> > >  2600 |                                         net_devmem_dmabuf_binding_get(binding);
> > >       |                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >       |                                         net_devmem_dmabuf_binding_put
> > 
> > I see that net_devmem_dmabuf_binding_get() is lacking a
> > stub for CONFIG_NET_DEVMEM=n ...
> > 
> > Just curious how pw works... is this a randconfig catch? I ask because
> > all of the build targets pass for this series (build_allmodconfig_warn,
> > build_clang, etc.. locally and on patchwork.kernel.org), and if there is
> > a config that pw uses that I'm missing in my local checks I'd like to
> > add it.
> 
> kunit hit it on our end

Got it, thank you