[PATCH net-next v4 0/2] net: devmem: improve cpu cost of RX token management

Bobby Eshleman posted 2 patches 5 days, 5 hours ago
Posted by Bobby Eshleman 5 days, 5 hours ago
This series reduces the CPU cost of RX token management by replacing
the xarray allocator with an niov array and a uref field in niov.
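
As a rough illustration of the idea (a simplified userspace model, not
the kernel code in this series), the sketch below assumes a token is
just an index into a per-binding array and that each entry carries an
atomic user-reference count. All struct and function names here are
hypothetical. The sweep helper models the unbind-time fallback listed
under "Changes in v3".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a net_iov: only the user-reference count. */
struct model_niov {
	atomic_uint uref;
};

/* Hypothetical stand-in for a dmabuf binding with a pre-assigned array. */
struct model_binding {
	struct model_niov *vec;
	size_t nr_niovs;
};

static int model_binding_init(struct model_binding *b, size_t n)
{
	b->vec = malloc(n * sizeof(*b->vec));
	if (!b->vec)
		return -1;
	for (size_t i = 0; i < n; i++)
		atomic_init(&b->vec[i].uref, 0);
	b->nr_niovs = n;
	return 0;
}

/* RX path: hand entry idx to userspace; the token is the array index. */
static uint32_t model_token_get(struct model_binding *b, size_t idx)
{
	assert(idx < b->nr_niovs);
	atomic_fetch_add(&b->vec[idx].uref, 1);
	return (uint32_t)idx;
}

/*
 * DONTNEED path: a bounds check replaces the xarray lookup, then one
 * reference is dropped. Returns 1 if a reference was released, 0 if the
 * token was out of range or had no outstanding reference.
 */
static int model_token_put(struct model_binding *b, uint32_t token)
{
	unsigned int old;

	if (token >= b->nr_niovs)
		return 0;

	old = atomic_load(&b->vec[token].uref);
	do {
		if (old == 0)
			return 0;
	} while (!atomic_compare_exchange_weak(&b->vec[token].uref,
					       &old, old - 1));
	return 1;
}

/* Unbind fallback: drop references userspace never returned. */
static size_t model_binding_sweep(struct model_binding *b)
{
	size_t leaked = 0;

	for (size_t i = 0; i < b->nr_niovs; i++)
		leaked += atomic_exchange(&b->vec[i].uref, 0);
	return leaked;
}
```

In this model the lookup is a single bounds check and array access, so
no allocation or tree walk happens on the hot path.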

The improvement is ~5% in CPU cost per RX user thread.

Two other approaches were tested but showed no improvement: 1) using a
hashmap for tokens, and 2) keeping an xarray of atomic counters but
using RCU so that the hot path could be mostly lockless. Neither
approach proved better than the simple array in terms of CPU cost.

Running with an NCCL workload is still TODO, but I will follow up on this
thread with those results when done.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v4:
- rebase to net-next
- Link to v3: https://lore.kernel.org/r/20250926-scratch-bobbyeshleman-devmem-tcp-token-upstream-v3-0-084b46bda88f@meta.com

Changes in v3:
- make urefs per-binding instead of per-socket, reducing memory
  footprint
- fall back to cleaning up references in dmabuf unbind if the socket
  leaked tokens
- drop ethtool patch
- Link to v2: https://lore.kernel.org/r/20250911-scratch-bobbyeshleman-devmem-tcp-token-upstream-v2-0-c80d735bd453@meta.com

Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule
  (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-upstream-v1-0-d946169b5550@meta.com

---
Bobby Eshleman (2):
      net: devmem: rename tx_vec to vec in dmabuf binding
      net: devmem: use niov array for token management

 include/net/netmem.h     |  1 +
 include/net/sock.h       |  4 +--
 net/core/devmem.c        | 46 +++++++++++++++---------
 net/core/devmem.h        |  4 +--
 net/core/sock.c          | 34 ++++++++++++------
 net/ipv4/tcp.c           | 94 +++++++++++-------------------------------------
 net/ipv4/tcp_ipv4.c      | 18 ++--------
 net/ipv4/tcp_minisocks.c |  2 +-
 8 files changed, 82 insertions(+), 121 deletions(-)
---
base-commit: 203e3beb73e53584ca90bc2a6d8240b9b12b9bcf
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>
[syzbot ci] Re: net: devmem: improve cpu cost of RX token management
Posted by syzbot ci 4 days, 15 hours ago
syzbot ci has tested the following series

[v4] net: devmem: improve cpu cost of RX token management
https://lore.kernel.org/all/20250926-scratch-bobbyeshleman-devmem-tcp-token-upstream-v4-0-39156563c3ea@meta.com
* [PATCH net-next v4 1/2] net: devmem: rename tx_vec to vec in dmabuf binding
* [PATCH net-next v4 2/2] net: devmem: use niov array for token management

and found the following issue:
general protection fault in sock_devmem_dontneed

Full report is available here:
https://ci.syzbot.org/series/b8209bd4-e9f0-4c54-bad3-613e8431151b

***

general protection fault in sock_devmem_dontneed

tree:      net-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base:      dc1dea796b197aba2c3cae25bfef45f4b3ad46fe
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/b4d90fd9-9fbe-4e17-8fc0-3d6603df09da/config
C repro:   https://ci.syzbot.org/findings/ce81b3c3-3db8-4643-9731-cbe331c65fdb/c_repro
syz repro: https://ci.syzbot.org/findings/ce81b3c3-3db8-4643-9731-cbe331c65fdb/syz_repro
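
For orientation, the user-space registers in the trace below
(ORIG_RAX=0x36, RDI=3, RSI=1, RDX=0x50, R8=0x10) are consistent with a
setsockopt(3, SOL_SOCKET, SO_DEVMEM_DONTNEED, ptr, 16) call on x86-64,
i.e. two 8-byte token ranges. The sketch below shows the shape of such
a call. It is not the syzbot reproducer: the helper name and the local
struct are hypothetical, and the struct only mirrors the two 32-bit
fields of the uapi struct dmabuf_token.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80	/* asm-generic value; matches RDX=0x50 */
#endif

/* Local mirror of the uapi struct dmabuf_token (hypothetical name). */
struct token_range {
	uint32_t token_start;
	uint32_t token_count;
};

/*
 * Return nr token ranges to the kernel. Returns whatever setsockopt()
 * returns; on failure errno is set by the kernel or libc.
 */
static int devmem_dontneed(int fd, const struct token_range *ranges,
			   unsigned int nr)
{
	assert(nr > 0);
	return setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, ranges,
			  (socklen_t)(nr * sizeof(*ranges)));
}
```

Passing two ranges gives an optlen of 0x10, matching R8 in the trace.
The token values used by the actual reproducer are not shown in this
report.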

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 5996 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:sock_devmem_dontneed+0x372/0x920 net/core/sock.c:1113
Code: 48 44 8b 20 44 89 74 24 54 45 01 f4 48 8b 44 24 60 42 80 3c 28 00 74 08 48 89 df e8 e8 5a c9 f8 4c 8b 33 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 f7 e8 cf 5a c9 f8 4d 8b 3e 4c 89 f8 48
RSP: 0018:ffffc90002a1fac0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810a8ab710 RCX: 1ffff11023002f45
RDX: ffff88801b339cc0 RSI: 0000000000002000 RDI: 0000000000000000
RBP: ffffc90002a1fc50 R08: ffffc90002a1fbdf R09: 0000000000000000
R10: ffffc90002a1fb60 R11: fffff52000543f7c R12: 0000000000f07000
R13: dffffc0000000000 R14: 0000000000000000 R15: ffff88810a8ab710
FS:  000055556f85a500(0000) GS:ffff8881a3c3d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000a2000 CR3: 0000000024516000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 sk_setsockopt+0x682/0x2dc0 net/core/sock.c:1304
 do_sock_setsockopt+0x11b/0x1b0 net/socket.c:2340
 __sys_setsockopt net/socket.c:2369 [inline]
 __do_sys_setsockopt net/socket.c:2375 [inline]
 __se_sys_setsockopt net/socket.c:2372 [inline]
 __x64_sys_setsockopt+0x13f/0x1b0 net/socket.c:2372
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fea0438ec29
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffd04a8f368 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007fea045d5fa0 RCX: 00007fea0438ec29
RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000003
RBP: 00007fea04411e41 R08: 0000000000000010 R09: 0000000000000000
R10: 00002000000a2000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fea045d5fa0 R14: 00007fea045d5fa0 R15: 0000000000000005
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:sock_devmem_dontneed+0x372/0x920 net/core/sock.c:1113
Code: 48 44 8b 20 44 89 74 24 54 45 01 f4 48 8b 44 24 60 42 80 3c 28 00 74 08 48 89 df e8 e8 5a c9 f8 4c 8b 33 4c 89 f0 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 f7 e8 cf 5a c9 f8 4d 8b 3e 4c 89 f8 48
RSP: 0018:ffffc90002a1fac0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88810a8ab710 RCX: 1ffff11023002f45
RDX: ffff88801b339cc0 RSI: 0000000000002000 RDI: 0000000000000000
RBP: ffffc90002a1fc50 R08: ffffc90002a1fbdf R09: 0000000000000000
R10: ffffc90002a1fb60 R11: fffff52000543f7c R12: 0000000000f07000
R13: dffffc0000000000 R14: 0000000000000000 R15: ffff88810a8ab710
FS:  000055556f85a500(0000) GS:ffff8881a3c3d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00002000000a2000 CR3: 0000000024516000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	48                   	rex.W
   1:	44 8b 20             	mov    (%rax),%r12d
   4:	44 89 74 24 54       	mov    %r14d,0x54(%rsp)
   9:	45 01 f4             	add    %r14d,%r12d
   c:	48 8b 44 24 60       	mov    0x60(%rsp),%rax
  11:	42 80 3c 28 00       	cmpb   $0x0,(%rax,%r13,1)
  16:	74 08                	je     0x20
  18:	48 89 df             	mov    %rbx,%rdi
  1b:	e8 e8 5a c9 f8       	call   0xf8c95b08
  20:	4c 8b 33             	mov    (%rbx),%r14
  23:	4c 89 f0             	mov    %r14,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	42 80 3c 28 00       	cmpb   $0x0,(%rax,%r13,1) <-- trapping instruction
  2f:	74 08                	je     0x39
  31:	4c 89 f7             	mov    %r14,%rdi
  34:	e8 cf 5a c9 f8       	call   0xf8c95b08
  39:	4d 8b 3e             	mov    (%r14),%r15
  3c:	4c 89 f8             	mov    %r15,%rax
  3f:	48                   	rex.W
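
The trapping instruction above appears to be a compiler-inserted KASAN
shadow check rather than the faulting load itself: R14 (0, loaded from
(%rbx)) is shifted right by 3 and added to R13 (0xdffffc0000000000)
before the byte compare. The sketch below reproduces that arithmetic,
assuming the usual x86-64 generic KASAN mapping of one shadow byte per
8 bytes of memory and 4-level paging. The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL	/* value seen in R13 */

/* Shadow byte address that KASAN checks before touching addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* First memory address covered by a given shadow byte. */
static uint64_t kasan_shadow_to_mem(uint64_t shadow)
{
	assert(shadow >= KASAN_SHADOW_OFFSET);
	return (shadow - KASAN_SHADOW_OFFSET) << KASAN_SHADOW_SCALE_SHIFT;
}

/* With 48-bit virtual addresses, bits 63..47 must all be equal. */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

For a NULL pointer the shadow address is 0xdffffc0000000000, which is
non-canonical, so the check itself raises the general protection fault.
One shadow byte covers 8 bytes, which is why KASAN reports the range
[0x0000000000000000-0x0000000000000007].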


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.