[PATCH net-next v2 0/3] net: devmem: improve cpu cost of RX token management

Bobby Eshleman posted 3 patches 2 weeks, 6 days ago
There is a newer version of this series
Posted by Bobby Eshleman 2 weeks, 6 days ago
This series reduces the CPU cost of RX token management by replacing
the xarray allocator with a plain array of atomics. Similar to devmem
TX's page-index lookup scheme for niovs, RX also uses page indices to
look up the corresponding atomic in the array.
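
As a rough illustration of the indexing scheme described above, here is a
minimal user-space sketch using C11 atomics. It is not the code from the
patches: struct token_table and the token_*() helpers are hypothetical names
chosen for this example, and only the idea (the token is the page index, and
each index has one atomic counter) comes from the description.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; not the kernel structures from the series. */
struct token_table {
	atomic_uint *urefs;	/* one user reference counter per page index */
	size_t nr_pages;
};

static int token_table_init(struct token_table *t, size_t nr_pages)
{
	assert(t != NULL);
	if (nr_pages == 0)
		return -1;
	t->urefs = malloc(nr_pages * sizeof(*t->urefs));
	if (!t->urefs)
		return -1;
	for (size_t i = 0; i < nr_pages; i++)
		atomic_init(&t->urefs[i], 0);
	t->nr_pages = nr_pages;
	return 0;
}

static void token_table_destroy(struct token_table *t)
{
	free(t->urefs);
	t->urefs = NULL;
	t->nr_pages = 0;
}

/* RX side: the token handed to the user is simply the page index. */
static int token_get(struct token_table *t, size_t page_idx,
		     unsigned int *token)
{
	if (page_idx >= t->nr_pages)
		return -1;
	atomic_fetch_add(&t->urefs[page_idx], 1);
	*token = (unsigned int)page_idx;
	return 0;
}

/*
 * Release side: returns 1 if a reference was dropped, 0 if the token is
 * out of range or not currently held. Invalid user input is rejected
 * quietly instead of being treated as a bug.
 */
static int token_put(struct token_table *t, unsigned int token)
{
	unsigned int old;

	if (token >= t->nr_pages)
		return 0;
	old = atomic_load(&t->urefs[token]);
	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&t->urefs[token], &old,
						 old - 1))
			return 1;
	}
	return 0;
}
```

In this sketch both lookups are a bounds check plus one atomic operation on a
flat array, with no allocation or tree walk per token.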

Improvement is ~5% per RX user thread.

Two other approaches were tested, but neither showed any improvement:
1) using a hashmap for tokens, and 2) keeping an xarray of atomic
counters but using RCU so that the hot path could be mostly lockless.
Neither approach proved better than the simple array in terms of CPU.

Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com>
---
Changes in v2:
- net: ethtool: prevent user from breaking devmem single-binding rule
  (Mina)
- pre-assign niovs in binding->vec for RX case (Mina)
- remove WARNs on invalid user input (Mina)
- remove extraneous binding ref get (Mina)
- remove WARN for changed binding (Mina)
- always use GFP_ZERO for binding->vec (Mina)
- fix length of alloc for urefs
- use atomic_set(, 0) to initialize sk_user_frags.urefs
- Link to v1: https://lore.kernel.org/r/20250902-scratch-bobbyeshleman-devmem-tcp-token-upstream-v1-0-d946169b5550@meta.com

---
Bobby Eshleman (3):
      net: devmem: rename tx_vec to vec in dmabuf binding
      net: devmem: use niov array for token management
      net: ethtool: prevent user from breaking devmem single-binding rule

 include/net/sock.h       |   6 +-
 net/core/devmem.c        |  29 +++++-----
 net/core/devmem.h        |   4 +-
 net/core/sock.c          |  23 +++++---
 net/ethtool/ioctl.c      | 144 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c           | 120 ++++++++++++++++-----------------------
 net/ipv4/tcp_ipv4.c      |  45 +++++++++++++--
 net/ipv4/tcp_minisocks.c |   2 -
 8 files changed, 266 insertions(+), 107 deletions(-)
---
base-commit: dc2f650f7e6857bf384069c1a56b2937a1ee370d
change-id: 20250829-scratch-bobbyeshleman-devmem-tcp-token-upstream-292be174d503

Best regards,
-- 
Bobby Eshleman <bobbyeshleman@meta.com>
[syzbot ci] Re: net: devmem: improve cpu cost of RX token management
Posted by syzbot ci 2 weeks, 6 days ago
syzbot ci has tested the following series

[v2] net: devmem: improve cpu cost of RX token management
https://lore.kernel.org/all/20250911-scratch-bobbyeshleman-devmem-tcp-token-upstream-v2-0-c80d735bd453@meta.com
* [PATCH net-next v2 1/3] net: devmem: rename tx_vec to vec in dmabuf binding
* [PATCH net-next v2 2/3] net: devmem: use niov array for token management
* [PATCH net-next v2 3/3] net: ethtool: prevent user from breaking devmem single-binding rule

and found the following issue:
general protection fault in sock_devmem_dontneed

Full report is available here:
https://ci.syzbot.org/series/40b2252a-f8bb-4cec-bfc1-2ff8a3c55336

***

general protection fault in sock_devmem_dontneed

tree:      net-next
URL:       https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base:      5adf6f2b9972dbb69f4dd11bae52ba251c64ecb7
arch:      amd64
compiler:  Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config:    https://ci.syzbot.org/builds/2c30c608-f14f-4e6d-9772-cc5e129939fc/config
C repro:   https://ci.syzbot.org/findings/c89c36f8-4666-47d0-bc39-35662a268e4d/c_repro
syz repro: https://ci.syzbot.org/findings/c89c36f8-4666-47d0-bc39-35662a268e4d/syz_repro

Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6028 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) 
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:sock_devmem_dontneed+0x40b/0x910 net/core/sock.c:1112
Code: 8b 44 24 18 44 8b 20 44 03 64 24 14 48 8b 44 24 68 80 3c 18 00 74 08 4c 89 ef e8 f0 bb c9 f8 4d 8b 7d 00 4c 89 f8 48 c1 e8 03 <80> 3c 18 00 74 08 4c 89 ff e8 d7 bb c9 f8 4d 8b 2f 4c 89 e8 48 c1
RSP: 0018:ffffc90002987ac0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 1ffff11020d27e78
RDX: ffff88810a039cc0 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffffc90002987c50 R08: ffffc90002987bdf R09: 0000000000000000
R10: ffffc90002987b60 R11: fffff52000530f7c R12: 0000000000000006
R13: ffff8881235cb710 R14: 0000000000000000 R15: 0000000000000000
FS:  000055555e866500(0000) GS:ffff8881a3c14000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31b63fff CR3: 0000000027a20000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 sk_setsockopt+0x682/0x2dc0 net/core/sock.c:1301
 do_sock_setsockopt+0x11b/0x1b0 net/socket.c:2340
 __sys_setsockopt net/socket.c:2369 [inline]
 __do_sys_setsockopt net/socket.c:2375 [inline]
 __se_sys_setsockopt net/socket.c:2372 [inline]
 __x64_sys_setsockopt+0x13f/0x1b0 net/socket.c:2372
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7faf24f8eba9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffc3eb96018 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 00007faf251d5fa0 RCX: 00007faf24f8eba9
RDX: 0000000000000050 RSI: 0000000000000001 RDI: 0000000000000003
RBP: 00007faf25011e19 R08: 0000000000000048 R09: 0000000000000000
R10: 0000200000000100 R11: 0000000000000246 R12: 0000000000000000
R13: 00007faf251d5fa0 R14: 00007faf251d5fa0 R15: 0000000000000005
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:sock_devmem_dontneed+0x40b/0x910 net/core/sock.c:1112
Code: 8b 44 24 18 44 8b 20 44 03 64 24 14 48 8b 44 24 68 80 3c 18 00 74 08 4c 89 ef e8 f0 bb c9 f8 4d 8b 7d 00 4c 89 f8 48 c1 e8 03 <80> 3c 18 00 74 08 4c 89 ff e8 d7 bb c9 f8 4d 8b 2f 4c 89 e8 48 c1
RSP: 0018:ffffc90002987ac0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: dffffc0000000000 RCX: 1ffff11020d27e78
RDX: ffff88810a039cc0 RSI: 0000000000000003 RDI: 0000000000000000
RBP: ffffc90002987c50 R08: ffffc90002987bdf R09: 0000000000000000
R10: ffffc90002987b60 R11: fffff52000530f7c R12: 0000000000000006
R13: ffff8881235cb710 R14: 0000000000000000 R15: 0000000000000000
FS:  000055555e866500(0000) GS:ffff8881a3c14000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b31b63fff CR3: 0000000027a20000 CR4: 00000000000006f0
----------------
Code disassembly (best guess):
   0:	8b 44 24 18          	mov    0x18(%rsp),%eax
   4:	44 8b 20             	mov    (%rax),%r12d
   7:	44 03 64 24 14       	add    0x14(%rsp),%r12d
   c:	48 8b 44 24 68       	mov    0x68(%rsp),%rax
  11:	80 3c 18 00          	cmpb   $0x0,(%rax,%rbx,1)
  15:	74 08                	je     0x1f
  17:	4c 89 ef             	mov    %r13,%rdi
  1a:	e8 f0 bb c9 f8       	call   0xf8c9bc0f
  1f:	4d 8b 7d 00          	mov    0x0(%r13),%r15
  23:	4c 89 f8             	mov    %r15,%rax
  26:	48 c1 e8 03          	shr    $0x3,%rax
* 2a:	80 3c 18 00          	cmpb   $0x0,(%rax,%rbx,1) <-- trapping instruction
  2e:	74 08                	je     0x38
  30:	4c 89 ff             	mov    %r15,%rdi
  33:	e8 d7 bb c9 f8       	call   0xf8c9bc0f
  38:	4d 8b 2f             	mov    (%r15),%r13
  3b:	4c 89 e8             	mov    %r13,%rax
  3e:	48                   	rex.W
  3f:	c1                   	.byte 0xc1
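
For readers decoding the trap above: the faulting instruction is a KASAN
shadow-memory check. The sketch below assumes the usual x86-64 generic KASAN
mapping (shadow = (addr >> 3) + 0xdffffc0000000000, which matches RBX in the
register dump) and 4-level paging; the helper names are made up for this
example. It shows why checking a NULL pointer (R15 = 0) produces an access
to the non-canonical address 0xdffffc0000000000 named in the Oops line.

```c
#include <stdint.h>

/* Assumed x86-64 generic KASAN constants; the offset matches RBX above. */
#define KASAN_SHADOW_SCALE_SHIFT 3
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Address of the shadow byte covering the 8-byte granule holding addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* With 4-level paging, bits 63..47 of a canonical address are all equal. */
static int is_canonical_48(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}
```

Under these assumptions, kasan_shadow_addr(0) is 0xdffffc0000000000, which
is_canonical_48() rejects, so the CPU raises a general protection fault
rather than a page fault. One shadow byte covers eight bytes, which is
consistent with the reported range [0x0000000000000000-0x0000000000000007].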


***

If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
  Tested-by: syzbot@syzkaller.appspotmail.com

---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller@googlegroups.com.