v11:
 - address Mat's comments in v10.
 - rebase to export/20220908T063452

v10:
 - send multiple dfrags in __mptcp_push_pending().

v9:
 - drop the extra *err parameter of mptcp_sched_get_send() as Florian
   suggested.

v8:
 - update __mptcp_push_pending(), send the same data on each subflow.
 - update __mptcp_retrans, track the max sent data.
 - add a new patch.

v7:
 - drop redundant flag in v6
 - drop __mptcp_subflows_push_pending in v6
 - update redundant subflows support in __mptcp_push_pending
 - update redundant subflows support in __mptcp_retrans

v6:
 - Add redundant flag for struct mptcp_sched_ops.
 - add a dedicated function __mptcp_subflows_push_pending() to deal with
   redundant subflows push pending.

v5:
 - address Paolo's comment, keep the optimization to
   mptcp_subflow_get_send() for the non eBPF case.
 - merge mptcp_sched_get_send() and __mptcp_sched_get_send() in v4 into one.
 - depends on "cleanups for bpf sched selftests".

v4:
 - small cleanups in patch 1, 2.
 - add TODO in patch 3.
 - rebase patch 5 on 'cleanups for bpf sched selftests'.

v3:
 - use new API.
 - fix the link failure tests issue mentioned in
   ("https://patchwork.kernel.org/project/mptcp/cover/cover.1653033459.git.geliang.tang@suse.com/").

v2:
 - add MPTCP_SUBFLOWS_MAX limit to avoid infinite loops when the
   scheduler always sets call_again to true.
 - track the largest copied amount.
 - deal with __mptcp_subflow_push_pending() and the retransmit loop.
 - depends on "BPF round-robin scheduler" v14.

v1:

Implements the redundant BPF MPTCP scheduler, which sends all packets
redundantly on all available subflows.
Geliang Tang (5):
  Squash to "mptcp: add get_subflow wrappers"
  mptcp: redundant subflows push pending
  mptcp: redundant subflows retrans support
  selftests/bpf: Add bpf_red scheduler
  selftests/bpf: Add bpf_red test

 net/mptcp/protocol.c                          | 158 +++++++++++-------
 net/mptcp/protocol.h                          |  12 +-
 net/mptcp/sched.c                             |  59 ++++---
 .../testing/selftests/bpf/prog_tests/mptcp.c  |  34 ++++
 .../selftests/bpf/progs/mptcp_bpf_red.c       |  36 ++++
 5 files changed, 209 insertions(+), 90 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c

--
2.35.3
On Fri, 9 Sep 2022, Geliang Tang wrote:

> v11:
> - address to Mat's comments in v10.
> - rebase to export/20220908T063452
>

Hi Geliang -

Thanks for the updates to this series.

I get slightly different kernel splats than the CI. For example, here's my
kmsg output with the first test in mptcp_connect.sh:

[ 3102.670021] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 3102.885448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[ 3103.112575] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
[ 3103.463347] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 3107.580236] ------------[ cut here ]------------
[ 3107.581325] WARNING: CPU: 2 PID: 1112 at net/mptcp/protocol.c:1306 mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1))
[ 3107.583192] Modules linked in:
[ 3107.585317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[ 3107.587250] RIP: 0010:mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1))
[ 3107.588421] Code: 0f 85 21 fd ff ff 48 c7 c2 a0 c1 1a 83 be 78 00 00 00 48 c7 c7 80 c7 1a 83 c6 05 9e 31 f2 01 01 e8 03 83 03 00 e9 fd fc ff ff <0f> 0b 0f b6 44 24 63 88 44 24 30 e9 dc f4 ff ff 8b 74 24 18 48 89
All code
========
   0:   0f 85 21 fd ff ff       jne 0xfffffffffffffd27
   6:   48 c7 c2 a0 c1 1a 83    mov $0xffffffff831ac1a0,%rdx
   d:   be 78 00 00 00          mov $0x78,%esi
  12:   48 c7 c7 80 c7 1a 83    mov $0xffffffff831ac780,%rdi
  19:   c6 05 9e 31 f2 01 01    movb $0x1,0x1f2319e(%rip) # 0x1f231be
  20:   e8 03 83 03 00          call 0x38328
  25:   e9 fd fc ff ff          jmp 0xfffffffffffffd27
  2a:*  0f 0b                   ud2 <-- trapping instruction
  2c:   0f b6 44 24 63          movzbl 0x63(%rsp),%eax
  31:   88 44 24 30             mov %al,0x30(%rsp)
  35:   e9 dc f4 ff ff          jmp 0xfffffffffffff516
  3a:   8b 74 24 18             mov 0x18(%rsp),%esi
  3e:   48                      rex.W
  3f:   89                      .byte 0x89

Code starting with the faulting instruction
===========================================
   0:   0f 0b                   ud2
   2:   0f b6 44 24 63          movzbl 0x63(%rsp),%eax
   7:   88 44 24 30             mov %al,0x30(%rsp)
   b:   e9 dc f4 ff ff          jmp 0xfffffffffffff4ec
  10:   8b 74 24 18             mov 0x18(%rsp),%esi
  14:   48                      rex.W
  15:   89                      .byte 0x89
[ 3107.592031] RSP: 0018:ffff888010f67910 EFLAGS: 00010202
[ 3107.593172] RAX: e585171f95821f87 RBX: ffff888113b04f00 RCX: ffffffff8260bb19
[ 3107.594458] RDX: 0000000000000001 RSI: ffffffff8260bac4 RDI: ffff88800fa84848
[ 3107.596326] RBP: e585171f95821f88 R08: 0000000000000000 R09: ffff88800fa848af
[ 3107.597665] R10: ffffed1001f50915 R11: 0000000000000000 R12: ffff888010f67a78
[ 3107.599903] R13: 0000000000000001 R14: ffff888107691800 R15: ffff8880357a0000
[ 3107.601366] FS: 00007f668bdb0740(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
[ 3107.603201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3107.604495] CR2: 00007ffe77890328 CR3: 0000000115318005 CR4: 0000000000370ee0
[ 3107.606212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3107.607854] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3107.610532] Call Trace:
[ 3107.611229] <TASK>
[ 3107.611746] ? mptcp_init_sock (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1230)
[ 3107.612654] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4252 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4319 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271)
[ 3107.613890] ? __local_bh_enable_ip (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:401)
[ 3107.614952] __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1569)
[ 3107.615948] ? mptcp_close (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1532)
[ 3107.616850] ? __sk_mem_raise_allocated (/home/mjmartin/work/mptcp-nn/net/core/sock.c:2810 /home/mjmartin/work/mptcp-nn/net/core/sock.c:2981)
[ 3107.617982] ? copy_page_from_iter (/home/mjmartin/work/mptcp-nn/lib/iov_iter.c:751 /home/mjmartin/work/mptcp-nn/lib/iov_iter.c:738)
[ 3107.618991] mptcp_sendmsg (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1789)
[ 3107.619896] ? __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1682)
[ 3107.621006] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:807)
[ 3107.622087] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:816)
[ 3107.623082] sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:717 /home/mjmartin/work/mptcp-nn/net/socket.c:734)
[ 3107.624372] sock_write_iter (/home/mjmartin/work/mptcp-nn/net/socket.c:1109)
[ 3107.625361] ? sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:1092)
[ 3107.626261] ? file_has_perm (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:1724)
[ 3107.627185] ? selinux_file_permission (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3570 /home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3590)
[ 3107.628345] vfs_write (/home/mjmartin/work/mptcp-nn/./include/linux/fs.h:2187 /home/mjmartin/work/mptcp-nn/fs/read_write.c:491 /home/mjmartin/work/mptcp-nn/fs/read_write.c:578)
[ 3107.629171] ? __ia32_sys_pread64 (/home/mjmartin/work/mptcp-nn/fs/read_write.c:559)
[ 3107.630159] ? bit_wait_io_timeout (/home/mjmartin/work/mptcp-nn/kernel/locking/mutex.c:902)
[ 3107.631177] ? __fget_light (/home/mjmartin/work/mptcp-nn/fs/file.c:1007 (discriminator 1))
[ 3107.632005] ksys_write (/home/mjmartin/work/mptcp-nn/fs/read_write.c:631)
[ 3107.632957] ? __ia32_sys_read (/home/mjmartin/work/mptcp-nn/fs/read_write.c:621)
[ 3107.633859] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:466 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4320 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271)
[ 3107.635041] ? syscall_enter_from_user_mode (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 /home/mjmartin/work/mptcp-nn/kernel/entry/common.c:109)
[ 3107.636398] do_syscall_64 (/home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:50 /home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:80)
[ 3107.637254] entry_SYSCALL_64_after_hwframe (/home/mjmartin/work/mptcp-nn/arch/x86/entry/entry_64.S:120)
[ 3107.638351] RIP: 0033:0x7f668beb48f7
[ 3107.639336] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
All code
========
   0:   0f 00                   (bad)
   2:   f7 d8                   neg %eax
   4:   64 89 02                mov %eax,%fs:(%rdx)
   7:   48 c7 c0 ff ff ff ff    mov $0xffffffffffffffff,%rax
   e:   eb b7                   jmp 0xffffffffffffffc7
  10:   0f 1f 00                nopl (%rax)
  13:   f3 0f 1e fa             endbr64
  17:   64 8b 04 25 18 00 00    mov %fs:0x18,%eax
  1e:   00
  1f:   85 c0                   test %eax,%eax
  21:   75 10                   jne 0x33
  23:   b8 01 00 00 00          mov $0x1,%eax
  28:   0f 05                   syscall
  2a:*  48 3d 00 f0 ff ff       cmp $0xfffffffffffff000,%rax <-- trapping instruction
  30:   77 51                   ja 0x83
  32:   c3                      ret
  33:   48 83 ec 28             sub $0x28,%rsp
  37:   48 89 54 24 18          mov %rdx,0x18(%rsp)
  3c:   48                      rex.W
  3d:   89                      .byte 0x89
  3e:   74 24                   je 0x64

Code starting with the faulting instruction
===========================================
   0:   48 3d 00 f0 ff ff       cmp $0xfffffffffffff000,%rax
   6:   77 51                   ja 0x59
   8:   c3                      ret
   9:   48 83 ec 28             sub $0x28,%rsp
   d:   48 89 54 24 18          mov %rdx,0x18(%rsp)
  12:   48                      rex.W
  13:   89                      .byte 0x89
  14:   74 24                   je 0x3a
[ 3107.643426] RSP: 002b:00007ffe778943c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3107.645335] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f668beb48f7
[ 3107.647101] RDX: 0000000000002000 RSI: 00007ffe778943f0 RDI: 0000000000000003
[ 3107.648756] RBP: 0000000000000000 R08: 00007f668bfab214 R09: 00007f668bfab280
[ 3107.650320] R10: 00007f668bdba140 R11: 0000000000000246 R12: 0000000000001500
[ 3107.651883] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000002000
[ 3107.653497] </TASK>
[ 3107.654005] irq event stamp: 17761
[ 3107.654780] hardirqs last enabled at (17773): __up_console_sem (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 (discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 (discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:138 (discriminator 1) /home/mjmartin/work/mptcp-nn/kernel/printk/printk.c:264 (discriminator 1))
[ 3107.656996] hardirqs last disabled at (17788): __schedule (/home/mjmartin/work/mptcp-nn/kernel/sched/core.c:6393 (discriminator 1))
[ 3107.658933] softirqs last enabled at (17806): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650)
[ 3107.660848] softirqs last disabled at (17819): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650)
[ 3107.662940] ---[ end trace 0000000000000000 ]---


Do you see anything similar in your testing? This was on a 4-cpu VM for me.

Line 1306 of protocol.c that caused the splat is:

WARN_ON_ONCE(reuse_skb);

and it looks like that is expected to happen with a zero window and all data
acked. Sounds like a condition that wasn't expected with previous schedulers
(that only sent on one subflow at a time), but could happen with redundant
schedulers when msk->snd_una is updated by another subflow.

If you can't reproduce this, let me know and I can investigate some more.
It's reproducible on my system.

--
Mat Martineau
Intel
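The condition described above (zero window, all data acked, an skb being reused) can be read as a small decision table over the branch around line 1306 of protocol.c, which is quoted in full later in this thread. What follows is only a userspace sketch of that branch, not kernel code: the enum, its values, and model_sendmsg_frag() are hypothetical names introduced for illustration, and the inputs are assumed to be just copy, snd_una, snd_nxt and reuse_skb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome codes for this model; not kernel identifiers. */
enum frag_outcome {
	FRAG_SEND,       /* the window allows at least one byte */
	FRAG_WAIT,       /* zero window, unacked data still in flight */
	FRAG_PROBE,      /* zero-window probe with an empty write queue */
	FRAG_PROBE_WARN, /* zero-window probe while an skb is reused */
};

/*
 * Simplified model of the "copy == 0" branch: when nothing can be
 * copied and everything is acked (snd_una == snd_nxt), a zero-window
 * probe is sent, and a reused skb at that point is the case the
 * WARN_ON_ONCE(reuse_skb) check flags.
 */
static enum frag_outcome model_sendmsg_frag(uint64_t snd_una,
					    uint64_t snd_nxt,
					    int copy, bool reuse_skb)
{
	if (copy > 0)
		return FRAG_SEND;
	if (snd_una != snd_nxt)
		return FRAG_WAIT;
	return reuse_skb ? FRAG_PROBE_WARN : FRAG_PROBE;
}
```

In this model, a redundant scheduler that lets another subflow advance snd_una up to snd_nxt while an skb is still queued on the current subflow corresponds to the FRAG_PROBE_WARN case.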
Hi Mat,

Sorry for the late reply.

On Mon, Sep 12, 2022 at 05:01:39PM -0700, Mat Martineau wrote:
> On Fri, 9 Sep 2022, Geliang Tang wrote:
>
> > v11:
> > - address to Mat's comments in v10.
> > - rebase to export/20220908T063452
> >
>
> Hi Geliang -
>
> Thanks for the updates to this series.
>
> I get slightly different kernel splats than the CI. For example, here's my
> kmsg output with the first test in mptcp_connect.sh:
>
> [ 3102.670021] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
> [ 3102.885448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
> [ 3103.112575] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
> [ 3103.463347] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
> [ 3107.580236] ------------[ cut here ]------------
> [ 3107.581325] WARNING: CPU: 2 PID: 1112 at net/mptcp/protocol.c:1306 mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1))
> [ 3107.583192] Modules linked in:
> [ 3107.585317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
> [ 3107.587250] RIP: 0010:mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1))
> [ 3107.588421] Code: 0f 85 21 fd ff ff 48 c7 c2 a0 c1 1a 83 be 78 00 00 00 48 c7 c7 80 c7 1a 83 c6 05 9e 31 f2 01 01 e8 03 83 03 00 e9 fd fc ff ff <0f> 0b 0f b6 44 24 63 88 44 24 30 e9 dc f4 ff ff 8b 74 24 18 48 89
> All code
> ========
>    0:   0f 85 21 fd ff ff       jne 0xfffffffffffffd27
>    6:   48 c7 c2 a0 c1 1a 83    mov $0xffffffff831ac1a0,%rdx
>    d:   be 78 00 00 00          mov $0x78,%esi
>   12:   48 c7 c7 80 c7 1a 83    mov $0xffffffff831ac780,%rdi
>   19:   c6 05 9e 31 f2 01 01    movb $0x1,0x1f2319e(%rip) # 0x1f231be
>   20:   e8 03 83 03 00          call 0x38328
>   25:   e9 fd fc ff ff          jmp 0xfffffffffffffd27
>   2a:*  0f 0b                   ud2 <-- trapping instruction
>   2c:   0f b6 44 24 63          movzbl 0x63(%rsp),%eax
>   31:   88 44 24 30             mov %al,0x30(%rsp)
>   35:   e9 dc f4 ff ff          jmp 0xfffffffffffff516
>   3a:   8b 74 24 18             mov 0x18(%rsp),%esi
>   3e:   48                      rex.W
>   3f:   89                      .byte 0x89
>
> Code starting with the faulting instruction
> ===========================================
>    0:   0f 0b                   ud2
>    2:   0f b6 44 24 63          movzbl 0x63(%rsp),%eax
>    7:   88 44 24 30             mov %al,0x30(%rsp)
>    b:   e9 dc f4 ff ff          jmp 0xfffffffffffff4ec
>   10:   8b 74 24 18             mov 0x18(%rsp),%esi
>   14:   48                      rex.W
>   15:   89                      .byte 0x89
> [ 3107.592031] RSP: 0018:ffff888010f67910 EFLAGS: 00010202
> [ 3107.593172] RAX: e585171f95821f87 RBX: ffff888113b04f00 RCX: ffffffff8260bb19
> [ 3107.594458] RDX: 0000000000000001 RSI: ffffffff8260bac4 RDI: ffff88800fa84848
> [ 3107.596326] RBP: e585171f95821f88 R08: 0000000000000000 R09: ffff88800fa848af
> [ 3107.597665] R10: ffffed1001f50915 R11: 0000000000000000 R12: ffff888010f67a78
> [ 3107.599903] R13: 0000000000000001 R14: ffff888107691800 R15: ffff8880357a0000
> [ 3107.601366] FS: 00007f668bdb0740(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
> [ 3107.603201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3107.604495] CR2: 00007ffe77890328 CR3: 0000000115318005 CR4: 0000000000370ee0
> [ 3107.606212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3107.607854] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3107.610532] Call Trace:
> [ 3107.611229] <TASK>
> [ 3107.611746] ? mptcp_init_sock (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1230)
> [ 3107.612654] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4252 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4319 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271)
> [ 3107.613890] ? __local_bh_enable_ip (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:401)
> [ 3107.614952] __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1569)
> [ 3107.615948] ? mptcp_close (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1532)
> [ 3107.616850] ? __sk_mem_raise_allocated (/home/mjmartin/work/mptcp-nn/net/core/sock.c:2810 /home/mjmartin/work/mptcp-nn/net/core/sock.c:2981)
> [ 3107.617982] ? copy_page_from_iter (/home/mjmartin/work/mptcp-nn/lib/iov_iter.c:751 /home/mjmartin/work/mptcp-nn/lib/iov_iter.c:738)
> [ 3107.618991] mptcp_sendmsg (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1789)
> [ 3107.619896] ? __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1682)
> [ 3107.621006] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:807)
> [ 3107.622087] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:816)
> [ 3107.623082] sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:717 /home/mjmartin/work/mptcp-nn/net/socket.c:734)
> [ 3107.624372] sock_write_iter (/home/mjmartin/work/mptcp-nn/net/socket.c:1109)
> [ 3107.625361] ? sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:1092)
> [ 3107.626261] ? file_has_perm (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:1724)
> [ 3107.627185] ? selinux_file_permission (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3570 /home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3590)
> [ 3107.628345] vfs_write (/home/mjmartin/work/mptcp-nn/./include/linux/fs.h:2187 /home/mjmartin/work/mptcp-nn/fs/read_write.c:491 /home/mjmartin/work/mptcp-nn/fs/read_write.c:578)
> [ 3107.629171] ? __ia32_sys_pread64 (/home/mjmartin/work/mptcp-nn/fs/read_write.c:559)
> [ 3107.630159] ? bit_wait_io_timeout (/home/mjmartin/work/mptcp-nn/kernel/locking/mutex.c:902)
> [ 3107.631177] ? __fget_light (/home/mjmartin/work/mptcp-nn/fs/file.c:1007 (discriminator 1))
> [ 3107.632005] ksys_write (/home/mjmartin/work/mptcp-nn/fs/read_write.c:631)
> [ 3107.632957] ? __ia32_sys_read (/home/mjmartin/work/mptcp-nn/fs/read_write.c:621)
> [ 3107.633859] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:466 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4320 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271)
> [ 3107.635041] ? syscall_enter_from_user_mode (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 /home/mjmartin/work/mptcp-nn/kernel/entry/common.c:109)
> [ 3107.636398] do_syscall_64 (/home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:50 /home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:80)
> [ 3107.637254] entry_SYSCALL_64_after_hwframe (/home/mjmartin/work/mptcp-nn/arch/x86/entry/entry_64.S:120)
> [ 3107.638351] RIP: 0033:0x7f668beb48f7
> [ 3107.639336] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> All code
> ========
>    0:   0f 00                   (bad)
>    2:   f7 d8                   neg %eax
>    4:   64 89 02                mov %eax,%fs:(%rdx)
>    7:   48 c7 c0 ff ff ff ff    mov $0xffffffffffffffff,%rax
>    e:   eb b7                   jmp 0xffffffffffffffc7
>   10:   0f 1f 00                nopl (%rax)
>   13:   f3 0f 1e fa             endbr64
>   17:   64 8b 04 25 18 00 00    mov %fs:0x18,%eax
>   1e:   00
>   1f:   85 c0                   test %eax,%eax
>   21:   75 10                   jne 0x33
>   23:   b8 01 00 00 00          mov $0x1,%eax
>   28:   0f 05                   syscall
>   2a:*  48 3d 00 f0 ff ff       cmp $0xfffffffffffff000,%rax <-- trapping instruction
>   30:   77 51                   ja 0x83
>   32:   c3                      ret
>   33:   48 83 ec 28             sub $0x28,%rsp
>   37:   48 89 54 24 18          mov %rdx,0x18(%rsp)
>   3c:   48                      rex.W
>   3d:   89                      .byte 0x89
>   3e:   74 24                   je 0x64
>
> Code starting with the faulting instruction
> ===========================================
>    0:   48 3d 00 f0 ff ff       cmp $0xfffffffffffff000,%rax
>    6:   77 51                   ja 0x59
>    8:   c3                      ret
>    9:   48 83 ec 28             sub $0x28,%rsp
>    d:   48 89 54 24 18          mov %rdx,0x18(%rsp)
>   12:   48                      rex.W
>   13:   89                      .byte 0x89
>   14:   74 24                   je 0x3a
> [ 3107.643426] RSP: 002b:00007ffe778943c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3107.645335] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f668beb48f7
> [ 3107.647101] RDX: 0000000000002000 RSI: 00007ffe778943f0 RDI: 0000000000000003
> [ 3107.648756] RBP: 0000000000000000 R08: 00007f668bfab214 R09: 00007f668bfab280
> [ 3107.650320] R10: 00007f668bdba140 R11: 0000000000000246 R12: 0000000000001500
> [ 3107.651883] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000002000
> [ 3107.653497] </TASK>
> [ 3107.654005] irq event stamp: 17761
> [ 3107.654780] hardirqs last enabled at (17773): __up_console_sem (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 (discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80 (discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:138 (discriminator 1) /home/mjmartin/work/mptcp-nn/kernel/printk/printk.c:264 (discriminator 1))
> [ 3107.656996] hardirqs last disabled at (17788): __schedule (/home/mjmartin/work/mptcp-nn/kernel/sched/core.c:6393 (discriminator 1))
> [ 3107.658933] softirqs last enabled at (17806): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650)
> [ 3107.660848] softirqs last disabled at (17819): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650)
> [ 3107.662940] ---[ end trace 0000000000000000 ]---
>
>
> Do you see anything similar in your testing? This was on a 4-cpu VM for me.
>

Yes, I got both this error (1306 WARN_ON_ONCE(reuse_skb)) and another
error (1010 WARN_ON_ONCE(!msk->recovery)) in my tests.

>
> Line 1306 of protocol.c that caused the splat is:
>
> WARN_ON_ONCE(reuse_skb);
>
> and it looks like that is expected to happen with a zero window and all data
> acked. Sounds like a condition that wasn't expected with previous schedulers
> (that only sent on one subflow at a time), but could happen with redundant
> schedulers when msk->snd_una is updated by another subflow.
>

The original code updates dfrag->already_sent immediately after invoking
mptcp_sendmsg_frag(). In this series, however, we delay updating
dfrag->already_sent until all frags have been sent, so
mptcp_check_allowed_size() sometimes returns 0. We hit the
(1306 WARN_ON_ONCE(reuse_skb)) error here:

1291         if (copy == 0) {
1292                 u64 snd_una = READ_ONCE(msk->snd_una);
1293
1294                 if (snd_una != msk->snd_nxt) {
1295                         tcp_remove_empty_skb(ssk);
1296                         return 0;
1297                 }
1298
1299                 zero_window_probe = true;
1300                 data_seq = snd_una - 1;
1301                 copy = 1;
1302
1303                 /* all mptcp-level data is acked, no skbs should be present into the
1304                  * ssk write queue
1305                  */
1306                 WARN_ON_ONCE(reuse_skb);
1307         }

The original code updates msk->first_pending immediately after every frag
is sent, but we delay updating it until all frags have been sent. As a
result, the code reaches the (dfrag == msk->first_pending) branch. We hit
the (1010 WARN_ON_ONCE(!msk->recovery)) error here:

1008                 if (unlikely(dfrag == msk->first_pending)) {
1009                         /* in recovery mode can see ack after the current snd head */
1010                         if (WARN_ON_ONCE(!msk->recovery))
1011                                 break;
1012
1013                         WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
1014                 }

I'm trying to fix these two errors, but I haven't made much progress, so
I'd like to hear your suggestions.

Thanks,
-Geliang

> If you can't reproduce this, let me know and I can investigate some more.
> It's reproducible on my system.
>
> --
> Mat Martineau
> Intel
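The second warning (line 1010) can be illustrated with a toy userspace model of the delayed first_pending update. This is a sketch under stated assumptions, not the kernel implementation: struct model_dfrag and model_clean_una_warns() are hypothetical names, the queue is a plain array, first_pending is an index into it, and the loop is assumed to walk only fully acked frags. It also uses a plain comparison on sequence numbers and ignores wraparound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a data fragment; not the kernel struct. */
struct model_dfrag {
	uint64_t data_seq; /* first sequence number carried by the frag */
	uint32_t data_len; /* number of bytes in the frag */
};

/*
 * Walk the fully acked frags and report whether the walk reaches the
 * frag that first_pending still points at while not in recovery,
 * which is the situation the WARN at line 1010 guards against.
 */
static bool model_clean_una_warns(const struct model_dfrag *q, size_t n,
				  size_t first_pending, uint64_t snd_una,
				  bool recovery)
{
	for (size_t i = 0; i < n; i++) {
		/* stop at the first frag that is not fully acked */
		if (q[i].data_seq + q[i].data_len > snd_una)
			break;
		if (i == first_pending && !recovery)
			return true;
	}
	return false;
}
```

With three 100-byte frags and an ack covering the first one (snd_una = 100), the model warns when first_pending is still 0 (the delayed update) and stays quiet when first_pending has already advanced to 1 (the immediate update).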