net/mptcp/protocol.c | 158 +++++++++++------- net/mptcp/protocol.h | 12 +- net/mptcp/sched.c | 59 ++++--- .../testing/selftests/bpf/prog_tests/mptcp.c | 34 ++++ .../selftests/bpf/progs/mptcp_bpf_red.c | 36 ++++ 5 files changed, 209 insertions(+), 90 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
v11:
- address to Mat's comments in v10.
- rebase to export/20220908T063452
v10:
- send multiple dfrags in __mptcp_push_pending().
v9:
- drop the extra *err paramenter of mptcp_sched_get_send() as Florian
suggested.
v8:
- update __mptcp_push_pending(), send the same data on each subflow.
- update __mptcp_retrans, track the max sent data.
= add a new patch.
v7:
- drop redundant flag in v6
- drop __mptcp_subflows_push_pending in v6
- update redundant subflows support in __mptcp_push_pending
- update redundant subflows support in __mptcp_retrans
v6:
- Add redundant flag for struct mptcp_sched_ops.
- add a dedicated function __mptcp_subflows_push_pending() to deal with
redundat subflows push pending.
v5:
- address to Paolo's comment, keep the optimization to
mptcp_subflow_get_send() for the non eBPF case.
- merge mptcp_sched_get_send() and __mptcp_sched_get_send() in v4 into one.
- depends on "cleanups for bpf sched selftests".
v4:
- small cleanups in patch 1, 2.
- add TODO in patch 3.
- rebase patch 5 on 'cleanups for bpf sched selftests'.
v3:
- use new API.
- fix the link failure tests issue mentioned in ("https://patchwork.kernel.org/project/mptcp/cover/cover.1653033459.git.geliang.tang@suse.com/").
v2:
- add MPTCP_SUBFLOWS_MAX limit to avoid infinite loops when the
scheduler always sets call_again to true.
- track the largest copied amount.
- deal with __mptcp_subflow_push_pending() and the retransmit loop.
- depends on "BPF round-robin scheduler" v14.
v1:
Implements the redundant BPF MPTCP scheduler, which sends all packets
redundantly on all available subflows.
Geliang Tang (5):
Squash to "mptcp: add get_subflow wrappers"
mptcp: redundant subflows push pending
mptcp: redundant subflows retrans support
selftests/bpf: Add bpf_red scheduler
selftests/bpf: Add bpf_red test
net/mptcp/protocol.c | 158 +++++++++++-------
net/mptcp/protocol.h | 12 +-
net/mptcp/sched.c | 59 ++++---
.../testing/selftests/bpf/prog_tests/mptcp.c | 34 ++++
.../selftests/bpf/progs/mptcp_bpf_red.c | 36 ++++
5 files changed, 209 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_red.c
--
2.35.3
On Fri, 9 Sep 2022, Geliang Tang wrote:
> v11:
> - address to Mat's comments in v10.
> - rebase to export/20220908T063452
>
Hi Geliang -
Thanks for the updates to this series.
I get slightly different kernel splats than the CI. For example, here's my
kmsg output with the first test in mptcp_connect.sh:
[ 3102.670021] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
[ 3102.885448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
[ 3103.112575] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
[ 3103.463347] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
[ 3107.580236] ------------[ cut here ]------------
[ 3107.581325] WARNING: CPU: 2 PID: 1112 at net/mptcp/protocol.c:1306 mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1))
[ 3107.583192] Modules linked in:
[ 3107.585317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[ 3107.587250] RIP: 0010:mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1))
[ 3107.588421] Code: 0f 85 21 fd ff ff 48 c7 c2 a0 c1 1a 83 be 78 00 00 00 48 c7 c7 80 c7 1a 83 c6 05 9e 31 f2 01 01 e8 03 83 03 00 e9 fd fc ff ff <0f> 0b 0f b6 44 24 63 88 44 24 30 e9 dc f4 ff ff 8b 74 24 18 48 89
All code
========
0: 0f 85 21 fd ff ff jne 0xfffffffffffffd27
6: 48 c7 c2 a0 c1 1a 83 mov $0xffffffff831ac1a0,%rdx
d: be 78 00 00 00 mov $0x78,%esi
12: 48 c7 c7 80 c7 1a 83 mov $0xffffffff831ac780,%rdi
19: c6 05 9e 31 f2 01 01 movb $0x1,0x1f2319e(%rip) # 0x1f231be
20: e8 03 83 03 00 call 0x38328
25: e9 fd fc ff ff jmp 0xfffffffffffffd27
2a:* 0f 0b ud2 <-- trapping instruction
2c: 0f b6 44 24 63 movzbl 0x63(%rsp),%eax
31: 88 44 24 30 mov %al,0x30(%rsp)
35: e9 dc f4 ff ff jmp 0xfffffffffffff516
3a: 8b 74 24 18 mov 0x18(%rsp),%esi
3e: 48 rex.W
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 0f b6 44 24 63 movzbl 0x63(%rsp),%eax
7: 88 44 24 30 mov %al,0x30(%rsp)
b: e9 dc f4 ff ff jmp 0xfffffffffffff4ec
10: 8b 74 24 18 mov 0x18(%rsp),%esi
14: 48 rex.W
15: 89 .byte 0x89
[ 3107.592031] RSP: 0018:ffff888010f67910 EFLAGS: 00010202
[ 3107.593172] RAX: e585171f95821f87 RBX: ffff888113b04f00 RCX: ffffffff8260bb19
[ 3107.594458] RDX: 0000000000000001 RSI: ffffffff8260bac4 RDI: ffff88800fa84848
[ 3107.596326] RBP: e585171f95821f88 R08: 0000000000000000 R09: ffff88800fa848af
[ 3107.597665] R10: ffffed1001f50915 R11: 0000000000000000 R12: ffff888010f67a78
[ 3107.599903] R13: 0000000000000001 R14: ffff888107691800 R15: ffff8880357a0000
[ 3107.601366] FS: 00007f668bdb0740(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
[ 3107.603201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3107.604495] CR2: 00007ffe77890328 CR3: 0000000115318005 CR4: 0000000000370ee0
[ 3107.606212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3107.607854] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3107.610532] Call Trace:
[ 3107.611229] <TASK>
[ 3107.611746] ? mptcp_init_sock (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1230)
[ 3107.612654] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4252 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4319
/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271)
[ 3107.613890] ? __local_bh_enable_ip (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
/home/mjmartin/work/mptcp-nn/kernel/softirq.c:401)
[ 3107.614952] __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1569)
[ 3107.615948] ? mptcp_close (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1532)
[ 3107.616850] ? __sk_mem_raise_allocated (/home/mjmartin/work/mptcp-nn/net/core/sock.c:2810 /home/mjmartin/work/mptcp-nn/net/core/sock.c:2981)
[ 3107.617982] ? copy_page_from_iter (/home/mjmartin/work/mptcp-nn/lib/iov_iter.c:751 /home/mjmartin/work/mptcp-nn/lib/iov_iter.c:738)
[ 3107.618991] mptcp_sendmsg (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1789)
[ 3107.619896] ? __mptcp_push_pending (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1682)
[ 3107.621006] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:807)
[ 3107.622087] ? inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:816)
[ 3107.623082] sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:717 /home/mjmartin/work/mptcp-nn/net/socket.c:734)
[ 3107.624372] sock_write_iter (/home/mjmartin/work/mptcp-nn/net/socket.c:1109)
[ 3107.625361] ? sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:1092)
[ 3107.626261] ? file_has_perm (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:1724)
[ 3107.627185] ? selinux_file_permission (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3570 /home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3590)
[ 3107.628345] vfs_write (/home/mjmartin/work/mptcp-nn/./include/linux/fs.h:2187 /home/mjmartin/work/mptcp-nn/fs/read_write.c:491 /home/mjmartin/work/mptcp-nn/fs/read_write.c:578)
[ 3107.629171] ? __ia32_sys_pread64 (/home/mjmartin/work/mptcp-nn/fs/read_write.c:559)
[ 3107.630159] ? bit_wait_io_timeout (/home/mjmartin/work/mptcp-nn/kernel/locking/mutex.c:902)
[ 3107.631177] ? __fget_light (/home/mjmartin/work/mptcp-nn/fs/file.c:1007 (discriminator 1))
[ 3107.632005] ksys_write (/home/mjmartin/work/mptcp-nn/fs/read_write.c:631)
[ 3107.632957] ? __ia32_sys_read (/home/mjmartin/work/mptcp-nn/fs/read_write.c:621)
[ 3107.633859] ? lockdep_hardirqs_on_prepare (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:466 /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4320
/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271)
[ 3107.635041] ? syscall_enter_from_user_mode (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
/home/mjmartin/work/mptcp-nn/kernel/entry/common.c:109)
[ 3107.636398] do_syscall_64 (/home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:50 /home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:80)
[ 3107.637254] entry_SYSCALL_64_after_hwframe (/home/mjmartin/work/mptcp-nn/arch/x86/entry/entry_64.S:120)
[ 3107.638351] RIP: 0033:0x7f668beb48f7
[ 3107.639336] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
All code
========
0: 0f 00 (bad)
2: f7 d8 neg %eax
4: 64 89 02 mov %eax,%fs:(%rdx)
7: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
e: eb b7 jmp 0xffffffffffffffc7
10: 0f 1f 00 nopl (%rax)
13: f3 0f 1e fa endbr64
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 10 jne 0x33
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 51 ja 0x83
32: c3 ret
33: 48 83 ec 28 sub $0x28,%rsp
37: 48 89 54 24 18 mov %rdx,0x18(%rsp)
3c: 48 rex.W
3d: 89 .byte 0x89
3e: 74 24 je 0x64
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 51 ja 0x59
8: c3 ret
9: 48 83 ec 28 sub $0x28,%rsp
d: 48 89 54 24 18 mov %rdx,0x18(%rsp)
12: 48 rex.W
13: 89 .byte 0x89
14: 74 24 je 0x3a
[ 3107.643426] RSP: 002b:00007ffe778943c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 3107.645335] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f668beb48f7
[ 3107.647101] RDX: 0000000000002000 RSI: 00007ffe778943f0 RDI: 0000000000000003
[ 3107.648756] RBP: 0000000000000000 R08: 00007f668bfab214 R09: 00007f668bfab280
[ 3107.650320] R10: 00007f668bdba140 R11: 0000000000000246 R12: 0000000000001500
[ 3107.651883] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000002000
[ 3107.653497] </TASK>
[ 3107.654005] irq event stamp: 17761
[ 3107.654780] hardirqs last enabled at (17773): __up_console_sem (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45 (discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
(discriminator 1) /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:138 (discriminator 1) /home/mjmartin/work/mptcp-nn/kernel/printk/printk.c:264 (discriminator 1))
[ 3107.656996] hardirqs last disabled at (17788): __schedule (/home/mjmartin/work/mptcp-nn/kernel/sched/core.c:6393 (discriminator 1))
[ 3107.658933] softirqs last enabled at (17806): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650)
[ 3107.660848] softirqs last disabled at (17819): __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445 /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650)
[ 3107.662940] ---[ end trace 0000000000000000 ]---
Do you see anything similar in your testing? This was on a 4-cpu VM for
me.
Line 1306 of protocol.c that caused the splat is:
WARN_ON_ONCE(reuse_skb);
and it looks like that is expected to happen with a zero window and all
data acked. Sounds like a condition that wasn't expected with previous
schedulers (that only sent on one subflow at a time), but could happen
with redundant schedulers when msk->snd_una is updated by another subflow.
If you can't reproduce this, let me know and I can investigate some more.
It's reproducible on my system.
--
Mat Martineau
Intel
Hi Mat,
Sorry for the late reply.
On Mon, Sep 12, 2022 at 05:01:39PM -0700, Mat Martineau wrote:
> On Fri, 9 Sep 2022, Geliang Tang wrote:
>
> > v11:
> > - address to Mat's comments in v10.
> > - rebase to export/20220908T063452
> >
>
> Hi Geliang -
>
> Thanks for the updates to this series.
>
> I get slightly different kernel splats than the CI. For example, here's my
> kmsg output with the first test in mptcp_connect.sh:
>
> [ 3102.670021] IPv6: ADDRCONF(NETDEV_CHANGE): ns1eth2: link becomes ready
> [ 3102.885448] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth3: link becomes ready
> [ 3103.112575] IPv6: ADDRCONF(NETDEV_CHANGE): ns3eth4: link becomes ready
> [ 3103.463347] IPv6: ADDRCONF(NETDEV_CHANGE): ns2eth1: link becomes ready
> [ 3107.580236] ------------[ cut here ]------------
> [ 3107.581325] WARNING: CPU: 2 PID: 1112 at net/mptcp/protocol.c:1306
> mptcp_sendmsg_frag (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306
> (discriminator 1)) [ 3107.583192] Modules linked in:
> [ 3107.585317] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
> [ 3107.587250] RIP: 0010:mptcp_sendmsg_frag
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1306 (discriminator 1)) [
> 3107.588421] Code: 0f 85 21 fd ff ff 48 c7 c2 a0 c1 1a 83 be 78 00 00 00 48
> c7 c7 80 c7 1a 83 c6 05 9e 31 f2 01 01 e8 03 83 03 00 e9 fd fc ff ff <0f> 0b
> 0f b6 44 24 63 88 44 24 30 e9 dc f4 ff ff 8b 74 24 18 48 89
> All code
> ========
> 0: 0f 85 21 fd ff ff jne 0xfffffffffffffd27
> 6: 48 c7 c2 a0 c1 1a 83 mov $0xffffffff831ac1a0,%rdx
> d: be 78 00 00 00 mov $0x78,%esi
> 12: 48 c7 c7 80 c7 1a 83 mov $0xffffffff831ac780,%rdi
> 19: c6 05 9e 31 f2 01 01 movb $0x1,0x1f2319e(%rip) # 0x1f231be
> 20: e8 03 83 03 00 call 0x38328
> 25: e9 fd fc ff ff jmp 0xfffffffffffffd27
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: 0f b6 44 24 63 movzbl 0x63(%rsp),%eax
> 31: 88 44 24 30 mov %al,0x30(%rsp)
> 35: e9 dc f4 ff ff jmp 0xfffffffffffff516
> 3a: 8b 74 24 18 mov 0x18(%rsp),%esi
> 3e: 48 rex.W
> 3f: 89 .byte 0x89
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: 0f b6 44 24 63 movzbl 0x63(%rsp),%eax
> 7: 88 44 24 30 mov %al,0x30(%rsp)
> b: e9 dc f4 ff ff jmp 0xfffffffffffff4ec
> 10: 8b 74 24 18 mov 0x18(%rsp),%esi
> 14: 48 rex.W
> 15: 89 .byte 0x89
> [ 3107.592031] RSP: 0018:ffff888010f67910 EFLAGS: 00010202
> [ 3107.593172] RAX: e585171f95821f87 RBX: ffff888113b04f00 RCX: ffffffff8260bb19
> [ 3107.594458] RDX: 0000000000000001 RSI: ffffffff8260bac4 RDI: ffff88800fa84848
> [ 3107.596326] RBP: e585171f95821f88 R08: 0000000000000000 R09: ffff88800fa848af
> [ 3107.597665] R10: ffffed1001f50915 R11: 0000000000000000 R12: ffff888010f67a78
> [ 3107.599903] R13: 0000000000000001 R14: ffff888107691800 R15: ffff8880357a0000
> [ 3107.601366] FS: 00007f668bdb0740(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000
> [ 3107.603201] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3107.604495] CR2: 00007ffe77890328 CR3: 0000000115318005 CR4: 0000000000370ee0
> [ 3107.606212] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3107.607854] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3107.610532] Call Trace:
> [ 3107.611229] <TASK>
> [ 3107.611746] ? mptcp_init_sock
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1230) [ 3107.612654] ?
> lockdep_hardirqs_on_prepare
> (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4252
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4319
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271) [ 3107.613890] ?
> __local_bh_enable_ip
> (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
> /home/mjmartin/work/mptcp-nn/kernel/softirq.c:401) [ 3107.614952]
> __mptcp_push_pending
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1569) [ 3107.615948] ?
> mptcp_close (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1532) [
> 3107.616850] ? __sk_mem_raise_allocated
> (/home/mjmartin/work/mptcp-nn/net/core/sock.c:2810
> /home/mjmartin/work/mptcp-nn/net/core/sock.c:2981) [ 3107.617982] ?
> copy_page_from_iter (/home/mjmartin/work/mptcp-nn/lib/iov_iter.c:751
> /home/mjmartin/work/mptcp-nn/lib/iov_iter.c:738) [ 3107.618991]
> mptcp_sendmsg (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1789) [
> 3107.619896] ? __mptcp_push_pending
> (/home/mjmartin/work/mptcp-nn/net/mptcp/protocol.c:1682) [ 3107.621006] ?
> inet_send_prepare (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:807) [
> 3107.622087] ? inet_send_prepare
> (/home/mjmartin/work/mptcp-nn/net/ipv4/af_inet.c:816) [ 3107.623082]
> sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:717
> /home/mjmartin/work/mptcp-nn/net/socket.c:734) [ 3107.624372]
> sock_write_iter (/home/mjmartin/work/mptcp-nn/net/socket.c:1109) [
> 3107.625361] ? sock_sendmsg (/home/mjmartin/work/mptcp-nn/net/socket.c:1092)
> [ 3107.626261] ? file_has_perm
> (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:1724) [ 3107.627185]
> ? selinux_file_permission
> (/home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3570
> /home/mjmartin/work/mptcp-nn/security/selinux/hooks.c:3590) [ 3107.628345]
> vfs_write (/home/mjmartin/work/mptcp-nn/./include/linux/fs.h:2187
> /home/mjmartin/work/mptcp-nn/fs/read_write.c:491
> /home/mjmartin/work/mptcp-nn/fs/read_write.c:578) [ 3107.629171] ?
> __ia32_sys_pread64 (/home/mjmartin/work/mptcp-nn/fs/read_write.c:559) [
> 3107.630159] ? bit_wait_io_timeout
> (/home/mjmartin/work/mptcp-nn/kernel/locking/mutex.c:902) [ 3107.631177] ?
> __fget_light (/home/mjmartin/work/mptcp-nn/fs/file.c:1007 (discriminator 1))
> [ 3107.632005] ksys_write (/home/mjmartin/work/mptcp-nn/fs/read_write.c:631)
> [ 3107.632957] ? __ia32_sys_read
> (/home/mjmartin/work/mptcp-nn/fs/read_write.c:621) [ 3107.633859] ?
> lockdep_hardirqs_on_prepare
> (/home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:466
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4320
> /home/mjmartin/work/mptcp-nn/kernel/locking/lockdep.c:4271) [ 3107.635041] ?
> syscall_enter_from_user_mode
> (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
> /home/mjmartin/work/mptcp-nn/kernel/entry/common.c:109) [ 3107.636398]
> do_syscall_64 (/home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:50
> /home/mjmartin/work/mptcp-nn/arch/x86/entry/common.c:80) [ 3107.637254]
> entry_SYSCALL_64_after_hwframe
> (/home/mjmartin/work/mptcp-nn/arch/x86/entry/entry_64.S:120) [ 3107.638351]
> RIP: 0033:0x7f668beb48f7
> [ 3107.639336] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> All code
> ========
> 0: 0f 00 (bad)
> 2: f7 d8 neg %eax
> 4: 64 89 02 mov %eax,%fs:(%rdx)
> 7: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
> e: eb b7 jmp 0xffffffffffffffc7
> 10: 0f 1f 00 nopl (%rax)
> 13: f3 0f 1e fa endbr64
> 17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
> 1e: 00
> 1f: 85 c0 test %eax,%eax
> 21: 75 10 jne 0x33
> 23: b8 01 00 00 00 mov $0x1,%eax
> 28: 0f 05 syscall
> 2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
> 30: 77 51 ja 0x83
> 32: c3 ret
> 33: 48 83 ec 28 sub $0x28,%rsp
> 37: 48 89 54 24 18 mov %rdx,0x18(%rsp)
> 3c: 48 rex.W
> 3d: 89 .byte 0x89
> 3e: 74 24 je 0x64
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
> 6: 77 51 ja 0x59
> 8: c3 ret
> 9: 48 83 ec 28 sub $0x28,%rsp
> d: 48 89 54 24 18 mov %rdx,0x18(%rsp)
> 12: 48 rex.W
> 13: 89 .byte 0x89
> 14: 74 24 je 0x3a
> [ 3107.643426] RSP: 002b:00007ffe778943c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> [ 3107.645335] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f668beb48f7
> [ 3107.647101] RDX: 0000000000002000 RSI: 00007ffe778943f0 RDI: 0000000000000003
> [ 3107.648756] RBP: 0000000000000000 R08: 00007f668bfab214 R09: 00007f668bfab280
> [ 3107.650320] R10: 00007f668bdba140 R11: 0000000000000246 R12: 0000000000001500
> [ 3107.651883] R13: 0000000000002000 R14: 0000000000000000 R15: 0000000000002000
> [ 3107.653497] </TASK>
> [ 3107.654005] irq event stamp: 17761
> [ 3107.654780] hardirqs last enabled at (17773): __up_console_sem
> (/home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:45
> (discriminator 1)
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:80
> (discriminator 1)
> /home/mjmartin/work/mptcp-nn/./arch/x86/include/asm/irqflags.h:138
> (discriminator 1) /home/mjmartin/work/mptcp-nn/kernel/printk/printk.c:264
> (discriminator 1)) [ 3107.656996] hardirqs last disabled at (17788):
> __schedule (/home/mjmartin/work/mptcp-nn/kernel/sched/core.c:6393
> (discriminator 1)) [ 3107.658933] softirqs last enabled at (17806):
> __irq_exit_rcu (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445
> /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650) [ 3107.660848] softirqs
> last disabled at (17819): __irq_exit_rcu
> (/home/mjmartin/work/mptcp-nn/kernel/softirq.c:445
> /home/mjmartin/work/mptcp-nn/kernel/softirq.c:650) [ 3107.662940] ---[ end
> trace 0000000000000000 ]---
>
>
> Do you see anything similar in your testing? This was on a 4-cpu VM for me.
>
Yes, I got both this error (1306 WARN_ON_ONCE(reuse_skb)) and another error
(1010 WARN_ON_ONCE(!msk->recovery)) in my tests.
>
> Line 1306 of protocol.c that caused the splat is:
>
> WARN_ON_ONCE(reuse_skb);
>
> and it looks like that is expected to happen with a zero window and all data
> acked. Sounds like a condition that wasn't expected with previous schedulers
> (that only sent on one subflow at a time), but could happen with redundant
> schedulers when msk->snd_una is updated by another subflow.
>
The original code updates dfrag->already_sent immediately after invoking
mptcp_sendmsg_frag. But we delay updating dfrag->already_sent in our code
in this series after all frags are sent. Then mptcp_check_allowed_size()
will return 0 sometime in this case. We got (1306 WARN_ON_ONCE(reuse_skb))
error here:
1291 if (copy == 0) {
1292 u64 snd_una = READ_ONCE(msk->snd_una);
1293
1294 if (snd_una != msk->snd_nxt) {
1295 tcp_remove_empty_skb(ssk);
1296 return 0;
1297 }
1298
1299 zero_window_probe = true;
1300 data_seq = snd_una - 1;
1301 copy = 1;
1302
1303 /* all mptcp-level data is acked, no skbs should be present into the
1304 * ssk write queue
1305 */
1306 WARN_ON_ONCE(reuse_skb);
1307 }
The orignal code updates msk->first_pending immediately after every frag is
sent, but we delay updating it after all frags are sent. In this way, the
code will run to the position of (dfrag == msk->first_pending). We got
(1010 WARN_ON_ONCE(!msk->recovery)) error here:
1008 if (unlikely(dfrag == msk->first_pending)) {
1009 /* in recovery mode can see ack after the current snd head */
1010 if (WARN_ON_ONCE(!msk->recovery))
1011 break;
1012
1013 WRITE_ONCE(msk->first_pending, mptcp_send_next(sk));
1014 }
I'm trying to fix these two errors, but I haven't made much progress. So I
want to hear your suggestions.
Thanks,
-Geliang
> If you can't reproduce this, let me know and I can investigate some more.
> It's reproducible on my system.
>
> --
> Mat Martineau
> Intel
© 2016 - 2026 Red Hat, Inc.