:p
atchew
Login
Greg recently reported 2 patches that could not be applied without conflict in v6.6: - 4dabcdf58121 ("tcp: fix mptcp DSS corruption due to large pmtu xmit") - 5afca7e996c4 ("selftests: mptcp: join: test for prohibited MPC to port-based endp") Conflicts have been resolved, and documented in each patch. Note that there are two extra patches: - 8c6f6b4bb53a ("selftests: mptcp: join: change capture/checksum as bool"): to avoid some conflicts - "selftests: mptcp: remove duplicated variables": a dedicated patch for v6.6, to fix some previous backport issues. Geliang Tang (1): selftests: mptcp: join: change capture/checksum as bool Matthieu Baerts (NGI0) (1): selftests: mptcp: remove duplicated variables Paolo Abeni (2): tcp: fix mptcp DSS corruption due to large pmtu xmit selftests: mptcp: join: test for prohibited MPC to port-based endp net/ipv4/tcp_output.c | 4 +- .../testing/selftests/net/mptcp/mptcp_join.sh | 135 ++++++++++++------ .../testing/selftests/net/mptcp/mptcp_lib.sh | 11 -- 3 files changed, 96 insertions(+), 54 deletions(-) -- 2.45.2
From: Paolo Abeni <pabeni@redhat.com> commit 4dabcdf581217e60690467a37c956a5b8dbc6bd9 upstream. Syzkaller was able to trigger a DSS corruption: TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695 Modules linked in: CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191ac2a0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695 Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff RSP: 0018:ffffc90000006db8 EFLAGS: 00010246 RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00 RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0 RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8 R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000 R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5 FS: 000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> move_skbs_to_msk net/mptcp/protocol.c:811 [inline] mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854 subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490 tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283 tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237 tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915 tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5662 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6107 __napi_poll+0xcb/0x490 net/core/dev.c:6771 napi_poll net/core/dev.c:6840 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6962 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline] __dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451 dev_queue_xmit include/linux/netdevice.h:3094 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236 ip_local_out net/ipv4/ip_output.c:130 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline] tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline] tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752 __tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015 tcp_push_pending_frames include/net/tcp.h:2107 [inline] tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline] tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239 tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915 sk_backlog_rcv include/net/sock.h:1113 [inline] __release_sock+0x214/0x350 net/core/sock.c:3072 release_sock+0x61/0x1f0 net/core/sock.c:3626 mptcp_push_release net/mptcp/protocol.c:1486 [inline] __mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625 mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603 ___sys_sendmsg net/socket.c:2657 [inline] __sys_sendmsg+0x2aa/0x390 net/socket.c:2686 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb06e9317f9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005 RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000 R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406 R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003 </TASK> Additionally syzkaller provided a nice reproducer. The repro enables pmtu on the loopback device, leading to tcp_mtu_probe() generating very large probe packets. tcp_can_coalesce_send_queue_head() currently does not check for mptcp-level invariants, and allowed the creation of cross-DSS probes, leading to the mentioned corruption. Address the issue teaching tcp_can_coalesce_send_queue_head() about mptcp using the tcp_skb_can_collapse(), also reducing the code duplication. Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions") Cc: stable@vger.kernel.org Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflict in tcp_output.c, because the commit 65249feb6b3d ("net: add support for skbs with unreadable frags") is not in this version. This commit is linked to a new feature (Devmem TCP) and introduces a new condition which causes the conflicts. Resolving this is easy: we can ignore the missing new condition, and use tcp_skb_can_collapse() like in the original patch. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/ipv4/tcp_output.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index XXXXXXX..XXXXXXX 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -XXX,XX +XXX,XX @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) if (len <= skb->len) break; - if (unlikely(TCP_SKB_CB(skb)->eor) || - tcp_has_tx_tstamp(skb) || - !skb_pure_zcopy_same(skb, next)) + if (tcp_has_tx_tstamp(skb) || !tcp_skb_can_collapse(skb, next)) return false; len -= skb->len; -- 2.45.2
From: Geliang Tang <tanggeliang@kylinos.cn> commit 8c6f6b4bb53a904f922dfb90d566391d3feee32c upstream. To maintain consistency with other scripts, this patch changes vars 'capture' and 'checksum' as bool vars in mptcp_join. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240223-upstream-net-next-20240223-misc-improvements-v1-7-b6c8a10396bd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Stable-dep-of: 5afca7e996c4 ("selftests: mptcp: join: test for prohibited MPC to port-based endp") Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- .../testing/selftests/net/mptcp/mptcp_join.sh | 22 +++++++++---------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index XXXXXXX..XXXXXXX 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -XXX,XX +XXX,XX @@ iptables="iptables" ip6tables="ip6tables" timeout_poll=30 timeout_test=$((timeout_poll * 2 + 1)) -capture=0 -checksum=0 +capture=false +checksum=false ip_mptcp=0 check_invert=0 -validate_checksum=0 +validate_checksum=false init=0 evts_ns1="" evts_ns2="" @@ -XXX,XX +XXX,XX @@ init_partial() ip netns exec $netns sysctl -q net.mptcp.pm_type=0 2>/dev/null || true ip netns exec $netns sysctl -q net.ipv4.conf.all.rp_filter=0 ip netns exec $netns sysctl -q net.ipv4.conf.default.rp_filter=0 - if [ $checksum -eq 1 ]; then + if $checksum; then ip netns exec $netns sysctl -q net.mptcp.checksum_enabled=1 fi done @@ -XXX,XX +XXX,XX @@ reset_with_checksum() ip netns exec $ns1 sysctl -q net.mptcp.checksum_enabled=$ns1_enable ip netns exec $ns2 sysctl -q net.mptcp.checksum_enabled=$ns2_enable - validate_checksum=1 + validate_checksum=true } reset_with_allow_join_id0() @@ -XXX,XX +XXX,XX @@ reset_with_allow_join_id0() setup_fail_rules() { check_invert=1 - validate_checksum=1 + validate_checksum=true local i="$1" local ip="${2:-4}" local tables @@ -XXX,XX +XXX,XX @@ do_transfer() :> "$sout" :> "$capout" - if [ $capture -eq 1 ]; then + if $capture; then local capuser if [ -z $SUDO_USER ] ; then capuser="" @@ -XXX,XX +XXX,XX @@ do_transfer() wait $spid local rets=$? - if [ $capture -eq 1 ]; then + if $capture; then sleep 1 kill $cappid fi @@ -XXX,XX +XXX,XX @@ chk_join_nr() else print_ok fi - if [ $validate_checksum -eq 1 ]; then + if $validate_checksum; then chk_csum_nr $csum_ns1 $csum_ns2 chk_fail_nr $fail_nr $fail_nr chk_rst_nr $rst_nr $rst_nr @@ -XXX,XX +XXX,XX @@ while getopts "${all_tests_args}cCih" opt; do tests+=("${all_tests[${opt}]}") ;; c) - capture=1 + capture=true ;; C) - checksum=1 + checksum=true ;; i) ip_mptcp=1 -- 2.45.2
From: Paolo Abeni <pabeni@redhat.com> commit 5afca7e996c42aed1b4a42d4712817601ba42aff upstream. Explicitly verify that MPC connection attempts towards a port-based signal endpoint fail with a reset. Note that this new test is a bit different from the other ones, not using 'run_tests'. It is then needed to add the capture capability, and the picking the right port which have been extracted into three new helpers. The info about the capture can also be printed from a single point, which simplifies the exit paths in do_transfer(). The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID. Fixes: 1729cf186d8a ("mptcp: create the listening socket for new port") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241014-net-mptcp-mpc-port-endp-v2-2-7faea8e6b6ae@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mptcp_join.sh, because commit 0bd962dd86b2 ("selftests: mptcp: join: check CURRESTAB counters"), and commit 9e6a39ecb9a1 ("selftests: mptcp: export TEST_COUNTER variable") are linked to new features, not available in this version. Resolving the conflicts is easy, simply adding the new helpers before do_transfer(), and rename MPTCP_LIB_TEST_COUNTER to TEST_COUNT that was used before. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- .../testing/selftests/net/mptcp/mptcp_join.sh | 117 +++++++++++++----- 1 file changed, 86 insertions(+), 31 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index XXXXXXX..XXXXXXX 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -XXX,XX +XXX,XX @@ tmpfile="" cout="" err="" capout="" +cappid="" ns1="" ns2="" ksft_skip=4 @@ -XXX,XX +XXX,XX @@ pm_nl_set_endpoint() fi } +cond_start_capture() +{ + local ns="$1" + + :> "$capout" + + if $capture; then + local capuser capfile + if [ -z $SUDO_USER ]; then + capuser="" + else + capuser="-Z $SUDO_USER" + fi + + capfile=$(printf "mp_join-%02u-%s.pcap" "$TEST_COUNT" "$ns") + + echo "Capturing traffic for test $TEST_COUNT into $capfile" + ip netns exec "$ns" tcpdump -i any -s 65535 -B 32768 $capuser -w "$capfile" > "$capout" 2>&1 & + cappid=$! + + sleep 1 + fi +} + +cond_stop_capture() +{ + if $capture; then + sleep 1 + kill $cappid + cat "$capout" + fi +} + +get_port() +{ + echo "$((10000 + TEST_COUNT - 1))" +} + do_transfer() { local listener_ns="$1" @@ -XXX,XX +XXX,XX @@ do_transfer() local cl_proto="$3" local srv_proto="$4" local connect_addr="$5" + local port - local port=$((10000 + TEST_COUNT - 1)) - local cappid local FAILING_LINKS=${FAILING_LINKS:-""} local fastclose=${fastclose:-""} local speed=${speed:-"fast"} + port=$(get_port) :> "$cout" :> "$sout" - :> "$capout" - if $capture; then - local capuser - if [ -z $SUDO_USER ] ; then - capuser="" - else - capuser="-Z $SUDO_USER" - fi - - capfile=$(printf "mp_join-%02u-%s.pcap" "$TEST_COUNT" "${listener_ns}") - - echo "Capturing traffic for test $TEST_COUNT into $capfile" - ip netns exec ${listener_ns} tcpdump -i any -s 65535 -B 32768 $capuser -w $capfile > "$capout" 2>&1 & - cappid=$! - - sleep 1 - fi + cond_start_capture ${listener_ns} NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \ nstat -n @@ -XXX,XX +XXX,XX @@ do_transfer() wait $spid local rets=$? - if $capture; then - sleep 1 - kill $cappid - fi + cond_stop_capture NSTAT_HISTORY=/tmp/${listener_ns}.nstat ip netns exec ${listener_ns} \ nstat | grep Tcp > /tmp/${listener_ns}.out @@ -XXX,XX +XXX,XX @@ do_transfer() ip netns exec ${connector_ns} ss -Menita 1>&2 -o "dport = :$port" cat /tmp/${connector_ns}.out - cat "$capout" return 1 fi @@ -XXX,XX +XXX,XX @@ do_transfer() fi rets=$? - if [ $retc -eq 0 ] && [ $rets -eq 0 ];then - cat "$capout" - return 0 - fi - - cat "$capout" - return 1 + [ $retc -eq 0 ] && [ $rets -eq 0 ] } make_file() @@ -XXX,XX +XXX,XX @@ verify_listener_events() fail_test "$e_type:$type $e_family:$family $e_saddr:$saddr $e_sport:$sport" } +chk_mpc_endp_attempt() +{ + local retl=$1 + local attempts=$2 + + print_check "Connect" + + if [ ${retl} = 124 ]; then + fail_test "timeout on connect" + elif [ ${retl} = 0 ]; then + fail_test "unexpected successful connect" + else + print_ok + + print_check "Attempts" + count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtMPCapableEndpAttempt") + if [ -z "$count" ]; then + print_skip + elif [ "$count" != "$attempts" ]; then + fail_test "got ${count} MPC attempt[s] on port-based endpoint, expected ${attempts}" + else + print_ok + fi + fi +} + add_addr_ports_tests() { # signal address with port @@ -XXX,XX +XXX,XX @@ add_addr_ports_tests() chk_join_nr 2 2 2 chk_add_nr 2 2 2 fi + + if reset "port-based signal endpoint must not accept mpc"; then + local port retl count + port=$(get_port) + + cond_start_capture ${ns1} + pm_nl_add_endpoint ${ns1} 10.0.2.1 flags signal port ${port} + mptcp_lib_wait_local_port_listen ${ns1} ${port} + + timeout 1 ip netns exec ${ns2} \ + ./mptcp_connect -t ${timeout_poll} -p $port -s MPTCP 10.0.2.1 >/dev/null 2>&1 + retl=$? + cond_stop_capture + + chk_mpc_endp_attempt ${retl} 1 + fi } syncookies_tests() -- 2.45.2
A few week ago, there were some backport issues in MPTCP selftests, because some patches have been applied twice, but with versions handling conflicts differently [1]. Patches fixing these issues have been sent [2] and applied, but it looks like quilt was still confused with the removal of some patches, and commit a417ef47a665 ("selftests: mptcp: join: validate event numbers") duplicated some variables, not present in the original patch [3]. Anyway, Bash was complaining, but that was not causing any tests to fail. Also, that's easy to fix by simply removing the duplicated ones. Link: https://lore.kernel.org/fc21db4a-508d-41db-aa45-e3bc06d18ce7@kernel.org [1] Link: https://lore.kernel.org/20240905144306.1192409-5-matttbe@kernel.org [2] Link: https://lore.kernel.org/20240905144306.1192409-7-matttbe@kernel.org [3] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- tools/testing/selftests/net/mptcp/mptcp_lib.sh | 11 ----------- 1 file changed, 11 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_lib.sh b/tools/testing/selftests/net/mptcp/mptcp_lib.sh index XXXXXXX..XXXXXXX 100644 --- a/tools/testing/selftests/net/mptcp/mptcp_lib.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_lib.sh @@ -XXX,XX +XXX,XX @@ readonly KSFT_SKIP=4 # shellcheck disable=SC2155 # declare and assign separately readonly KSFT_TEST="${MPTCP_LIB_KSFT_TEST:-$(basename "${0}" .sh)}" -# These variables are used in some selftests, read-only -declare -rx MPTCP_LIB_EVENT_ANNOUNCED=6 # MPTCP_EVENT_ANNOUNCED -declare -rx MPTCP_LIB_EVENT_REMOVED=7 # MPTCP_EVENT_REMOVED -declare -rx MPTCP_LIB_EVENT_SUB_ESTABLISHED=10 # MPTCP_EVENT_SUB_ESTABLISHED -declare -rx MPTCP_LIB_EVENT_SUB_CLOSED=11 # MPTCP_EVENT_SUB_CLOSED -declare -rx MPTCP_LIB_EVENT_LISTENER_CREATED=15 # MPTCP_EVENT_LISTENER_CREATED -declare -rx MPTCP_LIB_EVENT_LISTENER_CLOSED=16 # MPTCP_EVENT_LISTENER_CLOSED - -declare -rx MPTCP_LIB_AF_INET=2 -declare -rx MPTCP_LIB_AF_INET6=10 - # These variables are used in some selftests, read-only declare -rx MPTCP_LIB_EVENT_CREATED=1 # MPTCP_EVENT_CREATED declare -rx MPTCP_LIB_EVENT_ESTABLISHED=2 # MPTCP_EVENT_ESTABLISHED -- 2.45.2
Greg recently reported 2 patches that could not be applied without conflicts in v5.10: - e32d262c89e2 ("mptcp: handle consistently DSS corruption") - 4dabcdf58121 ("tcp: fix mptcp DSS corruption due to large pmtu xmit") Conflicts have been resolved, and documented in each patch. One extra commit has been backported, to support allow_infinite_fallback which is used by one commit from the list above: - 0530020a7c8f ("mptcp: track and update contiguous data status") Geliang Tang (1): mptcp: track and update contiguous data status Paolo Abeni (2): mptcp: handle consistently DSS corruption tcp: fix mptcp DSS corruption due to large pmtu xmit net/ipv4/tcp_output.c | 2 +- net/mptcp/mib.c | 2 ++ net/mptcp/mib.h | 2 ++ net/mptcp/protocol.c | 26 ++++++++++++++++++++++---- net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 3 ++- 6 files changed, 30 insertions(+), 6 deletions(-) -- 2.45.2
From: Geliang Tang <geliang.tang@suse.com> commit 0530020a7c8f2204e784f0dbdc882bbd961fdbde upstream. This patch adds a new member allow_infinite_fallback in mptcp_sock, which is initialized to 'true' when the connection begins and is set to 'false' on any retransmit or successful MP_JOIN. Only do infinite mapping fallback if there is a single subflow AND there have been no retransmissions AND there have never been any MP_JOINs. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Stable-dep-of: e32d262c89e2 ("mptcp: handle consistently DSS corruption") [ Conflicts in protocol.c, because commit 3e5014909b56 ("mptcp: cleanup MPJ subflow list handling") is not in this version. This commit is linked to a new feature, changing the context around. The new line can still be added at the same place. Conflicts in protocol.h, because commit 4f6e14bd19d6 ("mptcp: support TCP_CORK and TCP_NODELAY") is not in this version. This commit is linked to a new feature, changing the context around. The new line can still be added at the same place. Conflicts in subflow.c, because commit 0348c690ed37 ("mptcp: add the fallback check") is not in this version. This commit is linked to a new feature, changing the context around. The new line can still be added at the same place. Extra conflicts in v5.10, because the context has been changed. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/protocol.c | 6 +++++- net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 1 + 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -XXX,XX +XXX,XX @@ static void mptcp_worker(struct work_struct *work) if (!mptcp_ext_cache_refill(msk)) break; } - if (copied) + if (copied) { tcp_push(ssk, msg.msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal); + WRITE_ONCE(msk->allow_infinite_fallback, false); + } dfrag->data_seq = orig_write_seq; dfrag->offset = orig_offset; @@ -XXX,XX +XXX,XX @@ static int __mptcp_init_sock(struct sock *sk) msk->first = NULL; inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss; + WRITE_ONCE(msk->allow_infinite_fallback, true); mptcp_pm_data_init(msk); @@ -XXX,XX +XXX,XX @@ bool mptcp_finish_join(struct sock *sk) if (parent_sock && !sk->sk_socket) mptcp_sock_graft(sk, parent_sock); subflow->map_seq = READ_ONCE(msk->ack_seq); + WRITE_ONCE(msk->allow_infinite_fallback, false); return true; } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -XXX,XX +XXX,XX @@ struct mptcp_sock { bool rcv_data_fin; bool snd_data_fin_enable; bool use_64bit_ack; /* Set when we received a 64-bit DSN */ + bool allow_infinite_fallback; spinlock_t join_list_lock; struct work_struct work; struct sk_buff *ooo_last_skb; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -XXX,XX +XXX,XX @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, list_add_tail(&subflow->node, &msk->join_list); spin_unlock_bh(&msk->join_list_lock); + WRITE_ONCE(msk->allow_infinite_fallback, false); return err; failed: -- 2.45.2
From: Paolo Abeni <pabeni@redhat.com> commit e32d262c89e2b22cb0640223f953b548617ed8a6 upstream. Bugged peer implementation can send corrupted DSS options, consistently hitting a few warning in the data path. Use DEBUG_NET assertions, to avoid the splat on some builds and handle consistently the error, dumping related MIBs and performing fallback and/or reset according to the subflow type. Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue") Cc: stable@vger.kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-1-c6fb8e93e551@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflicts in mib.[ch], because commit 104125b82e5c ("mptcp: add mib for infinite map sending") is linked to a new feature, not available in this version. Resolving the conflicts is easy, simply adding the new lines declaring the new "DSS corruptions" MIB entries. Also removed in protocol.c and subflow.c all DEBUG_NET_WARN_ON_ONCE because they are not defined in this version: enough with the MIB counters that have been added in this commit. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/mib.c | 2 ++ net/mptcp/mib.h | 2 ++ net/mptcp/protocol.c | 20 +++++++++++++++++--- net/mptcp/subflow.c | 2 +- 4 files changed, 22 insertions(+), 4 deletions(-) diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/mib.c +++ b/net/mptcp/mib.c @@ -XXX,XX +XXX,XX @@ static const struct snmp_mib mptcp_snmp_list[] = { SNMP_MIB_ITEM("MPJoinAckRx", MPTCP_MIB_JOINACKRX), SNMP_MIB_ITEM("MPJoinAckHMacFailure", MPTCP_MIB_JOINACKMAC), SNMP_MIB_ITEM("DSSNotMatching", MPTCP_MIB_DSSNOMATCH), + SNMP_MIB_ITEM("DSSCorruptionFallback", MPTCP_MIB_DSSCORRUPTIONFALLBACK), + SNMP_MIB_ITEM("DSSCorruptionReset", MPTCP_MIB_DSSCORRUPTIONRESET), SNMP_MIB_ITEM("InfiniteMapRx", MPTCP_MIB_INFINITEMAPRX), SNMP_MIB_ITEM("OFOQueueTail", MPTCP_MIB_OFOQUEUETAIL), SNMP_MIB_ITEM("OFOQueue", MPTCP_MIB_OFOQUEUE), diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/mib.h +++ b/net/mptcp/mib.h @@ -XXX,XX +XXX,XX @@ enum linux_mptcp_mib_field { MPTCP_MIB_JOINACKRX, /* Received an ACK + MP_JOIN */ MPTCP_MIB_JOINACKMAC, /* HMAC was wrong on ACK + MP_JOIN */ MPTCP_MIB_DSSNOMATCH, /* Received a new mapping that did not match the previous one */ + MPTCP_MIB_DSSCORRUPTIONFALLBACK,/* DSS corruption detected, fallback */ + MPTCP_MIB_DSSCORRUPTIONRESET, /* DSS corruption detected, MPJ subflow reset */ MPTCP_MIB_INFINITEMAPRX, /* Received an infinite mapping */ MPTCP_MIB_OFOQUEUETAIL, /* Segments inserted into OoO queue tail */ MPTCP_MIB_OFOQUEUE, /* Segments inserted into OoO queue */ diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -XXX,XX +XXX,XX @@ static void mptcp_check_data_fin(struct sock *sk) } } +static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk) +{ + if (READ_ONCE(msk->allow_infinite_fallback)) { + MPTCP_INC_STATS(sock_net(ssk), + MPTCP_MIB_DSSCORRUPTIONFALLBACK); + mptcp_do_fallback(ssk); + } else { + MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET); + mptcp_subflow_reset(ssk); + } +} + static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, struct sock *ssk, unsigned int *bytes) @@ -XXX,XX +XXX,XX @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, moved += len; seq += len; - if (WARN_ON_ONCE(map_remaining < len)) - break; + if (unlikely(map_remaining < len)) + mptcp_dss_corruption(msk, ssk); } else { - WARN_ON_ONCE(!fin); + if (unlikely(!fin)) + mptcp_dss_corruption(msk, ssk); + sk_eat_skb(ssk, skb); done = true; } diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -XXX,XX +XXX,XX @@ static bool skb_is_fully_mapped(struct sock *ssk, struct sk_buff *skb) unsigned int skb_consumed; skb_consumed = tcp_sk(ssk)->copied_seq - TCP_SKB_CB(skb)->seq; - if (WARN_ON_ONCE(skb_consumed >= skb->len)) + if (unlikely(skb_consumed >= skb->len)) return true; return skb->len - skb_consumed <= subflow->map_data_len - -- 2.45.2
From: Paolo Abeni <pabeni@redhat.com> commit 4dabcdf581217e60690467a37c956a5b8dbc6bd9 upstream. Syzkaller was able to trigger a DSS corruption: TCP: request_sock_subflow_v4: Possible SYN flooding on port [::]:20002. Sending cookies. ------------[ cut here ]------------ WARNING: CPU: 0 PID: 5227 at net/mptcp/protocol.c:695 __mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695 Modules linked in: CPU: 0 UID: 0 PID: 5227 Comm: syz-executor350 Not tainted 6.11.0-syzkaller-08829-gaf9c191ac2a0 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 RIP: 0010:__mptcp_move_skbs_from_subflow+0x20a9/0x21f0 net/mptcp/protocol.c:695 Code: 0f b6 dc 31 ff 89 de e8 b5 dd ea f5 89 d8 48 81 c4 50 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 98 da ea f5 90 <0f> 0b 90 e9 47 ff ff ff e8 8a da ea f5 90 0f 0b 90 e9 99 e0 ff ff RSP: 0018:ffffc90000006db8 EFLAGS: 00010246 RAX: ffffffff8ba9df18 RBX: 00000000000055f0 RCX: ffff888030023c00 RDX: 0000000000000100 RSI: 00000000000081e5 RDI: 00000000000055f0 RBP: 1ffff110062bf1ae R08: ffffffff8ba9cf12 R09: 1ffff110062bf1b8 R10: dffffc0000000000 R11: ffffed10062bf1b9 R12: 0000000000000000 R13: dffffc0000000000 R14: 00000000700cec61 R15: 00000000000081e5 FS: 000055556679c380(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020287000 CR3: 0000000077892000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> move_skbs_to_msk net/mptcp/protocol.c:811 [inline] mptcp_data_ready+0x29c/0xa90 net/mptcp/protocol.c:854 subflow_data_ready+0x34a/0x920 net/mptcp/subflow.c:1490 tcp_data_queue+0x20fd/0x76c0 net/ipv4/tcp_input.c:5283 tcp_rcv_established+0xfba/0x2020 net/ipv4/tcp_input.c:6237 tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915 tcp_v4_rcv+0x2dc0/0x37f0 net/ipv4/tcp_ipv4.c:2350 ip_protocol_deliver_rcu+0x22e/0x440 net/ipv4/ip_input.c:205 ip_local_deliver_finish+0x341/0x5f0 net/ipv4/ip_input.c:233 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 NF_HOOK+0x3a4/0x450 include/linux/netfilter.h:314 __netif_receive_skb_one_core net/core/dev.c:5662 [inline] __netif_receive_skb+0x2bf/0x650 net/core/dev.c:5775 process_backlog+0x662/0x15b0 net/core/dev.c:6107 __napi_poll+0xcb/0x490 net/core/dev.c:6771 napi_poll net/core/dev.c:6840 [inline] net_rx_action+0x89b/0x1240 net/core/dev.c:6962 handle_softirqs+0x2c5/0x980 kernel/softirq.c:554 do_softirq+0x11b/0x1e0 kernel/softirq.c:455 </IRQ> <TASK> __local_bh_enable_ip+0x1bb/0x200 kernel/softirq.c:382 local_bh_enable include/linux/bottom_half.h:33 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline] __dev_queue_xmit+0x1764/0x3e80 net/core/dev.c:4451 dev_queue_xmit include/linux/netdevice.h:3094 [inline] neigh_hh_output include/net/neighbour.h:526 [inline] neigh_output include/net/neighbour.h:540 [inline] ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236 ip_local_out net/ipv4/ip_output.c:130 [inline] __ip_queue_xmit+0x118c/0x1b80 net/ipv4/ip_output.c:536 __tcp_transmit_skb+0x2544/0x3b30 net/ipv4/tcp_output.c:1466 tcp_transmit_skb net/ipv4/tcp_output.c:1484 [inline] tcp_mtu_probe net/ipv4/tcp_output.c:2547 [inline] tcp_write_xmit+0x641d/0x6bf0 net/ipv4/tcp_output.c:2752 __tcp_push_pending_frames+0x9b/0x360 net/ipv4/tcp_output.c:3015 tcp_push_pending_frames include/net/tcp.h:2107 [inline] tcp_data_snd_check net/ipv4/tcp_input.c:5714 [inline] tcp_rcv_established+0x1026/0x2020 net/ipv4/tcp_input.c:6239 tcp_v4_do_rcv+0x96d/0xc70 net/ipv4/tcp_ipv4.c:1915 sk_backlog_rcv include/net/sock.h:1113 [inline] __release_sock+0x214/0x350 net/core/sock.c:3072 release_sock+0x61/0x1f0 net/core/sock.c:3626 mptcp_push_release net/mptcp/protocol.c:1486 [inline] __mptcp_push_pending+0x6b5/0x9f0 net/mptcp/protocol.c:1625 mptcp_sendmsg+0x10bb/0x1b10 net/mptcp/protocol.c:1903 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x52a/0x7e0 net/socket.c:2603 ___sys_sendmsg net/socket.c:2657 [inline] __sys_sendmsg+0x2aa/0x390 net/socket.c:2686 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb06e9317f9 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffe2cfd4f98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fb06e97f468 RCX: 00007fb06e9317f9 RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000005 RBP: 00007fb06e97f446 R08: 0000555500000000 R09: 0000555500000000 R10: 0000555500000000 R11: 0000000000000246 R12: 00007fb06e97f406 R13: 0000000000000001 R14: 00007ffe2cfd4fe0 R15: 0000000000000003 </TASK> Additionally syzkaller provided a nice reproducer. The repro enables pmtu on the loopback device, leading to tcp_mtu_probe() generating very large probe packets. tcp_can_coalesce_send_queue_head() currently does not check for mptcp-level invariants, and allowed the creation of cross-DSS probes, leading to the mentioned corruption. Address the issue teaching tcp_can_coalesce_send_queue_head() about mptcp using the tcp_skb_can_collapse(), also reducing the code duplication. Fixes: 85712484110d ("tcp: coalesce/collapse must respect MPTCP extensions") Cc: stable@vger.kernel.org Reported-by: syzbot+d1bff73460e33101f0e7@syzkaller.appspotmail.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/513 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20241008-net-mptcp-fallback-fixes-v1-2-c6fb8e93e551@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> [ Conflict in tcp_output.c, because commit 65249feb6b3d ("net: add support for skbs with unreadable frags"), and commit 9b65b17db723 ("net: avoid double accounting for pure zerocopy skbs") are not in this version. These commits are linked to new features and introduce new conditions which cause the conflicts. Resolving this is easy: we can ignore the missing new condition, and use tcp_skb_can_collapse() like in the original patch. ] Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/ipv4/tcp_output.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index XXXXXXX..XXXXXXX 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -XXX,XX +XXX,XX @@ static bool tcp_can_coalesce_send_queue_head(struct sock *sk, int len) if (len <= skb->len) break; - if (unlikely(TCP_SKB_CB(skb)->eor) || tcp_has_tx_tstamp(skb)) + if (tcp_has_tx_tstamp(skb) || !tcp_skb_can_collapse(skb, next)) return false; len -= skb->len; -- 2.45.2