From nobody Wed Sep 17 18:19:16 2025
From: "Matthieu Baerts (NGI0)"
To: mptcp@lists.linux.dev, stable@vger.kernel.org, gregkh@linuxfoundation.org
Cc: Paolo Abeni, sashal@kernel.org, Matthieu Baerts, syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com, Jakub Kicinski
Subject: [PATCH 6.1.y 1/3] mptcp: make fallback action and fallback decision atomic
Date: Mon, 28 Jul 2025 15:29:21 +0200
Message-ID: <20250728132919.3904847-6-matttbe@kernel.org>
In-Reply-To: <20250728132919.3904847-5-matttbe@kernel.org>
References: <20250728132919.3904847-5-matttbe@kernel.org>

From: Paolo Abeni

commit f8a1d9b18c5efc76784f5a326e905f641f839894 upstream.
Syzkaller reported the following splat:

  WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 __mptcp_do_fallback net/mptcp/protocol.h:1223 [inline]
  WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_do_fallback net/mptcp/protocol.h:1244 [inline]
  WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 check_fully_established net/mptcp/options.c:982 [inline]
  WARNING: CPU: 1 PID: 7704 at net/mptcp/protocol.h:1223 mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153
  Modules linked in:
  CPU: 1 UID: 0 PID: 7704 Comm: syz.3.1419 Not tainted 6.16.0-rc3-gbd5ce2324dba #20 PREEMPT(voluntary)
  Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  RIP: 0010:__mptcp_do_fallback net/mptcp/protocol.h:1223 [inline]
  RIP: 0010:mptcp_do_fallback net/mptcp/protocol.h:1244 [inline]
  RIP: 0010:check_fully_established net/mptcp/options.c:982 [inline]
  RIP: 0010:mptcp_incoming_options+0x21a8/0x2510 net/mptcp/options.c:1153
  Code: 24 18 e8 bb 2a 00 fd e9 1b df ff ff e8 b1 21 0f 00 e8 ec 5f c4 fc 44 0f b7 ac 24 b0 00 00 00 e9 54 f1 ff ff e8 d9 5f c4 fc 90 <0f> 0b 90 e9 b8 f4 ff ff e8 8b 2a 00 fd e9 8d e6 ff ff e8 81 2a 00
  RSP: 0018:ffff8880a3f08448 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff8880180a8000 RCX: ffffffff84afcf45
  RDX: ffff888090223700 RSI: ffffffff84afdaa7 RDI: 0000000000000001
  RBP: ffff888017955780 R08: 0000000000000001 R09: 0000000000000000
  R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
  R13: ffff8880180a8910 R14: ffff8880a3e9d058 R15: 0000000000000000
  FS:  00005555791b8500(0000) GS:ffff88811c495000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000110c2800b7 CR3: 0000000058e44000 CR4: 0000000000350ef0
  Call Trace:
   tcp_reset+0x26f/0x2b0 net/ipv4/tcp_input.c:4432
   tcp_validate_incoming+0x1057/0x1b60 net/ipv4/tcp_input.c:5975
   tcp_rcv_established+0x5b5/0x21f0 net/ipv4/tcp_input.c:6166
   tcp_v4_do_rcv+0x5dc/0xa70 net/ipv4/tcp_ipv4.c:1925
   tcp_v4_rcv+0x3473/0x44a0 net/ipv4/tcp_ipv4.c:2363
   ip_protocol_deliver_rcu+0xba/0x480 net/ipv4/ip_input.c:205
   ip_local_deliver_finish+0x2f1/0x500 net/ipv4/ip_input.c:233
   NF_HOOK include/linux/netfilter.h:317 [inline]
   NF_HOOK include/linux/netfilter.h:311 [inline]
   ip_local_deliver+0x1be/0x560 net/ipv4/ip_input.c:254
   dst_input include/net/dst.h:469 [inline]
   ip_rcv_finish net/ipv4/ip_input.c:447 [inline]
   NF_HOOK include/linux/netfilter.h:317 [inline]
   NF_HOOK include/linux/netfilter.h:311 [inline]
   ip_rcv+0x514/0x810 net/ipv4/ip_input.c:567
   __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5975
   __netif_receive_skb+0x1f/0x120 net/core/dev.c:6088
   process_backlog+0x301/0x1360 net/core/dev.c:6440
   __napi_poll.constprop.0+0xba/0x550 net/core/dev.c:7453
   napi_poll net/core/dev.c:7517 [inline]
   net_rx_action+0xb44/0x1010 net/core/dev.c:7644
   handle_softirqs+0x1d0/0x770 kernel/softirq.c:579
   do_softirq+0x3f/0x90 kernel/softirq.c:480
   __local_bh_enable_ip+0xed/0x110 kernel/softirq.c:407
   local_bh_enable include/linux/bottom_half.h:33 [inline]
   inet_csk_listen_stop+0x2c5/0x1070 net/ipv4/inet_connection_sock.c:1524
   mptcp_check_listen_stop.part.0+0x1cc/0x220 net/mptcp/protocol.c:2985
   mptcp_check_listen_stop net/mptcp/mib.h:118 [inline]
   __mptcp_close+0x9b9/0xbd0 net/mptcp/protocol.c:3000
   mptcp_close+0x2f/0x140 net/mptcp/protocol.c:3066
   inet_release+0xed/0x200 net/ipv4/af_inet.c:435
   inet6_release+0x4f/0x70 net/ipv6/af_inet6.c:487
   __sock_release+0xb3/0x270 net/socket.c:649
   sock_close+0x1c/0x30 net/socket.c:1439
   __fput+0x402/0xb70 fs/file_table.c:465
   task_work_run+0x150/0x240 kernel/task_work.c:227
   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
   exit_to_user_mode_loop+0xd4/0xe0 kernel/entry/common.c:114
   exit_to_user_mode_prepare include/linux/entry-common.h:330 [inline]
   syscall_exit_to_user_mode_work include/linux/entry-common.h:414 [inline]
   syscall_exit_to_user_mode include/linux/entry-common.h:449 [inline]
   do_syscall_64+0x245/0x360 arch/x86/entry/syscall_64.c:100
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  RIP: 0033:0x7fc92f8a36ad
  Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
  RSP: 002b:00007ffcf52802d8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4
  RAX: 0000000000000000 RBX: 00007ffcf52803a8 RCX: 00007fc92f8a36ad
  RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003
  RBP: 00007fc92fae7ba0 R08: 0000000000000001 R09: 0000002800000000
  R10: 00007fc92f700000 R11: 0000000000000246 R12: 00007fc92fae5fac
  R13: 00007fc92fae5fa0 R14: 0000000000026d00 R15: 0000000000026c51
  irq event stamp: 4068
  hardirqs last  enabled at (4076): [] __up_console_sem+0x76/0x80 kernel/printk/printk.c:344
  hardirqs last disabled at (4085): [] __up_console_sem+0x5b/0x80 kernel/printk/printk.c:342
  softirqs last  enabled at (3096): [] local_bh_enable include/linux/bottom_half.h:33 [inline]
  softirqs last  enabled at (3096): [] inet_csk_listen_stop+0x2c0/0x1070 net/ipv4/inet_connection_sock.c:1524
  softirqs last disabled at (3097): [] do_softirq+0x3f/0x90 kernel/softirq.c:480

We need to track the 'fallback is possible' condition and the fallback
status separately. Because of that, a few races are possible between the
check and the actual fallback action.

Add a spinlock to protect the fallback-related information, and use it
to close all the related races. While at it, also remove the too-early
clearing of allow_infinite_fallback in __mptcp_subflow_connect(). The
field will be correctly cleared by subflow_finish_connect() if and when
the connection completes successfully.

If fallback is not possible, reset the current subflow, as per the RFC.

The fallback operation can now fail, so its return value must be
checked. Rename the helper accordingly: mptcp_do_fallback() becomes
mptcp_try_fallback().
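The check-then-act race that the new spinlock closes can be shown outside the kernel. The following is a hypothetical userspace model, not the kernel code:

- `struct fallback_state`, `try_fallback()` and `disallow_fallback()` are illustrative names.
- A pthread mutex stands in for `msk->fallback_lock`.
- A plain bool stands in for the `MPTCP_FALLBACK_DONE` bit. The real `__mptcp_try_fallback()` tests that bit before taking the lock.

In the model, the decision and the state change happen under one lock. A path that disables fallback (a join or a retransmission) therefore cannot slip in between them.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the MPTCP fallback state (not kernel API). */
struct fallback_state {
	pthread_mutex_t lock;		/* stands in for msk->fallback_lock */
	bool allow_infinite_fallback;	/* cleared once fallback is impossible */
	bool fallback_done;		/* stands in for MPTCP_FALLBACK_DONE */
};

/* Invariant: fallback never completes after it has been disallowed. */
static void check_invariant(const struct fallback_state *st)
{
	assert(!(st->fallback_done && !st->allow_infinite_fallback));
}

static void fallback_state_init(struct fallback_state *st)
{
	pthread_mutex_init(&st->lock, NULL);
	st->allow_infinite_fallback = true;
	st->fallback_done = false;
}

/* Decide and perform the fallback atomically. Returns false when fallback
 * is no longer allowed; the caller is then expected to reset the subflow.
 */
static bool try_fallback(struct fallback_state *st)
{
	bool ok;

	pthread_mutex_lock(&st->lock);
	ok = st->fallback_done || st->allow_infinite_fallback;
	if (ok)
		st->fallback_done = true;
	check_invariant(st);
	pthread_mutex_unlock(&st->lock);
	return ok;
}

/* Paths such as a subflow join or a retransmission make fallback
 * impossible, but only if fallback has not already happened.
 */
static bool disallow_fallback(struct fallback_state *st)
{
	bool ok;

	pthread_mutex_lock(&st->lock);
	ok = !st->fallback_done;
	if (ok)
		st->allow_infinite_fallback = false;
	check_invariant(st);
	pthread_mutex_unlock(&st->lock);
	return ok;
}
```

Without the lock, `disallow_fallback()` could run between the test and the update in `try_fallback()`. That is roughly the kind of interleaving behind the `WARN_ON_ONCE()` that this patch removes from `__mptcp_do_fallback()`.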
Fixes: 0530020a7c8f ("mptcp: track and update contiguous data status")
Cc: stable@vger.kernel.org
Reported-by: Matthieu Baerts
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/570
Reported-by: syzbot+5cf807c20386d699b524@syzkaller.appspotmail.com
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/555
Signed-off-by: Paolo Abeni
Reviewed-by: Matthieu Baerts (NGI0)
Signed-off-by: Matthieu Baerts (NGI0)
Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-1-391aff963322@kernel.org
Signed-off-by: Jakub Kicinski
[ Conflicts in protocol.h, because commit 6ebf6f90ab4a ("mptcp: add
  mptcpi_subflows_total counter") is not in this version, and this causes
  conflicts in the context. Commit 65b02260a0e0 ("mptcp: export
  mptcp_subflow_early_fallback()") is also not in this version; it moves
  code from protocol.c to protocol.h, but the modification can still be
  applied there.
  Conflicts in protocol.c because commit ee2708aedad0 ("mptcp: use
  get_retrans wrapper") is not in this version and refactors the code in
  __mptcp_retrans(), but the modification can still be applied, just not
  at the same indentation level.
  There were other conflicts in the context due to commit 8005184fd1ca
  ("mptcp: refactor sndbuf auto-tuning"), commit b3ea6b272d79 ("mptcp:
  consolidate initial ack seq generation"), and commit 013e3179dbd2
  ("mptcp: fix rcv space initialization"), which are not in this version.
]
Signed-off-by: Matthieu Baerts (NGI0)
---
 net/mptcp/options.c  |  3 ++-
 net/mptcp/protocol.c | 39 +++++++++++++++++++++++++++++++++------
 net/mptcp/protocol.h | 24 ++++++++++++++++++------
 net/mptcp/subflow.c  | 11 +++++------
 4 files changed, 58 insertions(+), 19 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 929074f08713..8e48451f3225 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -973,8 +973,9 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 		if (subflow->mp_join)
 			goto reset;
 		subflow->mp_capable = 0;
+		if (!mptcp_try_fallback(ssk))
+			goto reset;
 		pr_fallback(msk);
-		mptcp_do_fallback(ssk);
 		return false;
 	}
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e975693b8fa9..904a348daa51 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -633,10 +633,9 @@ static bool mptcp_check_data_fin(struct sock *sk)
 
 static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
 {
-	if (READ_ONCE(msk->allow_infinite_fallback)) {
+	if (mptcp_try_fallback(ssk)) {
 		MPTCP_INC_STATS(sock_net(ssk),
 				MPTCP_MIB_DSSCORRUPTIONFALLBACK);
-		mptcp_do_fallback(ssk);
 	} else {
 		MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_DSSCORRUPTIONRESET);
 		mptcp_subflow_reset(ssk);
@@ -897,6 +896,14 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 	if (sk->sk_state != TCP_ESTABLISHED)
 		return false;
 
+	spin_lock_bh(&msk->fallback_lock);
+	if (__mptcp_check_fallback(msk)) {
+		spin_unlock_bh(&msk->fallback_lock);
+		return false;
+	}
+	mptcp_subflow_joined(msk, ssk);
+	spin_unlock_bh(&msk->fallback_lock);
+
 	/* attach to msk socket only after we are sure we will deal with it
 	 * at close time
 	 */
@@ -904,7 +911,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 		mptcp_sock_graft(ssk, sk->sk_socket);
 
 	mptcp_sockopt_sync_locked(msk, ssk);
-	mptcp_subflow_joined(msk, ssk);
 	mptcp_stop_tout_timer(sk);
 	return true;
 }
@@ -1288,10 +1294,14 @@ static void mptcp_update_infinite_map(struct mptcp_sock *msk,
 	mpext->infinite_map = 1;
 	mpext->data_len = 0;
 
+	if (!mptcp_try_fallback(ssk)) {
+		mptcp_subflow_reset(ssk);
+		return;
+	}
+
 	MPTCP_INC_STATS(sock_net(ssk), MPTCP_MIB_INFINITEMAPTX);
 	mptcp_subflow_ctx(ssk)->send_infinite_map = 0;
 	pr_fallback(msk);
-	mptcp_do_fallback(ssk);
 }
 
 #define MPTCP_MAX_GSO_SIZE (GSO_LEGACY_MAX_SIZE - (MAX_TCP_HEADER + 1))
@@ -2638,8 +2648,8 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 
 static void __mptcp_retrans(struct sock *sk)
 {
+	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	struct mptcp_sendmsg_info info = {};
 	struct mptcp_data_frag *dfrag;
 	size_t copied = 0;
 	struct sock *ssk;
@@ -2675,6 +2685,15 @@ static void __mptcp_retrans(struct sock *sk)
 	/* limit retransmission to the bytes already sent on some subflows */
 	info.sent = 0;
 	info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len : dfrag->already_sent;
+
+	/* make the whole retrans decision, xmit, disallow fallback atomic */
+	spin_lock_bh(&msk->fallback_lock);
+	if (__mptcp_check_fallback(msk)) {
+		spin_unlock_bh(&msk->fallback_lock);
+		release_sock(ssk);
+		return;
+	}
+
 	while (info.sent < info.limit) {
 		ret = mptcp_sendmsg_frag(sk, ssk, dfrag, &info);
 		if (ret <= 0)
@@ -2690,6 +2709,7 @@ static void __mptcp_retrans(struct sock *sk)
 			 info.size_goal);
 		WRITE_ONCE(msk->allow_infinite_fallback, false);
 	}
+	spin_unlock_bh(&msk->fallback_lock);
 
 	release_sock(ssk);
 
@@ -2819,6 +2839,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->recovery = false;
 
 	mptcp_pm_data_init(msk);
+	spin_lock_init(&msk->fallback_lock);
 
 	/* re-use the csk retrans timer for MPTCP-level retrans */
 	timer_setup(&msk->sk.icsk_retransmit_timer, mptcp_retransmit_timer, 0);
@@ -3651,7 +3672,13 @@ bool mptcp_finish_join(struct sock *ssk)
 
 	/* active subflow, already present inside the conn_list */
 	if (!list_empty(&subflow->node)) {
+		spin_lock_bh(&msk->fallback_lock);
+		if (__mptcp_check_fallback(msk)) {
+			spin_unlock_bh(&msk->fallback_lock);
+			return false;
+		}
 		mptcp_subflow_joined(msk, ssk);
+		spin_unlock_bh(&msk->fallback_lock);
 		return true;
 	}
 
@@ -3764,7 +3791,7 @@ static void mptcp_subflow_early_fallback(struct mptcp_sock *msk,
 					 struct mptcp_subflow_context *subflow)
 {
 	subflow->request_mptcp = 0;
-	__mptcp_do_fallback(msk);
+	WARN_ON_ONCE(!__mptcp_try_fallback(msk));
 }
 
 static int mptcp_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 25c1cda5c1bc..a3acc7042ee9 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -317,6 +317,10 @@ struct mptcp_sock {
 
 	u32 setsockopt_seq;
 	char		ca_name[TCP_CA_NAME_MAX];
+
+	spinlock_t	fallback_lock;	/* protects fallback and
+					 * allow_infinite_fallback
+					 */
 };
 
 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
@@ -975,25 +979,32 @@ static inline bool mptcp_check_fallback(const struct sock *sk)
 	return __mptcp_check_fallback(msk);
 }
 
-static inline void __mptcp_do_fallback(struct mptcp_sock *msk)
+static inline bool __mptcp_try_fallback(struct mptcp_sock *msk)
 {
 	if (test_bit(MPTCP_FALLBACK_DONE, &msk->flags)) {
 		pr_debug("TCP fallback already done (msk=%p)\n", msk);
-		return;
+		return true;
 	}
-	if (WARN_ON_ONCE(!READ_ONCE(msk->allow_infinite_fallback)))
-		return;
+	spin_lock_bh(&msk->fallback_lock);
+	if (!msk->allow_infinite_fallback) {
+		spin_unlock_bh(&msk->fallback_lock);
+		return false;
+	}
+
 	set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
+	spin_unlock_bh(&msk->fallback_lock);
+	return true;
 }
 
-static inline void mptcp_do_fallback(struct sock *ssk)
+static inline bool mptcp_try_fallback(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = subflow->conn;
 	struct mptcp_sock *msk;
 
 	msk = mptcp_sk(sk);
-	__mptcp_do_fallback(msk);
+	if (!__mptcp_try_fallback(msk))
+		return false;
 	if (READ_ONCE(msk->snd_data_fin_enable) && !(ssk->sk_shutdown & SEND_SHUTDOWN)) {
 		gfp_t saved_allocation = ssk->sk_allocation;
 
@@ -1005,6 +1016,7 @@ static inline void mptcp_do_fallback(struct sock *ssk)
 		tcp_shutdown(ssk, SEND_SHUTDOWN);
 		ssk->sk_allocation = saved_allocation;
 	}
+	return true;
 }
 
 #define pr_fallback(a) pr_debug("%s:fallback to TCP (msk=%p)\n", __func__, a)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index a6237eb55537..d21109a130ec 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -431,9 +431,11 @@ static void subflow_finish_connect(struct sock *sk, const struct sk_buff *skb)
 	mptcp_get_options(skb, &mp_opt);
 	if (subflow->request_mptcp) {
 		if (!(mp_opt.suboptions & OPTION_MPTCP_MPC_SYNACK)) {
+			if (!mptcp_try_fallback(sk))
+				goto do_reset;
+
 			MPTCP_INC_STATS(sock_net(sk),
 					MPTCP_MIB_MPCAPABLEACTIVEFALLBACK);
-			mptcp_do_fallback(sk);
 			pr_fallback(mptcp_sk(subflow->conn));
 			goto fallback;
 		}
@@ -1269,7 +1271,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
 			return true;
 		}
 
-		if (!READ_ONCE(msk->allow_infinite_fallback)) {
+		if (!mptcp_try_fallback(ssk)) {
 			/* fatal protocol error, close the socket.
 			 * subflow_error_report() will introduce the appropriate barriers
 			 */
@@ -1285,8 +1287,6 @@ static bool subflow_check_data_avail(struct sock *ssk)
 			WRITE_ONCE(subflow->data_avail, MPTCP_SUBFLOW_NODATA);
 			return false;
 		}
-
-		mptcp_do_fallback(ssk);
 	}
 
 	skb = skb_peek(&ssk->sk_receive_queue);
@@ -1519,7 +1519,6 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	/* discard the subflow socket */
 	mptcp_sock_graft(ssk, sk->sk_socket);
 	iput(SOCK_INODE(sf));
-	WRITE_ONCE(msk->allow_infinite_fallback, false);
 	mptcp_stop_tout_timer(sk);
 	return 0;
 
@@ -1690,7 +1689,7 @@ static void subflow_state_change(struct sock *sk)
 	msk = mptcp_sk(parent);
 	if (subflow_simultaneous_connect(sk)) {
 		mptcp_propagate_sndbuf(parent, sk);
-		mptcp_do_fallback(sk);
+		WARN_ON_ONCE(!mptcp_try_fallback(sk));
 		mptcp_rcv_space_init(msk, sk);
 		pr_fallback(msk);
 		subflow->conn_finished = 1;
-- 
2.50.0

From nobody Wed Sep 17 18:19:16 2025
From: "Matthieu Baerts (NGI0)"
To: mptcp@lists.linux.dev, stable@vger.kernel.org, gregkh@linuxfoundation.org
Cc: Paolo Abeni, sashal@kernel.org, "Matthieu Baerts (NGI0)", Jakub Kicinski
Subject: [PATCH 6.1.y 2/3] mptcp: plug races between subflow fail and subflow creation
Date: Mon, 28 Jul 2025 15:29:22 +0200
Message-ID: <20250728132919.3904847-7-matttbe@kernel.org>
In-Reply-To: <20250728132919.3904847-5-matttbe@kernel.org>
References: <20250728132919.3904847-5-matttbe@kernel.org>

From: Paolo Abeni

commit def5b7b2643ebba696fc60ddf675dca13f073486 upstream.

We have races between a subflow failing and the creation of additional
subflows. They are similar to the one addressed by the previous patch,
just harder to trigger.

The solution is similar. Use a separate flag, protected by the fallback
lock, to track the condition 'socket state prevents any additional
subflow creation'.

The socket fallback makes this condition true. Receiving or sending an
MP_FAIL option does the same.

The field 'allow_infinite_fallback' is now always touched under the
relevant lock, so we can drop the ONCE annotation on write.

Fixes: 478d770008b0 ("mptcp: send out MP_FAIL when data checksum fails")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni
Reviewed-by: Matthieu Baerts (NGI0)
Signed-off-by: Matthieu Baerts (NGI0)
Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-2-391aff963322@kernel.org
Signed-off-by: Jakub Kicinski
[ Conflicts in subflow.c, because commit f1f26512a9bf ("mptcp: use plain
  bool instead of custom binary enum") and commit 46a5d3abedbe ("mptcp:
  fix typos in comments") are not in this version. Both cause conflicts
  in the context, and the same modifications can still be applied. Same
  in protocol.h with commit b8dc6d6ce931 ("mptcp: fix rcv buffer
  auto-tuning").
  Conflicts in protocol.c because commit ee2708aedad0 ("mptcp: use
  get_retrans wrapper") is not in this version and refactors the code in
  __mptcp_retrans(), but the modification can still be applied, just not
  at the same indentation level.
]
Signed-off-by: Matthieu Baerts (NGI0)
---
 net/mptcp/pm.c       |  8 +++++++-
 net/mptcp/protocol.c | 11 ++++++-----
 net/mptcp/protocol.h |  7 +++++--
 net/mptcp/subflow.c  | 19 ++++++++++++++-----
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 34120694ad49..6392973b1fa7 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -309,8 +309,14 @@ void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
 
 	pr_debug("fail_seq=%llu\n", fail_seq);
 
-	if (!READ_ONCE(msk->allow_infinite_fallback))
+	/* After accepting the fail, we can't create any other subflows */
+	spin_lock_bh(&msk->fallback_lock);
+	if (!msk->allow_infinite_fallback) {
+		spin_unlock_bh(&msk->fallback_lock);
 		return;
+	}
+	msk->allow_subflows = false;
+	spin_unlock_bh(&msk->fallback_lock);
 
 	if (!subflow->fail_tout) {
 		pr_debug("send MP_FAIL response and infinite map\n");
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 904a348daa51..73e298f276a8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -885,7 +885,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock *ssk)
 {
 	mptcp_subflow_ctx(ssk)->map_seq = READ_ONCE(msk->ack_seq);
-	WRITE_ONCE(msk->allow_infinite_fallback, false);
+	msk->allow_infinite_fallback = false;
 	mptcp_event(MPTCP_EVENT_SUB_ESTABLISHED, msk, ssk, GFP_ATOMIC);
 }
 
@@ -897,7 +897,7 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 		return false;
 
 	spin_lock_bh(&msk->fallback_lock);
-	if (__mptcp_check_fallback(msk)) {
+	if (!msk->allow_subflows) {
 		spin_unlock_bh(&msk->fallback_lock);
 		return false;
 	}
@@ -2707,7 +2707,7 @@ static void __mptcp_retrans(struct sock *sk)
 		dfrag->already_sent = max(dfrag->already_sent, info.sent);
 		tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
 			 info.size_goal);
-		WRITE_ONCE(msk->allow_infinite_fallback, false);
+		msk->allow_infinite_fallback = false;
 	}
 	spin_unlock_bh(&msk->fallback_lock);
 
@@ -2835,7 +2835,8 @@ static int __mptcp_init_sock(struct sock *sk)
 	WRITE_ONCE(msk->first, NULL);
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
 	WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
-	WRITE_ONCE(msk->allow_infinite_fallback, true);
+	msk->allow_infinite_fallback = true;
+	msk->allow_subflows = true;
 	msk->recovery = false;
 
 	mptcp_pm_data_init(msk);
@@ -3673,7 +3674,7 @@ bool mptcp_finish_join(struct sock *ssk)
 	/* active subflow, already present inside the conn_list */
 	if (!list_empty(&subflow->node)) {
 		spin_lock_bh(&msk->fallback_lock);
-		if (__mptcp_check_fallback(msk)) {
+		if (!msk->allow_subflows) {
 			spin_unlock_bh(&msk->fallback_lock);
 			return false;
 		}
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a3acc7042ee9..e1637443203e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -314,12 +314,14 @@ struct mptcp_sock {
 		u64	time;	/* start time of measurement window */
 		u64	rtt_us; /* last maximum rtt of subflows */
 	} rcvq_space;
+	bool		allow_subflows;
 
 	u32 setsockopt_seq;
 	char		ca_name[TCP_CA_NAME_MAX];
 
-	spinlock_t	fallback_lock;	/* protects fallback and
-					 * allow_infinite_fallback
+	spinlock_t	fallback_lock;	/* protects fallback,
+					 * allow_infinite_fallback and
+					 * allow_join
 					 */
 };
 
@@ -991,6 +993,7 @@ static inline bool __mptcp_try_fallback(struct mptcp_sock *msk)
 		return false;
 	}
 
+	msk->allow_subflows = false;
 	set_bit(MPTCP_FALLBACK_DONE, &msk->flags);
 	spin_unlock_bh(&msk->fallback_lock);
 	return true;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index d21109a130ec..cff232810692 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1168,20 +1168,29 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
 		mptcp_schedule_work(sk);
 }
 
-static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
+static bool mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	unsigned long fail_tout;
 
+	/* we are really failing, prevent any later subflow join */
+	spin_lock_bh(&msk->fallback_lock);
+	if (!msk->allow_infinite_fallback) {
+		spin_unlock_bh(&msk->fallback_lock);
+		return false;
+	}
+	msk->allow_subflows = false;
+	spin_unlock_bh(&msk->fallback_lock);
+
 	/* greceful failure can happen only on the MPC subflow */
 	if (WARN_ON_ONCE(ssk != READ_ONCE(msk->first)))
-		return;
+		return false;
 
 	/* since the close timeout take precedence on the fail one,
 	 * no need to start the latter when the first is already set
 	 */
 	if (sock_flag((struct sock *)msk, SOCK_DEAD))
-		return;
+		return true;
 
 	/* we don't need extreme accuracy here, use a zero fail_tout as special
 	 * value meaning no fail timeout at all;
@@ -1193,6 +1202,7 @@ static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
 	tcp_send_ack(ssk);
 
 	mptcp_reset_tout_timer(msk, subflow->fail_tout);
+	return true;
 }
 
 static bool subflow_check_data_avail(struct sock *ssk)
@@ -1261,12 +1271,11 @@ static bool subflow_check_data_avail(struct sock *ssk)
 		    (subflow->mp_join || subflow->valid_csum_seen)) {
 			subflow->send_mp_fail = 1;
 
-			if (!READ_ONCE(msk->allow_infinite_fallback)) {
+			if (!mptcp_subflow_fail(msk, ssk)) {
 				subflow->reset_transient = 0;
 				subflow->reset_reason = MPTCP_RST_EMIDDLEBOX;
 				goto reset;
 			}
-			mptcp_subflow_fail(msk, ssk);
 			WRITE_ONCE(subflow->data_avail, MPTCP_SUBFLOW_DATA_AVAIL);
 			return true;
 		}
-- 
2.50.0

From nobody Wed Sep 17 18:19:16 2025
From: "Matthieu Baerts (NGI0)"
To: mptcp@lists.linux.dev, stable@vger.kernel.org, gregkh@linuxfoundation.org
Cc: Paolo Abeni, sashal@kernel.org, "Matthieu Baerts (NGI0)", Jakub Kicinski
Subject: [PATCH 6.1.y 3/3] mptcp: reset fallback status gracefully at disconnect() time
Date: Mon, 28 Jul 2025 15:29:23 +0200
Message-ID: <20250728132919.3904847-8-matttbe@kernel.org>
In-Reply-To: <20250728132919.3904847-5-matttbe@kernel.org>
References: <20250728132919.3904847-5-matttbe@kernel.org>

From: Paolo Abeni

commit da9b2fc7b73d147d88abe1922de5ab72d72d7756 upstream.

mptcp_disconnect() clears the fallback bit unconditionally, without
touching the associated flags.

The bit clear itself is safe: no fallback operation can race with it,
because all subflows are already in TCP_CLOSE status thanks to the
previous FASTCLOSE. However, we need to consistently reset all the
fallback-related status.

Also acquire the relevant lock, to avoid fouling static analyzers.

Fixes: b29fcfb54cd7 ("mptcp: full disconnect implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Abeni
Reviewed-by: Matthieu Baerts (NGI0)
Signed-off-by: Matthieu Baerts (NGI0)
Link: https://patch.msgid.link/20250714-net-mptcp-fallback-races-v1-3-391aff963322@kernel.org
Signed-off-by: Jakub Kicinski
[ Conflicts in protocol.c, because commit ebc1e08f01eb ("mptcp: drop
  last_snd and MPTCP_RESET_SCHEDULER") is not in this version and changed
  the context. The same modification can still be applied at the same
  place.
]
Signed-off-by: Matthieu Baerts (NGI0)
---
 net/mptcp/protocol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 73e298f276a8..883efcbb8dfc 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3204,7 +3204,16 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	 */
 	mptcp_destroy_common(msk, MPTCP_CF_FASTCLOSE);
 	msk->last_snd = NULL;
+
+	/* The first subflow is already in TCP_CLOSE status, the following
+	 * can't overlap with a fallback anymore
+	 */
+	spin_lock_bh(&msk->fallback_lock);
+	msk->allow_subflows = true;
+	msk->allow_infinite_fallback = true;
 	WRITE_ONCE(msk->flags, 0);
+	spin_unlock_bh(&msk->fallback_lock);
+
 	msk->cb_flags = 0;
 	msk->recovery = false;
 	msk->can_ack = false;
-- 
2.50.0
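Patches 2 and 3 extend the same lock to a second flag and to the disconnect path. The following is a hypothetical userspace sketch, not the kernel implementation:

- The `struct msk_model` type and the `model_*()` functions are invented for illustration.
- A pthread mutex stands in for `fallback_lock`.
- `fallback_done` stands in for the `MPTCP_FALLBACK_DONE` bit.

It models the invariants the series relies on:

- A fallback, or an accepted MP_FAIL, forbids later joins.
- A completed join forbids infinite fallback.
- `disconnect()` restores all three fields together under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of the fallback-related msk fields. */
struct msk_model {
	pthread_mutex_t fallback_lock;	/* stands in for msk->fallback_lock */
	bool allow_infinite_fallback;
	bool allow_subflows;
	bool fallback_done;		/* stands in for MPTCP_FALLBACK_DONE */
};

static void msk_model_init(struct msk_model *m)
{
	pthread_mutex_init(&m->fallback_lock, NULL);
	m->allow_infinite_fallback = true;
	m->allow_subflows = true;
	m->fallback_done = false;
}

/* A successful fallback also forbids any later subflow creation. */
static bool model_try_fallback(struct msk_model *m)
{
	bool ok = true;

	pthread_mutex_lock(&m->fallback_lock);
	if (!m->fallback_done) {
		ok = m->allow_infinite_fallback;
		if (ok) {
			m->allow_subflows = false;
			m->fallback_done = true;
		}
	}
	pthread_mutex_unlock(&m->fallback_lock);
	return ok;
}

/* Accepting an MP_FAIL is only possible while infinite fallback is still
 * allowed, and it forbids later joins.
 */
static bool model_subflow_fail(struct msk_model *m)
{
	bool ok;

	pthread_mutex_lock(&m->fallback_lock);
	ok = m->allow_infinite_fallback;
	if (ok)
		m->allow_subflows = false;
	pthread_mutex_unlock(&m->fallback_lock);
	return ok;
}

/* A join is refused once subflows are forbidden; a successful join makes
 * infinite fallback impossible.
 */
static bool model_finish_join(struct msk_model *m)
{
	bool ok;

	pthread_mutex_lock(&m->fallback_lock);
	/* a completed fallback always forbids new subflows */
	assert(!m->fallback_done || !m->allow_subflows);
	ok = m->allow_subflows;
	if (ok)
		m->allow_infinite_fallback = false;
	pthread_mutex_unlock(&m->fallback_lock);
	return ok;
}

/* disconnect(): reset every fallback-related field together, under the lock. */
static void model_disconnect(struct msk_model *m)
{
	pthread_mutex_lock(&m->fallback_lock);
	m->allow_subflows = true;
	m->allow_infinite_fallback = true;
	m->fallback_done = false;
	pthread_mutex_unlock(&m->fallback_lock);
}
```

Suppose `model_disconnect()` cleared only `fallback_done`, as the pre-patch code cleared only `msk->flags`. In this model, a socket reused after an MP_FAIL would then refuse every join even though no fallback had happened.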