From nobody Sat Apr 20 04:07:25 2024 Delivered-To: wpasupplicant.patchew@gmail.com Received: by 2002:a05:6638:27c:0:0:0:0 with SMTP id x28csp1975826jaq; Thu, 7 Oct 2021 15:05:39 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzXIVLITpA1VoZeyqrWIcvEuVkp9HGW4XoWbDVGKdaoWzwChYkykxNrYtWX/lPLlwug9Jum X-Received: by 2002:a62:7850:0:b0:44c:5b71:2506 with SMTP id t77-20020a627850000000b0044c5b712506mr6792819pfc.37.1633644339287; Thu, 07 Oct 2021 15:05:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1633644339; cv=none; d=google.com; s=arc-20160816; b=oag7edXind4XyVTYLG3WswxYC+l5HqzVAQCKZj7EnhckC7OCSe8BK1bnM0Tr5msQDW 1Lj9JJgT9t/gnmspsmcdv208kWDROrWeEnggVVdIO8Q6bRo94Ji5/92Nb+WMJZPU/Zaq wlwJthRQprhHfoTSii+GZkojA5B/6ADWN4+KktDbbUQRZGO1zYwdwt9Mp215n588MD4r kME0giplfXxbBwmitdN4BIVMHUA+7FyZfq7w1+1GHUqfW5DOpmJOUS7iD3kp9eoE/B/u voCHQFkHoex/LCGd0VQ+3T9d/aQIVtGb26ALJRGQn18k3ATy7c33jrh/pG4X2knpCqwY 2vfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from; bh=lcXqA3ezLFvwaOtBAyPMRZIu2ueHMbdyVcodmO/7W80=; b=ooKdR0j7e7+RLzE1Vst3O73JxRzf1wEghzGoR+uqE6jxMVK2nD8V+eOyKadrM95mHc r5BA6zmJszq1LjfP5W9uroOMse+cLpuMBx3HdNMHsqvR+t1mca70aEoGfCOES91HKQJY 6QNJfPmRuVSbHX4C0zQkXq7X2/LZPpaSwDnkEmJ7CdUY/j8ra0DX7BhQssGHNjL/myVk HpzTr/c/HaHTVh+aamqzcPTCAc9PRc6gb08gOz6jlYF1DW50b9+6cEzsQrLU0JkpQtlE wbNbs34cMjUiGcCRukq3TDT0eHQu2eHX7Hcwa5M3korJyZonhHGm+01H1WXJOn4DfLCc 5nvQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of mptcp+bounces-2136-wpasupplicant.patchew=gmail.com@lists.linux.dev designates 147.75.69.165 as permitted sender) smtp.mailfrom="mptcp+bounces-2136-wpasupplicant.patchew=gmail.com@lists.linux.dev"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from sjc.edge.kernel.org (sjc.edge.kernel.org. [147.75.69.165]) by mx.google.com with ESMTPS id t9si632868plb.213.2021.10.07.15.05.38 for (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 07 Oct 2021 15:05:39 -0700 (PDT) Received-SPF: pass (google.com: domain of mptcp+bounces-2136-wpasupplicant.patchew=gmail.com@lists.linux.dev designates 147.75.69.165 as permitted sender) client-ip=147.75.69.165; Authentication-Results: mx.google.com; spf=pass (google.com: domain of mptcp+bounces-2136-wpasupplicant.patchew=gmail.com@lists.linux.dev designates 147.75.69.165 as permitted sender) smtp.mailfrom="mptcp+bounces-2136-wpasupplicant.patchew=gmail.com@lists.linux.dev"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sjc.edge.kernel.org (Postfix) with ESMTPS id 3B1BD3E05C9 for ; Thu, 7 Oct 2021 22:05:38 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 49E262C94; Thu, 7 Oct 2021 22:05:37 +0000 (UTC) X-Original-To: mptcp@lists.linux.dev Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 878CD2C83 for ; Thu, 7 Oct 2021 22:05:35 +0000 (UTC) X-IronPort-AV: E=McAfee;i="6200,9189,10130"; a="207187866" X-IronPort-AV: E=Sophos;i="5.85,355,1624345200"; d="scan'208";a="207187866" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Oct 2021 15:05:09 -0700 X-IronPort-AV: E=Sophos;i="5.85,355,1624345200"; d="scan'208";a="489193334" Received: from mjmartin-desk2.amr.corp.intel.com (HELO mjmartin-desk2.intel.com) ([10.209.67.146]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 07 Oct 2021 15:05:09 -0700 From: Mat Martineau To: netdev@vger.kernel.org Cc: Paolo Abeni , davem@davemloft.net, kuba@kernel.org, matthieu.baerts@tessares.net, mptcp@lists.linux.dev, syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com, Mat Martineau Subject: [PATCH net] mptcp: fix possible stall on recvmsg() Date: Thu, 7 Oct 2021 15:05:00 -0700 Message-Id: <20211007220500.280862-1-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.33.0 Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Paolo Abeni recvmsg() can enter an infinite loop if the caller provides the MSG_WAITALL, the data present in the receive queue is not sufficient to fulfill the request, and no more data is received by the peer. When the above happens, mptcp_wait_data() will always return with no wait, as the MPTCP_DATA_READY flag checked by such function is set and never cleared in such code path. Leveraging the above syzbot was able to trigger an RCU stall: rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 0-...!: (10499 ticks this GP) idle=3D0af/1/0x4000000000000000 softi= rq=3D10678/10678 fqs=3D1 (t=3D10500 jiffies g=3D13089 q=3D109) rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT= _FQS(5) ->state=3D0x0 ->cpu=3D1 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now exp= ected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:28696 pid: 14 ppid: = 2 flags:0x00004000 Call Trace: context_switch kernel/sched/core.c:4955 [inline] __schedule+0x940/0x26f0 kernel/sched/core.c:6236 schedule+0xd3/0x270 kernel/sched/core.c:6315 schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881 rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955 rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 rcu: Stack dump where RCU GP kthread last ran: Sending NMI from CPU 0 to CPUs 1: NMI backtrace for cpu 1 CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920= -syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Goo= gle 01/01/2011 RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline] RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline] RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline] RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline] RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline] RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189 Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 = f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 4= 8 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00 RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283 RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870 RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877 R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000 R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000 FS: 0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 S: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: instrument_atomic_read_write include/linux/instrumented.h:101 [inline] test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [in= line] mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016 release_sock+0xb4/0x1b0 net/core/sock.c:3204 mptcp_wait_data net/mptcp/protocol.c:1770 [inline] mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080 inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659 sock_recvmsg_nosec net/socket.c:944 [inline] ____sys_recvmsg+0x527/0x600 net/socket.c:2626 ___sys_recvmsg+0x127/0x200 net/socket.c:2670 do_recvmmsg+0x24d/0x6d0 net/socket.c:2764 __sys_recvmmsg net/socket.c:2843 [inline] __do_sys_recvmmsg net/socket.c:2866 [inline] __se_sys_recvmmsg net/socket.c:2859 [inline] __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7fc200d2dc39 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 = 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff f= f 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39 RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003 R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003 Fix the issue by replacing the MPTCP_DATA_READY bit with direct inspection of the msk receive queue. Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.c= om Fixes: 7a6a6cbc3e59 ("mptcp: recvmsg() can drain data from multiple subflow= ") Signed-off-by: Paolo Abeni Signed-off-by: Mat Martineau --- We've reviewed and tested this in the MPTCP tree, no regressions have come up and the issue is resolved as described above. Looks like it's ready for the net tree. Thanks, Mat --- net/mptcp/protocol.c | 55 ++++++++++++-------------------------------- 1 file changed, 15 insertions(+), 40 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e5df0b5971c8..d073b2111382 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -528,7 +528,6 @@ static bool mptcp_check_data_fin(struct sock *sk) =20 sk->sk_shutdown |=3D RCV_SHUTDOWN; smp_mb__before_atomic(); /* SHUTDOWN must be visible first */ - set_bit(MPTCP_DATA_READY, &msk->flags); =20 switch (sk->sk_state) { case TCP_ESTABLISHED: @@ -742,10 +741,9 @@ void mptcp_data_ready(struct sock *sk, struct sock *ss= k) =20 /* Wake-up the reader only for in-sequence data */ mptcp_data_lock(sk); - if (move_skbs_to_msk(msk, ssk)) { - set_bit(MPTCP_DATA_READY, &msk->flags); + if (move_skbs_to_msk(msk, ssk)) sk->sk_data_ready(sk); - } + mptcp_data_unlock(sk); } =20 @@ -847,7 +845,6 @@ static void mptcp_check_for_eof(struct mptcp_sock *msk) sk->sk_shutdown |=3D RCV_SHUTDOWN; =20 smp_mb__before_atomic(); /* SHUTDOWN must be visible first */ - set_bit(MPTCP_DATA_READY, &msk->flags); sk->sk_data_ready(sk); } =20 @@ -1759,21 +1756,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msg= hdr *msg, size_t len) return copied ? : ret; } =20 -static void mptcp_wait_data(struct sock *sk, long *timeo) -{ - DEFINE_WAIT_FUNC(wait, woken_wake_function); - struct mptcp_sock *msk =3D mptcp_sk(sk); - - add_wait_queue(sk_sleep(sk), &wait); - sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk); - - sk_wait_event(sk, timeo, - test_bit(MPTCP_DATA_READY, &msk->flags), &wait); - - sk_clear_bit(SOCKWQ_ASYNC_WAITDATA, sk); - remove_wait_queue(sk_sleep(sk), &wait); -} - static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk, struct msghdr *msg, size_t len, int flags, @@ -2077,19 +2059,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msg= hdr *msg, size_t len, } =20 pr_debug("block timeout %ld", timeo); - mptcp_wait_data(sk, &timeo); - } - - if (skb_queue_empty_lockless(&sk->sk_receive_queue) && - skb_queue_empty(&msk->receive_queue)) { - /* entire backlog drained, clear DATA_READY. */ - clear_bit(MPTCP_DATA_READY, &msk->flags); - - /* .. race-breaker: ssk might have gotten new data - * after last __mptcp_move_skbs() returned false. - */ - if (unlikely(__mptcp_move_skbs(msk))) - set_bit(MPTCP_DATA_READY, &msk->flags); + sk_wait_data(sk, &timeo, NULL); } =20 out_err: @@ -2098,9 +2068,9 @@ static int mptcp_recvmsg(struct sock *sk, struct msgh= dr *msg, size_t len, tcp_recv_timestamp(msg, sk, &tss); } =20 - pr_debug("msk=3D%p data_ready=3D%d rx queue empty=3D%d copied=3D%d", - msk, test_bit(MPTCP_DATA_READY, &msk->flags), - skb_queue_empty_lockless(&sk->sk_receive_queue), copied); + pr_debug("msk=3D%p rx queue empty=3D%d:%d copied=3D%d", + msk, skb_queue_empty_lockless(&sk->sk_receive_queue), + skb_queue_empty(&msk->receive_queue), copied); if (!(flags & MSG_PEEK)) mptcp_rcv_space_adjust(msk, copied); =20 @@ -2368,7 +2338,6 @@ static void mptcp_check_fastclose(struct mptcp_sock *= msk) inet_sk_state_store(sk, TCP_CLOSE); sk->sk_shutdown =3D SHUTDOWN_MASK; smp_mb__before_atomic(); /* SHUTDOWN must be visible first */ - set_bit(MPTCP_DATA_READY, &msk->flags); set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags); =20 mptcp_close_wake_up(sk); @@ -3385,8 +3354,14 @@ static int mptcp_stream_accept(struct socket *sock, = struct socket *newsock, =20 static __poll_t mptcp_check_readable(struct mptcp_sock *msk) { - return test_bit(MPTCP_DATA_READY, &msk->flags) ? EPOLLIN | EPOLLRDNORM : - 0; + /* Concurrent splices from sk_receive_queue into receive_queue will + * always show at least one non-empty queue when checked in this order. + */ + if (skb_queue_empty_lockless(&((struct sock *)msk)->sk_receive_queue) && + skb_queue_empty_lockless(&msk->receive_queue)) + return 0; + + return EPOLLIN | EPOLLRDNORM; } =20 static __poll_t mptcp_check_writeable(struct mptcp_sock *msk) @@ -3421,7 +3396,7 @@ static __poll_t mptcp_poll(struct file *file, struct = socket *sock, state =3D inet_sk_state_load(sk); pr_debug("msk=3D%p state=3D%d flags=3D%lx", msk, state, msk->flags); if (state =3D=3D TCP_LISTEN) - return mptcp_check_readable(msk); + return test_bit(MPTCP_DATA_READY, &msk->flags) ? EPOLLIN | EPOLLRDNORM := 0; =20 if (state !=3D TCP_SYN_SENT && state !=3D TCP_SYN_RECV) { mask |=3D mptcp_check_readable(msk); base-commit: 8d6c414cd2fb74aa6812e9bfec6178f8246c4f3a --=20 2.33.0