From nobody Thu Mar 28 14:37:49 2024 Delivered-To: wpasupplicant.patchew@gmail.com Received: by 2002:a9f:3ec7:0:0:0:0:0 with SMTP id n7csp459424uaj; Thu, 18 Aug 2022 09:55:37 -0700 (PDT) X-Google-Smtp-Source: AA6agR4KD4Aq06YLbogFvt8hfQKvJbXlx2s+a6Xb1OI8X0494ILtzynxBoQAeSua2qhpMXeJwTR7 X-Received: by 2002:a65:6b95:0:b0:420:2cb1:68e5 with SMTP id d21-20020a656b95000000b004202cb168e5mr3036126pgw.220.1660841736933; Thu, 18 Aug 2022 09:55:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660841736; cv=none; d=google.com; s=arc-20160816; b=V+OVYYL6TntwXFemX/Evp5tSupOylTvNolTFsoZJkfjvilULAAQs2fB3ruujkKk6aQ XY5VGpk2H38Xtd+JdlqYfixeWSWmhtwXDinmU1dxiz9BFdLQICciPUPdpewA+xGNFuX7 MYRZRm0xcJWVgoGdfQnQ+176Ou5B1YIEr3svnucjn2fFISpr9ZUAsm4y5maeYgPEo8u1 Xt1rkzQbk2v9N2V9X9pfVqd0TH/A1JKZ4m5zuUtOyyrSxeEp0FktjU3xK1OcDNcyRBSn /Q6nnEKj08gtP44AgBIkhbCxmxHtoHOZRt5EaqfTCo9MefUIcq6SgF8lkPrPMEta1VQc czYQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from; bh=wUX8F3K3IdUv+JgSacyIpR8AHcGOYiy3kR0KsdVqCKw=; b=tdvx3TrQHmNug0eM75ve93zq6YKbx9eKr9DP9QARlSiekepZV1J0dTUQ8M2W0voLw4 l+IL+i/yg0l+e85UhstNLNlKa4fGYMj9fcHxQdYbCADh9V/0R7GPi6qXKgW873PBb2Vr /StSMgSqX3dHJ8dLwM5jduE08dRdbC8veZANaP/qJHl8l/0bdHURGtPtmqWL+KAa6PwX TUKJTu1eCC90M6iIl6ZtPcuU+jgr9+59keF9Rv41OwByC750TPPCnZNKj31xXIQcNJQ4 Hnj4J+/Y5/ScDoMpe1lzUsT3BAZi5M4Ot0DhP+17w9Mi5d2wy19KGy8yCDWW6+WE0/Wk OmhA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of mptcp+bounces-6147-wpasupplicant.patchew=gmail.com@lists.linux.dev designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="mptcp+bounces-6147-wpasupplicant.patchew=gmail.com@lists.linux.dev" Return-Path: Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id 137-20020a63068f000000b0041b7b971b11si1638021pgg.425.2022.08.18.09.55.36 for (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 18 Aug 2022 09:55:36 -0700 (PDT) Received-SPF: pass (google.com: domain of mptcp+bounces-6147-wpasupplicant.patchew=gmail.com@lists.linux.dev designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; spf=pass (google.com: domain of mptcp+bounces-6147-wpasupplicant.patchew=gmail.com@lists.linux.dev designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="mptcp+bounces-6147-wpasupplicant.patchew=gmail.com@lists.linux.dev" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 3C4EE280BDA for ; Thu, 18 Aug 2022 16:55:36 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 0D7F14683; Thu, 18 Aug 2022 16:55:35 +0000 (UTC) X-Original-To: mptcp@lists.linux.dev Received: from Chamillionaire.breakpoint.cc (Chamillionaire.breakpoint.cc [193.142.43.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F160C443F for ; Thu, 18 Aug 2022 16:55:32 +0000 (UTC) Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.92) (envelope-from ) id 1oOiGy-00018b-1V; Thu, 18 Aug 2022 18:21:32 +0200 From: Florian Westphal To: Cc: Florian Westphal Subject: [PATCH RFC mptcp] net: tcp: prevent transition to FIN_WAIT2 when mptcp has unacked data_fin Date: Thu, 18 Aug 2022 18:21:23 +0200 Message-Id: <20220818162123.30198-1-fw@strlen.de> X-Mailer: git-send-email 2.35.1 Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When an mptcp connection closes down, a MPTCP-level DATA_FIN (flag in mp tcp options) is sent. We should wait until we can be sure the peer has received the mptcp-level a= ck of that DATA_FIN. Else, we can end up with the last mptcp subflow in FIN_WAIT= 2 state. In that state, TCP won't respond to incoming ACK anymore, because no progress happens from TCP point of view. mptcp-packetdrill example: // MPTCP sk moves to FIN_WAIT1, subflow remains in ESTABLISHED state: +0.5 close(4) =3D 0 +0.0 > . 2:2(0) ack 1 win 256 // receive DATA_FIN ack. MPTCP sk moves to FIN_WAIT2, subflow remains in ES= TABLISHED state. +0.0 < . 1:1(0) ack 2 win 450 // Receive DATA_FIN. MPTCP socket is removed, subflow sk moves to FIN_WAIT= 1. +0.0 < . 1:1(0) ack 2 win 450 // acks the clients DATA_FIN, but packet might get lost +0 > . 2:2(0) ack 1 win 256 // Receive DATA_FIN retransmission, subflow still in TCP_FIN1 state. +0.01 < . 1:1(0) ack 1 win 450 +0 > F. 2:2(0) ack 1 win 256 // Ack the tcp-level fin but also retransmit the data fin. // Note that we still claim mptcp-level dack4=3D2, i.e. the // servers DATA_FIN remains unacked. // Without this change, this 'ack 3' moves the subflow to FIN_WAIT2 state. +0.0 < . 2:2(0) ack 3 win 450 +0 > . 3:3(0) ack 1 win 256 // receive another retransmit of data fin... +0.0 < . 2:2(0) ack 3 win 450 // ... but without this patch, nothing happens, TW sk won't send anything. // With this patch, the subflow remains in FIN_WAIT1 and replies with a DSS= ack. +0 > . 3:3(0) ack 1 win 256 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/181 Signed-off-by: Florian Westphal --- include/net/mptcp.h | 4 ++++ net/ipv4/tcp_input.c | 5 +++++ net/mptcp/subflow.c | 8 ++++++++ 3 files changed, 17 insertions(+) Full test case stashed here for the time being: https://github.com/fw-strlen/packetdrill/commit/60a3f57ea309f910643e06dbf12= 3236741c7f8d9 Its possible to either change tw_sk to have enough mptcp context to be able to send a packet back, or we can supress/delay the tw_sock transition. This is what classic TCP is doing when it receives another FIN while in FIN1 state. This change makes the test case work (another dss ack is sent), but there may be other corner cases where we need to delay the sk -> tw_sk transition. If the general idea looks ok, perhaps its better to replace tcp_time_wait(sk, skb, .. with a mptcp aware helper, e.g. tcp_time_wait_check(sk, skb, .. and place the option parsing for mptcp-subflows there? diff --git a/include/net/mptcp.h b/include/net/mptcp.h index 02ca83493470..de2249996671 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -152,6 +152,8 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr= , struct tcp_sock *tp, =20 void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info); =20 +bool mptcp_subflow_pending_data_fin_ack(const struct sock *ssk); + /* move the skb extension owership, with the assumption that 'to' is * newly allocated */ @@ -297,6 +299,8 @@ static inline int mptcp_subflow_init_cookie_req(struct = request_sock *req, } =20 static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { retu= rn htonl(0u); } + +static inline bool mptcp_subflow_pending_data_fin_ack(const struct sock *s= sk) { return false; } #endif /* CONFIG_MPTCP */ =20 #if IS_ENABLED(CONFIG_MPTCP_IPV6) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ab5f0ea166f1..c3281e0cf9db 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -6608,6 +6608,11 @@ int tcp_rcv_state_process(struct sock *sk, struct sk= _buff *skb) * marginal case. */ inet_csk_reset_keepalive_timer(sk, tmo); + } else if (sk_is_mptcp(sk) && + mptcp_incoming_options(sk, skb) && + mptcp_subflow_pending_data_fin_ack(sk)) { + inet_csk_schedule_ack(sk); + inet_csk_reset_keepalive_timer(sk, tmo); } else { tcp_time_wait(sk, TCP_FIN_WAIT2, tmo); goto consume; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index c7d49fb6e7bd..d3e285bc45b8 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -351,6 +351,14 @@ static bool subflow_thmac_valid(struct mptcp_subflow_c= ontext *subflow) return thmac =3D=3D subflow->thmac; } =20 +bool mptcp_subflow_pending_data_fin_ack(const struct sock *ssk) +{ + struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); + struct sock *sk =3D subflow->conn; + + return READ_ONCE(mptcp_sk(sk)->rcv_data_fin); +} + void mptcp_subflow_reset(struct sock *ssk) { struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); --=20 2.35.1