From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH mptcp-net v2 3/6] Squash-to: "mptcp: invoke MP_FAIL response when needed"
Date: Wed, 15 Jun 2022 22:28:10 +0200
Message-Id: <1deca06703f8666dc199b76f39cd881733741666.1655324843.git.pabeni@redhat.com>
X-Mailing-List: mptcp@lists.linux.dev

This tries to address a few issues outstanding in the mentioned patch:
- we explicitly need to reset the timeout timer for mp_fail's sake
- we need to explicitly generate a tcp ack for mp_fail, otherwise
  there are no guarantees for such option
  being sent out
- the timeout timer handling needs some care, as it's still shared
  between mp_fail and msk socket timeout.
- we can re-use msk->first for msk->fail_ssk, as only the first/mpc
  subflow can fail without reset. That additionally avoids the need
  to clear fail_ssk on the relevant subflow close.
- fail_tout would need some additional annotation. Just to be on the
  safe side move its manipulation under the ssk socket lock.

The last 2 paragraphs of the squash-to commit should be replaced with:

"""
It leverages the fact that only the MPC/first subflow can gracefully
fail to avoid unneeded subflows traversal: the failing subflow can
be only msk->first.

A new 'fail_tout' field is added to the subflow context to record the
MP_FAIL response timeout and use such field to reliably share the
timeout timer between the MP_FAIL event and the MPTCP socket close
timeout.

Finally, a new ack is generated to send out MP_FAIL notification as soon
as we hit the relevant condition, instead of waiting a possibly unbounded
time for the next data packet.
"""

Signed-off-by: Paolo Abeni
---
 net/mptcp/pm.c       |  4 +++-
 net/mptcp/protocol.c | 54 ++++++++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.h |  4 ++--
 net/mptcp/subflow.c  | 24 ++++++++++++++++++--
 4 files changed, 72 insertions(+), 14 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 3c7f07bb124e..45e2a48397b9 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -305,13 +305,15 @@ void mptcp_pm_mp_fail_received(struct sock *sk, u64 fail_seq)
 	if (!READ_ONCE(msk->allow_infinite_fallback))
 		return;
 
-	if (!msk->fail_ssk) {
+	if (!subflow->fail_tout) {
 		pr_debug("send MP_FAIL response and infinite map");
 
 		subflow->send_mp_fail = 1;
 		subflow->send_infinite_map = 1;
+		tcp_send_ack(sk);
 	} else {
 		pr_debug("MP_FAIL response received");
+		WRITE_ONCE(subflow->fail_tout, 0);
 	}
 }
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index a0f9f3831509..50026b8da625 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -500,7 +500,7 @@ static void mptcp_set_timeout(struct sock *sk)
 	__mptcp_set_timeout(sk, tout);
 }
 
-static bool tcp_can_send_ack(const struct sock *ssk)
+static inline bool tcp_can_send_ack(const struct sock *ssk)
 {
 	return !((1 << inet_sk_state_load(ssk)) &
 	       (TCPF_SYN_SENT | TCPF_SYN_RECV | TCPF_TIME_WAIT | TCPF_CLOSE | TCPF_LISTEN));
@@ -2490,24 +2490,56 @@ static void __mptcp_retrans(struct sock *sk)
 		mptcp_reset_timer(sk);
 }
 
+/* schedule the timeout timer for the nearest relevant event: either
+ * close timeout or mp_fail timeout. Both of them could be not
+ * scheduled yet
+ */
+void mptcp_reset_timeout(struct mptcp_sock *msk, unsigned long fail_tout)
+{
+	struct sock *sk = (struct sock *)msk;
+	unsigned long timeout, close_timeout;
+
+	if (!fail_tout && !sock_flag(sk, SOCK_DEAD))
+		return;
+
+	close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
+
+	/* the following is basically time_min(close_timeout, fail_tout) */
+	if (!fail_tout)
+		timeout = close_timeout;
+	else if (!sock_flag(sk, SOCK_DEAD))
+		timeout = fail_tout;
+	else if (time_after(close_timeout, fail_tout))
+		timeout = fail_tout;
+	else
+		timeout = close_timeout;
+
+	sk_reset_timer(sk, &sk->sk_timer, timeout);
+}
+
 static void mptcp_mp_fail_no_response(struct mptcp_sock *msk)
 {
-	struct sock *ssk = msk->fail_ssk;
+	struct sock *ssk = msk->first;
 	bool slow;
 
+	if (!ssk)
+		return;
+
 	pr_debug("MP_FAIL doesn't respond, reset the subflow");
 
 	slow = lock_sock_fast(ssk);
 	mptcp_subflow_reset(ssk);
+	WRITE_ONCE(mptcp_subflow_ctx(ssk)->fail_tout, 0);
 	unlock_sock_fast(ssk, slow);
 
-	msk->fail_ssk = NULL;
+	mptcp_reset_timeout(msk, 0);
 }
 
 static void mptcp_worker(struct work_struct *work)
 {
 	struct mptcp_sock *msk = container_of(work, struct mptcp_sock, work);
 	struct sock *sk = &msk->sk.icsk_inet.sk;
+	unsigned long fail_tout;
 	int state;
 
 	lock_sock(sk);
@@ -2544,7 +2576,8 @@ static void mptcp_worker(struct work_struct *work)
 	if (test_and_clear_bit(MPTCP_WORK_RTX, &msk->flags))
 		__mptcp_retrans(sk);
 
-	if (msk->fail_ssk && time_after(jiffies, msk->fail_tout))
+	fail_tout = msk->first ? READ_ONCE(mptcp_subflow_ctx(msk->first)->fail_tout) : 0;
+	if (fail_tout && time_after(jiffies, fail_tout))
 		mptcp_mp_fail_no_response(msk);
 
 unlock:
@@ -2572,8 +2605,6 @@ static int __mptcp_init_sock(struct sock *sk)
 	WRITE_ONCE(msk->csum_enabled, mptcp_is_checksum_enabled(sock_net(sk)));
 	WRITE_ONCE(msk->allow_infinite_fallback, true);
 	msk->recovery = false;
-	msk->fail_ssk = NULL;
-	msk->fail_tout = 0;
 
 	mptcp_pm_data_init(msk);
 
@@ -2804,7 +2835,9 @@ static void __mptcp_destroy_sock(struct sock *sk)
 static void mptcp_close(struct sock *sk, long timeout)
 {
 	struct mptcp_subflow_context *subflow;
+	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool do_cancel_work = false;
+	unsigned long fail_tout = 0;
 
 	lock_sock(sk);
 	sk->sk_shutdown = SHUTDOWN_MASK;
@@ -2822,10 +2855,13 @@ static void mptcp_close(struct sock *sk, long timeout)
 cleanup:
 	/* orphan all the subflows */
 	inet_csk(sk)->icsk_mtup.probe_timestamp = tcp_jiffies32;
-	mptcp_for_each_subflow(mptcp_sk(sk), subflow) {
+	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 		bool slow = lock_sock_fast_nested(ssk);
 
+		if (ssk == msk->first)
+			fail_tout = subflow->fail_tout;
+
 		sock_orphan(ssk);
 		unlock_sock_fast(ssk, slow);
 	}
@@ -2834,13 +2870,13 @@ static void mptcp_close(struct sock *sk, long timeout)
 	sock_hold(sk);
 	pr_debug("msk=%p state=%d", sk, sk->sk_state);
 	if (mptcp_sk(sk)->token)
-		mptcp_event(MPTCP_EVENT_CLOSED, mptcp_sk(sk), NULL, GFP_KERNEL);
+		mptcp_event(MPTCP_EVENT_CLOSED, msk, NULL, GFP_KERNEL);
 
 	if (sk->sk_state == TCP_CLOSE) {
 		__mptcp_destroy_sock(sk);
 		do_cancel_work = true;
 	} else {
-		sk_reset_timer(sk, &sk->sk_timer, jiffies + TCP_TIMEWAIT_LEN);
+		mptcp_reset_timeout(msk, fail_tout);
 	}
 	release_sock(sk);
 	if (do_cancel_work)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index bef7dea9f358..077a717799a0 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -306,8 +306,6 @@ struct mptcp_sock {
 
 	u32 setsockopt_seq;
 	char ca_name[TCP_CA_NAME_MAX];
-	struct sock *fail_ssk;
-	unsigned long fail_tout;
 };
 
 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
@@ -484,6 +482,7 @@ struct mptcp_subflow_context {
 	u8 stale_count;
 
 	long delegated_status;
+	unsigned long fail_tout;
 
 	);
 
@@ -677,6 +676,7 @@ void mptcp_get_options(const struct sk_buff *skb,
 
 void mptcp_finish_connect(struct sock *sk);
 void __mptcp_set_connected(struct sock *sk);
+void mptcp_reset_timeout(struct mptcp_sock *msk, unsigned long fail_tout);
 static inline bool mptcp_is_fully_established(struct sock *sk)
 {
 	return inet_sk_state_load(sk) == TCP_ESTABLISHED &&
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 98b12a9c4eb5..040901c1f40c 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1158,6 +1158,27 @@ static bool subflow_can_fallback(struct mptcp_subflow_context *subflow)
 	return !subflow->fully_established;
 }
 
+static void mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	unsigned long fail_tout;
+
+	/* graceful failure can happen only on the MPC subflow */
+	if (WARN_ON_ONCE(ssk != READ_ONCE(msk->first)))
+		return;
+
+	/* we don't need extreme accuracy here, use a zero fail_tout as special
+	 * value meaning no fail timeout at all;
+	 */
+	fail_tout = jiffies + TCP_RTO_MAX;
+	if (!fail_tout)
+		fail_tout = 1;
+	WRITE_ONCE(subflow->fail_tout, fail_tout);
+	tcp_send_ack(ssk);
+
+	mptcp_reset_timeout(msk, subflow->fail_tout);
+}
+
 static bool subflow_check_data_avail(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -1233,8 +1254,7 @@ static bool subflow_check_data_avail(struct sock *ssk)
 				while ((skb = skb_peek(&ssk->sk_receive_queue)))
 					sk_eat_skb(ssk, skb);
 			} else {
-				msk->fail_ssk = ssk;
-				msk->fail_tout = jiffies + TCP_RTO_MAX;
+				mptcp_subflow_fail(msk, ssk);
 			}
 			WRITE_ONCE(subflow->data_avail, MPTCP_SUBFLOW_NODATA);
 			return true;
-- 
2.35.3
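
For anyone wanting to poke at the shared-timer selection outside the kernel, the logic of mptcp_reset_timeout() and the "zero means no timeout" convention in mptcp_subflow_fail() can be sketched in plain userspace C roughly as below. This is only an illustration under stated assumptions: pick_timeout(), make_fail_tout() and time_after_ul() are hypothetical names made up for this sketch, the `dead` flag stands in for sock_flag(sk, SOCK_DEAD), and close_timeout is passed in rather than derived from icsk_mtup.probe_timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's time_after(): true when 'a' is later
 * than 'b', tolerating wrap-around of the tick counter.
 */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical helper mirroring the branch structure of
 * mptcp_reset_timeout(): fail_tout == 0 means "no MP_FAIL timeout armed",
 * !dead means "no close timeout armed".
 * Returns false when neither event is pending, so no timer must be set.
 */
static bool pick_timeout(unsigned long fail_tout, bool dead,
			 unsigned long close_timeout, unsigned long *timeout)
{
	assert(timeout != NULL);

	if (!fail_tout && !dead)
		return false;

	if (!fail_tout)
		*timeout = close_timeout;	/* only the close timeout */
	else if (!dead)
		*timeout = fail_tout;		/* only the MP_FAIL timeout */
	else if (time_after_ul(close_timeout, fail_tout))
		*timeout = fail_tout;		/* both armed, MP_FAIL first */
	else
		*timeout = close_timeout;	/* both armed, close first */
	return true;
}

/* Hypothetical helper mirroring mptcp_subflow_fail(): 0 is reserved as
 * "no timeout", so a deadline that wraps exactly to 0 is nudged to 1.
 */
static unsigned long make_fail_tout(unsigned long now, unsigned long delay)
{
	unsigned long t = now + delay;

	return t ? t : 1;
}
```

The comparison goes through a signed difference rather than a plain `<`, so the earlier of the two deadlines is still picked when one of them has wrapped past zero; that is the reason the patch spells out the branches instead of using a simple minimum.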