From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 1/7] trace: mptcp: add mptcp_rcvbuf_grow tracepoint
Date: Tue, 4 Nov 2025 22:51:35 +0100
Message-ID: <5b1042b7f934b9a749dee435b7494a414adb57ce.1762292476.git.pabeni@redhat.com>

Similar to TCP, provide a new tracepoint to better understand the
behavior of mptcp_rcv_space_adjust(), which exhibits many artifacts.

Signed-off-by: Paolo Abeni
---
 include/trace/events/mptcp.h | 74 ++++++++++++++++++++++++++++++++++++
 net/mptcp/protocol.c         |  3 ++
 2 files changed, 77 insertions(+)

diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 085b749cdd97..71fd6d33f48b 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -178,6 +178,80 @@ TRACE_EVENT(subflow_check_data_avail,
 		  __entry->skb)
 );
 
+#include 
+
+TRACE_EVENT(mptcp_rcvbuf_grow,
+
+	TP_PROTO(struct sock *sk, int time),
+
+	TP_ARGS(sk, time),
+
+	TP_STRUCT__entry(
+		__field(int, time)
+		__field(__u32, rtt_us)
+		__field(__u32, copied)
+		__field(__u32, inq)
+		__field(__u32, space)
+		__field(__u32, ooo_space)
+		__field(__u32, rcvbuf)
+		__field(__u32, rcv_wnd)
+		__field(__u8, scaling_ratio)
+		__field(__u16, sport)
+		__field(__u16, dport)
+		__field(__u16, family)
+		__array(__u8, saddr, 4)
+		__array(__u8, daddr, 4)
+		__array(__u8, saddr_v6, 16)
+		__array(__u8, daddr_v6, 16)
+		__field(const void *, skaddr)
+	),
+
+	TP_fast_assign(
+		struct mptcp_sock *msk = mptcp_sk(sk);
+		struct inet_sock *inet = inet_sk(sk);
+		__be32 *p32;
+
+		__entry->time = time;
+		__entry->rtt_us = msk->rcvq_space.rtt_us >> 3;
+		__entry->copied = msk->rcvq_space.copied;
+		__entry->inq = mptcp_inq_hint(sk);
+		__entry->space = msk->rcvq_space.space;
+		__entry->ooo_space = RB_EMPTY_ROOT(&msk->out_of_order_queue) ? 0 :
+				     MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq -
+				     msk->ack_seq;
+
+		__entry->rcvbuf = sk->sk_rcvbuf;
+		__entry->rcv_wnd = atomic64_read(&msk->rcv_wnd_sent) - msk->ack_seq;
+		__entry->scaling_ratio = msk->scaling_ratio;
+		__entry->sport = ntohs(inet->inet_sport);
+		__entry->dport = ntohs(inet->inet_dport);
+		__entry->family = sk->sk_family;
+
+		p32 = (__be32 *) __entry->saddr;
+		*p32 = inet->inet_saddr;
+
+		p32 = (__be32 *) __entry->daddr;
+		*p32 = inet->inet_daddr;
+
+		TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr,
+			       sk->sk_v6_rcv_saddr, sk->sk_v6_daddr);
+
+		__entry->skaddr = sk;
+	),
+
+	TP_printk("time=%u rtt_us=%u copied=%u inq=%u space=%u ooo=%u scaling_ratio=%u rcvbuf=%u "
+		  "rcv_wnd=%u "
+		  "sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 "
+		  "saddrv6=%pI6c daddrv6=%pI6c skaddr=%p",
+		  __entry->time, __entry->rtt_us, __entry->copied,
+		  __entry->inq, __entry->space, __entry->ooo_space,
+		  __entry->scaling_ratio, __entry->rcvbuf,
+		  __entry->rcv_wnd,
+		  __entry->sport, __entry->dport,
+		  __entry->saddr, __entry->daddr,
+		  __entry->saddr_v6, __entry->daddr_v6,
+		  __entry->skaddr)
+);
 #endif /* _TRACE_MPTCP_H */
 
 /* This part must be outside protection */
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 01114456dec6..443406bc4a54 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -28,6 +28,8 @@
 #include "protocol.h"
 #include "mib.h"
 
+static unsigned int mptcp_inq_hint(const struct sock *sk);
+
 #define CREATE_TRACE_POINTS
 #include 
 
@@ -2101,6 +2103,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
 		goto new_measure;
 
+	trace_mptcp_rcvbuf_grow(sk, time);
 	if (mptcp_rcvbuf_grow(sk, msk->rcvq_space.copied)) {
 		/* Make subflows follow along.  If we do not do this, we
 		 * get drops at subflow level if skbs can't be moved to
-- 
2.51.0

From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 2/7] mptcp: avoid unneeded subflow-level drops
Date: Tue, 4 Nov 2025 22:51:36 +0100

The rcv window is shared among all the subflows. Currently, MPTCP syncs
the TCP-level rcv window with the MPTCP-level one at tcp_transmit_skb()
time.

As a consequence, incoming data may sporadically observe an outdated
TCP-level rcv window and be wrongly dropped by TCP.

Address the issue by checking for this edge condition before queuing the
data at the TCP level, and resyncing the rcv window when needed.

Note that the issue has been present since the very first MPTCP
implementation, but backports older than the blamed commit below would
range from impossible to useless.

Before:
nstat >/dev/null; sleep 1; nstat -z TcpExtBeyondWindow
TcpExtBeyondWindow              14                 0.0

After:
nstat >/dev/null; sleep 1; nstat -z TcpExtBeyondWindow
TcpExtBeyondWindow              0                  0.0

Fixes: fa3fe2b15031 ("mptcp: track window announced to peer")
Signed-off-by: Paolo Abeni
---
 net/mptcp/options.c  | 31 +++++++++++++++++++++++++++++++
 net/mptcp/protocol.h |  1 +
 2 files changed, 32 insertions(+)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index cf531f2d815c..ad51dcf18984 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1042,6 +1042,31 @@ static void __mptcp_snd_una_update(struct mptcp_sock *msk, u64 new_snd_una)
 	WRITE_ONCE(msk->snd_una, new_snd_una);
 }
 
+static void rwin_update(struct mptcp_sock *msk, struct sock *ssk,
+			struct sk_buff *skb)
+{
+	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct tcp_sock *tp = tcp_sk(ssk);
+	u64 mptcp_rcv_wnd;
+
+	/* Avoid touching extra cachelines if TCP is going to accept this
+	 * skb without filling the TCP-level window even with a possibly
+	 * outdated mptcp-level rwin.
+	 */
+	if (!skb->len || skb->len < tcp_receive_window(tp))
+		return;
+
+	mptcp_rcv_wnd = atomic64_read(&msk->rcv_wnd_sent);
+	if (!after64(mptcp_rcv_wnd, subflow->rcv_wnd_sent))
+		return;
+
+	/* Some other subflow grew the mptcp-level rwin since rcv_wup,
+	 * resync.
+	 */
+	tp->rcv_wnd += mptcp_rcv_wnd - subflow->rcv_wnd_sent;
+	subflow->rcv_wnd_sent = mptcp_rcv_wnd;
+}
+
 static void ack_update_msk(struct mptcp_sock *msk,
 			   struct sock *ssk,
 			   struct mptcp_options_received *mp_opt)
@@ -1209,6 +1234,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	 */
 	if (mp_opt.use_ack)
 		ack_update_msk(msk, sk, &mp_opt);
+	rwin_update(msk, sk, skb);
 
 	/* Zero-data-length packets are dropped by the caller and not
 	 * propagated to the MPTCP layer, so the skb extension does not
@@ -1295,6 +1321,10 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 
 	if (rcv_wnd_new != rcv_wnd_old) {
 raise_win:
+		/* The msk-level rcv wnd is after the tcp level one,
+		 * sync the latter.
+		 */
+		rcv_wnd_new = rcv_wnd_old;
 		win = rcv_wnd_old - ack_seq;
 		tp->rcv_wnd = min_t(u64, win, U32_MAX);
 		new_win = tp->rcv_wnd;
@@ -1318,6 +1348,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 
 update_wspace:
 	WRITE_ONCE(msk->old_wspace, tp->rcv_wnd);
+	subflow->rcv_wnd_sent = rcv_wnd_new;
 }
 
 __sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index d0881db16b12..f14eeb4fd884 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -513,6 +513,7 @@ struct mptcp_subflow_context {
 	u64	remote_key;
 	u64	idsn;
 	u64	map_seq;
+	u64	rcv_wnd_sent;
 	u32	snd_isn;
 	u32	token;
 	u32	rel_write_seq;
-- 
2.51.0

From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 3/7] mptcp: fix receive space timestamp initialization
Date: Tue, 4 Nov 2025 22:51:37 +0100
Message-ID: <1f53b9ab2809061f293d327d1d71cc89a8f6822f.1762292476.git.pabeni@redhat.com>

MPTCP initializes the receive buffer timestamp in mptcp_rcv_space_init(),
using the provided subflow's timestamp. That helper is invoked in several
places; for passive sockets, the space initialization happens at clone
time. In that scenario, MPTCP ends up accessing the subflow timestamp
before it is initialized, leading to quite random timing for the first
receive buffer auto-tune event, as the timestamp of the newly created
subflow is not refreshed there.

Fix the issue by moving the timestamp initialization out of the
mentioned helper, performing it as soon as the data transfer starts, and
by always using a fresh timestamp. This will also make the next patch
cleaner.

Fixes: 013e3179dbd2 ("mptcp: fix rcv space initialization")
Signed-off-by: Paolo Abeni
---
v1 -> v2:
  - factor out only the tstamp change for better reviewability
---
 net/mptcp/protocol.c | 7 ++++---
 net/mptcp/protocol.h | 5 +++++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 443406bc4a54..fd10565d9287 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2072,8 +2072,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 
 	msk->rcvq_space.copied += copied;
 
-	mstamp = div_u64(tcp_clock_ns(), NSEC_PER_USEC);
-	time = tcp_stamp_us_delta(mstamp, msk->rcvq_space.time);
+	mstamp = mptcp_stamp();
+	time = tcp_stamp_us_delta(mstamp, READ_ONCE(msk->rcvq_space.time));
 
 	rtt_us = msk->rcvq_space.rtt_us;
 	if (rtt_us && time < (rtt_us >> 3))
@@ -3491,7 +3491,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 	mptcp_copy_inaddrs(nsk, ssk);
 	__mptcp_propagate_sndbuf(nsk, ssk);
 
-	mptcp_rcv_space_init(msk, ssk);
+	msk->rcvq_space.time = mptcp_stamp();
 
 	if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK)
 		__mptcp_subflow_fully_established(msk, subflow, mp_opt);
@@ -3706,6 +3706,7 @@ void mptcp_finish_connect(struct sock *ssk)
 	 * accessing the field below
 	 */
 	WRITE_ONCE(msk->local_key, subflow->local_key);
+	WRITE_ONCE(msk->rcvq_space.time, mptcp_stamp());
 
 	mptcp_pm_new_connection(msk, ssk, 0);
 }
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index f14eeb4fd884..49f211e427bf 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -913,6 +913,11 @@ static inline bool mptcp_is_fully_established(struct sock *sk)
 	       READ_ONCE(mptcp_sk(sk)->fully_established);
 }
 
+static inline u64 mptcp_stamp(void)
+{
+	return div_u64(tcp_clock_ns(), NSEC_PER_USEC);
+}
+
 void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk);
 void mptcp_data_ready(struct sock *sk, struct sock *ssk);
 bool mptcp_finish_join(struct sock *sk);
-- 
2.51.0

From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 4/7] mptcp: consolidate rcv space init
Date: Tue, 4 Nov 2025 22:51:38 +0100
Message-ID: <501c37105f5645ed4249c9619c1fe3a63ac0bd97.1762292476.git.pabeni@redhat.com>

MPTCP calls the mptcp_rcv_space_init() helper from several places to
initialize the receive space, including a catch-up call in
mptcp_rcv_space_adjust().

Drop all the other, strictly unneeded, invocations and move the
initialization of the constant fields to socket init/reset time.

This removes a bit of complexity and will simplify the following
patches. No functional changes intended.

Signed-off-by: Paolo Abeni
---
v1 -> v2:
  - split helper consolidation out of v1 patch
  - additionally move 'copied' and 'rtt_us' initialization out of
    mptcp_rcv_space_init()
---
 net/mptcp/protocol.c | 34 +++++++++++++++++-----------------
 net/mptcp/protocol.h |  1 -
 net/mptcp/subflow.c  |  2 --
 3 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index fd10565d9287..07e1703688ea 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2050,6 +2050,19 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
 	return copied;
 }
 
+static void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
+{
+	const struct tcp_sock *tp = tcp_sk(ssk);
+
+	msk->rcvspace_init = 1;
+
+	/* initial rcv_space offering made to peer */
+	msk->rcvq_space.space = min_t(u32, tp->rcv_wnd,
+				      TCP_INIT_CWND * tp->advmss);
+	if (msk->rcvq_space.space == 0)
+		msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT;
+}
+
 /* receive buffer autotuning.  See tcp_rcv_space_adjust for more information.
  *
  * Only difference: Use highest rtt estimate of the subflows in use.
@@ -2922,6 +2935,8 @@ static void __mptcp_init_sock(struct sock *sk)
 	msk->timer_ival = TCP_RTO_MIN;
 	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
 	msk->backlog_len = 0;
+	msk->rcvq_space.copied = 0;
+	msk->rcvq_space.rtt_us = 0;
 
 	WRITE_ONCE(msk->first, NULL);
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -3364,6 +3379,8 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	msk->bytes_sent = 0;
 	msk->bytes_retrans = 0;
 	msk->rcvspace_init = 0;
+	msk->rcvq_space.copied = 0;
+	msk->rcvq_space.rtt_us = 0;
 
 	/* for fallback's sake */
 	WRITE_ONCE(msk->ack_seq, 0);
@@ -3501,23 +3518,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 	return nsk;
 }
 
-void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
-{
-	const struct tcp_sock *tp = tcp_sk(ssk);
-
-	msk->rcvspace_init = 1;
-	msk->rcvq_space.copied = 0;
-	msk->rcvq_space.rtt_us = 0;
-
-	msk->rcvq_space.time = tp->tcp_mstamp;
-
-	/* initial rcv_space offering made to peer */
-	msk->rcvq_space.space = min_t(u32, tp->rcv_wnd,
-				      TCP_INIT_CWND * tp->advmss);
-	if (msk->rcvq_space.space == 0)
-		msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT;
-}
-
 static void mptcp_destroy(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 49f211e427bf..7c9e143c0fb5 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -918,7 +918,6 @@ static inline u64 mptcp_stamp(void)
 	return div_u64(tcp_clock_ns(), NSEC_PER_USEC);
 }
 
-void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk);
 void mptcp_data_ready(struct sock *sk, struct sock *ssk);
 bool mptcp_finish_join(struct sock *sk);
 bool mptcp_schedule_work(struct sock *sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index ac8616e7521e..b64ab7649908 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -462,8 +462,6 @@ void __mptcp_sync_state(struct sock *sk, int state)
 
 	subflow = mptcp_subflow_ctx(ssk);
 	__mptcp_propagate_sndbuf(sk, ssk);
-	if (!msk->rcvspace_init)
-		mptcp_rcv_space_init(msk, ssk);
 
 	if (sk->sk_state == TCP_SYN_SENT) {
 		/* subflow->idsn is always available is TCP_SYN_SENT state,
-- 
2.51.0

From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 5/7] mptcp: better rcv space initialization
Date: Tue, 4 Nov 2025 22:51:39 +0100
Message-ID: <704300b6e3c86122a4373775aca5b9ffc4e66c82.1762292476.git.pabeni@redhat.com>

Each additional subflow can overshoot the rcv space by a full rcv wnd,
as the TCP socket announces such a window at creation time even if the
MPTCP-level window is closed.

Underestimating the real initial rcv space can make the first rcv win
increase overshoot significantly.

Explicitly keep track of the rcv space contribution of each newly
created subflow, updating rcvq_space.space accordingly for every
successfully completed subflow handshake.

Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 44 ++++++++++++++++++++++++++++++--------------
 net/mptcp/protocol.h | 30 ++++++++++++++++++++++++++++++
 net/mptcp/subflow.c  |  3 +++
 3 files changed, 63 insertions(+), 14 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 07e1703688ea..c85ad7ef29b0 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -923,6 +923,7 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 	mptcp_sockopt_sync_locked(msk, ssk);
 	mptcp_stop_tout_timer(sk);
 	__mptcp_propagate_sndbuf(sk, ssk);
+	__mptcp_propagate_rcvspace(sk, ssk);
 	return true;
 }
 
@@ -2050,17 +2051,21 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
 	return copied;
 }
 
-static void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
+static void mptcp_rcv_space_init(struct mptcp_sock *msk)
 {
-	const struct tcp_sock *tp = tcp_sk(ssk);
+	struct sock *sk = (struct sock *)msk;
 
 	msk->rcvspace_init = 1;
 
-	/* initial rcv_space offering made to peer */
-	msk->rcvq_space.space = min_t(u32, tp->rcv_wnd,
-				      TCP_INIT_CWND * tp->advmss);
-	if (msk->rcvq_space.space == 0)
+	mptcp_data_lock(sk);
+	__mptcp_sync_rcvspace(sk);
+
+	/* Paranoid check: at least one subflow pushed data to the msk. */
+	if (msk->rcvq_space.space == 0) {
+		DEBUG_NET_WARN_ON_ONCE(1);
 		msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT;
+	}
+	mptcp_data_unlock(sk);
 }
 
 /* receive buffer autotuning.  See tcp_rcv_space_adjust for more information.
@@ -2081,7 +2086,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 		return;
 
 	if (!msk->rcvspace_init)
-		mptcp_rcv_space_init(msk, msk->first);
+		mptcp_rcv_space_init(msk);
 
 	msk->rcvq_space.copied += copied;
 
@@ -3507,6 +3512,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *sk,
 	 */
 	mptcp_copy_inaddrs(nsk, ssk);
 	__mptcp_propagate_sndbuf(nsk, ssk);
+	__mptcp_propagate_rcvspace(nsk, ssk);
 
 	msk->rcvq_space.time = mptcp_stamp();
 
@@ -3606,8 +3612,10 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_sync_state(sk, msk->pending_state);
 		if (__test_and_clear_bit(MPTCP_ERROR_REPORT, &msk->cb_flags))
 			__mptcp_error_report(sk);
-		if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags))
+		if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags)) {
 			__mptcp_sync_sndbuf(sk);
+			__mptcp_sync_rcvspace(sk);
+		}
 	}
 }
 
@@ -3723,13 +3731,13 @@ bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
-	struct sock *parent = (void *)msk;
+	struct sock *sk = (void *)msk;
 	bool ret = true;
 
 	pr_debug("msk=%p, subflow=%p\n", msk, subflow);
 
 	/* mptcp socket already closing?
*/ - if (!mptcp_is_fully_established(parent)) { + if (!mptcp_is_fully_established(sk)) { subflow->reset_reason =3D MPTCP_RST_EMPTCP; return false; } @@ -3743,7 +3751,15 @@ bool mptcp_finish_join(struct sock *ssk) } mptcp_subflow_joined(msk, ssk); spin_unlock_bh(&msk->fallback_lock); - mptcp_propagate_sndbuf(parent, ssk); + mptcp_data_lock(sk); + if (!sock_owned_by_user(sk)) { + __mptcp_propagate_sndbuf(sk, ssk); + __mptcp_propagate_rcvspace(sk, ssk); + } else { + __mptcp_bl_rcvspace(sk, ssk); + __set_bit(MPTCP_SYNC_SNDBUF, &mptcp_sk(sk)->cb_flags); + } + mptcp_data_unlock(sk); return true; } =20 @@ -3755,8 +3771,8 @@ bool mptcp_finish_join(struct sock *ssk) /* If we can't acquire msk socket lock here, let the release callback * handle it */ - mptcp_data_lock(parent); - if (!sock_owned_by_user(parent)) { + mptcp_data_lock(sk); + if (!sock_owned_by_user(sk)) { ret =3D __mptcp_finish_join(msk, ssk); if (ret) { sock_hold(ssk); @@ -3767,7 +3783,7 @@ bool mptcp_finish_join(struct sock *ssk) list_add_tail(&subflow->node, &msk->join_list); __set_bit(MPTCP_FLUSH_JOIN_LIST, &msk->cb_flags); } - mptcp_data_unlock(parent); + mptcp_data_unlock(sk); =20 if (!ret) { err_prohibited: diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 7c9e143c0fb5..adc0851bad69 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -360,6 +360,7 @@ struct mptcp_sock { =20 struct list_head backlog_list; /* protected by the data lock */ u32 backlog_len; + u32 bl_space; /* rcvspace propagation via bl */ }; =20 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock) @@ -976,6 +977,35 @@ static inline void mptcp_write_space(struct sock *sk) sk_stream_write_space(sk); } =20 +static inline void __mptcp_sync_rcvspace(struct sock *sk) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + + msk->rcvq_space.space +=3D msk->bl_space; + msk->bl_space =3D 0; +} + +static inline u32 mptcp_subflow_rcvspace(struct sock *ssk) +{ + struct tcp_sock *tp =3D tcp_sk(ssk); + int space; + + space =3D 
min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss); + if (space =3D=3D 0) + space =3D TCP_INIT_CWND * TCP_MSS_DEFAULT; + return space; +} + +static inline void __mptcp_propagate_rcvspace(struct sock *sk, struct sock= *ssk) +{ + mptcp_sk(sk)->rcvq_space.space +=3D mptcp_subflow_rcvspace(ssk); +} + +static inline void __mptcp_bl_rcvspace(struct sock *sk, struct sock *ssk) +{ + mptcp_sk(sk)->bl_space +=3D mptcp_subflow_rcvspace(ssk); +} + static inline void __mptcp_sync_sndbuf(struct sock *sk) { struct mptcp_subflow_context *subflow; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index b64ab7649908..33b88991cffb 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -462,6 +462,7 @@ void __mptcp_sync_state(struct sock *sk, int state) =20 subflow =3D mptcp_subflow_ctx(ssk); __mptcp_propagate_sndbuf(sk, ssk); + __mptcp_sync_rcvspace(sk); =20 if (sk->sk_state =3D=3D TCP_SYN_SENT) { /* subflow->idsn is always available is TCP_SYN_SENT state, @@ -516,8 +517,10 @@ static void mptcp_propagate_state(struct sock *sk, str= uct sock *ssk, =20 if (!sock_owned_by_user(sk)) { __mptcp_sync_state(sk, ssk->sk_state); + __mptcp_propagate_rcvspace(sk, ssk); } else { msk->pending_state =3D ssk->sk_state; + __mptcp_bl_rcvspace(sk, ssk); __set_bit(MPTCP_SYNC_STATE, &msk->cb_flags); } mptcp_data_unlock(sk); --=20 2.51.0 From nobody Thu Nov 27 12:35:52 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DCFF1346776 for ; Tue, 4 Nov 2025 21:52:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1762293132; cv=none; 
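The rcv space accounting introduced by patch 5/7 can be modelled in a few lines of userspace C. This is a minimal sketch under stated assumptions, not kernel code: `struct subflow_model`, `struct msk_model`, `subflow_established()`, `release_cb()` and `rcv_space_init()` are hypothetical names, `INIT_CWND` and `MSS_DEFAULT` are assumed local copies of TCP_INIT_CWND (10) and TCP_MSS_DEFAULT (536), and all locking (mptcp_data_lock()) is left out. It shows each completed subflow handshake adding min(rcv_wnd, TCP_INIT_CWND * advmss) to the msk rcv space, or parking that amount in bl_space while the user owns the msk socket, until the release callback folds it in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed local copies of the kernel's TCP_INIT_CWND and TCP_MSS_DEFAULT. */
#define INIT_CWND   10u
#define MSS_DEFAULT 536u

/* Hypothetical stand-in for the relevant tcp_sock fields of a subflow. */
struct subflow_model {
	uint32_t rcv_wnd; /* receive window announced by the subflow */
	uint32_t advmss;  /* advertised MSS of the subflow */
};

/* Hypothetical stand-in for the relevant mptcp_sock fields. */
struct msk_model {
	uint32_t space;     /* models msk->rcvq_space.space */
	uint32_t bl_space;  /* models msk->bl_space (deferred contribution) */
	bool owned_by_user; /* models sock_owned_by_user(sk) */
};

/* Models mptcp_subflow_rcvspace(): initial window offered by one subflow,
 * falling back to a default when the computed value is zero. */
static uint32_t subflow_rcvspace(const struct subflow_model *sf)
{
	uint32_t cap = INIT_CWND * sf->advmss;
	uint32_t space = sf->rcv_wnd < cap ? sf->rcv_wnd : cap;

	return space ? space : INIT_CWND * MSS_DEFAULT;
}

/* A subflow handshake completed: account its space now, or defer it to
 * the release callback while the user owns the msk socket. */
static void subflow_established(struct msk_model *msk,
				const struct subflow_model *sf)
{
	if (!msk->owned_by_user)
		msk->space += subflow_rcvspace(sf);
	else
		msk->bl_space += subflow_rcvspace(sf);
}

/* Models the release callback running __mptcp_sync_rcvspace(). */
static void release_cb(struct msk_model *msk)
{
	msk->owned_by_user = false;
	msk->space += msk->bl_space;
	msk->bl_space = 0;
}

/* Models mptcp_rcv_space_init(): fold deferred space, then apply the
 * default only if no subflow contributed anything. */
static void rcv_space_init(struct msk_model *msk)
{
	msk->space += msk->bl_space;
	msk->bl_space = 0;
	if (msk->space == 0)
		msk->space = INIT_CWND * MSS_DEFAULT;
	assert(msk->space > 0);
}
```

For example, a subflow with rcv_wnd = 65535 and advmss = 1460 contributes min(65535, 14600) = 14600 bytes, while a subflow reporting a zero window falls back to 10 * 536 = 5360 bytes.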
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 6/7] mptcp: better mptcp-level rtt estimator
Date: Tue, 4 Nov 2025 22:51:40 +0100

On high-speed links, the MPTCP-level receive buffer auto-tuning runs at a
much higher frequency than the TCP-level one. That in turn can cause
excessive, unneeded receive buffer growth.

On such links, the initial rtt_us value is considerably higher than the
actual delay, but the current mptcp_rcv_space_adjust() logic prevents
msk->rcvq_space.rtt_us from decreasing.

Address the issue with a more accurate RTT estimation strategy: set the
MPTCP-level RTT to the minimum RTT among all the subflows feeding data
into the MPTCP receive buffer, measured over an interval based on the
receive window.

Take care to avoid updating msk- and ssk-level fields too often.

Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
Signed-off-by: Paolo Abeni
---
v1 -> v2:
  - do not use explicit reset flags
  - do rcv win based decision instead
  - discard 0 rtt_us samples from subflows
  - discard samples on non-empty rx queue
  - discard "too high" samples; see the code comments for the rationale
---
 include/trace/events/mptcp.h |  2 +-
 net/mptcp/protocol.c         | 74 ++++++++++++++++++++++--------------
 net/mptcp/protocol.h         |  7 +++-
 3 files changed, 53 insertions(+), 30 deletions(-)

diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 71fd6d33f48b..999133000cb8 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -212,7 +212,7 @@ TRACE_EVENT(mptcp_rcvbuf_grow,
 		__be32 *p32;
 
 		__entry->time = time;
-		__entry->rtt_us = msk->rcvq_space.rtt_us >> 3;
+		__entry->rtt_us = msk->rcv_rtt_est.rtt_us >> 3;
 		__entry->copied = msk->rcvq_space.copied;
 		__entry->inq = mptcp_inq_hint(sk);
 		__entry->space = msk->rcvq_space.space;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c85ad7ef29b0..414cca078541 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -868,6 +868,42 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved;
 }
 
+static void mptcp_rcv_rtt_update(struct mptcp_sock *msk,
+				 struct mptcp_subflow_context *subflow)
+{
+	const struct tcp_sock *tp = tcp_sk(subflow->tcp_sock);
+	u32 rtt_us = tp->rcv_rtt_est.rtt_us;
+	u8 sr = tp->scaling_ratio;
+
+	/* MPTCP can react to incoming acks pushing data on different subflows,
+	 * causing apparent high RTT: ignore large samples; also do the update
+	 * only on RTT changes
+	 */
+	if (tp->rcv_rtt_est.seq == subflow->prev_rtt_seq ||
+	    (subflow->prev_rtt_us && (rtt_us >> 1) > subflow->prev_rtt_us))
+		return;
+
+	/* Similar to plain TCP, only consider samples with empty RX queue. */
+	if (!rtt_us || mptcp_data_avail(msk))
+		return;
+
+	/* Refresh the RTT after a full win per subflow */
+	subflow->prev_rtt_us = rtt_us;
+	subflow->prev_rtt_seq = tp->rcv_rtt_est.seq;
+	if (after(subflow->map_seq, msk->rcv_rtt_est.seq)) {
+		msk->rcv_rtt_est.seq = subflow->map_seq +
+				       tp->rcv_wnd * msk->pm.extra_subflows;
+		msk->rcv_rtt_est.rtt_us = rtt_us;
+		msk->scaling_ratio = sr;
+		return;
+	}
+
+	if (rtt_us < msk->rcv_rtt_est.rtt_us)
+		msk->rcv_rtt_est.rtt_us = rtt_us;
+	if (sr < msk->scaling_ratio)
+		msk->scaling_ratio = sr;
+}
+
 void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -881,6 +917,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		return;
 
 	mptcp_data_lock(sk);
+	mptcp_rcv_rtt_update(msk, subflow);
 	if (!sock_owned_by_user(sk)) {
 		/* Wake-up the reader only for in-sequence data */
 		if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
@@ -2058,6 +2095,7 @@ static void mptcp_rcv_space_init(struct mptcp_sock *msk)
 	msk->rcvspace_init = 1;
 
 	mptcp_data_lock(sk);
+	msk->rcv_rtt_est.seq = atomic64_read(&msk->rcv_wnd_sent);
 	__mptcp_sync_rcvspace(sk);
 
 	/* Paranoid check: at least one subflow pushed data to the msk. */
@@ -2076,9 +2114,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	u8 scaling_ratio = U8_MAX;
-	u32 time, advmss = 1;
-	u64 rtt_us, mstamp;
+	u32 rtt_us, time;
+	u64 mstamp;
 
 	msk_owned_by_me(msk);
 
@@ -2093,29 +2130,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	mstamp = mptcp_stamp();
 	time = tcp_stamp_us_delta(mstamp, READ_ONCE(msk->rcvq_space.time));
 
-	rtt_us = msk->rcvq_space.rtt_us;
-	if (rtt_us && time < (rtt_us >> 3))
-		return;
-
-	rtt_us = 0;
-	mptcp_for_each_subflow(msk, subflow) {
-		const struct tcp_sock *tp;
-		u64 sf_rtt_us;
-		u32 sf_advmss;
-
-		tp = tcp_sk(mptcp_subflow_tcp_sock(subflow));
-
-		sf_rtt_us = READ_ONCE(tp->rcv_rtt_est.rtt_us);
-		sf_advmss = READ_ONCE(tp->advmss);
-
-		rtt_us = max(sf_rtt_us, rtt_us);
-		advmss = max(sf_advmss, advmss);
-		scaling_ratio = min(tp->scaling_ratio, scaling_ratio);
-	}
-
-	msk->rcvq_space.rtt_us = rtt_us;
-	msk->scaling_ratio = scaling_ratio;
-	if (time < (rtt_us >> 3) || rtt_us == 0)
+	rtt_us = READ_ONCE(msk->rcv_rtt_est.rtt_us);
+	if (rtt_us == U32_MAX || time < (rtt_us >> 3))
 		return;
 
 	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
@@ -2941,7 +2957,8 @@ static void __mptcp_init_sock(struct sock *sk)
 	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
 	msk->backlog_len = 0;
 	msk->rcvq_space.copied = 0;
-	msk->rcvq_space.rtt_us = 0;
+	msk->rcv_rtt_est.rtt_us = U32_MAX;
+	msk->scaling_ratio = U8_MAX;
 
 	WRITE_ONCE(msk->first, NULL);
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -3385,7 +3402,8 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	msk->bytes_retrans = 0;
 	msk->rcvspace_init = 0;
 	msk->rcvq_space.copied = 0;
-	msk->rcvq_space.rtt_us = 0;
+	msk->scaling_ratio = U8_MAX;
+	msk->rcv_rtt_est.rtt_us = U32_MAX;
 
 	/* for fallback's sake */
 	WRITE_ONCE(msk->ack_seq, 0);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index adc0851bad69..051f21b06d33 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -340,11 +340,14 @@ struct mptcp_sock {
 	 */
 	struct mptcp_pm_data	pm;
 	struct mptcp_sched_ops	*sched;
+	struct {
+		u32	rtt_us;		/* Minimum RTT of subflows */
+		u64	seq;		/* Refresh RTT after this seq */
+	} rcv_rtt_est;
 	struct {
 		int	space;	/* bytes copied in last measurement window */
 		int	copied; /* bytes copied in this measurement window */
 		u64	time;	/* start time of measurement window */
-		u64	rtt_us; /* last maximum rtt of subflows */
 	} rcvq_space;
 	u8		scaling_ratio;
 	bool		allow_subflows;
@@ -523,6 +526,8 @@ struct mptcp_subflow_context {
 	u32	map_data_len;
 	__wsum	map_data_csum;
 	u32	map_csum_len;
+	u32	prev_rtt_us;
+	u32	prev_rtt_seq;
 	u32	request_mptcp : 1,	/* send MP_CAPABLE */
 		request_join : 1,	/* send MP_JOIN */
 		request_bkup : 1,
-- 
2.51.0
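The sample filtering done by mptcp_rcv_rtt_update() in patch 6/7 can be sketched in userspace C as follows. This is an illustrative model under stated assumptions, not the kernel implementation: every struct and function name is hypothetical, sequence numbers are kept at 32 bits (the patch stores msk->rcv_rtt_est.seq as u64), `seq_after()` is a local reimplementation of the kernel's wrap-safe after() comparison, and scaling_ratio tracking is omitted. `adjust_due()` models the new gate in mptcp_rcv_space_adjust(); the patch shifts the RTT right by 3, consistent with TCP keeping rcv_rtt_est.rtt_us scaled by 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-subflow filter state (prev_rtt_us/prev_rtt_seq in the patch). */
struct sf_rtt {
	uint32_t prev_rtt_us;
	uint32_t prev_rtt_seq;
};

/* Hypothetical msk-level estimator state (msk->rcv_rtt_est in the patch). */
struct msk_rtt {
	uint32_t rtt_us;             /* minimum RTT in the current interval */
	uint32_t seq;                /* interval end, MPTCP-level sequence */
	unsigned int extra_subflows; /* models msk->pm.extra_subflows */
};

/* One sample, as seen when a subflow pushes data to the msk. */
struct rtt_sample {
	uint32_t rtt_us;     /* subflow rcv_rtt_est.rtt_us */
	uint32_t seq;        /* subflow rcv_rtt_est.seq */
	uint32_t map_seq;    /* MPTCP-level sequence of the current mapping */
	uint32_t rcv_wnd;    /* subflow receive window */
	bool rx_queue_empty; /* models !mptcp_data_avail(msk) */
};

/* Wrap-safe "a comes after b", like the kernel's after() helper. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

static void rtt_update(struct msk_rtt *msk, struct sf_rtt *sf,
		       const struct rtt_sample *s)
{
	/* Same TCP sample as last time, or more than about 2x the previous one. */
	if (s->seq == sf->prev_rtt_seq ||
	    (sf->prev_rtt_us && (s->rtt_us >> 1) > sf->prev_rtt_us))
		return;
	/* No valid sample, or data still waiting in the msk receive queue. */
	if (!s->rtt_us || !s->rx_queue_empty)
		return;

	sf->prev_rtt_us = s->rtt_us;
	sf->prev_rtt_seq = s->seq;
	if (seq_after(s->map_seq, msk->seq)) {
		/* Past the interval end: restart the minimum from this sample. */
		msk->seq = s->map_seq + s->rcv_wnd * msk->extra_subflows;
		msk->rtt_us = s->rtt_us;
		return;
	}
	if (s->rtt_us < msk->rtt_us)
		msk->rtt_us = s->rtt_us;
}

/* Models the new gate in mptcp_rcv_space_adjust(): returns false while no
 * sample exists (UINT32_MAX) or less than one RTT has elapsed; rtt_x8 is
 * the RTT scaled by 8. */
static bool adjust_due(uint32_t elapsed_us, uint32_t rtt_x8)
{
	return rtt_x8 != UINT32_MAX && elapsed_us >= (rtt_x8 >> 3);
}
```

Because the minimum restarts once the MPTCP-level sequence passes the end of the interval, a stale, high initial RTT cannot pin the estimate, which is the behaviour the commit message contrasts with the previous max-based logic.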
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v2 mptcp-next 7/7] mptcp: add receive queue awareness in tcp_rcv_space_adjust()
Date: Tue, 4 Nov 2025 22:51:41 +0100
Message-ID: <639f98b0d408d8acef5876b58e05d343091418c3.1762292476.git.pabeni@redhat.com>

This is the MPTCP counterpart of commit ea33537d8292 ("tcp: add receive
queue awareness in tcp_rcv_space_adjust()").

Prior to this commit:
ESTAB 33165568 0      192.168.255.2:5201 192.168.255.1:53380 \
	skmem:(r33076416,rb33554432,t0,tb91136,f448,w0,o0,bl0,d0)

After:
ESTAB 3279168 0      192.168.255.2:5201 192.168.255.1:53042 \
	skmem:(r3190912,rb3719956,t0,tb91136,f1536,w0,o0,bl0,d0)

(same throughput in both cases)

Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 414cca078541..4b1944e40d91 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2134,11 +2134,13 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	if (rtt_us == U32_MAX || time < (rtt_us >> 3))
 		return;
 
-	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
+	copied = msk->rcvq_space.copied;
+	copied -= mptcp_inq_hint(sk);
+	if (copied <= msk->rcvq_space.space)
 		goto new_measure;
 
 	trace_mptcp_rcvbuf_grow(sk, time);
-	if (mptcp_rcvbuf_grow(sk, msk->rcvq_space.copied)) {
+	if (mptcp_rcvbuf_grow(sk, copied)) {
 		/* Make subflows follow along. If we do not do this, we
 		 * get drops at subflow level if skbs can't be moved to
 		 * the mptcp rx queue fast enough (announced rcv_win can
@@ -2152,7 +2154,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 			slow = lock_sock_fast(ssk);
 			/* subflows can be added before tcp_init_transfer() */
 			if (tcp_sk(ssk)->rcvq_space.space)
-				tcp_rcvbuf_grow(ssk, msk->rcvq_space.copied);
+				tcp_rcvbuf_grow(ssk, copied);
 			unlock_sock_fast(ssk, slow);
 		}
 	}
-- 
2.51.0
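The decision changed by patch 7/7 reduces to a small piece of arithmetic. The following is a minimal sketch, assuming a hypothetical helper `grow_target()` and struct `rcvq_model` as stand-ins for mptcp_rcv_space_adjust() and msk->rcvq_space. The bytes still queued (what mptcp_inq_hint() reports in the patch) are subtracted from the bytes copied in the current window before comparing against the previous window's space. In this model, data piling up in the receive queue therefore reduces the measured demand and can suppress buffer growth.

```c
#include <assert.h>

/* Hypothetical stand-in for msk->rcvq_space. */
struct rcvq_model {
	int space;  /* bytes copied in the last measurement window */
	int copied; /* bytes copied in this measurement window */
};

/* Models the updated check: discount the bytes still queued (inq) before
 * deciding to grow. Returns the value that would be handed to the rcvbuf
 * grow helpers, or 0 when the buffer should not grow. */
static int grow_target(const struct rcvq_model *q, int inq)
{
	int copied = q->copied - inq;

	return copied <= q->space ? 0 : copied;
}
```

With space = 1000 and copied = 5000, an empty queue still yields a growth target of 5000, while 4500 queued bytes reduce the effective value to 500 and suppress growth.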