From nobody Thu Nov 27 12:35:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C7A7E1B4F0A for ; Thu, 20 Nov 2025 08:40:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628008; cv=none; b=O380ohbs9osc1HmH9RlfaZS0sVCXGGJ53r3BUbIv8Wou9BBVBwNre4db4A9KG3EnCA9QTncsaJW3PWOAiUW0JpKHfozDCsDr0RjNgnNjBpUx4FFdsdBY1sYZLjtALFEwWbJP3FHxZV0k/gXlfBBQsjDfaAqzDxsEodgzfAuqjBU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628008; c=relaxed/simple; bh=aURQQXEryRFfJjkaU6HdjVlmD3kMg+HHLP7Rbjx7jeg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=EhuDFo8/GhQnmnlh6PKlPitXBChdoF8ZvkJrPTiqxQ3X4BL3NiRsYdCmdFB5//md6U7oU6mxe4nkXNAYgMLfoQUORNpiBuTdSQYPDe9Xaxa8oV9iK54L10dTmRJMgZwxpEiPWUbwTGCkP6GTMCexvxo3pnqwlaPgSr5UTaS6zhw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=TC0KWa6N; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="TC0KWa6N" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763628003; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=sVyLFxsoy29T4gggCzNi54Gin1PHAW9O695ZnK7TPeY=; b=TC0KWa6N427oV0pw+QbqlvOXqOF1gqARydkBF51oi0wy6JA2viCxnq3I8UsbyCdSW4b/7a UBWaC+l1+O0dRBLSe8aTAlLLiT+Isxgoo7Q9RVlDFgNpwD6ujJqYIDBPtqr5gpZqvhpCbg dM5WNTnHzi7OsNrvb2OTkrDHIvVPFxs= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-586-J8gbsklnMfyehECJhz_oGQ-1; Thu, 20 Nov 2025 03:40:00 -0500 X-MC-Unique: J8gbsklnMfyehECJhz_oGQ-1 X-Mimecast-MFC-AGG-ID: J8gbsklnMfyehECJhz_oGQ_1763627999 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4D4AC19560A7; Thu, 20 Nov 2025 08:39:59 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.89]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 17BAC3003E40; Thu, 20 Nov 2025 08:39:57 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: martineau@kernel.org Subject: [PATCH v7 mptcp-next 1/6] trace: mptcp: add mptcp_rcvbuf_grow tracepoint Date: Thu, 20 Nov 2025 09:39:45 +0100 Message-ID: <80017719d0443742e0d5a26de64da5570a85fc54.1763625391.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: Y34yxVRy7k8DQoAUxFZrCCa6scWXhcEvYaWLNKib4Uw_1763627999 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true"

Similar to tcp, provide a new tracepoint to better understand mptcp_rcv_space_adjust() behavior, which presents many artifacts.

Note that the format string used is so long that I preferred to wrap it, contrary to guidance for quoted strings.

Reviewed-by: Mat Martineau Signed-off-by: Paolo Abeni --- v4 -> v5: - fixed a couple of checkpatch issues. v2 -> v3: - use __entry->family; note that its value is shown as a raw number instead of a string, as show_family_name is available only to net/core/ trace and I preferred not moving the mptcp traces there as we need mptcp-specific helpers, too. --- include/trace/events/mptcp.h | 80 ++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.c | 3 ++ 2 files changed, 83 insertions(+) diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h index 085b749cdd97..269d949b2025 100644 --- a/include/trace/events/mptcp.h +++ b/include/trace/events/mptcp.h @@ -5,7 +5,13 @@ #if !defined(_TRACE_MPTCP_H) || defined(TRACE_HEADER_MULTI_READ) #define _TRACE_MPTCP_H =20 +#include +#include #include +#include +#include +#include +#include =20 #define show_mapping_status(status) \ __print_symbolic(status, \ @@ -178,6 +184,80 @@ TRACE_EVENT(subflow_check_data_avail, __entry->skb) ); =20 +#include + +TRACE_EVENT(mptcp_rcvbuf_grow, + + TP_PROTO(struct sock *sk, int time), + + TP_ARGS(sk, time), + + TP_STRUCT__entry( + __field(int, time) + __field(__u32, rtt_us) + __field(__u32, copied) + __field(__u32, inq) + __field(__u32, space) + __field(__u32, ooo_space) + __field(__u32, rcvbuf) + __field(__u32, rcv_wnd) + __field(__u8, scaling_ratio) + __field(__u16, sport) + __field(__u16, dport) + __field(__u16, family) + __array(__u8, saddr, 4) + __array(__u8, daddr, 4) + __array(__u8, saddr_v6, 16) + __array(__u8, daddr_v6, 16) + __field(const void *, skaddr) + ), + + TP_fast_assign( + struct mptcp_sock *msk =3D mptcp_sk(sk); + struct inet_sock *inet =3D inet_sk(sk); + 
bool ofo_empty; + __be32 *p32; + + __entry->time =3D time; + __entry->rtt_us =3D msk->rcvq_space.rtt_us >> 3; + __entry->copied =3D msk->rcvq_space.copied; + __entry->inq =3D mptcp_inq_hint(sk); + __entry->space =3D msk->rcvq_space.space; + ofo_empty =3D RB_EMPTY_ROOT(&msk->out_of_order_queue); + __entry->ooo_space =3D ofo_empty ? 0 : + MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - + msk->ack_seq; + + __entry->rcvbuf =3D sk->sk_rcvbuf; + __entry->rcv_wnd =3D atomic64_read(&msk->rcv_wnd_sent) - + msk->ack_seq; + __entry->scaling_ratio =3D msk->scaling_ratio; + __entry->sport =3D ntohs(inet->inet_sport); + __entry->dport =3D ntohs(inet->inet_dport); + __entry->family =3D sk->sk_family; + + p32 =3D (__be32 *)__entry->saddr; + *p32 =3D inet->inet_saddr; + + p32 =3D (__be32 *)__entry->daddr; + *p32 =3D inet->inet_daddr; + + TP_STORE_ADDRS(__entry, inet->inet_saddr, inet->inet_daddr, + sk->sk_v6_rcv_saddr, sk->sk_v6_daddr); + + __entry->skaddr =3D sk; + ), + + TP_printk("time=3D%u rtt_us=3D%u copied=3D%u inq=3D%u space=3D%u ooo=3D%u= scaling_ratio=3D%u " + "rcvbuf=3D%u rcv_wnd=3D%u family=3D%d sport=3D%hu dport=3D%hu saddr=3D= %pI4 " + "daddr=3D%pI4 saddrv6=3D%pI6c daddrv6=3D%pI6c skaddr=3D%p", + __entry->time, __entry->rtt_us, __entry->copied, + __entry->inq, __entry->space, __entry->ooo_space, + __entry->scaling_ratio, __entry->rcvbuf, __entry->rcv_wnd, + __entry->family, __entry->sport, __entry->dport, + __entry->saddr, __entry->daddr, __entry->saddr_v6, + __entry->daddr_v6, __entry->skaddr) +); #endif /* _TRACE_MPTCP_H */ =20 /* This part must be outside protection */ diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f8756521b2e2..64f592a7897c 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -28,6 +28,8 @@ #include "protocol.h" #include "mib.h" =20 +static unsigned int mptcp_inq_hint(const struct sock *sk); + #define CREATE_TRACE_POINTS #include =20 @@ -2118,6 +2120,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock = *msk, int copied) 
if (msk->rcvq_space.copied <=3D msk->rcvq_space.space) goto new_measure; =20 + trace_mptcp_rcvbuf_grow(sk, time); if (mptcp_rcvbuf_grow(sk, msk->rcvq_space.copied)) { /* Make subflows follow along. If we do not do this, we * get drops at subflow level if skbs can't be moved to --=20 2.51.1 From nobody Thu Nov 27 12:35:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8646F2D9789 for ; Thu, 20 Nov 2025 08:40:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628009; cv=none; b=mDWjV0f8lgSX0UHjVVhJB28opMP+PZUaX9YBruM+xRI/ovzJyaCJajBiZsqTljfCCvW5c1ZxGdWLHyCx9SBsoIfF0Xd5YDgv+EftfIMikCDtKjln+iTo113p1bR9PKfXaH5dD+7RQaPw88C0v9+Ex8iID5+hO+cms/e2KPSDO/g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628009; c=relaxed/simple; bh=BptF/KopwGiedURcoRj3W1/F6WZNSbGRSKtLJtmeocw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=rz24/votO0gX1egItf7BpiRnSdm1nERyves3MIQ9euWiA/+mNW5iQG7ukyIOqgOd75AdOZH1XPIRS4anwPDJ1RqF5FCcSrQeZYioS/lJj9gkKkp/8pILjlpjKeQh3AhNBB7WYt//AMgN6y/RW21u+/dQzuzQNf4nu5RI8DJ4cys= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=K8UbKtd1; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com 
header.i=@redhat.com header.b="K8UbKtd1" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763628006; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=W9hFOFktoE5+jnuEW3U7u6oJwwx87zQVPK58e+dZDwk=; b=K8UbKtd1E94h5hC36jU4yNfJsNnSeun6I4CBa+Y6Hg30KM9sEezTZ9+lR+Z4kABalo0Iwm YxomxWTc21rnCenKaTzqZ6D9ftb2DZERwIkUjb0RzJherkcccEG8GHT+i4FkvLFSgoh41J L5RrJ74Wv/ooBtp2tMByd0VUeCZONgQ= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-661-wEsfFdg8PXWQKuL7hUto7w-1; Thu, 20 Nov 2025 03:40:02 -0500 X-MC-Unique: wEsfFdg8PXWQKuL7hUto7w-1 X-Mimecast-MFC-AGG-ID: wEsfFdg8PXWQKuL7hUto7w_1763628001 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 4D2D8180035D; Thu, 20 Nov 2025 08:40:01 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.89]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 27BA93001E83; Thu, 20 Nov 2025 08:39:59 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: martineau@kernel.org Subject: [PATCH v7 mptcp-next 2/6] mptcp: do not account for OoO in mptcp_rcvbuf_grow() Date: Thu, 20 Nov 2025 09:39:46 +0100 Message-ID: <069348a7ea6d0ee86d1003a89dc343c540efa776.1763625391.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: 
mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: PDMIgSD45TnBwmQ3cM2Rh2Xrj-E7-bFlZFX07sl3_0I_1763628001 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true"

MPTCP-level OoOs are normal when multiple subflows are active concurrently: they neither cause retransmissions nor are caused by drops. Accounting for them in mptcp_rcvbuf_grow() makes the rcvbuf slowly drift towards tcp_rmem[2].

Remove such accounting. Note that subflows will still account for TCP-level OoO when the MPTCP-level rcvbuf is propagated.

This also closes a subtle and very unlikely race condition with rcvspace init: active sockets with user-space holding the msk-level socket lock could complete that initialization in the receive callback after the first OoO data reaches the rcvbuf, potentially triggering a divide-by-zero Oops.

Fixes: e118cdc34dd1 ("mptcp: rcvbuf auto-tuning improvement") Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/protocol.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 64f592a7897c..e31ccc4bbb2d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -226,9 +226,6 @@ static bool mptcp_rcvbuf_grow(struct sock *sk, u32 newv= al) do_div(grow, oldval); rcvwin +=3D grow << 1; =20 - if (!RB_EMPTY_ROOT(&msk->out_of_order_queue)) - rcvwin +=3D MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - msk->ack_seq; - cap =3D READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]); =20 rcvbuf =3D min_t(u32, mptcp_space_from_win(sk, rcvwin), cap); @@ -352,9 +349,6 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk= , struct sk_buff *skb) end: skb_condense(skb); skb_set_owner_r(skb, sk); - /* do not grow rcvbuf for not-yet-accepted or orphaned sockets. 
*/ - if (sk->sk_socket) - mptcp_rcvbuf_grow(sk, msk->rcvq_space.space); } =20 static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offs= et, --=20 2.51.1 From nobody Thu Nov 27 12:35:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90C813093A7 for ; Thu, 20 Nov 2025 08:40:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628012; cv=none; b=t9CexT2YKAnqFb7bAC/NBBvaM3XLHriPuqfhVJCVR5VvtwbLSDt/IUBIPmVLg0VZi6yViMwn45wtogsbJvYWSl2CO4g4sRI6VKOYQKPvrA8N3YMg2Bt3EEnbgnZ0T1rBWtybRGhegXo/hqNcCSI++8IYUHBjWuAfXfx11KfQ1kM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628012; c=relaxed/simple; bh=XIySVHr1xGFqLAZzVNJCoceiAPT4qi9Fym6TCpJinf4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=Nrn8Oyu3hnTCpHLQT99NJhQrDFHeY4yKjmm34zqncErvkSfn/JwAI5c9rtkAC8c6f1KRrjitoCEXGQhz6GlS7VYJ5hUe5dNn4BYqmVcN0VKqZt4QEkqrmxhhJ/Ni92423Ahm6aYGCd0vT8+XlFc9asycCCagpJe+m235eVEgGJw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=bb+WvWoy; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="bb+WvWoy" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; 
t=1763628008; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Q5Hjjp4+I4ApWiQR2nRDfv0Ab/gCrFvattRyrPB8cv8=; b=bb+WvWoyFVuXPUQjL1aCvtzyx0zBrlzNflys86zCie95wmH0wniyn/TCD6x7nJdFFeE+br 7XC9Uz9epRRE6igpvY3L+zydfs0dmdZyii9twUGfNXiLoNS/qSMk5vzCz86Ve/QSNImRrt +Y932W17c1JrbywhylN9/2A+pWYMkAw= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-29-IeYrJ-acMrCHTXxapti7WQ-1; Thu, 20 Nov 2025 03:40:03 -0500 X-MC-Unique: IeYrJ-acMrCHTXxapti7WQ-1 X-Mimecast-MFC-AGG-ID: IeYrJ-acMrCHTXxapti7WQ_1763628003 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id EC5C31801235; Thu, 20 Nov 2025 08:40:02 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.89]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id CCB9C3001E83; Thu, 20 Nov 2025 08:40:01 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: martineau@kernel.org Subject: [PATCH v7 mptcp-next 3/6] mptcp: fix receive space timestamp initialization. 
Date: Thu, 20 Nov 2025 09:39:47 +0100 Message-ID: <1ff99337d84cb7827b4aed83d6cfeffce9563d80.1763625391.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: k-0q7QD86uLXYmrhfyjfrISIdZmP2utHxt0ycdg_sXA_1763628003 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true"

MPTCP initializes the receive buffer stamp in mptcp_rcv_space_init(), using the provided subflow stamp. That helper is invoked in several places; for passive sockets, space init happens at clone time.

In that scenario, MPTCP ends up accessing the subflow stamp before its initialization, leading to fairly random timing for the first receive buffer auto-tune event, as the timestamp for a newly created subflow is not refreshed there.

Fix the issue by moving the stamp initialization out of the mentioned helper, to the data transfer start, and by always using a fresh timestamp.
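To illustrate why a stale stamp skews the first auto-tune event, here is a minimal user-space sketch of the elapsed-time check. The helper names `elapsed_us()` and `window_still_open()` are hypothetical and not part of this patch; clamping negative deltas to zero is an assumption about how the kernel stamp-delta helper behaves, and the `>> 3` shift simply mirrors the comparison visible in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed microseconds between two stamps, clamped at zero.
 * Hypothetical stand-in for the kernel's stamp delta helper.
 */
static uint32_t elapsed_us(uint64_t now_us, uint64_t stamp_us)
{
	int64_t delta = (int64_t)(now_us - stamp_us);

	return delta > 0 ? (uint32_t)delta : 0;
}

/* Mirrors the check in the diff: the measurement window is still open
 * while the elapsed time is below the stored RTT value shifted right by 3.
 * A zero RTT value means "no estimate yet", so the window is not held open.
 */
static int window_still_open(uint32_t time_us, uint32_t rtt_us_stored)
{
	return rtt_us_stored && time_us < (rtt_us_stored >> 3);
}
```

With a fresh stamp taken 100 us ago and a stored RTT value of 8000, the window is still open. With a stamp left over from seconds earlier, the same check reports the window as long expired, so the first adjustment fires at a time that depends only on how old the stale stamp happens to be.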
Fixes: 013e3179dbd2 ("mptcp: fix rcv space initialization") Reviewed-by: Mat Martineau Signed-off-by: Paolo Abeni --- v6 -> v7 - do not remove the mptcp_rcv_space_init() call in mptcp_sk_clone_init(): v5 -> v6 - really remove the stamp init from mptcp_rcv_space_init() v1 -> v2: - factor out only the tstamp change for better reviewability --- net/mptcp/protocol.c | 8 ++++---- net/mptcp/protocol.h | 5 +++++ 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e31ccc4bbb2d..b25cc8d5c98d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2083,8 +2083,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock = *msk, int copied) =20 msk->rcvq_space.copied +=3D copied; =20 - mstamp =3D div_u64(tcp_clock_ns(), NSEC_PER_USEC); - time =3D tcp_stamp_us_delta(mstamp, msk->rcvq_space.time); + mstamp =3D mptcp_stamp(); + time =3D tcp_stamp_us_delta(mstamp, READ_ONCE(msk->rcvq_space.time)); =20 rtt_us =3D msk->rcvq_space.rtt_us; if (rtt_us && time < (rtt_us >> 3)) @@ -3525,6 +3525,7 @@ struct sock *mptcp_sk_clone_init(const struct sock *s= k, __mptcp_propagate_sndbuf(nsk, ssk); =20 mptcp_rcv_space_init(msk, ssk); + msk->rcvq_space.time =3D mptcp_stamp(); =20 if (mp_opt->suboptions & OPTION_MPTCP_MPC_ACK) __mptcp_subflow_fully_established(msk, subflow, mp_opt); @@ -3542,8 +3543,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, con= st struct sock *ssk) msk->rcvq_space.copied =3D 0; msk->rcvq_space.rtt_us =3D 0; =20 - msk->rcvq_space.time =3D tp->tcp_mstamp; - /* initial rcv_space offering made to peer */ msk->rcvq_space.space =3D min_t(u32, tp->rcv_wnd, TCP_INIT_CWND * tp->advmss); @@ -3739,6 +3738,7 @@ void mptcp_finish_connect(struct sock *ssk) * accessing the field below */ WRITE_ONCE(msk->local_key, subflow->local_key); + WRITE_ONCE(msk->rcvq_space.time, mptcp_stamp()); =20 mptcp_pm_new_connection(msk, ssk, 0); } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 
199f28f3dd5e..95c62f2ac705 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -917,6 +917,11 @@ static inline bool mptcp_is_fully_established(struct s= ock *sk) READ_ONCE(mptcp_sk(sk)->fully_established); } =20 +static inline u64 mptcp_stamp(void) +{ + return div_u64(tcp_clock_ns(), NSEC_PER_USEC); +} + void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk); void mptcp_data_ready(struct sock *sk, struct sock *ssk); bool mptcp_finish_join(struct sock *sk); --=20 2.51.1 From nobody Thu Nov 27 12:35:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90D0E309EE8 for ; Thu, 20 Nov 2025 08:40:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628011; cv=none; b=Upj62H+3m+BjJ/RC22m7a5+qwJ7sdpKIcccgRttM0zO/Aj9QGeZFNGhJ3YEZ+xcSQWfQp+XN/dyB8SY21WXEMPX1V3fBtYPeI498GKWpCC29L5yXH2YnMXislwc7nDHoXl8vkAuP/CNLJBCcN5fFcA7dPcXiebgbB1GWLjNthIk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628011; c=relaxed/simple; bh=nofP3u7zgg/MDwd7fZ1oQgkFajXrpDTrN7iAIaod4D8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=XP6Kclz+yX7+c1Q46te0UgWChzYzysCFhhcf4BZu9QtTrxBcVjtOoGZdR+8P5HjQ9Riwiq3mNH++9lBVylGuc5FaphIHIDhlT4/ajZbr4GHBqF+Mxmsrsm+iRcUzG+bs+dDyFLCQH8AWFLnkqGBgUBswcc4vbYt5seInax6QxL0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=R3QnUGHg; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="R3QnUGHg" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763628008; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bqLmGrnunNaiTfmgkq+7KLGtOgqR3RypDXjcwvpOrNo=; b=R3QnUGHgF3GQqE1TCZ61Ceg2kxRdIzhQWIWItclrU0xS5dK/JlARWVP2Z9WmH8wKI2qDqy Kp2kshtqOzG1pfg6eMg/HNehSRjjFy+OANPO6H3liRK4enDdx282WE7+DXtvaAh2E0aUw/ oHr2zkymfCBx2yFY9sOWU2YhL8qrY8Y= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-671-VwWQFwpFP8aL3s4mFXq5Ag-1; Thu, 20 Nov 2025 03:40:05 -0500 X-MC-Unique: VwWQFwpFP8aL3s4mFXq5Ag-1 X-Mimecast-MFC-AGG-ID: VwWQFwpFP8aL3s4mFXq5Ag_1763628004 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id AF48E19560A7; Thu, 20 Nov 2025 08:40:04 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.89]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 880E230044DB; Thu, 20 Nov 2025 08:40:03 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: martineau@kernel.org Subject: [PATCH v7 mptcp-next 4/6] mptcp: consolidate rcv space init Date: Thu, 20 
Nov 2025 09:39:48 +0100 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: xIc3YK88fsdF3rl5_MICF8uVjc4SjY1QsY4WrqVhGd0_1763628004 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true"

MPTCP calls the mptcp_rcv_space_init() helper in several places to initialize the receive space, with a catch-up call in mptcp_rcv_space_adjust(). Drop all the other invocations, which are not strictly needed, and move the initialization of constant fields to socket init/reset time.

This removes a bit of complexity from the mptcp DRS code. No functional changes intended.

Reviewed-by: Mat Martineau Signed-off-by: Paolo Abeni --- v4 -> v5: - reworded the commit message v1 -> v2: - split helper consolidation out of v1 patch - additionally move 'copied' and 'rtt_us' initialization out of mptcp_rcv_space_init() --- net/mptcp/protocol.c | 30 +++++++++++++++--------------- net/mptcp/protocol.h | 1 - net/mptcp/subflow.c | 2 -- 3 files changed, 15 insertions(+), 18 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index b25cc8d5c98d..40c90d841237 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2061,6 +2061,21 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, str= uct msghdr *msg, return copied; } =20 +static void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock= *ssk) +{ + const struct tcp_sock *tp =3D tcp_sk(ssk); + + msk->rcvspace_init =3D 1; + msk->rcvq_space.copied =3D 0; + msk->rcvq_space.rtt_us =3D 0; + + /* initial rcv_space offering made to peer */ + msk->rcvq_space.space =3D min_t(u32, tp->rcv_wnd, + TCP_INIT_CWND * tp->advmss); + if (msk->rcvq_space.space =3D=3D 0) + msk->rcvq_space.space =3D TCP_INIT_CWND * TCP_MSS_DEFAULT; +} + /* receive buffer autotuning. 
See tcp_rcv_space_adjust for more informati= on. * * Only difference: Use highest rtt estimate of the subflows in use. @@ -3535,21 +3550,6 @@ struct sock *mptcp_sk_clone_init(const struct sock *= sk, return nsk; } =20 -void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk) -{ - const struct tcp_sock *tp =3D tcp_sk(ssk); - - msk->rcvspace_init =3D 1; - msk->rcvq_space.copied =3D 0; - msk->rcvq_space.rtt_us =3D 0; - - /* initial rcv_space offering made to peer */ - msk->rcvq_space.space =3D min_t(u32, tp->rcv_wnd, - TCP_INIT_CWND * tp->advmss); - if (msk->rcvq_space.space =3D=3D 0) - msk->rcvq_space.space =3D TCP_INIT_CWND * TCP_MSS_DEFAULT; -} - static void mptcp_destroy(struct sock *sk) { struct mptcp_sock *msk =3D mptcp_sk(sk); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 95c62f2ac705..ee0dbd6dbacf 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -922,7 +922,6 @@ static inline u64 mptcp_stamp(void) return div_u64(tcp_clock_ns(), NSEC_PER_USEC); } =20 -void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk); void mptcp_data_ready(struct sock *sk, struct sock *ssk); bool mptcp_finish_join(struct sock *sk); bool mptcp_schedule_work(struct sock *sk); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 86ce58ae533d..23f439853a7c 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -462,8 +462,6 @@ void __mptcp_sync_state(struct sock *sk, int state) =20 subflow =3D mptcp_subflow_ctx(ssk); __mptcp_propagate_sndbuf(sk, ssk); - if (!msk->rcvspace_init) - mptcp_rcv_space_init(msk, ssk); =20 if (sk->sk_state =3D=3D TCP_SYN_SENT) { /* subflow->idsn is always available is TCP_SYN_SENT state, --=20 2.51.1 From nobody Thu Nov 27 12:35:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with 
ESMTPS id 016592FF64F for ; Thu, 20 Nov 2025 08:40:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628014; cv=none; b=ogVN/EOfHyf0HFN3k5baBU2auydCOakzU4E09jibTmMNA76HRVDFKH6bIg9r3IM2fX1ScIjRwyo9ru3bv20k013cPaRPqkE69iOgl4NtnPr4ojfY3K8GiL3pmBCPPEI21eTe/a1V8Oj2Ag8mbqvjhhKk216iYfwtwFoa73ik1ZM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628014; c=relaxed/simple; bh=VVquttA7yGyhLBbPyWHq/vc/VpHwX7EnHgT8rh/Uiec=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=Er5iTmWm9bG3xE7fDy3Ed1ZZKMBLV0OcA/owG0CSKBmgg4rfQ1olye1+QiazYpe7TzYSk1k1VqrxJsdcVechLObNNYEVqIsT123PMmWcrgxdBLqtbZUW8YQh6/FmZ2vbz6QofOPsfcduXp2V2mInzzkifTU3utyHnpNvagYggBE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=SmHjAbHQ; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="SmHjAbHQ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763628011; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=qMo725Nsvwr6yCfluXwFUFy3GgJcnUBlQE2OUpfBoXM=; b=SmHjAbHQzYNzupnSVKKB02wX1AUi3LJxoXF3KSk94oiwgC/NSiDUIi823xIpF45aJ3Z4B/ eevoficWEPAbyVUgjwR0S4O1UYY9/ksc2W3CIGjqtXw+zwXs0XMOsPKpkomWtTqtgHJGdi 
dEJFH5DqUQZnGHd04XEMfGk9/aTpZjo= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-669-DJaiaEGVMKCK31u9SrFQ3A-1; Thu, 20 Nov 2025 03:40:07 -0500 X-MC-Unique: DJaiaEGVMKCK31u9SrFQ3A-1 X-Mimecast-MFC-AGG-ID: DJaiaEGVMKCK31u9SrFQ3A_1763628006 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 681A119540DD; Thu, 20 Nov 2025 08:40:06 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.89]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 34ABA3001E83; Thu, 20 Nov 2025 08:40:04 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: martineau@kernel.org Subject: [PATCH v7 mptcp-next 5/6] mptcp: better mptcp-level RTT estimator Date: Thu, 20 Nov 2025 09:39:49 +0100 Message-ID: <6ffc28b03d35f1b5698ff9ed6b22bd3e82fc81be.1763625391.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: OzbIsyVOTj6eA48Jsdq7Fd58scFVb3uD8rM9-InRC34_1763628006 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" The current MPTCP-level RTT estimator has several issues. On high speed links, the MPTCP-level receive buffer auto-tuning happens with a frequency well above the TCP-level's one. 
That in turn can cause excessive/unneeded receive buffer increases.

On such links, the initial rtt_us value is considerably higher than the
actual delay, and the current mptcp_rcv_space_adjust() updates
msk->rcvq_space.rtt_us with a period equal to that field's previous
value. If the initial rtt_us is 40ms, its first update will happen after
40ms, even if the subflows see actual RTTs orders of magnitude lower.

Additionally:
- setting the msk rtt to the maximum among all the subflows' RTTs makes
  DRS constantly overshoot the rcvbuf size when a subflow has
  considerably higher latency than the other(s).
- during unidirectional bulk transfers with multiple active subflows,
  the TCP-level RTT estimator occasionally sees considerably higher
  values than the real link delay, e.g. when the packet scheduler reacts
  to an incoming ack on a given subflow by pushing data on a different
  subflow.
- currently inactive but still open subflows (e.g. switched to backup
  mode) are always considered when computing the msk-level rtt.

Address all the issues above with a more accurate RTT estimation
strategy: the MPTCP-level RTT is set to the minimum of all the subflows
actually feeding data into the MPTCP receive buffer, using a small
sliding window.

While at it, also use EWMA to compute the msk-level scaling_ratio, so
that mptcp can avoid traversing the subflow list in
mptcp_rcv_space_adjust().

Use some care to avoid updating msk and ssk level fields too often.

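As an illustration only (not part of the patch): the estimation strategy
described above can be sketched in plain userspace C. The names rtt_est,
rtt_est_init(), rtt_est_update() and rtt_est_min() are hypothetical. The
sketch assumes a 5-entry window and a 7/8 EWMA weight, mirroring the
diff, and leaves out the locking, the READ_ONCE()/WRITE_ONCE()
annotations and the once-per-receive-window filtering.

```c
#include <stdint.h>

#define RTT_SAMPLES 5	/* assumed window size, mirrors MPTCP_RTT_SAMPLES */

/* Hypothetical userspace stand-in for the msk-level estimator state. */
struct rtt_est {
	uint32_t samples[RTT_SAMPLES];	/* most recent subflow RTTs, in us */
	uint32_t next_sample;		/* slot overwritten by the next sample */
	uint8_t scaling_ratio;		/* EWMA of the subflows' ratios */
};

/* Mark every slot as "no sample yet" and set the starting ratio. */
static void rtt_est_init(struct rtt_est *est, uint8_t ratio)
{
	for (int i = 0; i < RTT_SAMPLES; i++)
		est->samples[i] = UINT32_MAX;
	est->next_sample = 0;
	est->scaling_ratio = ratio;
}

/* Record one subflow sample; zero RTTs are discarded entirely. */
static void rtt_est_update(struct rtt_est *est, uint32_t rtt_us,
			   uint8_t sf_ratio)
{
	if (!rtt_us)
		return;

	/* Sliding window: overwrite the oldest slot, then advance. */
	est->samples[est->next_sample] = rtt_us;
	if (++est->next_sample == RTT_SAMPLES)
		est->next_sample = 0;

	/* EWMA: 7/8 of the old value plus 1/8 of the new one. */
	est->scaling_ratio = (uint8_t)(((est->scaling_ratio << 3) -
					est->scaling_ratio + sf_ratio) >> 3);
}

/* Estimate is the window minimum; UINT32_MAX means "no sample yet". */
static uint32_t rtt_est_min(const struct rtt_est *est)
{
	uint32_t rtt_us = est->samples[0];

	for (int i = 1; i < RTT_SAMPLES; i++)
		if (est->samples[i] < rtt_us)
			rtt_us = est->samples[i];
	return rtt_us;
}
```

With this shape, a high sample never raises the estimate while a lower
one is still in the window, and a low sample from a subflow that stopped
feeding data is forgotten once RTT_SAMPLES newer samples have
overwritten it.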
Fixes: a6b118febbab ("mptcp: add receive buffer auto-tuning")
Signed-off-by: Paolo Abeni
Reviewed-by: Mat Martineau
---
v4 -> v5:
- avoid filtering out too high value, use sliding window instead

v3 -> v4:
- really refresh msk rtt after a full win per subflow (off-by-one in prev
  revision)
- sync mptcp_rcv_space_adjust() comment with the new code

v1 -> v2:
- do not use explicit reset flags - do rcv win based decision instead
- discard 0 rtt_us samples from subflows
- discard samples on non empty rx queue
- discard "too high" samples, see the code comments WRT the whys
---
 include/trace/events/mptcp.h |  2 +-
 net/mptcp/protocol.c         | 63 ++++++++++++++++++++----------------
 net/mptcp/protocol.h         | 38 +++++++++++++++++++++-
 3 files changed, 73 insertions(+), 30 deletions(-)

diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 269d949b2025..04521acba483 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -219,7 +219,7 @@ TRACE_EVENT(mptcp_rcvbuf_grow,
 		__be32 *p32;
 
 		__entry->time = time;
-		__entry->rtt_us = msk->rcvq_space.rtt_us >> 3;
+		__entry->rtt_us = mptcp_rtt_us_est(msk) >> 3;
 		__entry->copied = msk->rcvq_space.copied;
 		__entry->inq = mptcp_inq_hint(sk);
 		__entry->space = msk->rcvq_space.space;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 40c90d841237..5dde016a108d 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -880,6 +880,32 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved;
 }
 
+static void mptcp_rcv_rtt_update(struct mptcp_sock *msk,
+				 struct mptcp_subflow_context *subflow)
+{
+	const struct tcp_sock *tp = tcp_sk(subflow->tcp_sock);
+	u32 rtt_us = tp->rcv_rtt_est.rtt_us;
+	int id;
+
+	/* Update once per subflow per rcvwnd to avoid touching the msk
+	 * too often.
+	 */
+	if (!rtt_us || tp->rcv_rtt_est.seq == subflow->prev_rtt_seq)
+		return;
+
+	subflow->prev_rtt_seq = tp->rcv_rtt_est.seq;
+
+	/* Pairs with READ_ONCE() in mptcp_rtt_us_est(). */
+	id = msk->rcv_rtt_est.next_sample;
+	WRITE_ONCE(msk->rcv_rtt_est.samples[id], rtt_us);
+	if (++msk->rcv_rtt_est.next_sample == MPTCP_RTT_SAMPLES)
+		msk->rcv_rtt_est.next_sample = 0;
+
+	/* EWMA among the incoming subflows */
+	msk->scaling_ratio = ((msk->scaling_ratio << 3) - msk->scaling_ratio +
+			     tp->scaling_ratio) >> 3;
+}
+
 void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -893,6 +919,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		return;
 
 	mptcp_data_lock(sk);
+	mptcp_rcv_rtt_update(msk, subflow);
 	if (!sock_owned_by_user(sk)) {
 		/* Wake-up the reader only for in-sequence data */
 		if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
@@ -2067,7 +2094,6 @@ static void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
 
 	msk->rcvspace_init = 1;
 	msk->rcvq_space.copied = 0;
-	msk->rcvq_space.rtt_us = 0;
 
 	/* initial rcv_space offering made to peer */
 	msk->rcvq_space.space = min_t(u32, tp->rcv_wnd,
@@ -2078,15 +2104,15 @@ static void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
 
 /* receive buffer autotuning. See tcp_rcv_space_adjust for more information.
  *
- * Only difference: Use highest rtt estimate of the subflows in use.
+ * Only difference: Use lowest rtt estimate of the subflows in use, see
+ * mptcp_rcv_rtt_update() and mptcp_rtt_us_est().
  */
 static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 {
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
-	u8 scaling_ratio = U8_MAX;
-	u32 time, advmss = 1;
-	u64 rtt_us, mstamp;
+	u32 time, rtt_us;
+	u64 mstamp;
 
 	msk_owned_by_me(msk);
 
@@ -2101,29 +2127,8 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	mstamp = mptcp_stamp();
 	time = tcp_stamp_us_delta(mstamp, READ_ONCE(msk->rcvq_space.time));
 
-	rtt_us = msk->rcvq_space.rtt_us;
-	if (rtt_us && time < (rtt_us >> 3))
-		return;
-
-	rtt_us = 0;
-	mptcp_for_each_subflow(msk, subflow) {
-		const struct tcp_sock *tp;
-		u64 sf_rtt_us;
-		u32 sf_advmss;
-
-		tp = tcp_sk(mptcp_subflow_tcp_sock(subflow));
-
-		sf_rtt_us = READ_ONCE(tp->rcv_rtt_est.rtt_us);
-		sf_advmss = READ_ONCE(tp->advmss);
-
-		rtt_us = max(sf_rtt_us, rtt_us);
-		advmss = max(sf_advmss, advmss);
-		scaling_ratio = min(tp->scaling_ratio, scaling_ratio);
-	}
-
-	msk->rcvq_space.rtt_us = rtt_us;
-	msk->scaling_ratio = scaling_ratio;
-	if (time < (rtt_us >> 3) || rtt_us == 0)
+	rtt_us = mptcp_rtt_us_est(msk);
+	if (rtt_us == U32_MAX || time < (rtt_us >> 3))
 		return;
 
 	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
@@ -2969,6 +2974,7 @@ static void __mptcp_init_sock(struct sock *sk)
 	msk->timer_ival = TCP_RTO_MIN;
 	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
 	msk->backlog_len = 0;
+	mptcp_init_rtt_est(msk);
 
 	WRITE_ONCE(msk->first, NULL);
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -3412,6 +3418,7 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	msk->bytes_sent = 0;
 	msk->bytes_retrans = 0;
 	msk->rcvspace_init = 0;
+	mptcp_init_rtt_est(msk);
 
 	/* for fallback's sake */
 	WRITE_ONCE(msk->ack_seq, 0);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index ee0dbd6dbacf..b392d7855928 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -269,6 +269,13 @@ struct mptcp_data_frag {
 	struct page *page;
 };
 
+/* Arbitrary compromise between as low as possible to react timely to subflow
+ * close event and as big as possible to avoid being fouled by biased large
+ * samples due to peer sending data on a different subflow WRT to the incoming
+ * ack.
+ */
+#define MPTCP_RTT_SAMPLES	5
+
 /* MPTCP connection sock */
 struct mptcp_sock {
 	/* inet_connection_sock must be the first member */
@@ -340,11 +347,17 @@ struct mptcp_sock {
 	 */
 	struct mptcp_pm_data	pm;
 	struct mptcp_sched_ops	*sched;
+
+	/* Most recent rtt_us observed by in use incoming subflows. */
+	struct {
+		u32 samples[MPTCP_RTT_SAMPLES];
+		u32 next_sample;
+	} rcv_rtt_est;
+
 	struct {
 		int	space;	/* bytes copied in last measurement window */
 		int	copied; /* bytes copied in this measurement window */
 		u64	time;	/* start time of measurement window */
-		u64	rtt_us; /* last maximum rtt of subflows */
 	} rcvq_space;
 	u8		scaling_ratio;
 	bool		allow_subflows;
@@ -422,6 +435,27 @@ static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
 	return msk->first_pending;
 }
 
+static inline void mptcp_init_rtt_est(struct mptcp_sock *msk)
+{
+	int i;
+
+	for (i = 0; i < MPTCP_RTT_SAMPLES; ++i)
+		msk->rcv_rtt_est.samples[i] = U32_MAX;
+	msk->rcv_rtt_est.next_sample = 0;
+	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+}
+
+static inline u32 mptcp_rtt_us_est(const struct mptcp_sock *msk)
+{
+	u32 rtt_us = msk->rcv_rtt_est.samples[0];
+	int i;
+
+	/* Lockless access of collected samples. */
+	for (i = 1; i < MPTCP_RTT_SAMPLES; ++i)
+		rtt_us = min(rtt_us, READ_ONCE(msk->rcv_rtt_est.samples[i]));
+	return rtt_us;
+}
+
 static inline struct mptcp_data_frag *mptcp_send_next(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -523,6 +557,8 @@ struct mptcp_subflow_context {
 	u32	map_data_len;
 	__wsum	map_data_csum;
 	u32	map_csum_len;
+	u32	prev_rtt_us;
+	u32	prev_rtt_seq;
 	u32	request_mptcp : 1,  /* send MP_CAPABLE */
 		request_join : 1,   /* send MP_JOIN */
 		request_bkup : 1,
-- 
2.51.1

From nobody Thu Nov 27 12:35:53 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ADA783074B7 for ; Thu, 20 Nov 2025 08:40:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628015; cv=none; b=Iay3U4a9PyyFOSVXe79twbDI101AzvNBAk0RwR1VRpWm1PZTkYYGQBrgMnRclvChv1C4d8uuHWnOU64Qck+as8t4S2iFWCvuPVVLQmrY9q5AxXfB1wxw8Oyn+erX188y0gC1PaRPyN8sPTNxuRO9h6RHDCrNgTgdNarAs0Plfig= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763628015; c=relaxed/simple; bh=gSrdQwDjm115Q1m52SVCeJQ0DEmbVlnHJWkV3DRQlB8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=iKb6ZU+cuqmzajyMDVG2Fy+k8ycbqLDhh6one3ueqOBC3i7dzNqDVnhhthcJQ3zgbGuKIiVlraXOrb84FEIxE3glHV75SeBiaUb7k+MKrj6U0D0w4voisiORPjKAa/ugvvCyGl6YpmJ3EABhT0zVAwp2jmlBaAUP82ipcyE8a4Y= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=cl+FIihW; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine
dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="cl+FIihW" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1763628012; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Ikoc2Af4yKSVFSmlXaLBzBKrx2pjEEo9eIOeqYbV7dM=; b=cl+FIihWyIIZCycEiexIHpzgpDOVZB8mA/x9PcP0IgMr02cDGZZlJSCM0CZ5qLsldE9y3n 3/u5G2cwh3JWgDGNzvvvFun8I989alYSD80Yas9hBrhktcfNJxzmBBRE3pW+Vt3AFmdHND 83C5C3C+wd4ZmW4/HgMYQXDjkidsBCk= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-649-TEkwmGk6MK6W9smpGqNIZQ-1; Thu, 20 Nov 2025 03:40:09 -0500 X-MC-Unique: TEkwmGk6MK6W9smpGqNIZQ-1 X-Mimecast-MFC-AGG-ID: TEkwmGk6MK6W9smpGqNIZQ_1763628008 Received: from mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.4]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 2A3861954B06; Thu, 20 Nov 2025 08:40:08 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.89]) by mx-prod-int-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id ECAEE30044DB; Thu, 20 Nov 2025 08:40:06 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: martineau@kernel.org Subject: [PATCH v7 mptcp-next 6/6] mptcp: add receive queue awareness in 
tcp_rcv_space_adjust() Date: Thu, 20 Nov 2025 09:39:50 +0100 Message-ID: <9b2b80f0d291661ec13066502cf92823873e5ef9.1763625391.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.4 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 3ZtAQH-7uNHlenwJFlXNY1YJ723pDGlQI8ti8PIfsNI_1763628008 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true"

This is the mptcp counterpart of commit ea33537d8292 ("tcp: add receive
queue awareness in tcp_rcv_space_adjust()").

Prior to this commit:

ESTAB 33165568 0 192.168.255.2:5201 192.168.255.1:53380 \
	skmem:(r33076416,rb33554432,t0,tb91136,f448,w0,o0,bl0,d0)

After:

ESTAB 3279168 0 192.168.255.2:5201 192.168.255.1:53042 \
	skmem:(r3190912,rb3719956,t0,tb91136,f1536,w0,o0,bl0,d0)

(same tput)

Reviewed-by: Mat Martineau
Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 5dde016a108d..6f4220c8b5bb 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2131,11 +2131,13 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	if (rtt_us == U32_MAX || time < (rtt_us >> 3))
 		return;
 
-	if (msk->rcvq_space.copied <= msk->rcvq_space.space)
+	copied = msk->rcvq_space.copied;
+	copied -= mptcp_inq_hint(sk);
+	if (copied <= msk->rcvq_space.space)
 		goto new_measure;
 
 	trace_mptcp_rcvbuf_grow(sk, time);
-	if (mptcp_rcvbuf_grow(sk, msk->rcvq_space.copied)) {
+	if (mptcp_rcvbuf_grow(sk, copied)) {
 		/* Make subflows follow along. If we do not do this, we
 		 * get drops at subflow level if skbs can't be moved to
 		 * the mptcp rx queue fast enough (announced rcv_win can
@@ -2149,7 +2151,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 			slow = lock_sock_fast(ssk);
 			/* subflows can be added before tcp_init_transfer() */
 			if (tcp_sk(ssk)->rcvq_space.space)
-				tcp_rcvbuf_grow(ssk, msk->rcvq_space.copied);
+				tcp_rcvbuf_grow(ssk, copied);
 			unlock_sock_fast(ssk, slow);
 		}
 	}
-- 
2.51.1
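
As an illustration only (not part of the patch): the check changed in
6/6 boils down to subtracting the bytes still sitting in the receive
queue from the per-window copied counter before comparing it with the
previous window's value. The helper names below are hypothetical, and
the inq argument stands in for whatever mptcp_inq_hint() returns.

```c
#include <stdbool.h>

/* Hypothetical helper: per-window counter minus bytes still queued. */
static int queue_aware_copied(int copied, int inq)
{
	return copied - inq;
}

/* Grow only when the adjusted value exceeds the previous window's space. */
static bool rcvbuf_should_grow(int copied, int inq, int space)
{
	return queue_aware_copied(copied, inq) > space;
}
```

With copied = 1000 and space = 800, an empty queue still triggers a
grow attempt, while 300 queued bytes bring the adjusted value down to
700 and the window is simply restarted.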