From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
 pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev,
 dsahern@kernel.org, horms@kernel.org, kuniyu@google.com,
 andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com,
 jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net,
 matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org,
 rostedt@goodmis.org, mhiramat@kernel.org,
 mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 01/14] tcp: factor receive-memory accounting helpers
Date: Sat, 14 Mar 2026 14:13:35 -0600
Message-ID: <20260314201348.1786972-2-atwellwea@gmail.com>
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

Factor the core receive-memory byte accounting into small helpers so
window selection, pressure checks, and prune decisions all start from
one set of quantities.

This is preparatory only. Later patches will use the same helpers when
tying sender-visible receive-window state back to hard memory admission.
Signed-off-by: Wesley Atwell
---
 include/net/tcp.h    | 32 +++++++++++++++++++++++++++-----
 net/ipv4/tcp_input.c |  2 +-
 2 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f87bdacb5a69..3a0060599afe 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1751,12 +1751,34 @@ static inline void tcp_scaling_ratio_init(struct sock *sk)
 	tcp_sk(sk)->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
 }
 
+/* TCP receive-side accounting reuses sk_rcvbuf as both a hard memory limit
+ * and as the source material for the advertised receive window after
+ * scaling_ratio conversion. Keep the byte accounting explicit so admission,
+ * pruning, and rwnd selection all start from the same quantities.
+ */
+static inline int tcp_rmem_used(const struct sock *sk)
+{
+	return atomic_read(&sk->sk_rmem_alloc);
+}
+
+static inline int tcp_rmem_avail(const struct sock *sk)
+{
+	return READ_ONCE(sk->sk_rcvbuf) - tcp_rmem_used(sk);
+}
+
+/* Sender-visible rwnd headroom also reserves bytes already queued on backlog.
+ * Those bytes are not free to advertise again until __release_sock() drains
+ * backlog and clears sk_backlog.len.
+ */
+static inline int tcp_rwnd_avail(const struct sock *sk)
+{
+	return tcp_rmem_avail(sk) - READ_ONCE(sk->sk_backlog.len);
+}
+
 /* Note: caller must be prepared to deal with negative returns */
 static inline int tcp_space(const struct sock *sk)
 {
-	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
-				  READ_ONCE(sk->sk_backlog.len) -
-				  atomic_read(&sk->sk_rmem_alloc));
+	return tcp_win_from_space(sk, tcp_rwnd_avail(sk));
 }
 
 static inline int tcp_full_space(const struct sock *sk)
@@ -1799,7 +1821,7 @@ static inline bool tcp_rmem_pressure(const struct sock *sk)
 	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
 	threshold = rcvbuf - (rcvbuf >> 3);
 
-	return atomic_read(&sk->sk_rmem_alloc) > threshold;
+	return tcp_rmem_used(sk) > threshold;
 }
 
 static inline bool tcp_epollin_ready(const struct sock *sk, int target)
@@ -1949,7 +1971,7 @@ static inline void tcp_fast_path_check(struct sock *sk)
 
 	if (RB_EMPTY_ROOT(&tp->out_of_order_queue) &&
 	    tp->rcv_wnd &&
-	    atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf &&
+	    tcp_rmem_avail(sk) > 0 &&
 	    !tp->urg_data)
 		tcp_fast_path_on(tp);
 }
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e6b2f4be7723..b8e65e31255e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5959,7 +5959,7 @@ static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	/* Do nothing if our queues are empty. */
-	if (!atomic_read(&sk->sk_rmem_alloc))
+	if (!tcp_rmem_used(sk))
 		return -1;
 
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_PRUNECALLED);
-- 
2.43.0
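For readers following the arithmetic, here is a minimal userspace model of how the patch 01 helpers compose. It is a sketch, not kernel code: struct rcv_acct and every function name in it are hypothetical stand-ins. It assumes the kernel's TCP_RMEM_TO_WIN_SCALE of 8 and a default scaling_ratio of 128 (50%), and it models the memory-to-window conversion on __tcp_win_from_space().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's TCP_RMEM_TO_WIN_SCALE. */
#define RMEM_TO_WIN_SCALE 8
/* 128/256, i.e. assume half of each skb's truesize is payload. */
#define DEFAULT_SCALING_RATIO (1 << (RMEM_TO_WIN_SCALE - 1))

/* Hypothetical userspace stand-in for the socket fields the helpers read. */
struct rcv_acct {
	int rcvbuf;            /* models sk->sk_rcvbuf: hard memory limit */
	int rmem_alloc;        /* models sk->sk_rmem_alloc: truesize charged */
	int backlog_len;       /* models sk->sk_backlog.len: queued, unprocessed */
	uint8_t scaling_ratio; /* payload/truesize estimate in 1/256 units */
};

/* Models tcp_rmem_used(). */
static inline int rmem_used(const struct rcv_acct *a)
{
	return a->rmem_alloc;
}

/* Models tcp_rmem_avail(): headroom under the hard limit. */
static inline int rmem_avail(const struct rcv_acct *a)
{
	return a->rcvbuf - rmem_used(a);
}

/* Models tcp_rwnd_avail(): backlog bytes are reserved as well. */
static inline int rwnd_avail(const struct rcv_acct *a)
{
	return rmem_avail(a) - a->backlog_len;
}

/* Memory bytes -> window bytes, modeled on __tcp_win_from_space(). */
static inline int win_from_space(uint8_t ratio, int space)
{
	int64_t scaled = (int64_t)space * ratio;

	return (int)(scaled >> RMEM_TO_WIN_SCALE);
}

/* Models tcp_space(): may be negative once the socket is over budget. */
static inline int space_for_rwnd(const struct rcv_acct *a)
{
	return win_from_space(a->scaling_ratio, rwnd_avail(a));
}

/* Models tcp_rmem_pressure(): more than 7/8 of rcvbuf is in use. */
static inline int rmem_pressure(const struct rcv_acct *a)
{
	return rmem_used(a) > a->rcvbuf - (a->rcvbuf >> 3);
}
```

With sk_rcvbuf = 131072, 40000 bytes charged, and 8000 bytes on backlog, the model gives rwnd_avail() = 83072 and an advertisable window of 41536 at the default ratio. Once charged plus backlogged bytes exceed rcvbuf, the result goes negative, which is the case the tcp_space() comment warns about.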
From: atwellwea@gmail.com
Subject: [PATCH net-next v2 02/14] tcp: snapshot advertise-time scaling for rcv_wnd
Date: Sat, 14 Mar 2026 14:13:36 -0600
Message-ID: <20260314201348.1786972-3-atwellwea@gmail.com>

From: Wesley Atwell

Track the scaling basis that was in force when tp->rcv_wnd was last
advertised, and provide helpers to refresh or interpret that snapshot.

Later patches use this live-window basis to preserve sender-visible rwnd
accounting when receive-side memory costs drift after advertisement.
Signed-off-by: Wesley Atwell
---
 .../networking/net_cachelines/tcp_sock.rst |  1 +
 include/linux/tcp.h                        |  1 +
 include/net/tcp.h                          | 52 ++++++++++++++++++-
 net/ipv4/tcp.c                             |  1 +
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index fecf61166a54..09ece1c59c2d 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -11,6 +11,7 @@ Type                        Name                     fastpath_tx_access  fastpa
 struct inet_connection_sock inet_conn
 u16                         tcp_header_len           read_mostly         read_mostly         tcp_bound_to_half_wnd,tcp_current_mss(tx);tcp_rcv_established(rx)
 u16                         gso_segs                 read_mostly                             tcp_xmit_size_goal
+u8                          rcv_wnd_scaling_ratio    read_write          read_mostly         tcp_set_rcv_wnd,tcp_can_ingest,tcp_repair_set_window,do_tcp_getsockopt
 __be32                      pred_flags               read_write          read_mostly         tcp_select_window(tx);tcp_rcv_established(rx)
 u64                         bytes_received                               read_write          tcp_rcv_nxt_update(rx)
 u32                         segs_in                                      read_write          tcp_v6_rcv(rx)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 6982f10e826b..2ace563d59d6 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -297,6 +297,7 @@ struct tcp_sock {
 		est_ecnfield:2,/* ECN field for AccECN delivered estimates */
 		accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
 		prev_ecnfield:2; /* ECN bits from the previous segment */
+	u8	rcv_wnd_scaling_ratio; /* 0 if unknown, else tp->rcv_wnd basis */
 	__be32	pred_flags;
 	u64	tcp_clock_cache; /* cache last tcp_clock_ns() (see tcp_mstamp_refresh()) */
 	u64	tcp_mstamp;	/* most recent packet received/sent */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3a0060599afe..6fa7cdb0979e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1741,6 +1741,31 @@ static inline int tcp_space_from_win(const struct sock *sk, int win)
 	return __tcp_space_from_win(tcp_sk(sk)->scaling_ratio, win);
 }
 
+static inline bool tcp_wnd_snapshot_valid(u8 scaling_ratio)
+{
+	return scaling_ratio != 0;
+}
+
+static inline bool tcp_space_from_wnd_snapshot(u8 scaling_ratio, int win,
+					       int *space)
+{
+	if (!tcp_wnd_snapshot_valid(scaling_ratio))
+		return false;
+
+	*space = __tcp_space_from_win(scaling_ratio, win);
+	return true;
+}
+
+/* Rebuild hard receive-memory units for data already covered by tp->rcv_wnd if
+ * the advertise-time basis is known.
+ */
+static inline bool tcp_space_from_rcv_wnd(const struct tcp_sock *tp, int win,
+					  int *space)
+{
+	return tcp_space_from_wnd_snapshot(tp->rcv_wnd_scaling_ratio, win,
+					   space);
+}
+
 /* Assume a 50% default for skb->len/skb->truesize ratio.
  * This may be adjusted later in tcp_measure_rcv_mss().
  */
@@ -1748,7 +1773,32 @@ static inline int tcp_space_from_win(const struct sock *sk, int win)
 
 static inline void tcp_scaling_ratio_init(struct sock *sk)
 {
-	tcp_sk(sk)->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tp->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+	tp->rcv_wnd_scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+}
+
+/* tp->rcv_wnd is paired with the scaling_ratio that was in force when that
+ * window was last advertised. Callers can leave a zero snapshot when the
+ * advertise-time basis is unknown and refresh the pair on the next local
+ * window update.
+ */
+static inline void tcp_set_rcv_wnd_snapshot(struct tcp_sock *tp, u32 win,
+					    u8 scaling_ratio)
+{
+	tp->rcv_wnd = win;
+	tp->rcv_wnd_scaling_ratio = scaling_ratio;
+}
+
+static inline void tcp_set_rcv_wnd(struct tcp_sock *tp, u32 win)
+{
+	tcp_set_rcv_wnd_snapshot(tp, win, tp->scaling_ratio);
+}
+
+static inline void tcp_set_rcv_wnd_unknown(struct tcp_sock *tp, u32 win)
+{
+	tcp_set_rcv_wnd_snapshot(tp, win, 0);
 }
 
 /* TCP receive-side accounting reuses sk_rcvbuf as both a hard memory limit
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 516087c622ad..0383ee8d3b78 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -5275,6 +5275,7 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ce);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ecn_bytes);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_limited);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd_scaling_ratio);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_mwnd_seq);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_tstamp);
-- 
2.43.0
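Patch 02 pairs rcv_wnd with its advertise-time ratio because converting a window back into memory units depends on the ratio, and scaling_ratio can change after the window has gone out. Below is a minimal model under stated assumptions. The conversions are assumed to mirror __tcp_win_from_space() and __tcp_space_from_win() with a shift of 8. struct rwnd_state and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RMEM_TO_WIN_SCALE 8 /* assumed to match TCP_RMEM_TO_WIN_SCALE */

/* Memory bytes -> window bytes, modeled on __tcp_win_from_space(). */
static inline int win_from_space(uint8_t ratio, int space)
{
	return (int)(((int64_t)space * ratio) >> RMEM_TO_WIN_SCALE);
}

/* Window bytes -> memory bytes, modeled on __tcp_space_from_win().
 * It divides by the ratio, which is why a zero ratio must mean "unknown". */
static inline int space_from_win(uint8_t ratio, int win)
{
	uint64_t val = (uint64_t)win << RMEM_TO_WIN_SCALE;

	assert(ratio != 0);
	return (int)(val / ratio);
}

/* Hypothetical stand-in for the tcp_sock fields touched by the patch. */
struct rwnd_state {
	uint8_t scaling_ratio;         /* current estimate; may drift */
	uint32_t rcv_wnd;              /* last advertised window */
	uint8_t rcv_wnd_scaling_ratio; /* ratio at advertisement, 0 = unknown */
};

/* Models tcp_set_rcv_wnd(): store the window with the current ratio. */
static inline void set_rcv_wnd(struct rwnd_state *tp, uint32_t win)
{
	tp->rcv_wnd = win;
	tp->rcv_wnd_scaling_ratio = tp->scaling_ratio;
}

/* Models tcp_set_rcv_wnd_unknown(): store the window, mark basis unknown. */
static inline void set_rcv_wnd_unknown(struct rwnd_state *tp, uint32_t win)
{
	tp->rcv_wnd = win;
	tp->rcv_wnd_scaling_ratio = 0;
}

/* Models tcp_space_from_rcv_wnd(): report failure instead of guessing. */
static inline bool space_from_rcv_wnd(const struct rwnd_state *tp, int win,
				      int *space)
{
	if (tp->rcv_wnd_scaling_ratio == 0)
		return false;
	*space = space_from_win(tp->rcv_wnd_scaling_ratio, win);
	return true;
}
```

Advertising from 83072 bytes of space at ratio 128 yields a 41536-byte window. If the ratio later drifts to 200, converting with the snapshot still recovers 83072 bytes, while the current ratio would give only 53166. With an unknown (zero) snapshot, the helper reports failure instead of dividing by zero.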
From: atwellwea@gmail.com
Subject: [PATCH net-next v2 03/14] tcp: refresh rcv_wnd snapshots at TCP write sites
Date: Sat, 14 Mar 2026 14:13:37 -0600
Message-ID: <20260314201348.1786972-4-atwellwea@gmail.com>

From: Wesley Atwell

Refresh the live rwnd snapshot whenever TCP updates tp->rcv_wnd at the
normal write sites, including child setup, tcp_select_window(), and the
initial connect-time window selection.

This keeps the live sender-visible window paired with the scaling basis
that was actually advertised.
Signed-off-by: Wesley Atwell
---
 net/ipv4/tcp_minisocks.c | 2 +-
 net/ipv4/tcp_output.c    | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index d350d794a959..1c02c9cd13fe 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -603,7 +603,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->rx_opt.sack_ok = ireq->sack_ok;
 	newtp->window_clamp = req->rsk_window_clamp;
 	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
-	newtp->rcv_wnd = req->rsk_rcv_wnd;
+	tcp_set_rcv_wnd(newtp, req->rsk_rcv_wnd);
 	newtp->rcv_mwnd_seq = newtp->rcv_wup + req->rsk_rcv_wnd;
 	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
 	if (newtp->rx_opt.wscale_ok) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 35c3b0ab5a0c..0b082726d7c4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -291,7 +291,7 @@ static u16 tcp_select_window(struct sock *sk)
 	 */
 	if (unlikely(inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM)) {
 		tp->pred_flags = 0;
-		tp->rcv_wnd = 0;
+		tcp_set_rcv_wnd(tp, 0);
 		tp->rcv_wup = tp->rcv_nxt;
 		tcp_update_max_rcv_wnd_seq(tp);
 		return 0;
@@ -315,7 +315,7 @@ static u16 tcp_select_window(struct sock *sk)
 		}
 	}
 
-	tp->rcv_wnd = new_win;
+	tcp_set_rcv_wnd(tp, new_win);
 	tp->rcv_wup = tp->rcv_nxt;
 	tcp_update_max_rcv_wnd_seq(tp);
 
@@ -4148,6 +4148,10 @@ static void tcp_connect_init(struct sock *sk)
 				  READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_window_scaling),
 				  &rcv_wscale,
 				  rcv_wnd);
+	/* tcp_select_initial_window() filled tp->rcv_wnd through its out-param,
+	 * so snapshot the scaling_ratio we will use for that initial rwnd.
+	 */
+	tcp_set_rcv_wnd(tp, tp->rcv_wnd);
 
 	tp->rx_opt.rcv_wscale = rcv_wscale;
 	tp->rcv_ssthresh = tp->rcv_wnd;
-- 
2.43.0
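Patch 03's write-site rule amounts to this: every store to rcv_wnd goes through the paired setter, and a value written through an out-parameter is re-stored through the setter afterwards. The following is a sketch under those assumptions only. struct conn, pick_initial_window(), and its halving policy are hypothetical placeholders and do not describe what tcp_select_initial_window() actually computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the tcp_sock fields involved. */
struct conn {
	uint8_t scaling_ratio;
	uint32_t rcv_wnd;
	uint8_t rcv_wnd_scaling_ratio; /* 0 = basis unknown */
};

/* Paired setter, mirroring tcp_set_rcv_wnd() from patch 02. */
static inline void set_rcv_wnd(struct conn *tp, uint32_t win)
{
	tp->rcv_wnd = win;
	tp->rcv_wnd_scaling_ratio = tp->scaling_ratio;
}

/* Hypothetical helper that, like tcp_select_initial_window(), returns the
 * window through an out-parameter and so bypasses the paired setter.
 * The halving is an arbitrary placeholder policy. */
static inline void pick_initial_window(uint32_t space, uint32_t *rcv_wnd)
{
	*rcv_wnd = space / 2;
}

/* Connect-time path: re-store through the setter so the ratio snapshot
 * matches the window that will actually be advertised. */
static inline void connect_init(struct conn *tp, uint32_t space)
{
	pick_initial_window(space, &tp->rcv_wnd);
	set_rcv_wnd(tp, tp->rcv_wnd);
}

/* Simplified model of the two tcp_select_window() write sites: the
 * memory-pressure path advertises a zero window, and the normal path
 * stores new_win. Both go through the setter. */
static inline void select_window(struct conn *tp, bool nomem, uint32_t new_win)
{
	if (nomem) {
		set_rcv_wnd(tp, 0);
		return;
	}
	set_rcv_wnd(tp, new_win);
}
```

In this model, a connection whose snapshot starts at 0 ends connect_init() with the current ratio recorded. Later window selections re-pair the window with whatever ratio is current at that moment, including the zero-window pressure case.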
From: atwellwea@gmail.com
Subject: [PATCH net-next v2 04/14] tcp: snapshot the maximum advertised receive window
Date: Sat, 14 Mar 2026 14:13:38 -0600
Message-ID: <20260314201348.1786972-5-atwellwea@gmail.com>

From: Wesley Atwell

Track the maximum sender-visible receive-window right edge separately
from the live rwnd, along with the scaling basis that was in force when
that larger window was advertised.
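The maximum right edge is compared using TCP's wraparound-safe sequence ordering. As a result, a later, smaller advertisement leaves both the remembered edge and its ratio untouched. Below is a minimal model. It assumes seq_before()/seq_after() behave like the kernel's before()/after(); struct mwnd and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe ordering, assumed equivalent to the kernel's before(). */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}
#define seq_after(seq2, seq1) seq_before(seq1, seq2)

/* Hypothetical stand-in for the tcp_sock fields involved. */
struct mwnd {
	uint32_t rcv_wup;               /* rcv_nxt at the last window update */
	uint32_t rcv_wnd;               /* live advertised window */
	uint8_t rcv_wnd_scaling_ratio;  /* basis of rcv_wnd */
	uint32_t rcv_mwnd_seq;          /* largest right edge advertised */
	uint8_t rcv_mwnd_scaling_ratio; /* basis of rcv_mwnd_seq */
};

/* Models tcp_init_max_rcv_wnd_seq(): seed the maximum from the live pair. */
static inline void init_max_rcv_wnd_seq(struct mwnd *tp)
{
	tp->rcv_mwnd_seq = tp->rcv_wup + tp->rcv_wnd;
	tp->rcv_mwnd_scaling_ratio = tp->rcv_wnd_scaling_ratio;
}

/* Models tcp_update_max_rcv_wnd_seq(): only a right edge that moves
 * forward replaces the maximum, and it brings its ratio along. */
static inline void update_max_rcv_wnd_seq(struct mwnd *tp)
{
	uint32_t wre = tp->rcv_wup + tp->rcv_wnd; /* wraps modulo 2^32 */

	if (seq_after(wre, tp->rcv_mwnd_seq)) {
		tp->rcv_mwnd_seq = wre;
		tp->rcv_mwnd_scaling_ratio = tp->rcv_wnd_scaling_ratio;
	}
}
```

Suppose rcv_wup starts at 0xFFFFF000 with a 0x2000-byte window. The right edge wraps to 0x1000 and still counts as newer than 0xFFFFF800. A later retraction to rcv_wup 0x800 plus a 0x400-byte window leaves rcv_mwnd_seq at 0x1000, still paired with ratio 128.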
This gives later admission and restore paths enough information to reason about retracted windows without losing the original sender- visible bound. Signed-off-by: Wesley Atwell --- .../networking/net_cachelines/tcp_sock.rst | 1 + include/linux/tcp.h | 1 + include/net/tcp.h | 21 ++++++++++++++++++- net/ipv4/tcp.c | 1 + net/ipv4/tcp_fastopen.c | 2 +- net/ipv4/tcp_input.c | 4 ++-- net/ipv4/tcp_minisocks.c | 2 +- net/ipv4/tcp_output.c | 2 +- 8 files changed, 28 insertions(+), 6 deletions(-) diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documen= tation/networking/net_cachelines/tcp_sock.rst index 09ece1c59c2d..d58a3b1eb55d 100644 --- a/Documentation/networking/net_cachelines/tcp_sock.rst +++ b/Documentation/networking/net_cachelines/tcp_sock.rst @@ -11,6 +11,7 @@ Type Name fas= tpath_tx_access fastpa struct inet_connection_sock inet_conn u16 tcp_header_len read_mostly = read_mostly tcp_bound_to_half_wnd,tcp_current_mss(tx);tcp_rcv_estab= lished(rx) u16 gso_segs read_mostly = tcp_xmit_size_goal +u8 rcv_mwnd_scaling_ratio read_write = read_mostly tcp_init_max_rcv_wnd_seq,tcp_update_max_rcv_wnd_seq,tcp= _repair_set_window,do_tcp_getsockopt u8 rcv_wnd_scaling_ratio read_write = read_mostly tcp_set_rcv_wnd,tcp_can_ingest,tcp_repair_set_window,do= _tcp_getsockopt __be32 pred_flags read_write = read_mostly tcp_select_window(tx);tcp_rcv_established(rx) u64 bytes_received = read_write tcp_rcv_nxt_update(rx) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 2ace563d59d6..e5d7a65ac439 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -297,6 +297,7 @@ struct tcp_sock { est_ecnfield:2,/* ECN field for AccECN delivered estimates */ accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */ prev_ecnfield:2; /* ECN bits from the previous segment */ + u8 rcv_mwnd_scaling_ratio; /* 0 if unknown, else tp->rcv_mwnd_seq basis */ u8 rcv_wnd_scaling_ratio; /* 0 if unknown, else tp->rcv_wnd basis */ __be32 pred_flags; u64 tcp_clock_cache; /* 
cache last tcp_clock_ns() (see tcp_mstamp_refresh= ()) */ diff --git a/include/net/tcp.h b/include/net/tcp.h index 6fa7cdb0979e..fc22ab6b80d5 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -947,13 +947,21 @@ static inline u32 tcp_max_receive_window(const struct= tcp_sock *tp) return (u32) win; } =20 +static inline void tcp_init_max_rcv_wnd_seq(struct tcp_sock *tp) +{ + tp->rcv_mwnd_seq =3D tp->rcv_wup + tp->rcv_wnd; + tp->rcv_mwnd_scaling_ratio =3D tp->rcv_wnd_scaling_ratio; +} + /* Check if we need to update the maximum receive window sequence number */ static inline void tcp_update_max_rcv_wnd_seq(struct tcp_sock *tp) { u32 wre =3D tp->rcv_wup + tp->rcv_wnd; =20 - if (after(wre, tp->rcv_mwnd_seq)) + if (after(wre, tp->rcv_mwnd_seq)) { tp->rcv_mwnd_seq =3D wre; + tp->rcv_mwnd_scaling_ratio =3D tp->rcv_wnd_scaling_ratio; + } } =20 /* Choose a new window, without checks for shrinking, and without @@ -1766,6 +1774,16 @@ static inline bool tcp_space_from_rcv_wnd(const stru= ct tcp_sock *tp, int win, space); } =20 +/* Same as tcp_space_from_rcv_wnd(), but for the remembered maximum + * sender-visible receive window. + */ +static inline bool tcp_space_from_rcv_mwnd(const struct tcp_sock *tp, int = win, + int *space) +{ + return tcp_space_from_wnd_snapshot(tp->rcv_mwnd_scaling_ratio, win, + space); +} + /* Assume a 50% default for skb->len/skb->truesize ratio. * This may be adjusted later in tcp_measure_rcv_mss(). 
*/ @@ -1776,6 +1794,7 @@ static inline void tcp_scaling_ratio_init(struct sock= *sk) struct tcp_sock *tp =3D tcp_sk(sk); =20 tp->scaling_ratio =3D TCP_DEFAULT_SCALING_RATIO; + tp->rcv_mwnd_scaling_ratio =3D TCP_DEFAULT_SCALING_RATIO; tp->rcv_wnd_scaling_ratio =3D TCP_DEFAULT_SCALING_RATIO; } =20 diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 0383ee8d3b78..66706dbb90f5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -5275,6 +5275,7 @@ static void __init tcp_struct_check(void) CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, recei= ved_ce); CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, recei= ved_ecn_bytes); CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_l= imited); + CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_m= wnd_scaling_ratio); CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_w= nd_scaling_ratio); CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_w= nd); CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_m= wnd_seq); diff --git a/net/ipv4/tcp_fastopen.c b/net/ipv4/tcp_fastopen.c index 4e389d609f91..56113cf2a165 100644 --- a/net/ipv4/tcp_fastopen.c +++ b/net/ipv4/tcp_fastopen.c @@ -377,7 +377,7 @@ static struct sock *tcp_fastopen_create_child(struct so= ck *sk, =20 tcp_rsk(req)->rcv_nxt =3D tp->rcv_nxt; tp->rcv_wup =3D tp->rcv_nxt; - tp->rcv_mwnd_seq =3D tp->rcv_wup + tp->rcv_wnd; + tcp_init_max_rcv_wnd_seq(tp); /* tcp_conn_request() is sending the SYNACK, * and queues the child into listener accept queue. 
*/ diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b8e65e31255e..352f814a4ff6 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -6902,7 +6902,7 @@ static int tcp_rcv_synsent_state_process(struct sock = *sk, struct sk_buff *skb, */ WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1); tp->rcv_wup =3D TCP_SKB_CB(skb)->seq + 1; - tp->rcv_mwnd_seq =3D tp->rcv_wup + tp->rcv_wnd; + tcp_init_max_rcv_wnd_seq(tp); =20 /* RFC1323: The window in SYN & SYN/ACK segments is * never scaled. @@ -7015,7 +7015,7 @@ static int tcp_rcv_synsent_state_process(struct sock = *sk, struct sk_buff *skb, WRITE_ONCE(tp->rcv_nxt, TCP_SKB_CB(skb)->seq + 1); WRITE_ONCE(tp->copied_seq, tp->rcv_nxt); tp->rcv_wup =3D TCP_SKB_CB(skb)->seq + 1; - tp->rcv_mwnd_seq =3D tp->rcv_wup + tp->rcv_wnd; + tcp_init_max_rcv_wnd_seq(tp); =20 /* RFC1323: The window in SYN & SYN/ACK segments is * never scaled. diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 1c02c9cd13fe..85bd9580caf9 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -604,7 +604,7 @@ struct sock *tcp_create_openreq_child(const struct sock= *sk, newtp->window_clamp =3D req->rsk_window_clamp; newtp->rcv_ssthresh =3D req->rsk_rcv_wnd; tcp_set_rcv_wnd(newtp, req->rsk_rcv_wnd); - newtp->rcv_mwnd_seq =3D newtp->rcv_wup + req->rsk_rcv_wnd; + tcp_init_max_rcv_wnd_seq(newtp); newtp->rx_opt.wscale_ok =3D ireq->wscale_ok; if (newtp->rx_opt.wscale_ok) { newtp->rx_opt.snd_wscale =3D ireq->snd_wscale; diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 0b082726d7c4..57a2a6daaad3 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -4171,7 +4171,7 @@ static void tcp_connect_init(struct sock *sk) else tp->rcv_tstamp =3D tcp_jiffies32; tp->rcv_wup =3D tp->rcv_nxt; - tp->rcv_mwnd_seq =3D tp->rcv_nxt + tp->rcv_wnd; + tcp_init_max_rcv_wnd_seq(tp); WRITE_ONCE(tp->copied_seq, tp->rcv_nxt); =20 inet_csk(sk)->icsk_rto =3D tcp_timeout_init(sk); --=20 2.43.0 From nobody Sun Mar 22 
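The snapshot rule in this patch can be modelled outside the kernel. The sketch below is a hypothetical userspace model, not kernel code: `struct rcv_state`, `seq_after()`, `init_max_rcv_wnd_seq()` and `update_max_rcv_wnd_seq()` are illustrative names that mirror the fields and helpers in the diff. It shows that a retracted window leaves both the remembered right edge and its scaling ratio untouched, while a strictly larger right edge replaces both together, using a wraparound-safe comparison in the style of the kernel's after().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the tcp_sock fields this patch touches. */
struct rcv_state {
	uint32_t rcv_wup;		/* start of the last advertised window */
	uint32_t rcv_wnd;		/* live advertised window, in bytes */
	uint8_t  rcv_wnd_scaling_ratio;	/* basis in force for rcv_wnd */
	uint32_t rcv_mwnd_seq;		/* largest right edge ever advertised */
	uint8_t  rcv_mwnd_scaling_ratio; /* basis in force for rcv_mwnd_seq */
};

/* Wraparound-safe "a is after b", same idea as the kernel's after(). */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Connection setup: the first advertised window is also the maximum. */
static void init_max_rcv_wnd_seq(struct rcv_state *s)
{
	s->rcv_mwnd_seq = s->rcv_wup + s->rcv_wnd;
	s->rcv_mwnd_scaling_ratio = s->rcv_wnd_scaling_ratio;
}

/* Only a strictly larger right edge replaces the snapshot, and the
 * scaling ratio is captured together with it.
 */
static void update_max_rcv_wnd_seq(struct rcv_state *s)
{
	uint32_t wre = s->rcv_wup + s->rcv_wnd;

	if (seq_after(wre, s->rcv_mwnd_seq)) {
		s->rcv_mwnd_seq = wre;
		s->rcv_mwnd_scaling_ratio = s->rcv_wnd_scaling_ratio;
	}
}
```

Pairing the ratio with the edge is the design point: a later reader of rcv_mwnd_seq can convert that window back to memory units with the basis that was actually in force when it was advertised, not whatever scaling_ratio happens to be current.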

From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 05/14] tcp: grow rcvbuf to back scaled-window quantization slack
Date: Sat, 14 Mar 2026 14:13:39 -0600
Message-ID: <20260314201348.1786972-6-atwellwea@gmail.com>
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

Teach TCP to grow sk_rcvbuf when scale rounding would otherwise expose
more sender-visible window than the current hard receive-memory backing
can cover.

The new helper keeps backlog and memory-pressure limits in the same
units as the rest of the receive path, while __tcp_select_window()
backs any rounding slack before advertising it.

Signed-off-by: Wesley Atwell
---
 include/net/tcp.h     | 12 ++++++++++++
 net/ipv4/tcp_input.c  | 36 ++++++++++++++++++++++++++++++++++--
 net/ipv4/tcp_output.c | 15 +++++++++++++--
 3 files changed, 59 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fc22ab6b80d5..5b479ad44f89 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -397,6 +397,7 @@ int tcp_ioctl(struct sock *sk, int cmd, int *karg);
 enum skb_drop_reason tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb);
 void tcp_rcvbuf_grow(struct sock *sk, u32 newval);
+bool tcp_try_grow_rcvbuf(struct sock *sk, int needed);
 void tcp_rcv_space_adjust(struct sock *sk);
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
 void tcp_twsk_destructor(struct sock *sk);
@@ -1844,6 +1845,17 @@ static inline int tcp_rwnd_avail(const struct sock *sk)
 	return tcp_rmem_avail(sk) - READ_ONCE(sk->sk_backlog.len);
 }
 
+/* Passive children clone the listener's sk_socket until accept() grafts
+ * their own struct socket, so only sockets that point back to themselves
+ * should autotune receive-buffer backing.
+ */
+static inline bool tcp_rcvbuf_grow_allowed(const struct sock *sk)
+{
+	struct socket *sock = READ_ONCE(sk->sk_socket);
+
+	return sock && READ_ONCE(sock->sk) == sk;
+}
+
 /* Note: caller must be prepared to deal with negative returns */
 static inline int tcp_space(const struct sock *sk)
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 352f814a4ff6..32256519a085 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -774,6 +774,38 @@ static void tcp_init_buffer_space(struct sock *sk)
 				    (u32)TCP_INIT_CWND * tp->advmss);
 }
 
+/* Try to grow sk_rcvbuf so the hard receive-memory limit covers @needed
+ * bytes beyond sk_rmem_alloc while preserving sender-visible headroom
+ * already consumed by sk_backlog.len.
+ */
+bool tcp_try_grow_rcvbuf(struct sock *sk, int needed)
+{
+	struct net *net = sock_net(sk);
+	int backlog;
+	int rmem2;
+	int target;
+
+	needed = max(needed, 0);
+	backlog = READ_ONCE(sk->sk_backlog.len);
+	target = tcp_rmem_used(sk) + backlog + needed;
+
+	if (target <= READ_ONCE(sk->sk_rcvbuf))
+		return true;
+
+	rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
+	if (READ_ONCE(sk->sk_rcvbuf) >= rmem2 ||
+	    (sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
+	    tcp_under_memory_pressure(sk) ||
+	    sk_memory_allocated(sk) >= sk_prot_mem_limits(sk, 0))
+		return false;
+
+	WRITE_ONCE(sk->sk_rcvbuf,
+		   min_t(int, rmem2,
+			 max_t(int, READ_ONCE(sk->sk_rcvbuf), target)));
+
+	return target <= READ_ONCE(sk->sk_rcvbuf);
+}
+
 /* 4. Recalculate window clamp after socket hit its memory bounds. */
 static void tcp_clamp_window(struct sock *sk)
 {
@@ -785,14 +817,14 @@ static void tcp_clamp_window(struct sock *sk)
 	icsk->icsk_ack.quick = 0;
 	rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
 
-	if (sk->sk_rcvbuf < rmem2 &&
+	if (READ_ONCE(sk->sk_rcvbuf) < rmem2 &&
 	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
 	    !tcp_under_memory_pressure(sk) &&
 	    sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) {
 		WRITE_ONCE(sk->sk_rcvbuf,
 			   min(atomic_read(&sk->sk_rmem_alloc), rmem2));
 	}
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
 		tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
 }
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 57a2a6daaad3..53781cf591d2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3375,13 +3375,24 @@ u32 __tcp_select_window(struct sock *sk)
 	 * scaled window will not line up with the MSS boundary anyway.
 	 */
 	if (tp->rx_opt.rcv_wscale) {
+		int rcv_wscale = 1 << tp->rx_opt.rcv_wscale;
+
 		window = free_space;
 
 		/* Advertise enough space so that it won't get scaled away.
-		 * Import case: prevent zero window announcement if
+		 * Important case: prevent zero-window announcement if
 		 * 1<<rcv_wscale > mss.
 		 */
-		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
+		window = ALIGN(window, rcv_wscale);
+
+		/* Back any scale-quantization slack before we expose it.
+		 * Otherwise tcp_can_ingest() can reject data which is still
+		 * within the sender-visible window.
+		 */
+		if (window > free_space &&
+		    (!tcp_rcvbuf_grow_allowed(sk) ||
+		     !tcp_try_grow_rcvbuf(sk, tcp_space_from_win(sk, window))))
+			window = round_down(free_space, rcv_wscale);
 	} else {
 		window = tp->rcv_wnd;
 		/* Get the largest window that is a nice multiple of mss.
-- 
2.43.0

From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 06/14] tcp: regrow rcvbuf when scaling_ratio drops after advertisement
Date: Sat, 14 Mar 2026 14:13:40 -0600
Message-ID: <20260314201348.1786972-7-atwellwea@gmail.com>
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

When tcp_measure_rcv_mss() lowers scaling_ratio after a window was
already advertised, grow sk_rcvbuf so the remaining live sender-visible
window still has matching hard receive-memory backing.

This repairs the live advertised window only. Retracted-window rescue
is handled separately in a later patch.

Signed-off-by: Wesley Atwell
---
 net/ipv4/tcp_input.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 32256519a085..d76e4e4c0e57 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -221,6 +221,31 @@ static __cold void tcp_gro_dev_warn(const struct sock *sk, const struct sk_buff
 	rcu_read_unlock();
 }
 
+/* If scaling_ratio drops after we already advertised tp->rcv_wnd, grow
+ * sk_rcvbuf so the remaining live window still maps back to hard memory
+ * units under the old advertise-time basis.
+ */
+static void tcp_try_grow_advertised_window(struct sock *sk,
+					   const struct sk_buff *skb)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	int needed;
+
+	/* Keep this repair aligned with tcp_rcvbuf_grow(): do not adjust
+	 * receive-buffer backing for not-yet-accepted or orphaned sockets.
+	 */
+	if (!tcp_rcvbuf_grow_allowed(sk))
+		return;
+
+	if (!tcp_receive_window(tp))
+		return;
+
+	if (!tcp_space_from_rcv_wnd(tp, tcp_receive_window(tp), &needed))
+		return;
+
+	tcp_try_grow_rcvbuf(sk, needed);
+}
+
 /* Adapt the MSS value used to make delayed ack decision to the
  * real world.
  */
@@ -251,6 +276,7 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 		if (old_ratio != tcp_sk(sk)->scaling_ratio) {
 			struct tcp_sock *tp = tcp_sk(sk);
 
+			tcp_try_grow_advertised_window(sk, skb);
 			val = tcp_win_from_space(sk, sk->sk_rcvbuf);
 			tcp_set_window_clamp(sk, val);
 
-- 
2.43.0
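The guard chain in tcp_try_grow_advertised_window() and the unit conversion it depends on can be sketched in userspace. This is hypothetical: `struct adv_model` and the helper names are illustrative, receive_window() follows the clamp-at-zero behavior of tcp_receive_window(), and space_from_snapshot() assumes that tcp_space_from_wnd_snapshot() fails for an unknown (zero) ratio and otherwise computes `win * 256 / ratio`. The sketch only computes the byte count the patch would pass to tcp_try_grow_rcvbuf(); it does not perform the growth.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the inputs tcp_try_grow_advertised_window() uses. */
struct adv_model {
	bool grow_allowed;		/* tcp_rcvbuf_grow_allowed(sk) */
	uint32_t rcv_wup;
	uint32_t rcv_wnd;
	uint32_t rcv_nxt;
	uint8_t rcv_wnd_scaling_ratio;	/* basis snapshotted with rcv_wnd */
};

/* Remaining live window, clamped at zero like tcp_receive_window(). */
static uint32_t receive_window(const struct adv_model *m)
{
	int32_t win = (int32_t)(m->rcv_wup + m->rcv_wnd - m->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* Assumed behavior of tcp_space_from_wnd_snapshot(). */
static bool space_from_snapshot(uint8_t ratio, uint32_t win, int *space)
{
	if (!ratio)
		return false;	/* unknown basis: no conversion */
	*space = (int)(((int64_t)win << 8) / ratio);
	return true;
}

/* True, with *needed set, when the patch would request growth. */
static bool advertised_window_needed(const struct adv_model *m, int *needed)
{
	uint32_t win;

	if (!m->grow_allowed)
		return false;
	win = receive_window(m);
	if (!win)
		return false;
	return space_from_snapshot(m->rcv_wnd_scaling_ratio, win, needed);
}
```

For example, 49152 bytes of live window left under a snapshot ratio of 128 map to 98304 bytes of hard memory, which is the `needed` value handed to the growth helper.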
U1lDF4xDoy1XKO7Qvz9jZ41eo8RPPw3HtA9QCatwrIkElmOG7iZWNRO989FiKb0/dGIZUjxTfrH Yl7dI5azjFAq6G5J25rsbyDhTcInnHccPluOo/rWhRagkqLBPa6CF7cpu/NIS9eBLBnZ1NtoPKZ GQ4oOYbV1lfr9sTQhRKTkXkvRIQljeHIGJWEpuxG9GiItmUwtXPYZCxqCfpTEW3xuAenurzl+WB Dr64kyPXio7FxbjegvZZe0T39kInJEpFwkcHGt2Y5raRWHWRho= X-Received: by 2002:a05:6870:309:b0:40e:95b9:40e6 with SMTP id 586e51a60fabf-417b946c51fmr4109655fac.40.1773519292142; Sat, 14 Mar 2026 13:14:52 -0700 (PDT) Received: from Atwell-Laptop.. (108-212-132-20.lightspeed.irvnca.sbcglobal.net. [108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e5e8185sm11914165fac.12.2026.03.14.13.14.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Mar 2026 13:14:51 -0700 (PDT) From: atwellwea@gmail.com To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com Subject: [PATCH net-next v2 07/14] tcp: honor the maximum advertised window after live retraction Date: Sat, 14 Mar 2026 14:13:41 -0600 Message-ID: <20260314201348.1786972-8-atwellwea@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com> References: <20260314201348.1786972-1-atwellwea@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wesley Atwell If receive-side accounting 
retracts the live rwnd below a larger sender-visible window that was already advertised, allow one in-order skb within that historical bound to repair its backing and reach the normal receive path. Hard receive-memory admission is still enforced through the existing prune and collapse path. The rescue only changes how data already inside sender-visible sequence space is classified and backed. Signed-off-by: Wesley Atwell --- net/ipv4/tcp_input.c | 92 +++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 86 insertions(+), 6 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index d76e4e4c0e57..4b9309c37e99 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -5376,24 +5376,86 @@ static void tcp_ofo_queue(struct sock *sk) static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_= skb); static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb); =20 +/* Sequence checks run against the sender-visible receive window before th= is + * point. If later receive-side accounting retracts the live receive window + * below the maximum right edge we already advertised, allow one in-order = skb + * which still fits inside that sender-visible bound to reach the normal + * receive queue path. + * + * Keep receive-memory admission itself on the legacy hard-cap path so pru= ne + * and collapse behavior stay aligned with the established retracted-window + * handling. 
+ */ +static bool tcp_skb_in_retracted_window(const struct tcp_sock *tp, + const struct sk_buff *skb) +{ + u32 live_end =3D tp->rcv_nxt + tcp_receive_window(tp); + u32 max_end =3D tp->rcv_nxt + tcp_max_receive_window(tp); + + return after(max_end, live_end) && + after(TCP_SKB_CB(skb)->end_seq, live_end) && + !after(TCP_SKB_CB(skb)->end_seq, max_end); +} + static bool tcp_can_ingest(const struct sock *sk, const struct sk_buff *sk= b) { - unsigned int rmem =3D atomic_read(&sk->sk_rmem_alloc); + return tcp_rmem_used(sk) <=3D READ_ONCE(sk->sk_rcvbuf); +} + +/* Caller already established that @skb extends into the retracted-but-sti= ll- + * valid sender-visible window. For in-order progress, regrow sk_rcvbuf be= fore + * falling into prune/forced-mem handling. + * + * This path intentionally repairs backing for one in-order skb that is al= ready + * within sender-visible sequence space, rather than treating it like ordi= nary + * receive-buffer autotuning. + * + * Keep this rescue bounded to the span accepted by this skb instead of the + * full historical tp->rcv_mwnd_seq. However, never grow below skb->truesi= ze, + * because sk_rmem_schedule() still charges hard memory, not sender-visible + * window bytes. + */ +static void tcp_try_grow_retracted_skb(struct sock *sk, + const struct sk_buff *skb) +{ + struct tcp_sock *tp =3D tcp_sk(sk); + int needed =3D skb->truesize; + int span_space; + u32 span_win; + + if (TCP_SKB_CB(skb)->seq !=3D tp->rcv_nxt) + return; + + span_win =3D TCP_SKB_CB(skb)->end_seq - tp->rcv_nxt; + if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) + span_win--; + + if (tcp_space_from_rcv_mwnd(tp, span_win, &span_space)) + needed =3D max_t(int, needed, span_space); =20 - return rmem <=3D sk->sk_rcvbuf; + tcp_try_grow_rcvbuf(sk, needed); } =20 +/* Sender-visible window rescue does not relax hard receive-memory admissi= on. + * If growth did not make room, fall back to the established prune/collapse + * path. 
+ */
 static int tcp_try_rmem_schedule(struct sock *sk, const struct sk_buff *skb,
 				 unsigned int size)
 {
-	if (!tcp_can_ingest(sk, skb) ||
-	    !sk_rmem_schedule(sk, skb, size)) {
+	bool can_ingest = tcp_can_ingest(sk, skb);
+	bool scheduled = can_ingest && sk_rmem_schedule(sk, skb, size);
+
+	if (!scheduled) {
+		int pruned = tcp_prune_queue(sk, skb);
 
-		if (tcp_prune_queue(sk, skb) < 0)
+		if (pruned < 0)
 			return -1;
 
 		while (!sk_rmem_schedule(sk, skb, size)) {
-			if (!tcp_prune_ofo_queue(sk, skb))
+			bool pruned_ofo = tcp_prune_ofo_queue(sk, skb);
+
+			if (!pruned_ofo)
 				return -1;
 		}
 	}
@@ -5629,6 +5691,7 @@ void tcp_data_ready(struct sock *sk)
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool retracted;
 	enum skb_drop_reason reason;
 	bool fragstolen;
 	int eaten;
@@ -5647,6 +5710,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 	}
 	tcp_cleanup_skb(skb);
 	__skb_pull(skb, tcp_hdr(skb)->doff * 4);
+	retracted = skb->len && tcp_skb_in_retracted_window(tp, skb);
 
 	reason = SKB_DROP_REASON_NOT_SPECIFIED;
 	tp->rx_opt.dsack = 0;
@@ -5667,6 +5731,9 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 			    (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN))
 				goto queue_and_out;
 
+			if (retracted)
+				goto queue_and_out;
+
 			reason = SKB_DROP_REASON_TCP_ZEROWINDOW;
 			NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPZEROWINDOWDROP);
 			goto out_of_window;
@@ -5674,7 +5741,20 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 
 		/* Ok. In sequence. In window. */
 queue_and_out:
+		if (unlikely(retracted))
+			tcp_try_grow_retracted_skb(sk, skb);
+
 		if (tcp_try_rmem_schedule(sk, skb, skb->truesize)) {
+			/* If the live rwnd collapsed to zero while rescuing an
+			 * skb that still fit in sender-visible sequence space,
+			 * report zero-window rather than generic proto-mem.
+			 */
+			if (unlikely(!tcp_receive_window(tp) && retracted)) {
+				reason = SKB_DROP_REASON_TCP_ZEROWINDOW;
+				NET_INC_STATS(sock_net(sk),
+					      LINUX_MIB_TCPZEROWINDOWDROP);
+				goto out_of_window;
+			}
 			/* TODO: maybe ratelimit these WIN 0 ACK ? */
 			inet_csk(sk)->icsk_ack.pending |=
 					(ICSK_ACK_NOMEM | ICSK_ACK_NOW);
-- 
2.43.0

From nobody Sun Mar 22 08:10:23 2026
From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 08/14] tcp: extend TCP_REPAIR_WINDOW for live and max-window snapshots
Date: Sat, 14 Mar 2026 14:13:42 -0600
Message-ID: <20260314201348.1786972-9-atwellwea@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

Extend TCP_REPAIR_WINDOW so repair
and restore can round-trip both the live rwnd snapshot and the
remembered maximum sender-visible window. Keep the ABI append-only by
accepting the legacy and v1 prefix lengths on both get and set,
rebuilding any missing max-window state from the live window when older
userspace restores a socket.

Signed-off-by: Wesley Atwell
---
 include/net/tcp.h        | 13 +++----
 include/uapi/linux/tcp.h |  8 +++++
 net/ipv4/tcp.c           | 73 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b479ad44f89..12e62fea2aaf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1766,13 +1766,14 @@ static inline bool tcp_space_from_wnd_snapshot(u8 scaling_ratio, int win,
 }
 
 /* Rebuild hard receive-memory units for data already covered by tp->rcv_wnd if
- * the advertise-time basis is known.
+ * the advertise-time basis is known. Legacy TCP_REPAIR restores can only
+ * recover tp->rcv_wnd itself; callers must fall back when the snapshot is
+ * unknown.
  */
 static inline bool tcp_space_from_rcv_wnd(const struct tcp_sock *tp, int win,
 					  int *space)
 {
-	return tcp_space_from_wnd_snapshot(tp->rcv_wnd_scaling_ratio, win,
-					   space);
+	return tcp_space_from_wnd_snapshot(tp->rcv_wnd_scaling_ratio, win, space);
 }
 
 /* Same as tcp_space_from_rcv_wnd(), but for the remembered maximum
@@ -1800,9 +1801,9 @@ static inline void tcp_scaling_ratio_init(struct sock *sk)
 }
 
 /* tp->rcv_wnd is paired with the scaling_ratio that was in force when that
- * window was last advertised. Callers can leave a zero snapshot when the
- * advertise-time basis is unknown and refresh the pair on the next local
- * window update.
+ * window was last advertised. Legacy TCP_REPAIR restores can only recover the
+ * window value itself and use a zero snapshot until a fresh local window
+ * advertisement refreshes the pair.
  */
 static inline void tcp_set_rcv_wnd_snapshot(struct tcp_sock *tp, u32 win,
 					    u8 scaling_ratio)
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 03772dd4d399..564a77f69130 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -152,6 +152,11 @@ struct tcp_repair_opt {
 	__u32	opt_val;
 };
 
+/* Append-only repair ABI.
+ * Older userspace may stop at rcv_wup or rcv_wnd_scaling_ratio.
+ * The kernel accepts those prefix lengths and rebuilds any missing
+ * receive-window snapshot state on restore.
+ */
 struct tcp_repair_window {
 	__u32	snd_wl1;
 	__u32	snd_wnd;
@@ -159,6 +164,9 @@ struct tcp_repair_window {
 
 	__u32	rcv_wnd;
 	__u32	rcv_wup;
+	__u32	rcv_wnd_scaling_ratio;	/* 0 means live-window basis unknown */
+	__u32	rcv_mwnd_seq;
+	__u32	rcv_mwnd_scaling_ratio;	/* 0 means max-window basis unknown */
 };
 
 enum {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 66706dbb90f5..39a1265876ea 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3533,17 +3533,31 @@ static inline bool tcp_can_repair_sock(const struct sock *sk)
 		(sk->sk_state != TCP_LISTEN);
 }
 
+/* Keep accepting the pre-extension TCP_REPAIR_WINDOW layout so legacy
+ * userspace can restore sockets without fabricating a snapshot basis.
+ */
+static inline int tcp_repair_window_legacy_size(void)
+{
+	return offsetof(struct tcp_repair_window, rcv_wnd_scaling_ratio);
+}
+
+static inline int tcp_repair_window_v1_size(void)
+{
+	return offsetof(struct tcp_repair_window, rcv_mwnd_seq);
+}
+
 static int tcp_repair_set_window(struct tcp_sock *tp, sockptr_t optbuf, int len)
 {
-	struct tcp_repair_window opt;
+	struct tcp_repair_window opt = {};
 
 	if (!tp->repair)
 		return -EPERM;
 
-	if (len != sizeof(opt))
+	if (len != tcp_repair_window_legacy_size() &&
+	    len != tcp_repair_window_v1_size() && len != sizeof(opt))
 		return -EINVAL;
 
-	if (copy_from_sockptr(&opt, optbuf, sizeof(opt)))
+	if (copy_from_sockptr(&opt, optbuf, len))
 		return -EFAULT;
 
 	if (opt.max_window < opt.snd_wnd)
@@ -3559,9 +3573,47 @@ static int tcp_repair_set_window(struct tcp_sock *tp, sockptr_t optbuf, int len)
 	tp->snd_wnd = opt.snd_wnd;
 	tp->max_window = opt.max_window;
 
-	tp->rcv_wnd = opt.rcv_wnd;
+	if (len == tcp_repair_window_legacy_size()) {
+		/* Legacy repair UAPI has no advertise-time basis for tp->rcv_wnd.
+		 * Mark the snapshot unknown until a fresh local advertisement
+		 * re-establishes the pair.
+		 */
+		tcp_set_rcv_wnd_unknown(tp, opt.rcv_wnd);
+		tp->rcv_wup = opt.rcv_wup;
+		tcp_init_max_rcv_wnd_seq(tp);
+		return 0;
+	}
+
+	if (opt.rcv_wnd_scaling_ratio > U8_MAX)
+		return -EINVAL;
+
+	tcp_set_rcv_wnd_snapshot(tp, opt.rcv_wnd, opt.rcv_wnd_scaling_ratio);
 	tp->rcv_wup = opt.rcv_wup;
-	tp->rcv_mwnd_seq = opt.rcv_wup + opt.rcv_wnd;
+
+	if (len == tcp_repair_window_v1_size()) {
+		/* v1 repair can restore the live-window snapshot, but not a
+		 * retracted max-window snapshot. Rebuild it from the live pair
+		 * until a fresh local advertisement updates it again.
+		 */
+		tcp_init_max_rcv_wnd_seq(tp);
+		return 0;
+	}
+
+	if (opt.rcv_mwnd_scaling_ratio > U8_MAX)
+		return -EINVAL;
+
+	/* Userspace may repair sequence-space values after checkpoint without
+	 * also rebasing the remembered max advertised right edge. If the exact
+	 * snapshot no longer covers the restored live window, treat it like
+	 * v1 and rebuild the max-window side from the live pair.
+	 */
+	if (after(opt.rcv_wup + opt.rcv_wnd, opt.rcv_mwnd_seq)) {
+		tcp_init_max_rcv_wnd_seq(tp);
+		return 0;
+	}
+
+	tp->rcv_mwnd_seq = opt.rcv_mwnd_seq;
+	tp->rcv_mwnd_scaling_ratio = opt.rcv_mwnd_scaling_ratio;
 
 	return 0;
 }
@@ -4650,12 +4702,16 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_REPAIR_WINDOW: {
-		struct tcp_repair_window opt;
+		struct tcp_repair_window opt = {};
 
 		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
-		if (len != sizeof(opt))
+		/* Mirror the accepted set-side prefix lengths so checkpoint
+		 * tools can round-trip exactly the layout version they know.
+		 */
+		if (len != tcp_repair_window_legacy_size() &&
+		    len != tcp_repair_window_v1_size() && len != sizeof(opt))
 			return -EINVAL;
 
 		if (!tp->repair)
@@ -4666,6 +4722,9 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 		opt.max_window	= tp->max_window;
 		opt.rcv_wnd	= tp->rcv_wnd;
 		opt.rcv_wup	= tp->rcv_wup;
+		opt.rcv_wnd_scaling_ratio = tp->rcv_wnd_scaling_ratio;
+		opt.rcv_mwnd_seq = tp->rcv_mwnd_seq;
+		opt.rcv_mwnd_scaling_ratio = tp->rcv_mwnd_scaling_ratio;
 
 		if (copy_to_sockptr(optval, &opt, len))
 			return -EFAULT;
-- 
2.43.0

From nobody Sun Mar 22 08:10:23 2026
From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 09/14] mptcp: refresh TCP receive-window snapshots on subflows
Date: Sat, 14 Mar 2026 14:13:43 -0600
Message-ID: <20260314201348.1786972-10-atwellwea@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

When MPTCP resynchronizes the per-subflow TCP shadow window from the
mptcp-level receive state, refresh the live rwnd snapshot and the
remembered maximum-window snapshot along with it. That keeps subflow TCP
bookkeeping aligned with the sender-visible window state tracked in the
core TCP patches.

Signed-off-by: Wesley Atwell
---
 net/mptcp/options.c  | 14 +++++++++-----
 net/mptcp/protocol.h | 14 +++++++++++---
 2 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 8a1c5698983c..64cd637484a4 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1073,9 +1073,12 @@ static void rwin_update(struct mptcp_sock *msk, struct sock *ssk,
 		return;
 
 	/* Some other subflow grew the mptcp-level rwin since rcv_wup,
-	 * resync.
+	 * resync. Keep the TCP shadow window in its advertised u32 domain
+	 * and refresh the advertise-time scaling snapshot while doing so.
 	 */
-	tp->rcv_wnd += mptcp_rcv_wnd - subflow->rcv_wnd_sent;
+	tcp_set_rcv_wnd(tp, min_t(u64, (u64)tp->rcv_wnd +
+				  (mptcp_rcv_wnd - subflow->rcv_wnd_sent),
+				  U32_MAX));
 	tcp_update_max_rcv_wnd_seq(tp);
 	subflow->rcv_wnd_sent = mptcp_rcv_wnd;
 }
@@ -1335,12 +1338,13 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 	if (rcv_wnd_new != rcv_wnd_old) {
 raise_win:
 		/* The msk-level rcv wnd is after the tcp level one,
-		 * sync the latter.
+		 * sync the latter and refresh its advertise-time scaling
+		 * snapshot.
 		 */
 		rcv_wnd_new = rcv_wnd_old;
 		win = rcv_wnd_old - ack_seq;
-		new_win = min_t(u64, win, U32_MAX);
-		tp->rcv_wnd = new_win;
+		tcp_set_rcv_wnd(tp, min_t(u64, win, U32_MAX));
+		new_win = tp->rcv_wnd;
 		tcp_update_max_rcv_wnd_seq(tp);
 
 		/* Make sure we do not exceed the maximum possible
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 0bd1ee860316..4ea95c9c0c7a 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -408,11 +408,19 @@ static inline int mptcp_space_from_win(const struct sock *sk, int win)
 	return __tcp_space_from_win(mptcp_sk(sk)->scaling_ratio, win);
 }
 
+/* MPTCP exposes window space from the mptcp-level receive queue, so it tracks
+ * a separate backlog counter from the subflow backlog embedded in struct sock.
+ */
+static inline int mptcp_rwnd_avail(const struct sock *sk)
+{
+	return READ_ONCE(sk->sk_rcvbuf) -
+	       READ_ONCE(mptcp_sk(sk)->backlog_len) -
+	       tcp_rmem_used(sk);
+}
+
 static inline int __mptcp_space(const struct sock *sk)
 {
-	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
-				    READ_ONCE(mptcp_sk(sk)->backlog_len) -
-				    sk_rmem_alloc_get(sk));
+	return mptcp_win_from_space(sk, mptcp_rwnd_avail(sk));
 }
 
 static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
-- 
2.43.0

From nobody Sun Mar 22 08:10:23 2026
From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 10/14] tcp: expose rmem and backlog in tcp and mptcp rcvbuf_grow tracepoints
Date: Sat, 14 Mar 2026 14:13:44 -0600
Message-ID: <20260314201348.1786972-11-atwellwea@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

Extend the tcp_rcvbuf_grow and mptcp_rcvbuf_grow tracepoints with the
live receive-memory allocation and backlog occupancy that now drive the
window-growth decisions in this series. That makes it easier to inspect
sender-visible rwnd state against the actual hard receive-memory inputs.
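The new fields are printed as plain space-separated key=value tokens, so a
captured trace can be post-processed without special tooling. A minimal
userspace sketch, assuming the text follows the TP_printk() format strings
in this patch and that the values are unsigned decimals (they use %u);
trace_field_ul(), struct rcvbuf_sample and parse_rcvbuf_sample() are
hypothetical names and are not part of the series:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of this series: find "key=<decimal>" in one
 * formatted tcp_rcvbuf_grow or mptcp_rcvbuf_grow line. The key must start a
 * space-separated token, so "space" does not match inside "ooo_space" and
 * "rcvbuf" does not match inside the "tcp_rcvbuf_grow:" event name.
 */
static bool trace_field_ul(const char *line, const char *key,
			   unsigned long *out)
{
	size_t klen;
	const char *p;

	assert(line != NULL && key != NULL && out != NULL);
	klen = strlen(key);
	if (klen == 0)
		return false;

	for (p = strstr(line, key); p != NULL; p = strstr(p + 1, key)) {
		bool token_start = (p == line) || (p[-1] == ' ');
		const char *val = p + klen;

		if (!token_start || *val != '=')
			continue;	/* not a whole key token; keep searching */
		val++;
		if (!isdigit((unsigned char)*val))
			return false;	/* key present, value not numeric */
		*out = strtoul(val, NULL, 10);
		return true;
	}
	return false;
}

/* Hypothetical container for the receive-memory inputs shown in the trace. */
struct rcvbuf_sample {
	unsigned long rmem_alloc;
	unsigned long backlog_len;
	unsigned long rcvbuf;
	unsigned long rcv_wnd;
};

/* Returns true only when all four fields are present in @line. */
static bool parse_rcvbuf_sample(const char *line, struct rcvbuf_sample *s)
{
	return trace_field_ul(line, "rmem_alloc", &s->rmem_alloc) &&
	       trace_field_ul(line, "backlog_len", &s->backlog_len) &&
	       trace_field_ul(line, "rcvbuf", &s->rcvbuf) &&
	       trace_field_ul(line, "rcv_wnd", &s->rcv_wnd);
}
```

For example, looking up "space" in `ooo_space=5 space=7` skips the first
match and yields 7. Matching whole tokens is a deliberate choice because
several field names overlap; a more robust tool might instead read the
event's format file under tracefs.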
Signed-off-by: Wesley Atwell
---
 include/trace/events/mptcp.h | 11 +++++++----
 include/trace/events/tcp.h   | 12 +++++++-----
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h
index 269d949b2025..167970e8e0a5 100644
--- a/include/trace/events/mptcp.h
+++ b/include/trace/events/mptcp.h
@@ -199,6 +199,8 @@ TRACE_EVENT(mptcp_rcvbuf_grow,
 		__field(__u32, inq)
 		__field(__u32, space)
 		__field(__u32, ooo_space)
+		__field(__u32, rmem_alloc)
+		__field(__u32, backlog_len)
 		__field(__u32, rcvbuf)
 		__field(__u32, rcv_wnd)
 		__field(__u8, scaling_ratio)
@@ -228,6 +230,8 @@ TRACE_EVENT(mptcp_rcvbuf_grow,
 				MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq -
 				msk->ack_seq;
 
+		__entry->rmem_alloc = tcp_rmem_used(sk);
+		__entry->backlog_len = READ_ONCE(msk->backlog_len);
 		__entry->rcvbuf = sk->sk_rcvbuf;
 		__entry->rcv_wnd = atomic64_read(&msk->rcv_wnd_sent) -
 				   msk->ack_seq;
@@ -248,12 +252,11 @@ TRACE_EVENT(mptcp_rcvbuf_grow,
 		__entry->skaddr = sk;
 	),
 
-	TP_printk("time=%u rtt_us=%u copied=%u inq=%u space=%u ooo=%u scaling_ratio=%u "
-		  "rcvbuf=%u rcv_wnd=%u family=%d sport=%hu dport=%hu saddr=%pI4 "
-		  "daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c skaddr=%p",
+	TP_printk("time=%u rtt_us=%u copied=%u inq=%u space=%u ooo=%u scaling_ratio=%u rmem_alloc=%u backlog_len=%u rcvbuf=%u rcv_wnd=%u family=%d sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c skaddr=%p",
 		  __entry->time, __entry->rtt_us, __entry->copied,
 		  __entry->inq, __entry->space, __entry->ooo_space,
-		  __entry->scaling_ratio, __entry->rcvbuf, __entry->rcv_wnd,
+		  __entry->scaling_ratio, __entry->rmem_alloc,
+		  __entry->backlog_len, __entry->rcvbuf, __entry->rcv_wnd,
 		  __entry->family, __entry->sport, __entry->dport,
 		  __entry->saddr, __entry->daddr, __entry->saddr_v6,
 		  __entry->daddr_v6, __entry->skaddr)
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index f155f95cdb6e..92d0bd6be0ba 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -217,6 +217,8 @@ TRACE_EVENT(tcp_rcvbuf_grow,
 		__field(__u32, inq)
 		__field(__u32, space)
 		__field(__u32, ooo_space)
+		__field(__u32, rmem_alloc)
+		__field(__u32, backlog_len)
 		__field(__u32, rcvbuf)
 		__field(__u32, rcv_ssthresh)
 		__field(__u32, window_clamp)
@@ -247,6 +249,8 @@ TRACE_EVENT(tcp_rcvbuf_grow,
 				TCP_SKB_CB(tp->ooo_last_skb)->end_seq -
 				tp->rcv_nxt;
 
+		__entry->rmem_alloc = tcp_rmem_used(sk);
+		__entry->backlog_len = READ_ONCE(sk->sk_backlog.len);
 		__entry->rcvbuf = sk->sk_rcvbuf;
 		__entry->rcv_ssthresh = tp->rcv_ssthresh;
 		__entry->window_clamp = tp->window_clamp;
@@ -269,13 +273,11 @@ TRACE_EVENT(tcp_rcvbuf_grow,
 		__entry->sock_cookie = sock_gen_cookie(sk);
 	),
 
-	TP_printk("time=%u rtt_us=%u copied=%u inq=%u space=%u ooo=%u scaling_ratio=%u rcvbuf=%u "
-		  "rcv_ssthresh=%u window_clamp=%u rcv_wnd=%u "
-		  "family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 "
-		  "saddrv6=%pI6c daddrv6=%pI6c skaddr=%p sock_cookie=%llx",
+	TP_printk("time=%u rtt_us=%u copied=%u inq=%u space=%u ooo=%u scaling_ratio=%u rmem_alloc=%u backlog_len=%u rcvbuf=%u rcv_ssthresh=%u window_clamp=%u rcv_wnd=%u family=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c skaddr=%p sock_cookie=%llx",
 		  __entry->time, __entry->rtt_us, __entry->copied,
 		  __entry->inq, __entry->space, __entry->ooo_space,
-		  __entry->scaling_ratio, __entry->rcvbuf,
+		  __entry->scaling_ratio, __entry->rmem_alloc,
+		  __entry->backlog_len, __entry->rcvbuf,
 		  __entry->rcv_ssthresh, __entry->window_clamp,
 		  __entry->rcv_wnd,
 		  show_family_name(__entry->family),
-- 
2.43.0

From nobody Sun Mar 22 08:10:23 2026
smtp.subspace.kernel.org (Postfix) with ESMTPS id 7869738CFE9 for ; Sat, 14 Mar 2026 20:15:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.161.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773519303; cv=none; b=ZUIfi4gqHv0vRr2X2XtdRcRLp+Qdeox7pMWdlBCyP+6KCX+a7NzHjNkhu3DsKmO0KK83+lRvIHOgLhh+9MRjQmhOPwRILoiOJMzXK3BXP5SvQClDNwnSj9Ta028PmOPINqZ/Rk+W/oOSLfqANp8B5F/K1DCHWzNBf5aJV9CtL3Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773519303; c=relaxed/simple; bh=h2w84+E53fjjORSXCTmZqDRXCz4rcg4b0xrFtcD8+/M=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=c2F+0I2qOt92MbNaBCRXCjPjFEZDBvzKZsvpKkJdVgDhAjD3ebRHcTjia8H08FQAhhgeWlDDwoSFbQp2rDXyBsvd1vyBnhmX4A41vzQeR8gMDehJ+0wJIb+Jcf37HXtvI/HDWzLnUyFXfiYzEzhDRYRn7LafIOfyDw8vahI0XJM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=j22CojKk; arc=none smtp.client-ip=209.85.161.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="j22CojKk" Received: by mail-oo1-f51.google.com with SMTP id 006d021491bc7-67bb04151dcso1903296eaf.0 for ; Sat, 14 Mar 2026 13:15:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773519299; x=1774124099; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=WzFkYp6vQFMuSQL+rwhcnHsZCs8WUPXFTJK3qolPQ8w=; b=j22CojKkEk6Mn7enjb9ZrHV/qZKDrIz7Ab5gl+dFN5t603IivBcmEEiXFBm2QYuVGO 
8iDuus6MssUDPbWToyM+W/hyS9baRxmjKXa0e/zugPPqL0gFfeAMAVVg3FHUqmasigcN c7qMT+DWO1AD6SKPf2L+9qYXBPZPeB1pTsZ44NwkSPMuNFkhX9ObfCFNiE2aXlsGhnxE 8umR5D5aBEyFP4Mui4hd93/gagS+VIh6glSM9eY/H9x4uHg5zlTJ5xGyXYf5yx7xkE9u Mtnx43sU7YHMr8OqaGr+/1uxO18vrcra64+zNO3nAsmxpIWzLGqDJ8RvyZuLhrznj0QH ja6Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773519299; x=1774124099; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=WzFkYp6vQFMuSQL+rwhcnHsZCs8WUPXFTJK3qolPQ8w=; b=rneoDm2j4t2iYG7/j+hhRhAJna0Ka3SUeKKAU7/ier1qY5XasB+xyJqR93si8RjPR6 kPmQI4ah8f2z49T3aJu+g1sq0W1toAg36inGaCO4pHDDy0JzMBOugz6JixMZWUwV8CXJ 3JzgR8uuK7Pxscs1eMCuZguo4MkYDwhH5Qr3Qr1kxDeQoZxICrOWvcy5+eXEuO+tX333 OQdsxerTuaEmZ7PpGlRQ5nAa+RUvtemzHMrj1q0LC5Q/ftuYENm65xZ2Wuey8GUP1Q68 N9DFPDtj78Th1whar2ndq/h6FaA6ToeAMAa7MP4Wg/1vYpx005DGmk9/PvvHaCPu041q a0uw== X-Gm-Message-State: AOJu0YyOxIwmp1r/1afXt1avX6ijpHdlBciVxL+h0SSFM8TZAt3OmNEt yGAK3Lq7TEo6PUH6cA8VZoliMVwRuKIICR3ppPBzaAWNIMlLf/zzfS/b X-Gm-Gg: ATEYQzwgR79Q1VBwYg7ryc4BSPXIqYECoUMVFYWt+3UBXYL1g9BhDxzJ51Efw8QrBZk QgfuQivFYeng83Ajn5drqdFM++3udWxVdaJi+J0TxXFC/e814lHBezpLibbA4DfxC/CT2aZPJPk LNk3itEsc+T7hR9HV2tpgJQSxa9TJVVQA8edJpQS0I4bu/fIgyNzshr0sUM/KnrHRBRyMgNsVKD my3Xsz5JU6MKFXD4uQm1Ox9d0/s5c3tx1SAxZI+6TOmZgB52p9uFX0lnBV5FrzbSuZYVFd8IGfw 3ZLwzDs6VFRVJnfGZnP1sKofFQqkLcjiTANflSg5E+tPbHoypcZd4qzJ3Yxa/HX8g28q5e+N7V5 GOGKUXlugt82W0s9uENRx/mLF9Nggw0UMe+4cTwNgP3iH2ffmMtFAFZ/VywLMZsR6ou7/KMKP1b t4GJFgzHsDenDhb8vGrAeBE6ISsRGhWO2Z3nhsEnKs/Xm2Q7IbRdiAD6M3Kz8cbhBvCZuvpNFw0 UAe/hAgIuzuZpZuLKS1E+v2+TmreC1KAi1+Pm+W X-Received: by 2002:a05:6820:616:b0:67b:aca0:3d96 with SMTP id 006d021491bc7-67bdaa7db03mr5100287eaf.65.1773519299237; Sat, 14 Mar 2026 13:14:59 -0700 (PDT) Received: from Atwell-Laptop.. (108-212-132-20.lightspeed.irvnca.sbcglobal.net. 
[108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e5e8185sm11914165fac.12.2026.03.14.13.14.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Mar 2026 13:14:58 -0700 (PDT) From: atwellwea@gmail.com To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com Subject: [PATCH net-next v2 11/14] selftests: tcp_ao: cover legacy, v1, and retracted repair windows Date: Sat, 14 Mar 2026 14:13:45 -0600 Message-ID: <20260314201348.1786972-12-atwellwea@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com> References: <20260314201348.1786972-1-atwellwea@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wesley Atwell Extend the tcp_ao repair selftests to exercise the legacy, v1, and current TCP_REPAIR_WINDOW layouts, plus a synthesized retracted-window image that preserves a larger historical right edge. These tests validate both the append-only ABI contract and the restore- time rebuilding of any snapshot state older userspace could not save. 
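For reviewers, the length contract and the v1 rebuild that these tests
expect can be modelled in a few lines of userspace C. This is a sketch
under stated assumptions, not the kernel implementation: struct
trw_sketch, trw_check_len() and trw_rebuild_snapshot() are hypothetical
names, the field order follows the offsetof() calls in aolib.h, and every
field is assumed to be 32 bits wide, which the patch itself does not
state.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the extended struct tcp_repair_window. The first
 * five fields are the upstream legacy layout; the field types of the three
 * appended fields are an ASSUMPTION (all 32-bit here). */
struct trw_sketch {
	uint32_t snd_wl1;
	uint32_t snd_wnd;
	uint32_t max_window;
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
	uint32_t rcv_wnd_scaling_ratio;	/* first field past the legacy size */
	uint32_t rcv_mwnd_seq;		/* first field past the v1 size */
	uint32_t rcv_mwnd_scaling_ratio;
};

/* Same offsetof() cut points the selftest helpers use. */
#define TRW_LEGACY_LEN	offsetof(struct trw_sketch, rcv_wnd_scaling_ratio)
#define TRW_V1_LEN	offsetof(struct trw_sketch, rcv_mwnd_seq)
#define TRW_CUR_LEN	sizeof(struct trw_sketch)

/* Accept only the three known layout sizes; anything else (for example
 * legacy + 1 or v1 + 1) is rejected with -EINVAL, which is what
 * test_repair_window_len_contract() checks for. */
static int trw_check_len(size_t len)
{
	if (len == TRW_LEGACY_LEN || len == TRW_V1_LEN || len == TRW_CUR_LEN)
		return 0;
	return -EINVAL;
}

/* Rebuild the max-window snapshot for a v1 image the way
 * test_v1_repair_window_state() expects: the right edge is
 * rcv_wup + rcv_wnd (unsigned, so it wraps like TCP sequence space) and
 * the ratio mirrors rcv_wnd_scaling_ratio. A full-size image already
 * carries the snapshot and is preserved; the legacy fallback path is not
 * modelled here. */
static void trw_rebuild_snapshot(struct trw_sketch *trw, size_t len)
{
	if (len != TRW_V1_LEN)
		return;
	trw->rcv_mwnd_seq = trw->rcv_wup + trw->rcv_wnd;
	trw->rcv_mwnd_scaling_ratio = trw->rcv_wnd_scaling_ratio;
}
```

With the assumed all-32-bit layout the legacy, v1 and current sizes are
20, 24 and 32 bytes; the real values depend on the field types in the
series' uapi header.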
Signed-off-by: Wesley Atwell
---
 .../testing/selftests/net/tcp_ao/lib/aolib.h  |  83 +++++++-
 .../testing/selftests/net/tcp_ao/lib/repair.c |  18 +-
 .../selftests/net/tcp_ao/self-connect.c       | 201 +++++++++++++++++-
 3 files changed, 279 insertions(+), 23 deletions(-)

diff --git a/tools/testing/selftests/net/tcp_ao/lib/aolib.h b/tools/testing/selftests/net/tcp_ao/lib/aolib.h
index ebb2899c12fe..ef08db831457 100644
--- a/tools/testing/selftests/net/tcp_ao/lib/aolib.h
+++ b/tools/testing/selftests/net/tcp_ao/lib/aolib.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -671,17 +672,55 @@ struct tcp_sock_state {
 	int timestamp;
 };
 
-extern void __test_sock_checkpoint(int sk, struct tcp_sock_state *state,
-				   void *addr, size_t addr_size);
+/* Legacy userspace stops before the snapshot field and therefore exercises
+ * the kernel's unknown-snapshot fallback path.
+ */
+static inline socklen_t test_tcp_repair_window_legacy_size(void)
+{
+	return offsetof(struct tcp_repair_window, rcv_wnd_scaling_ratio);
+}
+
+static inline socklen_t test_tcp_repair_window_v1_size(void)
+{
+	return offsetof(struct tcp_repair_window, rcv_mwnd_seq);
+}
+
+static inline socklen_t test_tcp_repair_window_exact_size(void)
+{
+	return sizeof(struct tcp_repair_window);
+}
+
+void __test_sock_checkpoint_opt(int sk, struct tcp_sock_state *state,
+				socklen_t trw_len,
+				void *addr, size_t addr_size);
 static inline void test_sock_checkpoint(int sk, struct tcp_sock_state *state,
 					sockaddr_af *saddr)
 {
-	__test_sock_checkpoint(sk, state, saddr, sizeof(*saddr));
+	__test_sock_checkpoint_opt(sk, state, test_tcp_repair_window_exact_size(),
+				   saddr, sizeof(*saddr));
+}
+
+static inline void test_sock_checkpoint_legacy(int sk,
+					       struct tcp_sock_state *state,
+					       sockaddr_af *saddr)
+{
+	__test_sock_checkpoint_opt(sk, state, test_tcp_repair_window_legacy_size(),
+				   saddr, sizeof(*saddr));
+}
+
+static inline void test_sock_checkpoint_v1(int sk,
+					   struct tcp_sock_state *state,
+					   sockaddr_af *saddr)
+{
+	__test_sock_checkpoint_opt(sk, state, test_tcp_repair_window_v1_size(),
+				   saddr, sizeof(*saddr));
 }
 extern void test_ao_checkpoint(int sk, struct tcp_ao_repair *state);
-extern void __test_sock_restore(int sk, const char *device,
-				struct tcp_sock_state *state,
-				void *saddr, void *daddr, size_t addr_size);
+void __test_sock_restore_opt(int sk, const char *device,
+			     struct tcp_sock_state *state,
+			     socklen_t trw_len,
+			     void *saddr, void *daddr,
+			     size_t addr_size);
 static inline void test_sock_restore(int sk, struct tcp_sock_state *state,
 				     sockaddr_af *saddr,
 				     const union tcp_addr daddr,
@@ -690,7 +729,37 @@ static inline void test_sock_restore(int sk, struct tcp_sock_state *state,
 	sockaddr_af addr;
 
 	tcp_addr_to_sockaddr_in(&addr, &daddr, htons(dport));
-	__test_sock_restore(sk, veth_name, state, saddr, &addr, sizeof(addr));
+	__test_sock_restore_opt(sk, veth_name, state,
+				test_tcp_repair_window_exact_size(),
+				saddr, &addr, sizeof(addr));
+}
+
+static inline void test_sock_restore_legacy(int sk,
+					    struct tcp_sock_state *state,
+					    sockaddr_af *saddr,
+					    const union tcp_addr daddr,
+					    unsigned int dport)
+{
+	sockaddr_af addr;
+
+	tcp_addr_to_sockaddr_in(&addr, &daddr, htons(dport));
+	__test_sock_restore_opt(sk, veth_name, state,
+				test_tcp_repair_window_legacy_size(),
+				saddr, &addr, sizeof(addr));
+}
+
+static inline void test_sock_restore_v1(int sk,
+					struct tcp_sock_state *state,
+					sockaddr_af *saddr,
+					const union tcp_addr daddr,
+					unsigned int dport)
+{
+	sockaddr_af addr;
+
+	tcp_addr_to_sockaddr_in(&addr, &daddr, htons(dport));
+	__test_sock_restore_opt(sk, veth_name, state,
+				test_tcp_repair_window_v1_size(),
+				saddr, &addr, sizeof(addr));
 }
 extern void test_ao_restore(int sk, struct tcp_ao_repair *state);
 extern void test_sock_state_free(struct tcp_sock_state *state);
diff --git a/tools/testing/selftests/net/tcp_ao/lib/repair.c b/tools/testing/selftests/net/tcp_ao/lib/repair.c
index 9893b3ba69f5..befbd0f72db5 100644
--- a/tools/testing/selftests/net/tcp_ao/lib/repair.c
+++ b/tools/testing/selftests/net/tcp_ao/lib/repair.c
@@ -66,8 +66,9 @@ static void test_sock_checkpoint_queue(int sk, int queue, int qlen,
 		test_error("recv(%d): %d", qlen, ret);
 }
 
-void __test_sock_checkpoint(int sk, struct tcp_sock_state *state,
-			    void *addr, size_t addr_size)
+void __test_sock_checkpoint_opt(int sk, struct tcp_sock_state *state,
+				socklen_t trw_len,
+				void *addr, size_t addr_size)
 {
 	socklen_t len = sizeof(state->info);
 	int ret;
@@ -82,9 +83,9 @@ void __test_sock_checkpoint(int sk, struct tcp_sock_state *state,
 	if (getsockname(sk, addr, &len) || len != addr_size)
 		test_error("getsockname(): %d", (int)len);
 
-	len = sizeof(state->trw);
+	len = trw_len;
 	ret = getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &state->trw, &len);
-	if (ret || len != sizeof(state->trw))
+	if (ret || len != trw_len)
 		test_error("getsockopt(TCP_REPAIR_WINDOW): %d", (int)len);
 
 	if (ioctl(sk, SIOCOUTQ, &state->outq_len))
@@ -160,9 +161,10 @@ static void test_sock_restore_queue(int sk, int queue, void *buf, int len)
 	} while (len > 0);
 }
 
-void __test_sock_restore(int sk, const char *device,
-			 struct tcp_sock_state *state,
-			 void *saddr, void *daddr, size_t addr_size)
+void __test_sock_restore_opt(int sk, const char *device,
+			     struct tcp_sock_state *state,
+			     socklen_t trw_len,
+			     void *saddr, void *daddr, size_t addr_size)
 {
 	struct tcp_repair_opt opts[4];
 	unsigned int opt_nr = 0;
@@ -215,7 +217,7 @@ void __test_sock_restore(int sk, const char *device,
 	}
 	test_sock_restore_queue(sk, TCP_RECV_QUEUE, state->in.buf, state->inq_len);
 	test_sock_restore_queue(sk, TCP_SEND_QUEUE, state->out.buf, state->outq_len);
-	if (setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &state->trw, sizeof(state->trw)))
+	if (setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &state->trw, trw_len))
 		test_error("setsockopt(TCP_REPAIR_WINDOW)");
 }
 
diff --git a/tools/testing/selftests/net/tcp_ao/self-connect.c b/tools/testing/selftests/net/tcp_ao/self-connect.c
index 2c73bea698a6..a7c0f2edd351 100644
--- a/tools/testing/selftests/net/tcp_ao/self-connect.c
+++ b/tools/testing/selftests/net/tcp_ao/self-connect.c
@@ -4,6 +4,14 @@
 #include "aolib.h"
 
 static union tcp_addr local_addr;
+static bool checked_repair_window_lens;
+
+enum repair_window_mode {
+	REPAIR_WINDOW_CURRENT,
+	REPAIR_WINDOW_LEGACY,
+	REPAIR_WINDOW_V1,
+	REPAIR_WINDOW_RETRACTED,
+};
 
 static void __setup_lo_intf(const char *lo_intf,
 			    const char *addr_str, uint8_t prefix)
@@ -30,8 +38,157 @@ static void setup_lo_intf(const char *lo_intf)
 #endif
 }
 
+/* The repair ABI accepts the legacy, v1, and current layouts. */
+static void test_repair_window_len_contract(int sk)
+{
+	struct tcp_repair_window trw = {};
+	socklen_t len = test_tcp_repair_window_exact_size();
+	socklen_t v1_len = test_tcp_repair_window_v1_size();
+	socklen_t bad_len = test_tcp_repair_window_legacy_size() + 1;
+	int ret;
+
+	if (checked_repair_window_lens)
+		return;
+
+	checked_repair_window_lens = true;
+
+	ret = getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len);
+	if (ret || len != test_tcp_repair_window_exact_size())
+		test_error("getsockopt(TCP_REPAIR_WINDOW): %d", (int)len);
+
+	len = v1_len;
+	ret = getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len);
+	if (ret || len != v1_len)
+		test_fail("repair-window get accepts v1 len");
+	else
+		test_ok("repair-window get accepts v1 len");
+
+	len = bad_len;
+	ret = getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len);
+	if (ret == 0 || errno != EINVAL)
+		test_fail("repair-window get rejects invalid len");
+	else
+		test_ok("repair-window get rejects invalid len");
+
+	ret = setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, bad_len);
+	if (ret == 0 || errno != EINVAL)
+		test_fail("repair-window set rejects invalid len");
+	else
+		test_ok("repair-window set rejects invalid len");
+
+	ret = setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, v1_len + 1);
+	if (ret == 0 || errno != EINVAL)
+		test_fail("repair-window set rejects invalid v1+1 len");
+	else
+		test_ok("repair-window set rejects invalid v1+1 len");
+}
+
+static void test_retracted_repair_window_state(int sk,
+					       struct tcp_sock_state *img)
+{
+	struct tcp_repair_window trw = {};
+	socklen_t len = sizeof(trw);
+	int ret;
+
+	ret = getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len);
+	if (ret || len != sizeof(trw))
+		test_error("getsockopt(TCP_REPAIR_WINDOW): %d", (int)len);
+
+	if (trw.rcv_mwnd_seq != img->trw.rcv_mwnd_seq ||
+	    trw.rcv_mwnd_scaling_ratio != img->trw.rcv_mwnd_scaling_ratio ||
+	    trw.rcv_wnd != img->trw.rcv_wnd ||
+	    trw.rcv_wup != img->trw.rcv_wup ||
+	    trw.rcv_wnd_scaling_ratio != img->trw.rcv_wnd_scaling_ratio)
+		test_fail("repair-window restore preserves retracted state");
+	else
+		test_ok("repair-window restore preserves retracted state");
+}
+
+static void test_v1_repair_window_state(int sk, struct tcp_sock_state *img)
+{
+	struct tcp_repair_window trw = {};
+	socklen_t len = sizeof(trw);
+	__u32 max_right = img->trw.rcv_wup + img->trw.rcv_wnd;
+	int ret;
+
+	ret = getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len);
+	if (ret || len != sizeof(trw))
+		test_error("getsockopt(TCP_REPAIR_WINDOW): %d", (int)len);
+
+	if (trw.rcv_mwnd_seq != max_right ||
+	    trw.rcv_mwnd_scaling_ratio != img->trw.rcv_wnd_scaling_ratio ||
+	    trw.rcv_wnd != img->trw.rcv_wnd ||
+	    trw.rcv_wup != img->trw.rcv_wup ||
+	    trw.rcv_wnd_scaling_ratio != img->trw.rcv_wnd_scaling_ratio)
+		test_fail("repair-window v1 restore rebuilds max-window state");
+	else
+		test_ok("repair-window v1 restore rebuilds max-window state");
+}
+
+/* Synthesize a repair image whose live rwnd was retracted after a larger
+ * right edge had already been advertised, so restore testing can validate
+ * snapshot preservation without depending on the live receive path.
+ */
+static bool make_retracted_repair_window_state(struct tcp_sock_state *img)
+{
+	__u32 gran = 1U << img->info.tcpi_rcv_wscale;
+	__u32 max_right;
+	__u32 shrink;
+
+	if (!(img->info.tcpi_options & TCPI_OPT_WSCALE))
+		return false;
+
+	max_right = img->trw.rcv_wup + img->trw.rcv_wnd;
+	shrink = img->trw.rcv_wnd / 4;
+	if (shrink < gran)
+		shrink = gran;
+	if (shrink >= img->trw.rcv_wnd)
+		shrink = img->trw.rcv_wnd >> 1;
+	if (shrink == 0 || shrink >= img->trw.rcv_wnd)
+		return false;
+
+	img->trw.rcv_wnd -= shrink;
+	img->trw.rcv_mwnd_seq = max_right;
+	img->trw.rcv_mwnd_scaling_ratio = img->trw.rcv_wnd_scaling_ratio;
+	return true;
+}
+
+static socklen_t repair_window_len(enum repair_window_mode mode)
+{
+	switch (mode) {
+	case REPAIR_WINDOW_LEGACY:
+		return test_tcp_repair_window_legacy_size();
+	case REPAIR_WINDOW_V1:
+		return test_tcp_repair_window_v1_size();
+	case REPAIR_WINDOW_CURRENT:
+	case REPAIR_WINDOW_RETRACTED:
+		return test_tcp_repair_window_exact_size();
+	}
+
+	return test_tcp_repair_window_exact_size();
+}
+
+static void test_sock_checkpoint_mode(enum repair_window_mode mode, int sk,
+				      struct tcp_sock_state *img,
+				      sockaddr_af *addr)
+{
+	switch (mode) {
+	case REPAIR_WINDOW_LEGACY:
+		test_sock_checkpoint_legacy(sk, img, addr);
+		break;
+	case REPAIR_WINDOW_V1:
+		test_sock_checkpoint_v1(sk, img, addr);
+		break;
+	case REPAIR_WINDOW_CURRENT:
+	case REPAIR_WINDOW_RETRACTED:
+		test_sock_checkpoint(sk, img, addr);
+		break;
+	}
+}
+
 static void tcp_self_connect(const char *tst, unsigned int port,
-			     bool different_keyids, bool check_restore)
+			     bool different_keyids, bool check_restore,
+			     enum repair_window_mode repair_window_mode)
 {
 	struct tcp_counters before, after;
 	uint64_t before_aogood, after_aogood;
@@ -109,7 +266,16 @@ static void tcp_self_connect(const char *tst, unsigned int port,
 	}
 
 	test_enable_repair(sk);
-	test_sock_checkpoint(sk, &img, &addr);
+	test_repair_window_len_contract(sk);
+	test_sock_checkpoint_mode(repair_window_mode, sk, &img, &addr);
+	if (repair_window_mode == REPAIR_WINDOW_RETRACTED &&
+	    !make_retracted_repair_window_state(&img)) {
+		test_sock_state_free(&img);
+		netstat_free(ns_before);
+		close(sk);
+		test_skip("%s: no scaled repair window to retract", tst);
+		return;
+	}
 #ifdef IPV6_TEST
 	addr.sin6_port = htons(port + 1);
 #else
@@ -123,7 +289,9 @@ static void tcp_self_connect(const char *tst, unsigned int port,
 		test_error("socket()");
 
 	test_enable_repair(sk);
-	__test_sock_restore(sk, "lo", &img, &addr, &addr, sizeof(addr));
+	__test_sock_restore_opt(sk, "lo", &img,
+				repair_window_len(repair_window_mode),
+				&addr, &addr, sizeof(addr));
 	if (different_keyids) {
 		if (test_add_repaired_key(sk, DEFAULT_TEST_PASSWORD, 0,
 					  local_addr, -1, 7, 5))
@@ -137,6 +305,10 @@ static void tcp_self_connect(const char *tst, unsigned int port,
 			test_error("setsockopt(TCP_AO_ADD_KEY)");
 	}
 	test_ao_restore(sk, &ao_img);
+	if (repair_window_mode == REPAIR_WINDOW_V1)
+		test_v1_repair_window_state(sk, &img);
+	if (repair_window_mode == REPAIR_WINDOW_RETRACTED)
+		test_retracted_repair_window_state(sk, &img);
 	test_disable_repair(sk);
 	test_sock_state_free(&img);
 	if (test_client_verify(sk, 100, nr_packets)) {
@@ -165,20 +337,33 @@ static void *client_fn(void *arg)
 
 	setup_lo_intf("lo");
 
-	tcp_self_connect("self-connect(same keyids)", port++, false, false);
+	tcp_self_connect("self-connect(same keyids)", port++, false, false,
+			 REPAIR_WINDOW_CURRENT);
 
 	/* expecting rnext to change based on the first segment RNext != Current */
 	trace_ao_event_expect(TCP_AO_RNEXT_REQUEST, local_addr, local_addr,
 			      port, port, 0, -1, -1, -1, -1, -1, 7, 5, -1);
-	tcp_self_connect("self-connect(different keyids)", port++, true, false);
-	tcp_self_connect("self-connect(restore)", port, false, true);
+	tcp_self_connect("self-connect(different keyids)", port++, true, false,
+			 REPAIR_WINDOW_CURRENT);
+	tcp_self_connect("self-connect(restore)", port, false, true,
+			 REPAIR_WINDOW_CURRENT);
+	port += 2; /* restore test restores over different port */
+	tcp_self_connect("self-connect(restore, legacy repair window)", port,
+			 false, true, REPAIR_WINDOW_LEGACY);
+	port += 2; /* restore test restores over different port */
+	tcp_self_connect("self-connect(restore, v1 repair window)", port,
+			 false, true, REPAIR_WINDOW_V1);
+	port += 2; /* restore test restores over different port */
+	tcp_self_connect("self-connect(restore, retracted repair window)", port,
+			 false, true, REPAIR_WINDOW_RETRACTED);
 	port += 2; /* restore test restores over different port */
 	trace_ao_event_expect(TCP_AO_RNEXT_REQUEST, local_addr, local_addr,
 			      port, port, 0, -1, -1, -1, -1, -1, 7, 5, -1);
 	/* intentionally on restore they are added to the socket in different order */
 	trace_ao_event_expect(TCP_AO_RNEXT_REQUEST, local_addr, local_addr,
 			      port + 1, port + 1, 0, -1, -1, -1, -1, -1, 5, 7, -1);
-	tcp_self_connect("self-connect(restore, different keyids)", port, true, true);
+	tcp_self_connect("self-connect(restore, different keyids)",
+			 port, true, true, REPAIR_WINDOW_CURRENT);
 	port += 2; /* restore test restores over different port */
 
 	return NULL;
@@ -186,6 +371,6 @@ static void *client_fn(void *arg)
 
 int main(int argc, char *argv[])
 {
-	test_init(5, client_fn, NULL);
+	test_init(14, client_fn, NULL);
 	return 0;
 }
-- 
2.43.0

From nobody Sun Mar 22 08:10:23 2026
From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 12/14] tun/selftests: add RX truesize injection for TCP window tests
Date: Sat, 14 Mar 2026 14:13:46 -0600
Message-ID: <20260314201348.1786972-13-atwellwea@gmail.com>
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

Add a test-only TUN ioctl that inflates RX skb->truesize, plus the
packetdrill-side helper needed to drive that ioctl through packetdrill's
own TUN queue file descriptor.

Use that plumbing to cover the receive-window regressions where
scaling_ratio drifts after advertisement, alongside the baseline too-big
packetdrill cases that exercise the same sender-visible rwnd accounting
from the non-injected path.
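To make the arithmetic behind the injection concrete, here is a userspace
sketch of the saturating add that tun_rx_update_truesize() performs,
paired with a deliberately simplified model of how a larger truesize
drags the receive-side scaling_ratio down. The helper names
inflate_truesize(), model_scaling_ratio() and model_win_from_space() and
the RMEM_TO_WIN_SHIFT constant are hypothetical. The model ASSUMES the
ratio is the payload length scaled by 2^8 and divided by truesize,
clamped to 1..255, with the window taken as (space * ratio) >> 8; the
kernel's exact update and clamping rules may differ.

```c
#include <limits.h>
#include <stdint.h>

#define RMEM_TO_WIN_SHIFT 8	/* hypothetical name; 2^8 fixed-point scale */

/* Saturating add mirroring tun_rx_update_truesize(): a zero knob leaves
 * truesize untouched, and overflow clamps to UINT_MAX instead of wrapping.
 * __builtin_add_overflow() is the GCC/Clang builtin that the kernel's
 * check_add_overflow() builds on. */
static unsigned int inflate_truesize(unsigned int truesize, uint32_t extra)
{
	unsigned int out;

	if (!extra)
		return truesize;
	if (__builtin_add_overflow(truesize, extra, &out))
		out = UINT_MAX;
	return out;
}

/* Simplified model (ASSUMPTION): payload bytes per truesize byte, in
 * 1/256 units, clamped to the range a u8 ratio can hold. */
static uint8_t model_scaling_ratio(uint32_t len, unsigned int truesize)
{
	uint64_t val;

	if (!truesize)
		return 1;	/* avoid dividing by zero in the sketch */
	val = ((uint64_t)len << RMEM_TO_WIN_SHIFT) / truesize;
	if (val == 0)
		val = 1;
	if (val > UINT8_MAX)
		val = UINT8_MAX;
	return (uint8_t)val;
}

/* Window the model derives from a buffer space and a ratio. */
static uint32_t model_win_from_space(uint8_t ratio, uint32_t space)
{
	return (uint32_t)(((uint64_t)space * ratio) >> RMEM_TO_WIN_SHIFT);
}
```

With illustrative numbers (a 1000-byte payload in a 2048-byte buffer),
the modelled ratio is 125; after adding the 65536 bytes that
tcp_rcv_neg_window_truesize.pkt injects it falls to 3, shrinking the
window derived from a 1000000-byte space from 488281 to 11718 bytes.
That drift is the condition the new packetdrill scripts provoke.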
Signed-off-by: Wesley Atwell --- drivers/net/tun.c | 65 ++++++++ include/uapi/linux/if_tun.h | 4 + .../tcp_rcv_neg_window_truesize.pkt | 143 ++++++++++++++++++ .../net/packetdrill/tcp_rcv_toobig.pkt | 35 +++++ .../packetdrill/tcp_rcv_toobig_default.pkt | 97 ++++++++++++ .../tcp_rcv_toobig_default_truesize.pkt | 118 +++++++++++++++ .../tcp_rcv_wnd_shrink_allowed_truesize.pkt | 49 ++++++ tools/testing/selftests/net/tun.c | 140 ++++++++++++++++- 8 files changed, 650 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_neg_win= dow_truesize.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.= pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_= default.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_= default_truesize.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shr= ink_allowed_truesize.pkt diff --git a/drivers/net/tun.c b/drivers/net/tun.c index c492fda6fc15..2cef62cebe88 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -85,8 +86,13 @@ =20 #include "tun_vnet.h" =20 +struct tun_file; + +#define TUNSETTRUESIZE_OLD _IOW('T', 228, unsigned int) + static void tun_default_link_ksettings(struct net_device *dev, struct ethtool_link_ksettings *cmd); +static void tun_rx_update_truesize(struct tun_file *tfile, struct sk_buff = *skb); =20 #define TUN_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD) =20 @@ -138,6 +144,7 @@ struct tun_file { u16 queue_index; unsigned int ifindex; }; + u32 rx_extra_truesize; struct napi_struct napi; bool napi_enabled; bool napi_frags_enabled; @@ -1817,6 +1824,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, s= truct tun_file *tfile, goto free_skb; } =20 + tun_rx_update_truesize(tfile, skb); switch (tun->flags & TUN_TYPE_MASK) { case IFF_TUN: if (tun->flags & IFF_NO_PI) { @@ -2373,6 +2381,25 @@ static void 
tun_put_page(struct tun_page *tpage) __page_frag_cache_drain(tpage->page, tpage->count); } =20 +/* Tests can inflate skb->truesize on ingress to exercise receive-memory + * accounting against a scaling_ratio that drifts after a window was + * advertised. The knob is per queue file, defaults to zero, and only chan= ges + * behavior when explicitly enabled through the TUN fd. + */ +static void tun_rx_update_truesize(struct tun_file *tfile, struct sk_buff = *skb) +{ + u32 extra =3D READ_ONCE(tfile->rx_extra_truesize); + unsigned int truesize; + + if (!extra) + return; + + if (check_add_overflow(skb->truesize, extra, &truesize)) + truesize =3D UINT_MAX; + + skb->truesize =3D truesize; +} + static int tun_xdp_one(struct tun_struct *tun, struct tun_file *tfile, struct xdp_buff *xdp, int *flush, @@ -2459,6 +2486,7 @@ static int tun_xdp_one(struct tun_struct *tun, goto out; } =20 + tun_rx_update_truesize(tfile, skb); skb->protocol =3D eth_type_trans(skb, tun->dev); skb_reset_network_header(skb); skb_probe_transport_header(skb); @@ -3045,6 +3073,7 @@ static long __tun_chr_ioctl(struct file *file, unsign= ed int cmd, struct tun_struct *tun; void __user* argp =3D (void __user*)arg; unsigned int carrier; + unsigned int extra_truesize; struct ifreq ifr; kuid_t owner; kgid_t group; @@ -3309,6 +3338,40 @@ static long __tun_chr_ioctl(struct file *file, unsig= ned int cmd, ret =3D tun_net_change_carrier(tun->dev, (bool)carrier); break; =20 + /* Support both the legacy pointer-payload form and the scalar form + * used by the selftest helper when injecting truesize from + * packetdrill shell commands. 
+ */ + case TUNSETTRUESIZE: + case TUNSETTRUESIZE_OLD: + ret =3D -EPERM; + if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) + goto unlock; + + if (cmd =3D=3D TUNSETTRUESIZE_OLD) { + ret =3D -EFAULT; + if (copy_from_user(&extra_truesize, argp, + sizeof(extra_truesize))) { + ret =3D -EINVAL; + if (arg > U32_MAX) + goto unlock; + + extra_truesize =3D arg; + } + } else { + ret =3D -EINVAL; + if (arg > U32_MAX) + goto unlock; + + extra_truesize =3D arg; + } + + WRITE_ONCE(tfile->rx_extra_truesize, extra_truesize); + netif_info(tun, drv, tun->dev, + "rx extra truesize set to %u\n", extra_truesize); + ret =3D 0; + break; + case TUNGETDEVNETNS: ret =3D -EPERM; if (!ns_capable(net->user_ns, CAP_NET_ADMIN)) @@ -3348,6 +3411,7 @@ static long tun_chr_compat_ioctl(struct file *file, case TUNGETSNDBUF: case TUNSETSNDBUF: case SIOCGIFHWADDR: + case TUNSETTRUESIZE_OLD: case SIOCSIFHWADDR: arg =3D (unsigned long)compat_ptr(arg); break; @@ -3408,6 +3472,7 @@ static int tun_chr_open(struct inode *inode, struct f= ile * file) RCU_INIT_POINTER(tfile->tun, NULL); tfile->flags =3D 0; tfile->ifindex =3D 0; + tfile->rx_extra_truesize =3D 0; =20 init_waitqueue_head(&tfile->socket.wq.wait); =20 diff --git a/include/uapi/linux/if_tun.h b/include/uapi/linux/if_tun.h index 79d53c7a1ebd..4be63efe6540 100644 --- a/include/uapi/linux/if_tun.h +++ b/include/uapi/linux/if_tun.h @@ -61,6 +61,10 @@ #define TUNSETFILTEREBPF _IOR('T', 225, int) #define TUNSETCARRIER _IOW('T', 226, int) #define TUNGETDEVNETNS _IO('T', 227) +/* Test-only: add scalar bytes to skb->truesize on RX after TUN allocates + * an skb. 
+ */
+#define TUNSETTRUESIZE _IO('T', 228)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window_truesize.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window_truesize.pkt
new file mode 100644
index 000000000000..1c5550fff509
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_neg_window_truesize.pkt
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+// Run the negative-window / max-advertised-window regression with inflated
+// TUN skb->truesize so scaling_ratio drifts throughout the flow. The sequence
+// checks and drop counters should remain identical to the uninflated case.
+
+--mss=1000
+
+`./defaults.sh`
+
+    0 `nstat -n`
+
+// Establish a connection.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [1000000], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792
+   +0 > S. 0:0(0) ack 1 win 65535
+   +0 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+// Put 1040000 bytes into the receive buffer.
+   +0 < P. 1:65001(65000) ack 1 win 257
+    * > . 1:1(0) ack 65001
+   +0 < P. 65001:130001(65000) ack 1 win 257
+    * > . 1:1(0) ack 130001
+   +0 < P. 130001:195001(65000) ack 1 win 257
+    * > . 1:1(0) ack 195001
+   +0 < P. 195001:260001(65000) ack 1 win 257
+    * > . 1:1(0) ack 260001
+   +0 < P. 260001:325001(65000) ack 1 win 257
+    * > . 1:1(0) ack 325001
+   +0 < P. 325001:390001(65000) ack 1 win 257
+    * > . 1:1(0) ack 390001
+   +0 < P. 390001:455001(65000) ack 1 win 257
+    * > . 1:1(0) ack 455001
+   +0 < P. 455001:520001(65000) ack 1 win 257
+    * > . 1:1(0) ack 520001
+   +0 < P. 520001:585001(65000) ack 1 win 257
+    * > . 1:1(0) ack 585001
+   +0 < P. 585001:650001(65000) ack 1 win 257
+    * > . 1:1(0) ack 650001
+   +0 < P. 650001:715001(65000) ack 1 win 257
+    * > . 1:1(0) ack 715001
+   +0 < P. 715001:780001(65000) ack 1 win 257
+    * > . 1:1(0) ack 780001
+   +0 < P. 780001:845001(65000) ack 1 win 257
+    * > . 1:1(0) ack 845001
+   +0 < P. 845001:910001(65000) ack 1 win 257
+    * > . 1:1(0) ack 910001
+   +0 < P. 910001:975001(65000) ack 1 win 257
+    * > . 1:1(0) ack 975001
+   +0 < P. 975001:1040001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1040001
+
+// Start inflating future TUN skbs only after the baseline sender-visible
+// window has been established, so the negative-window checks below exercise
+// ratio drift without changing the initial max advertised window.
+   +0 `../tun --set-rx-truesize tun0 65536`
+
+// Trigger an extreme memory squeeze by shrinking SO_RCVBUF.
+   +0 setsockopt(4, SOL_SOCKET, SO_RCVBUF, [16000], 4) = 0
+
+   +0 < P. 1040001:1105001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1040001 win 0
+// Check LINUX_MIB_TCPRCVQDROP has been incremented.
+   +0 `nstat -s | grep TcpExtTCPRcvQDrop | grep -q " 1 "`
+
+// RWIN == 0: rcv_wup = 1040001, rcv_wnd = 0, rcv_mwnd_seq > 1105001.
+
+// Accept pure ack with seq in max adv. window.
+   +0 write(4, ..., 1000) = 1000
+   +0 > P. 1:1001(1000) ack 1040001 win 0
+   +0 < . 1105001:1105001(0) ack 1001 win 257
+
+// In order segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_ZEROWINDOW).
+   +0 < P. 1040001:1041001(1000) ack 1001 win 257
+   +0 > . 1001:1001(0) ack 1040001 win 0
+// Ooo partial segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_ZEROWINDOW).
+   +0 < P. 1039001:1041001(2000) ack 1001 win 257
+   +0 > . 1001:1001(0) ack 1040001 win 0
+// Check LINUX_MIB_TCPZEROWINDOWDROP has been incremented twice.
+   +0 `nstat -s | grep TcpExtTCPZeroWindowDrop | grep -q " 2 "`
+
+// Ooo segment, in max adv. window -> drop (SKB_DROP_REASON_TCP_OVERWINDOW).
+   +0 < P. 1105001:1106001(1000) ack 1001 win 257
+   +0 > . 1001:1001(0) ack 1040001 win 0
+// Ooo segment, beyond max adv. window -> drop (SKB_DROP_REASON_TCP_INVALID_SEQUENCE).
+   +0 < P. 2000001:2001001(1000) ack 1001 win 257
+   +0 > . 1001:1001(0) ack 1040001 win 0
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented twice.
+   +0 `nstat -s | grep TcpExtBeyondWindow | grep -q " 2 "`
+
+// Read all data.
+   +0 read(4, ..., 2000000) = 1040000
+    * > . 1001:1001(0) ack 1040001
+
+// RWIN > 0: rcv_wup = 1040001, 0 < rcv_wnd < 32000, rcv_mwnd_seq > 1105001.
+
+// Accept pure ack with seq in max adv. window, beyond adv. window.
+   +0 write(4, ..., 1000) = 1000
+   +0 > P. 1001:2001(1000) ack 1040001
+   +0 < . 1105001:1105001(0) ack 2001 win 257
+
+// In order segment, in max adv. window, in adv. window -> accept.
+   +0 < P. 1040001:1041001(1000) ack 2001 win 257
+    * > . 2001:2001(0) ack 1041001
+
+// Ooo partial segment, in adv. window -> accept.
+   +0 < P. 1040001:1042001(2000) ack 2001 win 257
+    * > . 2001:2001(0) ack 1042001
+
+// Ooo segment, in max adv. window, beyond adv. window -> drop.
+   +0 < P. 1105001:1106001(1000) ack 2001 win 257
+   +0 > . 2001:2001(0) ack 1042001
+// Ooo segment, beyond max adv. window, beyond adv. window -> drop.
+   +0 < P. 2000001:2001001(1000) ack 2001 win 257
+   +0 > . 2001:2001(0) ack 1042001
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented twice more.
+   +0 `nstat -s | grep TcpExtBeyondWindow | grep -q " 4 "`
+
+// We are allowed to go beyond the window and buffer with one packet.
+   +0 < P. 1042001:1062001(20000) ack 2001 win 257
+    * > . 2001:2001(0) ack 1062001
+   +0 < P. 1062001:1082001(20000) ack 2001 win 257
+    * > . 2001:2001(0) ack 1082001 win 0
+
+// But not more: in-order segment, in max adv. window -> drop.
+   +0 < P. 1082001:1083001(1000) ack 2001 win 257
+    * > . 2001:2001(0) ack 1082001
+// Check LINUX_MIB_TCPZEROWINDOWDROP has been incremented again.
+   +0 `nstat -s | grep TcpExtTCPZeroWindowDrop | grep -q " 3 "`
+
+// Another ratio drop must not change the final zero-window decision.
+   +0 `../tun --set-rx-truesize tun0 131072`
+
+   +0 < P. 1082001:1083001(1000) ack 2001 win 257
+    * > . 2001:2001(0) ack 1082001
+// Check LINUX_MIB_TCPZEROWINDOWDROP has been incremented once more.
+   +0 `nstat -s | grep TcpExtTCPZeroWindowDrop | grep -q " 4 "`
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.pkt
new file mode 100644
index 000000000000..837ba3633752
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.pkt
@@ -0,0 +1,35 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh`
+
+    0 `nstat -n`
+
+// Establish a connection.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792
+   +0 > S. 0:0(0) ack 1 win 18980
+  +.1 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+   +0 < P. 1:20001(20000) ack 1 win 257
+ +.04 > . 1:1(0) ack 20001 win 18000
+
+   +0 setsockopt(4, SOL_SOCKET, SO_RCVBUF, [12000], 4) = 0
+   +0 < P. 20001:80001(60000) ack 1 win 257
+   +0 > . 1:1(0) ack 20001 win 18000
+
+   +0 read(4, ..., 20000) = 20000
+
+// A too big packet is accepted if the receive queue is empty, but the
+// stronger admission path must not zero the receive buffer while doing so.
+   +0 < P. 20001:80001(60000) ack 1 win 257
+    * > . 1:1(0) ack 80001 win 0
+   +0 %{ assert SK_MEMINFO_RCVBUF > 0, SK_MEMINFO_RCVBUF }%
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default.pkt
new file mode 100644
index 000000000000..b2e4950e0b83
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default.pkt
@@ -0,0 +1,97 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_moderate_rcvbuf=0`
+
+// Establish a connection on the default receive buffer. Leave a large skb in
+// the queue, then deliver another one which still fits the remaining rwnd.
+// We should grow sk_rcvbuf to honor the already-advertised window instead of
+// dropping the packet.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 65535
+   +0 > S. 0:0(0) ack 1 <...>
+  +.1 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+// Exchange enough data to get past the completely fresh-socket case while
+// still keeping the receive buffer at its 128kB default.
+   +0 < P. 1:65001(65000) ack 1 win 257
+    * > . 1:1(0) ack 65001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 65001:130001(65000) ack 1 win 257
+    * > . 1:1(0) ack 130001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 130001:195001(65000) ack 1 win 257
+    * > . 1:1(0) ack 195001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 195001:260001(65000) ack 1 win 257
+    * > . 1:1(0) ack 260001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 260001:325001(65000) ack 1 win 257
+    * > . 1:1(0) ack 325001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 325001:390001(65000) ack 1 win 257
+    * > . 1:1(0) ack 390001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 390001:455001(65000) ack 1 win 257
+    * > . 1:1(0) ack 455001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 455001:520001(65000) ack 1 win 257
+    * > . 1:1(0) ack 520001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 520001:585001(65000) ack 1 win 257
+    * > . 1:1(0) ack 585001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 585001:650001(65000) ack 1 win 257
+    * > . 1:1(0) ack 650001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 650001:715001(65000) ack 1 win 257
+    * > . 1:1(0) ack 715001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 715001:780001(65000) ack 1 win 257
+    * > . 1:1(0) ack 780001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 780001:845001(65000) ack 1 win 257
+    * > . 1:1(0) ack 845001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 845001:910001(65000) ack 1 win 257
+    * > . 1:1(0) ack 910001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 910001:975001(65000) ack 1 win 257
+    * > . 1:1(0) ack 975001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 975001:1040001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1040001
+   +0 read(4, ..., 65000) = 65000
+
+// Leave about 60kB queued, then accept another large skb which still fits
+// the rwnd we already exposed to the peer. The regression is the drop; the
+// exact sk_rcvbuf growth path is an implementation detail.
+   +0 < P. 1040001:1102001(62000) ack 1 win 257
+    * > . 1:1(0) ack 1102001
+
+   +0 < P. 1102001:1167001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1167001
+   +0 read(4, ..., 127000) = 127000
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default_truesize.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default_truesize.pkt
new file mode 100644
index 000000000000..c2ebe11d75f7
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default_truesize.pkt
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_moderate_rcvbuf=0`
+
+// Establish a connection on the default receive buffer. The warmup traffic
+// keeps the socket in the normal data path without changing its default
+// sk_rcvbuf. Then inflate skb->truesize on future TUN RX packets so the live
+// scaling_ratio drops after we already exposed a larger rwnd to the peer.
+// The follow-up packet should still be admitted, and tcp_clamp_window() should
+// grow sk_rcvbuf to honor the sender-visible window instead of dropping data.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 65535
+   +0 > S. 0:0(0) ack 1 <...>
+  +.1 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+// Exchange enough data to get past the completely fresh-socket case while
+// still keeping the receive buffer at its initial default.
+   +0 < P. 1:65001(65000) ack 1 win 257
+    * > . 1:1(0) ack 65001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 65001:130001(65000) ack 1 win 257
+    * > . 1:1(0) ack 130001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 130001:195001(65000) ack 1 win 257
+    * > . 1:1(0) ack 195001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 195001:260001(65000) ack 1 win 257
+    * > . 1:1(0) ack 260001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 260001:325001(65000) ack 1 win 257
+    * > . 1:1(0) ack 325001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 325001:390001(65000) ack 1 win 257
+    * > . 1:1(0) ack 390001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 390001:455001(65000) ack 1 win 257
+    * > . 1:1(0) ack 455001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 455001:520001(65000) ack 1 win 257
+    * > . 1:1(0) ack 520001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 520001:585001(65000) ack 1 win 257
+    * > . 1:1(0) ack 585001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 585001:650001(65000) ack 1 win 257
+    * > . 1:1(0) ack 650001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 650001:715001(65000) ack 1 win 257
+    * > . 1:1(0) ack 715001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 715001:780001(65000) ack 1 win 257
+    * > . 1:1(0) ack 780001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 780001:845001(65000) ack 1 win 257
+    * > . 1:1(0) ack 845001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 845001:910001(65000) ack 1 win 257
+    * > . 1:1(0) ack 910001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 910001:975001(65000) ack 1 win 257
+    * > . 1:1(0) ack 975001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 < P. 975001:1040001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1040001
+   +0 read(4, ..., 65000) = 65000
+
+   +0 %{ base_rcvbuf = SK_MEMINFO_RCVBUF }%
+
+// Leave about 60kB queued, then make future TUN skbs look more expensive in
+// two steps. Both inflated skbs still fit the already-advertised window and
+// must be admitted, and sk_rcvbuf should keep growing as the live
+// scaling_ratio drops further.
+   +0 < P. 1040001:1102001(62000) ack 1 win 257
+    * > . 1:1(0) ack 1102001
+
+   +0 `../tun --set-rx-truesize tun0 4096`
+
+   +0 < P. 1102001:1167001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1167001
+   +0 %{ assert SK_MEMINFO_RCVBUF > base_rcvbuf, (base_rcvbuf, SK_MEMINFO_RCVBUF) }%
+   +0 %{ small_rcvbuf = SK_MEMINFO_RCVBUF }%
+
+   +0 < P. 1167001:1229001(62000) ack 1 win 257
+    * > . 1:1(0) ack 1229001
+
+   +0 `../tun --set-rx-truesize tun0 65536`
+
+   +0 < P. 1229001:1294001(65000) ack 1 win 257
+    * > . 1:1(0) ack 1294001
+   +0 %{ assert SK_MEMINFO_RCVBUF > small_rcvbuf, (base_rcvbuf, small_rcvbuf, SK_MEMINFO_RCVBUF) }%
+
+   +0 < P. 1294001:1356001(62000) ack 1 win 257
+    * > . 1:1(0) ack 1356001
+   +0 read(4, ..., 254000) = 254000
diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed_truesize.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed_truesize.pkt
new file mode 100644
index 000000000000..08da5fddaa12
--- /dev/null
+++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_wnd_shrink_allowed_truesize.pkt
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: GPL-2.0
+
+--mss=1000
+
+`./defaults.sh
+sysctl -q net.ipv4.tcp_shrink_window=1
+sysctl -q net.ipv4.tcp_rmem="4096 32768 $((32*1024*1024))"`
+
+    0 `nstat -n`
+
+// Establish a connection. After the first payload we know the peer has seen a
+// scaled receive window reaching sequence 25361. Inflate later TUN skbs in two
+// steps so the live scaling_ratio drops more than once, then verify that:
+//   1) a segment one byte beyond the max advertised window is still dropped,
+//   2) a segment exactly using the previously advertised max window is still
+//      accepted even though the current live ratio no longer matches that
+//      original advertisement basis.
+   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+   +0 bind(3, ..., ...) = 0
+   +0 listen(3, 1) = 0
+
+   +0 < S 0:0(0) win 32792
+   +0 > S. 0:0(0) ack 1
+   +0 < . 1:1(0) ack 1 win 257
+
+   +0 accept(3, ..., ...) = 4
+
+   +0 < P. 1:10001(10000) ack 1 win 257
+    * > . 1:1(0) ack 10001 win 15
+
+// Max window seq advertised here is 10001 + 15*1024 = 25361.
+   +0 `../tun --set-rx-truesize tun0 4096`
+
+   +0 < P. 10001:11024(1023) ack 1 win 257
+    * > . 1:1(0) ack 11024
+
+   +0 `../tun --set-rx-truesize tun0 65536`
+
+// Segment beyond the max window stays invalid even after ratio drift.
+   +0 < P. 11024:25362(14338) ack 1 win 257
+    * > . 1:1(0) ack 11024
+
+// Segment exactly using the max window must still be accepted.
+   +0 < P. 11024:25361(14337) ack 1 win 257
+    * > . 1:1(0) ack 25361
+
+// Check LINUX_MIB_BEYOND_WINDOW has been incremented once.
+   +0 `nstat | grep TcpExtBeyondWindow | grep -q " 1 "`
diff --git a/tools/testing/selftests/net/tun.c b/tools/testing/selftests/net/tun.c
index cf106a49b55e..473992b3784d 100644
--- a/tools/testing/selftests/net/tun.c
+++ b/tools/testing/selftests/net/tun.c
@@ -2,14 +2,17 @@
 
 #define _GNU_SOURCE
 
+#include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 #include
 #include
+#include
 #include
 
 #include "kselftest_harness.h"
@@ -174,6 +177,135 @@ static int tun_delete(char *dev)
 	return ip_link_del(dev);
 }
 
+static bool is_numeric_name(const char *name)
+{
+	for (; *name; name++) {
+		if (*name < '0' || *name > '9')
+			return false;
+	}
+
+	return true;
+}
+
+static int packetdrill_dup_fd(int pidfd, const char *fd_name)
+{
+	char *end;
+	unsigned long tmp;
+
+	errno = 0;
+	tmp = strtoul(fd_name, &end, 10);
+	if (errno || *end || tmp > INT_MAX) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	return syscall(SYS_pidfd_getfd, pidfd, (int)tmp, 0);
+}
+
+static int open_packetdrill_tunfd(pid_t pid, const char *ifname)
+{
+	char fd_dir[PATH_MAX];
+	struct dirent *dent;
+	struct ifreq ifr = {};
+	int pidfd;
+	int saved_errno = ENOENT;
+	DIR *dir;
+
+	snprintf(fd_dir, sizeof(fd_dir), "/proc/%ld/fd", (long)pid);
+
+	pidfd = syscall(SYS_pidfd_open, pid, 0);
+	if (pidfd < 0)
+		return -1;
+
+	dir = opendir(fd_dir);
+	if (!dir) {
+		close(pidfd);
+		return -1;
+	}
+
+	while ((dent = readdir(dir))) {
+		int fd;
+
+		if (!is_numeric_name(dent->d_name))
+			continue;
+
+		/* Reopen via pidfd_getfd() so we duplicate packetdrill's attached
+		 * queue file, instead of opening a fresh /dev/net/tun instance.
+		 */
+		fd = packetdrill_dup_fd(pidfd, dent->d_name);
+		if (fd < 0) {
+			saved_errno = errno;
+			continue;
+		}
+
+		memset(&ifr, 0, sizeof(ifr));
+		if (!ioctl(fd, TUNGETIFF, &ifr) &&
+		    !strncmp(ifr.ifr_name, ifname, IFNAMSIZ)) {
+			close(pidfd);
+			closedir(dir);
+			return fd;
+		}
+
+		if (errno)
+			saved_errno = errno;
+		close(fd);
+	}
+
+	close(pidfd);
+	closedir(dir);
+	errno = saved_errno;
+	return -1;
+}
+
+/* Packetdrill owns the TUN queue fd, so drive the test ioctl through that
+ * exact file descriptor found under /proc/$PACKETDRILL_PID/fd.
+ */
+static int packetdrill_set_rx_truesize(const char *ifname, const char *value)
+{
+	char *packetdrill_pid, *end;
+	unsigned long long tmp;
+	unsigned int extra;
+	pid_t pid;
+	int fd;
+
+	packetdrill_pid = getenv("PACKETDRILL_PID");
+	if (!packetdrill_pid || !*packetdrill_pid) {
+		fprintf(stderr, "PACKETDRILL_PID is not set\n");
+		return 1;
+	}
+
+	errno = 0;
+	tmp = strtoull(packetdrill_pid, &end, 10);
+	if (errno || *end || !tmp || tmp > INT_MAX) {
+		fprintf(stderr, "invalid PACKETDRILL_PID: %s\n", packetdrill_pid);
+		return 1;
+	}
+	pid = (pid_t)tmp;
+
+	errno = 0;
+	tmp = strtoull(value, &end, 0);
+	if (errno || *end || tmp > UINT_MAX) {
+		fprintf(stderr, "invalid truesize value: %s\n", value);
+		return 1;
+	}
+	extra = (unsigned int)tmp;
+
+	fd = open_packetdrill_tunfd(pid, ifname);
+	if (fd < 0) {
+		perror("open_packetdrill_tunfd");
+		return 1;
+	}
+
+	if (ioctl(fd, TUNSETTRUESIZE, (unsigned long)extra)) {
+		perror("ioctl(TUNSETTRUESIZE)");
+		close(fd);
+		return 1;
+	}
+
+	close(fd);
+	return 0;
+}
+
 static int tun_open(char *dev, const int flags, const int hdrlen,
 		    const int features, const unsigned char *mac_addr)
 {
@@ -985,4 +1117,10 @@ XFAIL_ADD(tun_vnet_udptnl, 6in4_over_maxbytes, recv_gso_packet);
 XFAIL_ADD(tun_vnet_udptnl, 4in6_over_maxbytes, recv_gso_packet);
 XFAIL_ADD(tun_vnet_udptnl, 6in6_over_maxbytes, recv_gso_packet);
 
-TEST_HARNESS_MAIN
+int main(int argc, char **argv)
+{
+	if (argc == 4 && !strcmp(argv[1], "--set-rx-truesize"))
+		return packetdrill_set_rx_truesize(argv[2], argv[3]);
+
+	return test_harness_run(argc, argv);
+}
-- 
2.43.0

From nobody Sun Mar 22 08:10:23 2026
From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 13/14] netdevsim: add peer RX truesize support for selftests
Date: Sat, 14 Mar 2026 14:13:47 -0600
Message-ID: <20260314201348.1786972-14-atwellwea@gmail.com>
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>

From: Wesley Atwell

Add a debugfs-controlled peer RX truesize knob to
netdevsim, inflate the forwarded skb only on the peer RX side, and cover the
resulting socket memory-accounting behavior with a dedicated selftest. This
keeps the synthetic cost out of the sender-side skb geometry while giving the
selftests a second runtime vehicle for the receive-memory accounting
exercised by the TCP rwnd work.

Signed-off-by: Wesley Atwell
---
 drivers/net/netdevsim/netdev.c                | 145 +++++-
 drivers/net/netdevsim/netdevsim.h             |   4 +
 .../selftests/drivers/net/netdevsim/Makefile  |   1 +
 .../drivers/net/netdevsim/peer-rx-truesize.sh | 426 ++++++++++++++++++
 4 files changed, 575 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/netdevsim/peer-rx-truesize.sh

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 5ec028a00c62..22238df79b6a 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -17,8 +17,10 @@
 #include
 #include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -37,6 +39,91 @@ MODULE_IMPORT_NS("NETDEV_INTERNAL");
 
 #define NSIM_RING_SIZE 256
 
+struct nsim_rx_truesize {
+	refcount_t refs;
+	u32 value;
+};
+
+static struct nsim_rx_truesize *
+nsim_rx_truesize_get(struct nsim_rx_truesize *rx_truesize)
+{
+	if (!rx_truesize)
+		return NULL;
+
+	if (!refcount_inc_not_zero(&rx_truesize->refs))
+		return NULL;
+
+	return rx_truesize;
+}
+
+static void nsim_rx_truesize_put(struct nsim_rx_truesize *rx_truesize)
+{
+	if (!rx_truesize)
+		return;
+
+	if (refcount_dec_and_test(&rx_truesize->refs))
+		kfree(rx_truesize);
+}
+
+static ssize_t nsim_rx_truesize_read(struct file *file, char __user *user_buf,
+				     size_t count, loff_t *ppos)
+{
+	struct nsim_rx_truesize *rx_truesize = file->private_data;
+	char buf[24];
+	int len;
+
+	len = scnprintf(buf, sizeof(buf), "%u\n",
+			READ_ONCE(rx_truesize->value));
+
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static ssize_t nsim_rx_truesize_write(struct file *file,
+				      const char __user *user_buf,
+				      size_t count, loff_t *ppos)
+{
+	struct nsim_rx_truesize *rx_truesize = file->private_data;
+	u32 value;
+	int err;
+
+	err = kstrtou32_from_user(user_buf, count, 0, &value);
+	if (err)
+		return err;
+
+	WRITE_ONCE(rx_truesize->value, value);
+
+	return count;
+}
+
+static int nsim_rx_truesize_open(struct inode *inode, struct file *file)
+{
+	struct nsim_rx_truesize *rx_truesize;
+
+	rx_truesize = nsim_rx_truesize_get(inode->i_private);
+	if (!rx_truesize)
+		return -ENODEV;
+
+	file->private_data = rx_truesize;
+
+	return nonseekable_open(inode, file);
+}
+
+static int nsim_rx_truesize_release(struct inode *inode, struct file *file)
+{
+	nsim_rx_truesize_put(file->private_data);
+
+	return 0;
+}
+
+static const struct file_operations nsim_rx_truesize_fops = {
+	.owner = THIS_MODULE,
+	.open = nsim_rx_truesize_open,
+	.read = nsim_rx_truesize_read,
+	.write = nsim_rx_truesize_write,
+	.release = nsim_rx_truesize_release,
+	.llseek = noop_llseek,
+};
+
 static void nsim_start_peer_tx_queue(struct net_device *dev, struct nsim_rq *rq)
 {
 	struct netdevsim *ns = netdev_priv(dev);
@@ -117,6 +204,28 @@ static int nsim_forward_skb(struct net_device *tx_dev,
 	return nsim_napi_rx(tx_dev, rx_dev, rq, skb);
 }
 
+/* Tests can inflate peer RX skb->truesize to exercise receiver-side TCP
+ * accounting under scaling-ratio drift without perturbing sender-side skb
+ * ownership.
+ */
+static void nsim_rx_update_truesize(struct sk_buff *skb, u32 extra)
+{
+	unsigned int truesize;
+
+	if (!extra)
+		return;
+
+	if (check_add_overflow(skb->truesize, extra, &truesize))
+		truesize = UINT_MAX;
+
+	skb->truesize = truesize;
+}
+
+static u32 nsim_rx_extra_truesize(const struct netdevsim *ns)
+{
+	return READ_ONCE(ns->rx_truesize->value);
+}
+
 static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct netdevsim *ns = netdev_priv(dev);
@@ -125,7 +234,9 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int len = skb->len;
 	struct netdevsim *peer_ns;
 	struct netdev_config *cfg;
+	struct sk_buff *nskb;
 	struct nsim_rq *rq;
+	u32 extra;
 	int rxq;
 	int dr;
 
@@ -160,7 +271,24 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	      cfg->hds_thresh > len)))
 		skb_linearize(skb);
 
+	extra = nsim_rx_extra_truesize(peer_ns);
 	skb_tx_timestamp(skb);
+	if (extra) {
+		/* Clone before inflating truesize so only the peer RX path sees
+		 * the synthetic cost; sender-side skb accounting stays put.
+		 */
+		nskb = skb_clone(skb, GFP_ATOMIC);
+		if (!nskb) {
+			if (psp_ext)
+				__skb_ext_put(psp_ext);
+			goto out_drop_free;
+		}
+
+		consume_skb(skb);
+		skb = nskb;
+		nsim_rx_update_truesize(skb, extra);
+	}
+
 	if (unlikely(nsim_forward_skb(dev, peer_dev, skb, rq, psp_ext) ==
 		     NET_RX_DROP))
 		goto out_drop_cnt;
@@ -1121,6 +1249,7 @@ struct netdevsim *nsim_create(struct nsim_dev *nsim_dev,
 			      u8 perm_addr[ETH_ALEN])
 {
 	struct net_device *dev;
+	struct nsim_rx_truesize *rx_truesize;
 	struct netdevsim *ns;
 	int err;
 
@@ -1140,6 +1269,13 @@ struct netdevsim *nsim_create(struct nsim_dev *nsim_dev,
 	ns->nsim_bus_dev = nsim_dev->nsim_bus_dev;
 	SET_NETDEV_DEV(dev, &ns->nsim_bus_dev->dev);
 	SET_NETDEV_DEVLINK_PORT(dev, &nsim_dev_port->devlink_port);
+	rx_truesize = kzalloc_obj(*rx_truesize);
+	if (!rx_truesize) {
+		err = -ENOMEM;
+		goto err_free_netdev;
+	}
+	refcount_set(&rx_truesize->refs, 1);
+	ns->rx_truesize = rx_truesize;
 	nsim_ethtool_init(ns);
 	if (nsim_dev_port_is_pf(nsim_dev_port))
 		err = nsim_init_netdevsim(ns);
@@ -1153,21 +1289,27 @@ struct netdevsim *nsim_create(struct nsim_dev *nsim_dev,
 	ns->qr_dfs = debugfs_create_file("queue_reset", 0200,
 					 nsim_dev_port->ddir, ns,
 					 &nsim_qreset_fops);
+	ns->rx_truesize_dfs = debugfs_create_file("rx_extra_truesize", 0600,
+						  nsim_dev_port->ddir,
+						  ns->rx_truesize,
+						  &nsim_rx_truesize_fops);
 	return ns;
 
 err_free_netdev:
+	nsim_rx_truesize_put(ns->rx_truesize);
 	free_netdev(dev);
 	return ERR_PTR(err);
 }
 
 void nsim_destroy(struct netdevsim *ns)
 {
+	struct nsim_rx_truesize *rx_truesize = ns->rx_truesize;
 	struct net_device *dev = ns->netdev;
 	struct netdevsim *peer;
 
+	debugfs_remove(ns->rx_truesize_dfs);
 	debugfs_remove(ns->qr_dfs);
 	debugfs_remove(ns->pp_dfs);
-
 	if (ns->nb.notifier_call)
 		unregister_netdevice_notifier_dev_net(ns->netdev, &ns->nb,
 						      &ns->nn);
@@ -1198,6 +1340,7 @@ void nsim_destroy(struct netdevsim *ns)
 	}
 
 	free_netdev(dev);
+	nsim_rx_truesize_put(rx_truesize);
 }
 
 bool netdev_is_nsim(struct net_device *dev)
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index f767fc8a7505..972ad274060e 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -75,6 +75,8 @@ struct nsim_macsec {
 	u8 nsim_secy_count;
 };
 
+struct nsim_rx_truesize;
+
 struct nsim_ethtool_pauseparam {
 	bool rx;
 	bool tx;
@@ -144,6 +146,8 @@ struct netdevsim {
 	} udp_ports;
 
 	struct page *page;
+	struct nsim_rx_truesize *rx_truesize;
+	struct dentry *rx_truesize_dfs;
 	struct dentry *pp_dfs;
 	struct dentry *qr_dfs;
 
diff --git a/tools/testing/selftests/drivers/net/netdevsim/Makefile b/tools/testing/selftests/drivers/net/netdevsim/Makefile
index 1a228c5430f5..9e9e48d5913b 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/Makefile
+++ b/tools/testing/selftests/drivers/net/netdevsim/Makefile
@@ -14,6 +14,7 @@ TEST_PROGS := \
 	macsec-offload.sh \
 	nexthop.sh \
 	peer.sh \
+	peer-rx-truesize.sh \
 	psample.sh \
 	tc-mq-visibility.sh \
 	udp_tunnel_nic.sh \
diff --git a/tools/testing/selftests/drivers/net/netdevsim/peer-rx-truesize.sh b/tools/testing/selftests/drivers/net/netdevsim/peer-rx-truesize.sh
new file mode 100755
index 000000000000..6d1101d20847
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netdevsim/peer-rx-truesize.sh
@@ -0,0 +1,426 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only
+
+set -euo pipefail
+
+lib_dir=$(dirname "$0")/../../../net
+source "$lib_dir"/lib.sh
+
+NSIM_SRV_ID=$((1024 + RANDOM % 1024))
+NSIM_CLI_ID=$((2048 + RANDOM % 1024))
+NSIM_SYS_LINK=/sys/bus/netdevsim/link_device
+SERVER_ADDR=192.0.2.1
+CLIENT_ADDR=192.0.2.2
+RMEM_PORT=12345
+WARM_PORT=12346
+RMEM_QUEUED_LEN=65000
+RMEM_INFLATED_LEN=65000
+RMEM_SMALL_EXTRA=4096
+RMEM_LARGE_EXTRA=65536
+WARM_WARMUP_ROUNDS=16
+WARM_WARMUP_LEN=65000
+WARM_QUEUED_LEN=62000
+WARM_INFLATED_LEN=65000
+WARM_EXTRA=65536
+
+srv_dev=
+cli_dev=
+srv_pid=
+cli_pid=
+srv_fd=
+cli_fd=
+stage_dir=
+CASE_BASE_METRIC=
+CASE_FINAL_METRIC=
+
+cleanup()
+{
+	local rc=$?
+
+	if [ -n "${srv_pid:-}" ]; then
+		kill "${srv_pid}" 2>/dev/null || true
+		wait "${srv_pid}" 2>/dev/null || true
+	fi
+
+	if [ -n "${cli_pid:-}" ]; then
+		kill "${cli_pid}" 2>/dev/null || true
+		wait "${cli_pid}" 2>/dev/null || true
+	fi
+
+	if [ -n "${srv_fd:-}" ]; then
+		eval "exec ${srv_fd}<&-"
+	fi
+
+	if [ -n "${cli_fd:-}" ]; then
+		eval "exec ${cli_fd}<&-"
+	fi
+
+	if [ -d "${stage_dir:-}" ]; then
+		rm -rf "${stage_dir}"
+	fi
+
+	cleanup_netdevsim "${NSIM_SRV_ID}" 2>/dev/null || true
+	cleanup_netdevsim "${NSIM_CLI_ID}" 2>/dev/null || true
+	cleanup_ns "${SRV:-}" "${CLI:-}" 2>/dev/null || true
+
+	exit "${rc}"
+}
+
+trap cleanup EXIT
+
+ensure_debugfs()
+{
+	if mount | grep -q 'on /sys/kernel/debug type debugfs'; then
+		return 0
+	fi
+
+	if ! mount -t debugfs none /sys/kernel/debug >/dev/null 2>&1; then
+		echo "SKIP: failed to mount debugfs"
+		exit "${ksft_skip}"
+	fi
+}
+
+ensure_netdevsim()
+{
+	if [ -w /sys/bus/netdevsim/new_device ]; then
+		return 0
+	fi
+
+	if ! modprobe netdevsim >/dev/null 2>&1; then
+		echo "SKIP: no netdevsim support"
+		exit "${ksft_skip}"
+	fi
+}
+
+create_nsim()
+{
+	local id="$1"
+	local ns="$2"
+	local addr="$3"
+	local dev
+
+	echo "${id}" | ip netns exec "${ns}" tee /sys/bus/netdevsim/new_device >/dev/null
+	udevadm settle
+
+	dev=$(ip netns exec "${ns}" ls /sys/bus/netdevsim/devices/netdevsim"${id}"/net)
+	ip -netns "${ns}" link set dev "${dev}" name "nsim${id}"
+	ip -netns "${ns}" addr add "${addr}/24" dev "nsim${id}"
+	ip -netns "${ns}" link set dev "nsim${id}" up
+
+	echo "nsim${id}"
+}
+
+link_nsim_peers()
+{
+	local srv_ifindex
+	local cli_ifindex
+
+	eval "exec {srv_fd} "${NSIM_SYS_LINK}"
+}
+
+wait_for_file()
+{
+	local path="$1"
+	local i
+
+	for i in $(seq 100); do
+		if [ -e "${path}" ]; then
+			return 0
+		fi
+		sleep 0.1
+	done
+
+	return 1
+}
+
+server_python='
+import array
+import fcntl
+import os
+import socket
+import struct
+import sys
+import time
+
+SO_MEMINFO = 55
+SK_MEMINFO_RMEM_ALLOC = 0
+TCP_MAXSEG = getattr(socket, "TCP_MAXSEG", 2)
+FIONREAD = 0x541B
+POLL_INTERVAL = 0.01
+POLL_TIMEOUT = 20.0
+
+(mode, host, port, warmup_rounds, warmup_len, queued_len, inflated_len,
+ ready_file, result_file) = sys.argv[1:]
+port = int(port)
+warmup_rounds = int(warmup_rounds)
+warmup_len = int(warmup_len)
+queued_len = int(queued_len)
+inflated_len = int(inflated_len)
+
+def queued_bytes(sock):
+    buf = array.array("I", [0])
+    fcntl.ioctl(sock.fileno(), FIONREAD, buf, True)
+    return buf[0]
+
+def wait_for_queued(sock, target):
+    deadline = time.time() + POLL_TIMEOUT
+    while time.time() < deadline:
+        if queued_bytes(sock) >= target:
+            return
+        time.sleep(POLL_INTERVAL)
+    raise SystemExit(f"timed out waiting for {target} queued bytes")
+
+def meminfo(sock):
+    raw = sock.getsockopt(socket.SOL_SOCKET, SO_MEMINFO, 9 * 4)
+    return struct.unpack("=9I", raw)
+
+def wait_for_growth(sock, idx, base):
+    deadline = time.time() + POLL_TIMEOUT
+    while time.time() < deadline:
+        cur = meminfo(sock)[idx]
+        if cur > base:
+            return cur
+        time.sleep(POLL_INTERVAL)
+    raise SystemExit(f"timed out waiting for SO_MEMINFO[{idx}] growth from {base}")
+
+def write_metric(path, value):
+    with open(path, "w", encoding="ascii") as fp:
+        fp.write(f"{value}\n")
+
+def recv_all(sock, total):
+    remaining = total
+    while remaining:
+        chunk = sock.recv(min(65536, remaining))
+        if not chunk:
+            raise SystemExit("unexpected EOF while draining receive data")
+        remaining -= len(chunk)
+
+listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+listener.setsockopt(socket.IPPROTO_TCP, TCP_MAXSEG, 1000)
+listener.bind((host, port))
+listener.listen(1)
+conn, _ = listener.accept()
+
+for _ in range(warmup_rounds):
+    recv_all(conn, warmup_len)
+
+if mode == "rmem_alloc":
+    wait_for_queued(conn, queued_len)
+    base_metric = meminfo(conn)[SK_MEMINFO_RMEM_ALLOC]
+    write_metric(ready_file, base_metric)
+
+    recv_all(conn, queued_len)
+    wait_for_queued(conn, inflated_len)
+    grown_metric = meminfo(conn)[SK_MEMINFO_RMEM_ALLOC]
+    write_metric(result_file, grown_metric)
+elif mode == "rmem_alloc_warm":
+    wait_for_queued(conn, queued_len)
+    base_metric = meminfo(conn)[SK_MEMINFO_RMEM_ALLOC]
+    write_metric(ready_file, base_metric)
+
+    wait_for_queued(conn, queued_len + 1)
+    grown_metric = wait_for_growth(conn, SK_MEMINFO_RMEM_ALLOC, base_metric)
+    write_metric(result_file, grown_metric)
+elif mode == "rmem_alloc_growth":
+    # The growth cases compare against a live socket metric, so wait for
+    # observed growth instead of trusting one instantaneous post-queue sample.
+    wait_for_queued(conn, queued_len)
+    base_metric = meminfo(conn)[SK_MEMINFO_RMEM_ALLOC]
+    write_metric(ready_file, base_metric)
+
+    recv_all(conn, queued_len)
+    wait_for_queued(conn, inflated_len)
+    grown_metric = wait_for_growth(conn, SK_MEMINFO_RMEM_ALLOC, base_metric)
+    write_metric(result_file, grown_metric)
+else:
+    raise SystemExit(f"unknown mode: {mode}")
+'
+
+client_python='
+import os
+import socket
+import sys
+import time
+
+POLL_INTERVAL = 0.01
+POLL_TIMEOUT = 20.0
+
+host, port, warmup_rounds, warmup_len, queued_len, inflated_len, gate_file = sys.argv[1:]
+port = int(port)
+warmup_rounds = int(warmup_rounds)
+warmup_len = int(warmup_len)
+queued_len = int(queued_len)
+inflated_len = int(inflated_len)
+
+def send_all(sock, total):
+    payload = b"a" * min(total, 65536)
+    left = total
+    while left:
+        chunk = payload[: min(len(payload), left)]
+        sent = sock.send(chunk)
+        if sent <= 0:
+            raise SystemExit("short send")
+        left -= sent
+
+def wait_for_file(path):
+    deadline = time.time() + POLL_TIMEOUT
+    while time.time() < deadline:
+        if os.path.exists(path):
+            return
+        time.sleep(POLL_INTERVAL)
+    raise SystemExit(f"timed out waiting for {path}")
+
+cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1000)
+cli.connect((host, port))
+for _ in range(warmup_rounds):
+    send_all(cli, warmup_len)
+send_all(cli, queued_len)
+wait_for_file(gate_file)
+send_all(cli, inflated_len)
+cli.close()
+'
+
+read_metric()
+{
+	local path="$1"
+	local value
+
+	if ! read -r value < "${path}"; then
+		echo "FAIL: unable to read metric from ${path}"
+		exit "${ksft_fail}"
+	fi
+
+	printf '%s\n' "${value}"
+}
+
+run_case()
+{
+	local case_id="$1"
+	local mode="$2"
+	local port="$3"
+	local warmups="$4"
+	local warmup_len="$5"
+	local queued_len="$6"
+	local inflated_len="$7"
+	local extra="$8"
+	local label="$9"
+	local ready_file="${stage_dir}/${case_id}.ready"
+	local result_file="${stage_dir}/${case_id}.result"
+	local gate_file="${stage_dir}/${case_id}.gate"
+
+	rm -f "${ready_file}" "${result_file}" "${gate_file}"
+	echo 0 > "${dfs_file}"
+
+	ip netns exec "${SRV}" python3 - "${mode}" "${SERVER_ADDR}" "${port}" \
+		"${warmups}" "${warmup_len}" "${queued_len}" "${inflated_len}" \
+		"${ready_file}" "${result_file}" < "${dfs_file}"
+	touch "${gate_file}"
+
+	wait "${cli_pid}"
+	cli_pid=
+	wait "${srv_pid}"
+	srv_pid=
+
+	CASE_BASE_METRIC=$(read_metric "${ready_file}")
+	CASE_FINAL_METRIC=$(read_metric "${result_file}")
+
+	echo "PASS: ${label}"
+}
+
+# This test only proves that injected truesize reaches socket memory
+# accounting. Packetdrill covers the sender-visible rwnd accept/drop logic.
+
+assert_no_growth()
+{
+	local label="$1"
+
+	if [ "${CASE_FINAL_METRIC}" -gt "${CASE_BASE_METRIC}" ]; then
+		echo "FAIL: ${label}: metric grew unexpectedly:" \
+			"base=${CASE_BASE_METRIC}" \
+			"after=${CASE_FINAL_METRIC}"
+		exit "${ksft_fail}"
+	fi
+}
+
+assert_growth()
+{
+	local label="$1"
+
+	if [ "${CASE_FINAL_METRIC}" -le "${CASE_BASE_METRIC}" ]; then
+		echo "FAIL: ${label}: metric did not grow:" \
+			"base=${CASE_BASE_METRIC}" \
+			"after=${CASE_FINAL_METRIC}"
+		exit "${ksft_fail}"
+	fi
+}
+
+ensure_debugfs
+ensure_netdevsim
+set +u
+setup_ns SRV CLI
+set -u
+
+srv_dev=$(create_nsim "${NSIM_SRV_ID}" "${SRV}" "${SERVER_ADDR}")
+cli_dev=$(create_nsim "${NSIM_CLI_ID}" "${CLI}" "${CLIENT_ADDR}")
+link_nsim_peers
+
+ip netns exec "${SRV}" sysctl -wq net.ipv4.tcp_moderate_rcvbuf=0
+
+stage_dir=$(mktemp -d)
+dfs_file="/sys/kernel/debug/netdevsim/netdevsim${NSIM_SRV_ID}/ports/0/rx_extra_truesize"
+
+run_case "rmem_noop" "rmem_alloc" "${RMEM_PORT}" 0 0 \
+	"${RMEM_QUEUED_LEN}" "${RMEM_INFLATED_LEN}" 0 \
+	"peer rx truesize zero no-op"
+assert_no_growth "peer rx truesize zero no-op"
+
+run_case "rmem_small" "rmem_alloc_growth" "${RMEM_PORT}" 0 0 \
+	"${RMEM_QUEUED_LEN}" "${RMEM_INFLATED_LEN}" "${RMEM_SMALL_EXTRA}" \
+	"peer rx truesize small rmem_alloc"
+assert_growth "peer rx truesize small rmem_alloc"
+small_delta=$((CASE_FINAL_METRIC - CASE_BASE_METRIC))
+
+run_case "rmem_large" "rmem_alloc_growth" "${RMEM_PORT}" 0 0 \
+	"${RMEM_QUEUED_LEN}" "${RMEM_INFLATED_LEN}" "${RMEM_LARGE_EXTRA}" \
+	"peer rx truesize large rmem_alloc"
+assert_growth "peer rx truesize large rmem_alloc"
+large_delta=$((CASE_FINAL_METRIC - CASE_BASE_METRIC))
+
+if [ "${large_delta}" -le "${small_delta}" ]; then
+	echo "FAIL: peer rx truesize stepped rmem_alloc:" \
+		"small_delta=${small_delta}" \
+		"large_delta=${large_delta}"
+	exit "${ksft_fail}"
+fi
+
+run_case "rmem_warm" "rmem_alloc_warm" "${WARM_PORT}" "${WARM_WARMUP_ROUNDS}" "${WARM_WARMUP_LEN}" \
"${WARM_QUEUED_LEN}" "${WARM_INFLATED_LEN}" "${WARM_EXTRA}" \ + "peer rx truesize warm rmem_alloc" +assert_growth "peer rx truesize warm rmem_alloc" --=20 2.43.0 From nobody Sun Mar 22 08:10:23 2026 Received: from mail-oa1-f54.google.com (mail-oa1-f54.google.com [209.85.160.54]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C44F638F641 for ; Sat, 14 Mar 2026 20:15:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.54 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773519307; cv=none; b=NJB8TEz7oxA2oKXLKfAajWovZ1MlEUFEYi+WaabgdFR9uxW/J/iN+bKMjAxHcftXO8+OXhxcBoTrsptLfsnHkIDb52mXCbiclLCwJnTydXsmq495aeTxZuDcFKjXRFwiAPBTJeCrsvmz4TF6GVVa5CbSzDNGshvtQ3vw49xvCI8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773519307; c=relaxed/simple; bh=cj/seZubwP7nLqBoz8ofrtucQHhr4mn40yQgMXMr40Y=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=OMCPSpZ4PAAiGZJTpRGXjIVnylltOcOXcfT1mSlc7U/QcPytbH4lJ2klGt3Xe9NrlWKmYJvaNPPDwvYorA92Jp9VlGE8cYv8XpRQcM5MSTgGcSwq6pvinlHKKcgQ3D/MBQcja6uRMNo4Lv7SEapbeWA16J7iZcJdBZ3nHyZ9QPo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=ivkAmQmz; arc=none smtp.client-ip=209.85.160.54 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="ivkAmQmz" Received: by mail-oa1-f54.google.com with SMTP id 586e51a60fabf-40946982a78so1278414fac.2 for ; Sat, 14 Mar 2026 13:15:05 -0700 (PDT) DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773519304; x=1774124104; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=ib/vlu986mcpuvwIbmamafwqas870ti4JrE+4bqQJAA=; b=ivkAmQmzdcLJlNNYxp6d5Wvnrm5CvFJdBD52nglEHtaE7RwLh47ZMzL/ttonOB+vJW ZQDGv1lXXvYBjtu1ENydtWMCaC2R2fQ7ha+kdZFhUuOrMKfz9QO4ITE9pa5Zvew7Bx/V 3dtfMLqZ1vEpFcQSBZvLiFGoB/FTq8slitueAr/d1F4HZw+eKO9/iLaEHzdp882Ss+hS C6EjZrO/iSi7raWOMd5nLDoooPqMIEwpIYFA6dRACEEYxrdCceevaNy2bx3fyx8kMWBx 7JOOU4OtRct6dgB5K6pv4voVVEYlyEHiMgrIh35dTNf9rDrwFhxld1VAse/CgIwdmcfj ifjA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773519304; x=1774124104; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=ib/vlu986mcpuvwIbmamafwqas870ti4JrE+4bqQJAA=; b=ULHD4yvVtKZM5/4ugA01aMSBzH1LWqLy0L8gBCGiX0dAJW7JBprLRmeOCY4Dmi8CwL JLpzlUa9qk1/3yJItPtSsJqjSTcY23wDsAJ1gG/FiDLyhlLSQXz46C+63M1/PIXW/Z/F Km0frXI2hac3Ygob6Bmth8xt+x22cm0rj4Bnl2m/JnDACoZIS82RFo5yNWFLLOFq5ixw b+7EM2oXTeduvrmYyIKsOHSnwCxrMvUsh5gObvcTj8a0j7ZD6r2wwIcsdiud6Zshi/HZ dWMlxyGvzo30QAsSsIFwX2XKowmMU1iG0OjVM/mrz3jJZvWVKr6MZWBKChZDhRvZqO2P grgQ== X-Gm-Message-State: AOJu0YwtoEAZSXSpbgYSCHguGaw355JdPiqNch3CaFTpaRj1h/Bv0RXq dr57R/onum0T67G5n9ECi79QTFsUPnEV6rgX+PIGtjXZXUO9d2PysIZ6 X-Gm-Gg: ATEYQzwKRYJkUHIiJWMwHECIJQQEB/Zo35F3upcdAEDoaUn9QUOmv1igqQQuCxZow8s EFoTFoOw9pEgd3VI7lponwhG6+XugdwWVMVVa8Avql7jeA3LJl1XfRmLu/aTHEfS8yxxtOZhtLl 2PGRHfrH7w3l/GXFaN4U28AjEgkmJIMstrRhB4wmXNjqSZfnhGdKbLnJ8uxTvTRv3kOn3wk5lJD DLV1yWbVoscH/CbZLk0oLxboeOUg4CP4aNw+pUIMAZTq6JXrr4uO6UMUEBmwC9O5Ws9MkWifS6C ApBtCTv7oelkgRdsfX5wyVPC7e0seS/2Ft7THj+SVcHNMbn8FFvxq0PPcTrIwAUDjJQSA8jV3/0 816lDsth82HOCzaQUNNPhHxlprMq5RnOHeq4EluNywnUEx4EumQsltvOjrRNYZE3k360RnDdulg 
ZGlYwJBM5143qyOWXZrKcn7lrJhEqFwHK9nK20Gg0kz6L1eY9IRsGxg5sQ8phOqYws8o1Gl/81c 6fOVenimWMt5Pm7qAg0MZQq3IL5lD/949u3rHTb X-Received: by 2002:a05:6871:c949:b0:404:168:3ba1 with SMTP id 586e51a60fabf-417b91d86f7mr5008979fac.18.1773519304503; Sat, 14 Mar 2026 13:15:04 -0700 (PDT) Received: from Atwell-Laptop.. (108-212-132-20.lightspeed.irvnca.sbcglobal.net. [108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e5e8185sm11914165fac.12.2026.03.14.13.15.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 14 Mar 2026 13:15:04 -0700 (PDT) From: atwellwea@gmail.com To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev, dsahern@kernel.org, horms@kernel.org, kuniyu@google.com, andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com, jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net, matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com Subject: [PATCH net-next v2 14/14] netdevsim: release pinned PSP ext on drop paths Date: Sat, 14 Mar 2026 14:13:48 -0600 Message-ID: <20260314201348.1786972-15-atwellwea@gmail.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com> References: <20260314201348.1786972-1-atwellwea@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Wesley Atwell nsim_do_psp() can leave an extra extension reference pinned until nsim_psp_handle_ext() reattaches it to a forwarded skb. 
Route the drop paths through a common helper and release that extra
reference when __dev_forward_skb() fails and when start_xmit() drops the
skb before it reaches the peer RX path.

This is separate from the peer RX truesize test hook itself.

Signed-off-by: Wesley Atwell
---
 drivers/net/netdevsim/netdev.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 22238df79b6a..c22513c523d6 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -187,6 +187,15 @@ static int nsim_napi_rx(struct net_device *tx_dev, struct net_device *rx_dev,
 	return NET_RX_SUCCESS;
 }

+/* nsim_do_psp() pins an extra extension ref until nsim_psp_handle_ext()
+ * reattaches it to a forwarded skb.
+ */
+static void nsim_psp_ext_put(struct skb_ext *psp_ext)
+{
+	if (psp_ext)
+		__skb_ext_put(psp_ext);
+}
+
 static int nsim_forward_skb(struct net_device *tx_dev,
 			    struct net_device *rx_dev,
 			    struct sk_buff *skb,
@@ -196,8 +205,10 @@ static int nsim_forward_skb(struct net_device *tx_dev,
 	int ret;

 	ret = __dev_forward_skb(rx_dev, skb);
-	if (ret)
+	if (ret) {
+		nsim_psp_ext_put(psp_ext);
 		return ret;
+	}

 	nsim_psp_handle_ext(skb, psp_ext);

@@ -278,11 +289,8 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		 * the synthetic cost; sender-side skb accounting stays put.
 		 */
 		nskb = skb_clone(skb, GFP_ATOMIC);
-		if (!nskb) {
-			if (psp_ext)
-				__skb_ext_put(psp_ext);
+		if (!nskb)
 			goto out_drop_free;
-		}

 		consume_skb(skb);
 		skb = nskb;
@@ -303,6 +311,7 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
 out_drop_any:
 	dr = SKB_DROP_REASON_NOT_SPECIFIED;
 out_drop_free:
+	nsim_psp_ext_put(psp_ext);
 	kfree_skb_reason(skb, dr);
 out_drop_cnt:
 	rcu_read_unlock();
-- 
2.43.0
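The reference-ownership rule described in this patch's commit message can be modelled outside the kernel. The following is a minimal userspace C sketch, assuming simplified hypothetical types (`struct ext` standing in for `struct skb_ext`, `struct pkt` standing in for an skb) and hypothetical helpers `ext_put()`, `forward()` and `xmit()` that only mimic the control flow of `nsim_psp_ext_put()`, `nsim_forward_skb()` and `nsim_start_xmit()`. It is not netdevsim code and ignores cloning, truesize and RCU; it only shows that the extra reference taken up front is released exactly once on every drop path and handed to the packet on success.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct skb_ext: only the refcount matters. */
struct ext {
	int refs;
};

/* Hypothetical stand-in for an skb that can own one extension ref. */
struct pkt {
	struct ext *ext;
};

/* Same shape as nsim_psp_ext_put(): NULL-tolerant single put. */
static void ext_put(struct ext *e)
{
	if (e)
		e->refs--;
}

/*
 * Models nsim_forward_skb(): on failure it releases the pinned ref
 * itself; on success ownership moves to the packet, as
 * nsim_psp_handle_ext() would do.
 */
static int forward(struct pkt *p, struct ext *pinned, bool fail)
{
	if (fail) {
		ext_put(pinned);
		return -1;
	}
	p->ext = pinned;
	return 0;
}

/* Models nsim_start_xmit(): returns true when the packet is delivered. */
static bool xmit(struct pkt *p, struct ext *e, bool clone_fails,
		 bool forward_fails)
{
	struct ext *pinned = NULL;

	if (e) {
		e->refs++;		/* extra ref, standing in for nsim_do_psp() */
		pinned = e;
	}

	if (clone_fails)
		goto out_drop_free;	/* dropped before the peer RX path */

	if (forward(p, pinned, forward_fails))
		goto out_drop_cnt;	/* forward() already released the pin */

	return true;

out_drop_free:
	ext_put(pinned);
out_drop_cnt:
	return false;
}
```

Starting from `refs = 1`, a clone-failure or forward-failure call leaves `refs` at 1, while a successful call leaves it at 2 with the packet holding the extra reference. Releasing inside `forward()` and then jumping past `out_drop_free` is what keeps the forward-failure path from dropping the reference twice.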