From nobody Sun Mar 22 08:08:46 2026
From: Wesley Atwell
To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, ncardwell@google.com, dsahern@kernel.org,
	matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org,
	mptcp@lists.linux.dev
Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net,
	skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com,
	linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, atwellwea@gmail.com
Subject: [PATCH net 1/7] tcp: track advertise-time scaling basis for rcv_wnd
Date: Wed, 11 Mar 2026 01:55:54 -0600
Message-Id: <20260311075600.948413-2-atwellwea@gmail.com>
In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com>
References: <20260311075600.948413-1-atwellwea@gmail.com>

tp->rcv_wnd is an advertised window, but later receive-side accounting
needs to recover the hard memory budget that window represented when it
was exposed. Prepare for that by storing the scaling basis alongside
tp->rcv_wnd and centralizing the helper API around the paired state.

While here, make the existing receive-memory arithmetic use the shared
helper names so later behavioral changes can build on one explicit
accounting model.

This patch is groundwork only. Later patches will refresh the snapshot at
window write sites and consume it in the receive-memory paths.

Signed-off-by: Wesley Atwell
---
 .../networking/net_cachelines/tcp_sock.rst    |  1 +
 include/linux/tcp.h                           |  1 +
 include/net/tcp.h                             | 79 +++++++++++++++++--
 net/ipv4/tcp.c                                |  1 +
 4 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/Documentation/networking/net_cachelines/tcp_sock.rst b/Documentation/networking/net_cachelines/tcp_sock.rst
index 563daea10d6c..1415981b9d8a 100644
--- a/Documentation/networking/net_cachelines/tcp_sock.rst
+++ b/Documentation/networking/net_cachelines/tcp_sock.rst
@@ -12,6 +12,7 @@ struct inet_connection_sock   inet_conn
 u16                           tcp_header_len          read_mostly         read_mostly         tcp_bound_to_half_wnd,tcp_current_mss(tx);tcp_rcv_established(rx)
 u16                           gso_segs                read_mostly                             tcp_xmit_size_goal
 __be32                        pred_flags              read_write          read_mostly         tcp_select_window(tx);tcp_rcv_established(rx)
+u8                            rcv_wnd_scaling_ratio   read_write          read_mostly         tcp_set_rcv_wnd,tcp_can_ingest,tcp_clamp_window
 u64                           bytes_received                              read_write          tcp_rcv_nxt_update(rx)
 u32                           segs_in                                     read_write          tcp_v6_rcv(rx)
 u32                           data_segs_in                                read_write          tcp_v6_rcv(rx)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f72eef31fa23..ec6b70c1174b 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -297,6 +297,7 @@ struct tcp_sock {
 		est_ecnfield:2,/* ECN field for AccECN delivered estimates */
 		accecn_opt_demand:2,/* Demand AccECN option for n next ACKs */
 		prev_ecnfield:2; /* ECN bits from the previous segment */
+	u8	rcv_wnd_scaling_ratio; /* 0 if unknown, else tp->rcv_wnd basis */
 	__be32	pred_flags;
 	u64	tcp_clock_cache; /* cache last tcp_clock_ns() (see tcp_mstamp_refresh()) */
 	u64	tcp_mstamp;	/* most recent packet received/sent */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 978eea2d5df0..187e6d660f62 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1702,6 +1702,26 @@ static inline int tcp_space_from_win(const struct sock *sk, int win)
 	return __tcp_space_from_win(tcp_sk(sk)->scaling_ratio, win);
 }
 
+static inline bool tcp_rcv_wnd_snapshot_valid(const struct tcp_sock *tp)
+{
+	return tp->rcv_wnd_scaling_ratio != 0;
+}
+
+/* Rebuild hard receive-memory units for data already covered by tp->rcv_wnd if
+ * the advertise-time basis is known. Legacy TCP_REPAIR restores can only
+ * recover tp->rcv_wnd itself; callers must fall back when the snapshot is
+ * unknown.
+ */
+static inline bool tcp_space_from_rcv_wnd(const struct tcp_sock *tp, int win,
+					  int *space)
+{
+	if (!tcp_rcv_wnd_snapshot_valid(tp))
+		return false;
+
+	*space = __tcp_space_from_win(tp->rcv_wnd_scaling_ratio, win);
+	return true;
+}
+
 /* Assume a 50% default for skb->len/skb->truesize ratio.
  * This may be adjusted later in tcp_measure_rcv_mss().
  */
@@ -1709,15 +1729,62 @@ static inline int tcp_space_from_win(const struct sock *sk, int win)
 
 static inline void tcp_scaling_ratio_init(struct sock *sk)
 {
-	tcp_sk(sk)->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	tp->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+	tp->rcv_wnd_scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
+}
+
+/* tp->rcv_wnd is paired with the scaling_ratio that was in force when that
+ * window was last advertised. Legacy TCP_REPAIR restores can only recover the
+ * window value itself and use a zero snapshot until a fresh local window
+ * advertisement refreshes the pair.
+ */
+static inline void tcp_set_rcv_wnd_snapshot(struct tcp_sock *tp, u32 win,
+					    u8 scaling_ratio)
+{
+	tp->rcv_wnd = win;
+	tp->rcv_wnd_scaling_ratio = scaling_ratio;
+}
+
+static inline void tcp_set_rcv_wnd(struct tcp_sock *tp, u32 win)
+{
+	tcp_set_rcv_wnd_snapshot(tp, win, tp->scaling_ratio);
+}
+
+static inline void tcp_set_rcv_wnd_unknown(struct tcp_sock *tp, u32 win)
+{
+	tcp_set_rcv_wnd_snapshot(tp, win, 0);
+}
+
+/* TCP receive-side accounting reuses sk_rcvbuf as both a hard memory limit
+ * and as the source material for the advertised receive window after
+ * scaling_ratio conversion. Keep the byte accounting explicit so admission,
+ * pruning, and rwnd selection all start from the same quantities.
+ */
+static inline int tcp_rmem_used(const struct sock *sk)
+{
+	return atomic_read(&sk->sk_rmem_alloc);
+}
+
+static inline int tcp_rmem_avail(const struct sock *sk)
+{
+	return READ_ONCE(sk->sk_rcvbuf) - tcp_rmem_used(sk);
+}
+
+/* Sender-visible rwnd headroom also reserves bytes already queued on backlog.
+ * Those bytes are not free to advertise again until __release_sock() drains
+ * backlog and clears sk_backlog.len.
+ */
+static inline int tcp_rwnd_avail(const struct sock *sk)
+{
+	return tcp_rmem_avail(sk) - READ_ONCE(sk->sk_backlog.len);
 }
 
 /* Note: caller must be prepared to deal with negative returns */
 static inline int tcp_space(const struct sock *sk)
 {
-	return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
-				  READ_ONCE(sk->sk_backlog.len) -
-				  atomic_read(&sk->sk_rmem_alloc));
+	return tcp_win_from_space(sk, tcp_rwnd_avail(sk));
 }
 
 static inline int tcp_full_space(const struct sock *sk)
@@ -1760,7 +1827,7 @@ static inline bool tcp_rmem_pressure(const struct sock *sk)
 	rcvbuf = READ_ONCE(sk->sk_rcvbuf);
 	threshold = rcvbuf - (rcvbuf >> 3);
 
-	return atomic_read(&sk->sk_rmem_alloc) > threshold;
+	return tcp_rmem_used(sk) > threshold;
 }
 
 static inline bool tcp_epollin_ready(const struct sock *sk, int target)
@@ -1910,7 +1977,7 @@ static inline void tcp_fast_path_check(struct sock *sk)
 
 	if (RB_EMPTY_ROOT(&tp->out_of_order_queue) &&
 	    tp->rcv_wnd &&
-	    atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf &&
+	    tcp_rmem_avail(sk) > 0 &&
 	    !tp->urg_data)
 		tcp_fast_path_on(tp);
 }
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 202a4e57a218..cec9ae1bf875 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -5238,6 +5238,7 @@ static void __init tcp_struct_check(void)
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ce);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ecn_bytes);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_limited);
+	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd_scaling_ratio);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_tstamp);
 	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rx_opt);
-- 
2.34.1

From nobody Sun Mar 22 08:08:46 2026
From: Wesley Atwell
To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, ncardwell@google.com, dsahern@kernel.org,
	matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org,
	mptcp@lists.linux.dev
Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net,
	skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com,
	linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, atwellwea@gmail.com
Subject: [PATCH net 2/7] tcp: preserve rcv_wnd snapshot when updating advertised windows
Date: Wed, 11 Mar 2026 01:55:55 -0600
Message-Id: <20260311075600.948413-3-atwellwea@gmail.com>
In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com>
References: <20260311075600.948413-1-atwellwea@gmail.com>

Once tp->rcv_wnd carries paired snapshot semantics, every write of the
advertised window has to refresh the snapshot at the same time.

Convert the active-open, passive-open, and normal advertised-window update
sites to use tcp_set_rcv_wnd(). This keeps new sockets and later window
advertisements initialized with a valid advertise-time basis before the
receive-memory logic starts consuming it.

Signed-off-by: Wesley Atwell
---
 net/ipv4/tcp_minisocks.c | 2 +-
 net/ipv4/tcp_output.c    | 8 ++++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index dafb63b923d0..ae8a466b5298 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -603,7 +603,7 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
 	newtp->rx_opt.sack_ok = ireq->sack_ok;
 	newtp->window_clamp = req->rsk_window_clamp;
 	newtp->rcv_ssthresh = req->rsk_rcv_wnd;
-	newtp->rcv_wnd = req->rsk_rcv_wnd;
+	tcp_set_rcv_wnd(newtp, req->rsk_rcv_wnd);
 	newtp->rx_opt.wscale_ok = ireq->wscale_ok;
 	if (newtp->rx_opt.wscale_ok) {
 		newtp->rx_opt.snd_wscale = ireq->snd_wscale;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 326b58ff1118..c1b94d67d8fe 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -291,7 +291,7 @@ static u16 tcp_select_window(struct sock *sk)
 	 */
 	if (unlikely(inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM)) {
 		tp->pred_flags = 0;
-		tp->rcv_wnd = 0;
+		tcp_set_rcv_wnd(tp, 0);
 		tp->rcv_wup = tp->rcv_nxt;
 		return 0;
 	}
@@ -314,7 +314,7 @@ static u16 tcp_select_window(struct sock *sk)
 		}
 	}
 
-	tp->rcv_wnd = new_win;
+	tcp_set_rcv_wnd(tp, new_win);
 	tp->rcv_wup = tp->rcv_nxt;
 
 	/* Make sure we do not exceed the maximum possible
@@ -4150,6 +4150,10 @@ static void tcp_connect_init(struct sock *sk)
 				  READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_window_scaling),
 				  &rcv_wscale,
 				  rcv_wnd);
+	/* tcp_select_initial_window() filled tp->rcv_wnd through its out-param,
+	 * so snapshot the scaling_ratio we will use for that initial rwnd.
+	 */
+	tcp_set_rcv_wnd(tp, tp->rcv_wnd);
 
 	tp->rx_opt.rcv_wscale = rcv_wscale;
 	tp->rcv_ssthresh = tp->rcv_wnd;
-- 
2.34.1

From nobody Sun Mar 22 08:08:46 2026
From: Wesley Atwell
To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
	edumazet@google.com, ncardwell@google.com, dsahern@kernel.org,
	matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org,
	mptcp@lists.linux.dev
Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net,
	skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com,
	linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-api@vger.kernel.org, atwellwea@gmail.com
Subject: [PATCH net 3/7] tcp: honor advertised receive window in memory admission and clamping
Date: Wed, 11 Mar 2026 01:55:56 -0600
Message-Id: <20260311075600.948413-4-atwellwea@gmail.com>
In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com>
References: <20260311075600.948413-1-atwellwea@gmail.com>

tp->rcv_wnd is an advertised promise to the sender, but receive-memory
accounting was still reconstructing that promise through mutable live
state.

Switch the receive-side decisions over to the advertise-time snapshot. Use
it when deciding whether a packet can be admitted, when deciding how far
to clamp future window growth, and when handling the scaled-window
quantization slack in __tcp_select_window(). If a snapshot is not
available, keep the legacy fallback behavior.

This keeps sender-visible rwnd and the local hard rmem budget in the same
unit system instead of letting ratio drift create accounting mismatches.

Signed-off-by: Wesley Atwell
---
 include/net/tcp.h     |  1 +
 net/ipv4/tcp_input.c  | 86 ++++++++++++++++++++++++++++++++++++++++---
 net/ipv4/tcp_output.c | 14 ++++++-
 3 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 187e6d660f62..88ddf7ee826e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -384,6 +384,7 @@ int tcp_ioctl(struct sock *sk, int cmd, int *karg);
 enum skb_drop_reason tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
 void tcp_rcv_established(struct sock *sk, struct sk_buff *skb);
 void tcp_rcvbuf_grow(struct sock *sk, u32 newval);
+bool tcp_try_grow_rcvbuf(struct sock *sk, int needed);
 void tcp_rcv_space_adjust(struct sock *sk);
 int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp);
 void tcp_twsk_destructor(struct sock *sk);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cba89733d121..f76011fc1b7a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -774,8 +774,37 @@ static void tcp_init_buffer_space(struct sock *sk)
 				    (u32)TCP_INIT_CWND * tp->advmss);
 }
 
+/* Try to grow sk_rcvbuf so the hard receive-memory limit covers @needed
+ * bytes beyond the memory already charged in sk_rmem_alloc.
+ */
+bool tcp_try_grow_rcvbuf(struct sock *sk, int needed)
+{
+	struct net *net = sock_net(sk);
+	int target;
+	int rmem2;
+
+	needed = max(needed, 0);
+	target = tcp_rmem_used(sk) + needed;
+
+	if (target <= READ_ONCE(sk->sk_rcvbuf))
+		return true;
+
+	rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
+	if (READ_ONCE(sk->sk_rcvbuf) >= rmem2 ||
+	    (sk->sk_userlocks & SOCK_RCVBUF_LOCK) ||
+	    tcp_under_memory_pressure(sk) ||
+	    sk_memory_allocated(sk) >= sk_prot_mem_limits(sk, 0))
+		return false;
+
+	WRITE_ONCE(sk->sk_rcvbuf,
+		   min_t(int, rmem2,
+			 max_t(int, READ_ONCE(sk->sk_rcvbuf), target)));
+
+	return target <= READ_ONCE(sk->sk_rcvbuf);
+}
+
 /* 4. Recalculate window clamp after socket hit its memory bounds. */
-static void tcp_clamp_window(struct sock *sk)
+static void tcp_clamp_window_legacy(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -785,14 +814,42 @@ static void tcp_clamp_window(struct sock *sk)
 	icsk->icsk_ack.quick = 0;
 	rmem2 = READ_ONCE(net->ipv4.sysctl_tcp_rmem[2]);
 
-	if (sk->sk_rcvbuf < rmem2 &&
+	if (READ_ONCE(sk->sk_rcvbuf) < rmem2 &&
 	    !(sk->sk_userlocks & SOCK_RCVBUF_LOCK) &&
 	    !tcp_under_memory_pressure(sk) &&
 	    sk_memory_allocated(sk) < sk_prot_mem_limits(sk, 0)) {
 		WRITE_ONCE(sk->sk_rcvbuf,
 			   min(atomic_read(&sk->sk_rmem_alloc), rmem2));
 	}
-	if (atomic_read(&sk->sk_rmem_alloc) > sk->sk_rcvbuf)
+	if (atomic_read(&sk->sk_rmem_alloc) > READ_ONCE(sk->sk_rcvbuf))
+		tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
+}
+
+static void tcp_clamp_window(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	u32 cur_rwnd = tcp_receive_window(tp);
+	int need;
+
+	if (!tcp_space_from_rcv_wnd(tp, cur_rwnd, &need)) {
+		tcp_clamp_window_legacy(sk);
+		return;
+	}
+
+	inet_csk(sk)->icsk_ack.quick = 0;
+	need = max_t(int, need, 0);
+
+	/* Keep the hard receive-memory cap large enough to honor the
+	 * remaining receive window we already exposed to the sender. Use
+	 * the scaling_ratio snapshot taken when tp->rcv_wnd was advertised,
+	 * not the mutable live ratio which may drift later in the flow.
+	 */
+	tcp_try_grow_rcvbuf(sk, need);
+
+	/* If the remaining advertised rwnd no longer fits the hard budget,
+	 * slow future window growth until the accounting converges again.
+	 */
+	if (need > tcp_rmem_avail(sk))
 		tp->rcv_ssthresh = min(tp->window_clamp, 2U * tp->advmss);
 }
 
@@ -5374,11 +5431,28 @@ static void tcp_ofo_queue(struct sock *sk)
 static bool tcp_prune_ofo_queue(struct sock *sk, const struct sk_buff *in_skb);
 static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb);
 
+/* Sequence checks run against the sender-visible receive window before this
+ * point. Convert the incoming payload back to the hard receive-memory budget
+ * using the scaling_ratio that was in force when tp->rcv_wnd was advertised,
+ * so admission keeps honoring the same exposed window even if the live ratio
+ * changes later in the flow. Legacy TCP_REPAIR restores do not have that
+ * advertise-time basis, so they fall back to the pre-series admission rule
+ * until a fresh local advertisement refreshes the pair.
+ *
+ * Do not subtract sk_backlog.len here. tcp_space() already reserves backlog
+ * bytes when selecting future advertised windows, and sk_backlog.len stays
+ * inflated until __release_sock() finishes draining backlog. Subtracting it
+ * again here would double count already-queued backlog packets as they move
+ * into sk_rmem_alloc.
+ */
 static bool tcp_can_ingest(const struct sock *sk, const struct sk_buff *skb)
 {
-	unsigned int rmem = atomic_read(&sk->sk_rmem_alloc);
+	int need;
+
+	if (!tcp_space_from_rcv_wnd(tcp_sk(sk), skb->len, &need))
+		return atomic_read(&sk->sk_rmem_alloc) <= READ_ONCE(sk->sk_rcvbuf);
 
-	return rmem <= sk->sk_rcvbuf;
+	return need <= tcp_rmem_avail(sk);
 }
 
 static int tcp_try_rmem_schedule(struct sock *sk, const struct sk_buff *skb,
@@ -6014,7 +6088,7 @@ static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 
 	/* Do nothing if our queues are empty. */
-	if (!atomic_read(&sk->sk_rmem_alloc))
+	if (!tcp_rmem_used(sk))
 		return -1;
 
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_PRUNECALLED);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c1b94d67d8fe..5e69fc31a4da 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3377,13 +3377,23 @@ u32 __tcp_select_window(struct sock *sk)
 	 * scaled window will not line up with the MSS boundary anyway.
 	 */
 	if (tp->rx_opt.rcv_wscale) {
+		int rcv_wscale = 1 << tp->rx_opt.rcv_wscale;
+
 		window = free_space;
 
 		/* Advertise enough space so that it won't get scaled away.
-		 * Import case: prevent zero window announcement if
+		 * Important case: prevent zero-window announcement if
 		 * 1<<rcv_wscale > mss.
 		 */
-		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
+		window = ALIGN(window, rcv_wscale);
+
+		/* Back any scale-quantization slack before we expose it.
+		 * Otherwise tcp_can_ingest() can reject data which is still
+		 * within the sender-visible window.
+		 */
+		if (window > free_space &&
+		    !tcp_try_grow_rcvbuf(sk, tcp_space_from_win(sk, window)))
+			window = round_down(free_space, rcv_wscale);
 	} else {
 		window = tp->rcv_wnd;
 		/* Get the largest window that is a nice multiple of mss.
--=20 2.34.1 From nobody Sun Mar 22 08:08:46 2026 Received: from mail-ot1-f53.google.com (mail-ot1-f53.google.com [209.85.210.53]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 550573BBA1A for ; Wed, 11 Mar 2026 07:56:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.53 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215801; cv=none; b=Jg8Q3m7qzYZ8DxhVogA4eW/9xKmdE7RNRn7aFodQHAE2r+Ow4f4t+HSc+02Vb4uGnU4rRSvCnKYSEih8S40mVCESfrWY+LEl05HUpcxZ1hBV88M7F39Sf38Bk34YbPXxSI6nuOrmhi/qyer7KvuNkBvkt43BUj2ZW6uxfXGaY2Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215801; c=relaxed/simple; bh=FWYsgoyKVIUupg0joGsjXBrmnww52mswGSTqsNgSq3U=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=VS46AlyGU1HHjqGSPZIk1dmJWFVxuqxdJJ8wzV9IDdw2i2TTvHzs3adfhWYXBPjdPoON2gZXzGyoi8ZZjwzkQrFtNpySLRt1muTSTYI8s3WHepgsaPnAGTXyoUMj/Kx4BqHQLWkkarT1uzey3Jre5xGvC8frSVEPKkKRPixr1Lc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=H/2YPI1C; arc=none smtp.client-ip=209.85.210.53 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="H/2YPI1C" Received: by mail-ot1-f53.google.com with SMTP id 46e09a7af769-7d75d698ee6so878356a34.3 for ; Wed, 11 Mar 2026 00:56:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773215795; x=1773820595; darn=vger.kernel.org; 
h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DWRaO2kk43skP87kgk2P1rc0roOMtr/hXw08wzU1czs=; b=H/2YPI1Cgww1qLF8X73k6RMOlAtgpK/tDCvVe+qwTWtv1lCqP/9Fz45QU3JjxC8vrb KrI/fsEV4RvZpfe88/9Cwo/YMKi35uyvLHUFKDak3gowGawOrR3bOjUjQ3v6IgOmR/9L FYLTBlILCke4UkUMknw1uBgIO5NQE+pfMPuzCDz+CnIWsyP7Hkf22Ya8+euYwzq9rU4Y +bIc4n7aEXDhyF8zB00n5LgY89TGa4cFLqoLjmdsa67XhchiL3Zu+1D7GOGzgbGmzERY 0igkVYqDe1yoW9f/vGObwMv1JwybRSSMh3KeQl5KIPum6s8beGt543tmJSEwsJMkvkWu AR+Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773215795; x=1773820595; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=DWRaO2kk43skP87kgk2P1rc0roOMtr/hXw08wzU1czs=; b=t54ocVeXNG1nwnwDsgErwUxppTMsLY1A4rsIddL19oteW690/40rGCyI1bAJl4gwiD gqyLCPGqQtjbC7tzjBc0ORME1TR+Nakda0WfjHgy3yly783DpXMX3TSQvrgKFUTVgvGy mJ+uQ2ugEM5qCbt6z0GbJrbkRjOPMaDmlctlkCIhHNMYOg+pwbzB4cP1rzulEvaNfYrU 1b5nW0weqBfjVy6EQUePMyt8CN+O7r4QIaCpqBdaVjH6L1QGQffNyz7woX2bKiucS0/o w/mJbhhzwevk6yVIfxK9xmzw7oeKsyCfrnwBkw1Mk6nIwGophY80y4wDVjm3UZkxy+KG kZIg== X-Forwarded-Encrypted: i=1; AJvYcCWpMQRQfhRBe8YvP+kaOqxB9u4fTp+lOkpckzo5t5R+ckeLVnAvsOxRNbWaoCro5gJot0aejwt0gcYT4go=@vger.kernel.org X-Gm-Message-State: AOJu0YyNNuXwHJYX7ebNj/VY0TWtJ8AOqgNpew59I2quZYame2NfSSG7 Htjr1nxu2le2ukaCiu6dAItG4HLFh4jwHfQ9MCD03t4OUbSjGTLADg+d X-Gm-Gg: ATEYQzwUSEs1Hywfz5a44UIIedUuMureJv68+p4/tcEiJBsmMcsPlFm4zeT0BH7Au+5 5byaqBq9kMtuE+dPFE9s3z0nFAuo/Ph9FKd+gDkRqlOzQqlK4EGOWB5dp6K0ED98ghptwHQpC34 kM8ZsaYvMXDoU9GI2tawKVv6mSQmeANPvwFcp74wMy4H6CpbMuXzI9C8rW7nvKbSF0EAZMaHINz K9Rca3zMuoMHT7aIt/9puhvopwlTGgghprc6HYNexnUcHio75fUSwFOpTjvHwd46AflikD22mmr fsybxDxwzR9q1kJazfSsu24DiUd+fqrYdest+TNI9I9fzfHF/vIT4R2gpWoU5AVtzoeA6QBZMjI Dc5mudyueoE1pgX5fq9vPUy3cIx0LnBZmYgB+8p2PkYX2pg6X8Xk0MHatttNPPg01yFBlcyywr1 
pj9aMHrsplSYqiAatQCz6cin+eqpmSACJa36NjQks88Fvs2+EfGrtVG/wtCag88s0Vatnep1dh5 Ih3i2lAdTchPhcDE4htMl6cJnoIU2lOq6tthMthCTIrcXL3 X-Received: by 2002:a05:6820:4deb:b0:679:e889:dde1 with SMTP id 006d021491bc7-67bc8877e83mr927920eaf.6.1773215795085; Wed, 11 Mar 2026 00:56:35 -0700 (PDT) Received: from localhost.localdomain (108-212-132-20.lightspeed.irvnca.sbcglobal.net. [108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e6ae0e3sm1568938fac.16.2026.03.11.00.56.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Mar 2026 00:56:34 -0700 (PDT) From: Wesley Atwell To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com, dsahern@kernel.org, matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org, mptcp@lists.linux.dev Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com, linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, atwellwea@gmail.com Subject: [PATCH net 4/7] tcp: extend TCP_REPAIR_WINDOW with receive-window scaling snapshot Date: Wed, 11 Mar 2026 01:55:57 -0600 Message-Id: <20260311075600.948413-5-atwellwea@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com> References: <20260311075600.948413-1-atwellwea@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The paired receive-window state is now part of the live TCP socket semantics, so repair and restore need a way to preserve it. Extend TCP_REPAIR_WINDOW with the advertise-time scaling snapshot while keeping old userspace working. 
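For illustration only, the two option lengths this extension implies can be sketched from userspace as below. This is not part of the patch: struct repair_window_sketch is a local, hypothetical mirror of the extended UAPI struct (real code should use struct tcp_repair_window from <linux/tcp.h>), and repair_window_legacy_len() and repair_window_len_ok() are made-up helper names. The sketch assumes the five pre-existing fields are plain 32-bit values, so the legacy length is simply the offset of the new field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the extended layout, for illustration only. */
struct repair_window_sketch {
	uint32_t snd_wl1;
	uint32_t snd_wnd;
	uint32_t max_window;
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
	uint32_t rcv_wnd_scaling_ratio; /* 0 means advertise-time basis unknown */
};

/* Six 32-bit fields with no padding: the extended layout is 24 bytes. */
static_assert(sizeof(struct repair_window_sketch) == 24,
	      "unexpected padding in sketch struct");

/* The legacy layout ends exactly where the new field begins. */
static size_t repair_window_legacy_len(void)
{
	return offsetof(struct repair_window_sketch, rcv_wnd_scaling_ratio);
}

/* Accept exactly the legacy or the extended length, nothing in between. */
static bool repair_window_len_ok(size_t len)
{
	return len == repair_window_legacy_len() ||
	       len == sizeof(struct repair_window_sketch);
}
```

Under these assumptions a caller built against the old header would pass the 20-byte length and a caller built against the new header would pass 24; any other length fails the check.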
The kernel now accepts exactly the legacy layout and the extended layout. Legacy restore leaves the snapshot unknown so the socket falls back safely until a fresh local window advertisement refreshes the pair, while the extended layout restores the exact snapshot. Signed-off-by: Wesley Atwell --- include/uapi/linux/tcp.h | 1 + net/ipv4/tcp.c | 34 ++++++++++++++++++++++++++++------ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h index 03772dd4d399..3a799f4c0e1e 100644 --- a/include/uapi/linux/tcp.h +++ b/include/uapi/linux/tcp.h @@ -159,6 +159,7 @@ struct tcp_repair_window { =20 __u32 rcv_wnd; __u32 rcv_wup; + __u32 rcv_wnd_scaling_ratio; /* 0 means advertise-time basis unknown */ }; =20 enum { diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index cec9ae1bf875..dd2b4fe61bd8 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -3551,17 +3551,25 @@ static inline bool tcp_can_repair_sock(const struct= sock *sk) (sk->sk_state !=3D TCP_LISTEN); } =20 +/* Keep accepting the pre-extension TCP_REPAIR_WINDOW layout so legacy + * userspace can restore sockets without fabricating a snapshot basis. 
+ */ +static inline int tcp_repair_window_legacy_size(void) +{ + return offsetof(struct tcp_repair_window, rcv_wnd_scaling_ratio); +} + static int tcp_repair_set_window(struct tcp_sock *tp, sockptr_t optbuf, in= t len) { - struct tcp_repair_window opt; + struct tcp_repair_window opt =3D {}; =20 if (!tp->repair) return -EPERM; =20 - if (len !=3D sizeof(opt)) + if (len !=3D tcp_repair_window_legacy_size() && len !=3D sizeof(opt)) return -EINVAL; =20 - if (copy_from_sockptr(&opt, optbuf, sizeof(opt))) + if (copy_from_sockptr(&opt, optbuf, len)) return -EFAULT; =20 if (opt.max_window < opt.snd_wnd) @@ -3577,7 +3585,20 @@ static int tcp_repair_set_window(struct tcp_sock *tp= , sockptr_t optbuf, int len) tp->snd_wnd =3D opt.snd_wnd; tp->max_window =3D opt.max_window; =20 - tp->rcv_wnd =3D opt.rcv_wnd; + if (len =3D=3D tcp_repair_window_legacy_size()) { + /* Legacy repair UAPI has no advertise-time basis for tp->rcv_wnd. + * Mark the snapshot unknown until a fresh local advertisement + * re-establishes the pair. 
+ */ + tcp_set_rcv_wnd_unknown(tp, opt.rcv_wnd); + tp->rcv_wup =3D opt.rcv_wup; + return 0; + } + + if (opt.rcv_wnd_scaling_ratio > U8_MAX) + return -EINVAL; + + tcp_set_rcv_wnd_snapshot(tp, opt.rcv_wnd, opt.rcv_wnd_scaling_ratio); tp->rcv_wup =3D opt.rcv_wup; =20 return 0; @@ -4667,12 +4688,12 @@ int do_tcp_getsockopt(struct sock *sk, int level, break; =20 case TCP_REPAIR_WINDOW: { - struct tcp_repair_window opt; + struct tcp_repair_window opt =3D {}; =20 if (copy_from_sockptr(&len, optlen, sizeof(int))) return -EFAULT; =20 - if (len !=3D sizeof(opt)) + if (len !=3D tcp_repair_window_legacy_size() && len !=3D sizeof(opt)) return -EINVAL; =20 if (!tp->repair) @@ -4683,6 +4704,7 @@ int do_tcp_getsockopt(struct sock *sk, int level, opt.max_window =3D tp->max_window; opt.rcv_wnd =3D tp->rcv_wnd; opt.rcv_wup =3D tp->rcv_wup; + opt.rcv_wnd_scaling_ratio =3D tp->rcv_wnd_scaling_ratio; =20 if (copy_to_sockptr(optval, &opt, len)) return -EFAULT; --=20 2.34.1 From nobody Sun Mar 22 08:08:46 2026 Received: from mail-oa1-f52.google.com (mail-oa1-f52.google.com [209.85.160.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 50A003B7B91 for ; Wed, 11 Mar 2026 07:56:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.52 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215805; cv=none; b=gHYvBuMRJfQkC70Bdb5Urt4Tx/sx2NLLh2OmRbFzLrsYSP+3D3jbOkFa87OmVNj7hbrKUoN8UFu+in8tMrvFKPDEErGHNhpVvIIIp+BVKRnWLd/9VyRTFiQCRa1iIEZv5sf2P4sOVE3Ciu8NArpGRLx9w+10hPEfjgRan4rOGEU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215805; c=relaxed/simple; bh=9YrqMac2RvVbcQMvhIkvAs6CdxSyWCSfXy/rvRlKfmE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; 
b=IfvabFWXNXqHQsPsfmdyi+dxuhbFYVw8iLRWcZ9jWg1Z6U9jqo4ldLSnZpTkEcIANfDMJ5Ywe/2mqZhQq63D/n3U7YYqSZcWljCR052rXpMcbToOUD4mDOqrXkrkEK/rtzjzj4MbH+LEmKazvalo3bB+BoPGVO8IsTSBB5yjnOo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=bf4OdtH7; arc=none smtp.client-ip=209.85.160.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="bf4OdtH7" Received: by mail-oa1-f52.google.com with SMTP id 586e51a60fabf-40efc77933fso5703291fac.3 for ; Wed, 11 Mar 2026 00:56:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773215797; x=1773820597; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=DQ1OrwVz8Z3OJFcYGZCm4ACsYMr8Af3yv6w/TQO392I=; b=bf4OdtH7dYvZH0Pm5BCE0v6Kyxgrqcm9ifR02nbDzdUgjo4ncoo7rOA6dUpoXbNOjl Cf18YTTMviUOYlpP6pO4GkC9utn3BDvSRM18341vKYC+if5y93YQ89ulI/GZlVhEIB6I 3RamUOhKpNPdZSdRSvU2vlG8WmcwO1I7w5cF6qDNtfbNMdWux7C4uOrzuWX9CE0Ms6oz EsIgclWrrW399y8mn/r12uYJRC4B+2LkZRW3YmZs+B6OvsmEDFFOHblSC1kG0ETkG/AU BzCvlhugDNX1fI/4IROI5CgS3sv3hosqE80Lk9ZkPMPB/Uo1ROn16Qfz4CiMFrDCA2k+ BAOA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773215797; x=1773820597; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=DQ1OrwVz8Z3OJFcYGZCm4ACsYMr8Af3yv6w/TQO392I=; b=FilCGCKqIVwdy0Yp5C9bQgphPHDvJnmgtSdMt9TwZEUl4uB8W1+gftWroEE2qGttCo 
DFLJg1n+VgBDnyh/sOpF6Y7DIFdEBUAvKhBSMHQmf9jtIlZW27p8jdbdSPrIoYj47Qlv YnwZxE4c0mafuVj7hTY5b47iIsKXh70s5Cdi4LfZD+r+s0mkk3fbTanFTjGIOqkNkrtF fUruOMwkGfNz+duHVaU5+rwUrwyf/ErXTVgSQSmTD0iNuBLsbmgwPcAIXE4B1ha2MF9b odc9emiXPH1RI2+BhZiVrIxFjZ3y/BBEb0tFivVIXky/Cj3BOxZ4ndMJCQaGYcCdrRtB 8gsQ== X-Forwarded-Encrypted: i=1; AJvYcCVM0Iu1taEwsuzbZaJbZg6EdigIw/g4dEHZZZu8nZppTNSKqsSs4mtVMv9/04YnqIsuEHp0cyXvP1zRWN0=@vger.kernel.org X-Gm-Message-State: AOJu0Yy3twv0vZyJvb8u2oEz3wy0LkO88WDcpFz7q4UkjoF8IhIQfHJN lCyVJAjUausMHQkRbp/PXr/Oi7oEZrC8+fwOj6QZaE7i0q3HxRPcM4vN X-Gm-Gg: ATEYQzxIl8ZFakh6fS30CWXEf1OMFpFeVxqXTtIIJqWb4P4G5NotXOSUExX1dP4mbZP 3CawxDtDAtgWNriSb5kSL2s5UnO+Y46xv0sERSA5QX4k8S5VymrboNgMpOblriP1vcBvxG5Qg4z 2pcRC2VSKS1pmKMYls5Q0LK5ohVX1ygPZFa0FA/jpQ0tYuBRf1u4ylnIHcoSvM1HpMhdJp1Yg1A eZ9LXvE2PsywOuxblJObr3qFISsAVZk7C+PVMLIfRokuGXQd7+Xq5pkBaVGLwgYP4TuI6wcgtBT 5Df/BoGHf9xBB82gec58hKCy0CWdF+lmkTf73iezImruSxFy0JyYlraLvb5SInpBUD5AQ1x+SOn +tPJ4x2aFZej3QCDwavq6T9mU80BKemshNxjv2YA6mWLXMpSIuze/sqEPRMeZy4kRgZDirwi+RE 0kkRTzHkTNbbI5GHx/Q7B9AV0OTf5Oswqji+BIWF4ybtFmy35p1utO1AVWOfP1C/kY0uP+HhYE3 LXzjbG5alGvGNNi1pcrSLXiFQleDTTEP63GHVrqaLpiUkNE X-Received: by 2002:a05:6870:2e04:b0:409:7cfc:7392 with SMTP id 586e51a60fabf-4177c96b888mr1211062fac.42.1773215796875; Wed, 11 Mar 2026 00:56:36 -0700 (PDT) Received: from localhost.localdomain (108-212-132-20.lightspeed.irvnca.sbcglobal.net. 
[108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e6ae0e3sm1568938fac.16.2026.03.11.00.56.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Mar 2026 00:56:36 -0700 (PDT) From: Wesley Atwell To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com, dsahern@kernel.org, matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org, mptcp@lists.linux.dev Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com, linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, atwellwea@gmail.com Subject: [PATCH net 5/7] mptcp: refresh tcp rcv_wnd snapshot when syncing receive windows Date: Wed, 11 Mar 2026 01:55:58 -0600 Message-Id: <20260311075600.948413-6-atwellwea@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com> References: <20260311075600.948413-1-atwellwea@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" MPTCP rewrites the TCP shadow receive window on subflows when shared receive-window state changes. Once tp->rcv_wnd carries paired snapshot semantics, those subflow shadow updates have to refresh the snapshot too. Convert the MPTCP window-sync write sites to use the helper and keep the aggregate receive-space arithmetic using the explicit rwnd-availability helper. 
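As a rough standalone sketch of the clamp that keeps the shadow window in its u32 domain: the helper below adds a 64-bit growth delta to a 32-bit window and saturates at UINT32_MAX. rwnd_add_clamped() is a hypothetical name used only for this illustration; the patch itself expresses the same clamp with min_t(u64, ..., U32_MAX) before handing the value to tcp_set_rcv_wnd().

```c
#include <stdint.h>

/* Hypothetical helper: saturating add of a 64-bit delta to a 32-bit window.
 * Comparing against the remaining headroom avoids forming a sum that could
 * wrap, so the result is well defined for every delta value.
 */
static uint32_t rwnd_add_clamped(uint32_t rcv_wnd, uint64_t delta)
{
	uint32_t headroom = UINT32_MAX - rcv_wnd;

	if (delta > headroom)
		return UINT32_MAX;

	return rcv_wnd + (uint32_t)delta;
}
```

For example, growing a window of 1000 by 500 yields 1500, while any delta larger than the remaining headroom pins the result at UINT32_MAX instead of wrapping.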
Signed-off-by: Wesley Atwell --- net/mptcp/options.c | 12 ++++++++---- net/mptcp/protocol.h | 14 +++++++++++--- 2 files changed, 19 insertions(+), 7 deletions(-) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 43df4293f58b..6e6aa084cbfa 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -1073,9 +1073,12 @@ static void rwin_update(struct mptcp_sock *msk, stru= ct sock *ssk, return; =20 /* Some other subflow grew the mptcp-level rwin since rcv_wup, - * resync. + * resync. Keep the TCP shadow window in its advertised u32 domain + * and refresh the advertise-time scaling snapshot while doing so. */ - tp->rcv_wnd +=3D mptcp_rcv_wnd - subflow->rcv_wnd_sent; + tcp_set_rcv_wnd(tp, min_t(u64, (u64)tp->rcv_wnd + + (mptcp_rcv_wnd - subflow->rcv_wnd_sent), + U32_MAX)); subflow->rcv_wnd_sent =3D mptcp_rcv_wnd; } =20 @@ -1334,11 +1337,12 @@ static void mptcp_set_rwin(struct tcp_sock *tp, str= uct tcphdr *th) if (rcv_wnd_new !=3D rcv_wnd_old) { raise_win: /* The msk-level rcv wnd is after the tcp level one, - * sync the latter. + * sync the latter and refresh its advertise-time scaling + * snapshot. */ rcv_wnd_new =3D rcv_wnd_old; win =3D rcv_wnd_old - ack_seq; - tp->rcv_wnd =3D min_t(u64, win, U32_MAX); + tcp_set_rcv_wnd(tp, min_t(u64, win, U32_MAX)); new_win =3D tp->rcv_wnd; =20 /* Make sure we do not exceed the maximum possible diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 0bd1ee860316..4ea95c9c0c7a 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -408,11 +408,19 @@ static inline int mptcp_space_from_win(const struct s= ock *sk, int win) return __tcp_space_from_win(mptcp_sk(sk)->scaling_ratio, win); } =20 +/* MPTCP exposes window space from the mptcp-level receive queue, so it tr= acks + * a separate backlog counter from the subflow backlog embedded in struct = sock. 
+ */ +static inline int mptcp_rwnd_avail(const struct sock *sk) +{ + return READ_ONCE(sk->sk_rcvbuf) - + READ_ONCE(mptcp_sk(sk)->backlog_len) - + tcp_rmem_used(sk); +} + static inline int __mptcp_space(const struct sock *sk) { - return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - - READ_ONCE(mptcp_sk(sk)->backlog_len) - - sk_rmem_alloc_get(sk)); + return mptcp_win_from_space(sk, mptcp_rwnd_avail(sk)); } =20 static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *s= k) --=20 2.34.1 From nobody Sun Mar 22 08:08:46 2026 Received: from mail-oa1-f41.google.com (mail-oa1-f41.google.com [209.85.160.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F16583BC691 for ; Wed, 11 Mar 2026 07:56:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.160.41 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215805; cv=none; b=mAS3HSNga4jH8vmILTxFDSzVZWWNeSf0TH5MP8RIRv8nPFNgFdvrQ8xU855mm2rhuXkgDWLH7+k1GGnfOlGUTGkWY5DfQguIdB2rhEhoe+8nKd3eWAQ67cwsx2LEobqFPwzEE6TGbzlR6gHuShEdrgDZwzrL7T/KqjsNcJYQwIs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215805; c=relaxed/simple; bh=C939i0d+rKTXWj91cA/6uhlPcNX1BycjNPrjJIyRlB0=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=aMkhtfLwGpxZJ3wvh+AnxkEBvuMegPJKrMo1wdPHfLrb1BLVqGJx2sPq0jtaVlBql18I6qjrBNRJc7sAd2BOOo+8aP4xHLx7DseR0prdLSVX+eQyqytxs2g56Xf+O4BRsXxTtRUElvb4BdLpkAYMi2+F/PyHZvvAPnJlUwO8tQk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=Kh/S2Ut8; arc=none smtp.client-ip=209.85.160.41 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Kh/S2Ut8" Received: by mail-oa1-f41.google.com with SMTP id 586e51a60fabf-41729dc7d7aso1198011fac.3 for ; Wed, 11 Mar 2026 00:56:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773215799; x=1773820599; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=4epqsXquWuAivDGaqjYARTGwK/kOqEMKWTt83y67gTo=; b=Kh/S2Ut8Mz3IOrJyUF1u8U1hKxWbezXDQfvjBZZZSRIxPlXeO/7kz+0xvq1bMEiJAe lXzcI5udhKCFXJqU2fL1KoH6Jdn0FZiXxgwW39Y5VOznkbwDLALQDwc3P74kEVEYjYq+ yig8cAAHYZAQ9qcVilzLP9jNYCgVewxw+Yb2TwGYXDOSkqfDkRmwMpg3p/Tfa3pSEe9u g7bABRqWYHbWBfa3Hjar5WRpxYmkbhu4XEe3Rvhy/d2yHHE/pxgWgsqKVIWOOYvwEhnl AVOMSDCrZpk1GXQMspz5ABATP+Km1F5tyN8Czb/z1EperES+Pbim2ngZgi1MZtlBkUbW hTDQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773215799; x=1773820599; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=4epqsXquWuAivDGaqjYARTGwK/kOqEMKWTt83y67gTo=; b=hHcpLNY6j1UW7ocuRt03OoE9EmtUgwkg2PJHFWulsWvuaGrBq2WlxJWrHjJBSRHTNT 4atNlq+5ffqns0TA81ALzkxcNyHJOsQOFN+uYwggfSBOa+UHSyvG7jiEC5jG2S5nv2fh kg1Um04xyhlLUhOyHVfjSpw7bFF/AQy7cxK5ag2EvPDflMi7I3sPgI3F/N7k8jqMkypr jpEHwBUEJ88BB6+qX6aCw3kxY8Qg1CUy8n7DyMzer/NZrqQ2mA/VzWXqYDrviA1xIGZ0 +Eb4XjDcCFsii9rCSxcOAwVDT8awTM6wAyH0JXBvJMTp7Nom9afwXJTp3upbnrEtKxSN hzIA== X-Forwarded-Encrypted: i=1; AJvYcCX64kXe8Ew1igIKVG6lsVF5D4LF3XnDlRTisG0YjQcKZTP5Lt+ccMq/31Auz8drI0y3vKPeEA/3qr7w08U=@vger.kernel.org X-Gm-Message-State: AOJu0YzIdf60Rb4FLAbuY7bdQBYcqxOAErP2StPjzXPY6KKxeAAjZCZM KRqM0vrOVq9JJ2uvjf6U7Yeq3MyfqwCHHKLjd/KpWC02948UZwkWfyP8 X-Gm-Gg: 
ATEYQzxB8OVC04k4ck2gytrSOMPR4cpcv7avpyDPsI0HTyxdA+chYv6G3DSxKXdBE9v YHxLLftuFyB9s9n0M0jg0nOTbykZ1iwCcthT0F1tKNwmIranY4rFYyc1Lcy+slUC7PjuT50A85+ DTv0gW/jrEbg8iv4OKiTfBMd+4CkAxANOjYVe2Ac07rfO/8ataqhAd6PwCQ5St+M3BxXiIeMkfa ZltCBIdiqjANr/6IexOmJrW5bAFWu5KuXe5ulkM7Aw8lv1F/yoEdB5psaS4pOb/SIVeTP2eu2wn tR2grPl9FQc+TEZOicTQdvDBzZ9ndi2b3ayBV+nkIylgiosDZ30kFznYgaSY97+VEF9T0H1/F97 GEpguR9CJVDztC7Iy9xk0VIpxcEJixJywuctpY4zmqT3xs1AG6DNM6hX+Fx6KZSPfVyYcn5sZ1C dpZKx/v5PfClCv9oBBglz04rx3BigYGhfdMrw9+Mzid3o6P25BCd147SZoq2ztJLUiybLj72Brz 9I2YMRprP+vJf52pEv5NWkg0B9eVwsY+7PbM8iwrIUvfub4 X-Received: by 2002:a05:6870:8e07:b0:3e0:de76:31e5 with SMTP id 586e51a60fabf-4177c8b4e62mr1075613fac.25.1773215798847; Wed, 11 Mar 2026 00:56:38 -0700 (PDT) Received: from localhost.localdomain (108-212-132-20.lightspeed.irvnca.sbcglobal.net. [108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e6ae0e3sm1568938fac.16.2026.03.11.00.56.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Mar 2026 00:56:38 -0700 (PDT) From: Wesley Atwell To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com, dsahern@kernel.org, matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org, mptcp@lists.linux.dev Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com, linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, atwellwea@gmail.com Subject: [PATCH net 6/7] tcp: expose rmem and backlog accounting in rcvbuf_grow tracepoints Date: Wed, 11 Mar 2026 01:55:59 -0600 Message-Id: <20260311075600.948413-7-atwellwea@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com> References: <20260311075600.948413-1-atwellwea@gmail.com> Precedence: bulk 
X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The receive-window work now depends on keeping sender-visible rwnd and hard receive-memory accounting aligned. Expose the current rmem charge and backlog reservation in the TCP and MPTCP rcvbuf_grow tracepoints so that later drift between advertised window and local backing is visible during review and debugging. Signed-off-by: Wesley Atwell --- include/trace/events/mptcp.h | 11 +++++++---- include/trace/events/tcp.h | 12 +++++++----- 2 files changed, 14 insertions(+), 9 deletions(-) diff --git a/include/trace/events/mptcp.h b/include/trace/events/mptcp.h index 269d949b2025..167970e8e0a5 100644 --- a/include/trace/events/mptcp.h +++ b/include/trace/events/mptcp.h @@ -199,6 +199,8 @@ TRACE_EVENT(mptcp_rcvbuf_grow, __field(__u32, inq) __field(__u32, space) __field(__u32, ooo_space) + __field(__u32, rmem_alloc) + __field(__u32, backlog_len) __field(__u32, rcvbuf) __field(__u32, rcv_wnd) __field(__u8, scaling_ratio) @@ -228,6 +230,8 @@ TRACE_EVENT(mptcp_rcvbuf_grow, MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - msk->ack_seq; =20 + __entry->rmem_alloc =3D tcp_rmem_used(sk); + __entry->backlog_len =3D READ_ONCE(msk->backlog_len); __entry->rcvbuf =3D sk->sk_rcvbuf; __entry->rcv_wnd =3D atomic64_read(&msk->rcv_wnd_sent) - msk->ack_seq; @@ -248,12 +252,11 @@ TRACE_EVENT(mptcp_rcvbuf_grow, __entry->skaddr =3D sk; ), =20 - TP_printk("time=3D%u rtt_us=3D%u copied=3D%u inq=3D%u space=3D%u ooo=3D%u= scaling_ratio=3D%u " - "rcvbuf=3D%u rcv_wnd=3D%u family=3D%d sport=3D%hu dport=3D%hu saddr=3D= %pI4 " - "daddr=3D%pI4 saddrv6=3D%pI6c daddrv6=3D%pI6c skaddr=3D%p", + TP_printk("time=3D%u rtt_us=3D%u copied=3D%u inq=3D%u space=3D%u ooo=3D%u= scaling_ratio=3D%u rmem_alloc=3D%u backlog_len=3D%u rcvbuf=3D%u rcv_wnd=3D= %u family=3D%d sport=3D%hu dport=3D%hu saddr=3D%pI4 daddr=3D%pI4 saddrv6=3D= %pI6c 
daddrv6=3D%pI6c skaddr=3D%p", __entry->time, __entry->rtt_us, __entry->copied, __entry->inq, __entry->space, __entry->ooo_space, - __entry->scaling_ratio, __entry->rcvbuf, __entry->rcv_wnd, + __entry->scaling_ratio, __entry->rmem_alloc, + __entry->backlog_len, __entry->rcvbuf, __entry->rcv_wnd, __entry->family, __entry->sport, __entry->dport, __entry->saddr, __entry->daddr, __entry->saddr_v6, __entry->daddr_v6, __entry->skaddr) diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h index f155f95cdb6e..92d0bd6be0ba 100644 --- a/include/trace/events/tcp.h +++ b/include/trace/events/tcp.h @@ -217,6 +217,8 @@ TRACE_EVENT(tcp_rcvbuf_grow, __field(__u32, inq) __field(__u32, space) __field(__u32, ooo_space) + __field(__u32, rmem_alloc) + __field(__u32, backlog_len) __field(__u32, rcvbuf) __field(__u32, rcv_ssthresh) __field(__u32, window_clamp) @@ -247,6 +249,8 @@ TRACE_EVENT(tcp_rcvbuf_grow, TCP_SKB_CB(tp->ooo_last_skb)->end_seq - tp->rcv_nxt; =20 + __entry->rmem_alloc =3D tcp_rmem_used(sk); + __entry->backlog_len =3D READ_ONCE(sk->sk_backlog.len); __entry->rcvbuf =3D sk->sk_rcvbuf; __entry->rcv_ssthresh =3D tp->rcv_ssthresh; __entry->window_clamp =3D tp->window_clamp; @@ -269,13 +273,11 @@ TRACE_EVENT(tcp_rcvbuf_grow, __entry->sock_cookie =3D sock_gen_cookie(sk); ), =20 - TP_printk("time=3D%u rtt_us=3D%u copied=3D%u inq=3D%u space=3D%u ooo=3D%u= scaling_ratio=3D%u rcvbuf=3D%u " - "rcv_ssthresh=3D%u window_clamp=3D%u rcv_wnd=3D%u " - "family=3D%s sport=3D%hu dport=3D%hu saddr=3D%pI4 daddr=3D%pI4 " - "saddrv6=3D%pI6c daddrv6=3D%pI6c skaddr=3D%p sock_cookie=3D%llx", + TP_printk("time=3D%u rtt_us=3D%u copied=3D%u inq=3D%u space=3D%u ooo=3D%u= scaling_ratio=3D%u rmem_alloc=3D%u backlog_len=3D%u rcvbuf=3D%u rcv_ssthre= sh=3D%u window_clamp=3D%u rcv_wnd=3D%u family=3D%s sport=3D%hu dport=3D%hu = saddr=3D%pI4 daddr=3D%pI4 saddrv6=3D%pI6c daddrv6=3D%pI6c skaddr=3D%p sock_= cookie=3D%llx", __entry->time, __entry->rtt_us, __entry->copied, __entry->inq, 
__entry->space, __entry->ooo_space, - __entry->scaling_ratio, __entry->rcvbuf, + __entry->scaling_ratio, __entry->rmem_alloc, + __entry->backlog_len, __entry->rcvbuf, __entry->rcv_ssthresh, __entry->window_clamp, __entry->rcv_wnd, show_family_name(__entry->family), --=20 2.34.1 From nobody Sun Mar 22 08:08:46 2026 Received: from mail-oi1-f173.google.com (mail-oi1-f173.google.com [209.85.167.173]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1FEBC3BD238 for ; Wed, 11 Mar 2026 07:56:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.173 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215811; cv=none; b=gLwHr8b6MC1r1iHPngjqLsZHSW47Ar3k6XYaKeWwn8OR3lLFs1iFYroikcUOOeUYeDolzp6TgfB8RKhqNpfecMAw+OAibbedQ/qtdq3udvuF+SJwGtU7BW8oxNTBYglwg4EJMlkAV/65y7z0nDKzaF18nOnNRjvGQIN3Li99egA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1773215811; c=relaxed/simple; bh=kX5kLQG5J1Hlmj0ExCOTkwchnPLs0mj06XQzcCCourE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=CNM4M+F+EOiJfGQEuxN4xD0gMdS8eJmrZzmnpRjo1nkBaCnEv5DJ3IqVZhjkeHYx5cMi0rwa77d65dCdrk0J7AuEqGSKYDUafEQq+qGn52BSjMvWoU1O0uGzllIzfpaxqPhQbghJgM3jzP/al4J2CpGPYrssUn80HBglPAhGUfw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PZBHLDL3; arc=none smtp.client-ip=209.85.167.173 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PZBHLDL3" Received: by 
mail-oi1-f173.google.com with SMTP id 5614622812f47-4671cbce32bso597851b6e.3 for ; Wed, 11 Mar 2026 00:56:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773215801; x=1773820601; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Aqy84kuwqTRqlBZVM3vX2nv7L0enzcNB2UtOtxRx+lU=; b=PZBHLDL39zcZG6HXPIcAxfLZmW/1UBXziWefHTsZB0zSjmJpFQ/8jlQw55QEaGmEOk WHyrLakg4VHmAm/y8pIKlmIwugrrnvmIWT3+6BGJZtoC+ouu5gVllfQCyS72QKXw8/2a YE0n67s6gihcRahMXSXNVmChmPlauGN3d4rkF+pTirates39ekmIRdP8QsOgRPVRSukA +BeNnPYMGvq9fcM+9vG4VsQ9bXP/IZQDS5svDDiLoEg+7EHoo+DRmWGPXbENGMCSJtA/ EGt+V4GJAVYly5HVCKqU5WuWL494GdMkUBFQGXxsqP+HTh88dNnZqwPYnH7hULJqlG30 2ErA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773215801; x=1773820601; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=Aqy84kuwqTRqlBZVM3vX2nv7L0enzcNB2UtOtxRx+lU=; b=Z9AEsJrJnqDwSpgCm7v+MM09K0EkX7CYjmfD6GNFBTDmpMwMJBVPsv3C7x171qakZs NXLVdcjHNQGrsDA1E0yT8ZnpPvJ+MnodXnbRpDMX8jm0snglUzOVE989Dsaapg5WAZpF RVKHQ0O4wf1xPztDqxKgpy8MFKvP3B8smMbabHrErlzo4tp6l9kWVHBC0iLVIHSsUK3T tS2TxTga0EZC/PaMFQWbW3FZADjzZ8IbSN6UjOfQhMnqUYNdsxxrvp1XzSKSByuEaYXN lMs0Nb3wqxYm8G2vqd+IQfrzR4ivB/HIj4lpCD9iGHZViFIfsNZbTYrgxY9QhOuf4/ei P1qw== X-Forwarded-Encrypted: i=1; AJvYcCWDC5KI3T2aNYKvuNWmgRUVNU5/kHnFPe+Uu6JOdbO6hdFdXQ3vVSkA4epcmj9fNd4JjBK18S+8o2R8x2k=@vger.kernel.org X-Gm-Message-State: AOJu0YzxCY2Kz0+bpw6zJjIc3yJlQKTXoRCj4B00n8bN7bBAW7HMad67 YY/XjCqJGdncdWko9awZrYEGlmKOUcRkl9sivx21hBxwtaEp4fTuqxV9 X-Gm-Gg: ATEYQzyR7O9S1/tHzQhFnrybEeB8PYYB4W8phypFZmAoHWyUMW0yPMBVeLgwzqqmGVE Q7bbOSmDFW1j2VXM170TNqtXpWrElS7yEDdUK5qoluhsyq6j7I9oXLm+A9RyXSBPLcquSbz/1IR w4OJg8pVCi4DdpgjNLUc4MEIvnidfGc2SWFaQP8eMebkjvRgSnC+DLntkS3vFsx2QlWqr/H7nWu 
ZJaVEncAP7MgiDeY74LakGbdB5nfbCGvR5fMDM+XcX73V9JgF9kxhzLswypECvJ5HJM7/8DScq7 Yg0125Jp7JS1BzLQ8RkJaciKmcwiCH/N6dPUiEMKgt+ILbRxg7tJ02sUJkoXEr69hy+ij035u+L MTsfbawiZ4kTgKW+tu5ayH5R/Ipb5zUnF+CFjnM4Xhqk8/DmkJ7pOMqGflMNFbeCIY0JuAvlLOq eM3kYIgLqlWSjvBt0wHDyk2TU9mPo3yR1ytrK+oKDky0/wfKXkK0M3KF2H/ukSFvXwbQYfHEWG5 YMF7WGKRgM+Enh1NESFvw6pY3AF1E4yxvCptq9GvJ2UkWud X-Received: by 2002:a05:6808:f8a:b0:450:4782:2b0e with SMTP id 5614622812f47-4673346f783mr849508b6e.15.1773215800832; Wed, 11 Mar 2026 00:56:40 -0700 (PDT) Received: from localhost.localdomain (108-212-132-20.lightspeed.irvnca.sbcglobal.net. [108.212.132.20]) by smtp.gmail.com with ESMTPSA id 586e51a60fabf-4177e6ae0e3sm1568938fac.16.2026.03.11.00.56.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Mar 2026 00:56:40 -0700 (PDT) From: Wesley Atwell To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com, ncardwell@google.com, dsahern@kernel.org, matttbe@kernel.org, martineau@kernel.org, netdev@vger.kernel.org, mptcp@lists.linux.dev Cc: kuniyu@google.com, horms@kernel.org, geliang@kernel.org, corbet@lwn.net, skhan@linuxfoundation.org, rostedt@goodmis.org, mhiramat@kernel.org, mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com, linux-doc@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, atwellwea@gmail.com Subject: [PATCH net 7/7] selftests: tcp_ao: cover legacy and extended TCP_REPAIR_WINDOW layouts Date: Wed, 11 Mar 2026 01:56:00 -0600 Message-Id: <20260311075600.948413-8-atwellwea@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260311075600.948413-1-atwellwea@gmail.com> References: <20260311075600.948413-1-atwellwea@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Extend the repair helpers and 
selftests so the ABI contract is pinned down in-tree. The TCP-AO restore coverage now exercises both the exact and legacy TCP_REPAIR_WINDOW layouts, verifies that intermediate lengths are rejected, and keeps the packetdrill coverage for the advertised-window receive-memory regressions in the same net selftest series. Signed-off-by: Wesley Atwell --- .../net/packetdrill/tcp_rcv_toobig.pkt | 35 +++++++ .../packetdrill/tcp_rcv_toobig_default.pkt | 97 +++++++++++++++++++ .../testing/selftests/net/tcp_ao/lib/aolib.h | 56 +++++++++-- .../testing/selftests/net/tcp_ao/lib/repair.c | 18 ++-- .../selftests/net/tcp_ao/self-connect.c | 61 ++++++++++-- 5 files changed, 244 insertions(+), 23 deletions(-) create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.= pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_= default.pkt diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.pkt b/t= ools/testing/selftests/net/packetdrill/tcp_rcv_toobig.pkt new file mode 100644 index 000000000000..723c739ddc32 --- /dev/null +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig.pkt @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0 + +--mss=3D1000 + +`./defaults.sh` + + 0 `nstat -n` + +// Establish a connection. + +0 socket(..., SOCK_STREAM, IPPROTO_TCP) =3D 3 + +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) =3D 0 + +0 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [20000], 4) =3D 0 + +0 bind(3, ..., ...) =3D 0 + +0 listen(3, 1) =3D 0 + + +0 < S 0:0(0) win 32792 + +0 > S. 0:0(0) ack 1 win 18980 + +.1 < . 1:1(0) ack 1 win 257 + + +0 accept(3, ..., ...) =3D 4 + + +0 < P. 1:20001(20000) ack 1 win 257 + +.04 > . 1:1(0) ack 20001 win 18000 + + +0 setsockopt(4, SOL_SOCKET, SO_RCVBUF, [12000], 4) =3D 0 + +0 < P. 20001:80001(60000) ack 1 win 257 + +0 > . 
1:1(0) ack 20001 win 18000 + + +0 read(4, ..., 20000) =3D 20000 + +// A too big packet is accepted if the receive queue is empty, but the +// stronger admission path must not zero the receive buffer while doing so. + +0 < P. 20001:80001(60000) ack 1 win 257 + +0 > . 1:1(0) ack 80001 win 0 + +0 %{ assert SK_MEMINFO_RCVBUF > 0, SK_MEMINFO_RCVBUF }% diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default= .pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default.pkt new file mode 100644 index 000000000000..b2e4950e0b83 --- /dev/null +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_toobig_default.pkt @@ -0,0 +1,97 @@ +// SPDX-License-Identifier: GPL-2.0 + +--mss=3D1000 + +`./defaults.sh +sysctl -q net.ipv4.tcp_moderate_rcvbuf=3D0` + +// Establish a connection on the default receive buffer. Leave a large skb= in +// the queue, then deliver another one which still fits the remaining rwnd. +// We should grow sk_rcvbuf to honor the already-advertised window instead= of +// dropping the packet. + +0 socket(..., SOCK_STREAM, IPPROTO_TCP) =3D 3 + +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) =3D 0 + +0 bind(3, ..., ...) =3D 0 + +0 listen(3, 1) =3D 0 + + +0 < S 0:0(0) win 65535 + +0 > S. 0:0(0) ack 1 <...> + +.1 < . 1:1(0) ack 1 win 257 + + +0 accept(3, ..., ...) =3D 4 + +// Exchange enough data to get past the completely fresh-socket case while +// still keeping the receive buffer at its 128kB default. + +0 < P. 1:65001(65000) ack 1 win 257 + * > . 1:1(0) ack 65001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 65001:130001(65000) ack 1 win 257 + * > . 1:1(0) ack 130001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 130001:195001(65000) ack 1 win 257 + * > . 1:1(0) ack 195001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 195001:260001(65000) ack 1 win 257 + * > . 1:1(0) ack 260001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 260001:325001(65000) ack 1 win 257 + * > . 1:1(0) ack 325001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 
325001:390001(65000) ack 1 win 257 + * > . 1:1(0) ack 390001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 390001:455001(65000) ack 1 win 257 + * > . 1:1(0) ack 455001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 455001:520001(65000) ack 1 win 257 + * > . 1:1(0) ack 520001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 520001:585001(65000) ack 1 win 257 + * > . 1:1(0) ack 585001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 585001:650001(65000) ack 1 win 257 + * > . 1:1(0) ack 650001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 650001:715001(65000) ack 1 win 257 + * > . 1:1(0) ack 715001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 715001:780001(65000) ack 1 win 257 + * > . 1:1(0) ack 780001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 780001:845001(65000) ack 1 win 257 + * > . 1:1(0) ack 845001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 845001:910001(65000) ack 1 win 257 + * > . 1:1(0) ack 910001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 910001:975001(65000) ack 1 win 257 + * > . 1:1(0) ack 975001 + +0 read(4, ..., 65000) =3D 65000 + + +0 < P. 975001:1040001(65000) ack 1 win 257 + * > . 1:1(0) ack 1040001 + +0 read(4, ..., 65000) =3D 65000 + +// Leave about 60kB queued, then accept another large skb which still fits +// the rwnd we already exposed to the peer. The regression is the drop; the +// exact sk_rcvbuf growth path is an implementation detail. + +0 < P. 1040001:1102001(62000) ack 1 win 257 + * > . 1:1(0) ack 1102001 + + +0 < P. 1102001:1167001(65000) ack 1 win 257 + * > . 
1:1(0) ack 1167001 + +0 read(4, ..., 127000) =3D 127000 diff --git a/tools/testing/selftests/net/tcp_ao/lib/aolib.h b/tools/testing= /selftests/net/tcp_ao/lib/aolib.h index ebb2899c12fe..ff259795a4a0 100644 --- a/tools/testing/selftests/net/tcp_ao/lib/aolib.h +++ b/tools/testing/selftests/net/tcp_ao/lib/aolib.h @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -671,17 +672,42 @@ struct tcp_sock_state { int timestamp; }; =20 -extern void __test_sock_checkpoint(int sk, struct tcp_sock_state *state, - void *addr, size_t addr_size); +/* Legacy userspace stops before the snapshot field and therefore exercises + * the kernel's unknown-snapshot fallback path. + */ +static inline socklen_t test_tcp_repair_window_legacy_size(void) +{ + return offsetof(struct tcp_repair_window, rcv_wnd_scaling_ratio); +} + +static inline socklen_t test_tcp_repair_window_exact_size(void) +{ + return sizeof(struct tcp_repair_window); +} + +void __test_sock_checkpoint_opt(int sk, struct tcp_sock_state *state, + socklen_t trw_len, + void *addr, size_t addr_size); static inline void test_sock_checkpoint(int sk, struct tcp_sock_state *sta= te, sockaddr_af *saddr) { - __test_sock_checkpoint(sk, state, saddr, sizeof(*saddr)); + __test_sock_checkpoint_opt(sk, state, test_tcp_repair_window_exact_size(), + saddr, sizeof(*saddr)); +} + +static inline void test_sock_checkpoint_legacy(int sk, + struct tcp_sock_state *state, + sockaddr_af *saddr) +{ + __test_sock_checkpoint_opt(sk, state, test_tcp_repair_window_legacy_size(= ), + saddr, sizeof(*saddr)); } extern void test_ao_checkpoint(int sk, struct tcp_ao_repair *state); -extern void __test_sock_restore(int sk, const char *device, - struct tcp_sock_state *state, - void *saddr, void *daddr, size_t addr_size); +void __test_sock_restore_opt(int sk, const char *device, + struct tcp_sock_state *state, + socklen_t trw_len, + void *saddr, void *daddr, + size_t addr_size); static inline void test_sock_restore(int sk, struct 
tcp_sock_state *state, sockaddr_af *saddr, const union tcp_addr daddr, @@ -690,7 +716,23 @@ static inline void test_sock_restore(int sk, struct tc= p_sock_state *state, sockaddr_af addr; =20 tcp_addr_to_sockaddr_in(&addr, &daddr, htons(dport)); - __test_sock_restore(sk, veth_name, state, saddr, &addr, sizeof(addr)); + __test_sock_restore_opt(sk, veth_name, state, + test_tcp_repair_window_exact_size(), + saddr, &addr, sizeof(addr)); +} + +static inline void test_sock_restore_legacy(int sk, + struct tcp_sock_state *state, + sockaddr_af *saddr, + const union tcp_addr daddr, + unsigned int dport) +{ + sockaddr_af addr; + + tcp_addr_to_sockaddr_in(&addr, &daddr, htons(dport)); + __test_sock_restore_opt(sk, veth_name, state, + test_tcp_repair_window_legacy_size(), + saddr, &addr, sizeof(addr)); } extern void test_ao_restore(int sk, struct tcp_ao_repair *state); extern void test_sock_state_free(struct tcp_sock_state *state); diff --git a/tools/testing/selftests/net/tcp_ao/lib/repair.c b/tools/testin= g/selftests/net/tcp_ao/lib/repair.c index 9893b3ba69f5..befbd0f72db5 100644 --- a/tools/testing/selftests/net/tcp_ao/lib/repair.c +++ b/tools/testing/selftests/net/tcp_ao/lib/repair.c @@ -66,8 +66,9 @@ static void test_sock_checkpoint_queue(int sk, int queue,= int qlen, test_error("recv(%d): %d", qlen, ret); } =20 -void __test_sock_checkpoint(int sk, struct tcp_sock_state *state, - void *addr, size_t addr_size) +void __test_sock_checkpoint_opt(int sk, struct tcp_sock_state *state, + socklen_t trw_len, + void *addr, size_t addr_size) { socklen_t len =3D sizeof(state->info); int ret; @@ -82,9 +83,9 @@ void __test_sock_checkpoint(int sk, struct tcp_sock_state= *state, if (getsockname(sk, addr, &len) || len !=3D addr_size) test_error("getsockname(): %d", (int)len); =20 - len =3D sizeof(state->trw); + len =3D trw_len; ret =3D getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &state->trw, &len); - if (ret || len !=3D sizeof(state->trw)) + if (ret || len !=3D trw_len) 
test_error("getsockopt(TCP_REPAIR_WINDOW): %d", (int)len); =20 if (ioctl(sk, SIOCOUTQ, &state->outq_len)) @@ -160,9 +161,10 @@ static void test_sock_restore_queue(int sk, int queue,= void *buf, int len) } while (len > 0); } =20 -void __test_sock_restore(int sk, const char *device, - struct tcp_sock_state *state, - void *saddr, void *daddr, size_t addr_size) +void __test_sock_restore_opt(int sk, const char *device, + struct tcp_sock_state *state, + socklen_t trw_len, + void *saddr, void *daddr, size_t addr_size) { struct tcp_repair_opt opts[4]; unsigned int opt_nr =3D 0; @@ -215,7 +217,7 @@ void __test_sock_restore(int sk, const char *device, } test_sock_restore_queue(sk, TCP_RECV_QUEUE, state->in.buf, state->inq_len= ); test_sock_restore_queue(sk, TCP_SEND_QUEUE, state->out.buf, state->outq_l= en); - if (setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &state->trw, sizeof(state-= >trw))) + if (setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &state->trw, trw_len)) test_error("setsockopt(TCP_REPAIR_WINDOW)"); } =20 diff --git a/tools/testing/selftests/net/tcp_ao/self-connect.c b/tools/test= ing/selftests/net/tcp_ao/self-connect.c index 2c73bea698a6..a7edd72ab28d 100644 --- a/tools/testing/selftests/net/tcp_ao/self-connect.c +++ b/tools/testing/selftests/net/tcp_ao/self-connect.c @@ -4,6 +4,7 @@ #include "aolib.h" =20 static union tcp_addr local_addr; +static bool checked_repair_window_lens; =20 static void __setup_lo_intf(const char *lo_intf, const char *addr_str, uint8_t prefix) @@ -30,8 +31,40 @@ static void setup_lo_intf(const char *lo_intf) #endif } =20 +/* The repair ABI accepts exactly the legacy and extended layouts. 
*/ +static void test_repair_window_len_contract(int sk) +{ + struct tcp_repair_window trw =3D {}; + socklen_t len =3D test_tcp_repair_window_exact_size(); + socklen_t bad_len =3D test_tcp_repair_window_legacy_size() + 1; + int ret; + + if (checked_repair_window_lens) + return; + + checked_repair_window_lens =3D true; + + ret =3D getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len); + if (ret || len !=3D test_tcp_repair_window_exact_size()) + test_error("getsockopt(TCP_REPAIR_WINDOW): %d", (int)len); + + len =3D bad_len; + ret =3D getsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, &len); + if (ret =3D=3D 0 || errno !=3D EINVAL) + test_fail("repair-window get rejects invalid len"); + else + test_ok("repair-window get rejects invalid len"); + + ret =3D setsockopt(sk, SOL_TCP, TCP_REPAIR_WINDOW, &trw, bad_len); + if (ret =3D=3D 0 || errno !=3D EINVAL) + test_fail("repair-window set rejects invalid len"); + else + test_ok("repair-window set rejects invalid len"); +} + static void tcp_self_connect(const char *tst, unsigned int port, - bool different_keyids, bool check_restore) + bool different_keyids, bool check_restore, + bool legacy_repair_window) { struct tcp_counters before, after; uint64_t before_aogood, after_aogood; @@ -109,7 +142,11 @@ static void tcp_self_connect(const char *tst, unsigned= int port, } =20 test_enable_repair(sk); - test_sock_checkpoint(sk, &img, &addr); + test_repair_window_len_contract(sk); + if (legacy_repair_window) + test_sock_checkpoint_legacy(sk, &img, &addr); + else + test_sock_checkpoint(sk, &img, &addr); #ifdef IPV6_TEST addr.sin6_port =3D htons(port + 1); #else @@ -123,7 +160,11 @@ static void tcp_self_connect(const char *tst, unsigned= int port, test_error("socket()"); =20 test_enable_repair(sk); - __test_sock_restore(sk, "lo", &img, &addr, &addr, sizeof(addr)); + __test_sock_restore_opt(sk, "lo", &img, + legacy_repair_window ? 
+ test_tcp_repair_window_legacy_size() : + test_tcp_repair_window_exact_size(), + &addr, &addr, sizeof(addr)); if (different_keyids) { if (test_add_repaired_key(sk, DEFAULT_TEST_PASSWORD, 0, local_addr, -1, 7, 5)) @@ -165,20 +206,24 @@ static void *client_fn(void *arg) =20 setup_lo_intf("lo"); =20 - tcp_self_connect("self-connect(same keyids)", port++, false, false); + tcp_self_connect("self-connect(same keyids)", port++, false, false, false= ); =20 /* expecting rnext to change based on the first segment RNext !=3D Curren= t */ trace_ao_event_expect(TCP_AO_RNEXT_REQUEST, local_addr, local_addr, port, port, 0, -1, -1, -1, -1, -1, 7, 5, -1); - tcp_self_connect("self-connect(different keyids)", port++, true, false); - tcp_self_connect("self-connect(restore)", port, false, true); + tcp_self_connect("self-connect(different keyids)", port++, true, false, f= alse); + tcp_self_connect("self-connect(restore)", port, false, true, false); + port +=3D 2; /* restore test restores over different port */ + tcp_self_connect("self-connect(restore, legacy repair window)", + port, false, true, true); port +=3D 2; /* restore test restores over different port */ trace_ao_event_expect(TCP_AO_RNEXT_REQUEST, local_addr, local_addr, port, port, 0, -1, -1, -1, -1, -1, 7, 5, -1); /* intentionally on restore they are added to the socket in different ord= er */ trace_ao_event_expect(TCP_AO_RNEXT_REQUEST, local_addr, local_addr, port + 1, port + 1, 0, -1, -1, -1, -1, -1, 5, 7, -1); - tcp_self_connect("self-connect(restore, different keyids)", port, true, t= rue); + tcp_self_connect("self-connect(restore, different keyids)", + port, true, true, false); port +=3D 2; /* restore test restores over different port */ =20 return NULL; @@ -186,6 +231,6 @@ static void *client_fn(void *arg) =20 int main(int argc, char *argv[]) { - test_init(5, client_fn, NULL); + test_init(8, client_fn, NULL); return 0; } --=20 2.34.1
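
For readers following the length contract this patch tests, here is a minimal standalone sketch; it is not part of the patch. It assumes the extended layout appends one field, rcv_wnd_scaling_ratio, after the five existing 32-bit members of struct tcp_repair_window. The struct demo_repair_window, the demo_* helper names, and the 32-bit type chosen for the new field are hypothetical, since the uapi change itself is not shown in this message. The sketch only shows how offsetof() yields the legacy length and why a length such as legacy + 1 falls between the two accepted sizes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the extended layout. The first five
 * members match the existing uapi struct tcp_repair_window; the width of
 * the trailing field is an assumption made for this illustration only.
 */
struct demo_repair_window {
	uint32_t snd_wl1;
	uint32_t snd_wnd;
	uint32_t max_window;
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
	uint32_t rcv_wnd_scaling_ratio;	/* assumed type */
};

/* Legacy callers stop right before the new trailing field. */
static inline size_t demo_legacy_size(void)
{
	return offsetof(struct demo_repair_window, rcv_wnd_scaling_ratio);
}

/* Extended callers pass the whole structure. */
static inline size_t demo_exact_size(void)
{
	return sizeof(struct demo_repair_window);
}

/*
 * Illustrative check only: accept exactly the two known layouts and
 * nothing in between. The real kernel-side validation is not part of
 * this message.
 */
static inline bool demo_len_is_accepted(size_t len)
{
	return len == demo_legacy_size() || len == demo_exact_size();
}
```

Under these assumptions, demo_legacy_size() + 1 is the kind of intermediate length that test_repair_window_len_contract() expects to fail with EINVAL on both getsockopt() and setsockopt().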