From nobody Sun Mar 22 09:41:17 2026
From: atwellwea@gmail.com
To: netdev@vger.kernel.org, davem@davemloft.net, kuba@kernel.org,
	pabeni@redhat.com, edumazet@google.com, ncardwell@google.com
Cc: linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, mptcp@lists.linux.dev,
	dsahern@kernel.org, horms@kernel.org, kuniyu@google.com,
	andrew+netdev@lunn.ch, willemdebruijn.kernel@gmail.com,
	jasowang@redhat.com, skhan@linuxfoundation.org, corbet@lwn.net,
	matttbe@kernel.org, martineau@kernel.org, geliang@kernel.org,
	rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, 0x7f454c46@gmail.com
Subject: [PATCH net-next v2 08/14] tcp: extend TCP_REPAIR_WINDOW for live and max-window snapshots
Date: Sat, 14 Mar 2026 14:13:42 -0600
Message-ID: <20260314201348.1786972-9-atwellwea@gmail.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260314201348.1786972-1-atwellwea@gmail.com>
References: <20260314201348.1786972-1-atwellwea@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Wesley Atwell

Extend TCP_REPAIR_WINDOW so repair and restore can round-trip both the
live rwnd snapshot and the remembered maximum sender-visible window.

Keep the ABI append-only by accepting the legacy and v1 prefix lengths
on both get and set, rebuilding any missing max-window state from the
live window when older userspace restores a socket.

Signed-off-by: Wesley Atwell
---
 include/net/tcp.h        | 13 +++----
 include/uapi/linux/tcp.h |  8 +++++
 net/ipv4/tcp.c           | 73 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 5b479ad44f89..12e62fea2aaf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1766,13 +1766,14 @@ static inline bool tcp_space_from_wnd_snapshot(u8 scaling_ratio, int win,
 }
 
 /* Rebuild hard receive-memory units for data already covered by tp->rcv_wnd if
- * the advertise-time basis is known.
+ * the advertise-time basis is known. Legacy TCP_REPAIR restores can only
+ * recover tp->rcv_wnd itself; callers must fall back when the snapshot is
+ * unknown.
  */
 static inline bool tcp_space_from_rcv_wnd(const struct tcp_sock *tp, int win,
 					  int *space)
 {
-	return tcp_space_from_wnd_snapshot(tp->rcv_wnd_scaling_ratio, win,
-					   space);
+	return tcp_space_from_wnd_snapshot(tp->rcv_wnd_scaling_ratio, win, space);
 }
 
 /* Same as tcp_space_from_rcv_wnd(), but for the remembered maximum
@@ -1800,9 +1801,9 @@ static inline void tcp_scaling_ratio_init(struct sock *sk)
 }
 
 /* tp->rcv_wnd is paired with the scaling_ratio that was in force when that
- * window was last advertised. Callers can leave a zero snapshot when the
- * advertise-time basis is unknown and refresh the pair on the next local
- * window update.
+ * window was last advertised. Legacy TCP_REPAIR restores can only recover the
+ * window value itself and use a zero snapshot until a fresh local window
+ * advertisement refreshes the pair.
  */
 static inline void tcp_set_rcv_wnd_snapshot(struct tcp_sock *tp, u32 win,
 					    u8 scaling_ratio)
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 03772dd4d399..564a77f69130 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -152,6 +152,11 @@ struct tcp_repair_opt {
 	__u32 opt_val;
 };
 
+/* Append-only repair ABI.
+ * Older userspace may stop at rcv_wup or rcv_wnd_scaling_ratio.
+ * The kernel accepts those prefix lengths and rebuilds any missing
+ * receive-window snapshot state on restore.
+ */
 struct tcp_repair_window {
 	__u32 snd_wl1;
 	__u32 snd_wnd;
@@ -159,6 +164,9 @@ struct tcp_repair_window {
 
 	__u32 rcv_wnd;
 	__u32 rcv_wup;
+	__u32 rcv_wnd_scaling_ratio; /* 0 means live-window basis unknown */
+	__u32 rcv_mwnd_seq;
+	__u32 rcv_mwnd_scaling_ratio; /* 0 means max-window basis unknown */
 };
 
 enum {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 66706dbb90f5..39a1265876ea 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3533,17 +3533,31 @@ static inline bool tcp_can_repair_sock(const struct sock *sk)
 		(sk->sk_state != TCP_LISTEN);
 }
 
+/* Keep accepting the pre-extension TCP_REPAIR_WINDOW layout so legacy
+ * userspace can restore sockets without fabricating a snapshot basis.
+ */
+static inline int tcp_repair_window_legacy_size(void)
+{
+	return offsetof(struct tcp_repair_window, rcv_wnd_scaling_ratio);
+}
+
+static inline int tcp_repair_window_v1_size(void)
+{
+	return offsetof(struct tcp_repair_window, rcv_mwnd_seq);
+}
+
 static int tcp_repair_set_window(struct tcp_sock *tp, sockptr_t optbuf, int len)
 {
-	struct tcp_repair_window opt;
+	struct tcp_repair_window opt = {};
 
 	if (!tp->repair)
 		return -EPERM;
 
-	if (len != sizeof(opt))
+	if (len != tcp_repair_window_legacy_size() &&
+	    len != tcp_repair_window_v1_size() && len != sizeof(opt))
 		return -EINVAL;
 
-	if (copy_from_sockptr(&opt, optbuf, sizeof(opt)))
+	if (copy_from_sockptr(&opt, optbuf, len))
 		return -EFAULT;
 
 	if (opt.max_window < opt.snd_wnd)
@@ -3559,9 +3573,47 @@ static int tcp_repair_set_window(struct tcp_sock *tp, sockptr_t optbuf, int len)
 	tp->snd_wnd = opt.snd_wnd;
 	tp->max_window = opt.max_window;
 
-	tp->rcv_wnd = opt.rcv_wnd;
+	if (len == tcp_repair_window_legacy_size()) {
+		/* Legacy repair UAPI has no advertise-time basis for tp->rcv_wnd.
+		 * Mark the snapshot unknown until a fresh local advertisement
+		 * re-establishes the pair.
+		 */
+		tcp_set_rcv_wnd_unknown(tp, opt.rcv_wnd);
+		tp->rcv_wup = opt.rcv_wup;
+		tcp_init_max_rcv_wnd_seq(tp);
+		return 0;
+	}
+
+	if (opt.rcv_wnd_scaling_ratio > U8_MAX)
+		return -EINVAL;
+
+	tcp_set_rcv_wnd_snapshot(tp, opt.rcv_wnd, opt.rcv_wnd_scaling_ratio);
 	tp->rcv_wup = opt.rcv_wup;
-	tp->rcv_mwnd_seq = opt.rcv_wup + opt.rcv_wnd;
+
+	if (len == tcp_repair_window_v1_size()) {
+		/* v1 repair can restore the live-window snapshot, but not a
+		 * retracted max-window snapshot. Rebuild it from the live pair
+		 * until a fresh local advertisement updates it again.
+		 */
+		tcp_init_max_rcv_wnd_seq(tp);
+		return 0;
+	}
+
+	if (opt.rcv_mwnd_scaling_ratio > U8_MAX)
+		return -EINVAL;
+
+	/* Userspace may repair sequence-space values after checkpoint without
+	 * also rebasing the remembered max advertised right edge. If the exact
+	 * snapshot no longer covers the restored live window, treat it like
+	 * v1 and rebuild the max-window side from the live pair.
+	 */
+	if (after(opt.rcv_wup + opt.rcv_wnd, opt.rcv_mwnd_seq)) {
+		tcp_init_max_rcv_wnd_seq(tp);
+		return 0;
+	}
+
+	tp->rcv_mwnd_seq = opt.rcv_mwnd_seq;
+	tp->rcv_mwnd_scaling_ratio = opt.rcv_mwnd_scaling_ratio;
 
 	return 0;
 }
@@ -4650,12 +4702,16 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 		break;
 
 	case TCP_REPAIR_WINDOW: {
-		struct tcp_repair_window opt;
+		struct tcp_repair_window opt = {};
 
 		if (copy_from_sockptr(&len, optlen, sizeof(int)))
 			return -EFAULT;
 
-		if (len != sizeof(opt))
+		/* Mirror the accepted set-side prefix lengths so checkpoint
+		 * tools can round-trip exactly the layout version they know.
+		 */
+		if (len != tcp_repair_window_legacy_size() &&
+		    len != tcp_repair_window_v1_size() && len != sizeof(opt))
 			return -EINVAL;
 
 		if (!tp->repair)
@@ -4666,6 +4722,9 @@ int do_tcp_getsockopt(struct sock *sk, int level,
 		opt.max_window = tp->max_window;
 		opt.rcv_wnd = tp->rcv_wnd;
 		opt.rcv_wup = tp->rcv_wup;
+		opt.rcv_wnd_scaling_ratio = tp->rcv_wnd_scaling_ratio;
+		opt.rcv_mwnd_seq = tp->rcv_mwnd_seq;
+		opt.rcv_mwnd_scaling_ratio = tp->rcv_mwnd_scaling_ratio;
 
 		if (copy_to_sockptr(optval, &opt, len))
 			return -EFAULT;
-- 
2.43.0
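
Illustration only, not part of the patch: a minimal userspace sketch of
how a checkpoint tool might reason about the three optlen values the
patch accepts, and about the right-edge check on the set side. The
mirror struct (repair_window_ext), the enum, the RW_*_LEN macros and
the helpers rw_classify_len() and rw_max_snapshot_usable() are
hypothetical names introduced here; none of them exist in the kernel
or in any tool. The field order is assumed to match struct
tcp_repair_window as extended by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the extended struct tcp_repair_window.
 * A real tool would use <linux/tcp.h> once the new fields are exported.
 */
struct repair_window_ext {
	uint32_t snd_wl1;
	uint32_t snd_wnd;
	uint32_t max_window;
	uint32_t rcv_wnd;
	uint32_t rcv_wup;
	uint32_t rcv_wnd_scaling_ratio;  /* 0 means live-window basis unknown */
	uint32_t rcv_mwnd_seq;
	uint32_t rcv_mwnd_scaling_ratio; /* 0 means max-window basis unknown */
};

/* The three prefix lengths, derived the same way the patch derives them. */
#define RW_LEGACY_LEN offsetof(struct repair_window_ext, rcv_wnd_scaling_ratio)
#define RW_V1_LEN     offsetof(struct repair_window_ext, rcv_mwnd_seq)
#define RW_FULL_LEN   sizeof(struct repair_window_ext)

/* Eight __u32 fields with no padding: 20, 24 and 32 bytes. */
static_assert(RW_LEGACY_LEN == 20, "legacy prefix ends after rcv_wup");
static_assert(RW_V1_LEN == 24, "v1 prefix adds rcv_wnd_scaling_ratio");
static_assert(RW_FULL_LEN == 32, "full layout adds the max-window pair");

enum repair_window_layout { RW_INVALID, RW_LEGACY, RW_V1, RW_FULL };

/* Classify an optlen; any other length is what the patch rejects
 * with -EINVAL.
 */
static enum repair_window_layout rw_classify_len(size_t len)
{
	if (len == RW_LEGACY_LEN)
		return RW_LEGACY;
	if (len == RW_V1_LEN)
		return RW_V1;
	if (len == RW_FULL_LEN)
		return RW_FULL;
	return RW_INVALID;
}

/* Userspace analogue of the set-side after() check: the max-window
 * snapshot is kept only if its right edge is not behind
 * rcv_wup + rcv_wnd, compared in wrapping 32-bit sequence space.
 */
static int rw_max_snapshot_usable(const struct repair_window_ext *w)
{
	uint32_t live_edge = w->rcv_wup + w->rcv_wnd;

	return (int32_t)(w->rcv_mwnd_seq - live_edge) >= 0;
}
```

A tool built against older headers would keep passing its own
sizeof() as optlen. Per the patch, the kernel then rebuilds whatever
snapshot state that prefix cannot carry.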