[PATCH net-next 2/3] tcp: keep scaled no-shrink window representable

Wesley Atwell posted 3 patches 2 weeks, 6 days ago
There is a newer version of this series
Posted by Wesley Atwell 2 weeks, 6 days ago
In the scaled no-shrink path, __tcp_select_window() currently rounds the
raw free-space value up to the receive-window scale quantum.

That can expose fresh sender-visible credit beyond the currently backed
free space, by up to one scale quantum minus one byte.

Fix this without changing the meaning of the stored receive-window
state. Keep tp->rcv_wnd representable in scaled units by rounding larger
windows down to the scale quantum and preserving only the small
non-zero case that would otherwise scale away to zero.

tcp_select_window() already preserves the no-shrink guarantee from the
currently offered window, so later no-shrink decisions continue to
reason from a right edge the peer actually saw on the wire.

This removes the quantization slack that rounding free_space up added to
larger windows, while still preventing a small non-zero offer from being
scaled away to zero.

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
 net/ipv4/tcp_output.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 35c3b0ab5a0cb714155d5720fe56888f71aecced..bd3a43148a87e891bc632a47ffb5b82c475e8f6f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk)
 	 * scaled window will not line up with the MSS boundary anyway.
 	 */
 	if (tp->rx_opt.rcv_wscale) {
-		window = free_space;
+		u32 gran = 1U << tp->rx_opt.rcv_wscale;
 
-		/* Advertise enough space so that it won't get scaled away.
-		 * Import case: prevent zero window announcement if
-		 * 1<<rcv_wscale > mss.
+		/* Keep tp->rcv_wnd representable in scaled units so later
+		 * no-shrink decisions reason about the same right edge we
+		 * can advertise on the wire. Preserve only a small non-zero
+		 * offer that would otherwise get scaled away to zero.
 		 */
-		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
+		if (free_space >= gran)
+			window = round_down(free_space, gran);
+		else if (free_space > 0)
+			window = gran;
+		else
+			window = 0;
 	} else {
 		window = tp->rcv_wnd;
 		/* Get the largest window that is a nice multiple of mss.
-- 
2.43.0
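
For readers following the thread, the rounding change in the hunk above
can be sketched as a standalone userspace program fragment. This is only
an illustration, not kernel code: the helper names window_round_up() and
window_patched() are hypothetical, and the masks stand in for the
kernel's ALIGN() and round_down() macros for a power-of-two quantum.

```c
#include <stdint.h>

/* Pre-patch behaviour: round free_space up to the scale quantum.
 * For a power-of-two quantum this matches what ALIGN() computes.
 * wscale is assumed to be in the valid TCP range 0..14.
 */
static uint32_t window_round_up(uint32_t free_space, unsigned int wscale)
{
	uint32_t gran = 1U << wscale;

	return (free_space + gran - 1) & ~(gran - 1);
}

/* Post-patch behaviour as written in the diff: round larger values
 * down, and keep one quantum for a small non-zero free_space.
 */
static uint32_t window_patched(uint32_t free_space, unsigned int wscale)
{
	uint32_t gran = 1U << wscale;

	if (free_space >= gran)
		return free_space & ~(gran - 1);	/* round_down() */
	return free_space ? gran : 0;
}
```

With an assumed wscale of 7 (quantum 128), a free_space of 1000 gives
1024 before the patch and 896 after it, while a free_space of 1 gives
128 in both cases.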
Re: [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable
Posted by Paolo Abeni 2 weeks, 4 days ago
On 3/17/26 7:51 AM, Wesley Atwell wrote:
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 35c3b0ab5a0cb714155d5720fe56888f71aecced..bd3a43148a87e891bc632a47ffb5b82c475e8f6f 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk)
>  	 * scaled window will not line up with the MSS boundary anyway.
>  	 */
>  	if (tp->rx_opt.rcv_wscale) {
> -		window = free_space;
> +		u32 gran = 1U << tp->rx_opt.rcv_wscale;
>  
> -		/* Advertise enough space so that it won't get scaled away.
> -		 * Import case: prevent zero window announcement if
> -		 * 1<<rcv_wscale > mss.
> +		/* Keep tp->rcv_wnd representable in scaled units so later
> +		 * no-shrink decisions reason about the same right edge we
> +		 * can advertise on the wire. Preserve only a small non-zero
> +		 * offer that would otherwise get scaled away to zero.
>  		 */
> -		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
> +		if (free_space >= gran)
> +			window = round_down(free_space, gran);

The receive window already has a similar rounding in the `free_space <
(full_space >> 1)` case. This is basically excluding only:

	gran > free_space >= (full_space >> 1)

and I don't know whether that is a realistic situation. Perhaps just do
the scale down unconditionally?

Also, a minor nit: prefer 'granularity' over 'gran'.

/P
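
As an aid to reading the inequality above, here is a minimal sketch of
it as a predicate. The function name in_excluded_range() is
hypothetical, and the code only restates the quoted condition; it does
not model the rest of __tcp_select_window().

```c
#include <stdbool.h>
#include <stdint.h>

/* True when free_space lies in the range quoted in the review:
 *     gran > free_space >= (full_space >> 1)
 * wscale is assumed to be in the valid TCP range 0..14.
 */
static bool in_excluded_range(uint32_t free_space, uint32_t full_space,
			      unsigned int wscale)
{
	uint32_t gran = 1U << wscale;

	return free_space < gran && free_space >= (full_space >> 1);
}
```

The condition can only hold when (full_space >> 1) < gran, that is, when
full_space is smaller than two scale quanta. With an assumed wscale of 7,
free_space = 100 and full_space = 150 satisfy it, while free_space = 100
and full_space = 4096 do not.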
Re: [PATCH net-next 2/3] tcp: keep scaled no-shrink window representable
Posted by Wesley Atwell 2 weeks, 4 days ago
Paolo,

I re-checked that corner more carefully. I do not want to overstate it as a
common path, but I also do not think the current code rules it out.

rcv_wscale is fixed at connection setup, while full_space/free_space are
recomputed later from the current receive-buffer state. The tree explicitly
allows later SO_RCVBUF reduction, and the window clamp can also change later,
so I do not see an invariant that keeps full_space above one scale quantum
once the scale has been negotiated.

That said, my reason for keeping the small non-zero case is not that
unconditional scale-down would be less safe. It is that it would also change
the long-standing behavior that avoids scaling a non-zero offer away to zero.
My intent here was to remove the larger-window round-up slack without changing
that smaller legacy case in the same patch.
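
To make the difference between the two options concrete, the sketch
below compares an unconditional round-down with the variant that keeps a
small non-zero offer. It is a userspace illustration under stated
assumptions: the helper names are hypothetical, and wire_field() models
only the right shift by the window scale applied to the advertised
value, ignoring any other clamping.

```c
#include <stdint.h>

/* Option discussed in review: always round down to the quantum. */
static uint32_t round_down_always(uint32_t free_space, unsigned int wscale)
{
	uint32_t gran = 1U << wscale;

	return free_space & ~(gran - 1);
}

/* Option in this patch: round down, but keep one quantum when
 * free_space is non-zero and smaller than the quantum.
 */
static uint32_t round_down_keep_nonzero(uint32_t free_space,
					unsigned int wscale)
{
	uint32_t gran = 1U << wscale;

	if (free_space >= gran)
		return free_space & ~(gran - 1);
	return free_space ? gran : 0;
}

/* Value of the scaled window field for a given window in bytes. */
static uint32_t wire_field(uint32_t window, unsigned int wscale)
{
	return window >> wscale;
}
```

With an assumed wscale of 10 (quantum 1024) and free_space = 600, the
unconditional variant yields a wire field of 0 and the patched variant
yields 1. For free_space = 5000 both yield 4.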

If you would prefer to also change that small non-zero case, I can do that in
v2 instead.

In that case, I will also rename gran to granularity in v2.

Thanks,
Wesley