[PATCH net-next v2 2/3] tcp: keep scaled no-shrink window representable

Wesley Atwell posted 3 patches 1 week, 5 days ago
There is a newer version of this series
Posted by Wesley Atwell 1 week, 5 days ago
In the scaled no-shrink path, __tcp_select_window() currently rounds the
raw free-space value up to the receive-window scale quantum.

When raw backed free_space sits just below the next quantum, that can
expose fresh sender-visible credit beyond the currently backed receive
space.

Fix this by keeping tp->rcv_wnd representable in scaled units: round
larger windows down to the scale quantum and preserve only the small
non-zero case that would otherwise scale away to zero.
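
For illustration only, the two rounding policies can be compared in a
small userspace sketch. The helper names window_round_up() and
window_round_down_keep_nonzero() are hypothetical stand-ins for the
ALIGN() and round_down() logic in the diff below, and they assume a
non-negative free_space and a power-of-two quantum.

```c
#include <assert.h>
#include <stdint.h>

/* The TCP window scale shift is at most 14 (RFC 7323). */
#define MAX_WSCALE 14

/* Old policy (hypothetical helper): round free_space up to the quantum. */
static uint32_t window_round_up(uint32_t free_space, unsigned int wscale)
{
	uint32_t granularity;

	assert(wscale <= MAX_WSCALE);
	granularity = 1U << wscale;
	return (free_space + granularity - 1) & ~(granularity - 1);
}

/*
 * New policy (hypothetical helper): round down once at least one quantum
 * is free, keep a single quantum for a small non-zero value, else zero.
 */
static uint32_t window_round_down_keep_nonzero(uint32_t free_space,
					       unsigned int wscale)
{
	uint32_t granularity;

	assert(wscale <= MAX_WSCALE);
	granularity = 1U << wscale;
	if (free_space >= granularity)
		return free_space & ~(granularity - 1);
	return free_space ? granularity : 0;
}
```

With a shift of 7 (quantum 128) and free_space of 1000, rounding up
yields 1024, which is 24 bytes more than the backed space, while
rounding down yields 896. A free_space of 100 yields 128 under both
policies, which is the small non-zero case left unchanged.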

This series intentionally leaves that smaller longstanding non-zero case
unchanged. The proven bug and the new reproducer are both in the
larger-window path where free_space is at least one scale quantum, so
changing 0 < free_space < granularity into zero would be a separate
behavior change.

That representability matters across ACK transitions too, not only on
the immediate raw-free_space-limited ACK. tcp_select_window() preserves
the currently offered window when shrinking is disallowed, so if an
earlier ACK stores a rounded-up value in tp->rcv_wnd, a later
raw-free_space-limited ACK can keep inheriting that extra unit.

Keeping tp->rcv_wnd representable throughout the scaled no-shrink path
prevents that carry-forward and makes later no-shrink decisions reason
from a right edge the peer could actually have seen on the wire.
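
The carry-forward can be sketched with a deliberately simplified model.
It assumes the no-shrink step amounts to "never offer less than what is
already offered" and that no data arrives between the two ACKs. The
helper names are hypothetical, and the real tcp_select_window() also
accounts for rcv_nxt advancing.

```c
#include <stdint.h>

/* Round up to the scale quantum (stand-in for the old ALIGN() step). */
static uint32_t quantum_round_up(uint32_t v, unsigned int wscale)
{
	uint32_t granularity = 1U << wscale;

	return (v + granularity - 1) & ~(granularity - 1);
}

/* Round down to the scale quantum (stand-in for round_down()). */
static uint32_t quantum_round_down(uint32_t v, unsigned int wscale)
{
	uint32_t granularity = 1U << wscale;

	return v & ~(granularity - 1);
}

/*
 * Toy no-shrink rule (an assumption of this sketch): keep the previously
 * offered window whenever the new candidate would be smaller.
 */
static uint32_t no_shrink(uint32_t candidate, uint32_t offered)
{
	return candidate < offered ? offered : candidate;
}
```

With a shift of 7 and free_space of 1000 on both ACKs, an earlier
rounded-up offer of 1024 survives the second ACK even when the second
candidate is rounded down to 896. If the first ACK had stored 896, the
second ACK stays at 896 as well.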

This removes the larger-window quantization slack while preserving the
small non-zero case needed to avoid scaling away to zero.

Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
---
v2:
- rename gran to granularity
- clarify why representable tp->rcv_wnd state is required across later
  no-shrink transitions
- clarify that this series still intentionally leaves the smaller
  longstanding non-zero case unchanged

 net/ipv4/tcp_output.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 35c3b0ab5a0c..e5c4c09101be 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk)
 	 * scaled window will not line up with the MSS boundary anyway.
 	 */
 	if (tp->rx_opt.rcv_wscale) {
-		window = free_space;
+		u32 granularity = 1U << tp->rx_opt.rcv_wscale;
 
-		/* Advertise enough space so that it won't get scaled away.
-		 * Import case: prevent zero window announcement if
-		 * 1<<rcv_wscale > mss.
+		/* Keep tp->rcv_wnd representable in scaled units so later
+		 * no-shrink decisions reason about the same right edge we
+		 * can advertise on the wire. Preserve only a small non-zero
+		 * offer that would otherwise get scaled away to zero.
 		 */
-		window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
+		if (free_space >= granularity)
+			window = round_down(free_space, granularity);
+		else if (free_space > 0)
+			window = granularity;
+		else
+			window = 0;
 	} else {
 		window = tp->rcv_wnd;
 		/* Get the largest window that is a nice multiple of mss.
-- 
2.43.0
Re: [PATCH net-next v2 2/3] tcp: keep scaled no-shrink window representable
Posted by Eric Dumazet 1 week, 5 days ago
On Mon, Mar 23, 2026 at 11:04 PM Wesley Atwell <atwellwea@gmail.com> wrote:
>
> In the scaled no-shrink path, __tcp_select_window() currently rounds the
> raw free-space value up to the receive-window scale quantum.
>
> When raw backed free_space sits just below the next quantum, that can
> expose fresh sender-visible credit beyond the currently backed receive
> space.
>
> Fix this by keeping tp->rcv_wnd representable in scaled units: round
> larger windows down to the scale quantum and preserve only the small
> non-zero case that would otherwise scale away to zero.
>
> This series intentionally leaves that smaller longstanding non-zero case
> unchanged. The proven bug and the new reproducer are both in the
> larger-window path where free_space is at least one scale quantum, so
> changing 0 < free_space < granularity into zero would be a separate
> behavior change.
>
> That representability matters across ACK transitions too, not only on
> the immediate raw-free_space-limited ACK. tcp_select_window() preserves
> the currently offered window when shrinking is disallowed, so if an
> earlier ACK stores a rounded-up value in tp->rcv_wnd, a later
> raw-free_space-limited ACK can keep inheriting that extra unit.
>
> Keeping tp->rcv_wnd representable throughout the scaled no-shrink path
> prevents that carry-forward and makes later no-shrink decisions reason
> from a right edge the peer could actually have seen on the wire.
>
> This removes the larger-window quantization slack while preserving the
> small non-zero case needed to avoid scaling away to zero.
>
> Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
> ---
> v2:
> - rename gran to granularity
> - clarify why representable tp->rcv_wnd state is required across later
>   no-shrink transitions
> - clarify that this series still intentionally leaves the smaller
>   longstanding non-zero case unchanged
>
>  net/ipv4/tcp_output.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 35c3b0ab5a0c..e5c4c09101be 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk)
>          * scaled window will not line up with the MSS boundary anyway.
>          */
>         if (tp->rx_opt.rcv_wscale) {
> -               window = free_space;
> +               u32 granularity = 1U << tp->rx_opt.rcv_wscale;
>
> -               /* Advertise enough space so that it won't get scaled away.
> -                * Import case: prevent zero window announcement if
> -                * 1<<rcv_wscale > mss.
> +               /* Keep tp->rcv_wnd representable in scaled units so later
> +                * no-shrink decisions reason about the same right edge we
> +                * can advertise on the wire. Preserve only a small non-zero
> +                * offer that would otherwise get scaled away to zero.
>                  */
> -               window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale));
> +               if (free_space >= granularity)

@free_space is a signed integer, and @granularity is unsigned.
If @free_space is negative, this first condition will always be true,
because the comparison will convert @free_space to a very large
(unsigned) value.

> +                       window = round_down(free_space, granularity);
> +               else if (free_space > 0)
> +                       window = granularity;
> +               else
> +                       window = 0;
>         } else {
>                 window = tp->rcv_wnd;
>                 /* Get the largest window that is a nice multiple of mss.
> --
> 2.43.0
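
The signed/unsigned comparison raised in the review can be reproduced
in userspace with the sketch below. The function names are hypothetical,
uint32_t stands in for the kernel's u32, and the explicit casts spell
out the conversion that C applies implicitly when an int is compared
with an unsigned int. The guarded variant is only one possible shape of
a fix and is an assumption here, not something proposed in this thread.
The thread also does not establish whether free_space can actually be
negative at this point.

```c
#include <stdint.h>

/* Same three-way branch as the posted hunk, with the conversion made explicit. */
static uint32_t posted_window(int free_space, unsigned int wscale)
{
	uint32_t granularity = 1U << wscale;

	/* A negative free_space converts to a huge unsigned value here. */
	if ((uint32_t)free_space >= granularity)
		return (uint32_t)free_space & ~(granularity - 1);
	else if (free_space > 0)
		return granularity;
	return 0;
}

/* Hypothetical guarded variant: rule out non-positive values first. */
static uint32_t guarded_window(int free_space, unsigned int wscale)
{
	uint32_t granularity = 1U << wscale;

	if (free_space <= 0)
		return 0;
	if ((uint32_t)free_space >= granularity)
		return (uint32_t)free_space & ~(granularity - 1);
	return granularity;
}
```

With a shift of 7, posted_window(-1, 7) takes the first branch and
returns 0xFFFFFF80, whereas guarded_window(-1, 7) returns 0. Both agree
for positive inputs.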