[PATCH net-next v3 0/3] tcp: fix scaled no-shrink rwnd quantization slack

Wesley Atwell posted 3 patches 1 week, 2 days ago
net/ipv4/tcp_output.c                              | 16 +++++++++++-----
.../selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt |  5 ++---
.../packetdrill/tcp_rcv_quantization_credit.pkt   | 62 ++++++++++++++++++++++
3 files changed, 75 insertions(+), 8 deletions(-)
[PATCH net-next v3 0/3] tcp: fix scaled no-shrink rwnd quantization slack
Posted by Wesley Atwell 1 week, 2 days ago
Hi,

This v3 addresses the follow-up review on v2.

Eric pointed out that 1/3 does not need the added packetdrill comment
and that 2/3 compared signed free_space against an unsigned
granularity.

This revision drops the extra in-file comment from 1/3 and keeps
the scaled-window granularity in int space in 2/3 so the comparison
stays type-safe. The overall approach and reproducer remain unchanged
from v2.

Simon was right that the original 3/3 only showed the explicit
rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 was replaced with
an OOO-memory-based reproducer that first grows rcv_ssthresh with
in-order data and then drives raw backed free_space below
rcv_ssthresh without advancing rcv_nxt. In the instrumented
old-behavior run that shaped this test, the critical ACK reached
free_space=86190, rcv_ssthresh=86286, and still advertised 87040
(85 << 10). With 2/3 applied, the same ACK stays at 84.

That follow-up also clarified why the broader 2/3 change is required.
A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up
behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd,
and tcp_select_window() later preserved that extra unit because shrinking
was disallowed. Keeping tp->rcv_wnd representable across the scaled
no-shrink path is what lets later ACKs settle at the correct
wire-visible edge.

Problem
=======

In the scaled no-shrink path, __tcp_select_window() rounds free_space up
to the receive-window scale quantum:

  window = ALIGN(free_space, 1 << tp->rx_opt.rcv_wscale);

When raw backed free_space sits just below the next quantum, that can
expose fresh sender-visible credit that is not actually backed by the
current receive-memory state.

Approach
========

This repost keeps the part with a clear fail-before/pass-after case:

  - relax one unrelated packetdrill test which was pinning an
    incidental advertised window
  - keep tp->rcv_wnd representable in scaled units by rounding larger
    windows down to the scale quantum
  - preserve only the small non-zero case that would otherwise scale
    away to zero; changing that longstanding non-zero-to-zero behavior
    would be a separate change from the bug proven here
  - prove the actual raw-free_space case with a packetdrill sequence
    that reaches free_space < rcv_ssthresh without changing SO_RCVBUF
    after the handshake

Tests
=====

Local validation included:
- git diff --check
- checkpatch on the touched diff
- /home/wes/nipa/local/vmksft dirty --tests
  'net/packetdrill:tcp_ooo_rcv_mss.pkt
   net/packetdrill:tcp_rcv_quantization_credit.pkt'
  passes in run 20260324-202158-4929 for ipv4, ipv6, and
  ipv4-mapped-ipv6
- the same quantization packetdrill fails on HEAD without 2/3 with:

    expected: win 84
      actual: win 85

Changes in v3
=============

- drop the unnecessary explanatory packetdrill comment from 1/3
- keep 2/3 granularity in signed int space to avoid the free_space
  signed/unsigned comparison bug Eric pointed out
- keep 3/3 unchanged

Series layout
=============

  1/3 selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
  2/3 tcp: keep scaled no-shrink window representable
  3/3 selftests: packetdrill: cover scaled rwnd quantization slack

Thanks,
Wesley Atwell

---
 net/ipv4/tcp_output.c                              | 16 +++++++++++-----
 .../selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt |  5 ++---
 .../packetdrill/tcp_rcv_quantization_credit.pkt   | 62 ++++++++++++++++++++++
 3 files changed, 75 insertions(+), 8 deletions(-)
 base-commit: 5446b8691eb8278f10deca92048fad84ffd1e4d5

-- 
2.43.0
Re: [PATCH net-next v3 0/3] tcp: fix scaled no-shrink rwnd quantization slack
Posted by Simon Baatz 1 week, 1 day ago
Hi Wesley,

On Tue, Mar 24, 2026 at 02:52:58PM -0600, Wesley Atwell wrote:
> Hi,
> 
> This v3 addresses the follow-up review on v2.
> 
> Eric pointed out that 1/3 does not need the added packetdrill comment
> and that 2/3 compared signed free_space against an unsigned
> granularity.
> 
> This revision drops the extra in-file comment from 1/3 and keeps
> the scaled-window granularity in int space in 2/3 so the comparison
> stays type-safe. The overall approach and reproducer remain unchanged
> from v2.
> 
> Simon was right that the original 3/3 only showed the explicit
> rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 was replaced with
> an OOO-memory-based reproducer that first grows rcv_ssthresh with
> in-order data and then drives raw backed free_space below
> rcv_ssthresh without advancing rcv_nxt. In the instrumented
> old-behavior run that shaped this test, the critical ACK reached
> free_space=86190, rcv_ssthresh=86286, and still advertised 87040
> (85 << 10). With 2/3 applied, the same ACK stays at 84.
> 
> That follow-up also clarified why the broader 2/3 change is required.
> A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up
> behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd,
> and tcp_select_window() later preserved that extra unit because shrinking
> was disallowed. Keeping tp->rcv_wnd representable across the scaled
> no-shrink path is what lets later ACKs settle at the correct
> wire-visible edge.

So, you are saying that 84 defines the "correct
wire-visible edge"?  That's a strong claim.

The test in 3/3 adds OOO packets until the window calculated from
free_space is 84.  But why stop there?  If I added further OOO
packets until the calculated window drops to 83, I can claim, by the
same reasoning, that 83 is the correct value and the initial 84 is
wrong.

In other words, this is a very synthetic scenario that can be steered
to arbitrary values.  As stated in v1, I would really like to see a
packetdrill (or real-world scenario) where the old behavior actually
hurts (after all, this series claims that the current behavior needs
to be fixed).

 
- Simon
-- 
Simon Baatz <gmbnomis@gmail.com>
Re: [PATCH net-next v3 0/3] tcp: fix scaled no-shrink rwnd quantization slack
Posted by Eric Dumazet 1 week, 1 day ago
On Wed, Mar 25, 2026 at 12:58 AM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Wesley,
>
> On Tue, Mar 24, 2026 at 02:52:58PM -0600, Wesley Atwell wrote:
> > Hi,
> >
> > This v3 addresses the follow-up review on v2.
> >
> > Eric pointed out that 1/3 does not need the added packetdrill comment
> > and that 2/3 compared signed free_space against an unsigned
> > granularity.
> >
> > This revision drops the extra in-file comment from 1/3 and keeps
> > the scaled-window granularity in int space in 2/3 so the comparison
> > stays type-safe. The overall approach and reproducer remain unchanged
> > from v2.
> >
> > Simon was right that the original 3/3 only showed the explicit
> > rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 was replaced with
> > an OOO-memory-based reproducer that first grows rcv_ssthresh with
> > in-order data and then drives raw backed free_space below
> > rcv_ssthresh without advancing rcv_nxt. In the instrumented
> > old-behavior run that shaped this test, the critical ACK reached
> > free_space=86190, rcv_ssthresh=86286, and still advertised 87040
> > (85 << 10). With 2/3 applied, the same ACK stays at 84.
> >
> > That follow-up also clarified why the broader 2/3 change is required.
> > A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up
> > behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd,
> > and tcp_select_window() later preserved that extra unit because shrinking
> > was disallowed. Keeping tp->rcv_wnd representable across the scaled
> > no-shrink path is what lets later ACKs settle at the correct
> > wire-visible edge.
>
> So, you are saying that 84 defines the "correct
> wire-visible edge"?  That's a strong claim.
>
> The test in 3/3 adds OOO packets until the window calculated from
> free_space is 84.  But why stop there?  If I added further OOO
> packets until the calculated window drops to 83, I can claim, by the
> same reasoning, that 83 is the correct value and the initial 84 is
> wrong.
>
> In other words, this is a very synthetic scenario that can be steered
> to arbitrary values.  As stated in v1, I would really like to see a
> packetdrill (or real-world scenario) where the old behavior actually
> hurts (after all, this series claims that the current behavior needs
> to be fixed).
>

This series seems to be a social engineering experiment.

Could we have the prompts that were fed to an AI agent ?

I really do not see the point. RWIN is by essence advisory, and TCP
cannot accept
arbitrary bloated (small skb->len / skb->truesize ratio) anyway.
Re: [PATCH net-next v3 0/3] tcp: fix scaled no-shrink rwnd quantization slack
Posted by Wesley Atwell 1 week, 1 day ago
Hi,

I still think 2/3 is a legitimate fix. To clarify, I was not trying to
claim that 84 is some magic number in the abstract, and I agree the
packetdrill is artificial.

My point was only that, in the constructed case, the old code can
preserve a scaled window that is larger than the currently backed
receive space, while 2/3 keeps the stored window representable in
scaled units.
That said, I am probably missing the reason why that is not a problem
according to the feedback you all have given.

So I am going to drop it here.

To be clear this has nothing to do with social engineering, just was
trying to fix something that doesn't need fixed I suppose.

Thanks,
Wesley Atwell
Re: [PATCH net-next v3 0/3] tcp: fix scaled no-shrink rwnd quantization slack
Posted by Eric Dumazet 1 week, 1 day ago
On Wed, Mar 25, 2026 at 10:18 AM Wesley Atwell <atwellwea@gmail.com> wrote:
>
> Hi,
>
> I still think 2/3 is a legitimate fix. To clarify, I was not trying to
> claim that 84 is some magic number in the abstract, and I agree the
> packetdrill is artificial.
>
> My point was only that, in the constructed case, the old code can
> preserve a scaled window that is larger than the currently backed
> receive space, while 2/3 keeps the stored window representable in
> scaled units.
> That said, I am probably missing the reason why that is not a problem
> according to the feedback you all have given.
>
> So I am going to drop it here.
>
> To be clear this has nothing to do with social engineering, just was
> trying to fix something that doesn't need fixed I suppose.

We have limited time and many bugs to review.

Your series really lacks the signal explaining why it's needed and why
it's important.
What real workload suffers from the current behavior.

Thanks.