net/ipv4/tcp_output.c | 16 +++++++++++----- .../selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt | 5 ++--- .../packetdrill/tcp_rcv_quantization_credit.pkt | 62 ++++++++++++++++++++++ 3 files changed, 75 insertions(+), 8 deletions(-)
Hi,
This v3 addresses the follow-up review on v2.
Eric pointed out that 1/3 does not need the added packetdrill comment
and that 2/3 compared signed free_space against an unsigned
granularity.
This revision drops the extra in-file comment from 1/3 and keeps
the scaled-window granularity in int space in 2/3 so the comparison
stays type-safe. The overall approach and reproducer remain unchanged
from v2.
Simon was right that the original 3/3 only showed the explicit
rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 was replaced with
an OOO-memory-based reproducer that first grows rcv_ssthresh with
in-order data and then drives raw backed free_space below
rcv_ssthresh without advancing rcv_nxt. In the instrumented
old-behavior run that shaped this test, the critical ACK reached
free_space=86190, rcv_ssthresh=86286, and still advertised 87040
(85 << 10). With 2/3 applied, the same ACK stays at 84.
That follow-up also clarified why the broader 2/3 change is required.
A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up
behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd,
and tcp_select_window() later preserved that extra unit because shrinking
was disallowed. Keeping tp->rcv_wnd representable across the scaled
no-shrink path is what lets later ACKs settle at the correct
wire-visible edge.
Problem
=======
In the scaled no-shrink path, __tcp_select_window() rounds free_space up
to the receive-window scale quantum:
window = ALIGN(free_space, 1 << tp->rx_opt.rcv_wscale);
When raw backed free_space sits just below the next quantum, that can
expose fresh sender-visible credit that is not actually backed by the
current receive-memory state.
Approach
========
This repost keeps the part with a clear fail-before/pass-after case:
- relax one unrelated packetdrill test which was pinning an
incidental advertised window
- keep tp->rcv_wnd representable in scaled units by rounding larger
windows down to the scale quantum
- preserve only the small non-zero case that would otherwise scale
away to zero; changing that longstanding non-zero-to-zero behavior
would be a separate change from the bug proven here
- prove the actual raw-free_space case with a packetdrill sequence
that reaches free_space < rcv_ssthresh without changing SO_RCVBUF
after the handshake
Tests
=====
Local validation included:
- git diff --check
- checkpatch on the touched diff
- /home/wes/nipa/local/vmksft dirty --tests
'net/packetdrill:tcp_ooo_rcv_mss.pkt
net/packetdrill:tcp_rcv_quantization_credit.pkt'
passes in run 20260324-202158-4929 for ipv4, ipv6, and
ipv4-mapped-ipv6
- the same quantization packetdrill fails on HEAD without 2/3 with:
expected: win 84
actual: win 85
Changes in v3
=============
- drop the unnecessary explanatory packetdrill comment from 1/3
- keep 2/3 granularity in signed int space to avoid the free_space
signed/unsigned comparison bug Eric pointed out
- keep 3/3 unchanged
Series layout
=============
1/3 selftests: packetdrill: stop pinning rwnd in tcp_ooo_rcv_mss
2/3 tcp: keep scaled no-shrink window representable
3/3 selftests: packetdrill: cover scaled rwnd quantization slack
Thanks,
Wesley Atwell
---
net/ipv4/tcp_output.c | 16 +++++++++++-----
.../selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt | 5 ++---
.../packetdrill/tcp_rcv_quantization_credit.pkt | 62 ++++++++++++++++++++++
3 files changed, 75 insertions(+), 8 deletions(-)
base-commit: 5446b8691eb8278f10deca92048fad84ffd1e4d5
--
2.43.0
Hi Wesley, On Tue, Mar 24, 2026 at 02:52:58PM -0600, Wesley Atwell wrote: > Hi, > > This v3 addresses the follow-up review on v2. > > Eric pointed out that 1/3 does not need the added packetdrill comment > and that 2/3 compared signed free_space against an unsigned > granularity. > > This revision drops the extra in-file comment from 1/3 and keeps > the scaled-window granularity in int space in 2/3 so the comparison > stays type-safe. The overall approach and reproducer remain unchanged > from v2. > > Simon was right that the original 3/3 only showed the explicit > rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 was replaced with > an OOO-memory-based reproducer that first grows rcv_ssthresh with > in-order data and then drives raw backed free_space below > rcv_ssthresh without advancing rcv_nxt. In the instrumented > old-behavior run that shaped this test, the critical ACK reached > free_space=86190, rcv_ssthresh=86286, and still advertised 87040 > (85 << 10). With 2/3 applied, the same ACK stays at 84. > > That follow-up also clarified why the broader 2/3 change is required. > A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up > behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd, > and tcp_select_window() later preserved that extra unit because shrinking > was disallowed. Keeping tp->rcv_wnd representable across the scaled > no-shrink path is what lets later ACKs settle at the correct > wire-visible edge. So, you are saying that 84 defines the "correct wire-visible edge"? That's a strong claim. The test in 3/3 adds OOO packets until the window calculated from free_space is 84. But why stop there? If I added further OOO packets until the calculated window drops to 83, I can claim, by the same reasoning, that 83 is the correct value and the initial 84 is wrong. In other words, this is a very synthetic scenario that can be steered to arbitrary values. As stated in v1, I would really like to see a packetdrill (or real-world scenario) where the old behavior actually hurts (after all, this series claims that the current behavior needs to be fixed). - Simon -- Simon Baatz <gmbnomis@gmail.com>
On Wed, Mar 25, 2026 at 12:58 AM Simon Baatz <gmbnomis@gmail.com> wrote: > > Hi Wesley, > > On Tue, Mar 24, 2026 at 02:52:58PM -0600, Wesley Atwell wrote: > > Hi, > > > > This v3 addresses the follow-up review on v2. > > > > Eric pointed out that 1/3 does not need the added packetdrill comment > > and that 2/3 compared signed free_space against an unsigned > > granularity. > > > > This revision drops the extra in-file comment from 1/3 and keeps > > the scaled-window granularity in int space in 2/3 so the comparison > > stays type-safe. The overall approach and reproducer remain unchanged > > from v2. > > > > Simon was right that the original 3/3 only showed the explicit > > rcv_ssthresh-limited ALIGN-up behavior. For v2, 3/3 was replaced with > > an OOO-memory-based reproducer that first grows rcv_ssthresh with > > in-order data and then drives raw backed free_space below > > rcv_ssthresh without advancing rcv_nxt. In the instrumented > > old-behavior run that shaped this test, the critical ACK reached > > free_space=86190, rcv_ssthresh=86286, and still advertised 87040 > > (85 << 10). With 2/3 applied, the same ACK stays at 84. > > > > That follow-up also clarified why the broader 2/3 change is required. > > A narrower variant that preserved the old rcv_ssthresh-limited ALIGN-up > > behavior was not sufficient: earlier ACKs still stored 85 in tp->rcv_wnd, > > and tcp_select_window() later preserved that extra unit because shrinking > > was disallowed. Keeping tp->rcv_wnd representable across the scaled > > no-shrink path is what lets later ACKs settle at the correct > > wire-visible edge. > > So, you are saying that 84 defines the "correct > wire-visible edge"? That's a strong claim. > > The test in 3/3 adds OOO packets until the window calculated from > free_space is 84. But why stop there? If I added further OOO > packets until the calculated window drops to 83, I can claim, by the > same reasoning, that 83 is the correct value and the initial 84 is > wrong. > > In other words, this is a very synthetic scenario that can be steered > to arbitrary values. As stated in v1, I would really like to see a > packetdrill (or real-world scenario) where the old behavior actually > hurts (after all, this series claims that the current behavior needs > to be fixed). > This series seems to be a social engineering experiment. Could we have the prompts that were fed to an AI agent ? I really do not see the point. RWIN is by essence advisory, and TCP cannot accept arbitrary bloated (small skb->len / skb->truesize ratio) anyway.
Hi, I still think 2/3 is a legitimate fix. To clarify, I was not trying to claim that 84 is some magic number in the abstract, and I agree the packetdrill is artificial. My point was only that, in the constructed case, the old code can preserve a scaled window that is larger than the currently backed receive space, while 2/3 keeps the stored window representable in scaled units. That said, I am probably missing the reason why that is not a problem according to the feedback you all have given. So I am going to drop it here. To be clear this has nothing to do with social engineering, just was trying to fix something that doesn't need fixed I suppose. Thanks, Wesley Atwell
On Wed, Mar 25, 2026 at 10:18 AM Wesley Atwell <atwellwea@gmail.com> wrote: > > Hi, > > I still think 2/3 is a legitimate fix. To clarify, I was not trying to > claim that 84 is some magic number in the abstract, and I agree the > packetdrill is artificial. > > My point was only that, in the constructed case, the old code can > preserve a scaled window that is larger than the currently backed > receive space, while 2/3 keeps the stored window representable in > scaled units. > That said, I am probably missing the reason why that is not a problem > according to the feedback you all have given. > > So I am going to drop it here. > > To be clear this has nothing to do with social engineering, just was > trying to fix something that doesn't need fixed I suppose. We have limited time and many bugs to review. Your series really lacks the signal explaining why it's needed and why it's important. What real workload suffers from the current behavior. Thanks.
© 2016 - 2026 Red Hat, Inc.