Hi,
this series implements the receiver-side requirements for TCP window
retraction as specified in RFC 7323 and adds packetdrill tests to
cover the new behavior.
It addresses a regression with somewhat complex causes; see my message
"Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
(https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).
Please see the first patch for background and implementation details.
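[Editorial sketch] As a reading aid, here is a minimal Python model of the receiver-side rule the series describes, as I understand it from this cover letter: the receiver remembers the highest window right edge it has ever advertised (rcv_mwnd_seq) and keeps accepting data up to that edge even after the advertised window has shrunk. The class and method names are illustrative, not the kernel API, and sequence-number wraparound is ignored.

```python
class Receiver:
    """Toy model of a TCP receiver tracking its max advertised window."""

    def __init__(self, rcv_nxt, rcv_wnd):
        self.rcv_nxt = rcv_nxt                  # next expected sequence number
        self.rcv_wup = rcv_nxt                  # rcv_nxt at last advertisement
        self.rcv_wnd = rcv_wnd                  # most recently advertised window
        self.rcv_mwnd_seq = rcv_nxt + rcv_wnd   # max right edge ever advertised

    def advertise(self, new_wnd):
        """Advertise a (possibly smaller) window, as tcp_select_window would."""
        self.rcv_wup = self.rcv_nxt
        self.rcv_wnd = new_wnd
        # Track the maximum right edge across all advertisements.
        self.rcv_mwnd_seq = max(self.rcv_mwnd_seq, self.rcv_wup + new_wnd)

    def acceptable(self, seq, end_seq):
        """Is the segment [seq, end_seq) acceptable under the retraction rule?"""
        # Without the rule, the right bound would be rcv_wup + rcv_wnd;
        # with it, data up to rcv_mwnd_seq must still be accepted.
        return seq < self.rcv_mwnd_seq and end_seq > self.rcv_nxt


r = Receiver(rcv_nxt=1000, rcv_wnd=64000)   # right edge advertised at 65000
r.advertise(0)                              # window retracted to zero
print(r.acceptable(2000, 3000))             # True: within the old right edge
print(r.acceptable(66000, 67000))           # False: beyond the max right edge
```

The point of the model: after the retraction the current window is zero, yet segments within the previously advertised edge must not be treated as out-of-window.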
This is an RFC because a few open questions remain:
- Placement of the new rcv_mwnd_seq field in tcp_sock:
rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
tcp_select_window. However, rcv_wup is documented as RX read_write
only (even though it is updated in tcp_select_window), and rcv_wnd
is TX read_write / RX read_mostly.
rcv_mwnd_seq is only updated in tcp_select_window and, as far as I
can tell, is not used on the RX fast path.
If I understand the placement rules correctly, this means that
rcv_mwnd_seq, rcv_wup, and rcv_wnd end up in different cacheline
groups, which feels odd. Guidance on where rcv_mwnd_seq should live
would be appreciated.
- In tcp_minisocks.c, it is not clear to me whether we should change
"tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
"tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
case where this makes a practical difference and have left the
existing behavior unchanged.
- Packetdrill tests: Some of these seem rather brittle to me; I
included them mostly to document what I have tested. Suggestions
for making them more robust are welcome.
- MPTCP seems to modify tp->rcv_wnd of subflows. I haven't looked at
this, since I wanted to get feedback on the overall approach first.
- Although this series addresses a regression triggered by commit
  d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), the underlying
  problem is shrinking the window. Thus I added Fixes: tags for the
  commits that introduced window shrinking.
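[Editorial sketch] To illustrate the tcp_minisocks.c question above: the two helpers can return different values after a retraction. The helper names come from the series; the arithmetic below is a simplified model of my assumed semantics (wraparound ignored), not the kernel implementation.

```python
def receive_window(rcv_wup, rcv_wnd, rcv_nxt):
    # Space left in the currently advertised window, clamped at 0;
    # modeled on tcp_receive_window().
    return max(rcv_wup + rcv_wnd - rcv_nxt, 0)

def max_receive_window(rcv_mwnd_seq, rcv_nxt):
    # Space left up to the maximum right edge ever advertised; assumed
    # semantics for tcp_max_receive_window() from the series.
    return max(rcv_mwnd_seq - rcv_nxt, 0)

# A window of 64000 was advertised at rcv_wup=1000 (right edge 65000),
# then retracted to 0 after rcv_nxt advanced to 3000:
print(receive_window(rcv_wup=3000, rcv_wnd=0, rcv_nxt=3000))   # 0
print(max_receive_window(rcv_mwnd_seq=65000, rcv_nxt=3000))    # 62000
```

Whether tw_rcv_wnd should use the first or the second value in the time-wait path is exactly the open question above.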
I would appreciate feedback on the overall approach and on these
questions.
Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
Simon Baatz (4):
tcp: implement RFC 7323 window retraction receiver requirements
selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt
.../networking/net_cachelines/tcp_sock.rst | 1 +
include/linux/tcp.h | 1 +
include/net/tcp.h | 14 +++
net/ipv4/tcp_fastopen.c | 1 +
net/ipv4/tcp_input.c | 6 +-
net/ipv4/tcp_minisocks.c | 1 +
net/ipv4/tcp_output.c | 12 +++
.../net/packetdrill/tcp_rcv_big_endseq.pkt | 2 +-
.../packetdrill/tcp_rcv_toobig_back_to_back.pkt | 27 +++++
.../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt | 35 +++++++
.../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt | 109 +++++++++++++++++++++
11 files changed, 206 insertions(+), 3 deletions(-)
---
base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde
Best regards,
--
Simon Baatz <gmbnomis@gmail.com>
On Fri, Feb 20, 2026 at 12:56 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> Hi,
>
> this series implements the receiver-side requirements for TCP window
> retraction as specified in RFC 7323 and adds packetdrill tests to
> cover the new behavior.
>
> It addresses a regression with somewhat complex causes; see my message
> "Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
> (https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).
>
> Please see the first patch for background and implementation details.
>
> This is an RFC because a few open questions remain:
>
> - Placement of the new rcv_mwnd_seq field in tcp_sock:
>
> rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
> tcp_select_window. However, rcv_wup is documented as RX read_write
> only (even though it is updated in tcp_select_window), and rcv_wnd
> is TX read_write / RX read_mostly.
>
> rcv_mwnd_seq is only updated in tcp_select_window and, as far as I
> can tell, is not used on the RX fast path.
>
> If I understand the placement rules correctly, this means that
> rcv_mwnd_seq, rcv_wup, and rcv_wnd end up in different cacheline
> groups, which feels odd. Guidance on where rcv_mwnd_seq should live
> would be appreciated.
>
> - In tcp_minisocks.c, it is not clear to me whether we should change
> "tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
> "tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
> case where this makes a practical difference and have left the
> existing behavior unchanged.
>
> - Packetdrill tests: Some of these seem rather brittle to me; I
> included them mostly to document what I have tested. Suggestions
> for making them more robust are welcome.
>
> - MPTCP seems to modify tp->rcv_wnd of subflows. I haven't looked at
> this, since I wanted to get feedback on the overall approach first.
>
> - Although this series addresses a regression triggered by commit
> d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") the underlying
> problem is shrinking the window. Thus I added "Fixes" headers for
> the commits that introduced window shrinking.
>
> I would appreciate feedback on the overall approach and on these
> questions.
>
Hi Simon, thanks for the clean series.
I would guess you use some AI ? This is fine, just curious.
Can you add more tests, in memory stress situations ?
Like :
A receiver grew the RWIN over time up to 8 MB.
Then the application (or the kernel under stress) used SO_RCVBUF to 16K.
I want to make sure the socket wont accept packets to fill the prior
window and consume 8MB
8MB seems fine, unless the host has 100,000 sockets in the same situation.
Thanks
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
> Simon Baatz (4):
> tcp: implement RFC 7323 window retraction receiver requirements
> selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
> selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
> selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt
>
> .../networking/net_cachelines/tcp_sock.rst | 1 +
> include/linux/tcp.h | 1 +
> include/net/tcp.h | 14 +++
> net/ipv4/tcp_fastopen.c | 1 +
> net/ipv4/tcp_input.c | 6 +-
> net/ipv4/tcp_minisocks.c | 1 +
> net/ipv4/tcp_output.c | 12 +++
> .../net/packetdrill/tcp_rcv_big_endseq.pkt | 2 +-
> .../packetdrill/tcp_rcv_toobig_back_to_back.pkt | 27 +++++
> .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt | 35 +++++++
> .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt | 109 +++++++++++++++++++++
> 11 files changed, 206 insertions(+), 3 deletions(-)
> ---
> base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
> change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde
>
> Best regards,
> --
> Simon Baatz <gmbnomis@gmail.com>
>
>
Hi Eric,

On Fri, Feb 20, 2026 at 09:58:00AM +0100, Eric Dumazet wrote:
> Hi Simon, thanks for the clean series.
>
> I would guess you use some AI ? This is fine, just curious.

Thank you! Yes, I've found AI helpful for getting familiar with a
new code base. I also use it to refine or clean up the wording of
bigger commit messages. Code generation works quite well for quick,
throw-away code (like reproducers).

> Can you add more tests, in memory stress situations ?
>
> Like :
>
> A receiver grew the RWIN over time up to 8 MB.
>
> Then the application (or the kernel under stress) used SO_RCVBUF to 16K.
>
> I want to make sure the socket wont accept packets to fill the prior
> window and consume 8MB

I suspect generating 8 MB worth of RX data in packetdrill won't be
fun (unless there's a trick I'm missing). And using regular TCP
sockets on both ends would probably be rather uninteresting (no
packets are sent once RWIN = 0).

It might be more practical to extend one of the tests to create two
situations in packetdrill:

1. Zero window:
   0 == RWIN < 2 * squeezed SO_RCVBUF < tracked max. RWIN < 2 * original SO_RCVBUF

2. Small window:
   0 < RWIN < 2 * squeezed SO_RCVBUF < tracked max. RWIN < 2 * original SO_RCVBUF

If these limits are sufficiently distinct, we could probe the
tcp_sequence() and tcp_data_queue() paths in detail using:

* pure ACK or data packet
* in-order or out-of-order
* within, partially within, or beyond the (max) window

If we can show that we can't use more memory than expected for the
squeezed buffer, then the original max window size shouldn't really
matter. wdyt?

- Simon

--
Simon Baatz <gmbnomis@gmail.com>
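[Editorial sketch] For scale, the probe matrix Simon proposes can be enumerated; the strings below are just labels for the combinations, not packetdrill syntax.

```python
from itertools import product

# The two window situations and three probe dimensions from the reply.
window_states = ("zero_window", "small_window")
packet_kinds = ("pure_ack", "data")
orderings = ("in_order", "out_of_order")
positions = ("within", "partially_within", "beyond_max")

cases = list(product(window_states, packet_kinds, orderings, positions))
print(len(cases))   # 24 raw combinations, before pruning ones that make
                    # no sense (e.g. a pure ACK has no payload position)
```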
On Mon, Feb 23, 2026 at 1:07 AM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Eric,
>
> On Fri, Feb 20, 2026 at 09:58:00AM +0100, Eric Dumazet wrote:
> > Hi Simon, thanks for the clean series.
> >
> > I would guess you use some AI ? This is fine, just curious.
>
> Thank you! Yes, I’ve found AI helpful for getting familiar with a
> new code base. I also use it to refine or clean up the wording of
> bigger commit messages. Code generation works quite well for quick,
> throw‑away code (like reproducers).
>
> > Can you add more tests, in memory stress situations ?
> >
> > Like :
> >
> > A receiver grew the RWIN over time up to 8 MB.
> >
> > Then the application (or the kernel under stress) used SO_RCVBUF to 16K.
> >
> > I want to make sure the socket wont accept packets to fill the prior
> > window and consume 8MB
>
> I suspect generating 8 MB worth of RX data in packetdrill won't be
> fun (unless there’s a trick I’m missing). And using regular TCP
> sockets on both ends would probably be rather uninteresting (no
> packets sent once RWIN = 0)
>

8MB was only to show my point. A packetdrill test reaching 1MB should
be doable.