[PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling

Simon Baatz via B4 Relay posted 4 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
Posted by Simon Baatz via B4 Relay 1 month, 1 week ago
Hi,

this series implements the receiver-side requirements for TCP window
retraction as specified in RFC 7323 and adds packetdrill tests to
cover the new behavior.

It addresses a regression with somewhat complex causes; see my message
"Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
(https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).

Please see the first patch for background and implementation details.

This is an RFC because a few open questions remain:

- Placement of the new rcv_mwnd_seq field in tcp_sock:

  rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
  tcp_select_window. However, rcv_wup is documented as RX read_write
  only (even though it is updated in tcp_select_window), and rcv_wnd
  is TX read_write / RX read_mostly.

  rcv_mwnd_seq is only updated in tcp_select_window and, as far as I
  can tell, is not used on the RX fast path.

  If I understand the placement rules correctly, this means that
  rcv_mwnd_seq, rcv_wup, and rcv_wnd end up in different cacheline
  groups, which feels odd. Guidance on where rcv_mwnd_seq should live
  would be appreciated.

- In tcp_minisocks.c, it is not clear to me whether we should change
  "tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
  "tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
  case where this makes a practical difference and have left the
  existing behavior unchanged.

- Packetdrill tests: Some of these seem rather brittle to me; I
  included them mostly to document what I have tested. Suggestions
  for making them more robust are welcome.

- MPTCP seems to modify tp->rcv_wnd of subflows. I haven't looked at
  this, since I wanted to get feedback on the overall approach first.

- Although this series addresses a regression triggered by commit
  d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), the underlying
  problem is shrinking the window. Thus I added "Fixes" headers for
  the commits that introduced window shrinking.

I would appreciate feedback on the overall approach and on these
questions.
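
For readers following the rcv_mwnd_seq and tcp_max_receive_window()
questions above, here is a minimal userspace sketch of the bookkeeping
involved. It is not the code from the patch. It assumes that rcv_mwnd_seq
records the highest right window edge ever advertised and that the maximum
window is measured from rcv_nxt to that edge; the struct and the model_*
helpers are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Simplified model of the receive-window fields; NOT struct tcp_sock. */
struct rcv_model {
	uint32_t rcv_nxt;      /* next sequence number expected */
	uint32_t rcv_wup;      /* rcv_nxt at the time of the last window update */
	uint32_t rcv_wnd;      /* window advertised at that time */
	uint32_t rcv_mwnd_seq; /* ASSUMPTION: highest right edge ever advertised */
};

/* Currently advertised window as seen from rcv_nxt, clamped at zero. */
static uint32_t model_receive_window(const struct rcv_model *m)
{
	int32_t win = (int32_t)(m->rcv_wup + m->rcv_wnd - m->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* ASSUMPTION: distance from rcv_nxt to the highest advertised right edge. */
static uint32_t model_max_receive_window(const struct rcv_model *m)
{
	int32_t win = (int32_t)(m->rcv_mwnd_seq - m->rcv_nxt);

	return win < 0 ? 0 : (uint32_t)win;
}

/* Record a newly selected window; the tracked right edge never moves left. */
static void model_select_window(struct rcv_model *m, uint32_t new_wnd)
{
	uint32_t edge;

	m->rcv_wup = m->rcv_nxt;
	m->rcv_wnd = new_wnd;
	edge = m->rcv_wup + m->rcv_wnd;
	/* Signed difference keeps the comparison safe across wraparound. */
	if ((int32_t)(edge - m->rcv_mwnd_seq) > 0)
		m->rcv_mwnd_seq = edge;
}
```

In this model, a retraction lowers the value returned by
model_receive_window() while model_max_receive_window() still reflects the
earlier, larger advertisement. That gap is what the tcp_minisocks.c
question is about.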

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
Simon Baatz (4):
      tcp: implement RFC 7323 window retraction receiver requirements
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
      selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt

 .../networking/net_cachelines/tcp_sock.rst         |   1 +
 include/linux/tcp.h                                |   1 +
 include/net/tcp.h                                  |  14 +++
 net/ipv4/tcp_fastopen.c                            |   1 +
 net/ipv4/tcp_input.c                               |   6 +-
 net/ipv4/tcp_minisocks.c                           |   1 +
 net/ipv4/tcp_output.c                              |  12 +++
 .../net/packetdrill/tcp_rcv_big_endseq.pkt         |   2 +-
 .../packetdrill/tcp_rcv_toobig_back_to_back.pkt    |  27 +++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt |  35 +++++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 109 +++++++++++++++++++++
 11 files changed, 206 insertions(+), 3 deletions(-)
---
base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde

Best regards,
-- 
Simon Baatz <gmbnomis@gmail.com>
Re: [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
Posted by Eric Dumazet 1 month, 1 week ago
On Fri, Feb 20, 2026 at 12:56 AM Simon Baatz via B4 Relay
<devnull+gmbnomis.gmail.com@kernel.org> wrote:
>
> Hi,
>
> this series implements the receiver-side requirements for TCP window
> retraction as specified in RFC 7323 and adds packetdrill tests to
> cover the new behavior.
>
> It addresses a regression with somewhat complex causes; see my message
> "Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
> (https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).
>
> Please see the first patch for background and implementation details.
>
> This is an RFC because a few open questions remain:
>
> - Placement of the new rcv_mwnd_seq field in tcp_sock:
>
>   rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
>   tcp_select_window. However, rcv_wup is documented as RX read_write
>   only (even though it is updated in tcp_select_window), and rcv_wnd
>   is TX read_write / RX read_mostly.
>
>   rcv_mwnd_seq is only updated in tcp_select_window and, as far as I
>   can tell, is not used on the RX fast path.
>
>   If I understand the placement rules correctly, this means that
>   rcv_mwnd_seq, rcv_wup, and rcv_wnd end up in different cacheline
>   groups, which feels odd. Guidance on where rcv_mwnd_seq should live
>   would be appreciated.
>
> - In tcp_minisocks.c, it is not clear to me whether we should change
>   "tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
>   "tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
>   case where this makes a practical difference and have left the
>   existing behavior unchanged.
>
> - Packetdrill tests: Some of these seem rather brittle to me; I
>   included them mostly to document what I have tested. Suggestions
>   for making them more robust are welcome.
>
> - MPTCP seems to modify tp->rcv_wnd of subflows. I haven't looked at
>   this, since I wanted to get feedback on the overall approach first.
>
> - Although this series addresses a regression triggered by commit
>   d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), the underlying
>   problem is shrinking the window. Thus I added "Fixes" headers for
>   the commits that introduced window shrinking.
>
> I would appreciate feedback on the overall approach and on these
> questions.
>

Hi Simon, thanks for the clean series.

I would guess you use some AI ? This is fine, just curious.

Can you add more tests, in memory stress situations ?

Like :

A receiver grew the RWIN over time up to 8 MB.

Then the application (or the kernel under stress) set SO_RCVBUF to 16K.

I want to make sure the socket won't accept packets to fill the prior
window and consume 8MB.

8MB seems fine, unless the host has 100,000 sockets in the same situation.
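
The scale of the concern can be checked with a small calculation. The
sketch below assumes that "8MB" and "16K" mean 8 MiB and 16 KiB and that
every socket is filled to the respective limit; total_rcv_bytes() is a
hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/*
 * Worst-case receive memory if every socket queues per_socket_bytes.
 * 64-bit arithmetic avoids overflow for the values discussed here.
 */
static uint64_t total_rcv_bytes(uint64_t per_socket_bytes, uint64_t sockets)
{
	return per_socket_bytes * sockets;
}
```

Under these assumptions, total_rcv_bytes(8ULL << 20, 100000) is
838,860,800,000 bytes (about 781 GiB), while
total_rcv_bytes(16ULL << 10, 100000) is 1,638,400,000 bytes (about 1.5 GiB).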

Thanks

> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> ---
> Simon Baatz (4):
>       tcp: implement RFC 7323 window retraction receiver requirements
>       selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
>       selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
>       selftests/net: packetdrill: add tcp_rcv_toobig_back_to_back.pkt
>
>  .../networking/net_cachelines/tcp_sock.rst         |   1 +
>  include/linux/tcp.h                                |   1 +
>  include/net/tcp.h                                  |  14 +++
>  net/ipv4/tcp_fastopen.c                            |   1 +
>  net/ipv4/tcp_input.c                               |   6 +-
>  net/ipv4/tcp_minisocks.c                           |   1 +
>  net/ipv4/tcp_output.c                              |  12 +++
>  .../net/packetdrill/tcp_rcv_big_endseq.pkt         |   2 +-
>  .../packetdrill/tcp_rcv_toobig_back_to_back.pkt    |  27 +++++
>  .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt |  35 +++++++
>  .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 109 +++++++++++++++++++++
>  11 files changed, 206 insertions(+), 3 deletions(-)
> ---
> base-commit: 8bf22c33e7a172fbc72464f4cc484d23a6b412ba
> change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde
>
> Best regards,
> --
> Simon Baatz <gmbnomis@gmail.com>
>
>
Re: [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
Posted by Simon Baatz 1 month, 1 week ago
Hi Eric,

On Fri, Feb 20, 2026 at 09:58:00AM +0100, Eric Dumazet wrote:
> Hi Simon, thanks for the clean series.
> 
> I would guess you use some AI ? This is fine, just curious.

Thank you!  Yes, I’ve found AI helpful for getting familiar with a
new code base.  I also use it to refine or clean up the wording of
bigger commit messages.  Code generation works quite well for quick,
throw‑away code (like reproducers).
 
> Can you add more tests, in memory stress situations ?
> 
> Like :
> 
> A receiver grew the RWIN over time up to 8 MB.
> 
> Then the application (or the kernel under stress) set SO_RCVBUF to 16K.
> 
> I want to make sure the socket won't accept packets to fill the prior
> window and consume 8MB.

I suspect generating 8 MB worth of RX data in packetdrill won't be
fun (unless there’s a trick I’m missing).  And using regular TCP
sockets on both ends would probably be rather uninteresting (no
packets sent once RWIN = 0).

It might be more practical to extend one of the tests to create two
situations in packetdrill:

1. Zero window:  0 == RWIN < 2 * squeezed SO_RCVBUF < tracked max. RWIN < 2 * original SO_RCVBUF
2. Small window: 0  < RWIN < 2 * squeezed SO_RCVBUF < tracked max. RWIN < 2 * original SO_RCVBUF

If these limits are sufficiently distinct, we could probe the
tcp_sequence() and tcp_data_queue() paths in detail using:
  
* pure ACK or data packet
* in-order or out-of-order
* within, partially within, or beyond (max) window

If we can show that we can't use more memory than expected for the
squeezed buffer, then the original max window size shouldn’t really
matter.
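
As an illustration of the "within, partially within, or beyond" dimension
of the matrix above, here is a minimal userspace sketch that classifies a
segment against a window. It is not kernel code and does not reproduce the
checks in tcp_sequence(); enum seg_pos, seq_before() and classify_segment()
are hypothetical names, and the same helper can be applied to either the
current window or the tracked maximum window.

```c
#include <stdint.h>

enum seg_pos { SEG_WITHIN, SEG_PARTIAL, SEG_BEYOND };

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/*
 * Classify the segment [seq, seq + len) against a window of wnd bytes
 * that starts at rcv_nxt. Only the right window edge is considered.
 */
static enum seg_pos classify_segment(uint32_t seq, uint32_t len,
				     uint32_t rcv_nxt, uint32_t wnd)
{
	uint32_t right_edge = rcv_nxt + wnd;
	uint32_t end_seq = seq + len;

	if (!seq_before(right_edge, end_seq))
		return SEG_WITHIN;   /* ends at or before the right edge */
	if (seq_before(seq, right_edge))
		return SEG_PARTIAL;  /* starts inside, ends past the edge */
	return SEG_BEYOND;           /* starts at or past the right edge */
}
```

A pure ACK is the len == 0 case: with a zero window and seq == rcv_nxt it
is classified as SEG_WITHIN, whereas any data segment at that position is
SEG_BEYOND.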

wdyt?

- Simon

-- 
Simon Baatz <gmbnomis@gmail.com>
Re: [PATCH RFC net-next 0/4] tcp: RFC 7323-compliant window retraction handling
Posted by Eric Dumazet 1 month, 1 week ago
On Mon, Feb 23, 2026 at 1:07 AM Simon Baatz <gmbnomis@gmail.com> wrote:
>
> Hi Eric,
>
> On Fri, Feb 20, 2026 at 09:58:00AM +0100, Eric Dumazet wrote:
> > Hi Simon, thanks for the clean series.
> >
> > I would guess you use some AI ? This is fine, just curious.
>
> Thank you!  Yes, I’ve found AI helpful for getting familiar with a
> new code base.  I also use it to refine or clean up the wording of
> bigger commit messages.  Code generation works quite well for quick,
> throw‑away code (like reproducers).
>
> > Can you add more tests, in memory stress situations ?
> >
> > Like :
> >
> > A receiver grew the RWIN over time up to 8 MB.
> >
> > Then the application (or the kernel under stress) set SO_RCVBUF to 16K.
> >
> > I want to make sure the socket won't accept packets to fill the prior
> > window and consume 8MB.
>
> I suspect generating 8 MB worth of RX data in packetdrill won't be
> fun (unless there’s a trick I’m missing).  And using regular TCP
> sockets on both ends would probably be rather uninteresting (no
> packets sent once RWIN = 0).
>

8MB was only to show my point.

A packetdrill test reaching 1MB should be doable.