[PATCH RFC net-next v2 0/5] tcp: RFC 7323-compliant window retraction handling

Simon Baatz via B4 Relay posted 5 patches 1 month, 1 week ago
There is a newer version of this series
Posted by Simon Baatz via B4 Relay 1 month, 1 week ago
Hi,

this series implements the receiver-side requirements for TCP window
retraction as specified in RFC 7323 and adds packetdrill tests to
cover the new behavior.

It addresses a regression with somewhat complex causes; see my message
"Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
(https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).

Please see the first patch for background and implementation details.

This is an RFC because a few open questions remain:

- Placement of the new rcv_mwnd_seq field in tcp_sock:

  rcv_mwnd_seq is updated together with rcv_wup and rcv_wnd in
  tcp_select_window(). However, rcv_wup is documented as RX read_write
  only (even though it is updated in tcp_select_window()), and rcv_wnd
  is TX read_write / RX read_mostly.

  rcv_mwnd_seq is only updated in tcp_select_window(). If we
  count tcp_sequence() as fast path, it is read in the fast path.

  Therefore, the proposal is to put rcv_mwnd_seq in rcv_wnd's
  cacheline group.

- In tcp_minisocks.c, it is not clear to me whether we should change
  "tcptw->tw_rcv_wnd = tcp_receive_window(tp)" to
  "tcptw->tw_rcv_wnd = tcp_max_receive_window(tp)". I could not find a
  case where this makes a practical difference and have left the
  existing behavior unchanged.

- MPTCP seems to modify tp->rcv_wnd of subflows. And the modifications
  look odd:

  1. It is updated in the RX path. Since we never advertised that
     value, we shouldn't need to update rcv_mwnd_seq.
  2. In the TX path, there is:
  
     tp->rcv_wnd = min_t(u64, win, U32_MAX);

     To me, that looks very wrong and that code might need to be fixed
     first.

- Although this series addresses a regression triggered by commit
  d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), the underlying
  problem is shrinking the window. Thus, I added "Fixes" headers for
  the commits that introduced window shrinking.
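
For readers who do not have the first patch at hand, the idea behind
rcv_mwnd_seq can be sketched in userspace C roughly as below. This is
only an illustration under assumptions, not the code from the series:
struct rx_state and the three helper functions are hypothetical names,
and the computation in max_receive_window() is my guess at what
tcp_max_receive_window() returns. Only the field names rcv_nxt,
rcv_wup, rcv_wnd and rcv_mwnd_seq come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical stand-in for the relevant tcp_sock fields. */
struct rx_state {
    uint32_t rcv_nxt;      /* next sequence number expected */
    uint32_t rcv_wup;      /* rcv_nxt at the last window update sent */
    uint32_t rcv_wnd;      /* window advertised in that update */
    uint32_t rcv_mwnd_seq; /* highest right edge ever advertised */
};

/* Record a newly advertised window; the maximum edge never moves back. */
static void advertise_window(struct rx_state *s, uint32_t new_wnd)
{
    uint32_t edge;

    s->rcv_wup = s->rcv_nxt;
    s->rcv_wnd = new_wnd;
    edge = s->rcv_wup + s->rcv_wnd;
    if (seq_after(edge, s->rcv_mwnd_seq))
        s->rcv_mwnd_seq = edge;
}

/* Window relative to rcv_nxt based on the latest advertisement. */
static uint32_t receive_window(const struct rx_state *s)
{
    int32_t win = (int32_t)(s->rcv_wup + s->rcv_wnd - s->rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}

/* Window relative to rcv_nxt based on the highest edge ever advertised
 * (assumed semantics, see the note above).
 */
static uint32_t max_receive_window(const struct rx_state *s)
{
    int32_t win = (int32_t)(s->rcv_mwnd_seq - s->rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}
```

After advertising 64KB and then shrinking to 16KB without receiving
data, receive_window() reports the shrunken value while
max_receive_window() still covers everything a sender could
legitimately have in flight.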

I would appreciate feedback on the overall approach and on these
questions.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
---
Changes in v2:

- tcp_rcv_wnd_shrink_nomem.pkt tests more RX code paths using various
  segment types. It also uses a more drastic rcv. buffer reduction (1MB
  to 16KB).
- Setting the TCP_REPAIR_WINDOW socket option initializes rcv_mwnd_seq.
- SKB_DROP_REASON_TCP_OVERWINDOW increases LINUX_MIB_BEYOND_WINDOW now.
- Moved rcv_mwnd_seq into rcv_wnd's cacheline group.
- Small editorial changes
- Link to v1: https://lore.kernel.org/r/20260220-tcp_rfc7323_retract_wnd_rfc-v1-0-904942561479@gmail.com

---
Simon Baatz (5):
      tcp: implement RFC 7323 window retraction receiver requirements
      tcp: increase LINUX_MIB_BEYOND_WINDOW for SKB_DROP_REASON_TCP_OVERWINDOW
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_nomem.pkt
      selftests/net: packetdrill: add tcp_rcv_wnd_shrink_allowed.pkt
      selftests/net: packetdrill: add tcp_rcv_neg_window.pkt

 .../networking/net_cachelines/tcp_sock.rst         |   1 +
 include/linux/tcp.h                                |   3 +
 include/net/tcp.h                                  |  13 ++
 net/ipv4/tcp.c                                     |   1 +
 net/ipv4/tcp_fastopen.c                            |   1 +
 net/ipv4/tcp_input.c                               |   7 +-
 net/ipv4/tcp_minisocks.c                           |   1 +
 net/ipv4/tcp_output.c                              |  12 ++
 .../net/packetdrill/tcp_rcv_big_endseq.pkt         |   2 +-
 .../net/packetdrill/tcp_rcv_neg_window.pkt         |  26 ++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_allowed.pkt |  40 ++++++
 .../net/packetdrill/tcp_rcv_wnd_shrink_nomem.pkt   | 141 +++++++++++++++++++++
 12 files changed, 245 insertions(+), 3 deletions(-)
---
base-commit: 2f61f38a217462411fed950e843b82bc119884cf
change-id: 20260220-tcp_rfc7323_retract_wnd_rfc-c8a2d2baebde

Best regards,
-- 
Simon Baatz <gmbnomis@gmail.com>
Re: [PATCH RFC net-next v2 0/5] tcp: RFC 7323-compliant window retraction handling
Posted by Matthieu Baerts 1 month, 1 week ago
Hi Simon,

On 26/02/2026 01:49, Simon Baatz via B4 Relay wrote:
> this series implements the receiver-side requirements for TCP window
> retraction as specified in RFC 7323 and adds packetdrill tests to
> cover the new behavior.

Thank you for looking at that.

> It addresses a regression with somewhat complex causes; see my message
> "Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
> (https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).
> 
> Please see the first patch for background and implementation details.
> 
> This is an RFC because a few open questions remain:

(...)

> - MPTCP seems to modify tp->rcv_wnd of subflows. And the modifications
>   look odd:
> 
>   1. It is updated in the RX path. Since we never advertised that
>      value, we shouldn't need to update rcv_mwnd_seq.

FYI, with MPTCP the received windows are shared between subflows. This
might be surprising, but maintaining per-subflow receive windows could
end up stalling some subflows while others would not use up their
window. For more details, please check this section of the RFC:

  https://datatracker.ietf.org/doc/html/rfc8684#sec_rwin

>   2. In the TX path, there is:
>   
>      tp->rcv_wnd = min_t(u64, win, U32_MAX);
> 
>      To me, that looks very wrong and that code might need to be fixed
>      first.

The capping is explained because the MPTCP-level ack seq is on 64-bit,
while the TCP level receive window is on 32-bit.
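
As a small userspace illustration of what that clamp does (a sketch
only; clamp_win_to_u32() is a hypothetical helper that mirrors
min_t(u64, win, U32_MAX), not a kernel function):

```c
#include <stdint.h>

/* Saturate a 64-bit window value to what fits in a 32-bit field.
 * A plain cast would silently drop the upper 32 bits instead.
 */
static uint32_t clamp_win_to_u32(uint64_t win)
{
    return win < UINT32_MAX ? (uint32_t)win : UINT32_MAX;
}
```

Without the clamp, a 64-bit value just above 2^32 would wrap around to
a tiny 32-bit window.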

I hope this helps better understanding these modifications, and
hopefully not introducing regressions on the MPTCP side :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH RFC net-next v2 0/5] tcp: RFC 7323-compliant window retraction handling
Posted by Simon Baatz 1 month, 1 week ago
Hi Matt,

On Thu, Feb 26, 2026 at 09:12:07AM +0100, Matthieu Baerts wrote:
> Hi Simon,
> 
> On 26/02/2026 01:49, Simon Baatz via B4 Relay wrote:
> > this series implements the receiver-side requirements for TCP window
> > retraction as specified in RFC 7323 and adds packetdrill tests to
> > cover the new behavior.
> 
> Thank you for looking at that.

Thank you for chiming in; I know that my comments are somewhat
provocative. :)

> > It addresses a regression with somewhat complex causes; see my message
> > "Re: [regression] [PATCH net-next 7/8] tcp: stronger sk_rcvbuf checks"
> > (https://lkml.kernel.org/netdev/aXaHEk_eRJyhYfyM@gandalf.schnuecks.de/).
> > 
> > Please see the first patch for background and implementation details.
> > 
> > This is an RFC because a few open questions remain:
> 
> (...)
> 
> > - MPTCP seems to modify tp->rcv_wnd of subflows. And the modifications
> >   look odd:
> > 
> >   1. It is updated in the RX path. Since we never advertised that
> >      value, we shouldn't need to update rcv_mwnd_seq.
> 
> FYI, with MPTCP the received windows are shared between subflows. This
> might be surprising, but maintaining per-subflow receive windows could
> end up stalling some subflows while others would not use up their
> window. For more details, please check this section of the RFC:
> 
>   https://datatracker.ietf.org/doc/html/rfc8684#sec_rwin

RFC 8684 has several pointers to RFC 5961 and in section 3.3.5 it
says:

                                                 ... Each of these
   segments will be mapped onto subflows, as long as subflow sequence
   numbers are in the allowed windows for those subflows.  Note that

So, I assume that at the subflow level we are still supposed to do
the standard TCP sequence acceptability checks with respect to
the advertised window for the subflow.

If so, my concern is that raising rcv_wnd in the RX path means that
we may accept sequence numbers that we never advertised in that
particular subflow.
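
To make the concern concrete, here is a simplified sketch of the
right-edge part of an acceptability check. It is an assumption-laden
illustration, not the kernel's tcp_sequence(): beyond_right_edge() is
a hypothetical helper and the left-edge test is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" for 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* True if the segment ends beyond rcv_nxt + window, i.e. it would be
 * rejected by the right-edge test.
 */
static bool beyond_right_edge(uint32_t end_seq, uint32_t rcv_nxt,
                              uint32_t window)
{
    return seq_after(end_seq, rcv_nxt + window);
}
```

With rcv_nxt = 0 and an advertised window of 1000, a segment ending
at 1500 is rejected. If rcv_wnd is raised to 2000 in the RX path, the
same segment passes the check although 1500 was never advertised on
that subflow.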

> 
> >   2. In the TX path, there is:
> >   
> >      tp->rcv_wnd = min_t(u64, win, U32_MAX);
> > 
> >      To me, that looks very wrong and that code might need to be fixed
> >      first.
> The capping is explained because the MPTCP-level ack seq is on 64-bit,
> while the TCP level receive window is on 32-bit.

The issues I see here are:

1. When calculating the usable receive window in TCP, we use 32-bit
   signed arithmetic.
2. The max. window size with window scaling is around 1GB (65535 << 14).
3. As said, rcv_wnd is used for acceptability checks.  In standard
   TCP we ensure that rcv_wnd is aligned to the window scaling
   factor.

So, I had expected to see the "reverse" of the current TX raise window
logic in MPTCP: First, calculate the advertised window to put into
the outgoing packet and then update rcv_wnd accordingly.
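
The three points can be illustrated with a short userspace sketch.
The helper names are hypothetical; usable_window() is modeled on the
signed computation in tcp_receive_window(), and the shift limit of 14
is the one from RFC 7323.

```c
#include <stdint.h>

#define MAX_WSCALE 14 /* largest window scale shift per RFC 7323 */

/* Largest window that a 16-bit window field can express. */
static uint32_t max_scaled_window(void)
{
    return (uint32_t)UINT16_MAX << MAX_WSCALE;
}

/* Usable window via 32-bit signed arithmetic; negative becomes 0.
 * The conversion to int32_t assumes the usual two's complement
 * behavior.
 */
static uint32_t usable_window(uint32_t rcv_wup, uint32_t rcv_wnd,
                              uint32_t rcv_nxt)
{
    int32_t win = (int32_t)(rcv_wup + rcv_wnd - rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}

/* Round a window down to a value the scaled 16-bit field can carry. */
static uint32_t align_to_wscale(uint32_t wnd, unsigned int wscale)
{
    return (wnd >> wscale) << wscale;
}
```

With rcv_wnd set to U32_MAX and rcv_wup equal to rcv_nxt, the signed
computation yields -1 and the usable window collapses to 0, whereas
the largest window that can actually be advertised is 1073725440
bytes.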
 
> I hope this helps better understanding these modifications, and
> hopefully not introducing regressions on the MPTCP side :)

Yes, thank you. Regarding regressions, I couldn't agree more.


- Simon
 
-- 
Simon Baatz <gmbnomis@gmail.com>