This series keeps sender-visible TCP receive-window accounting tied to the
scaling basis that was in force when the window was advertised.

Problem
-------

`tp->rcv_wnd` is an advertised promise to the sender, but later
receive-memory admission and clamping could reconstruct that promise
through the mutable live `scaling_ratio`. After ratio drift, the stack
could retain or advertise a receive window that no longer matched the
local hard rmem budget.

Fix
---

- store the advertise-time scaling basis alongside `tp->rcv_wnd`
- refresh that pair at the TCP and MPTCP receive-window write sites
- consume the snapshot in receive-memory admission, clamping, and the
  scaled-window quantization path
- preserve the snapshot across `TCP_REPAIR_WINDOW` restore when userspace
  provides it, and fall back safely when legacy userspace cannot
- expose the accounting in tracepoints and cover the ABI/runtime contract
  in selftests

Series layout
-------------

1. track the receive-window snapshot state and helpers
2. refresh the snapshot when TCP advertises or initializes windows
3. use the snapshot in receive-memory admission and clamping
4. extend `TCP_REPAIR_WINDOW` for exact restore plus legacy compatibility
5. refresh the TCP shadow window snapshot in MPTCP
6. expose rmem/backlog state in `rcvbuf_grow` tracepoints
7. cover legacy and extended repair-window layouts in selftests

Testing
-------

- `git diff --check origin/main..HEAD`
- `scripts/checkpatch.pl --strict --show-types` on patches 1-7
- `make -j8 headers`
- `make -j8 net/ipv4/tcp_input.o net/ipv4/tcp_output.o net/ipv4/tcp_minisocks.o net/ipv4/tcp.o`
- `make -j8 C=1 CF='-D__CHECK_ENDIAN__' W=1 net/ipv4/tcp_input.o net/ipv4/tcp_output.o net/ipv4/tcp_minisocks.o net/ipv4/tcp.o`
- `make SPHINXDIRS='networking/net_cachelines' htmldocs`
- `make -j8 vmlinux bzImage modules`
- `make -C tools/testing/selftests/net/tcp_ao -j8`
- `make -C tools/testing/selftests/net/mptcp -j8`
- `packetdrill --dry_run` for `tcp_rcv_toobig.pkt` and
  `tcp_rcv_toobig_default.pkt`
- `virtme-run` guest pass for both packetdrill tests
- feature-enabled guest pass for `restore_ipv4`, `self-connect_ipv4`, and
  `mptcp_sockopt.sh`

Thanks,
Wesley

---
base-commit: 908c344d5cfa0ee6efb3226d22ea661e078ebfa0
--
2.43.0
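
For readers following along, the drift described under "Problem" can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions, not code from the series: `struct rwnd_snapshot`, `advertise()`
and `snapshot_space()` are hypothetical names, and the 8-bit fixed-point ratio
(window = space * ratio / 256) is assumed to mirror how the kernel converts
between receive memory and window bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift: ratio is expressed in 1/256 units. */
#define RMEM_TO_WIN_SCALE 8

/* Hypothetical model: the advertised window plus its advertise-time basis. */
struct rwnd_snapshot {
	uint32_t rcv_wnd;       /* payload bytes promised to the sender */
	uint8_t  scaling_ratio; /* ratio in force when rcv_wnd was advertised */
};

/* Payload bytes that fit in `space` bytes of receive memory at `ratio`. */
static uint32_t win_from_space(uint32_t space, uint8_t ratio)
{
	return (uint32_t)(((uint64_t)space * ratio) >> RMEM_TO_WIN_SCALE);
}

/* Receive memory needed to back `win` payload bytes at `ratio`. */
static uint32_t space_from_win(uint32_t win, uint8_t ratio)
{
	assert(ratio != 0);
	return (uint32_t)(((uint64_t)win << RMEM_TO_WIN_SCALE) / ratio);
}

/* Record the window together with the ratio used to compute it. */
static struct rwnd_snapshot advertise(uint32_t space, uint8_t live_ratio)
{
	struct rwnd_snapshot snap = {
		.rcv_wnd = win_from_space(space, live_ratio),
		.scaling_ratio = live_ratio,
	};
	return snap;
}

/* Memory cost of the promise, evaluated on the snapshot basis. */
static uint32_t snapshot_space(const struct rwnd_snapshot *snap)
{
	return space_from_win(snap->rcv_wnd, snap->scaling_ratio);
}
```

In this model, a window advertised from 131072 bytes of space at ratio 128
costs 131072 bytes when evaluated on the snapshot, but the same window
re-evaluated through a live ratio that has drifted to 64 appears to cost
262144 bytes. Keeping the pair together avoids that disagreement.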
Hi Wesley,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22943222289
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/1943750d8521
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1064867
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
On Wed, Mar 11, 2026 at 8:56 AM Wesley Atwell <atwellwea@gmail.com> wrote:
>
> This series keeps sender-visible TCP receive-window accounting tied to the
> scaling basis that was in force when the window was advertised.
>
> Problem
> -------
>
> `tp->rcv_wnd` is an advertised promise to the sender, but later
> receive-memory admission and clamping could reconstruct that promise
> through the mutable live `scaling_ratio`. After ratio drift, the stack
> could retain or advertise a receive window that no longer matched the
> local hard rmem budget.
>
> Fix
> ---
>
> - store the advertise-time scaling basis alongside `tp->rcv_wnd`
> - refresh that pair at the TCP and MPTCP receive-window write sites
> - consume the snapshot in receive-memory admission, clamping, and the
>   scaled-window quantization path
> - preserve the snapshot across `TCP_REPAIR_WINDOW` restore when userspace
>   provides it, and fall back safely when legacy userspace cannot
> - expose the accounting in tracepoints and cover the ABI/runtime contract
>   in selftests
>

Your series will heavily conflict with Simon's one

https://patchwork.kernel.org/project/netdevbpf/list/?series=1063486&state=%2A&archive=both

I suggest you rebase/retest/resend after we merge it.

> Series layout
> -------------
>
> 1. track the receive-window snapshot state and helpers
> 2. refresh the snapshot when TCP advertises or initializes windows
> 3. use the snapshot in receive-memory admission and clamping
> 4. extend `TCP_REPAIR_WINDOW` for exact restore plus legacy compatibility
> 5. refresh the TCP shadow window snapshot in MPTCP
> 6. expose rmem/backlog state in `rcvbuf_grow` tracepoints
> 7. cover legacy and extended repair-window layouts in selftests
>
> Testing
> -------
>
> - `git diff --check origin/main..HEAD`
> - `scripts/checkpatch.pl --strict --show-types` on patches 1-7
> - `make -j8 headers`
> - `make -j8 net/ipv4/tcp_input.o net/ipv4/tcp_output.o net/ipv4/tcp_minisocks.o net/ipv4/tcp.o`
> - `make -j8 C=1 CF='-D__CHECK_ENDIAN__' W=1 net/ipv4/tcp_input.o net/ipv4/tcp_output.o net/ipv4/tcp_minisocks.o net/ipv4/tcp.o`
> - `make SPHINXDIRS='networking/net_cachelines' htmldocs`
> - `make -j8 vmlinux bzImage modules`
> - `make -C tools/testing/selftests/net/tcp_ao -j8`
> - `make -C tools/testing/selftests/net/mptcp -j8`
> - `packetdrill --dry_run` for `tcp_rcv_toobig.pkt` and
>   `tcp_rcv_toobig_default.pkt`
> - `virtme-run` guest pass for both packetdrill tests
> - feature-enabled guest pass for `restore_ipv4`, `self-connect_ipv4`, and
>   `mptcp_sockopt.sh`
>
> Thanks,
> Wesley
>
> ---
> base-commit: 908c344d5cfa0ee6efb3226d22ea661e078ebfa0
> --
> 2.43.0
>
On Wed, 11 Mar 2026 09:34:32 +0100 Eric Dumazet wrote:
> Your series will heavily conflict with Simon's one
>
> https://patchwork.kernel.org/project/netdevbpf/list/?series=1063486&state=%2A&archive=both
>
> I suggest you rebase/retest/resend after we merge it.

Would it make sense to extend netdevsim and packetdrill to be able to
exercise scaling ratio a little more? Having it optionally clone the
skb and truesize += X would be trivial. IDK how many bugs this would
let us catch tho :(
On Thu, Mar 12, 2026 at 1:41 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 11 Mar 2026 09:34:32 +0100 Eric Dumazet wrote:
> > Your series will heavily conflict with Simon's one
> >
> > https://patchwork.kernel.org/project/netdevbpf/list/?series=1063486&state=%2A&archive=both
> >
> > I suggest you rebase/retest/resend after we merge it.
>
> Would it make sense to extend netdevsim and packetdrill to be able to
> exercise scaling ratio a little more? Having it optionally clone the
> skb and truesize += X would be trivial. IDK how many bugs this would
> let us catch tho :(

Yes, I think we mentioned this at some point.

packetdrill uses tun device.

Adding a TUN ioctl() to control how many additional bytes are added to
skb->truesize after tun allocates an skb is doable.
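
As a rough illustration of why inflating truesize would exercise the scaling
ratio, the relationship can be modeled in userspace. This is a sketch under
assumptions, not kernel or tun code: `ratio_from_skb()` and
`ratio_with_extra()` are hypothetical helpers, and the formula (payload length
over truesize, in 1/256 units, never zero) is assumed to approximate how the
receive path derives the ratio. No real TUN ioctl is shown, since none exists
for this yet.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point shift: ratio is expressed in 1/256 units. */
#define RMEM_TO_WIN_SCALE 8

/*
 * Hypothetical model: ratio of payload bytes to the memory charged for
 * the buffer. truesize must exceed len so the result fits in 8 bits.
 */
static uint8_t ratio_from_skb(uint32_t len, uint32_t truesize)
{
	uint64_t val;

	assert(truesize > len);
	val = ((uint64_t)len << RMEM_TO_WIN_SCALE) / truesize;
	return val ? (uint8_t)val : 1; /* clamp to 1 so the ratio is never 0 */
}

/* Same packet, with `extra` bytes added to truesize by the test device. */
static uint8_t ratio_with_extra(uint32_t len, uint32_t truesize,
				uint32_t extra)
{
	return ratio_from_skb(len, truesize + extra);
}
```

In this model a 1448-byte payload in a 2304-byte buffer yields a ratio of 160,
and adding 2304 extra bytes of truesize halves it to 80, which is the kind of
drift a test knob like the one discussed here would let packetdrill provoke on
demand.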
On Wed, 11 Mar 2026 01:55:53 -0600 Wesley Atwell wrote:
> Subject: [PATCH net 0/7] tcp: preserve advertised rwnd accounting across receive-memory decisions

when you repost please make sure you use "PATCH net-next v2" as
the tag / prefix. "net" is a tree we use to fast track fixes.