This is an attempt to fix the data transfer stall reported by Geliang and
Gang by more carefully enforcing memory constraints at the MPTCP level.

Patch 1/9 moves the bound check before entering the TCP socket.
Patches 2, 3 and 4 are cleanups/refactors aimed at safely reusing TCP
helpers on MPTCP skbs.
Patch 5 makes TCP pruning related helpers available to MPTCP and patch 6
makes use of them. Patch 7 addresses an edge scenario that could still
lead to a transfer stall under memory pressure.
Finally, patches 8 and 9 improve the MPTCP-level retransmission scheme to
make recovery from memory pressure significantly faster.

Note that the diffstat is biased by the quite large patch 4/9, which
contains a mechanical transformation of existing code; the "real" changes
are noticeably smaller.

Tested successfully against the test cases proposed by Geliang and Gang.
---
RFC -> v1:
 - dropped old patch 4 & 5
 - addressed AI reported comments
 - added retrans refactor.

Paolo Abeni (9):
  mptcp: move checks vs rcvbuf size earlier in the RX path
  mptcp: drop the mptcp_ooo_try_coalesce() helper
  mptcp: remove CB offset field
  mptcp: sync mptcp skb cb layout with tcp one
  tcp: expose the tcp_collapse_ofo_queue() helper to mptcp usage, too
  mptcp: implemented OoO queue pruning
  mptcp: track prune recovery status
  mptcp: move the retrans loop to a separate helper
  mptcp: let the retrans scheduler do its job.

 include/net/tcp.h    |   8 ++
 net/ipv4/tcp_input.c |  55 +++++---
 net/mptcp/fastopen.c |   1 -
 net/mptcp/mib.c      |   3 +
 net/mptcp/mib.h      |   3 +
 net/mptcp/options.c  |  55 +++++++-
 net/mptcp/protocol.c | 328 ++++++++++++++++++++++++++++--------------
 net/mptcp/protocol.h |  11 +-
 net/mptcp/subflow.c  |   2 +
 9 files changed, 323 insertions(+), 143 deletions(-)

--
2.53.0
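For readers unfamiliar with out-of-order (OoO) queue pruning, the idea
behind patches 5 and 6 can be pictured with a small userspace model. This
is only a sketch and not code from the series: struct ooo_seg, struct
ooo_queue and ooo_prune() are hypothetical names, the queue is a plain
sorted array rather than an rbtree, and only OoO memory is accounted. It
assumes a tail-first drop order, loosely modeled on how TCP's
tcp_prune_ofo_queue() starts from the last OoO skb; the actual MPTCP
implementation may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one out-of-order segment. */
struct ooo_seg {
	uint32_t seq;		/* first sequence number covered */
	uint32_t truesize;	/* memory charged for this segment */
};

/* Hypothetical receive-side accounting; segs[] is sorted by ascending seq. */
struct ooo_queue {
	struct ooo_seg *segs;
	size_t nr;		/* number of queued segments */
	uint32_t rmem_alloc;	/* memory currently charged to the queue */
	uint32_t rcvbuf;	/* limit to enforce */
};

/*
 * Drop segments from the tail of the queue (highest sequence first) until
 * the charged memory fits within rcvbuf or the queue is empty.
 * Returns the number of dropped segments.
 */
static size_t ooo_prune(struct ooo_queue *q)
{
	size_t dropped = 0;

	while (q->nr && q->rmem_alloc > q->rcvbuf) {
		const struct ooo_seg *last = &q->segs[q->nr - 1];

		/* The accounting must never go negative. */
		assert(q->rmem_alloc >= last->truesize);
		q->rmem_alloc -= last->truesize;
		q->nr--;
		dropped++;
	}
	return dropped;
}
```

In this model the segments farthest from the in-order point go first, and
the peer has to retransmit whatever was dropped, which is why the speed of
MPTCP-level retransmission matters for recovery.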
Hi Paolo,
Thanks for these fixes.
On Fri, 2026-04-24 at 16:08 +0200, Paolo Abeni wrote:
> This is an attempt to fix the data transfer stall reported by Geliang and
> Gang by more carefully enforcing memory constraints at the MPTCP level.
>
> Patch 1/9 moves the bound check before entering the TCP socket.
> Patches 2, 3 and 4 are cleanups/refactors aimed at safely reusing TCP
> helpers on MPTCP skbs.
> Patch 5 makes TCP pruning related helpers available to MPTCP and patch 6
> makes use of them. Patch 7 addresses an edge scenario that could still
> lead to a transfer stall under memory pressure.
> Finally, patches 8 and 9 improve the MPTCP-level retransmission scheme to
> make recovery from memory pressure significantly faster.
>
> Note that the diffstat is biased by the quite large patch 4/9, which
> contains a mechanical transformation of existing code; the "real" changes
> are noticeably smaller.
>
> Tested successfully against the test cases proposed by Geliang and Gang.
We found this issue while testing the MPTCP TLS selftests
(tools/testing/selftests/net/tls.c). The multi_chunk.c test was
actually extracted from the chunked_sendfile() function in tls.c. The
tls.c file contains many test groups, and chunked_sendfile() is just one
of them. Therefore, passing the chunked_sendfile tests does not guarantee
that all tests in tls.c will pass in the future. So we will shortly
provide you with a test similar to multi_chunk.c, but one that includes
a more complete set of the tests from tls.c.
> ---
> RFC -> v1:
> - dropped old patch 4 & 5
> - addressed AI reported comments
> - added retrans refactor.
>
> Paolo Abeni (9):
> mptcp: move checks vs rcvbuf size earlier in the RX path
> mptcp: drop the mptcp_ooo_try_coalesce() helper
> mptcp: remove CB offset field
When implementing MPTCP KTLS, I used this offset field. Please see the
implementation in [1], including mptcp_read_done() and
mptcp_get_skb_seq(). After removing the offset field, I think the new
version should be implemented as follows:
static void mptcp_read_done(struct sock *sk, size_t len)
{
	struct mptcp_sock *msk = mptcp_sk(sk);
	struct sk_buff *skb;
	size_t left;
	u32 offset;

	msk_owned_by_me(msk);

	if (sk->sk_state == TCP_LISTEN)
		return;

	left = len;
	while (left && (skb = mptcp_recv_skb(sk, &offset)) != NULL) {
		int used;

		used = min_t(size_t, skb->len - offset, left);
		msk->bytes_consumed += used;
		msk->copied_seq += used;
		left -= used;

		if (skb->len > offset + used)
			break;

		mptcp_eat_recv_skb(sk, skb);
	}

	mptcp_rcv_space_adjust(msk, len - left);

	/* Clean up data we have read: This will do ACK frames. */
	if (left != len)
		mptcp_cleanup_rbuf(msk, len - left);
}

static u32 mptcp_get_skb_seq(struct sk_buff *skb)
{
	return MPTCP_SKB_CB(skb)->map_seq;
}
But unfortunately, after this modification, with all your patches in
this set, the newly added MPTCP test cases in tls.c did not all pass.
Are there any obvious issues with my modification in these two
functions?
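To make the intended semantics of the loop above easier to discuss, here
is a rough userspace model of it. It is a sketch under stated assumptions,
not kernel code: struct model_buf, struct model_sock, model_recv_buf() and
model_read_done() are hypothetical names, and the model assumes that,
without a per-skb offset field, the offset is derived as copied_seq minus
map_seq, in the same way tcp_recv_skb() derives it from the TCP sequence
numbers. Whether mptcp_recv_skb() behaves exactly like this is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb: only the fields the loop needs. */
struct model_buf {
	uint32_t map_seq;	/* sequence number of the first byte */
	uint32_t len;		/* payload length in bytes */
};

/* Hypothetical stand-in for the receive side of a socket. */
struct model_sock {
	const struct model_buf *queue;	/* in-order receive queue */
	size_t nr;			/* number of queued buffers */
	size_t head;			/* index of first unconsumed buffer */
	uint32_t copied_seq;		/* next sequence number to consume */
};

/*
 * Return the buffer at the head of the queue and store in *offset how many
 * of its bytes were already consumed, derived from the sequence numbers
 * instead of a per-buffer offset field. Returns NULL on an empty queue.
 */
static const struct model_buf *model_recv_buf(struct model_sock *sk,
					      uint32_t *offset)
{
	while (sk->head < sk->nr) {
		const struct model_buf *buf = &sk->queue[sk->head];

		/* Unsigned subtraction keeps working across sequence wrap. */
		*offset = sk->copied_seq - buf->map_seq;
		if (*offset < buf->len)
			return buf;

		/* Fully consumed buffer: skip it. */
		sk->head++;
	}
	return NULL;
}

/* Consume up to len bytes; returns the number of bytes actually consumed. */
static size_t model_read_done(struct model_sock *sk, size_t len)
{
	const struct model_buf *buf;
	size_t left = len;
	uint32_t offset;

	while (left && (buf = model_recv_buf(sk, &offset)) != NULL) {
		size_t avail, used;

		assert(offset < buf->len);
		avail = buf->len - offset;
		used = avail < left ? avail : left;

		sk->copied_seq += (uint32_t)used;
		left -= used;

		/* A partially consumed buffer stays queued. */
		if (used < avail)
			break;
		sk->head++;
	}
	return len - left;
}
```

In this model, consuming 150 bytes from two queued 100-byte buffers
releases the first buffer and leaves the second one queued with an offset
of 50, so the next reader resumes from copied_seq without any stored
offset.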
Thanks again, and I will continue to follow up and test this series.
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/patch/b86c642262c9718f4936ad52dab804b8f494aa6d.1777026753.git.tanggeliang@kylinos.cn/
> mptcp: sync mptcp skb cb layout with tcp one
> tcp: expose the tcp_collapse_ofo_queue() helper to mptcp usage, too
> mptcp: implemented OoO queue pruning
> mptcp: track prune recovery status
> mptcp: move the retrans loop to a separate helper
> mptcp: let the retrans scheduler do its job.
>
> include/net/tcp.h | 8 ++
> net/ipv4/tcp_input.c | 55 +++++---
> net/mptcp/fastopen.c | 1 -
> net/mptcp/mib.c | 3 +
> net/mptcp/mib.h | 3 +
> net/mptcp/options.c | 55 +++++++-
> net/mptcp/protocol.c | 328 ++++++++++++++++++++++++++++--------------
> net/mptcp/protocol.h | 11 +-
> net/mptcp/subflow.c | 2 +
> 9 files changed, 323 insertions(+), 143 deletions(-)
Hi Paolo,
On Mon, 2026-04-27 at 15:27 +0800, Geliang Tang wrote:
> Hi Paolo,
>
> Thanks for these fixes.
>
> On Fri, 2026-04-24 at 16:08 +0200, Paolo Abeni wrote:
> > This is an attempt to fix the data transfer stall reported by Geliang
> > and Gang by more carefully enforcing memory constraints at the MPTCP
> > level.
> >
> > Patch 1/9 moves the bound check before entering the TCP socket.
> > Patches 2, 3 and 4 are cleanups/refactors aimed at safely reusing TCP
> > helpers on MPTCP skbs.
> > Patch 5 makes TCP pruning related helpers available to MPTCP and
> > patch 6 makes use of them. Patch 7 addresses an edge scenario that
> > could still lead to a transfer stall under memory pressure.
> > Finally, patches 8 and 9 improve the MPTCP-level retransmission scheme
> > to make recovery from memory pressure significantly faster.
> >
> > Note that the diffstat is biased by the quite large patch 4/9, which
> > contains a mechanical transformation of existing code; the "real"
> > changes are noticeably smaller.
> >
> > Tested successfully against the test cases proposed by Geliang and Gang.
>
> We found this issue while testing the MPTCP TLS selftests
> (tools/testing/selftests/net/tls.c). The multi_chunk.c test was
> actually extracted from the chunked_sendfile() function in tls.c. The
> tls.c file contains many test groups, and chunked_sendfile() is just one
> of them. Therefore, passing the chunked_sendfile tests does not guarantee
> that all tests in tls.c will pass in the future. So we will shortly
> provide you with a test similar to multi_chunk.c, but one that includes
> a more complete set of the tests from tls.c.
I have attached a new test that basically covers all the tests in
tls.c. This test uses MPTCP sockets without TLS encryption.
When I ran it on v5 of Gang's patches in [1], all test items passed;
however, when I ran it on v2 of this series, several test items failed,
accompanied by an OOM error:
# ok 7 mptcp.mutliproc_sendpage_even
# # RUN mptcp.mutliproc_writers ...
# # mutliproc_writers: Test terminated by timeout
# # FAIL mptcp.mutliproc_writers
# not ok 8 mptcp.mutliproc_writers
# # RUN mptcp.mutliproc_readers ...
# # mutliproc_readers: Test terminated by timeout
# # FAIL mptcp.mutliproc_readers
# not ok 9 mptcp.mutliproc_readers
# # RUN mptcp.mutliproc_even ...
root@mptcpdev:/home/tgl/mptcp_net-next# [ 83.031990][ T458] kworker/13:2: page allocation failure: order:0, mode:0x40820(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[ 83.032214][ T465] SLUB: Unable to allocate memory on CPU 20 (of node 0) on node -1, gfp=0x920(GFP_ATOMIC|__GFP_ZERO)
[ 83.032956][ T458] CPU: 13 UID: 0 PID: 458 Comm: kworker/13:2 Not tainted 7.1.0-rc1+ #85 PREEMPT(full)
[ 83.032959][ T458] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 83.032961][ T458] Workqueue: events mptcp_worker
[ 83.032969][ T458] Call Trace:
[ 83.032971][ T458] <TASK>
[ 83.032973][ T458] dump_stack_lvl+0x6f/0xb0
[ 83.032982][ T458] warn_alloc.cold+0x9b/0x1c4
[ 83.032987][ T458] ? __pfx_warn_alloc+0x10/0x10
[ 83.033000][ T458] __alloc_pages_slowpath.constprop.0+0xa3e/0x1770
[ 83.033005][ T458] ? __pfx___alloc_pages_slowpath.constprop.0+0x10/0x10
[ 83.033011][ T458] __alloc_frozen_pages_noprof+0x2f3/0x380
I hope this test is useful to you. If you need me to do any testing, I
am very willing to help, just let me know.
Thanks,
-Geliang
[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1775033340.git.yangang@kylinos.cn/
>
> > ---
> > RFC -> v1:
> > - dropped old patch 4 & 5
> > - addressed AI reported comments
> > - added retrans refactor.
> >
> > Paolo Abeni (9):
> > mptcp: move checks vs rcvbuf size earlier in the RX path
> > mptcp: drop the mptcp_ooo_try_coalesce() helper
> > mptcp: remove CB offset field
>
> When implementing MPTCP KTLS, I used this offset field. Please see the
> implementation in [1], including mptcp_read_done() and
> mptcp_get_skb_seq(). After removing the offset field, I think the new
> version should be implemented as follows:
>
> static void mptcp_read_done(struct sock *sk, size_t len)
> {
> 	struct mptcp_sock *msk = mptcp_sk(sk);
> 	struct sk_buff *skb;
> 	size_t left;
> 	u32 offset;
>
> 	msk_owned_by_me(msk);
>
> 	if (sk->sk_state == TCP_LISTEN)
> 		return;
>
> 	left = len;
> 	while (left && (skb = mptcp_recv_skb(sk, &offset)) != NULL) {
> 		int used;
>
> 		used = min_t(size_t, skb->len - offset, left);
> 		msk->bytes_consumed += used;
> 		msk->copied_seq += used;
> 		left -= used;
>
> 		if (skb->len > offset + used)
> 			break;
>
> 		mptcp_eat_recv_skb(sk, skb);
> 	}
>
> 	mptcp_rcv_space_adjust(msk, len - left);
>
> 	/* Clean up data we have read: This will do ACK frames. */
> 	if (left != len)
> 		mptcp_cleanup_rbuf(msk, len - left);
> }
>
> static u32 mptcp_get_skb_seq(struct sk_buff *skb)
> {
> 	return MPTCP_SKB_CB(skb)->map_seq;
> }
>
> But unfortunately, after this modification, with all your patches in
> this set, the newly added MPTCP test cases in tls.c did not all pass.
> Are there any obvious issues with my modification in these two
> functions?
>
> Thanks again, and I will continue to follow up and test this series.
>
> -Geliang
>
> [1]
> https://patchwork.kernel.org/project/mptcp/patch/b86c642262c9718f4936ad52dab804b8f494aa6d.1777026753.git.tanggeliang@kylinos.cn/
>
> > mptcp: sync mptcp skb cb layout with tcp one
> > tcp: expose the tcp_collapse_ofo_queue() helper to mptcp usage, too
> > mptcp: implemented OoO queue pruning
> > mptcp: track prune recovery status
> > mptcp: move the retrans loop to a separate helper
> > mptcp: let the retrans scheduler do its job.
> >
> > include/net/tcp.h | 8 ++
> > net/ipv4/tcp_input.c | 55 +++++---
> > net/mptcp/fastopen.c | 1 -
> > net/mptcp/mib.c | 3 +
> > net/mptcp/mib.h | 3 +
> > net/mptcp/options.c | 55 +++++++-
> > net/mptcp/protocol.c | 328 ++++++++++++++++++++++++++++--------------
> > net/mptcp/protocol.h | 11 +-
> > net/mptcp/subflow.c | 2 +
> > 9 files changed, 323 insertions(+), 143 deletions(-)
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Unstable: 7 failed test(s): packetdrill_fastopen packetdrill_regressions selftest_mptcp_connect selftest_mptcp_connect_checksum selftest_mptcp_connect_mmap selftest_mptcp_connect_sendfile selftest_mptcp_connect_splice ⚠️
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Critical: 2 Call Trace(s) - Critical: Global Timeout ❌
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Unstable: 2 failed test(s): bpftest_test_progs-no_alu32_mptcp bpftest_test_progs_mptcp ⚠️
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/24894934320
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/cceb6849cdf8
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1085225
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to keep the test suite
stable when executed on a public CI like this one, it is possible that some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)