[PATCH mptcp-next v1 0/9] mptcp: address stall under memory pressure

Paolo Abeni posted 9 patches 1 week, 3 days ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/cover.1777038888.git.pabeni@redhat.com
There is a newer version of this series
[PATCH mptcp-next v1 0/9] mptcp: address stall under memory pressure
Posted by Paolo Abeni 1 week, 3 days ago
This is an attempt to fix the data transfer stall reported by Geliang and
Gang by enforcing memory constraints more carefully at the MPTCP level.

Patch 1/9 moves the rcvbuf bound check earlier in the RX path, before
entering the TCP socket.
Patches 2, 3 and 4 are cleanups/refactors aimed at safely re-using TCP
helpers on MPTCP skbs.
Patch 5 makes the TCP pruning-related helpers available to MPTCP and patch 6
makes use of them. Patch 7 addresses an edge scenario that could still
lead to a transfer stall under memory pressure.
Finally, patches 8 and 9 improve the MPTCP-level retransmission scheme to
make recovery from memory pressure significantly faster.

Note that the diffstat is biased by the quite large patch 4/9, which
contains mechanical transformations of existing code; the "real" changes are
noticeably smaller.

Tested successfully vs the test cases proposed by Geliang and Gang.
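
As a rough illustration of the idea behind patches 1 and 6 (this is not
code from the series), the sketch below models a receive path that checks
the buffer limit before accepting data and, when the limit would be
exceeded, frees memory by dropping out-of-order (OoO) entries starting from
the highest sequence number. The struct `model_rx`, its fields, and the
`model_*` helpers are all hypothetical userspace stand-ins; the real code
works on skbs, socket memory accounting, and the TCP pruning helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a receiver's memory accounting. */
struct model_rx {
        uint32_t rcvbuf;   /* receive buffer limit, in bytes */
        uint32_t rmem;     /* memory currently charged (in-order + OoO) */
        uint32_t ooo[8];   /* truesize of each OoO entry, sorted by sequence */
        size_t ooo_nr;     /* number of valid entries in ooo[] */
        uint32_t pruned;   /* number of OoO entries dropped so far */
};

/* Bound check: would charging 'truesize' more bytes stay within rcvbuf? */
static bool model_rx_fits(const struct model_rx *rx, uint32_t truesize)
{
        return (uint64_t)rx->rmem + truesize <= rx->rcvbuf;
}

/* Drop OoO entries from the highest-sequence end until 'truesize' fits
 * or the OoO queue is empty. Returns true if the new data now fits.
 */
static bool model_prune_ooo(struct model_rx *rx, uint32_t truesize)
{
        while (!model_rx_fits(rx, truesize) && rx->ooo_nr > 0) {
                uint32_t size = rx->ooo[--rx->ooo_nr];

                assert(rx->rmem >= size);
                rx->rmem -= size;
                rx->pruned++;
        }
        return model_rx_fits(rx, truesize);
}

/* Admit in-order data: check the bound first, prune only when needed. */
static bool model_rx_admit(struct model_rx *rx, uint32_t truesize)
{
        if (!model_rx_fits(rx, truesize) && !model_prune_ooo(rx, truesize))
                return false;

        rx->rmem += truesize;
        return true;
}
```

In this model the in-order data is favoured over OoO data, on the
assumption that the peer can retransmit whatever was pruned; that is also
why a faster MPTCP-level retransmission matters for recovery.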
---
RFC -> v1:
 - dropped old patches 4 & 5
 - addressed AI reported comments
 - added retrans refactor.

Paolo Abeni (9):
  mptcp: move checks vs rcvbuf size earlier in the RX path
  mptcp: drop the mptcp_ooo_try_coalesce() helper
  mptcp: remove CB offset field
  mptcp: sync mptcp skb cb layout with tcp one
  tcp: expose the tcp_collapse_ofo_queue() helper to mptcp usage, too
  mptcp: implemented OoO queue pruning
  mptcp: track prune recovery status
  mptcp: move the retrans loop to a separate helper
  mptcp: let the retrans scheduler do its job.

 include/net/tcp.h    |   8 ++
 net/ipv4/tcp_input.c |  55 +++++---
 net/mptcp/fastopen.c |   1 -
 net/mptcp/mib.c      |   3 +
 net/mptcp/mib.h      |   3 +
 net/mptcp/options.c  |  55 +++++++-
 net/mptcp/protocol.c | 328 ++++++++++++++++++++++++++++---------------
 net/mptcp/protocol.h |  11 +-
 net/mptcp/subflow.c  |   2 +
 9 files changed, 323 insertions(+), 143 deletions(-)

-- 
2.53.0
Re: [PATCH mptcp-next v1 0/9] mptcp: address stall under memory pressure
Posted by Geliang Tang 1 week, 1 day ago
Hi Paolo,

Thanks for these fixes.

On Fri, 2026-04-24 at 16:08 +0200, Paolo Abeni wrote:
> This is an attempt to fix the data transfer stall reported by Geliang and
> Gang by enforcing memory constraints more carefully at the MPTCP level.
> 
> Patch 1/9 moves the rcvbuf bound check earlier in the RX path, before
> entering the TCP socket.
> Patches 2, 3 and 4 are cleanups/refactors aimed at safely re-using TCP
> helpers on MPTCP skbs.
> Patch 5 makes the TCP pruning-related helpers available to MPTCP and
> patch 6 makes use of them. Patch 7 addresses an edge scenario that could
> still lead to a transfer stall under memory pressure.
> Finally, patches 8 and 9 improve the MPTCP-level retransmission scheme
> to make recovery from memory pressure significantly faster.
> 
> Note that the diffstat is biased by the quite large patch 4/9, which
> contains mechanical transformations of existing code; the "real" changes
> are noticeably smaller.
> 
> Tested successfully vs the test cases proposed by Geliang and Gang.

We found this issue while testing the MPTCP TLS selftests
(tools/testing/selftests/net/tls.c). The multi_chunk.c test was
actually extracted from the chunked_sendfile() function in tls.c. The tls.c
file contains many test groups, and chunked_sendfile() is just one of
them. Therefore, passing the chunked_sendfile tests does not guarantee
that all tests in tls.c will pass in the future. So we will shortly provide
you with a test similar to multi_chunk.c, but one that includes a more
complete set of the tests from tls.c.

> ---
> RFC -> v1:
>  - dropped old patch 4 & 5
>  - addressed AI reported comments
>  - added retrans refactor.
> 
> Paolo Abeni (9):
>   mptcp: move checks vs rcvbuf size earlier in the RX path
>   mptcp: drop the mptcp_ooo_try_coalesce() helper
>   mptcp: remove CB offset field

When implementing MPTCP KTLS, I used this offset field. Please see the
implementation in [1], including mptcp_read_done() and
mptcp_get_skb_seq(). After removing the offset field, I think the new
version should be implemented as follows:

static void mptcp_read_done(struct sock *sk, size_t len) 
{
        struct mptcp_sock *msk = mptcp_sk(sk);
        struct sk_buff *skb;
        size_t left;
        u32 offset;

        msk_owned_by_me(msk);

        if (sk->sk_state == TCP_LISTEN)
                return;

        left = len; 
        while (left && (skb = mptcp_recv_skb(sk, &offset)) != NULL) {
                int used;

                used = min_t(size_t, skb->len - offset, left);
                msk->bytes_consumed += used;
                msk->copied_seq += used;
                left -= used;

                if (skb->len > offset + used)
                        break;

                mptcp_eat_recv_skb(sk, skb);
        }

        mptcp_rcv_space_adjust(msk, len - left);

        /* Clean up data we have read: This will do ACK frames. */
        if (left != len) 
                mptcp_cleanup_rbuf(msk, len - left);
}

static u32 mptcp_get_skb_seq(struct sk_buff *skb)
{
        return MPTCP_SKB_CB(skb)->map_seq;
}

But unfortunately, after this modification, with all your patches in
this set applied, not all of the newly added MPTCP test cases in tls.c
pass. Are there any obvious issues with my modifications to these two
functions?
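
To make the accounting in the loop above easier to check in isolation, here
is a small userspace model of it. It is only a sketch under one assumption
that is not confirmed anywhere in this thread: that, without the CB offset
field, the read offset inside the head skb can be derived as
copied_seq - map_seq. The `model_skb` and `model_msk` types and the
`model_*` functions are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb on the in-order receive queue. */
struct model_skb {
        uint64_t map_seq;  /* data sequence number of the first byte */
        uint32_t len;      /* payload length in bytes */
};

/* Hypothetical stand-in for the msk receive state. */
struct model_msk {
        const struct model_skb *queue; /* in-order receive queue */
        size_t nr;                     /* number of queued skbs */
        size_t head;                   /* index of first unconsumed skb */
        uint64_t copied_seq;           /* next byte the reader consumes */
};

/* Return the head skb and the read offset inside it, derived from the
 * sequence numbers alone (assumption: offset = copied_seq - map_seq).
 */
static const struct model_skb *model_recv_skb(const struct model_msk *msk,
                                              uint32_t *offset)
{
        const struct model_skb *skb;

        if (msk->head >= msk->nr)
                return NULL;

        skb = &msk->queue[msk->head];
        assert(msk->copied_seq >= skb->map_seq);
        assert(msk->copied_seq - skb->map_seq < skb->len);
        *offset = (uint32_t)(msk->copied_seq - skb->map_seq);
        return skb;
}

/* Mark up to 'len' bytes as consumed; returns the bytes actually consumed. */
static size_t model_read_done(struct model_msk *msk, size_t len)
{
        const struct model_skb *skb;
        size_t left = len;
        uint32_t offset;

        while (left && (skb = model_recv_skb(msk, &offset)) != NULL) {
                size_t used = skb->len - offset;

                if (used > left)
                        used = left;
                msk->copied_seq += used;
                left -= used;

                /* Head skb only partially consumed: keep it queued. */
                if (skb->len > offset + used)
                        break;

                msk->head++; /* fully consumed: "eat" the skb */
        }
        return len - left;
}
```

With a queue of two skbs of 100 and 50 bytes, consuming 30, then 100, then
100 bytes returns 30, 100 and 20, and the head skb is only released once
copied_seq reaches the end of its mapping. If the real offset is tracked
differently, the model does not apply.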

Thanks again, and I will continue to follow up and test this series.

-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/patch/b86c642262c9718f4936ad52dab804b8f494aa6d.1777026753.git.tanggeliang@kylinos.cn/

>   mptcp: sync mptcp skb cb layout with tcp one
>   tcp: expose the tcp_collapse_ofo_queue() helper to mptcp usage, too
>   mptcp: implemented OoO queue pruning
>   mptcp: track prune recovery status
>   mptcp: move the retrans loop to a separate helper
>   mptcp: let the retrans scheduler do its job.
> 
>  include/net/tcp.h    |   8 ++
>  net/ipv4/tcp_input.c |  55 +++++---
>  net/mptcp/fastopen.c |   1 -
>  net/mptcp/mib.c      |   3 +
>  net/mptcp/mib.h      |   3 +
>  net/mptcp/options.c  |  55 +++++++-
>  net/mptcp/protocol.c | 328 ++++++++++++++++++++++++++++---------------
>  net/mptcp/protocol.h |  11 +-
>  net/mptcp/subflow.c  |   2 +
>  9 files changed, 323 insertions(+), 143 deletions(-)
Re: [PATCH mptcp-next v1 0/9] mptcp: address stall under memory pressure
Posted by Geliang Tang 1 week ago
Hi Paolo,

On Mon, 2026-04-27 at 15:27 +0800, Geliang Tang wrote:
> Hi Paolo,
> 
> Thanks for these fixes.
> 
> On Fri, 2026-04-24 at 16:08 +0200, Paolo Abeni wrote:
> > This is an attempt to fix the data transfer stall reported by Geliang
> > and Gang by enforcing memory constraints more carefully at the MPTCP
> > level.
> > 
> > Patch 1/9 moves the rcvbuf bound check earlier in the RX path, before
> > entering the TCP socket.
> > Patches 2, 3 and 4 are cleanups/refactors aimed at safely re-using TCP
> > helpers on MPTCP skbs.
> > Patch 5 makes the TCP pruning-related helpers available to MPTCP and
> > patch 6 makes use of them. Patch 7 addresses an edge scenario that
> > could still lead to a transfer stall under memory pressure.
> > Finally, patches 8 and 9 improve the MPTCP-level retransmission scheme
> > to make recovery from memory pressure significantly faster.
> > 
> > Note that the diffstat is biased by the quite large patch 4/9, which
> > contains mechanical transformations of existing code; the "real"
> > changes are noticeably smaller.
> > 
> > Tested successfully vs the test cases proposed by Geliang and Gang.
> 
> We found this issue while testing the MPTCP TLS selftests
> (tools/testing/selftests/net/tls.c). The multi_chunk.c test was
> actually extracted from the chunked_sendfile() function in tls.c. The
> tls.c file contains many test groups, and chunked_sendfile() is just one
> of them. Therefore, passing the chunked_sendfile tests does not guarantee
> that all tests in tls.c will pass in the future. So we will shortly
> provide you with a test similar to multi_chunk.c, but one that includes a
> more complete set of the tests from tls.c.

I have attached a new test that basically covers all the tests in
tls.c. This test uses MPTCP sockets without TLS encryption.

When I ran it on v5 of Gang's patches in [1], all test items passed;
however, when I ran it on v2 of this series, several test items failed,
accompanied by an OOM error:

# ok 7 mptcp.mutliproc_sendpage_even
# #  RUN           mptcp.mutliproc_writers ...
# # mutliproc_writers: Test terminated by timeout
# #          FAIL  mptcp.mutliproc_writers
# not ok 8 mptcp.mutliproc_writers
# #  RUN           mptcp.mutliproc_readers ...
# # mutliproc_readers: Test terminated by timeout
# #          FAIL  mptcp.mutliproc_readers
# not ok 9 mptcp.mutliproc_readers
# #  RUN           mptcp.mutliproc_even ...
root@mptcpdev:/home/tgl/mptcp_net-next# [   83.031990][  T458] kworker/13:2: page allocation failure: order:0, mode:0x40820(GFP_ATOMIC|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
[   83.032214][  T465] SLUB: Unable to allocate memory on CPU 20 (of node 0) on node -1, gfp=0x920(GFP_ATOMIC|__GFP_ZERO)
[   83.032956][  T458] CPU: 13 UID: 0 PID: 458 Comm: kworker/13:2 Not tainted 7.1.0-rc1+ #85 PREEMPT(full)
[   83.032959][  T458] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   83.032961][  T458] Workqueue: events mptcp_worker
[   83.032969][  T458] Call Trace:
[   83.032971][  T458]  <TASK>
[   83.032973][  T458]  dump_stack_lvl+0x6f/0xb0
[   83.032982][  T458]  warn_alloc.cold+0x9b/0x1c4
[   83.032987][  T458]  ? __pfx_warn_alloc+0x10/0x10
[   83.033000][  T458]  __alloc_pages_slowpath.constprop.0+0xa3e/0x1770
[   83.033005][  T458]  ? __pfx___alloc_pages_slowpath.constprop.0+0x10/0x10
[   83.033011][  T458]  __alloc_frozen_pages_noprof+0x2f3/0x380

I hope this test is useful to you. If you need me to do any testing, I
am very willing to help, just let me know.

Thanks,
-Geliang

[1]
https://patchwork.kernel.org/project/mptcp/cover/cover.1775033340.git.yangang@kylinos.cn/

Re: [PATCH mptcp-next v1 0/9] mptcp: address stall under memory pressure
Posted by MPTCP CI 1 week, 3 days ago
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Unstable: 7 failed test(s): packetdrill_fastopen packetdrill_regressions selftest_mptcp_connect selftest_mptcp_connect_checksum selftest_mptcp_connect_mmap selftest_mptcp_connect_sendfile selftest_mptcp_connect_splice ⚠️ 
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Critical: 2 Call Trace(s) - Critical: Global Timeout ❌
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Unstable: 2 failed test(s): bpftest_test_progs-no_alu32_mptcp bpftest_test_progs_mptcp ⚠️ 
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/24894934320

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/cceb6849cdf8
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1085225


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to keep the test suite
stable when executed on a public CI like this one, some reported issues may
not be due to your modifications. Still, do not hesitate to help us improve
that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)