[PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure

Paolo Abeni posted 7 patches 6 days ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/cover.1779210016.git.pabeni@redhat.com
There is a newer version of this series
[PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure
Posted by Paolo Abeni 6 days ago
This is an attempt to fix the data transfer stall reported by Geliang and
Gang by enforcing memory constraints more carefully at the MPTCP level.

This iteration presents a significant change WRT the previous one,
avoiding entirely the collapse attempt on memory pressure. Note that
this choice represents a trade-off: collapsing allows much faster transfer
(to be more accurate: an order of magnitude less slow) under some extreme
conditions, but makes transfer slower and much more CPU intensive under
more likely conditions.

As a consequence of the above, the `mptcp_data.multi_chunk_sendfile`
test-case needs a 240-second timeout to complete successfully:

TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)

The solution performing data collapsing would need a similarly long timeout
for the multiproc test cases: mutliproc_even, mutliproc_readers,
mutliproc_writers, mutliproc_sendpage_even, mutliproc_sendpage_readers,
mutliproc_sendpage_writers.

Patch 1 is new in v6, and is actually a fix for an old issue (targeting
net), included here just for my convenience.

Patches 2 and 3 make the admission check much stricter for incoming
packets exceeding the memory limits, with some exceptions for fallback
sockets.
Patch 4 implements OoO queue pruning for MPTCP, and patch 5
addresses an edge scenario that could still lead to a transfer stall
under memory pressure.
Finally, patches 6 and 7 improve the MPTCP-level retransmission scheme to
make recovery from memory pressure, or after an MPTCP-level drop,
significantly faster.
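
To make the admission-check and pruning idea above more concrete, here is a
toy model in Python. It is only a sketch under stated assumptions and not the
code in these patches: the `ToyReceiver` class, its byte-based `limit`, and
the choice to prune the highest sequence numbers first are all hypothetical,
and partial overlaps between chunks are not handled.

```python
class ToyReceiver:
    """Toy receive path with a hard memory limit (illustration only)."""

    def __init__(self, limit):
        self.limit = limit     # hypothetical memory budget, in bytes
        self.next_seq = 0      # next in-order sequence number
        self.in_order = 0      # bytes queued for the application
        self.ooo = {}          # out-of-order chunks: seq -> length

    def used(self):
        return self.in_order + sum(self.ooo.values())

    def prune_ooo(self, needed):
        # Assumption: drop the chunks with the highest sequence numbers
        # first, until `needed` bytes fit or the OoO queue is empty.
        for seq in sorted(self.ooo, reverse=True):
            if self.used() + needed <= self.limit:
                break
            del self.ooo[seq]

    def receive(self, seq, length):
        """Return True if the chunk is admitted, False if it is dropped."""
        if seq < self.next_seq or seq in self.ooo:
            return False                 # duplicate data
        if seq != self.next_seq:         # out-of-order: strict admission
            if self.used() + length > self.limit:
                return False
            self.ooo[seq] = length
            return True
        if self.used() + length > self.limit:
            self.prune_ooo(length)       # make room for in-order data
        if self.used() + length > self.limit:
            return False
        self.in_order += length
        self.next_seq += length
        while self.next_seq in self.ooo:  # merge now-contiguous chunks
            merged = self.ooo.pop(self.next_seq)
            self.in_order += merged
            self.next_seq += merged
        return True


rx = ToyReceiver(limit=10)
print([rx.receive(s, n) for s, n in [(4, 4), (8, 2), (12, 4), (16, 4), (0, 4)]])
print(rx.next_seq, rx.ooo)
```

In this toy run the chunk at sequence 16 is refused outright, and the chunk at
sequence 12 is pruned to admit the in-order data; a real sender would have to
retransmit both, which is where a faster retransmission path helps.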
---
v6 -> v7:
  - address some of the sashiko feedback, see individual patches for the
    gory details.

Paolo Abeni (7):
  mptcp: fix missing wakeups in edge scenarios
  mptcp: explicitly drop over memory limits
  mptcp: enforce hard limit on backlog flushing
  mptcp: implemented OoO queue pruning
  mptcp: track prune recovery status
  mptcp: move the retrans loop to a separate helper
  mptcp: let the retrans scheduler do its job.

 net/mptcp/mib.c      |   2 +
 net/mptcp/mib.h      |   2 +
 net/mptcp/options.c  |  66 +++++++++++-
 net/mptcp/protocol.c | 247 ++++++++++++++++++++++++++++++++-----------
 net/mptcp/protocol.h |  18 ++++
 net/mptcp/subflow.c  |   1 +
 6 files changed, 270 insertions(+), 66 deletions(-)

-- 
2.54.0
Re: [PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure
Posted by Geliang Tang 5 days, 10 hours ago
Hi Paolo,

Thanks for this v7.

On Tue, 2026-05-19 at 19:01 +0200, Paolo Abeni wrote:
> This an attempt to fix the data transfer stall reported by Geliang
> and
> Gang more carefully enforcing memory constraints at the MPTCP level.
> 
> This iteration presents a significant change WRT the previous one,
> avoiding entirely the collapse attempt on memory pressure. Note that
> this choice represent a trade off: collapsing allow much faster
> transfer
> (to be more accurate: order of magnitude less slow) under some
> extreme
> conditions, but makes transfer slower and much more CPU intensive for
> less unlikely conditions.
> 
> As a consequence of the above the `mptcp_data.multi_chunk_sendfile`
> test-case needs a 240 seconds timeout to complete successfully:
> 
> TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
> 
> The solution performing data collapsing would need similar long
> timeout
> for the multiproc tests cases: mutliproc_even, mutliproc_readers,
> mutliproc_writers, mutliproc_sendpage_even,
> mutliproc_sendpage_readers,
> mutliproc_sendpage_writers.

Based on this version, I actually tested the MPTCP TLS self-tests and
still encountered a few similar errors, with the test duration taking
several times longer than before.


First, I applied the "MPTCP KTLS support" v18 [1] patch series and ran
the MPTCP TLS self-tests in debug mode, obtaining the following output:

Selftest Test: ./mptcp_tls.sh
TAP version 13
1..1
# TAP version 13
# 1..871
# # Starting 871 tests from 13 test cases.
# #  RUN           tls.12_aes_gcm_mptcp.sendfile ...
# #            OK  tls.12_aes_gcm_mptcp.sendfile
# ok 1 tls.12_aes_gcm_mptcp.sendfile

... ...
# #  RUN           tls.12_aria_gcm_256_mptcp.rekey_poll_delay ...
# #            OK  tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# ok 871 tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# # PASSED: 871 / 871 tests passed.
# # Totals: pass:871 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 1 test: selftest_mptcp_tls
# time=142

All 871 tests passed, taking 142 seconds.


Then based on the "MPTCP KTLS support" v18, I further applied this
patch series "mptcp: address stall under memory pressure" v7, and ran
the MPTCP TLS self-tests in debug mode, obtaining the following output:

Selftest Test: ./mptcp_tls.sh
TAP version 13
1..1
# TAP version 13
# 1..871
# # Starting 871 tests from 13 test cases.
# #  RUN           tls.12_aes_gcm_mptcp.sendfile ...
# #            OK  tls.12_aes_gcm_mptcp.sendfile
# ok 1 tls.12_aes_gcm_mptcp.sendfile

... ...
# #  RUN           tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.12_aes_gcm_mptcp.multi_chunk_sendfile
# not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_aes_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_aes_gcm_mptcp.multi_chunk_sendfile
# not ok 70 tls.13_aes_gcm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.12_chacha_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.12_chacha_mptcp.multi_chunk_sendfile
# not ok 137 tls.12_chacha_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_chacha_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_chacha_mptcp.multi_chunk_sendfile
# not ok 204 tls.13_chacha_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_sm4_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_sm4_gcm_mptcp.multi_chunk_sendfile
# not ok 271 tls.13_sm4_gcm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_sm4_ccm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_sm4_ccm_mptcp.multi_chunk_sendfile
# not ok 338 tls.13_sm4_ccm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.12_aes_ccm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.12_aes_ccm_mptcp.multi_chunk_sendfile
# not ok 405 tls.12_aes_ccm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_aes_ccm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_aes_ccm_mptcp.multi_chunk_sendfile
# not ok 472 tls.13_aes_ccm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.12_aes_gcm_256_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.12_aes_gcm_256_mptcp.multi_chunk_sendfile
# not ok 539 tls.12_aes_gcm_256_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_aes_gcm_256_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_aes_gcm_256_mptcp.multi_chunk_sendfile
# not ok 606 tls.13_aes_gcm_256_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.13_nopad_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.13_nopad_mptcp.multi_chunk_sendfile
# not ok 673 tls.13_nopad_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.12_aria_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.12_aria_gcm_mptcp.multi_chunk_sendfile
# not ok 740 tls.12_aria_gcm_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.12_aria_gcm_256_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# #          FAIL  tls.12_aria_gcm_256_mptcp.multi_chunk_sendfile
# not ok 807 tls.12_aria_gcm_256_mptcp.multi_chunk_sendfile

... ...
# #  RUN           tls.12_aria_gcm_256_mptcp.rekey_poll_delay ...
# #            OK  tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# ok 871 tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# # FAILED: 858 / 871 tests passed.
# # Totals: pass:858 fail:13 xfail:0 xpass:0 skip:0 error:0
not ok 1 test: selftest_mptcp_tls # FAIL
# time=567

Here, 13 tests failed, with the total test taking 567 seconds, which is
about four times as long as the previous test (142 seconds).


For the next test, on top of "MPTCP KTLS support" v18 and "mptcp:
address stall under memory pressure" v7, I used the following patch to
enable multipath testing:

---
 tools/testing/selftests/net/mptcp/mptcp_tls.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_tls.sh b/tools/testing/selftests/net/mptcp/mptcp_tls.sh
index ea366d149a20..f8a07928ffa0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_tls.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_tls.sh
@@ -36,10 +36,7 @@ init()
 trap cleanup EXIT
 
 mptcp_lib_check_mptcp
-# Temporarily set max to '0' to disable multipath testing,
-# as it depends on "mptcp: fix stall because of data_ready" series of fixes.
-# It will be re-enabled together with that series later as a squash-to patch.
-init 0
+init
 
 ip netns exec "$ns1" ./tls -v 12_aes_gcm_mptcp \
 			   -v 13_aes_gcm_mptcp \
-- 
2.53.0

And ran the MPTCP TLS self-tests in debug mode, obtaining the following
output:

Selftest Test: ./mptcp_tls.sh
TAP version 13
1..1
# TAP version 13
# 1..871
# # Starting 871 tests from 13 test cases.
# #  RUN           tls.12_aes_gcm_mptcp.sendfile ...
# #            OK  tls.12_aes_gcm_mptcp.sendfile
# ok 1 tls.12_aes_gcm_mptcp.sendfile

... ...
# #  RUN           tls.13_aes_gcm_mptcp.mutliproc_sendpage_readers ...
# # mutliproc_sendpage_readers: Test terminated by timeout
# #          FAIL  tls.13_aes_gcm_mptcp.mutliproc_sendpage_readers
# not ok 117 tls.13_aes_gcm_mptcp.mutliproc_sendpage_readers

... ...
# #  RUN           tls.12_aes_gcm_256_mptcp.mutliproc_sendpage_readers ...
# # tls.c:1690:mutliproc_sendpage_readers:Expected res (-1) >= 0 (0)
# # mutliproc_sendpage_readers: Test terminated by timeout
# #          FAIL  tls.12_aes_gcm_256_mptcp.mutliproc_sendpage_readers
# not ok 586 tls.12_aes_gcm_256_mptcp.mutliproc_sendpage_readers

... ...
# #  RUN           tls.12_aria_gcm_256_mptcp.mutliproc_sendpage_writers ...
# # mutliproc_sendpage_writers: Test terminated by timeout
# #          FAIL  tls.12_aria_gcm_256_mptcp.mutliproc_sendpage_writers
# not ok 855 tls.12_aria_gcm_256_mptcp.mutliproc_sendpage_writers

... ...
# #  RUN           tls.12_aria_gcm_256_mptcp.rekey_poll_delay ...
# #            OK  tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# ok 871 tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# # FAILED: 868 / 871 tests passed.
# # Totals: pass:868 fail:3 xfail:0 xpass:0 skip:0 error:0
not ok 1 test: selftest_mptcp_tls # FAIL
# time=1256

Here, 3 tests failed, with the total test taking 1256 seconds, which is
more than twice as long as the previous test (567 seconds) and almost nine
times as long as the first one (142 seconds).
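
The slowdown figures can be cross-checked against the TAP summaries quoted in
this mail. The following is a minimal sketch: the run labels and the
`parse_totals` helper are made up for illustration, while the `Totals` lines
and `time=` values are copied from the three outputs.

```python
import re

# (Totals line, total time in seconds) for each of the three runs.
runs = {
    "KTLS v18 only": (
        "# # Totals: pass:871 fail:0 xfail:0 xpass:0 skip:0 error:0", 142),
    "KTLS v18 + v7": (
        "# # Totals: pass:858 fail:13 xfail:0 xpass:0 skip:0 error:0", 567),
    "KTLS v18 + v7, multipath": (
        "# # Totals: pass:868 fail:3 xfail:0 xpass:0 skip:0 error:0", 1256),
}


def parse_totals(line):
    """Turn 'pass:858 fail:13 ...' into {'pass': 858, 'fail': 13, ...}."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}


baseline_time = runs["KTLS v18 only"][1]
for name, (totals_line, seconds) in runs.items():
    totals = parse_totals(totals_line)
    print(f"{name}: fail={totals['fail']} time={seconds}s "
          f"({seconds / baseline_time:.1f}x baseline)")
```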


Hope this result is useful for you to refine this patch series. I would
be very grateful if you could also kindly help me review the "MPTCP
KTLS support" v18 patch series.

Thanks,
-Geliang

[1] https://patchwork.kernel.org/project/mptcp/cover/cover.1777459066.git.tanggeliang@kylinos.cn/

Re: [PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure
Posted by Paolo Abeni 5 days, 9 hours ago
On 5/20/26 8:32 AM, Geliang Tang wrote:
> On Tue, 2026-05-19 at 19:01 +0200, Paolo Abeni wrote:
>> This an attempt to fix the data transfer stall reported by Geliang
>> and
>> Gang more carefully enforcing memory constraints at the MPTCP level.
>>
>> This iteration presents a significant change WRT the previous one,
>> avoiding entirely the collapse attempt on memory pressure. Note that
>> this choice represent a trade off: collapsing allow much faster
>> transfer
>> (to be more accurate: order of magnitude less slow) under some
>> extreme
>> conditions, but makes transfer slower and much more CPU intensive for
>> less unlikely conditions.
>>
>> As a consequence of the above the `mptcp_data.multi_chunk_sendfile`
>> test-case needs a 240 seconds timeout to complete successfully:
>>
>> TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
>>
>> The solution performing data collapsing would need similar long
>> timeout
>> for the multiproc tests cases: mutliproc_even, mutliproc_readers,
>> mutliproc_writers, mutliproc_sendpage_even,
>> mutliproc_sendpage_readers,
>> mutliproc_sendpage_writers.
> 
> Based on this version, I actually tested the MPTCP TLS self-tests and
> still encountered a few similar errors, with the test duration taking
> several times longer than before.

[...]
> ... ...
> # #  RUN           tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
> # # multi_chunk_sendfile: Test terminated by timeout
> # #          FAIL  tls.12_aes_gcm_mptcp.multi_chunk_sendfile
> # not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile

Without this series you should get some stall there, right?

Note that the 'multi_chunk' test will require increasing the test-case timeout,
as mentioned in v3:

---
diff --git a/tools/testing/selftests/net/mptcp/mptcp_data.c b/tools/testing/selftests/net/mptcp/mptcp_data.c
index 39d092e7888d..127d8b47bd39 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_data.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_data.c
@@ -166,7 +166,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
 	close(fd);
 }
 
-TEST_F(mptcp, multi_chunk_sendfile)
+TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
 {
 	chunked_sendfile(_metadata, self, 4096, 4096);
 	chunked_sendfile(_metadata, self, 4096, 0);

---

The timeout could be reduced/avoided by including the 'collapse' strategy from v5
and previous revisions. As hinted by Eric, collapsing is a sort of weak spot
for potentially evil peers and in practice causes a high CPU usage increase, to the
point that in debug builds some other test-cases would still require an increased
timeout.

AFAICS the multi chunk test is really a corner case, especially when sending
1-byte chunks. As a trade-off I prefer avoiding collapsing, and accepting any
solution that allows completion of the multi chunk test, even with very low
tput.

Side note: I'm reasonably sure even plain TCP will have a hard time
with such a case in comparable conditions, i.e. when OoO happens with
very high probability _after_ the sender starts pushing data at high speed,
but the upstream self-tests (rightfully) do not include the OoO part.

/P
Re: [PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure
Posted by gang.yan@linux.dev 4 days, 14 hours ago
May 20, 2026 at 4:19 PM, "Paolo Abeni" <pabeni@redhat.com> wrote:


> 
> On 5/20/26 8:32 AM, Geliang Tang wrote:
> 
> > 
> > On Tue, 2026-05-19 at 19:01 +0200, Paolo Abeni wrote:
> > 
> > > 
> > > This an attempt to fix the data transfer stall reported by Geliang
> > >  and
> > >  Gang more carefully enforcing memory constraints at the MPTCP level.
> > > 
> > >  This iteration presents a significant change WRT the previous one,
> > >  avoiding entirely the collapse attempt on memory pressure. Note that
> > >  this choice represent a trade off: collapsing allow much faster
> > >  transfer
> > >  (to be more accurate: order of magnitude less slow) under some
> > >  extreme
> > >  conditions, but makes transfer slower and much more CPU intensive for
> > >  less unlikely conditions.
> > > 
> > >  As a consequence of the above the `mptcp_data.multi_chunk_sendfile`
> > >  test-case needs a 240 seconds timeout to complete successfully:
> > > 
> > >  TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
> > > 
> > >  The solution performing data collapsing would need similar long
> > >  timeout
> > >  for the multiproc tests cases: mutliproc_even, mutliproc_readers,
> > >  mutliproc_writers, mutliproc_sendpage_even,
> > >  mutliproc_sendpage_readers,
> > >  mutliproc_sendpage_writers.
> > > 
> >  
> >  Based on this version, I actually tested the MPTCP TLS self-tests and
> >  still encountered a few similar errors, with the test duration taking
> >  several times longer than before.
> > 
> [...]
> 
> > 
> > ... ...
> >  # # RUN tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
> >  # # multi_chunk_sendfile: Test terminated by timeout
> >  # # FAIL tls.12_aes_gcm_mptcp.multi_chunk_sendfile
> >  # not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile
> > 
> Without this series you should get some stall there, right?
> 
> Note that the 'multi_chunk' test will require increasing the test-case timeout,
> as mentioned in v3:
> 
> ---
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_data.c b/tools/testing/selftests/net/mptcp/mptcp_data.c
> index 39d092e7888d..127d8b47bd39 100644
> --- a/tools/testing/selftests/net/mptcp/mptcp_data.c
> +++ b/tools/testing/selftests/net/mptcp/mptcp_data.c
> @@ -166,7 +166,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
>  close(fd);
>  }
>  
> -TEST_F(mptcp, multi_chunk_sendfile)
> +TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
>  {
>  chunked_sendfile(_metadata, self, 4096, 4096);
>  chunked_sendfile(_metadata, self, 4096, 0);
> 
> ---
> 
Hi Paolo,

No offense intended at all. Geliang and I just wanted to confirm
whether this performance regression is expected, acceptable, or
something we can optimize further.

We tested the performance of mptcp_data.sh (time per run) 15 times
under the v7 patch and [1], and the results are as follows:

v7 results:
5.82 4.72 5.38 6.18 6.52 6.04 5.05 6.49 4.78 5.62 5.52 4.91 8.07 3.87 5.61

    Max: 8.07s, Min: 3.87s, Avg: 5.64s

[1] results:
2.98 3.44 3.11 3.06 3.78 3.23 3.28 2.88 3.52 3.33 2.89 3.33 3.91 3.20 3.45

    Max: 3.91s, Min: 2.88s, Avg: 3.29s
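
The summary lines above can be reproduced from the raw samples with a short
script. This is just a convenience sketch; the variable names `v7` and `ref1`
are arbitrary, and the values are copied from the two result lines.

```python
from statistics import mean

# Per-run times in seconds for mptcp_data.sh, 15 runs each.
v7 = [5.82, 4.72, 5.38, 6.18, 6.52, 6.04, 5.05, 6.49,
      4.78, 5.62, 5.52, 4.91, 8.07, 3.87, 5.61]
ref1 = [2.98, 3.44, 3.11, 3.06, 3.78, 3.23, 3.28, 2.88,
        3.52, 3.33, 2.89, 3.33, 3.91, 3.20, 3.45]

for label, samples in (("v7", v7), ("[1]", ref1)):
    print(f"{label}: max={max(samples):.2f}s min={min(samples):.2f}s "
          f"avg={mean(samples):.2f}s")

# Ratio of the two averages, i.e. how much slower v7 is on this test.
print(f"avg slowdown: {mean(v7) / mean(ref1):.2f}x")
```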

We’d appreciate your thoughts on whether this delta aligns with
expectations, or if there are further optimizations we should explore.


[1] https://patchwork.kernel.org/project/mptcp/cover/cover.1773735950.git.yangang@kylinos.cn/

Thanks
Gang

Re: [PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure
Posted by Paolo Abeni 4 days, 9 hours ago
On 5/21/26 5:23 AM, gang.yan@linux.dev wrote:
> May 20, 2026 at 4:19 PM, "Paolo Abeni" <pabeni@redhat.com> wrote:
>> On 5/20/26 8:32 AM, Geliang Tang wrote:
>>>  # # RUN tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
>>>  # # multi_chunk_sendfile: Test terminated by timeout
>>>  # # FAIL tls.12_aes_gcm_mptcp.multi_chunk_sendfile
>>>  # not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile
>>>
>> Without this series you should get some stall there, right?
>>
>> Note that the 'multi_chunk' test will require increasing the test-case timeout,
>> as mentioned in v3:
>>
>> ---
>> diff --git a/tools/testing/selftests/net/mptcp/mptcp_data.c b/tools/testing/selftests/net/mptcp/mptcp_data.c
>> index 39d092e7888d..127d8b47bd39 100644
>> --- a/tools/testing/selftests/net/mptcp/mptcp_data.c
>> +++ b/tools/testing/selftests/net/mptcp/mptcp_data.c
>> @@ -166,7 +166,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
>>  close(fd);
>>  }
>>  
>> -TEST_F(mptcp, multi_chunk_sendfile)
>> +TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
>>  {
>>  chunked_sendfile(_metadata, self, 4096, 4096);
>>  chunked_sendfile(_metadata, self, 4096, 0);
>>
>> ---
>>
> No offense intended at all— 

No offense taken at all :)

> Geliang and I just wanted to confirm
> whether this performance regression is expected, acceptable, or
> something we can optimize further.
> 
> We tested the performance of mptcp_data.sh (time per run) 15 times
> under the v7 patch and [1], and the results are as follows:
> 
> v7 results:
> 5.82 4.72 5.38 6.18 6.52 6.04 5.05 6.49 4.78 5.62 5.52 4.91 8.07 3.87 5.61
> 
>     Max: 8.07s, Min: 3.87s, Avg: 5.64s
> 
> [1] results:
> 2.98 3.44 3.11 3.06 3.78 3.23 3.28 2.88 3.52 3.33 2.89 3.33 3.91 3.20 3.45
> 
>     Max: 3.91s, Min: 2.88s, Avg: 3.29s
> 
> We’d appreciate your thoughts on whether this delta aligns with
> expectations, or if there are further optimizations we should explore.
> 
> 
> [1] https://patchwork.kernel.org/project/mptcp/cover/cover.1773735950.git.yangang@kylinos.cn/

It's a matter of trade-offs. This series intentionally sacrifices
performance in what are hopefully rare corner cases in favor of the
average/most usual ones.

Note that when we hit the prune condition, the sender is really doing
something that hurts performance (i.e. sending 1-byte payload packets).

If you break down the runtime you should observe that the delta is
visible mostly/only in the `multi_chunk_sendfile` and `multiproc*`
cases, neither of which is very relevant performance-wise
(`multi_chunk_sendfile` requires intensive pruning, and `multiproc*` is
doing I/O using multiple threads on the same socket, which is also
really bad for performance).

The change in [1] has a different trade-off: replacing the backlog list
with an RB-tree changes the computational complexity of backlog processing
from O(n) to O(n * log(n)), and that will affect every fastpath
user. In practice high speed transfers are more likely to use the
backlog than slow speed ones, as the optimal CPU pinning is BHs on one set
of CPUs and user-space on a different one.
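
As a rough feel for that complexity difference, here is an idealized
step-count model in Python. It is not a measurement of the kernel code: the
functions `list_steps` and `rbtree_steps` are hypothetical, the tree is
assumed perfectly balanced, and rebalancing cost is ignored.

```python
def list_steps(n):
    """Tail-append n packets to a plain list: one step each, O(n) total."""
    return n


def rbtree_steps(n):
    """Insert n packets into an idealized balanced tree.

    The k-th insertion walks about ceil(log2(k)) levels (at least one),
    so the total grows as O(n * log(n)).
    """
    return sum(max(1, (k - 1).bit_length()) for k in range(1, n + 1))


for n in (16, 1024, 65536):
    print(f"n={n}: list={list_steps(n)} rbtree~{rbtree_steps(n)} "
          f"ratio~{rbtree_steps(n) / list_steps(n):.1f}")
```

The ratio between the two grows with the backlog length, which is why the
extra cost would matter most for the longest backlogs, i.e. the high speed
transfers mentioned above.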

A negative delta (worse performance) is expected on top
of [1] in high performance setups (i.e. 2 BM hosts connected B2B doing a
single stream MPTCP bulk transfer over a couple of high speed links).

Yep, I suggested the RB tree backlog usage, but I did not foresee
that drawback, I'm sorry.

/P

Re: [PATCH v7 mptcp-next 0/7] mptcp: address stall under memory pressure
Posted by MPTCP CI 5 days, 23 hours ago
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26113401623

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/23b24a4a7ecb
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1097528


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a
stable test suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)