This is an attempt to fix the data transfer stall reported by Geliang and
Gang by more carefully enforcing memory constraints at the MPTCP level.
This iteration presents a significant change WRT the previous one,
entirely avoiding the collapse attempt on memory pressure. Note that
this choice represents a trade-off: collapsing allows much faster transfer
(to be more accurate: an order of magnitude less slow) under some extreme
conditions, but makes transfer slower and much more CPU intensive under
less unlikely conditions.
As a consequence of the above, the `mptcp_data.multi_chunk_sendfile`
test-case needs a 240-second timeout to complete successfully:
TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
The solution performing data collapsing would need a similarly long timeout
for the multiproc test cases: mutliproc_even, mutliproc_readers,
mutliproc_writers, mutliproc_sendpage_even, mutliproc_sendpage_readers,
mutliproc_sendpage_writers.
Patch 1 is new in v6, and is actually a fix for an old issue (targeting
net), included here just for my convenience.
Patches 2 and 3 make the admission check much stricter for incoming
packets exceeding the memory limits, with some exceptions for fallback
sockets.
Patch 4 implements OoO queue pruning for MPTCP, and patch 5
addresses an edge scenario that could still lead to a transfer stall
under memory pressure.
Finally, patches 6 and 7 improve the MPTCP-level retransmission scheme to
make recovery from memory pressure/after an MPTCP-level drop significantly
faster.
---
v6 -> v7:
- address some of sashiko feedback, see individual patches for the
gory details.
Paolo Abeni (7):
mptcp: fix missing wakeups in edge scenarios
mptcp: explicitly drop over memory limits
mptcp: enforce hard limit on backlog flushing
mptcp: implemented OoO queue pruning
mptcp: track prune recovery status
mptcp: move the retrans loop to a separate helper
mptcp: let the retrans scheduler do its job.
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 66 +++++++++++-
net/mptcp/protocol.c | 247 ++++++++++++++++++++++++++++++++-----------
net/mptcp/protocol.h | 18 ++++
net/mptcp/subflow.c | 1 +
6 files changed, 270 insertions(+), 66 deletions(-)
--
2.54.0
Hi Paolo,

Thanks for this v7.

On Tue, 2026-05-19 at 19:01 +0200, Paolo Abeni wrote:
> This an attempt to fix the data transfer stall reported by Geliang
> and
> Gang more carefully enforcing memory constraints at the MPTCP level.
>
> This iteration presents a significant change WRT the previous one,
> avoiding entirely the collapse attempt on memory pressure. Note that
> this choice represent a trade off: collapsing allow much faster
> transfer
> (to be more accurate: order of magnitude less slow) under some
> extreme
> conditions, but makes transfer slower and much more CPU intensive for
> less unlikely conditions.
>
> As a consequence of the above the `mptcp_data.multi_chunk_sendfile`
> test-case needs a 240 seconds timeout to complete successfully:
>
> TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
>
> The solution performing data collapsing would need similar long
> timeout
> for the multiproc tests cases: mutliproc_even, mutliproc_readers,
> mutliproc_writers, mutliproc_sendpage_even,
> mutliproc_sendpage_readers,
> mutliproc_sendpage_writers.

Based on this version, I actually tested the MPTCP TLS self-tests and
still encountered a few similar errors, with the test duration taking
several times longer than before.

First, I applied the "MPTCP KTLS support" v18 [1] patch series and ran
the MPTCP TLS self-tests in debug mode, obtaining the following output:

Selftest Test: ./mptcp_tls.sh
TAP version 13
1..1
# TAP version 13
# 1..871
# # Starting 871 tests from 13 test cases.
# # RUN tls.12_aes_gcm_mptcp.sendfile ...
# # OK tls.12_aes_gcm_mptcp.sendfile
# ok 1 tls.12_aes_gcm_mptcp.sendfile
... ...
# # RUN tls.12_aria_gcm_256_mptcp.rekey_poll_delay ...
# # OK tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# ok 871 tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# # PASSED: 871 / 871 tests passed.
# # Totals: pass:871 fail:0 xfail:0 xpass:0 skip:0 error:0
ok 1 test: selftest_mptcp_tls # time=142

All 871 tests passed, taking 142 seconds.
Then based on the "MPTCP KTLS support" v18, I further applied this
patch series "mptcp: address stall under memory pressure" v7, and ran
the MPTCP TLS self-tests in debug mode, obtaining the following output:

Selftest Test: ./mptcp_tls.sh
TAP version 13
1..1
# TAP version 13
# 1..871
# # Starting 871 tests from 13 test cases.
# # RUN tls.12_aes_gcm_mptcp.sendfile ...
# # OK tls.12_aes_gcm_mptcp.sendfile
# ok 1 tls.12_aes_gcm_mptcp.sendfile
... ...
# # RUN tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.12_aes_gcm_mptcp.multi_chunk_sendfile
# not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_aes_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_aes_gcm_mptcp.multi_chunk_sendfile
# not ok 70 tls.13_aes_gcm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.12_chacha_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.12_chacha_mptcp.multi_chunk_sendfile
# not ok 137 tls.12_chacha_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_chacha_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_chacha_mptcp.multi_chunk_sendfile
# not ok 204 tls.13_chacha_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_sm4_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_sm4_gcm_mptcp.multi_chunk_sendfile
# not ok 271 tls.13_sm4_gcm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_sm4_ccm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_sm4_ccm_mptcp.multi_chunk_sendfile
# not ok 338 tls.13_sm4_ccm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.12_aes_ccm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.12_aes_ccm_mptcp.multi_chunk_sendfile
# not ok 405 tls.12_aes_ccm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_aes_ccm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_aes_ccm_mptcp.multi_chunk_sendfile
# not ok 472 tls.13_aes_ccm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.12_aes_gcm_256_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.12_aes_gcm_256_mptcp.multi_chunk_sendfile
# not ok 539 tls.12_aes_gcm_256_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_aes_gcm_256_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_aes_gcm_256_mptcp.multi_chunk_sendfile
# not ok 606 tls.13_aes_gcm_256_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.13_nopad_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.13_nopad_mptcp.multi_chunk_sendfile
# not ok 673 tls.13_nopad_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.12_aria_gcm_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.12_aria_gcm_mptcp.multi_chunk_sendfile
# not ok 740 tls.12_aria_gcm_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.12_aria_gcm_256_mptcp.multi_chunk_sendfile ...
# # multi_chunk_sendfile: Test terminated by timeout
# # FAIL tls.12_aria_gcm_256_mptcp.multi_chunk_sendfile
# not ok 807 tls.12_aria_gcm_256_mptcp.multi_chunk_sendfile
... ...
# # RUN tls.12_aria_gcm_256_mptcp.rekey_poll_delay ...
# # OK tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# ok 871 tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# # FAILED: 858 / 871 tests passed.
# # Totals: pass:858 fail:13 xfail:0 xpass:0 skip:0 error:0
not ok 1 test: selftest_mptcp_tls # FAIL # time=567

Here, 13 tests failed, with the total test taking 567 seconds, which is
four times longer than the previous test.
For the next test, on top of "MPTCP KTLS support" v18 and "mptcp:
address stall under memory pressure" v7, I used the following patch to
enable multipath testing:

---
 tools/testing/selftests/net/mptcp/mptcp_tls.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_tls.sh b/tools/testing/selftests/net/mptcp/mptcp_tls.sh
index ea366d149a20..f8a07928ffa0 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_tls.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_tls.sh
@@ -36,10 +36,7 @@ init()
 trap cleanup EXIT
 mptcp_lib_check_mptcp
-# Temporarily set max to '0' to disable multipath testing,
-# as it depends on "mptcp: fix stall because of data_ready" series of fixes.
-# It will be re-enabled together with that series later as a squash-to patch.
-init 0
+init
 ip netns exec "$ns1" ./tls -v 12_aes_gcm_mptcp \
 -v 13_aes_gcm_mptcp \
--
2.53.0

And ran the MPTCP TLS self-tests in debug mode, obtaining the following
output:

Selftest Test: ./mptcp_tls.sh
TAP version 13
1..1
# TAP version 13
# 1..871
# # Starting 871 tests from 13 test cases.
# # RUN tls.12_aes_gcm_mptcp.sendfile ...
# # OK tls.12_aes_gcm_mptcp.sendfile
# ok 1 tls.12_aes_gcm_mptcp.sendfile
... ...
# # RUN tls.13_aes_gcm_mptcp.mutliproc_sendpage_readers ...
# # mutliproc_sendpage_readers: Test terminated by timeout
# # FAIL tls.13_aes_gcm_mptcp.mutliproc_sendpage_readers
# not ok 117 tls.13_aes_gcm_mptcp.mutliproc_sendpage_readers
... ...
# # RUN tls.12_aes_gcm_256_mptcp.mutliproc_sendpage_readers ...
# # tls.c:1690:mutliproc_sendpage_readers:Expected res (-1) >= 0 (0)
# # mutliproc_sendpage_readers: Test terminated by timeout
# # FAIL tls.12_aes_gcm_256_mptcp.mutliproc_sendpage_readers
# not ok 586 tls.12_aes_gcm_256_mptcp.mutliproc_sendpage_readers
... ...
# # RUN tls.12_aria_gcm_256_mptcp.mutliproc_sendpage_writers ...
# # mutliproc_sendpage_writers: Test terminated by timeout
# # FAIL tls.12_aria_gcm_256_mptcp.mutliproc_sendpage_writers
# not ok 855 tls.12_aria_gcm_256_mptcp.mutliproc_sendpage_writers
... ...
# # RUN tls.12_aria_gcm_256_mptcp.rekey_poll_delay ...
# # OK tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# ok 871 tls.12_aria_gcm_256_mptcp.rekey_poll_delay
# # FAILED: 868 / 871 tests passed.
# # Totals: pass:868 fail:3 xfail:0 xpass:0 skip:0 error:0
not ok 1 test: selftest_mptcp_tls # FAIL # time=1256

Here, 3 tests failed, with the total test taking 1256 seconds, which is
much longer than the previous test.

Hope this result is useful for you to refine this patch series. I would
be very grateful if you could also kindly help me review the "MPTCP KTLS
support" v18 patch series.

Thanks,
-Geliang

[1] https://patchwork.kernel.org/project/mptcp/cover/cover.1777459066.git.tanggeliang@kylinos.cn/

> Patch 1 is new in v6, and is actually a fix for an old issue
> (targeting
> net), included here just for my convenience.
>
> Patch 2 and 3 makes the admission check much more strict for incoming
> packets exceeding the memory limits, with some exception for fallback
> sockets.
> Patch 4 makes implement OoO queue pruning for MPTCP and patch 5
> addresses an edge scenario that could still lead to transfer stall
> under memory pressure.
> Finally patch 6 and 7 improve the MPTCP-level retransmission schema
> to
> make recovery from memory pressure/after MPTCP-level drop
> significantly
> faster.
> ---
> v6 -> v7:
> - address some of sashiko feedback, see individual patches for the
> gory details.
>
> Paolo Abeni (7):
> mptcp: fix missing wakeups in edge scenarios
> mptcp: explicitly drop over memory limits
> mptcp: enforce hard limit on backlog flushing
> mptcp: implemented OoO queue pruning
> mptcp: track prune recovery status
> mptcp: move the retrans loop to a separate helper
> mptcp: let the retrans scheduler do its job.
>
> net/mptcp/mib.c | 2 +
> net/mptcp/mib.h | 2 +
> net/mptcp/options.c | 66 +++++++++++-
> net/mptcp/protocol.c | 247 ++++++++++++++++++++++++++++++++-----------
> net/mptcp/protocol.h | 18 ++++
> net/mptcp/subflow.c | 1 +
> 6 files changed, 270 insertions(+), 66 deletions(-)
On 5/20/26 8:32 AM, Geliang Tang wrote:
> On Tue, 2026-05-19 at 19:01 +0200, Paolo Abeni wrote:
>> This an attempt to fix the data transfer stall reported by Geliang
>> and
>> Gang more carefully enforcing memory constraints at the MPTCP level.
>>
>> This iteration presents a significant change WRT the previous one,
>> avoiding entirely the collapse attempt on memory pressure. Note that
>> this choice represent a trade off: collapsing allow much faster
>> transfer
>> (to be more accurate: order of magnitude less slow) under some
>> extreme
>> conditions, but makes transfer slower and much more CPU intensive for
>> less unlikely conditions.
>>
>> As a consequence of the above the `mptcp_data.multi_chunk_sendfile`
>> test-case needs a 240 seconds timeout to complete successfully:
>>
>> TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
>>
>> The solution performing data collapsing would need similar long
>> timeout
>> for the multiproc tests cases: mutliproc_even, mutliproc_readers,
>> mutliproc_writers, mutliproc_sendpage_even,
>> mutliproc_sendpage_readers,
>> mutliproc_sendpage_writers.
>
> Based on this version, I actually tested the MPTCP TLS self-tests and
> still encountered a few similar errors, with the test duration taking
> several times longer than before.
[...]
> ... ...
> # # RUN tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
> # # multi_chunk_sendfile: Test terminated by timeout
> # # FAIL tls.12_aes_gcm_mptcp.multi_chunk_sendfile
> # not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile
Without this series you should get some stall there, right?
Note that the 'multi_chunk' test will require increasing the test-case timeout,
as mentioned in v3:
---
diff --git a/tools/testing/selftests/net/mptcp/mptcp_data.c b/tools/testing/selftests/net/mptcp/mptcp_data.c
index 39d092e7888d..127d8b47bd39 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_data.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_data.c
@@ -166,7 +166,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
close(fd);
}
-TEST_F(mptcp, multi_chunk_sendfile)
+TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
{
chunked_sendfile(_metadata, self, 4096, 4096);
chunked_sendfile(_metadata, self, 4096, 0);
---
The timeout could be reduced/avoided by including the 'collapse' strategy from v5
and previous revisions. As hinted by Eric, collapsing is a sort of weak spot
for potentially evil peers, and in practice it increases CPU usage to the
point that in debug builds some other test-cases would still require an increased
timeout.
AFAICS the multi chunk is really a corner case, especially when sending
1-byte chunks. As a trade-off I prefer avoiding collapsing, and accept any
solution that allows completion of the multi chunk test, even with very low
tput.
Side note: I think/I'm reasonably sure even plain TCP will have a hard time
with such a case in comparable conditions, i.e. when OoO happens with
very high probability _after_ the sender starts pushing data at high speed,
but the upstream self-tests (rightfully) do not include the OoO part.
/P
May 20, 2026 at 4:19 PM, "Paolo Abeni" <pabeni@redhat.com> wrote:
>
> On 5/20/26 8:32 AM, Geliang Tang wrote:
>
> >
> > On Tue, 2026-05-19 at 19:01 +0200, Paolo Abeni wrote:
> >
> > >
> > > This an attempt to fix the data transfer stall reported by Geliang
> > > and
> > > Gang more carefully enforcing memory constraints at the MPTCP level.
> > >
> > > This iteration presents a significant change WRT the previous one,
> > > avoiding entirely the collapse attempt on memory pressure. Note that
> > > this choice represent a trade off: collapsing allow much faster
> > > transfer
> > > (to be more accurate: order of magnitude less slow) under some
> > > extreme
> > > conditions, but makes transfer slower and much more CPU intensive for
> > > less unlikely conditions.
> > >
> > > As a consequence of the above the `mptcp_data.multi_chunk_sendfile`
> > > test-case needs a 240 seconds timeout to complete successfully:
> > >
> > > TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
> > >
> > > The solution performing data collapsing would need similar long
> > > timeout
> > > for the multiproc tests cases: mutliproc_even, mutliproc_readers,
> > > mutliproc_writers, mutliproc_sendpage_even,
> > > mutliproc_sendpage_readers,
> > > mutliproc_sendpage_writers.
> > >
> >
> > Based on this version, I actually tested the MPTCP TLS self-tests and
> > still encountered a few similar errors, with the test duration taking
> > several times longer than before.
> >
> [...]
>
> >
> > ... ...
> > # # RUN tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
> > # # multi_chunk_sendfile: Test terminated by timeout
> > # # FAIL tls.12_aes_gcm_mptcp.multi_chunk_sendfile
> > # not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile
> >
> Without this series you should get some stall there, right?
>
> Note that the 'multi_chunk' test will require increasing the test-case timeout,
> as mentioned in v3:
>
> ---
> diff --git a/tools/testing/selftests/net/mptcp/mptcp_data.c b/tools/testing/selftests/net/mptcp/mptcp_data.c
> index 39d092e7888d..127d8b47bd39 100644
> --- a/tools/testing/selftests/net/mptcp/mptcp_data.c
> +++ b/tools/testing/selftests/net/mptcp/mptcp_data.c
> @@ -166,7 +166,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
> close(fd);
> }
>
> -TEST_F(mptcp, multi_chunk_sendfile)
> +TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
> {
> chunked_sendfile(_metadata, self, 4096, 4096);
> chunked_sendfile(_metadata, self, 4096, 0);
>
> ---
>
Hi Paolo,
No offense intended at all— Geliang and I just wanted to confirm
whether this performance regression is expected, acceptable, or
something we can optimize further.
We tested the performance of mptcp_data.sh (time per run) 15 times
under the v7 patch and [1], and the results are as follows:
v7 results:
5.82 4.72 5.38 6.18 6.52 6.04 5.05 6.49 4.78 5.62 5.52 4.91 8.07 3.87 5.61
Max: 8.07s, Min: 3.87s, Avg: 5.64s
[1] results:
2.98 3.44 3.11 3.06 3.78 3.23 3.28 2.88 3.52 3.33 2.89 3.33 3.91 3.20 3.45
Max: 3.91s, Min: 2.88s, Avg: 3.29s
We’d appreciate your thoughts on whether this delta aligns with
expectations, or if there are further optimizations we should explore.
[1] https://patchwork.kernel.org/project/mptcp/cover/cover.1773735950.git.yangang@kylinos.cn/
Thanks
Gang
On 5/21/26 5:23 AM, gang.yan@linux.dev wrote:
> May 20, 2026 at 4:19 PM, "Paolo Abeni" <pabeni@redhat.com> wrote:
>> On 5/20/26 8:32 AM, Geliang Tang wrote:
>>> # # RUN tls.12_aes_gcm_mptcp.multi_chunk_sendfile ...
>>> # # multi_chunk_sendfile: Test terminated by timeout
>>> # # FAIL tls.12_aes_gcm_mptcp.multi_chunk_sendfile
>>> # not ok 3 tls.12_aes_gcm_mptcp.multi_chunk_sendfile
>>>
>> Without this series you should get some stall there, right?
>>
>> Note that the 'multi_chunk' test will require increasing the test-case timeout,
>> as mentioned in v3:
>>
>> ---
>> diff --git a/tools/testing/selftests/net/mptcp/mptcp_data.c b/tools/testing/selftests/net/mptcp/mptcp_data.c
>> index 39d092e7888d..127d8b47bd39 100644
>> --- a/tools/testing/selftests/net/mptcp/mptcp_data.c
>> +++ b/tools/testing/selftests/net/mptcp/mptcp_data.c
>> @@ -166,7 +166,7 @@ static void chunked_sendfile(struct __test_metadata *_metadata,
>> close(fd);
>> }
>>
>> -TEST_F(mptcp, multi_chunk_sendfile)
>> +TEST_F_TIMEOUT(mptcp, multi_chunk_sendfile, 240)
>> {
>> chunked_sendfile(_metadata, self, 4096, 4096);
>> chunked_sendfile(_metadata, self, 4096, 0);
>>
>> ---
>>
> No offense intended at all—
No offense taken at all :)
> Geliang and I just wanted to confirm
> whether this performance regression is expected, acceptable, or
> something we can optimize further.
>
> We tested the performance of mptcp_data.sh (time per run) 15 times
> under the v7 patch and [1], and the results are as follows:
>
> v7 results:
> 5.82 4.72 5.38 6.18 6.52 6.04 5.05 6.49 4.78 5.62 5.52 4.91 8.07 3.87 5.61
>
> Max: 8.07s, Min: 3.87s, Avg: 5.64s
>
> [1] results:
> 2.98 3.44 3.11 3.06 3.78 3.23 3.28 2.88 3.52 3.33 2.89 3.33 3.91 3.20 3.45
>
> Max: 3.91s, Min: 2.88s, Avg: 3.29s
>
> We’d appreciate your thoughts on whether this delta aligns with
> expectations, or if there are further optimizations we should explore.
>
>
> [1] https://patchwork.kernel.org/project/mptcp/cover/cover.1773735950.git.yangang@kylinos.cn/
It's a matter of trade-offs. This series intentionally sacrifices
performance in (hopefully) very rare corner cases in favor of the average/most
usual ones.
Note that when we hit the prune condition, the sender is really doing
something against performance (i.e. sending 1-byte payload packets).
If you break down the runtime, you should observe that the delta is
visible mostly/only in the `multi_chunk_sendfile` and `multiproc*`
cases, neither of which is very relevant performance-wise
(`multi_chunk_sendfile` requires intensive pruning, and `multiproc*` is
doing I/O using multiple threads on the same socket, which is also
really bad for performance).
The change in [1] has a different trade-off: replacing the backlog list
with an RB-tree changes the computational complexity of backlog processing
from O(n) to O(n * log(n)), and that will affect every fastpath
user. In practice, high-speed transfers are more likely to use the
backlog than slow-speed ones, as the optimal CPU pinning is BHs on a set
of CPUs and user-space on a different one.
A negative delta (worse performance) is expected to be observed on top
of [1] in high-performance setups (i.e. 2 BM hosts B2B connected doing a
single stream MPTCP bulk transfer over a couple of high speed links).
Yep, I suggested the RB tree backlog usage, but I did not foresee
that drawback, I'm sorry.
/P
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26113401623
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/23b24a4a7ecb
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1097528
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)