Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
the bound check compared to the pre-existing code.
Accepts rmem <= sk_rcvbuf.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 264a13bc6f3e..8bfa21ef52ff 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -402,7 +402,7 @@ static bool mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
msk->ooo_last_skb = rb_to_skb(prev);
mem = (unsigned int)sk_rmem_alloc_get(sk);
- if (mem < sk->sk_rcvbuf)
+ if (mem <= sk->sk_rcvbuf)
break;
node = prev;
@@ -413,7 +413,7 @@ static bool mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
out:
mem = (unsigned int)sk_rmem_alloc_get(sk);
- return mem < sk->sk_rcvbuf;
+ return mem <= sk->sk_rcvbuf;
}
static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
--
2.54.0
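The effect of the inclusive bound can be sketched outside the kernel. The snippet below is a minimal user-space illustration, not the actual mptcp_prune_ofo_queue() code: `struct rx_budget`, `rx_within_budget()` and `prune_from_tail()` are hypothetical names, and dropping segments from the tail of the queue is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the receive-side accounting of a socket. */
struct rx_budget {
	unsigned int rmem_alloc;	/* bytes currently charged to the socket */
	unsigned int rcvbuf;		/* receive buffer limit */
};

/* Inclusive bound: usage exactly equal to the limit is accepted. */
static bool rx_within_budget(const struct rx_budget *b)
{
	return b->rmem_alloc <= b->rcvbuf;
}

/*
 * Release queued out-of-order segments, last one first, until the usage
 * is within the limit. seg_len[] holds the charged size of each queued
 * segment. Returns true when the budget is satisfied afterwards;
 * *dropped counts the released segments.
 */
static bool prune_from_tail(struct rx_budget *b, const unsigned int *seg_len,
			    size_t nr_segs, size_t *dropped)
{
	*dropped = 0;
	while (nr_segs > 0 && !rx_within_budget(b)) {
		nr_segs--;
		assert(b->rmem_alloc >= seg_len[nr_segs]);
		b->rmem_alloc -= seg_len[nr_segs];
		(*dropped)++;
	}
	return rx_within_budget(b);
}
```

With 3000 bytes charged, a 2000-byte limit and a 1000-byte segment at the tail, a single drop brings the usage to exactly 2000 bytes and the loop stops. With a strict `<` comparison, one more segment would be dropped for that last byte.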
Hi Paolo,

On 31/05/2026 01:08, Paolo Abeni wrote:
> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
> the bound check compared to the pre-existing code.

The CI is reporting that mptcp_connect.sh is now flaky:

- normal:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420

- debug:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694

When it happens, MPTcpExtRcvPruned is not 0.

I can reproduce the issue locally (~5% of the time), so I started a
quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
subtests, in a loop, 25 times (maybe that was not enough). Apparently,
the issue seems to come from:

mptcp: implemented OoO queue pruning

I didn't check the intermediate versions without the squash-to patches,
but I can run more tests if that helps. Do you have a rough idea what
can cause this?

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
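Whether 25 iterations are enough for a failure seen about 5% of the time can be estimated with a short calculation. The sketch below assumes that runs are independent and that the failure rate is exactly 5%; neither is confirmed in this thread, and the function names are hypothetical.

```c
#include <assert.h>

/*
 * Probability that `runs` independent runs all pass when each run fails
 * with probability `fail_rate`.
 */
static double all_pass_probability(double fail_rate, int runs)
{
	double p = 1.0;

	assert(fail_rate >= 0.0 && fail_rate <= 1.0 && runs >= 0);
	for (int i = 0; i < runs; i++)
		p *= 1.0 - fail_rate;
	return p;
}

/*
 * Smallest number of runs for which at least one failure is expected
 * with probability `confidence`.
 */
static int runs_needed(double fail_rate, double confidence)
{
	double p = 1.0;
	int runs = 0;

	assert(fail_rate > 0.0 && fail_rate <= 1.0);
	assert(confidence >= 0.0 && confidence < 1.0);
	while (1.0 - p < confidence) {
		p *= 1.0 - fail_rate;
		runs++;
	}
	return runs;
}
```

Under these assumptions, `all_pass_probability(0.05, 25)` is about 0.28, so roughly one bisect step in four could be marked good by mistake. `runs_needed(0.05, 0.95)` gives 59 iterations per step for 95% confidence.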
On 6/2/26 1:21 PM, Matthieu Baerts wrote:
> On 31/05/2026 01:08, Paolo Abeni wrote:
>> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
>> the bound check compared to the pre-existing code.
>
> The CI is reporting that mptcp_connect.sh is now flaky:
>
> - normal:
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420
>
> - debug:
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694
>
> When it happens, MPTcpExtRcvPruned is not 0.
>
> I can reproduce the issue locally (~5% of the time), so I started a
> quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
> subtests, in a loop, 25 times (maybe that was not enough). Apparently,
> the issue seems to come from:
>
> mptcp: implemented OoO queue pruning

Of course :(

> I didn't check the intermediate versions without the squash-to patches,
> but I can run more tests if that helps. Do you have a rough idea what
> can cause this?

Pruning drops OoO packets and relies on good mptcp-level retrans to make
forward progress. In the two samples above I see quite a lot of
NoDSSInWindow events - presumably happening _after_ the prune event.

My wild guess is that MPTCP-level retrans is not yet good enough or trips
into some corner cases, possibly paired with pruning being too aggressive.

I'll try to investigate this.

/P
On 6/3/26 9:14 AM, Paolo Abeni wrote:
> On 6/2/26 1:21 PM, Matthieu Baerts wrote:
>> On 31/05/2026 01:08, Paolo Abeni wrote:
>>> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
>>> the bound check compared to the pre-existing code.
>>
>> The CI is reporting that mptcp_connect.sh is now flaky:
>>
>> - normal:
>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420
>>
>> - debug:
>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694
>>
>> When it happens, MPTcpExtRcvPruned is not 0.
>>
>> I can reproduce the issue locally (~5% of the time), so I started a
>> quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
>> subtests, in a loop, 25 times (maybe that was not enough). Apparently,
>> the issue seems to come from:
>>
>> mptcp: implemented OoO queue pruning
>
> Of course :(
>
>> I didn't check the intermediate versions without the squash-to patches,
>> but I can run more tests if that helps. Do you have a rough idea what
>> can cause this?
>
> Pruning drops OoO packets and relies on good mptcp-level retrans to make
> forward progress. In the two samples above I see quite a lot of
> NoDSSInWindow events - presumably happening _after_ the prune event.
>
> My wild guess is that MPTCP-level retrans is not yet good enough or trips
> into some corner cases, possibly paired with pruning being too aggressive.
>
> I'll try to investigate this.
I dashed off the above reply before actually looking at the PW status.
`mptcp: implemented OoO queue pruning` relies on
`mptcp: let the retrans scheduler do its job.`.
The latter is still rightfully on PW, and sashiko has a comment on it.
The (incremental) fix for the sashiko comment should be the diff below.
I'll test it and send a v12 (or you could apply it :).
/P
---
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2e62fdc2af3e..a4f7e99b30db 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2927,8 +2927,7 @@ static void __mptcp_retrans(struct sock *sk)
* the current dfrag, if so try to start again from RTX head.
*/
mptcp_data_lock(sk);
- already_retrans = !dfrag->already_sent ||
- !before64(msk->snd_una, dfrag->data_seq +
+ already_retrans = !before64(msk->snd_una, dfrag->data_seq +
dfrag->already_sent);
put_page(dfrag->page);
if (already_retrans) {
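The difference between the old and the new `already_retrans` condition can be checked in isolation. The sketch below is a user-space illustration only: `seq_before64()` assumes that before64() follows the usual wrap-safe signed-difference idiom, and `struct frag_view` is a hypothetical reduction of the dfrag fields used in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "seq1 comes before seq2" test on 64-bit sequence numbers. */
static bool seq_before64(uint64_t seq1, uint64_t seq2)
{
	return (int64_t)(seq1 - seq2) < 0;
}

/* Hypothetical reduced view of a data fragment. */
struct frag_view {
	uint64_t data_seq;	/* sequence number of the first byte */
	uint32_t already_sent;	/* bytes of the fragment sent so far */
};

/* Condition before the incremental fix. */
static bool already_retrans_old(uint64_t snd_una, const struct frag_view *f)
{
	return !f->already_sent ||
	       !seq_before64(snd_una, f->data_seq + f->already_sent);
}

/* Condition after the incremental fix. */
static bool already_retrans_new(uint64_t snd_una, const struct frag_view *f)
{
	return !seq_before64(snd_una, f->data_seq + f->already_sent);
}
```

The two versions differ only for a fragment with `already_sent == 0`. The old check always reports such a fragment as already retransmitted. The new check does so only when `snd_una` has reached `data_seq`.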
Hi Paolo,

On 03/06/2026 17:27, Paolo Abeni wrote:
>
>
> On 6/3/26 9:14 AM, Paolo Abeni wrote:
>> On 6/2/26 1:21 PM, Matthieu Baerts wrote:
>>> On 31/05/2026 01:08, Paolo Abeni wrote:
>>>> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
>>>> the bound check compared to the pre-existing code.
>>>
>>> The CI is reporting that mptcp_connect.sh is now flaky:
>>>
>>> - normal:
>>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420
>>>
>>> - debug:
>>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694
>>>
>>> When it happens, MPTcpExtRcvPruned is not 0.
>>>
>>> I can reproduce the issue locally (~5% of the time), so I started a
>>> quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
>>> subtests, in a loop, 25 times (maybe that was not enough). Apparently,
>>> the issue seems to come from:
>>>
>>> mptcp: implemented OoO queue pruning
>>
>> Of course :(
>>
>>> I didn't check the intermediate versions without the squash-to patches,
>>> but I can run more tests if that helps. Do you have a rough idea what
>>> can cause this?
>>
>> Pruning drops OoO packets and relies on good mptcp-level retrans to make
>> forward progress. In the two samples above I see quite a lot of
>> NoDSSInWindow events - presumably happening _after_ the prune event.
>>
>> My wild guess is that MPTCP-level is not yet good enough/trip into some
>> corner cases. Possibly paired with pruning being too aggressive.
>>
>> I'll try to investigate this.
>
> I dashed the above reply before actually looking at the PW status.
> `mptcp: implemented OoO queue pruning` relies on
> `mptcp: let the retrans scheduler do its job.`.

Detail: can we apply the second one before the first one?

> The latter is still rightfully on PW, and sashiko has a comment on it.
>
> The (incremental) fix for sashiko comment should be, I'll test that and send
> a v12 (or you could apply it :).

I'm happy to apply it when you finish the testing, if you prefer not to
send a v12 :)

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26687492560
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8b6df95aea3a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103335
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to keep the test
suite stable when executed on a public CI like this one, some reported
issues may not be due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Hi Paolo,

On 31/05/2026 01:08, Paolo Abeni wrote:
> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
> the bound check compared to the pre-existing code.
>
> Accepts rmem <= sk_rcvbuf.

Thank you for the follow-up patch.

A few tests have failed, but I didn't find a link with this patch. Or
did I miss something?

- mptcp_connect.sh
- mptcp_connect.sh -m sendfile
- simult_flows.sh

I already applied the patch, just in case that would help you:

New patches for t/upstream:
- c8dd2e45c108: Squash-to: "mptcp: implemented OoO queue pruning"
- Results: 75370a1de11f..294e9da8a098 (export)

Tests are now in progress:

- export: https://github.com/multipath-tcp/mptcp_net-next/commit/8ff849807e07f5c503bab6d7b953b6eccd97c667/checks

Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Unstable: 3 failed test(s): selftest_mptcp_connect selftest_mptcp_connect_sendfile selftest_simult_flows ⚠️
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26687492560
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8b6df95aea3a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103335
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to keep the test
suite stable when executed on a public CI like this one, some reported
issues may not be due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)