Squash-to: "mptcp: implemented OoO queue pruning"

[PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by Paolo Abeni 1 week, 1 day ago

Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
the bound check compared to the pre-existing code.

Accepts rmem <= sk_rcvbuf.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 264a13bc6f3e..8bfa21ef52ff 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -402,7 +402,7 @@ static bool mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
 		msk->ooo_last_skb = rb_to_skb(prev);
 
 		mem = (unsigned int)sk_rmem_alloc_get(sk);
-		if (mem < sk->sk_rcvbuf)
+		if (mem <= sk->sk_rcvbuf)
 			break;
 
 		node = prev;
@@ -413,7 +413,7 @@ static bool mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
 
 out:
 	mem = (unsigned int)sk_rmem_alloc_get(sk);
-	return mem < sk->sk_rcvbuf;
+	return mem <= sk->sk_rcvbuf;
 }
 
 static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
-- 
2.54.0
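
To make the one-byte difference concrete, here is a minimal userspace sketch of the two comparisons. The helper names `fits_strict()` and `fits_inclusive()` are hypothetical and do not exist in the kernel, and both operands are taken as `unsigned int` for simplicity; only the `<` versus `<=` comparison is taken from the patch.

```c
#include <stdbool.h>

/* Old check from the squashed patch: memory must be strictly below the limit. */
static bool fits_strict(unsigned int rmem, unsigned int rcvbuf)
{
	return rmem < rcvbuf;
}

/* New check: memory exactly equal to the limit is also accepted. */
static bool fits_inclusive(unsigned int rmem, unsigned int rcvbuf)
{
	return rmem <= rcvbuf;
}
```

The two helpers only disagree when `rmem == rcvbuf`: the strict form would keep pruning (or report failure) in that case, while the inclusive form stops.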

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by Matthieu Baerts 5 days, 21 hours ago

Hi Paolo,

On 31/05/2026 01:08, Paolo Abeni wrote:
> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
> the bound check compared to the pre-existing code.

The CI is reporting that mptcp_connect.sh is now flaky:

 - normal:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420

  - debug:
https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694

When it happens, MPTcpExtRcvPruned is not 0.

I can reproduce the issue locally (~5% of the time), so I started a
quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
subtests, in a loop, 25 times (maybe that was not enough). Apparently,
the issue seems to come from:

  mptcp: implemented OoO queue pruning

I didn't check the intermediate versions without the squash-to patches,
but I can run more tests if that helps. Do you have a rough idea what
can cause this?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
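
On the "maybe that was not enough" remark: a rough way to size the bisect loop is sketched below. It assumes, purely for illustration, that runs are independent and that the failure rate is a constant 5%; neither assumption is confirmed in this thread, and the function names are hypothetical.

```c
/* Probability that `runs` independent runs all pass, given a per-run
 * failure rate. Plain multiplication, so no libm is needed.
 */
static double prob_all_pass(double fail_rate, int runs)
{
	double p = 1.0;
	int i;

	for (i = 0; i < runs; i++)
		p *= 1.0 - fail_rate;
	return p;
}

/* Smallest number of runs for which at least one failure is observed
 * with probability >= `confidence` (e.g. 0.95).
 */
static int runs_needed(double fail_rate, double confidence)
{
	double p_miss = 1.0;
	int n = 0;

	while (p_miss > 1.0 - confidence) {
		p_miss *= 1.0 - fail_rate;
		n++;
	}
	return n;
}
```

Under those assumptions, `prob_all_pass(0.05, 25)` is about 0.28, so a bad commit would pass a 25-iteration loop more than a quarter of the time, and `runs_needed(0.05, 0.95)` gives 59 iterations for a 95% chance of catching it.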

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by Paolo Abeni 5 days, 1 hour ago

On 6/2/26 1:21 PM, Matthieu Baerts wrote:
> On 31/05/2026 01:08, Paolo Abeni wrote:
>> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
>> the bound check compared to the pre-existing code.
> 
> The CI is reporting that mptcp_connect.sh is now flaky:
> 
>  - normal:
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420
> 
>   - debug:
> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694
> 
> When it happens, MPTcpExtRcvPruned is not 0.
> 
> I can reproduce the issue locally (~5% of the time), so I started a
> quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
> subtests, in a loop, 25 times (maybe that was not enough). Apparently,
> the issue seems to come from:
> 
>   mptcp: implemented OoO queue pruning

Of course :(

> I didn't check the intermediate versions without the squash-to patches,
> but I can run more tests if that helps. Do you have a rough idea what
> can cause this?

Pruning drops OoO packets and relies on good MPTCP-level retrans to make
forward progress. In the two samples above I see quite a lot of
NoDSSInWindow events - presumably happening _after_ the prune event.

My wild guess is that MPTCP-level retrans is not yet good enough or trips
into some corner cases, possibly paired with pruning being too aggressive.

I'll try to investigate this.

/P

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by Paolo Abeni 5 days, 1 hour ago


On 6/3/26 9:14 AM, Paolo Abeni wrote:
> On 6/2/26 1:21 PM, Matthieu Baerts wrote:
>> On 31/05/2026 01:08, Paolo Abeni wrote:
>>> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
>>> the bound check compared to the pre-existing code.
>>
>> The CI is reporting that mptcp_connect.sh is now flaky:
>>
>>  - normal:
>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420
>>
>>   - debug:
>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694
>>
>> When it happens, MPTcpExtRcvPruned is not 0.
>>
>> I can reproduce the issue locally (~5% of the time), so I started a
>> quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
>> subtests, in a loop, 25 times (maybe that was not enough). Apparently,
>> the issue seems to come from:
>>
>>   mptcp: implemented OoO queue pruning
> 
> Of course :(
> 
>> I didn't check the intermediate versions without the squash-to patches,
>> but I can run more tests if that helps. Do you have a rough idea what
>> can cause this?
> 
> Pruning drops OoO packets and relies on good MPTCP-level retrans to make
> forward progress. In the two samples above I see quite a lot of
> NoDSSInWindow events - presumably happening _after_ the prune event.
> 
> My wild guess is that MPTCP-level retrans is not yet good enough or trips
> into some corner cases, possibly paired with pruning being too aggressive.
> 
> I'll try to investigate this.

I dashed off the above reply before actually looking at the PW status.
`mptcp: implemented OoO queue pruning` relies on
`mptcp: let the retrans scheduler do its job.`.

The latter is still rightfully on PW, and sashiko has a comment on it.

The (incremental) fix for sashiko's comment should be the one below; I'll
test that and send a v12 (or you could apply it :).

/P

---
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2e62fdc2af3e..a4f7e99b30db 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2927,8 +2927,7 @@ static void __mptcp_retrans(struct sock *sk)
 		 * the current dfrag, if so try to start again from RTX head.
 		 */
 		mptcp_data_lock(sk);
-		already_retrans = !dfrag->already_sent ||
-				  !before64(msk->snd_una, dfrag->data_seq +
+		already_retrans = !before64(msk->snd_una, dfrag->data_seq +
 					    dfrag->already_sent);
 		put_page(dfrag->page);
 		if (already_retrans) {
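
For readers who want to experiment with the condition changed above, the following is a userspace sketch only. `before64()` is written here as the usual wrap-safe signed-difference comparison, which is assumed to match the kernel helper; `struct dfrag_sketch` and the two `already_retrans_*()` functions are hypothetical stand-ins, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "seq1 is before seq2" on 64-bit sequence numbers (assumed to
 * mirror the kernel's before64()).
 */
static bool before64(uint64_t seq1, uint64_t seq2)
{
	return (int64_t)(seq1 - seq2) < 0;
}

/* Hypothetical reduced view of a data fragment: only the two fields
 * used by the check in the diff.
 */
struct dfrag_sketch {
	uint64_t data_seq;
	uint32_t already_sent;
};

/* Condition before the incremental fix. */
static bool already_retrans_old(uint64_t snd_una, const struct dfrag_sketch *d)
{
	return !d->already_sent ||
	       !before64(snd_una, d->data_seq + d->already_sent);
}

/* Condition after the incremental fix: only the snd_una comparison. */
static bool already_retrans_new(uint64_t snd_una, const struct dfrag_sketch *d)
{
	return !before64(snd_una, d->data_seq + d->already_sent);
}
```

In this sketch the two forms differ only when `already_sent` is 0 and `snd_una` is still before `data_seq`: the old form reports the fragment as already retransmitted, the new one does not.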

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by Matthieu Baerts 4 days, 23 hours ago

Hi Paolo,

On 03/06/2026 17:27, Paolo Abeni wrote:
> 
> 
> On 6/3/26 9:14 AM, Paolo Abeni wrote:
>> On 6/2/26 1:21 PM, Matthieu Baerts wrote:
>>> On 31/05/2026 01:08, Paolo Abeni wrote:
>>>> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
>>>> the bound check compared to the pre-existing code.
>>>
>>> The CI is reporting that mptcp_connect.sh is now flaky:
>>>
>>>  - normal:
>>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26704613812/attempts/1#summary-78703396420
>>>
>>>   - debug:
>>> https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26706355990/attempts/1#summary-78708269694
>>>
>>> When it happens, MPTcpExtRcvPruned is not 0.
>>>
>>> I can reproduce the issue locally (~5% of the time), so I started a
>>> quick bisect running mptcp_connect.sh, but only MPTFO and disconnect
>>> subtests, in a loop, 25 times (maybe that was not enough). Apparently,
>>> the issue seems to come from:
>>>
>>>   mptcp: implemented OoO queue pruning
>>
>> Of course :(
>>
>>> I didn't check the intermediate versions without the squash-to patches,
>>> but I can run more tests if that helps. Do you have a rough idea what
>>> can cause this?
>>
>> Pruning drops OoO packets and relies on good MPTCP-level retrans to make
>> forward progress. In the two samples above I see quite a lot of
>> NoDSSInWindow events - presumably happening _after_ the prune event.
>>
>> My wild guess is that MPTCP-level retrans is not yet good enough or trips
>> into some corner cases, possibly paired with pruning being too aggressive.
>>
>> I'll try to investigate this.
> 
> I dashed off the above reply before actually looking at the PW status.
> `mptcp: implemented OoO queue pruning` relies on
> `mptcp: let the retrans scheduler do its job.`.

Detail: can we apply the second one before the first one?

> The latter is still rightfully on PW, and sashiko has a comment on it.
> 
> The (incremental) fix for sashiko's comment should be the one below; I'll
> test that and send a v12 (or you could apply it :).

I'm happy to apply it when you finish the testing, if you prefer not to
send a v12 :)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by MPTCP CI 1 week, 1 day ago

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26687492560

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8b6df95aea3a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103335


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by Matthieu Baerts 1 week, 1 day ago

Hi Paolo,

On 31/05/2026 01:08, Paolo Abeni wrote:
> Sashiko noted that mptcp_prune_ofo_queue() is 1 byte too restrictive with
> the bound check compared to the pre-existing code.
> 
> Accepts rmem <= sk_rcvbuf.

Thank you for the follow-up patch. A few tests have failed, but I didn't
find a link with this patch. Or did I miss something?

  - mptcp_connect.sh
  - mptcp_connect.sh -m sendfile
  - simult_flows.sh

I already applied the patch, just in case that would help you:

New patches for t/upstream:
- c8dd2e45c108: Squash-to: "mptcp: implemented OoO queue pruning"
- Results: 75370a1de11f..294e9da8a098 (export)

Tests are now in progress:

- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/8ff849807e07f5c503bab6d7b953b6eccd97c667/checks

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next] Squash-to: "mptcp: implemented OoO queue pruning"

Posted by MPTCP CI 1 week, 1 day ago

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Unstable: 3 failed test(s): selftest_mptcp_connect selftest_mptcp_connect_sendfile selftest_simult_flows ⚠️ 
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26687492560

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/8b6df95aea3a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103335


Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)