The MPTCP protocol usually schedules the retransmission timer only
when there is some chance for such retransmissions to happen.
With a notable exception: __mptcp_push_pending() currently schedules
such a timer unconditionally, potentially leading to unnecessary rtx
timer expirations.
The issue has been present since the blamed commit below, but became easily
reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization in
__mptcp_check_push()")
Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note that the blamed commit states we always need to rearm the timer even
with no data pending, to catch link failures. That sounds weird/possibly
tied to broken zero window probes?!? Hopefully mp_join self test should
catch that
---
net/mptcp/protocol.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3b327e9ccb78..b3122f53827f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1649,7 +1649,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
struct mptcp_sendmsg_info info = {
.flags = flags,
};
- bool do_check_data_fin = false;
+ bool copied = false;
int push_count = 1;
while (mptcp_send_head(sk) && (push_count > 0)) {
@@ -1691,7 +1691,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
push_count--;
continue;
}
- do_check_data_fin = true;
+ copied = true;
}
}
}
@@ -1700,11 +1700,14 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
if (ssk)
mptcp_push_release(ssk, &info);
- /* ensure the rtx timer is running */
- if (!mptcp_rtx_timer_pending(sk))
- mptcp_reset_rtx_timer(sk);
- if (do_check_data_fin)
+ /* Avoid scheduling the rtx timer if no data has been pushed; the timer
+ * will be updated on positive acks by __mptcp_cleanup_una().
+ */
+ if (copied) {
+ if (!mptcp_rtx_timer_pending(sk))
+ mptcp_reset_rtx_timer(sk);
mptcp_check_send_data_fin(sk);
+ }
}
static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first)
--
2.51.1
Hi Paolo,
On 21/11/2025 09:52, Paolo Abeni wrote:
> The MPTCP protocol usually schedules the retransmission timer only
> when there is some chance for such retransmissions to happen.
>
> With a notable exception: __mptcp_push_pending() currently schedules
> such a timer unconditionally, potentially leading to unnecessary rtx
> timer expirations.
>
> The issue has been present since the blamed commit below, but became easily
> reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization in
> __mptcp_check_push()")
Thank you for the fix!
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Note that the blamed commit states we always need to rearm the timer even
> with no data pending, to catch link failures. That sounds weird/possibly
> tied to broken zero window probes?!? Hopefully mp_join self test should
> catch that
Thank you for the note! I suggest waiting one more week before sending it.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
On 24/11/2025 19:24, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 21/11/2025 09:52, Paolo Abeni wrote:
>> The MPTCP protocol usually schedules the retransmission timer only
>> when there is some chance for such retransmissions to happen.
>>
>> With a notable exception: __mptcp_push_pending() currently schedules
>> such a timer unconditionally, potentially leading to unnecessary rtx
>> timer expirations.
>>
>> The issue has been present since the blamed commit below, but became easily
>> reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization in
>> __mptcp_check_push()")
>
> Thank you for the fix!
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> Note that the blamed commit states we always need to rearm the timer even
>> with no data pending, to catch link failures. That sounds weird/possibly
>> tied to broken zero window probes?!? Hopefully mp_join self test should
>> catch that
>
> Thank you for the note! I suggest waiting one more week before sending it.
Applied!
New patches for t/upstream-net and t/upstream:
- 246fb174f79b: mptcp: schedule rtx timer only after pushing data
- Results: c85e07c93b0a..bbb6d3ccd581 (export-net)
- Results: d68a54341cb3..23a6a5017274 (export)
Tests are now in progress:
- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/5cfb92e8243a99a2edc03b68e8ef3568672b7546/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/25182e7cd25a7e550ad2d9f7ecbb7e24cecd7f7f/checks
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
On Fri, 2025-11-21 at 09:52 +0100, Paolo Abeni wrote:
> The MPTCP protocol usually schedules the retransmission timer only
> when there is some chance for such retransmissions to happen.
>
> With a notable exception: __mptcp_push_pending() currently schedules
> such a timer unconditionally, potentially leading to unnecessary rtx
> timer expirations.
>
> The issue has been present since the blamed commit below, but became
> easily reproducible after commit 27b0e701d387 ("mptcp: drop bogus
> optimization in
> __mptcp_check_push()")
>
> Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that the blamed commit states we always need to rearm the timer
> even
> with no data pending, to catch link failures. That sounds
> weird/possibly
> tied to broken zero window probes?!? Hopefully mp_join self test
> should
> catch that
> ---
> net/mptcp/protocol.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 3b327e9ccb78..b3122f53827f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1649,7 +1649,7 @@ void __mptcp_push_pending(struct sock *sk,
> unsigned int flags)
> struct mptcp_sendmsg_info info = {
> .flags = flags,
> };
> - bool do_check_data_fin = false;
> + bool copied = false;
> int push_count = 1;
>
> while (mptcp_send_head(sk) && (push_count > 0)) {
> @@ -1691,7 +1691,7 @@ void __mptcp_push_pending(struct sock *sk,
> unsigned int flags)
> push_count--;
> continue;
> }
> - do_check_data_fin = true;
> + copied = true;
> }
> }
> }
> @@ -1700,11 +1700,14 @@ void __mptcp_push_pending(struct sock *sk,
> unsigned int flags)
> if (ssk)
> mptcp_push_release(ssk, &info);
>
> - /* ensure the rtx timer is running */
nit:
This comment remains valid and could be moved inside the if (copied)
block instead.
Thanks,
-Geliang
> - if (!mptcp_rtx_timer_pending(sk))
> - mptcp_reset_rtx_timer(sk);
> - if (do_check_data_fin)
> + /* Avoid scheduling the rtx timer if no data has been
> pushed; the timer
> + * will be updated on positive acks by
> __mptcp_cleanup_una().
> + */
> + if (copied) {
> + if (!mptcp_rtx_timer_pending(sk))
> + mptcp_reset_rtx_timer(sk);
> mptcp_check_send_data_fin(sk);
> + }
> }
>
> static void __mptcp_subflow_push_pending(struct sock *sk, struct
> sock *ssk, bool first)
On 11/22/25 1:14 PM, Geliang Tang wrote:
> On Fri, 2025-11-21 at 09:52 +0100, Paolo Abeni wrote:
>> The MPTCP protocol usually schedules the retransmission timer only
>> when there is some chance for such retransmissions to happen.
>>
>> With a notable exception: __mptcp_push_pending() currently schedules
>> such a timer unconditionally, potentially leading to unnecessary rtx
>> timer expirations.
>>
>> The issue has been present since the blamed commit below, but became
>> easily reproducible after commit 27b0e701d387 ("mptcp: drop bogus
>> optimization in
>> __mptcp_check_push()")
>>
>> Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> Note that the blamed commit states we always need to rearm the timer
>> even
>> with no data pending, to catch link failures. That sounds
>> weird/possibly
>> tied to broken zero window probes?!? Hopefully mp_join self test
>> should
>> catch that
>> ---
>> net/mptcp/protocol.c | 15 +++++++++------
>> 1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 3b327e9ccb78..b3122f53827f 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -1649,7 +1649,7 @@ void __mptcp_push_pending(struct sock *sk,
>> unsigned int flags)
>> struct mptcp_sendmsg_info info = {
>> .flags = flags,
>> };
>> - bool do_check_data_fin = false;
>> + bool copied = false;
>> int push_count = 1;
>>
>> while (mptcp_send_head(sk) && (push_count > 0)) {
>> @@ -1691,7 +1691,7 @@ void __mptcp_push_pending(struct sock *sk,
>> unsigned int flags)
>> push_count--;
>> continue;
>> }
>> - do_check_data_fin = true;
>> + copied = true;
>> }
>> }
>> }
>> @@ -1700,11 +1700,14 @@ void __mptcp_push_pending(struct sock *sk,
>> unsigned int flags)
>> if (ssk)
>> mptcp_push_release(ssk, &info);
>>
>> - /* ensure the rtx timer is running */
>
> nit:
>
> This comment remains valid and can be moved inside the if (copied)
> block together.
I intentionally changed the comment because the original one comes from
commit 33d41c9cd74c ("mptcp: more accurate timeout"); that change
explicitly _always_ scheduled the rtx timeout "to catch link failure
even when no data are send" according to the original changelog, see:
https://lore.kernel.org/mptcp/d4ab7e417c82dfefa3dc91ab6ea4adcb7c5752c3.1626210682.git.pabeni@redhat.com/
I want to update the commit message to outline the behavior change and
keep track of the relevant history.
BTW, I suspect the above statement was due to the zero window probe being
non-functional/buggy back then. I ran the link failure self-tests
several times in a row without any problem with this patch applied;
still, I guess there is some chance of regression here.
(hence I suggest not pushing this patch to netdev before 6.18-final)
/P
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19565315895
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/e7f0f49b0657
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1026213
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)