The MPTCP protocol usually schedules the retransmission timer only
when there is some chance for such retransmissions to happen.
With a notable exception: __mptcp_push_pending() currently schedules
such a timer unconditionally, potentially leading to unnecessary rtx
timer expirations.
The issue has been present since the blamed commit below, but became easily
reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization in
__mptcp_check_push()")
Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note that the blamed commit states we always need to rearm the timer even
with no data pending, to catch link failures. That sounds weird/possibly
tied to broken zero window probes?!? Hopefully mp_join self test should
catch that
---
net/mptcp/protocol.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3b327e9ccb78..b3122f53827f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1649,7 +1649,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
struct mptcp_sendmsg_info info = {
.flags = flags,
};
- bool do_check_data_fin = false;
+ bool copied = false;
int push_count = 1;
while (mptcp_send_head(sk) && (push_count > 0)) {
@@ -1691,7 +1691,7 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
push_count--;
continue;
}
- do_check_data_fin = true;
+ copied = true;
}
}
}
@@ -1700,11 +1700,14 @@ void __mptcp_push_pending(struct sock *sk, unsigned int flags)
if (ssk)
mptcp_push_release(ssk, &info);
- /* ensure the rtx timer is running */
- if (!mptcp_rtx_timer_pending(sk))
- mptcp_reset_rtx_timer(sk);
- if (do_check_data_fin)
+ /* Avoid scheduling the rtx timer if no data has been pushed; the timer
+ * will be updated on positive acks by __mptcp_cleanup_una().
+ */
+ if (copied) {
+ if (!mptcp_rtx_timer_pending(sk))
+ mptcp_reset_rtx_timer(sk);
mptcp_check_send_data_fin(sk);
+ }
}
static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk, bool first)
--
2.51.1
Hi Paolo,
On 21/11/2025 09:52, Paolo Abeni wrote:
> The MPTCP protocol usually schedules the retransmission timer only
> when there is some chance for such retransmissions to happen.
>
> With a notable exception: __mptcp_push_pending() currently schedules
> such a timer unconditionally, potentially leading to unnecessary rtx
> timer expirations.
>
> The issue has been present since the blamed commit below, but became easily
> reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization in
> __mptcp_check_push()")
Thank you for the fix!
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Note that the blamed commit states we always need to rearm the timer even
> with no data pending, to catch link failures. That sounds weird/possibly
> tied to broken zero window probes?!? Hopefully mp_join self test should
> catch that
Thank you for the note! I suggest waiting one more week before sending it.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
On 24/11/2025 19:24, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 21/11/2025 09:52, Paolo Abeni wrote:
>> The MPTCP protocol usually schedules the retransmission timer only
>> when there is some chance for such retransmissions to happen.
>>
>> With a notable exception: __mptcp_push_pending() currently schedules
>> such a timer unconditionally, potentially leading to unnecessary rtx
>> timer expirations.
>>
>> The issue has been present since the blamed commit below, but became easily
>> reproducible after commit 27b0e701d387 ("mptcp: drop bogus optimization in
>> __mptcp_check_push()")
>
> Thank you for the fix!
>
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> Note that the blamed commit states we always need to rearm the timer even
>> with no data pending, to catch link failures. That sounds weird/possibly
>> tied to broken zero window probes?!? Hopefully mp_join self test should
>> catch that
>
> Thank you for the note! I suggest waiting one more week before sending it.
Applied!
New patches for t/upstream-net and t/upstream:
- 246fb174f79b: mptcp: schedule rtx timer only after pushing data
- Results: c85e07c93b0a..bbb6d3ccd581 (export-net)
- Results: d68a54341cb3..23a6a5017274 (export)
Tests are now in progress:
- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/5cfb92e8243a99a2edc03b68e8ef3568672b7546/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/25182e7cd25a7e550ad2d9f7ecbb7e24cecd7f7f/checks
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
On Fri, 2025-11-21 at 09:52 +0100, Paolo Abeni wrote:
> The MPTCP protocol usually schedules the retransmission timer only
> when there is some chance for such retransmissions to happen.
>
> With a notable exception: __mptcp_push_pending() currently schedules
> such a timer unconditionally, potentially leading to unnecessary rtx
> timer expirations.
>
> The issue has been present since the blamed commit below, but became
> easily reproducible after commit 27b0e701d387 ("mptcp: drop bogus
> optimization in
> __mptcp_check_push()")
>
> Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note that the blamed commit states we always need to rearm the timer
> even
> with no data pending, to catch link failures. That sounds
> weird/possibly
> tied to broken zero window probes?!? Hopefully mp_join self test
> should
> catch that
> ---
> net/mptcp/protocol.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 3b327e9ccb78..b3122f53827f 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -1649,7 +1649,7 @@ void __mptcp_push_pending(struct sock *sk,
> unsigned int flags)
> struct mptcp_sendmsg_info info = {
> .flags = flags,
> };
> - bool do_check_data_fin = false;
> + bool copied = false;
> int push_count = 1;
>
> while (mptcp_send_head(sk) && (push_count > 0)) {
> @@ -1691,7 +1691,7 @@ void __mptcp_push_pending(struct sock *sk,
> unsigned int flags)
> push_count--;
> continue;
> }
> - do_check_data_fin = true;
> + copied = true;
> }
> }
> }
> @@ -1700,11 +1700,14 @@ void __mptcp_push_pending(struct sock *sk,
> unsigned int flags)
> if (ssk)
> mptcp_push_release(ssk, &info);
>
> - /* ensure the rtx timer is running */
nit:
This comment remains valid and could be moved inside the if (copied)
block instead.
Thanks,
-Geliang
> - if (!mptcp_rtx_timer_pending(sk))
> - mptcp_reset_rtx_timer(sk);
> - if (do_check_data_fin)
> + /* Avoid scheduling the rtx timer if no data has been
> pushed; the timer
> + * will be updated on positive acks by
> __mptcp_cleanup_una().
> + */
> + if (copied) {
> + if (!mptcp_rtx_timer_pending(sk))
> + mptcp_reset_rtx_timer(sk);
> mptcp_check_send_data_fin(sk);
> + }
> }
>
> static void __mptcp_subflow_push_pending(struct sock *sk, struct
> sock *ssk, bool first)
On 11/22/25 1:14 PM, Geliang Tang wrote:
> On Fri, 2025-11-21 at 09:52 +0100, Paolo Abeni wrote:
>> The MPTCP protocol usually schedules the retransmission timer only
>> when there is some chance for such retransmissions to happen.
>>
>> With a notable exception: __mptcp_push_pending() currently schedules
>> such a timer unconditionally, potentially leading to unnecessary rtx
>> timer expirations.
>>
>> The issue has been present since the blamed commit below, but became
>> easily reproducible after commit 27b0e701d387 ("mptcp: drop bogus
>> optimization in
>> __mptcp_check_push()")
>>
>> Fixes: 33d41c9cd74c ("mptcp: more accurate timeout")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> Note that the blamed commit states we always need to rearm the timer
>> even
>> with no data pending, to catch link failures. That sounds
>> weird/possibly
>> tied to broken zero window probes?!? Hopefully mp_join self test
>> should
>> catch that
>> ---
>> net/mptcp/protocol.c | 15 +++++++++------
>> 1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 3b327e9ccb78..b3122f53827f 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -1649,7 +1649,7 @@ void __mptcp_push_pending(struct sock *sk,
>> unsigned int flags)
>> struct mptcp_sendmsg_info info = {
>> .flags = flags,
>> };
>> - bool do_check_data_fin = false;
>> + bool copied = false;
>> int push_count = 1;
>>
>> while (mptcp_send_head(sk) && (push_count > 0)) {
>> @@ -1691,7 +1691,7 @@ void __mptcp_push_pending(struct sock *sk,
>> unsigned int flags)
>> push_count--;
>> continue;
>> }
>> - do_check_data_fin = true;
>> + copied = true;
>> }
>> }
>> }
>> @@ -1700,11 +1700,14 @@ void __mptcp_push_pending(struct sock *sk,
>> unsigned int flags)
>> if (ssk)
>> mptcp_push_release(ssk, &info);
>>
>> - /* ensure the rtx timer is running */
>
> nit:
>
> This comment remains valid and can be moved inside the if (copied)
> block together.
I intentionally changed the comment because the original one comes from
commit 33d41c9cd74c ("mptcp: more accurate timeout"); that change
explicitly _always_ scheduled the rtx timeout "to catch link failure
even when no data are send" according to the original changelog, see:
https://lore.kernel.org/mptcp/d4ab7e417c82dfefa3dc91ab6ea4adcb7c5752c3.1626210682.git.pabeni@redhat.com/
I want to update the commit message to outline the behavior change and
keep track of the relevant history.
BTW, I suspect the above statement was due to the zero window probe being
non-functional/buggy back then. I ran the link failure self-tests
several times in a row without any problem with this patch applied;
still, I guess there is some chance of regression here.
(hence I suggest not pushing this patch to netdev before 6.18-final)
/P
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19565315895
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/e7f0f49b0657
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1026213
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)