[v1] mptcp: clear scheduled subflows on retransmit

[PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Paolo Abeni 6 days, 12 hours ago

When __mptcp_retrans() kicks-in, it schedules one or more subflow for
retransmission, but such subflows could be actually left alone if there
is no more data to retransmit and/or in case of concurrent fallback.

Scheduled subflows could be processed much later in time, i.e. when new
data will be transmitted, leading to bad subflow selection.

Explicitly clear all scheduled subflows before leaving the retransmission
function.

Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/protocol.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6f4220c8b5bb..3b327e9ccb78 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
 		}
 
 		if (!mptcp_send_head(sk))
-			return;
+			goto clear_scheduled;
 
 		goto reset_timer;
 	}
@@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
 			if (__mptcp_check_fallback(msk)) {
 				spin_unlock_bh(&msk->fallback_lock);
 				release_sock(ssk);
-				return;
+				goto clear_scheduled;
 			}
 
 			while (info.sent < info.limit) {
@@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
 
 	if (!mptcp_rtx_timer_pending(sk))
 		mptcp_reset_rtx_timer(sk);
+
+clear_scheduled:
+	/* If no rtx data was available or in case of fallback, there
+	 * could be left-over scheduled subflows; clear them all
+	 * or later xmit could use bad ones
+	 */
+	mptcp_for_each_subflow(msk, subflow)
+		if (READ_ONCE(subflow->scheduled))
+			mptcp_subflow_set_scheduled(subflow, false);
 }
 
 /* schedule the timeout timer for the relevant event: either close timeout
-- 
2.51.1

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Matthieu Baerts 2 days, 16 hours ago

Hi Paolo,

On 20/11/2025 23:16, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if there
> is no more data to retransmit and/or in case of concurrent fallback.
> 
> Scheduled subflows could be processed much later in time, i.e. when new
> data will be transmitted, leading to bad subflow selection.
> 
> Explicitly clear all scheduled subflows before leaving the retransmission
> function.
> 
> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/mptcp/protocol.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 6f4220c8b5bb..3b327e9ccb78 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
>  		}
>  
>  		if (!mptcp_send_head(sk))
> -			return;
> +			goto clear_scheduled;
>  
>  		goto reset_timer;
>  	}
> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
>  			if (__mptcp_check_fallback(msk)) {
>  				spin_unlock_bh(&msk->fallback_lock);
>  				release_sock(ssk);
> -				return;
> +				goto clear_scheduled;
>  			}
>  
>  			while (info.sent < info.limit) {
> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>  
>  	if (!mptcp_rtx_timer_pending(sk))
>  		mptcp_reset_rtx_timer(sk);
> +
> +clear_scheduled:
> +	/* If no rtx data was available or in case of fallback, there

When reading this comment, it sounds like a "return" could be added
before this new label (clear_scheduled). Is it always needed to clear
this flag or only in these two cases?

> +	 * could be left-over scheduled subflows; clear them all
> +	 * or later xmit could use bad ones
> +	 */
> +	mptcp_for_each_subflow(msk, subflow)
> +		if (READ_ONCE(subflow->scheduled))
> +			mptcp_subflow_set_scheduled(subflow, false);
>  }
>  
>  /* schedule the timeout timer for the relevant event: either close timeout

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Paolo Abeni 2 days, 14 hours ago

On 11/24/25 7:18 PM, Matthieu Baerts wrote:
> On 20/11/2025 23:16, Paolo Abeni wrote:
>> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
>> retransmission, but such subflows could be actually left alone if there
>> is no more data to retransmit and/or in case of concurrent fallback.
>>
>> Scheduled subflows could be processed much later in time, i.e. when new
>> data will be transmitted, leading to bad subflow selection.
>>
>> Explicitly clear all scheduled subflows before leaving the retransmission
>> function.
>>
>> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>  net/mptcp/protocol.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 6f4220c8b5bb..3b327e9ccb78 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
>>  		}
>>  
>>  		if (!mptcp_send_head(sk))
>> -			return;
>> +			goto clear_scheduled;
>>  
>>  		goto reset_timer;
>>  	}
>> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
>>  			if (__mptcp_check_fallback(msk)) {
>>  				spin_unlock_bh(&msk->fallback_lock);
>>  				release_sock(ssk);
>> -				return;
>> +				goto clear_scheduled;
>>  			}
>>  
>>  			while (info.sent < info.limit) {
>> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>>  
>>  	if (!mptcp_rtx_timer_pending(sk))
>>  		mptcp_reset_rtx_timer(sk);
>> +
>> +clear_scheduled:
>> +	/* If no rtx data was available or in case of fallback, there
> 
> When reading this comment, it sounds like a "return" could be added
> before this new label (clear_scheduled). Is it always needed to clear
> this flag or only in these two cases?

No, we can't add a return before 'clear_scheduled:'. At least the 'goto
reset_timer;' under the 'if (!dfrag) {' conditional above could reach
there with one or more subflows scheduled.

Cheers,

Paolo

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Matthieu Baerts 2 days, 16 hours ago

Hi Paolo,

On 20/11/2025 23:16, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if there
> is no more data to retransmit and/or in case of concurrent fallback.
> 
> Scheduled subflows could be processed much later in time, i.e. when new
> data will be transmitted, leading to bad subflow selection.
> 
> Explicitly clear all scheduled subflows before leaving the retransmission
> function.

Thank you for this fix! It looks good to me:

Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>

I suggest sending it (alone) tomorrow if there are no objections.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Matthieu Baerts 2 days, 16 hours ago

On 24/11/2025 19:10, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 20/11/2025 23:16, Paolo Abeni wrote:
>> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
>> retransmission, but such subflows could be actually left alone if there
>> is no more data to retransmit and/or in case of concurrent fallback.
>>
>> Scheduled subflows could be processed much later in time, i.e. when new
>> data will be transmitted, leading to bad subflow selection.
>>
>> Explicitly clear all scheduled subflows before leaving the retransmission
>> function.
> 
> Thank you for this fix! It looks good to me:
> 
> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> 
> I suggest sending it (alone) tomorrow if there are no objections.

(Note: I already applied the patch before my other comment)

Now in our tree (fix for -net):

New patches for t/upstream-net and t/upstream:
- 8f165e0fb8c1: mptcp: clear scheduled subflows on retransmit
- dda4b8ef8834: tg:msg: add Reported-by tag
- Results: c5303a89cbfe..c85e07c93b0a (export-net)
- Results: ea19f6f490ba..d68a54341cb3 (export)

Tests are now in progress:

- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/56912d17ffa80bd7c622bcfa4b5f0ce6da906a19/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/9b3f7e4aba65f380467c4dbf76a247de39c411e9/checks

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Geliang Tang 4 days, 22 hours ago

Hi Paolo,

On Thu, 2025-11-20 at 23:16 +0100, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if
> there
> is no more data to retransmit and/or in case of concurrent fallback.
> 
> Scheduled subflows could be processed much later in time, i.e. when
> new
> data will be transmitted, leading to bad subflow selection.
> 
> Explicitly clear all scheduled subflows before leaving the
> retransmission
> function.
> 
> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/mptcp/protocol.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 6f4220c8b5bb..3b327e9ccb78 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
>  		}
>  
>  		if (!mptcp_send_head(sk))
> -			return;
> +			goto clear_scheduled;
>  
>  		goto reset_timer;
>  	}
> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
>  			if (__mptcp_check_fallback(msk)) {
>  				spin_unlock_bh(&msk->fallback_lock);
>  				release_sock(ssk);
> -				return;
> +				goto clear_scheduled;
>  			}
>  
>  			while (info.sent < info.limit) {
> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>  
>  	if (!mptcp_rtx_timer_pending(sk))
>  		mptcp_reset_rtx_timer(sk);
> +
> +clear_scheduled:
> +	/* If no rtx data was available or in case of fallback,
> there
> +	 * could be left-over scheduled subflows; clear them all
> +	 * or later xmit could use bad ones
> +	 */
> +	mptcp_for_each_subflow(msk, subflow)
> +		if (READ_ONCE(subflow->scheduled))
> +			mptcp_subflow_set_scheduled(subflow, false);
>  }

If this __mptcp_retrans() needs to clear scheduled subflows, then
shouldn't similar this clear_scheduled operations also be added at the
end of both the __mptcp_push_pending() and
__mptcp_subflow_push_pending()?

Thanks,
-Geliang

>  
>  /* schedule the timeout timer for the relevant event: either close
> timeout

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Paolo Abeni 3 days, 2 hours ago

On 11/22/25 1:59 PM, Geliang Tang wrote:
> On Thu, 2025-11-20 at 23:16 +0100, Paolo Abeni wrote:
>> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
>> retransmission, but such subflows could be actually left alone if
>> there
>> is no more data to retransmit and/or in case of concurrent fallback.
>>
>> Scheduled subflows could be processed much later in time, i.e. when
>> new
>> data will be transmitted, leading to bad subflow selection.
>>
>> Explicitly clear all scheduled subflows before leaving the
>> retransmission
>> function.
>>
>> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>>  net/mptcp/protocol.c | 13 +++++++++++--
>>  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 6f4220c8b5bb..3b327e9ccb78 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
>>  		}
>>  
>>  		if (!mptcp_send_head(sk))
>> -			return;
>> +			goto clear_scheduled;
>>  
>>  		goto reset_timer;
>>  	}
>> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
>>  			if (__mptcp_check_fallback(msk)) {
>>  				spin_unlock_bh(&msk->fallback_lock);
>>  				release_sock(ssk);
>> -				return;
>> +				goto clear_scheduled;
>>  			}
>>  
>>  			while (info.sent < info.limit) {
>> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>>  
>>  	if (!mptcp_rtx_timer_pending(sk))
>>  		mptcp_reset_rtx_timer(sk);
>> +
>> +clear_scheduled:
>> +	/* If no rtx data was available or in case of fallback,
>> there
>> +	 * could be left-over scheduled subflows; clear them all
>> +	 * or later xmit could use bad ones
>> +	 */
>> +	mptcp_for_each_subflow(msk, subflow)
>> +		if (READ_ONCE(subflow->scheduled))
>> +			mptcp_subflow_set_scheduled(subflow, false);
>>  }
> 
> If this __mptcp_retrans() needs to clear scheduled subflows, then
> shouldn't similar this clear_scheduled operations also be added at the
> end of both the __mptcp_push_pending() and
> __mptcp_subflow_push_pending()?

AFAICS none of such functions can currently schedule some subflows and
terminate without descheduling/processing all of them; additional checks
there are not needed.

I think it's better to avoid adding unneeded tests in the fastpath (also
there is a generic guidance from netdev against defensive programming
[it hides bug, increases bloat and makes the code less readable]).

/P

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by Paolo Abeni 6 days, 2 hours ago

On 11/20/25 11:16 PM, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if there
> is no more data to retransmit and/or in case of concurrent fallback.
> 
> Scheduled subflows could be processed much later in time, i.e. when new
> data will be transmitted, leading to bad subflow selection.
> 
> Explicitly clear all scheduled subflows before leaving the retransmission
> function.
> 
> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

I forgot to add:

Reported-by: Filip Pokryvka <fpokryvk@redhat.com>

/P

Re: [PATCH mptcp-net] mptcp: clear scheduled subflows on retransmit

Posted by MPTCP CI 6 days, 11 hours ago

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19553352689

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/79d3d1a2339d
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1026047


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)