net/mptcp/protocol.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
When __mptcp_retrans() kicks-in, it schedules one or more subflow for
retransmission, but such subflows could be actually left alone if there
is no more data to retransmit and/or in case of concurrent fallback.
Scheduled subflows could be processed much later in time, i.e. when new
data will be transmitted, leading to bad subflow selection.
Explicitly clear all scheduled subflows before leaving the retransmission
function.
Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/protocol.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6f4220c8b5bb..3b327e9ccb78 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
}
if (!mptcp_send_head(sk))
- return;
+ goto clear_scheduled;
goto reset_timer;
}
@@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
if (__mptcp_check_fallback(msk)) {
spin_unlock_bh(&msk->fallback_lock);
release_sock(ssk);
- return;
+ goto clear_scheduled;
}
while (info.sent < info.limit) {
@@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
if (!mptcp_rtx_timer_pending(sk))
mptcp_reset_rtx_timer(sk);
+
+clear_scheduled:
+ /* If no rtx data was available or in case of fallback, there
+ * could be left-over scheduled subflows; clear them all
+ * or later xmit could use bad ones
+ */
+ mptcp_for_each_subflow(msk, subflow)
+ if (READ_ONCE(subflow->scheduled))
+ mptcp_subflow_set_scheduled(subflow, false);
}
/* schedule the timeout timer for the relevant event: either close timeout
--
2.51.1
Hi Paolo,
On 20/11/2025 23:16, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if there
> is no more data to retransmit and/or in case of concurrent fallback.
>
> Scheduled subflows could be processed much later in time, i.e. when new
> data will be transmitted, leading to bad subflow selection.
>
> Explicitly clear all scheduled subflows before leaving the retransmission
> function.
>
> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 6f4220c8b5bb..3b327e9ccb78 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
> }
>
> if (!mptcp_send_head(sk))
> - return;
> + goto clear_scheduled;
>
> goto reset_timer;
> }
> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
> if (__mptcp_check_fallback(msk)) {
> spin_unlock_bh(&msk->fallback_lock);
> release_sock(ssk);
> - return;
> + goto clear_scheduled;
> }
>
> while (info.sent < info.limit) {
> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>
> if (!mptcp_rtx_timer_pending(sk))
> mptcp_reset_rtx_timer(sk);
> +
> +clear_scheduled:
> + /* If no rtx data was available or in case of fallback, there
When reading this comment, it sounds like a "return" could be added
before this new label (clear_scheduled). Is it always needed to clear
this flag or only in these two cases?
> + * could be left-over scheduled subflows; clear them all
> + * or later xmit could use bad ones
> + */
> + mptcp_for_each_subflow(msk, subflow)
> + if (READ_ONCE(subflow->scheduled))
> + mptcp_subflow_set_scheduled(subflow, false);
> }
>
> /* schedule the timeout timer for the relevant event: either close timeout
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
On 11/24/25 7:18 PM, Matthieu Baerts wrote:
> On 20/11/2025 23:16, Paolo Abeni wrote:
>> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
>> retransmission, but such subflows could be actually left alone if there
>> is no more data to retransmit and/or in case of concurrent fallback.
>>
>> Scheduled subflows could be processed much later in time, i.e. when new
>> data will be transmitted, leading to bad subflow selection.
>>
>> Explicitly clear all scheduled subflows before leaving the retransmission
>> function.
>>
>> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> net/mptcp/protocol.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 6f4220c8b5bb..3b327e9ccb78 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
>> }
>>
>> if (!mptcp_send_head(sk))
>> - return;
>> + goto clear_scheduled;
>>
>> goto reset_timer;
>> }
>> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
>> if (__mptcp_check_fallback(msk)) {
>> spin_unlock_bh(&msk->fallback_lock);
>> release_sock(ssk);
>> - return;
>> + goto clear_scheduled;
>> }
>>
>> while (info.sent < info.limit) {
>> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>>
>> if (!mptcp_rtx_timer_pending(sk))
>> mptcp_reset_rtx_timer(sk);
>> +
>> +clear_scheduled:
>> + /* If no rtx data was available or in case of fallback, there
>
> When reading this comment, it sounds like a "return" could be added
> before this new label (clear_scheduled). Is it always needed to clear
> this flag or only in these two cases?
No, we can't add a return before 'clear_scheduled:'. At least the 'goto
reset_timer;' under the 'if (!dfrag) {' conditional above could reach
there with one or more subflows scheduled.
Cheers,
Paolo
Hi Paolo, On 20/11/2025 23:16, Paolo Abeni wrote: > When __mptcp_retrans() kicks-in, it schedules one or more subflow for > retransmission, but such subflows could be actually left alone if there > is no more data to retransmit and/or in case of concurrent fallback. > > Scheduled subflows could be processed much later in time, i.e. when new > data will be transmitted, leading to bad subflow selection. > > Explicitly clear all scheduled subflows before leaving the retransmission > function. Thank you for this fix! It looks good to me: Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> I suggest sending it (alone) tomorrow if there are no objections. Cheers, Matt -- Sponsored by the NGI0 Core fund.
On 24/11/2025 19:10, Matthieu Baerts wrote: > Hi Paolo, > > On 20/11/2025 23:16, Paolo Abeni wrote: >> When __mptcp_retrans() kicks-in, it schedules one or more subflow for >> retransmission, but such subflows could be actually left alone if there >> is no more data to retransmit and/or in case of concurrent fallback. >> >> Scheduled subflows could be processed much later in time, i.e. when new >> data will be transmitted, leading to bad subflow selection. >> >> Explicitly clear all scheduled subflows before leaving the retransmission >> function. > > Thank you for this fix! It looks good to me: > > Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > > I suggest sending it (alone) tomorrow if there are no objections. (Note: I already applied the patch before my other comment) Now in our tree (fix for -net): New patches for t/upstream-net and t/upstream: - 8f165e0fb8c1: mptcp: clear scheduled subflows on retransmit - dda4b8ef8834: tg:msg: add Reported-by tag - Results: c5303a89cbfe..c85e07c93b0a (export-net) - Results: ea19f6f490ba..d68a54341cb3 (export) Tests are now in progress: - export-net: https://github.com/multipath-tcp/mptcp_net-next/commit/56912d17ffa80bd7c622bcfa4b5f0ce6da906a19/checks - export: https://github.com/multipath-tcp/mptcp_net-next/commit/9b3f7e4aba65f380467c4dbf76a247de39c411e9/checks Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Paolo,
On Thu, 2025-11-20 at 23:16 +0100, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if
> there
> is no more data to retransmit and/or in case of concurrent fallback.
>
> Scheduled subflows could be processed much later in time, i.e. when
> new
> data will be transmitted, leading to bad subflow selection.
>
> Explicitly clear all scheduled subflows before leaving the
> retransmission
> function.
>
> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> net/mptcp/protocol.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 6f4220c8b5bb..3b327e9ccb78 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
> }
>
> if (!mptcp_send_head(sk))
> - return;
> + goto clear_scheduled;
>
> goto reset_timer;
> }
> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
> if (__mptcp_check_fallback(msk)) {
> spin_unlock_bh(&msk->fallback_lock);
> release_sock(ssk);
> - return;
> + goto clear_scheduled;
> }
>
> while (info.sent < info.limit) {
> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>
> if (!mptcp_rtx_timer_pending(sk))
> mptcp_reset_rtx_timer(sk);
> +
> +clear_scheduled:
> + /* If no rtx data was available or in case of fallback,
> there
> + * could be left-over scheduled subflows; clear them all
> + * or later xmit could use bad ones
> + */
> + mptcp_for_each_subflow(msk, subflow)
> + if (READ_ONCE(subflow->scheduled))
> + mptcp_subflow_set_scheduled(subflow, false);
> }
If this __mptcp_retrans() needs to clear scheduled subflows, then
shouldn't similar this clear_scheduled operations also be added at the
end of both the __mptcp_push_pending() and
__mptcp_subflow_push_pending()?
Thanks,
-Geliang
>
> /* schedule the timeout timer for the relevant event: either close
> timeout
On 11/22/25 1:59 PM, Geliang Tang wrote:
> On Thu, 2025-11-20 at 23:16 +0100, Paolo Abeni wrote:
>> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
>> retransmission, but such subflows could be actually left alone if
>> there
>> is no more data to retransmit and/or in case of concurrent fallback.
>>
>> Scheduled subflows could be processed much later in time, i.e. when
>> new
>> data will be transmitted, leading to bad subflow selection.
>>
>> Explicitly clear all scheduled subflows before leaving the
>> retransmission
>> function.
>>
>> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> ---
>> net/mptcp/protocol.c | 13 +++++++++++--
>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
>> index 6f4220c8b5bb..3b327e9ccb78 100644
>> --- a/net/mptcp/protocol.c
>> +++ b/net/mptcp/protocol.c
>> @@ -2763,7 +2763,7 @@ static void __mptcp_retrans(struct sock *sk)
>> }
>>
>> if (!mptcp_send_head(sk))
>> - return;
>> + goto clear_scheduled;
>>
>> goto reset_timer;
>> }
>> @@ -2794,7 +2794,7 @@ static void __mptcp_retrans(struct sock *sk)
>> if (__mptcp_check_fallback(msk)) {
>> spin_unlock_bh(&msk->fallback_lock);
>> release_sock(ssk);
>> - return;
>> + goto clear_scheduled;
>> }
>>
>> while (info.sent < info.limit) {
>> @@ -2826,6 +2826,15 @@ static void __mptcp_retrans(struct sock *sk)
>>
>> if (!mptcp_rtx_timer_pending(sk))
>> mptcp_reset_rtx_timer(sk);
>> +
>> +clear_scheduled:
>> + /* If no rtx data was available or in case of fallback,
>> there
>> + * could be left-over scheduled subflows; clear them all
>> + * or later xmit could use bad ones
>> + */
>> + mptcp_for_each_subflow(msk, subflow)
>> + if (READ_ONCE(subflow->scheduled))
>> + mptcp_subflow_set_scheduled(subflow, false);
>> }
>
> If this __mptcp_retrans() needs to clear scheduled subflows, then
> shouldn't similar this clear_scheduled operations also be added at the
> end of both the __mptcp_push_pending() and
> __mptcp_subflow_push_pending()?
AFAICS none of such functions can currently schedule some subflows and
terminate without descheduling/processing all of them; additional checks
there are not needed.
I think it's better to avoid adding unneeded tests in the fastpath (also
there is a generic guidance from netdev against defensive programming
[it hides bug, increases bloat and makes the code less readable]).
/P
On 11/20/25 11:16 PM, Paolo Abeni wrote:
> When __mptcp_retrans() kicks-in, it schedules one or more subflow for
> retransmission, but such subflows could be actually left alone if there
> is no more data to retransmit and/or in case of concurrent fallback.
>
> Scheduled subflows could be processed much later in time, i.e. when new
> data will be transmitted, leading to bad subflow selection.
>
> Explicitly clear all scheduled subflows before leaving the retransmission
> function.
>
> Fixes: ee2708aedad0 ("mptcp: use get_retrans wrapper")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I forgot to add:
Reported-by: Filip Pokryvka <fpokryvk@redhat.com>
/P
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19553352689
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/79d3d1a2339d
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1026047
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2025 Red Hat, Inc.