net/mptcp/options.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-)
mptcp_cleanup_rbuf() need to know the last most recent, mptcp-level
rcv_wnd sent, and such information is tracked into the msk->old_wspace
field, updated at ack transmission time by mptcp_write_options().
Fallback socket do not add any mptcp options, such helper is never
invoked, and msk->old_wspace value remain stale. That in turn makes
ack generation at recvmsg() time quite randomic.
Address the issue ensuring mptcp_write_options() is invoked even
for fallback sockets, and just update the needed info in such a
case.
The issue went unnoticed for a long time, as mptcp overshot the
fallback socket receive buffer autotune significantly.
Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/mptcp/options.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index ad51dcf18984..8d3b0f9754dc 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -838,8 +838,11 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
opts->suboptions = 0;
+ /* Force later mptcp_write_options(), but do not use any actual
+ * option space.
+ */
if (unlikely(__mptcp_check_fallback(msk) && !mptcp_check_infinite_map(skb)))
- return false;
+ return true;
if (unlikely(skb && TCP_SKB_CB(skb)->tcp_flags & TCPHDR_RST)) {
if (mptcp_established_options_fastclose(sk, &opt_size, remaining, opts) ||
@@ -1351,6 +1354,20 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
subflow->rcv_wnd_sent = rcv_wnd_new;
}
+static void mptcp_track_rwin(struct tcp_sock *tp)
+{
+ const struct sock *ssk = (const struct sock *)tp;
+ struct mptcp_subflow_context *subflow;
+ struct mptcp_sock *msk;
+
+ if (!ssk)
+ return;
+
+ subflow = mptcp_subflow_ctx(ssk);
+ msk = mptcp_sk(subflow->conn);
+ WRITE_ONCE(msk->old_wspace, tp->rcv_wnd);
+}
+
__sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
{
struct csum_pseudo_header header;
@@ -1643,6 +1660,9 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
opts->reset_transient,
opts->reset_reason);
return;
+ } else if (unlikely(!opts->suboptions)) {
+ mptcp_track_rwin(tp);
+ return;
}
if (OPTION_MPTCP_PRIO & opts->suboptions) {
--
2.51.0
Hi Paolo,
On 07/11/2025 01:10, Paolo Abeni wrote:
> mptcp_cleanup_rbuf() need to know the last most recent, mptcp-level
> rcv_wnd sent, and such information is tracked into the msk->old_wspace
> field, updated at ack transmission time by mptcp_write_options().
>
> Fallback socket do not add any mptcp options, such helper is never
> invoked, and msk->old_wspace value remain stale. That in turn makes
> ack generation at recvmsg() time quite randomic.
>
> Address the issue ensuring mptcp_write_options() is invoked even
> for fallback sockets, and just update the needed info in such a
> case.
>
> The issue went unnoticed for a long time, as mptcp overshot the> fallback socket receive buffer autotune significantly.
Should we say "used to overshot"? :)
> Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594
Thank you for the fix!
(...)
> @@ -1643,6 +1660,9 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
> opts->reset_transient,
> opts->reset_reason);
> return;
> + } else if (unlikely(!opts->suboptions)) {
Small detail: do you mind if I add /* Fallback to TCP */ just above when
applying the patch?
> + mptcp_track_rwin(tp);
Should we eventually also call mptcp_track_rwin(tp) in case of MP_FAIL
without MP_RST? Or is this not needed?
Other than that, it looks good to me, and I guess we can already apply
the patch:
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> + return;
> }
>
> if (OPTION_MPTCP_PRIO & opts->suboptions) {
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
On 11/11/2025 16:30, Matthieu Baerts wrote:
> Hi Paolo,
>
> On 07/11/2025 01:10, Paolo Abeni wrote:
>> mptcp_cleanup_rbuf() need to know the last most recent, mptcp-level
>> rcv_wnd sent, and such information is tracked into the msk->old_wspace
>> field, updated at ack transmission time by mptcp_write_options().
>>
>> Fallback socket do not add any mptcp options, such helper is never
>> invoked, and msk->old_wspace value remain stale. That in turn makes
>> ack generation at recvmsg() time quite randomic.
>>
>> Address the issue ensuring mptcp_write_options() is invoked even
>> for fallback sockets, and just update the needed info in such a
>> case.
>>
>> The issue went unnoticed for a long time, as mptcp overshot the> fallback socket receive buffer autotune significantly.
>
> Should we say "used to overshot"? :)
>
>> Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594
>
> Thank you for the fix!
>
> (...)
>
>> @@ -1643,6 +1660,9 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
>> opts->reset_transient,
>> opts->reset_reason);
>> return;
>> + } else if (unlikely(!opts->suboptions)) {
>
> Small detail: do you mind if I add /* Fallback to TCP */ just above when
> applying the patch?
>
>> + mptcp_track_rwin(tp);
>
> Should we eventually also call mptcp_track_rwin(tp) in case of MP_FAIL
> without MP_RST? Or is this not needed?
>
> Other than that, it looks good to me, and I guess we can already apply
> the patch:
That's what I just did: I guess the modification for MP_FAIL, if really
needed, can still be added later on.
New patches for t/upstream-net and t/upstream:
- c3a571ff59cc: mptcp: fix ack generation for fallback msk
- Results: 751a2432b7d8..5b6b4e3aef4f (export-net)
- Results: 247540d77d20..a96affe5f0a1 (export)
Tests are now in progress:
- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/7fade8917eeabe227b5ffae540e9cbe48d3e5046/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/b10c1d3d1e5da09bbc2b81eca8f4339b84499268/checks
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo,
Thanks for this fix.
On Fri, 2025-11-07 at 01:10 +0100, Paolo Abeni wrote:
> mptcp_cleanup_rbuf() need to know the last most recent, mptcp-level
> rcv_wnd sent, and such information is tracked into the msk-
> >old_wspace
> field, updated at ack transmission time by mptcp_write_options().
>
> Fallback socket do not add any mptcp options, such helper is never
> invoked, and msk->old_wspace value remain stale. That in turn makes
> ack generation at recvmsg() time quite randomic.
>
> Address the issue ensuring mptcp_write_options() is invoked even
> for fallback sockets, and just update the needed info in such a
> case.
>
> The issue went unnoticed for a long time, as mptcp overshot the
> fallback socket receive buffer autotune significantly.
>
> Fixes: e3859603ba13 ("mptcp: better msk receive window updates")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/594
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
This patch looks good to me!
Reviewed-by: Geliang Tang <geliang@kernel.org>
-Geliang
> ---
> net/mptcp/options.c | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/net/mptcp/options.c b/net/mptcp/options.c
> index ad51dcf18984..8d3b0f9754dc 100644
> --- a/net/mptcp/options.c
> +++ b/net/mptcp/options.c
> @@ -838,8 +838,11 @@ bool mptcp_established_options(struct sock *sk,
> struct sk_buff *skb,
>
> opts->suboptions = 0;
>
> + /* Force later mptcp_write_options(), but do not use any
> actual
> + * option space.
> + */
> if (unlikely(__mptcp_check_fallback(msk) &&
> !mptcp_check_infinite_map(skb)))
> - return false;
> + return true;
>
> if (unlikely(skb && TCP_SKB_CB(skb)->tcp_flags &
> TCPHDR_RST)) {
> if (mptcp_established_options_fastclose(sk,
> &opt_size, remaining, opts) ||
> @@ -1351,6 +1354,20 @@ static void mptcp_set_rwin(struct tcp_sock
> *tp, struct tcphdr *th)
> subflow->rcv_wnd_sent = rcv_wnd_new;
> }
>
> +static void mptcp_track_rwin(struct tcp_sock *tp)
> +{
> + const struct sock *ssk = (const struct sock *)tp;
> + struct mptcp_subflow_context *subflow;
> + struct mptcp_sock *msk;
> +
> + if (!ssk)
> + return;
> +
> + subflow = mptcp_subflow_ctx(ssk);
> + msk = mptcp_sk(subflow->conn);
> + WRITE_ONCE(msk->old_wspace, tp->rcv_wnd);
> +}
> +
> __sum16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16
> data_len, __wsum sum)
> {
> struct csum_pseudo_header header;
> @@ -1643,6 +1660,9 @@ void mptcp_write_options(struct tcphdr *th,
> __be32 *ptr, struct tcp_sock *tp,
> opts->reset_transient,
> opts->reset_reason);
> return;
> + } else if (unlikely(!opts->suboptions)) {
> + mptcp_track_rwin(tp);
> + return;
> }
>
> if (OPTION_MPTCP_PRIO & opts->suboptions) {
Hi Paolo,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_dss 🔴
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/19161858402
Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/032d2f7924de
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1020688
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2025 Red Hat, Inc.