From: Gang Yan <yangang@kylinos.cn>
After an MPTCP connection is established, the sk_sndbuf of the client's
msk can be updated through 'subflow_finish_connect'. However, the newly
accepted msk on the server side ends up with a smaller sk_sndbuf than
msk->first->sk_sndbuf:
'''
MPTCP: msk:00000000e55b09db, msk->sndbuf:20480, msk->first->sndbuf:2626560
'''
This means that when the server sends data with MSG_DONTWAIT immediately
after the connection is established, it is more likely to hit EAGAIN.
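For illustration, a minimal userspace sketch of that failing pattern (not
part of the patch; port and buffer size are arbitrary examples, error
handling trimmed): a non-blocking send loop right after accept() fills the
small initial sndbuf and then hits EAGAIN.
'''
/* Sketch: server accepts an MPTCP connection and immediately pushes
 * data with MSG_DONTWAIT. With the un-synced ~16KB msk sndbuf, the
 * loop hits EAGAIN long before the buffer is drained.
 */
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

int main(void)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(8080),	/* arbitrary example port */
		.sin_addr.s_addr = htonl(INADDR_ANY),
	};
	static char buf[256 * 1024];	/* much larger than tcp_wmem[1] */
	size_t off = 0;
	int lsk, csk;

	lsk = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
	bind(lsk, (struct sockaddr *)&addr, sizeof(addr));
	listen(lsk, 1);

	csk = accept(lsk, NULL, NULL);	/* msk->sk_sndbuf still ~16KB here */
	while (off < sizeof(buf)) {
		ssize_t ret = send(csk, buf + off, sizeof(buf) - off,
				   MSG_DONTWAIT);

		if (ret < 0) {
			if (errno == EAGAIN)
				fprintf(stderr, "EAGAIN after %zu bytes\n", off);
			break;
		}
		off += ret;
	}

	close(csk);
	close(lsk);
	return 0;
}
'''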
This patch synchronizes the msk's sk_sndbuf by triggering its update during accept().
Fixes: 8005184fd1ca ("mptcp: refactor sndbuf auto-tuning")
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/602
Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
Notes:
Hi Paolo, Matt,
Sorry for the late response on this patch. I've been analyzing this
issue recently, and the basic picture is as follows:
The root cause is a timing gap between msk creation and TCP sndbuf
auto-tuning on the server side:
1. When the server receives the SYN, mptcp_sk_clone_init() creates the
msk and calls __mptcp_propagate_sndbuf(). At this point, the TCP
subflow is still in SYN_RCVD state, so its sk_sndbuf has only the
initial value (tcp_wmem[1], typically ~16KB).
2. When the 3-way handshake completes (ACK received), the TCP stack
calls tcp_init_buffer_space() -> tcp_sndbuf_expand(), which grows
the subflow's sk_sndbuf based on MSS, congestion window, etc.
(potentially up to tcp_wmem[2], ~4MB).
3. However, this auto-tuning happens deep in the TCP stack without
any callback to MPTCP, so msk->sk_sndbuf is never updated to
reflect the new subflow sndbuf value.
4. When accept() returns, msk->sk_sndbuf still holds the small initial
value, while msk->first->sk_sndbuf has been auto-tuned to a much
larger value.
In contrast, the active (client) side doesn't have this issue because
subflow_finish_connect() calls mptcp_propagate_state() after the TCP
sndbuf auto-tuning has already occurred, ensuring proper synchronization.
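For context, here is a simplified sketch (paraphrased from the in-tree
helpers, not a verbatim copy) of the sndbuf sync logic that the one-liner
below re-triggers at accept() time: the msk-level sndbuf is recomputed
from the current subflow values, so it picks up the post-handshake
auto-tuned size from step 2.
'''
/* Simplified sketch of the msk sndbuf sync: whenever a subflow's
 * sk_sndbuf differs from the value cached at the last sync, the msk
 * sndbuf is recomputed as the sum of all subflow sndbufs.
 */
static void sketch_sync_sndbuf(struct sock *sk)
{
	struct mptcp_subflow_context *subflow;
	int new_sndbuf = 0;

	if (sk->sk_userlocks & SOCK_SNDBUF_LOCK)
		return;		/* user pinned SO_SNDBUF: don't touch it */

	mptcp_for_each_subflow(mptcp_sk(sk), subflow) {
		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
		int ssk_sndbuf = READ_ONCE(ssk->sk_sndbuf);

		subflow->cached_sndbuf = ssk_sndbuf;
		new_sndbuf += ssk_sndbuf;
	}

	WRITE_ONCE(sk->sk_sndbuf, new_sndbuf);
}
'''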
Thanks
Gang
net/mptcp/protocol.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b5676b37f8f4..17e43aff4459 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -4232,6 +4232,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		mptcp_graft_subflows(newsk);
 
 		mptcp_rps_record_subflows(msk);
+		__mptcp_propagate_sndbuf(newsk, mptcp_subflow_tcp_sock(subflow));
 
 		/* Do late cleanup for the first subflow as necessary. Also
 		 * deal with bad peers not doing a complete shutdown.
--
2.43.0
Hi Gang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Notice: Boot failures, rebooted and continued 🔴
- KVM Validation: normal (only selftest_mptcp_join): Notice: Call Traces at boot time, rebooted and continued 🔴
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/22752507226
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/56845d31abb3
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1062356
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible
that some reported issues are not due to your modifications. Still, do
not hesitate to help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)