include/net/sock.h | 13 --- net/core/sock.c | 2 +- net/ipv4/af_inet.c | 2 +- net/ipv4/inet_diag.c | 2 +- net/mptcp/fastopen.c | 4 +- net/mptcp/protocol.c | 259 +++++++++++++------------------------------ net/mptcp/protocol.h | 6 +- net/mptcp/subflow.c | 33 +++--- net/sched/em_meta.c | 2 +- 9 files changed, 104 insertions(+), 219 deletions(-)
This is a batch of changes I had sitting in my local tree for a while. Why another refactor you may ask? Two main resons: - currently the mptcp RX path introduces quite a bit of 'exceptional' accounting/locking processing WRT to plain TCP, adding up to the implementation complexity in a misurable way - the performance gap WRT plain TCP for single subflow connections is quite measurable. The present refactor addresses both the above items: most of the additional complexity is dropped, and single stream performances increase measurably - from 55Gbps to 71Gbps in my loopback test. As a reference, plain TCP is around 84Gps on the same host. The above comes to a price: the patch are invasive, even in subtle ways: the chance of destabilizing the implementation is real (ence the additional, intentional '-next' into the subj). In any case keeping the patch hidden for longer was not going to do any good, so here we are. Changes from v1: - fixed several data stream corruption and wake-up misses due to multi subflows races - added patches 1-3 mainly to address the above - added an additional follow-up patch (patch 7) with more cleanup Paolo Abeni (7): mptcp: prevent excessive coalescing on receive tcp: fix recvbuffer adjust on sleeping rcvmsg mptcp: don't always assume copied data in mptcp_cleanup_rbuf() mptcp: consolidate subflow cleanup mptcp: move the whole rx path under msk socket lock protection mptcp: cleanup mem accounting. net: dismiss sk_forward_alloc_get() include/net/sock.h | 13 --- net/core/sock.c | 2 +- net/ipv4/af_inet.c | 2 +- net/ipv4/inet_diag.c | 2 +- net/mptcp/fastopen.c | 4 +- net/mptcp/protocol.c | 259 +++++++++++++------------------------------ net/mptcp/protocol.h | 6 +- net/mptcp/subflow.c | 33 +++--- net/sched/em_meta.c | 2 +- 9 files changed, 104 insertions(+), 219 deletions(-) -- 2.45.2
On Fri, 6 Dec 2024, Paolo Abeni wrote: > This is a batch of changes I had sitting in my local tree for a while. > Why another refactor you may ask? Two main resons: > > - currently the mptcp RX path introduces quite a bit of 'exceptional' > accounting/locking processing WRT to plain TCP, adding up to the > implementation complexity in a misurable way > - the performance gap WRT plain TCP for single subflow connections is > quite measurable. > > The present refactor addresses both the above items: most of the > additional complexity is dropped, and single stream performances > increase measurably - from 55Gbps to 71Gbps in my loopback test. As a > reference, plain TCP is around 84Gps on the same host. > Hi Paolo - Thanks for the v2 and for sharing the performance numbers - great to see such a dramatic improvement in throughput! > The above comes to a price: the patch are invasive, even in subtle ways: > the chance of destabilizing the implementation is real (ence the > additional, intentional '-next' into the subj). > I think we will also need to be extra careful about monitoring stable tree backports that modify the affected functions. But to me the simplifications and performance fixes are worth it. It will help to get this in to the export branch for further testing! Reviewed-by: Mat Martineau <martineau@kernel.org> > In any case keeping the patch hidden for longer was not going to do any > good, so here we are. > > Changes from v1: > - fixed several data stream corruption and wake-up misses due > to multi subflows races > - added patches 1-3 mainly to address the above > - added an additional follow-up patch (patch 7) with more cleanup > > Paolo Abeni (7): > mptcp: prevent excessive coalescing on receive > tcp: fix recvbuffer adjust on sleeping rcvmsg > mptcp: don't always assume copied data in mptcp_cleanup_rbuf() > mptcp: consolidate subflow cleanup > mptcp: move the whole rx path under msk socket lock protection > mptcp: cleanup mem accounting. > net: dismiss sk_forward_alloc_get() > > include/net/sock.h | 13 --- > net/core/sock.c | 2 +- > net/ipv4/af_inet.c | 2 +- > net/ipv4/inet_diag.c | 2 +- > net/mptcp/fastopen.c | 4 +- > net/mptcp/protocol.c | 259 +++++++++++++------------------------------ > net/mptcp/protocol.h | 6 +- > net/mptcp/subflow.c | 33 +++--- > net/sched/em_meta.c | 2 +- > 9 files changed, 104 insertions(+), 219 deletions(-) > > -- > 2.45.2 > > >
On 12/6/24 13:09, Paolo Abeni wrote: > The present refactor addresses both the above items: most of the > additional complexity is dropped, and single stream performances > increase measurably - from 55Gbps to 71Gbps in my loopback test. As a > reference, plain TCP is around 84Gps on the same host. I forgot to update this section. I did some additional test with multiple links, differences vs the existing codebase is within the noise range. Disclaimer: tests over multiple veth pairs, with RPS to use multiple cores for the forwarding. This is a somewhat decent approximation of a real setup with multiple H/W links but I still have to try the latter. I think the issue reported by the CI in the previous iteration should be addressed here - or at least this iteration addresses the closest problem I was able to reproduce locally. I'm not aware of any standing issue, but I would bet a few coins this is not bug-free. Paolo
Hi Paolo, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf-normal (only bpftest_all): Success! ✅ - KVM Validation: btf-debug (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12198751082 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/1c3b9857b5b4 Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=915337 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
Hi Paolo, Mat, On 06/12/2024 13:09, Paolo Abeni wrote: > This is a batch of changes I had sitting in my local tree for a while. > Why another refactor you may ask? Two main resons: > > - currently the mptcp RX path introduces quite a bit of 'exceptional' > accounting/locking processing WRT to plain TCP, adding up to the > implementation complexity in a misurable way > - the performance gap WRT plain TCP for single subflow connections is > quite measurable. > > The present refactor addresses both the above items: most of the > additional complexity is dropped, and single stream performances > increase measurably - from 55Gbps to 71Gbps in my loopback test. As a > reference, plain TCP is around 84Gps on the same host. > > The above comes to a price: the patch are invasive, even in subtle ways: > the chance of destabilizing the implementation is real (ence the > additional, intentional '-next' into the subj). > > In any case keeping the patch hidden for longer was not going to do any > good, so here we are. Thank you for the patches and the reviews! Patches 2 and 3 are now in the "fixes for net", while the others are in "feat. for next". I think patch 1 can go to "Fixes for net", any objections? New patches for t/upstream-net and t/upstream: - d443f83de23d: mptcp: fix recvbuffer adjust on sleeping rcvmsg - 11d4943d66ee: mptcp: don't always assume copied data in mptcp_cleanup_rbuf() - Results: 3a0f411ba1c5..2116e71814b7 (export-net) - Results: df3193ae83eb..953b0d388720 (export) Tests are now in progress: - export-net: https://github.com/multipath-tcp/mptcp_net-next/commit/047c2f99575350d9971b6257f96ce5f734f0186a/checks New patches for t/upstream: - 972d8a5b7f20: mptcp: prevent excessive coalescing on receive - 0b0a25a24d21: mptcp: consolidate subflow cleanup - 533bb69f1b0d: mptcp: move the whole rx path under msk socket lock protection - c5d13b230497: mptcp: cleanup mem accounting - a564b9923a94: net: dismiss sk_forward_alloc_get() - Results: 953b0d388720..9213903d77ba (export) Tests are now in progress: - export: https://github.com/multipath-tcp/mptcp_net-next/commit/6aae9c8bd8361dd90a7def87459ca0173915c56c/checks Cheers, Matt -- Sponsored by the NGI0 Core fund.
© 2016 - 2025 Red Hat, Inc.