[RFC PATCH mptcp-next 0/4] MP_JOIN fallback for TCP listeners

Mat Martineau posted 4 patches 2 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20220224232226.259304-1-mathew.j.martineau@linux.intel.com
Maintainers: "David S. Miller" <davem@davemloft.net>, David Ahern <dsahern@kernel.org>, Matthieu Baerts <matthieu.baerts@tessares.net>, Eric Dumazet <edumazet@google.com>, Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>, Mat Martineau <mathew.j.martineau@linux.intel.com>, Jakub Kicinski <kuba@kernel.org>
include/linux/tcp.h      |  3 ++-
include/net/mptcp.h      | 18 ++++++++++++++++++
include/net/tcp.h        | 10 +++++++++-
net/ipv4/tcp_input.c     | 12 ++++++++++--
net/ipv4/tcp_ipv4.c      | 13 ++++++++++---
net/ipv4/tcp_minisocks.c | 14 ++++++++++----
net/ipv6/tcp_ipv6.c      | 13 ++++++++++---
net/mptcp/protocol.h     | 12 ------------
8 files changed, 69 insertions(+), 26 deletions(-)
[RFC PATCH mptcp-next 0/4] MP_JOIN fallback for TCP listeners
Posted by Mat Martineau 2 years, 1 month ago
In discussion of Florian's "replace per-addr listener sockets" series, I
proposed this:

    One possibility that doesn't disturb the hot path too much: Let the
    TCP listener try to handle the SYN. In tcp_parse_options(), add just
    enough code to set a bit in struct tcp_options_received if MP_JOIN
    is present (don't fully parse the MPTCP option). If MP_JOIN is there
    and MPTCP is enabled, return early from tcp_check_req() and have
    tcp_v4_rcv() treat it similar to req_stolen==true. Instead of
    kicking back to a regular lookup, try a token lookup.

It seemed to me that the hot path impact would be minimal. To better
show that, I put together these RFC patches. Only one conditional is
added to the hot path, checking a bit in struct tcp_received options
(which should be in the data cache anyway). The other hooks are:

 * tcp_parse_options() switch statement (only executed if a MPTCP option
   is present).

 * The tcp_check_req()-returned-NULL error paths for IPv4 and IPv6.

In addition, the added code in this series is almost entirely optimized
out if CONFIG_MPTCP is not set. With one more ifdef for the
tcp_parse_option() change, the optimized kernel binary may even be
entirely unchanged.

Compared to Florian's proposal (add token lookup before the listener
lookup) this does change more lines of TCP code, but has less runtime
impact.

Major caveat: this is very lightly tested, build and a quick check to
make sure it didn't break TCP.

This applies on top of "mptcp: replace per-addr listener sockets" v4.


Mat Martineau (4):
  mptcp: Move some symbols to be visible outside the MPTCP subsystem
  tcp: Allow tcp_check_req() to return more status values
  tcp: Check for the presence of a MP_JOIN option
  tcp: Add TCP hooks for detecting MP_JOIN on a regular TCP listener

 include/linux/tcp.h      |  3 ++-
 include/net/mptcp.h      | 18 ++++++++++++++++++
 include/net/tcp.h        | 10 +++++++++-
 net/ipv4/tcp_input.c     | 12 ++++++++++--
 net/ipv4/tcp_ipv4.c      | 13 ++++++++++---
 net/ipv4/tcp_minisocks.c | 14 ++++++++++----
 net/ipv6/tcp_ipv6.c      | 13 ++++++++++---
 net/mptcp/protocol.h     | 12 ------------
 8 files changed, 69 insertions(+), 26 deletions(-)

-- 
2.35.1


Re: [RFC PATCH mptcp-next 0/4] MP_JOIN fallback for TCP listeners
Posted by Paolo Abeni 2 years, 1 month ago
On Thu, 2022-02-24 at 15:22 -0800, Mat Martineau wrote:
> Compared to Florian's proposal (add token lookup before the listener
> lookup) this does change more lines of TCP code, but has less runtime
> impact.

I think this will need more changes to TCP code, see my reply to patch
4/4.

I really don't know which one will affect plain TCP setup more from
performance PoV: we will have a token lookup for each incoming syn
packet with both impl. With this patch we still need "mptcp: replace
per-addr listener sockets", right? otherwise issues/203 will not be
addressed.

The thing that concerns me more is that this approach brings quite a
bit of MTPCP logic inside the TCP code with little/no isolation.

More I think about it and more I belive we will should try to code as
simple as possible. And the easyest way to address issues/203 is
possibly at the documentation level...

Thanks,

Paolo



Re: [RFC PATCH mptcp-next 0/4] MP_JOIN fallback for TCP listeners
Posted by Mat Martineau 2 years, 1 month ago
On Fri, 25 Feb 2022, Paolo Abeni wrote:

> On Thu, 2022-02-24 at 15:22 -0800, Mat Martineau wrote:
>> Compared to Florian's proposal (add token lookup before the listener
>> lookup) this does change more lines of TCP code, but has less runtime
>> impact.
>
> I think this will need more changes to TCP code, see my reply to patch
> 4/4.
>
> I really don't know which one will affect plain TCP setup more from
> performance PoV: we will have a token lookup for each incoming syn
> packet with both impl.

This proposal has less runtime impact on plain TCP. It uses the existing 
TCP option parsing pass in tcp_check_req(), and adds one conditional 
that's looking at the new saw_mp_join bit that was possibly set when 
parsing options.

Florian's "mptcp: handle join requests early" is a smaller change to the 
TCP code, but adds an extra function call and a new TCP option parsing 
pass for every incoming SYN. Since the MP_JOIN option will not typically 
be present, it doesn't look like there are many extra CPU cycles other 
than that option parsing pass.

> With this patch we still need "mptcp: replace
> per-addr listener sockets", right? otherwise issues/203 will not be
> addressed.

Right.

>
> The thing that concerns me more is that this approach brings quite a
> bit of MTPCP logic inside the TCP code with little/no isolation.
>

If there's enough interest for a RFC v2 I could refactor with some new 
mptcp helpers.

> More I think about it and more I belive we will should try to code as
> simple as possible. And the easyest way to address issues/203 is
> possibly at the documentation level...
>

There seems to be enough benefit to socket users to justify some amount of 
code change to avoid confusing and hard-to-troubleshoot behavior. I still 
lean toward making it work better, and acknowledge the need for better 
docs no matter the solution :)

--
Mat Martineau
Intel