[PATCH v2 0/5] mptcp: improve mptcp-level window tracking

Paolo Abeni posted 5 patches 2 years ago
Patches applied successfully
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/cover.1650550242.git.pabeni@redhat.com
Maintainers: Mat Martineau <mathew.j.martineau@linux.intel.com>, Matthieu Baerts <matthieu.baerts@tessares.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>, David Ahern <dsahern@kernel.org>, "David S. Miller" <davem@davemloft.net>, Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
[PATCH v2 0/5] mptcp: improve mptcp-level window tracking
Posted by Paolo Abeni 2 years ago
I've been chasing bad/unstable performance with multiple subflows
on very high-speed links.

It looks like the root cause lies in the current mptcp-level
window handling. There are apparently a few different
sub-issues:

- the rcv_wnd is not effectively shared on the tx side, as each
  subflow takes into account only the value received by the underlying
  TCP connection. This is addressed in patch 1/5; see the first sketch
  after this list.

- The mptcp-level offered wnd right edge is currently allowed to shrink.
  Reading RFC 8684, section 3.3.4:

"""
   The receive window is relative to the DATA_ACK.  As in TCP, a
   receiver MUST NOT shrink the right edge of the receive window (i.e.,
   DATA_ACK + receive window).  The receiver will use the data sequence
   number to tell if a packet should be accepted at the connection
   level.
"""

I read the above as meaning that we need to reflect window right-edge
tracking on the wire, see patch 4/5.

- The offered window right edge tracking can happen concurrently on
  multiple subflows, but there is no mutex protection. We need an
  additional atomic operation - still patch 4/5; see the second sketch
  after this list.
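
To make the first point more concrete, below is a minimal, self-contained
sketch of the intended logic (plain C with illustrative structure, field
and helper names - this is not the actual kernel code): the announced
window right edge is connection-level state, updated by whichever subflow
receives the window update and consulted by every subflow on the tx path.

#include <stdbool.h>
#include <stdint.h>

struct msk_ctx {
	uint64_t snd_nxt;	/* next MPTCP-level data sequence to send */
	uint64_t wnd_end;	/* shared right edge: data_ack + announced window */
};

/* Called for a window update received on *any* subflow: the edge is
 * connection-level state, so every subflow benefits from it.
 */
static void msk_update_wnd_end(struct msk_ctx *msk, uint64_t data_ack,
			       uint32_t rwin)
{
	uint64_t new_end = data_ack + rwin;

	if ((int64_t)(new_end - msk->wnd_end) > 0)	/* wrap-safe "after" check */
		msk->wnd_end = new_end;
}

/* tx path: limit by the shared MPTCP-level edge, not by the snd_wnd of
 * the single subflow we happen to transmit on.
 */
static bool msk_can_send(const struct msk_ctx *msk, uint32_t len)
{
	return msk->snd_nxt + len <= msk->wnd_end;
}

int main(void)
{
	struct msk_ctx msk = { .snd_nxt = 1000, .wnd_end = 1000 };

	msk_update_wnd_end(&msk, 1000, 64 * 1024);	/* update seen on subflow A */
	return msk_can_send(&msk, 1400) ? 0 : 1;	/* subflow B benefits too */
}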
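
The last two points can be sketched the same way, assuming a single
shared 'rcv_wnd_sent' value tracking the highest right edge ever put on
the wire (again an illustrative name, not necessarily what the patches
use): since several subflows can build their headers concurrently, the
edge is maintained with a lock-free compare-and-swap loop instead of a
mutex, and the window about to be announced is enlarged when needed so
the right edge never moves left.

#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint64_t rcv_wnd_sent;	/* highest right edge on the wire */

/* Called while writing the window on any subflow: 'ack_seq' is the
 * MPTCP-level DATA_ACK, '*rwin' the window we are about to announce;
 * the function may enlarge it so the right edge never shrinks.
 */
static void track_offered_edge(uint64_t ack_seq, uint32_t *rwin)
{
	uint64_t new_edge = ack_seq + *rwin;
	uint64_t old_edge = atomic_load(&rcv_wnd_sent);

	/* try to raise the recorded edge; another subflow may race with us */
	while (new_edge > old_edge) {
		if (atomic_compare_exchange_weak(&rcv_wnd_sent, &old_edge,
						 new_edge))
			return;
	}

	/* the recorded edge is already beyond what we were going to offer:
	 * announce at least up to it, so the right edge never moves left
	 */
	if (old_edge > ack_seq)
		*rwin = (uint32_t)(old_edge - ack_seq);
}

int main(void)
{
	uint32_t rwin = 32 * 1024;

	atomic_store(&rcv_wnd_sent, 200000);	/* edge announced elsewhere */
	track_offered_edge(100000, &rwin);	/* would offer up to 132768 only */
	return rwin == 100000 ? 0 : 1;		/* enlarged, edge stays put */
}

The real code additionally has to fit the announced value into the 16-bit,
scaled window field of the TCP header, so the result is clamped before it
reaches the wire.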

This series additionally adds a few new MIB counters to track all of the
above (to confirm that the suspected races actually take place).

I could not regain access to the host where the issue was so
noticeable; still, in the current setup the throughput changes from an
unstable 6-18 Gbps to a very stable 19 Gbps.

 v1 -> v2:
 - pass only the TCP header to tcp_options_write (Mat)
 - fix build issues on some 32 bit arches (intel bot)

RFC -> v1:
 - added patch 3/5 to address Mat's comment, and rebased the
   following patches on top of it - I hope Eric will tolerate that;
   it's more a hope than a guess ;)

Paolo Abeni (5):
  mptcp: really share subflow snd_wnd
  mptcp: add mib for xmit window sharing
  tcp: allow MPTCP to update the announced window.
  mptcp: never shrink offered window
  mptcp: add more offered MIBs counter.

 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 14 ++++++-----
 net/mptcp/mib.c       |  4 +++
 net/mptcp/mib.h       |  6 +++++
 net/mptcp/options.c   | 58 +++++++++++++++++++++++++++++++++++++------
 net/mptcp/protocol.c  | 32 +++++++++++++++---------
 net/mptcp/protocol.h  |  2 +-
 7 files changed, 90 insertions(+), 28 deletions(-)

-- 
2.35.1


Re: [PATCH v2 0/5] mptcp: improve mptcp-level window tracking
Posted by Mat Martineau 2 years ago
On Thu, 21 Apr 2022, Paolo Abeni wrote:

> I've been chasing bad/unstable performance with multiple subflows
> on very high-speed links.
>
> It looks like the root cause lies in the current mptcp-level
> window handling. There are apparently a few different
> sub-issues:
>
> - the rcv_wnd is not effectively shared on the tx side, as each
>  subflow takes into account only the value received by the underlying
>  TCP connection. This is addressed in patch 1/5
>
> - The mptcp-level offered wnd right edge is currently allowed to shrink.
>  Reading RFC 8684, section 3.3.4:
>
> """
>   The receive window is relative to the DATA_ACK.  As in TCP, a
>   receiver MUST NOT shrink the right edge of the receive window (i.e.,
>   DATA_ACK + receive window).  The receiver will use the data sequence
>   number to tell if a packet should be accepted at the connection
>   level.
> """
>
> I read the above as meaning that we need to reflect window right-edge
> tracking on the wire, see patch 4/5.
>
> - The offered window right edge tracking can happen concurrently on
>  multiple subflows, but there is no mutex protection. We need an
>  additional atomic operation - still patch 4/5
>
> This series additionally adds a few new MIB counters to track all of the
> above (to confirm that the suspected races actually take place).
>
> I could not regain access to the host where the issue was so
> noticeable; still, in the current setup the throughput changes from an
> unstable 6-18 Gbps to a very stable 19 Gbps.
>
> v1 -> v2:
> - pass only the TCP header to tcp_options_write (Mat)
> - fix build issues on some 32 bit arches (intel bot)

v2 looks good for the export branch, thanks Paolo.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

>
> RFC -> v1:
> - added patch 3/5 to address Mat's comment, and rebased the
>   following patches on top of it - I hope Eric will tolerate that;
>   it's more a hope than a guess ;)
>
> Paolo Abeni (5):
>  mptcp: really share subflow snd_wnd
>  mptcp: add mib for xmit window sharing
>  tcp: allow MPTCP to update the announced window.
>  mptcp: never shrink offered window
>  mptcp: add more offered MIBs counter.
>
> include/net/mptcp.h   |  2 +-
> net/ipv4/tcp_output.c | 14 ++++++-----
> net/mptcp/mib.c       |  4 +++
> net/mptcp/mib.h       |  6 +++++
> net/mptcp/options.c   | 58 +++++++++++++++++++++++++++++++++++++------
> net/mptcp/protocol.c  | 32 +++++++++++++++---------
> net/mptcp/protocol.h  |  2 +-
> 7 files changed, 90 insertions(+), 28 deletions(-)
>
> -- 
> 2.35.1
>
>
>

--
Mat Martineau
Intel

Re: [PATCH v2 0/5] mptcp: improve mptcp-level window tracking
Posted by Matthieu Baerts 2 years ago
Hi Paolo, Mat,

On 21/04/2022 16:20, Paolo Abeni wrote:
> I've been chasing bad/unstable performance with multiple subflows
> on very high-speed links.
> 
> It looks like the root cause lies in the current mptcp-level
> window handling. There are apparently a few different
> sub-issues:
> 
> - the rcv_wnd is not effectively shared on the tx side, as each
>   subflow takes into account only the value received by the underlying
>   TCP connection. This is addressed in patch 1/5
> 
> - The mptcp-level offered wnd right edge is currently allowed to shrink.
>   Reading RFC 8684, section 3.3.4:
> 
> """
>    The receive window is relative to the DATA_ACK.  As in TCP, a
>    receiver MUST NOT shrink the right edge of the receive window (i.e.,
>    DATA_ACK + receive window).  The receiver will use the data sequence
>    number to tell if a packet should be accepted at the connection
>    level.
> """
> 
> I read the above as meaning that we need to reflect window right-edge
> tracking on the wire, see patch 4/5.
> 
> - The offered window right edge tracking can happen concurrently on
>   multiple subflows, but there is no mutex protection. We need an
>   additional atomic operation - still patch 4/5
> 
> This series additionally adds a few new MIB counters to track all of the
> above (to confirm that the suspected races actually take place).
> 
> I could not regain access to the host where the issue was so
> noticeable; still, in the current setup the throughput changes from an
> unstable 6-18 Gbps to a very stable 19 Gbps.

Thank you for the patches and reviews!

Now in our tree (feat. for net-next) with Mat's RvB tag:

New patches for t/upstream:
- 61ed3e818378: mptcp: really share subflow snd_wnd
- d5b00c55441f: mptcp: add mib for xmit window sharing
- 0b814d52c6bb: tcp: allow MPTCP to update the announced window
- 87ce505746f5: mptcp: never shrink offered window
- dac3ff7c87fe: mptcp: add more offered MIBs counter
- Results: 9709ecfa06e8..5b7d53ca0bd1 (export)

Builds and tests are now in progress:

https://cirrus-ci.com/github/multipath-tcp/mptcp_net-next/export/20220422T151603
https://github.com/multipath-tcp/mptcp_net-next/actions/workflows/build-validation.yml?query=branch:export

Cheers,
Matt
-- 
Tessares | Belgium | Hybrid Access Solutions
www.tessares.net