[PATCH v4 mptcp-next 0/8] mptcp: introduce backlog processing
Posted by Paolo Abeni 1 week ago
This series includes RX path improvements built around backlog processing.

The main goals are improving the RX performance _and_ increasing
long-term maintainability.

Patches 1-3 prepare the stack for backlog processing, removing
assumptions that will no longer hold true after the backlog introduction.

Patch 4 fixes a long-standing issue which is quite hard to reproduce
with the current implementation but will become very apparent with
backlog usage.

Patches 5 and 6 are more cleanups that will make the backlog patch a
little less huge.

Patch 7 is a somewhat unrelated cleanup, included here before I forget
about it.

The real work is done by patch 8. There are significant changes vs the
previous iteration, as it turned out we can't use the sk_backlog: the
mptcp release callback can also release and re-acquire the msk-level
spinlock, and core backlog processing works under the assumption that
such an event is not possible.
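
As a rough illustration of that approach (not the actual patch), the
sketch below keeps a dedicated skb list on the msk, appended to under
the msk data spinlock and spliced out from the release callback. The
backlog_queue field and both helper names are hypothetical:

    /* Illustrative sketch only, not the actual patch: a dedicated
     * msk-level backlog instead of sk->sk_backlog. The backlog_queue
     * field and both helper names are made up for this example.
     */
    #include "protocol.h"

    /* Caller holds the msk data spinlock but not the msk socket lock;
     * the skb stays charged to the originating subflow's rmem.
     */
    static void mptcp_backlog_enqueue(struct mptcp_sock *msk,
                                      struct sk_buff *skb)
    {
            __skb_queue_tail(&msk->backlog_queue, skb);
    }

    /* Called from the msk release callback with the socket lock owned.
     * Unlike core sk_backlog processing, this scheme tolerates the
     * release callback dropping and re-acquiring the msk-level
     * spinlock while draining.
     */
    static void mptcp_backlog_process(struct sock *sk)
    {
            struct mptcp_sock *msk = mptcp_sk(sk);
            struct sk_buff_head drain;
            struct sk_buff *skb;

            __skb_queue_head_init(&drain);

            mptcp_data_lock(sk);
            skb_queue_splice_tail_init(&msk->backlog_queue, &drain);
            mptcp_data_unlock(sk);

            /* Stand-in for the real per-skb processing: move the data
             * into the msk receive queue so recvmsg() can consume it.
             */
            while ((skb = __skb_dequeue(&drain)))
                    __skb_queue_tail(&sk->sk_receive_queue, skb);
    }

The point of the private list is that draining is driven by the mptcp
release callback itself, so dropping and re-taking the msk-level
spinlock mid-drain is harmless, unlike with sk->sk_backlog.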

Other relevant points are:
- skbs in the backlog are _not_ accounted. TCP does the same, and we
  can't update the fwd mem while enqueuing to the backlog as the caller
  does not own the msk-level socket lock, nor can it acquire it.
- skbs in the backlog still use the incoming ssk rmem. This allows
  backpressure and implicitly prevents excessive memory usage by the
  backlog itself.
- [this is possibly the most critical point]: when the msk rx buffer is
  full, we don't add more packets there even when the caller owns the
  msk socket lock. Instead, packets are added to the backlog (a rough
  sketch follows this list). Note that the amount of memory used there
  is still limited by the above. Also note that this implicitly means
  that such packets could sit in the backlog until the receiver flushes
  the rx buffer - an unbounded amount of time. That is not supposed to
  happen for the backlog, hence the criticality here.
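
To make that last point concrete, here is a rough sketch of the enqueue
policy under the same assumptions as the previous snippet; the
rx-buffer-full check, the caller_owns_msk flag and the backlog_queue
field are illustrative, not the series' actual code:

    /* Illustrative sketch of the enqueue policy described above; the
     * exact rx-buffer check and all names are assumptions. The caller
     * is assumed to hold the msk data spinlock in both branches.
     */
    static bool mptcp_rx_buf_full(const struct sock *sk)
    {
            /* Assumed check: receive memory already at/over sk_rcvbuf. */
            return atomic_read(&sk->sk_rmem_alloc) >= READ_ONCE(sk->sk_rcvbuf);
    }

    static void mptcp_rx_enqueue(struct sock *sk, struct sk_buff *skb,
                                 bool caller_owns_msk)
    {
            struct mptcp_sock *msk = mptcp_sk(sk);

            /* Even when the caller owns the msk socket lock, a full rx
             * buffer diverts the skb to the backlog. The skb keeps
             * charging the incoming subflow's rmem, which bounds the
             * backlog size and provides backpressure; no msk fwd mem
             * is updated here.
             */
            if (!caller_owns_msk || mptcp_rx_buf_full(sk))
                    __skb_queue_tail(&msk->backlog_queue, skb);
            else
                    __skb_queue_tail(&sk->sk_receive_queue, skb);
    }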

This survived a few hours of selftest iterations in a loop: it should
address all functional issues observed in previous iterations (and
possibly introduces new ones ;)

Paolo Abeni (8):
  mptcp: borrow forward memory from subflow
  mptcp: cleanup fallback data fin reception
  mptcp: cleanup fallback dummy mapping generation
  mptcp: fix MSG_PEEK stream corruption
  mptcp: ensure the kernel PM does not take action too late
  mptcp: do not miss early first subflow close event notification.
  mptcp: make mptcp_destroy_common() static
  mptcp: leverage the backlog for RX packet processing

 net/mptcp/pm.c        |   4 +-
 net/mptcp/pm_kernel.c |   2 +
 net/mptcp/protocol.c  | 306 +++++++++++++++++++++++++++---------------
 net/mptcp/protocol.h  |   7 +-
 net/mptcp/subflow.c   |  12 +-
 5 files changed, 215 insertions(+), 116 deletions(-)

-- 
2.51.0
Re: [PATCH v4 mptcp-next 0/8] mptcp: introduce backlog processing
Posted by MPTCP CI 1 week ago
Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 4 Call Trace(s) ❌
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/18226665550

Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/2ed154e487d2
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1008293


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible
that some reported issues are not due to your modifications. Still, do
not hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
Re: [PATCH v4 mptcp-next 0/8] mptcp: introduce backlog processing
Posted by Geliang Tang 1 week ago
Hi Paolo,

Thanks for this v4.

On Fri, 2025-10-03 at 16:01 +0200, Paolo Abeni wrote:
> This series includes RX path improvements built around backlog
> processing.
> 
> The main goals are improving the RX performance _and_ increasing
> long-term maintainability.
> 
> Patches 1-3 prepare the stack for backlog processing, removing
> assumptions that will no longer hold true after the backlog
> introduction.
> 
> Patch 4 fixes a long-standing issue which is quite hard to reproduce
> with the current implementation but will become very apparent with
> backlog usage.
> 
> Patches 5 and 6 are more cleanups that will make the backlog patch a
> little less huge.
> 
> Patch 7 is a somewhat unrelated cleanup, included here before I
> forget about it.
> 
> The real work is done by patch 8. There are significant changes vs
> the previous iteration, as it turned out we can't use the sk_backlog:
> the mptcp release callback can also release and re-acquire the
> msk-level spinlock, and core backlog processing works under the
> assumption that such an event is not possible.
> 
> Other relevant points are:
> - skbs in the backlog are _not_ accounted. TCP does the same, and we
>   can't update the fwd mem while enqueuing to the backlog as the
>   caller does not own the msk-level socket lock, nor can it acquire
>   it.
> - skbs in the backlog still use the incoming ssk rmem. This allows
>   backpressure and implicitly prevents excessive memory usage by the
>   backlog itself.
> - [this is possibly the most critical point]: when the msk rx buffer
>   is full, we don't add more packets there even when the caller owns
>   the msk socket lock. Instead, packets are added to the backlog.
>   Note that the amount of memory used there is still limited by the
>   above. Also note that this implicitly means that such packets could
>   sit in the backlog until the receiver flushes the rx buffer - an
>   unbounded amount of time. That is not supposed to happen for the
>   backlog, hence the criticality here.
> 
> This survived a few hours of selftest iterations in a loop: it should
> address all functional issues observed in previous iterations (and
> possibly introduces new ones ;)

This set looks good to me!

Reviewed-by: Geliang Tang <geliang@kernel.org>

Thanks,
-Geliang

> 
> Paolo Abeni (8):
>   mptcp: borrow forward memory from subflow
>   mptcp: cleanup fallback data fin reception
>   mptcp: cleanup fallback dummy mapping generation
>   mptcp: fix MSG_PEEK stream corruption
>   mptcp: ensure the kernel PM does not take action too late
>   mptcp: do not miss early first subflow close event notification.
>   mptcp: make mptcp_destroy_common() static
>   mptcp: leverage the backlog for RX packet processing
> 
>  net/mptcp/pm.c        |   4 +-
>  net/mptcp/pm_kernel.c |   2 +
>  net/mptcp/protocol.c  | 306 +++++++++++++++++++++++++++---------------
>  net/mptcp/protocol.h  |   7 +-
>  net/mptcp/subflow.c   |  12 +-
>  5 files changed, 215 insertions(+), 116 deletions(-)
>