[v2] mptcp: fix fallback-related races

[PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races

Posted by Paolo Abeni 7 months, 1 week ago

This series contains 3 fixes for related races in the fallback handling,
plus 2 small follow-ups that are likely better suited for net-next.

The root cause of the issues addressed here is that the check for
"we can fall back to TCP now" and the related action are not atomic. That
also applies to fallback due to MP_FAIL, where the race window is even
wider.

Address the issue by introducing an additional spinlock to bundle together
all the relevant events, as per patches 1 and 2.
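
As a rough illustration of the idea (a userspace sketch, not the code
in the patches), the check and the action can share one lock, so that a
concurrent subflow creation sees either "fallback not possible anymore"
or "fallback already done", never a half-way state. All the names below
(struct conn_model, model_try_fallback(), model_add_subflow()) are made
up for this example, and a pthread mutex stands in for the spinlock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's struct mptcp_sock. */
struct conn_model {
	pthread_mutex_t fallback_lock;	/* stands in for the additional spinlock */
	bool can_fallback;		/* "we can fall back to TCP now" */
	bool fallback_done;		/* the fallback action already happened */
	int subflows;
};

static void conn_model_init(struct conn_model *c)
{
	int rc = pthread_mutex_init(&c->fallback_lock, NULL);

	assert(rc == 0);
	(void)rc;
	c->can_fallback = true;
	c->fallback_done = false;
	c->subflows = 1;
}

/* Decision and action happen under the same lock: no window in between. */
static bool model_try_fallback(struct conn_model *c)
{
	bool ok;

	pthread_mutex_lock(&c->fallback_lock);
	ok = c->can_fallback;
	if (ok)
		c->fallback_done = true;
	pthread_mutex_unlock(&c->fallback_lock);
	return ok;
}

/*
 * Adding a subflow is refused after a fallback, and it forbids any
 * later fallback; both facts are updated under the same lock.
 */
static bool model_add_subflow(struct conn_model *c)
{
	bool ok;

	pthread_mutex_lock(&c->fallback_lock);
	ok = !c->fallback_done;
	if (ok) {
		c->subflows++;
		c->can_fallback = false;
	}
	pthread_mutex_unlock(&c->fallback_lock);
	return ok;
}
```

In this model, whichever of the two operations takes the lock first wins
and the other one fails cleanly; without the lock, both could observe the
old state and succeed.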

Note that mptcp_disconnect() unconditionally clears the fallback status
(zeroing msk->flags), and that may race with operations still running on
the (closing) subflows.

That race is addressed in patch 3.

Patch 4 cleans up the fallback code a bit, introducing a specific MIB
for each fallback reason, and patch 5 drops the hopefully now redundant
pr_fallback().

Paolo Abeni (5):
  mptcp: make fallback action and fallback decision atomic
  mptcp: plug races between subflow fail and subflow creation
  mptcp: fix status reset on disconnect()
  mptcp: track fallbacks accurately via mibs
  mptcp: remove pr_fallback()

 net/mptcp/ctrl.c     |   4 +-
 net/mptcp/mib.c      |   5 ++
 net/mptcp/mib.h      |   7 +++
 net/mptcp/options.c  |   4 +-
 net/mptcp/pm.c       |   8 ++-
 net/mptcp/protocol.c | 126 +++++++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.h |  35 ++++++------
 net/mptcp/subflow.c  |  35 ++++++------
 8 files changed, 164 insertions(+), 60 deletions(-)

-- 
2.49.0

Re: [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races

Posted by MPTCP CI 7 months ago

Hi Paolo,

Thank you for your modifications, that's great!

But sadly, our CI spotted some issues with it when trying to build it.

You can find more details there:

  https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16110745400

Status: failure
Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/54ab1b69acaf
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=979295

Feel free to reply to this email if you cannot access logs, if you need
some support to fix the error, if this doesn't seem to be caused by your
modifications or if the error is a false positive one.

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)

Re: [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races

Posted by Matthieu Baerts 7 months ago

Hi Paolo,

On 05/07/2025 09:24, Paolo Abeni wrote:
> This series contains 3 fixes somewhat related to various races we have
> while handling fallback and 2 small follow-up likely more suited for
> net-next.
> 
> The root cause of the issues addressed here is that the check for
> "we can fallback to tcp now" and the related action are not atomic. That
> also applies to fallback due to MP_FAIL - where the window race is even
> wider.
> 
> Address the issue introducing an additional spinlock to bundle together
> all the relevant events, as per patch 1 and 2.
> 
> Note that mptcp_disconnect() unconditionally
> clears the fallback status (zeroing msk->flags) and that may race with
> operation still running on the (closing) subflows.
> 
> Such race is addressed in patch 3.
> 
> Patch 4 cleans up a bit the fallback code, introducing specific MIB for
> each FB reason, and patch 5 drops the, hopefully now redundant
> pr_fallback().

Thank you very much for the fixes!

> Paolo Abeni (5):
>   mptcp: make fallback action and fallback decision atomic
>   mptcp: plug races between subflow fail and subflow creation
>   mptcp: fix status reset on disconnect()
>   mptcp: track fallbacks accurately via mibs
>   mptcp: remove pr_fallback()

I have a few small comments, please see the individual patches if you
don't mind!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net v2 0/5] mptcp: fix fallback-related races

Posted by MPTCP CI 7 months ago

Hi Paolo,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/16110745406

Initiator: Matthieu Baerts (NGI0)
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/54ab1b69acaf
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=979295


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts already made to have a stable
test suite when executed on a public CI like this one, it is possible that
some reported issues are not due to your modifications. Still, do not
hesitate to help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)