mptcp: fix incorrect IPv4/IPv6 check

[PATCH net-next v1] mptcp: fix incorrect IPv4/IPv6 check

Posted by Jiayuan Chen 3 months, 3 weeks ago

When MPTCP falls back to normal TCP, it needs to reset proto_ops. However,
for sockmap and TLS, they have their own custom proto_ops, so simply
checking sk->sk_prot is insufficient.

For example, an IPv6 request might incorrectly follow the IPv4 code path,
leading to kernel panic.

Note that Golang has enabled MPTCP by default [1]

[1] https://go-review.googlesource.com/c/go/+/607715

Fixes: 8e2b8a9fa512 ("mptcp: don't overwrite sock_ops in mptcp_is_tcpsk()")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
 net/mptcp/protocol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0292162a14ee..efcdaeff91f8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -62,10 +62,10 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
 static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
 {
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
-	if (sk->sk_prot == &tcpv6_prot)
+	if (sk->sk_family == AF_INET6)
 		return &inet6_stream_ops;
 #endif
-	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
+	WARN_ON(sk->sk_family != AF_INET);
 	return &inet_stream_ops;
 }
 
-- 
2.43.0

Re: [PATCH net-next v1] mptcp: fix incorrect IPv4/IPv6 check

Posted by Matthieu Baerts 3 months, 3 weeks ago

Hi Jiayuan,

Thank you for sharing this patch!

On 14/10/2025 14:26, Jiayuan Chen wrote:
> When MPTCP falls back to normal TCP, it needs to reset proto_ops. However,
> for sockmap and TLS, they have their own custom proto_ops, so simply
> checking sk->sk_prot is insufficient.
> 
> For example, an IPv6 request might incorrectly follow the IPv4 code path,
> leading to kernel panic.

Did you experience issues, or is it a supposition? If you did, do you
have traces containing such panics (or just a WARN()?), and ideally the
userspace code that led to this?

What is unclear to me is how you got an MPTCP + TLS + sockmap socket.
And if yes, can we set sk_socket->ops to inet(6)_stream_ops and nothing
else without having any other issues?

And do we maybe have to update some code in subflow.c also looking at
sk->sk_prot? I guess no because there, the socket is created by MPTCP,
and it should be set to tcp(v6)_prot. Except if there is some BPF code
that can change that?

> Note that Golang has enabled MPTCP by default [1]
> 
> [1] https://go-review.googlesource.com/c/go/+/607715
> 
> Fixes: 8e2b8a9fa512 ("mptcp: don't overwrite sock_ops in mptcp_is_tcpsk()")

If I understand the issue correctly, was it not present from the
beginning, before the mentioned commit?

> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
>  net/mptcp/protocol.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
> index 0292162a14ee..efcdaeff91f8 100644
> --- a/net/mptcp/protocol.c
> +++ b/net/mptcp/protocol.c
> @@ -62,10 +62,10 @@ static u64 mptcp_wnd_end(const struct mptcp_sock *msk)
>  static const struct proto_ops *mptcp_fallback_tcp_ops(const struct sock *sk)
>  {
>  #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> -	if (sk->sk_prot == &tcpv6_prot)
> +	if (sk->sk_family == AF_INET6)

sk_prot was proving it was a TCP + IPv4/6 socket, and then that's OK to
set inet(6)_stream_ops. I guess we could only check the family, but, can
we always return inet(6)_stream_ops no matter what sk->sk_prot is?

If the protocol has been modified, the stream one has maybe been
modified too, no?

>  		return &inet6_stream_ops;
>  #endif
> -	WARN_ON_ONCE(sk->sk_prot != &tcp_prot);
> +	WARN_ON(sk->sk_family != AF_INET);

Please keep the WARN_ON_ONCE().

Maybe we should not return inet_stream_ops in case the previous
condition was wrong, and not change sk_socket->ops.

>  	return &inet_stream_ops;
>  }

Note about the subject: if it is a fix for an older commit, it should
target 'net', not 'net-next' (+ cc stable). Can you also have a clearer
subject mentioning 'proto' and 'fallback' please?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net-next v1] mptcp: fix incorrect IPv4/IPv6 check

Posted by Jiayuan Chen 3 months, 3 weeks ago

October 14, 2025 at 23:27, "Matthieu Baerts" <matttbe@kernel.org> wrote:

> 
> Hi Jiayuan,
> 
> Thank you for sharing this patch!
> 
> On 14/10/2025 14:26, Jiayuan Chen wrote:
> 
> > 
> > When MPTCP falls back to normal TCP, it needs to reset proto_ops. However,
> >  for sockmap and TLS, they have their own custom proto_ops, so simply
> >  checking sk->sk_prot is insufficient.
> >  
> >  For example, an IPv6 request might incorrectly follow the IPv4 code path,
> >  leading to kernel panic.
> > 
> Did you experiment issues, or is it a supposition? If yes, do you have
> traces containing such panics (or just a WARN()?), and ideally the
> userspace code that was leading to this?
> 

Thank you, Matthieu, for your suggestions. I spent some time revisiting the MPTCP logic.

Now I need to describe how sockmap/skmsg works to explain its conflict with MPTCP:

1. skmsg works by replacing the sk_data_ready, recvmsg and sendmsg operations and
   implementing fast socket-level forwarding logic.
2. Users can obtain file descriptors through the userspace socket()/accept() interfaces,
   then call the BPF syscall to perform these replacements.
3. Users can also use the bpf_sock_hash_update helper (in sockops programs) to replace
   handlers when TCP connections enter the ESTABLISHED state
   (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB or BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB).

For MPTCP to work with sockmap, I believe we need to address the following points
(please correct me if I have any conceptual misunderstandings about MPTCP):

1. From client perspective: When a user connects to a server via socket(), the kernel
   creates one master sk and at least two subflow sk's. Since the master sk doesn't participate
   in the three-way handshake, in the sockops flow we can only access the subflow sk's.
   In this case, we need to replace the handlers of mptcp_subflow_ctx(sk)->conn rather
   than the subflow sk itself.

2. From server perspective: In BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, the sk is the MP_CAPABLE
   subflow sk, so similar to the client perspective, we need to replace the handlers of
   mptcp_subflow_ctx(sk)->conn.

If the above description is correct, then my current patch is incorrect. I should focus on
handling the sockmap handler replacement flow properly instead.

Of course, this would require comprehensive selftests to validate.

Returning to the initial issue, the panic occurred on kernel 6.1, but when I tested with the
latest upstream test environment, it only triggered a WARN().
I suspect there have been significant changes in MPTCP during this period.

Re: [PATCH net-next v1] mptcp: fix incorrect IPv4/IPv6 check

Posted by Matthieu Baerts 3 months, 2 weeks ago

Hi Jiayuan,

Thank you for your reply (and sorry for the delay, I was unavailable for
a few days).

On 15/10/2025 16:16, Jiayuan Chen wrote:
> October 14, 2025 at 23:27, "Matthieu Baerts" <matttbe@kernel.org> wrote:
>> On 14/10/2025 14:26, Jiayuan Chen wrote:
>>
>>>
>>> When MPTCP falls back to normal TCP, it needs to reset proto_ops. However,
>>>  for sockmap and TLS, they have their own custom proto_ops, so simply
>>>  checking sk->sk_prot is insufficient.
>>>  
>>>  For example, an IPv6 request might incorrectly follow the IPv4 code path,
>>>  leading to kernel panic.
>>>
>> Did you experiment issues, or is it a supposition? If yes, do you have
>> traces containing such panics (or just a WARN()?), and ideally the
>> userspace code that was leading to this?
>>
> 
> 
> Thank you, Matthieu, for your suggestions. I spent some time revisiting the MPTCP logic.
> 
> 
> Now I need to describe how sockmap/skmsg works to explain its conflict with MPTCP:

OK, so the issue is only with sockmap, not TLS, right?

> 1. skmsg works by replacing sk_data_ready, recvmsg, sendmsg operations and implementing
> fast socket-level forwarding logic
> 
> 2. Users can obtain file descriptors through userspace socket()/accept() interfaces, then
>    call BPF syscall to perform these replacements.
> 3. Users can also use the bpf_sock_hash_update helper (in sockops programs) to replace
>    handlers when TCP connections enter ESTABLISHED state (BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB or BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB)

I appreciate these explanations. I will comment on the v3.

> For MPTCP to work with sockmap, I believe we need to address the following points
> (please correct me if I have any conceptual misunderstandings about MPTCP):
> 
> 1. From client perspective: When a user connects to a server via socket(), the kernel
>    creates one master sk and at least two subflow sk's. Since the master sk doesn't participate
>    in the three-way handshake, in the sockops flow we can only access the subflow sk's.

To be a bit more precise, with MPTCP, you will deal with different
socket types:

- the userspace facing one: it is an MPTCP socket (IPPROTO_MPTCP)

- the in-kernel subflow(s) (= path): they are TCP sockets, but not
  exposed to the userspace.

There is no "master sk" (I hope you didn't look at the previous fork
implementation that was using this name, before the upstreaming
process), but yes, you will have the MPTCP socket, and at least one TCP
socket for the subflow.

>    In this case, we need to replace the handlers of mptcp_subflow_ctx(sk)->conn rather
>    than the subflow sk itself.
> 
> 2. From server perspective: In BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB, the sk is the MP_CAPABLE
>    subflow sk, so similar to the client perspective, we need to replace the handlers of
>    mptcp_subflow_ctx(sk)->conn.

On the userspace side, the socket after the 'accept()' is either an
MPTCP socket (IPPROTO_MPTCP) or a TCP one (IPPROTO_TCP), depending on
whether the SYN of the request contained the MP_CAPABLE option. If a
plain TCP socket is returned, it is not an MPTCP subflow any more, it is
a "classic" TCP connection.

To get MPTCP support with sockmap, I guess you will need to act at the
MPTCP level: you should never manipulate the data on the TCP subflows
directly, because you will only get a part of the data when multiple
paths are being used. Instead, you should wait for MPTCP to re-order the
data, etc.

> If the above description is correct, then my current patch is incorrect. I should focus on
> handling the sockmap handler replacement flow properly instead.

It would be really great to add MPTCP support in sockmap, but first, I
guess we need a way to prevent issues like the one you saw.

> Of course, this would require comprehensive selftests to validate.
> 
> Returning to the initial issue, the panic occurred on kernel 6.1, but when I tested with the
> latest upstream test environment, it only triggered a WARN().
> I suspect there have been significant changes in MPTCP during this period.

Even if it was only triggering a WARN(), we will still need a fix for
v6.1. Once the series is ready, do you mind checking what needs to
be done to have the solution working on v6.1? I guess the solution
should be very close to what we will have on v6.18.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH net-next v1] mptcp: fix incorrect IPv4/IPv6 check

Posted by MPTCP CI 3 months, 3 weeks ago

Hi Jiayuan,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_mp_join 🔴
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/18497133102

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/3d065c91b0dd
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1011291


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)