The ehash table lookups are lockless and rely on
SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
during RCU read-side critical sections. Both tcp_prot and
tcpv6_prot have their slab caches created with this flag
via proto_register().
However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
tcpv6_prot_override during inet_init() (fs_initcall, level 5),
before inet6_init() (module_init/device_initcall, level 6) has
called proto_register(&tcpv6_prot). At that point,
tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
remains NULL permanently.
This causes MPTCP v6 subflow child sockets to be allocated via
kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
when these sockets are freed without SOCK_RCU_FREE (which is
cleared for child sockets by design), the memory can be
immediately reused. Concurrent ehash lookups under
rcu_read_lock can then access freed memory, triggering a
slab-use-after-free in __inet_lookup_established.
Fix this by splitting the IPv6-specific initialization out of
mptcp_subflow_init() into a new mptcp_subflow_v6_init(), which
is called from mptcpv6_init() after proto_register(&tcpv6_prot)
has completed. This ensures tcpv6_prot_override.slab correctly
inherits the SLAB_TYPESAFE_BY_RCU slab cache.
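
The ordering problem and the fix described above can be modeled in userspace. This is a minimal sketch, not kernel code: struct cache_model, struct proto_model and register_model() are hypothetical stand-ins for struct kmem_cache, struct proto and proto_register(). It only illustrates that a struct copy is a one-time snapshot, so a slab pointer set after the copy never reaches the override.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct kmem_cache and struct proto. */
struct cache_model {
	int typesafe_by_rcu;
};

struct proto_model {
	struct cache_model *slab;	/* NULL until register_model() runs */
};

/* Models proto_register(): the dedicated cache is attached only here. */
static void register_model(struct proto_model *p, struct cache_model *c)
{
	p->slab = c;
}

/* Order before the fix: snapshot the proto, then register it. */
static int override_has_slab_copy_first(void)
{
	struct cache_model cache = { .typesafe_by_rcu = 1 };
	struct proto_model base = { .slab = NULL };
	struct proto_model override = base;	/* struct copy: one-time snapshot */

	register_model(&base, &cache);
	assert(base.slab != NULL);		/* the original is fine ... */
	return override.slab != NULL;		/* ... the copy never sees it */
}

/* Order after the fix: register first, then snapshot. */
static int override_has_slab_register_first(void)
{
	struct cache_model cache = { .typesafe_by_rcu = 1 };
	struct proto_model base = { .slab = NULL };
	struct proto_model override;

	register_model(&base, &cache);
	override = base;
	return override.slab != NULL && override.slab->typesafe_by_rcu;
}
```

In this model, override_has_slab_copy_first() returns 0 and override_has_slab_register_first() returns 1, which mirrors why moving the v6 copy after proto_register(&tcpv6_prot) lets the override inherit the cache.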
Fixes: b19bc2945b40 ("mptcp: implement delegated actions")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
net/mptcp/ctrl.c | 6 +++++-
net/mptcp/protocol.h | 1 +
net/mptcp/subflow.c | 15 +++++++++------
3 files changed, 15 insertions(+), 7 deletions(-)
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index d96130e49942..5887ddcdb875 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -583,7 +583,11 @@ int __init mptcpv6_init(void)
 	int err;
 
 	err = mptcp_proto_v6_init();
+	if (err)
+		return err;
 
-	return err;
+	mptcp_subflow_v6_init();
+
+	return 0;
 }
 #endif
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 0bd1ee860316..ec15e503da8b 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -875,6 +875,7 @@ static inline void mptcp_subflow_tcp_fallback(struct sock *sk,
 void __init mptcp_proto_init(void);
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
 int __init mptcp_proto_v6_init(void);
+void __init mptcp_subflow_v6_init(void);
 #endif
 
 struct sock *mptcp_sk_clone_init(const struct sock *sk,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 6716970693e9..4ff5863aa9fd 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -2165,7 +2165,15 @@ void __init mptcp_subflow_init(void)
 	tcp_prot_override.psock_update_sk_prot = NULL;
 #endif
 
+	mptcp_diag_subflow_init(&subflow_ulp_ops);
+
+	if (tcp_register_ulp(&subflow_ulp_ops) != 0)
+		panic("MPTCP: failed to register subflows to ULP\n");
+}
+
 #if IS_ENABLED(CONFIG_MPTCP_IPV6)
+void __init mptcp_subflow_v6_init(void)
+{
 	/* In struct mptcp_subflow_request_sock, we assume the TCP request sock
 	 * structures for v4 and v6 have the same size. It should not changed in
 	 * the future but better to make sure to be warned if it is no longer
@@ -2204,10 +2212,5 @@ void __init mptcp_subflow_init(void)
 	/* Disable sockmap processing for subflows */
 	tcpv6_prot_override.psock_update_sk_prot = NULL;
 #endif
-#endif
-
-	mptcp_diag_subflow_init(&subflow_ulp_ops);
-
-	if (tcp_register_ulp(&subflow_ulp_ops) != 0)
-		panic("MPTCP: failed to register subflows to ULP\n");
 }
+#endif
--
2.43.0
Hi Jiayuan,
On 03/04/2026 15:07, Jiayuan Chen wrote:
> The ehash table lookups are lockless and rely on
> SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
> during RCU read-side critical sections. Both tcp_prot and
> tcpv6_prot have their slab caches created with this flag
> via proto_register().
>
> However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
> tcpv6_prot_override during inet_init() (fs_initcall, level 5),
> before inet6_init() (module_init/device_initcall, level 6) has
> called proto_register(&tcpv6_prot). At that point,
> tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
> remains NULL permanently.
>
> This causes MPTCP v6 subflow child sockets to be allocated via
> kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
> cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
> when these sockets are freed without SOCK_RCU_FREE (which is
> cleared for child sockets by design), the memory can be
> immediately reused. Concurrent ehash lookups under
> rcu_read_lock can then access freed memory, triggering a
> slab-use-after-free in __inet_lookup_established.
Good catch! Thank you for this patch.
> Fix this by splitting the IPv6-specific initialization out of
> mptcp_subflow_init() into a new mptcp_subflow_v6_init(), which
> is called from mptcpv6_init() after proto_register(&tcpv6_prot)
> has completed. This ensures tcpv6_prot_override.slab correctly
> inherits the SLAB_TYPESAFE_BY_RCU slab cache.
The split makes sense anyway: better to regroup all v6-related init steps.
> Fixes: b19bc2945b40 ("mptcp: implement delegated actions")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> net/mptcp/ctrl.c | 6 +++++-
> net/mptcp/protocol.h | 1 +
> net/mptcp/subflow.c | 15 +++++++++------
> 3 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
> index d96130e49942..5887ddcdb875 100644
> --- a/net/mptcp/ctrl.c
> +++ b/net/mptcp/ctrl.c
> @@ -583,7 +583,11 @@ int __init mptcpv6_init(void)
> int err;
>
> err = mptcp_proto_v6_init();
> + if (err)
> + return err;
>
> - return err;
> + mptcp_subflow_v6_init();
I think it would be better to move this to mptcp_proto_v6_init, similar
to what is done with mptcp_subflow_init, from mptcp_proto_init.
From there, you can even call it before registering the protocol, at the
beginning, so before inet6_register_protosw, which seems more logical
and similar to what is done in v4. WDYT?
If you send a v2, can you please remove the 'net:' prefix please?
'mptcp:' is enough:
[PATCH net v2] mptcp: fix (...)
Also, can you add a "Cc: stable" tag please?
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
On 4/4/26 7:15 PM, Matthieu Baerts wrote:
> Hi Jiayuan,
>
> On 03/04/2026 15:07, Jiayuan Chen wrote:
>> The ehash table lookups are lockless and rely on
>> SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
>> during RCU read-side critical sections. Both tcp_prot and
>> tcpv6_prot have their slab caches created with this flag
>> via proto_register().
>>
>> However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
>> tcpv6_prot_override during inet_init() (fs_initcall, level 5),
>> before inet6_init() (module_init/device_initcall, level 6) has
>> called proto_register(&tcpv6_prot). At that point,
>> tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
>> remains NULL permanently.
>>
>> This causes MPTCP v6 subflow child sockets to be allocated via
>> kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
>> cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
>> when these sockets are freed without SOCK_RCU_FREE (which is
>> cleared for child sockets by design), the memory can be
>> immediately reused. Concurrent ehash lookups under
>> rcu_read_lock can then access freed memory, triggering a
>> slab-use-after-free in __inet_lookup_established.
> Good catch! Thank you for this patch.
>
>> Fix this by splitting the IPv6-specific initialization out of
>> mptcp_subflow_init() into a new mptcp_subflow_v6_init(), which
>> is called from mptcpv6_init() after proto_register(&tcpv6_prot)
>> has completed. This ensures tcpv6_prot_override.slab correctly
>> inherits the SLAB_TYPESAFE_BY_RCU slab cache.
> The split makes sense anyway: better to regroup all v6-related init steps.
>
>> Fixes: b19bc2945b40 ("mptcp: implement delegated actions")
>> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
>> ---
>> net/mptcp/ctrl.c | 6 +++++-
>> net/mptcp/protocol.h | 1 +
>> net/mptcp/subflow.c | 15 +++++++++------
>> 3 files changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
>> index d96130e49942..5887ddcdb875 100644
>> --- a/net/mptcp/ctrl.c
>> +++ b/net/mptcp/ctrl.c
>> @@ -583,7 +583,11 @@ int __init mptcpv6_init(void)
>> int err;
>>
>> err = mptcp_proto_v6_init();
>> + if (err)
>> + return err;
>>
>> - return err;
>> + mptcp_subflow_v6_init();
> I think it would be better to move this to mptcp_proto_v6_init, similar
> to what is done with mptcp_subflow_init, from mptcp_proto_init.
>
> From there, you can even call it before registering the protocol, at the
> beginning, so before inet6_register_protosw, which seems more logical
> and similar to what is done in v4. WDYT?
>
> If you send a v2, can you please remove the 'net:' prefix please?
> 'mptcp:' is enough:
>
> [PATCH net v2] mptcp: fix (...)
>
> Also, can you add a "Cc: stable" tag please?
>
> Cheers,
> Matt
Hi Matt,
Thanks for the feedback! I'll try all your suggestions in v2.
Thanks!
Jiayuan
On 4/3/26 9:07 PM, Jiayuan Chen wrote:
> The ehash table lookups are lockless and rely on
> SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability
> during RCU read-side critical sections. Both tcp_prot and
> tcpv6_prot have their slab caches created with this flag
> via proto_register().
>
> However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into
> tcpv6_prot_override during inet_init() (fs_initcall, level 5),
> before inet6_init() (module_init/device_initcall, level 6) has
> called proto_register(&tcpv6_prot). At that point,
> tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab
> remains NULL permanently.
>
> This causes MPTCP v6 subflow child sockets to be allocated via
> kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab
> cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so
> when these sockets are freed without SOCK_RCU_FREE (which is
> cleared for child sockets by design), the memory can be
> immediately reused. Concurrent ehash lookups under
> rcu_read_lock can then access freed memory, triggering a
> slab-use-after-free in __inet_lookup_established.
>
> Fix this by splitting the IPv6-specific initialization out of
> mptcp_subflow_init() into a new mptcp_subflow_v6_init(), which
> is called from mptcpv6_init() after proto_register(&tcpv6_prot)
> has completed. This ensures tcpv6_prot_override.slab correctly
> inherits the SLAB_TYPESAFE_BY_RCU slab cache.
>
> Fixes: b19bc2945b40 ("mptcp: implement delegated actions")
> Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
> ---
> net/mptcp/ctrl.c | 6 +++++-
> net/mptcp/protocol.h | 1 +
> net/mptcp/subflow.c | 15 +++++++++------
> 3 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
> index d96130e49942..5887ddcdb875 100644
> --- a/net/mptcp/ctrl.c
> +++ b/net/mptcp/ctrl.c
> @@ -583,7 +583,11 @@ int __init mptcpv6_init(void)
> int err;
>
> err = mptcp_proto_v6_init();
> + if (err)
> + return err;
>
> - return err;
> + mptcp_subflow_v6_init();
> +
> + return 0;
> }
> #endif
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index 0bd1ee860316..ec15e503da8b 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -875,6 +875,7 @@ static inline void mptcp_subflow_tcp_fallback(struct sock *sk,
> void __init mptcp_proto_init(void);
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> int __init mptcp_proto_v6_init(void);
> +void __init mptcp_subflow_v6_init(void);
> #endif
>
> struct sock *mptcp_sk_clone_init(const struct sock *sk,
> diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
> index 6716970693e9..4ff5863aa9fd 100644
> --- a/net/mptcp/subflow.c
> +++ b/net/mptcp/subflow.c
> @@ -2165,7 +2165,15 @@ void __init mptcp_subflow_init(void)
> tcp_prot_override.psock_update_sk_prot = NULL;
> #endif
>
> + mptcp_diag_subflow_init(&subflow_ulp_ops);
> +
> + if (tcp_register_ulp(&subflow_ulp_ops) != 0)
> + panic("MPTCP: failed to register subflows to ULP\n");
> +}
> +
> #if IS_ENABLED(CONFIG_MPTCP_IPV6)
> +void __init mptcp_subflow_v6_init(void)
> +{
> /* In struct mptcp_subflow_request_sock, we assume the TCP request sock
> * structures for v4 and v6 have the same size. It should not changed in
> * the future but better to make sure to be warned if it is no longer
> @@ -2204,10 +2212,5 @@ void __init mptcp_subflow_init(void)
> /* Disable sockmap processing for subflows */
> tcpv6_prot_override.psock_update_sk_prot = NULL;
> #endif
> -#endif
> -
> - mptcp_diag_subflow_init(&subflow_ulp_ops);
> -
> - if (tcp_register_ulp(&subflow_ulp_ops) != 0)
> - panic("MPTCP: failed to register subflows to ULP\n");
> }
> +#endif
I think the AI review is not accurate here.
https://sashiko.dev/#/patchset/20260403130734.93981-1-jiayuan.chen%40linux.dev
Userspace programs only start running after all initcalls have
completed, so there is no race condition.
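
To illustrate the ordering argument, here is a rough userspace sketch. It assumes the two initcall levels named in the commit message (fs_initcall at level 5, device_initcall at level 6); struct initcall_model, boot_model() and the *_model functions are hypothetical names used only for this illustration. Initcalls run level by level, and the step standing in for userspace comes only after the loop has finished.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of initcall levels; only the two levels named in the thread are used. */
enum { LEVEL_FS = 5, LEVEL_DEVICE = 6, LEVEL_MAX = 7 };

struct initcall_model {
	int level;
	void (*fn)(void);
};

static int step;
static int inet_init_step, inet6_init_step, userspace_step;

static void inet_init_model(void)  { inet_init_step = ++step; }
static void inet6_init_model(void) { inet6_init_step = ++step; }

static void boot_model(void)
{
	/* Deliberately listed out of order: the level decides, not the position. */
	static const struct initcall_model calls[] = {
		{ LEVEL_DEVICE, inet6_init_model },
		{ LEVEL_FS, inet_init_model },
	};
	size_t i;
	int level;

	step = 0;
	for (level = 0; level <= LEVEL_MAX; level++)
		for (i = 0; i < sizeof(calls) / sizeof(calls[0]); i++)
			if (calls[i].level == level)
				calls[i].fn();

	/* Stand-in for starting the first userspace process. */
	userspace_step = ++step;
}
```

After boot_model(), inet_init_step is 1, inet6_init_step is 2 and userspace_step is 3: in this model no socket can be created from userspace between the two init functions.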
Hi Jiayuan,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Success! ✅
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/23948054233
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/df0f3b32e603
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1076991
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts already made to keep the test
suite stable when it runs on a public CI like this one, some reported
issues may not be caused by your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)