When test valkey in mptcp, the kernel will panic due to the function
'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL.
This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message
attached is valkey_test in 6.14.0-rc4:
Call trace:
mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P)
subflow_syn_recv_sock (./net/mptcp/subflow.c:854)
tcp_check_req (./net/ipv4/tcp_minisocks.c:863)
tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268)
ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207)
ip_local_deliver_finish (./net/ipv4/ip_input.c:234)
ip_local_deliver (./net/ipv4/ip_input.c:254)
ip_rcv_finish (./net/ipv4/ip_input.c:449)
...
According to the debug log, the same req received two SYN-ACK in a very
short time. (When the client retransmits the syn ack dueto multiple reasons.
Even if the packets are transmitted with a relevant time interval, they can be
processed by the server on different CPUs concurrently). The 'subflow_req->msk'
ownership is transferred to subflow in the first, and there will be a risk of a
null pointer dereference here.
This patch can fix this issue by moving the 'subflow_req->msk' under the
`own_req == true` conditional.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
Notes:
- v2:
- The MPTCP and the TCP code protect vs such race using the
`own_req` boolean. So moving the subflow_req->msk under
the 'own_req'.
---
net/mptcp/subflow.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index efe8d86496db..409bd415ef1d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -754,8 +754,6 @@ static bool subflow_hmac_valid(const struct request_sock *req,
subflow_req = mptcp_subflow_rsk(req);
msk = subflow_req->msk;
- if (!msk)
- return false;
subflow_generate_hmac(READ_ONCE(msk->remote_key),
READ_ONCE(msk->local_key),
@@ -850,12 +848,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
} else if (subflow_req->mp_join) {
mptcp_get_options(skb, &mp_opt);
- if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK) ||
- !subflow_hmac_valid(req, &mp_opt) ||
- !mptcp_can_accept_new_subflow(subflow_req->msk)) {
- SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
+ if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK))
fallback = true;
- }
}
create_child:
@@ -905,6 +899,13 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
goto dispose_child;
}
+ if (!subflow_hmac_valid(req, &mp_opt) ||
+ !mptcp_can_accept_new_subflow(subflow_req->msk)) {
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
+ subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
+ goto dispose_child;
+ }
+
/* move the msk reference ownership to the subflow */
subflow_req->msk = NULL;
ctx->conn = (struct sock *)owner;
--
2.25.1
On 3/19/25 3:27 AM, Gang Yan wrote:
> When test valkey in mptcp, the kernel will panic due to the function
> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL.
>
> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message
> attached is valkey_test in 6.14.0-rc4:
>
> Call trace:
> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P)
> subflow_syn_recv_sock (./net/mptcp/subflow.c:854)
> tcp_check_req (./net/ipv4/tcp_minisocks.c:863)
> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268)
> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207)
> ip_local_deliver_finish (./net/ipv4/ip_input.c:234)
> ip_local_deliver (./net/ipv4/ip_input.c:254)
> ip_rcv_finish (./net/ipv4/ip_input.c:449)
> ...
>
> According to the debug log, the same req received two SYN-ACK in a very
> short time. (When the client retransmits the syn ack dueto multiple reasons.
> Even if the packets are transmitted with a relevant time interval, they can be
> processed by the server on different CPUs concurrently). The 'subflow_req->msk'
> ownership is transferred to subflow in the first, and there will be a risk of a
> null pointer dereference here.
>
> This patch can fix this issue by moving the 'subflow_req->msk' under the
> `own_req == true` conditional.
>
> Suggested-by: Paolo Abeni <pabeni@redhat.com>
> Signed-off-by: Gang Yan <yangang@kylinos.cn>
The code LGTM.
Possibly the commit message could be expanded noting that the !msk check
in subflow_hmac_valid() could be dropped, because there is already the
same check under the own_req mpj branch.
Also this needs a Fixes tag:
Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in
use")
Matt: could you please adjust the above while applying the patch?
Thanks!
Paolo
Hi Paolo,
On 19/03/2025 08:38, Paolo Abeni wrote:
> On 3/19/25 3:27 AM, Gang Yan wrote:
>> When test valkey in mptcp, the kernel will panic due to the function
>> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL.
>>
>> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message
>> attached is valkey_test in 6.14.0-rc4:
>>
>> Call trace:
>> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P)
>> subflow_syn_recv_sock (./net/mptcp/subflow.c:854)
>> tcp_check_req (./net/ipv4/tcp_minisocks.c:863)
>> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268)
>> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207)
>> ip_local_deliver_finish (./net/ipv4/ip_input.c:234)
>> ip_local_deliver (./net/ipv4/ip_input.c:254)
>> ip_rcv_finish (./net/ipv4/ip_input.c:449)
>> ...
>>
>> According to the debug log, the same req received two SYN-ACK in a very
>> short time. (When the client retransmits the syn ack dueto multiple reasons.
>> Even if the packets are transmitted with a relevant time interval, they can be
>> processed by the server on different CPUs concurrently). The 'subflow_req->msk'
>> ownership is transferred to subflow in the first, and there will be a risk of a
>> null pointer dereference here.
>>
>> This patch can fix this issue by moving the 'subflow_req->msk' under the
>> `own_req == true` conditional.
>>
>> Suggested-by: Paolo Abeni <pabeni@redhat.com>
>> Signed-off-by: Gang Yan <yangang@kylinos.cn>
>
> The code LGTM.
Me too, thank you both for having looked at that!
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> Possibly the commit message could be expanded noting that the !msk check
> in subflow_hmac_valid() could be dropped, because there is already the
> same check under the own_req mpj branch.
>
> Also this needs a Fixes tag:
>
> Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in
> use")
>
> Matt: could you please adjust the above while applying the patch?
Sure, I will do that!
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
Hi Paolo, Gang, On 19/03/2025 18:47, Matthieu Baerts wrote: > Hi Paolo, > > On 19/03/2025 08:38, Paolo Abeni wrote: >> On 3/19/25 3:27 AM, Gang Yan wrote: >>> When test valkey in mptcp, the kernel will panic due to the function >>> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. >>> >>> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message >>> attached is valkey_test in 6.14.0-rc4: >>> >>> Call trace: >>> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) >>> subflow_syn_recv_sock (./net/mptcp/subflow.c:854) >>> tcp_check_req (./net/ipv4/tcp_minisocks.c:863) >>> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) >>> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) >>> ip_local_deliver_finish (./net/ipv4/ip_input.c:234) >>> ip_local_deliver (./net/ipv4/ip_input.c:254) >>> ip_rcv_finish (./net/ipv4/ip_input.c:449) >>> ... >>> >>> According to the debug log, the same req received two SYN-ACK in a very >>> short time. (When the client retransmits the syn ack dueto multiple reasons. >>> Even if the packets are transmitted with a relevant time interval, they can be >>> processed by the server on different CPUs concurrently). The 'subflow_req->msk' >>> ownership is transferred to subflow in the first, and there will be a risk of a >>> null pointer dereference here. >>> >>> This patch can fix this issue by moving the 'subflow_req->msk' under the >>> `own_req == true` conditional. >>> >>> Suggested-by: Paolo Abeni <pabeni@redhat.com> >>> Signed-off-by: Gang Yan <yangang@kylinos.cn> >> >> The code LGTM. > > Me too, thank you both for having looked at that! Now in our tree (fix for net): New patches for t/upstream-net and t/upstream: - 40ad50026099: mptcp: fix NULL pointer in can_accept_new_subflow - Results: 15fd746f34c5..dba54e66e69e (export-net) - Results: 2bea3081157b..332a74c2a7d4 (export) Tests are now in progress: - export-net: https://github.com/multipath-tcp/mptcp_net-next/commit/06f515aab4fa9d84ba4dfe81b17aa7d77ea22b4d/checks - export: https://github.com/multipath-tcp/mptcp_net-next/commit/83da8f611c90d09ade3f1fc8b04302d272c8f7b9/checks Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Gang,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/13937733185
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/904006e09ceb
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=945398
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2026 Red Hat, Inc.