When test valkey in mptcp, the kernel will panic due to the function
'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL.
This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message
attached is valkey_test in 6.14.0-rc4:
Call trace:
mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P)
subflow_syn_recv_sock (./net/mptcp/subflow.c:854)
tcp_check_req (./net/ipv4/tcp_minisocks.c:863)
tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268)
ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207)
ip_local_deliver_finish (./net/ipv4/ip_input.c:234)
ip_local_deliver (./net/ipv4/ip_input.c:254)
ip_rcv_finish (./net/ipv4/ip_input.c:449)
...
According to the debug log, the same req received two SYN-ACK in a very
short time. (When the client retransmits the syn ack dueto multiple reasons.
Even if the packets are transmitted with a relevant time interval, they can be
processed by the server on different CPUs concurrently). The 'subflow_req->msk'
ownership is transferred to subflow in the first, and there will be a risk of a
null pointer dereference here.
This patch can fix this issue by moving the 'subflow_req->msk' under the
`own_req == true` conditional.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Gang Yan <yangang@kylinos.cn>
---
Notes:
- v2:
- The MPTCP and the TCP code protect vs such race using the
`own_req` boolean. So moving the subflow_req->msk under
the 'own_req'.
---
net/mptcp/subflow.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index efe8d86496db..409bd415ef1d 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -754,8 +754,6 @@ static bool subflow_hmac_valid(const struct request_sock *req,
subflow_req = mptcp_subflow_rsk(req);
msk = subflow_req->msk;
- if (!msk)
- return false;
subflow_generate_hmac(READ_ONCE(msk->remote_key),
READ_ONCE(msk->local_key),
@@ -850,12 +848,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
} else if (subflow_req->mp_join) {
mptcp_get_options(skb, &mp_opt);
- if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK) ||
- !subflow_hmac_valid(req, &mp_opt) ||
- !mptcp_can_accept_new_subflow(subflow_req->msk)) {
- SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
+ if (!(mp_opt.suboptions & OPTION_MPTCP_MPJ_ACK))
fallback = true;
- }
}
create_child:
@@ -905,6 +899,13 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
goto dispose_child;
}
+ if (!subflow_hmac_valid(req, &mp_opt) ||
+ !mptcp_can_accept_new_subflow(subflow_req->msk)) {
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC);
+ subflow_add_reset_reason(skb, MPTCP_RST_EPROHIBIT);
+ goto dispose_child;
+ }
+
/* move the msk reference ownership to the subflow */
subflow_req->msk = NULL;
ctx->conn = (struct sock *)owner;
--
2.25.1
On 3/19/25 3:27 AM, Gang Yan wrote: > When test valkey in mptcp, the kernel will panic due to the function > 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. > > This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message > attached is valkey_test in 6.14.0-rc4: > > Call trace: > mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) > subflow_syn_recv_sock (./net/mptcp/subflow.c:854) > tcp_check_req (./net/ipv4/tcp_minisocks.c:863) > tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) > ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) > ip_local_deliver_finish (./net/ipv4/ip_input.c:234) > ip_local_deliver (./net/ipv4/ip_input.c:254) > ip_rcv_finish (./net/ipv4/ip_input.c:449) > ... > > According to the debug log, the same req received two SYN-ACK in a very > short time. (When the client retransmits the syn ack dueto multiple reasons. > Even if the packets are transmitted with a relevant time interval, they can be > processed by the server on different CPUs concurrently). The 'subflow_req->msk' > ownership is transferred to subflow in the first, and there will be a risk of a > null pointer dereference here. > > This patch can fix this issue by moving the 'subflow_req->msk' under the > `own_req == true` conditional. > > Suggested-by: Paolo Abeni <pabeni@redhat.com> > Signed-off-by: Gang Yan <yangang@kylinos.cn> The code LGTM. Possibly the commit message could be expanded noting that the !msk check in subflow_hmac_valid() could be dropped, because there is already the same check under the own_req mpj branch. Also this needs a Fixes tag: Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use") Matt: could you please adjust the above while applying the patch? Thanks! Paolo
Hi Paolo, On 19/03/2025 08:38, Paolo Abeni wrote: > On 3/19/25 3:27 AM, Gang Yan wrote: >> When test valkey in mptcp, the kernel will panic due to the function >> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. >> >> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message >> attached is valkey_test in 6.14.0-rc4: >> >> Call trace: >> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) >> subflow_syn_recv_sock (./net/mptcp/subflow.c:854) >> tcp_check_req (./net/ipv4/tcp_minisocks.c:863) >> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) >> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) >> ip_local_deliver_finish (./net/ipv4/ip_input.c:234) >> ip_local_deliver (./net/ipv4/ip_input.c:254) >> ip_rcv_finish (./net/ipv4/ip_input.c:449) >> ... >> >> According to the debug log, the same req received two SYN-ACK in a very >> short time. (When the client retransmits the syn ack dueto multiple reasons. >> Even if the packets are transmitted with a relevant time interval, they can be >> processed by the server on different CPUs concurrently). The 'subflow_req->msk' >> ownership is transferred to subflow in the first, and there will be a risk of a >> null pointer dereference here. >> >> This patch can fix this issue by moving the 'subflow_req->msk' under the >> `own_req == true` conditional. >> >> Suggested-by: Paolo Abeni <pabeni@redhat.com> >> Signed-off-by: Gang Yan <yangang@kylinos.cn> > > The code LGTM. Me too, thank you both for having looked at that! Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> > Possibly the commit message could be expanded noting that the !msk check > in subflow_hmac_valid() could be dropped, because there is already the > same check under the own_req mpj branch. > > Also this needs a Fixes tag: > > Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in > use") > > Matt: could you please adjust the above while applying the patch? Sure, I will do that! Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Paolo, Gang, On 19/03/2025 18:47, Matthieu Baerts wrote: > Hi Paolo, > > On 19/03/2025 08:38, Paolo Abeni wrote: >> On 3/19/25 3:27 AM, Gang Yan wrote: >>> When test valkey in mptcp, the kernel will panic due to the function >>> 'mptcp_can_accept_new_subflow' when the subflow_req->msk is NULL. >>> >>> This problem shows in kernel 6.11.0-rc3 and 6.14.0-rc4. The message >>> attached is valkey_test in 6.14.0-rc4: >>> >>> Call trace: >>> mptcp_can_accept_new_subflow (./net/mptcp/subflow.c:63 (discriminator 4)) (P) >>> subflow_syn_recv_sock (./net/mptcp/subflow.c:854) >>> tcp_check_req (./net/ipv4/tcp_minisocks.c:863) >>> tcp_v4_rcv (./net/ipv4/tcp_ipv4.c:2268) >>> ip_protocol_deliver_rcu (./net/ipv4/ip_input.c:207) >>> ip_local_deliver_finish (./net/ipv4/ip_input.c:234) >>> ip_local_deliver (./net/ipv4/ip_input.c:254) >>> ip_rcv_finish (./net/ipv4/ip_input.c:449) >>> ... >>> >>> According to the debug log, the same req received two SYN-ACK in a very >>> short time. (When the client retransmits the syn ack dueto multiple reasons. >>> Even if the packets are transmitted with a relevant time interval, they can be >>> processed by the server on different CPUs concurrently). The 'subflow_req->msk' >>> ownership is transferred to subflow in the first, and there will be a risk of a >>> null pointer dereference here. >>> >>> This patch can fix this issue by moving the 'subflow_req->msk' under the >>> `own_req == true` conditional. >>> >>> Suggested-by: Paolo Abeni <pabeni@redhat.com> >>> Signed-off-by: Gang Yan <yangang@kylinos.cn> >> >> The code LGTM. > > Me too, thank you both for having looked at that! Now in our tree (fix for net): New patches for t/upstream-net and t/upstream: - 40ad50026099: mptcp: fix NULL pointer in can_accept_new_subflow - Results: 15fd746f34c5..dba54e66e69e (export-net) - Results: 2bea3081157b..332a74c2a7d4 (export) Tests are now in progress: - export-net: https://github.com/multipath-tcp/mptcp_net-next/commit/06f515aab4fa9d84ba4dfe81b17aa7d77ea22b4d/checks - export: https://github.com/multipath-tcp/mptcp_net-next/commit/83da8f611c90d09ade3f1fc8b04302d272c8f7b9/checks Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Gang, Thank you for your modifications, that's great! Our CI did some validations and here is its report: - KVM Validation: normal: Success! ✅ - KVM Validation: debug: Success! ✅ - KVM Validation: btf-normal (only bpftest_all): Success! ✅ - KVM Validation: btf-debug (only bpftest_all): Success! ✅ - Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/13937733185 Initiator: Patchew Applier Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/904006e09ceb Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=945398 If there are some issues, you can reproduce them using the same environment as the one used by the CI thanks to a docker image, e.g.: $ cd [kernel source code] $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \ --pull always mptcp/mptcp-upstream-virtme-docker:latest \ auto-normal For more details: https://github.com/multipath-tcp/mptcp-upstream-virtme-docker Please note that despite all the efforts that have been already done to have a stable tests suite when executed on a public CI like here, it is possible some reported issues are not due to your modifications. Still, do not hesitate to help us improve that ;-) Cheers, MPTCP GH Action bot Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2025 Red Hat, Inc.