The backlog list is updated by mptcp_data_ready() under
mptcp_data_lock(). The cleanup of backlog references to a closing
subflow, however, was performed in mptcp_close_ssk(), before
__mptcp_close_ssk() acquires the ssk lock, and while holding neither
the ssk lock nor mptcp_data_lock().
Because that traversal ran without mptcp_data_lock(), concurrent softirq
RX processing on another CPU (subflow_data_ready() -> mptcp_data_ready()
-> __mptcp_add_backlog(), under mptcp_data_lock()) could add a backlog
entry referencing the ssk while the cleanup loop was in progress. Such
an entry could be missed by the cleanup, or the concurrent list update
could corrupt the traversal, leaving skb->sk pointing at the ssk after
it is freed.
A later mptcp_backlog_purge() then dereferences the stale pointer,
triggering a warning in inet_sock_destruct() (ssk->sk_rmem_alloc != 0)
followed by a use-after-free in mptcp_backlog_purge().
Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
subflow->closing is set to 1 and while the ssk lock is still held,
serialized under mptcp_data_lock(). The cleanup runs only on the push
path (MPTCP_CF_PUSH), where backlog references accumulate; on other
teardown paths the caller already handles cleanup.
With subflow->closing set and mptcp_data_lock() held across the purge,
any concurrent mptcp_data_ready() either completes its enqueue before
the purge runs and is caught, or observes closing=1 and bails out. Once
mptcp_data_unlock() is reached, no new skb referencing the ssk can be
enqueued, so the cleanup is exhaustive.
Remove the unprotected traversal from mptcp_close_ssk() entirely.
Tested with the MPTCP kernel selftests on the patched kernel:
- tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
- tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass
Fixes: ee458a3f314e ("mptcp: introduce mptcp-level backlog")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reported-by: syzkaller <syzkaller@googlegroups.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
Signed-off-by: Kalpan Jani <kalpan.jani@mpiricsoftware.com>
---
net/mptcp/protocol.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4546a8b09884..37efcea99dc1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
}
}
+static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sk_buff *skb;
+
+ mptcp_data_lock(sk);
+ list_for_each_entry(skb, &msk->backlog_list, list) {
+ if (skb->sk != ssk)
+ continue;
+
+ atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
+ skb->sk = NULL;
+ }
+ mptcp_data_unlock(sk);
+}
+
/* subflow sockets can be either outgoing (connect) or incoming
* (accept).
*
@@ -2550,6 +2567,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
subflow->closing = 1;
+ if (flags & MPTCP_CF_PUSH)
+ mptcp_cleanup_ssk_backlog(sk, ssk);
+
/* Borrow the fwd allocated page left-over; fwd memory for the subflow
* could be negative at this point, but will be reach zero soon - when
* the data allocated using such fragment will be freed.
@@ -2641,9 +2661,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
struct mptcp_subflow_context *subflow)
{
- struct mptcp_sock *msk = mptcp_sk(sk);
- struct sk_buff *skb;
-
/* The first subflow can already be closed or disconnected */
if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
return;
@@ -2653,17 +2670,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
if (sk->sk_state == TCP_ESTABLISHED)
mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
- /* Remove any reference from the backlog to this ssk; backlog skbs consume
- * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
- */
- list_for_each_entry(skb, &msk->backlog_list, list) {
- if (skb->sk != ssk)
- continue;
-
- atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
- skb->sk = NULL;
- }
-
/* subflow aborted before reaching the fully_established status
* attempt the creation of the next subflow
*/
--
2.43.0
Hi Kalpan,
Thank you for your modifications, that's great!
Our CI did some validations and here is its report:
- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): selftest_mptcp_connect_checksum ⚠️
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26744812858
Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/60889317c233
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1103840
If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:
$ cd [kernel source code]
$ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
--pull always mptcp/mptcp-upstream-virtme-docker:latest \
auto-normal
For more details:
https://github.com/multipath-tcp/mptcp-upstream-virtme-docker
Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)
Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)
© 2016 - 2026 Red Hat, Inc.