[PATCH net v2] mptcp: pm: fix memory leak from alloc-during-teardown race

Shardul Bankar posted 1 patch 1 day, 19 hours ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20260523212930.2957096-1-shardul.b@mpiricsoftware.com
net/mptcp/pm.c           | 7 +++++++
net/mptcp/pm_userspace.c | 4 ++++
net/mptcp/protocol.h     | 7 +++++--
3 files changed, 16 insertions(+), 2 deletions(-)
[PATCH net v2] mptcp: pm: fix memory leak from alloc-during-teardown race
Posted by Shardul Bankar 1 day, 19 hours ago
mptcp_pm_destroy() drains msk->pm.anno_list via list_splice_init()
under msk->pm.lock and then, for the userspace PM, drains
msk->pm.userspace_pm_local_addr_list under the same lock. Between
the two splices, msk->pm.lock is released.

A concurrent userspace PM genl ANNOUNCE on the same msk holds a
sock reference via mptcp_token_get_sock() and, in
mptcp_pm_nl_announce_doit(), calls
mptcp_userspace_pm_append_new_local_addr() and
mptcp_pm_alloc_anno_list(). Both take msk->pm.lock briefly to
add to their respective lists. Because the genl handler holds a
sock reference, mptcp_pm_destroy() may run on the same msk via
mptcp_disconnect(), which invokes mptcp_destroy_common() without
dropping the sock refcount, before the handler completes.

If the lock acquisitions interleave such that mptcp_pm_destroy()'s
splice runs first, the subsequent alloc paths land on a list head
pointing to itself. Nothing else iterates that list for this msk,
and the added entry leaks. kmemleak reports both 192-byte
mptcp_pm_add_entry objects (from mptcp_pm_alloc_anno_list()) and
64-byte mptcp_pm_addr_entry objects (from
mptcp_userspace_pm_append_new_local_addr()) under sustained
concurrent ANNOUNCE + close load against the userspace PM.

Add an MPTCP_PM_DESTROYING bit in msk->pm.status, set by
mptcp_pm_destroy() under pm.lock before the splice and checked
under pm.lock by the alloc paths. Either the alloc takes pm.lock
first, in which case its entry will be on the list when destroy's
splice runs and will be freed normally; or destroy takes pm.lock
first, in which case the subsequent alloc observes the bit and
refuses. Either ordering is correct, and the race that orphans
an entry is precluded. The bit lives in the pm data reset group,
so mptcp_pm_data_reset() clears it on the disconnect/reuse path
as well as on fresh socket init.

Found by an MPTCP protocol-flow harness extending BRF
(arXiv:2305.08782). Baseline ~0.3 kmemleak reports/minute on the
userspace PM reduces to zero across multi-hour runs with this
patch applied.

Fixes: 9ab4807c84a4 ("mptcp: netlink: Add MPTCP_PM_CMD_ANNOUNCE")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
v2:
- Use a new MPTCP_PM_DESTROYING bit in msk->pm.status instead of a
  separate bool field; the bit lives in the PM data reset group, so
  mptcp_pm_data_reset() clears it on the disconnect/reuse path as
  well as on fresh socket init (Matt).
- Add Fixes tag (Matt).

Link to v1: https://lore.kernel.org/all/20260519191207.1110003-1-shardul.b@mpiricsoftware.com/

 net/mptcp/pm.c           | 7 +++++++
 net/mptcp/pm_userspace.c | 4 ++++
 net/mptcp/protocol.h     | 7 +++++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 3e770c7407e1f..480842b01fed0 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -440,6 +440,9 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 
 	lockdep_assert_held(&msk->pm.lock);
 
+	if (msk->pm.status & BIT(MPTCP_PM_DESTROYING))
+		return false;
+
 	add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
 
 	if (add_entry) {
@@ -1109,6 +1112,10 @@ void mptcp_pm_worker(struct mptcp_sock *msk)
 
 void mptcp_pm_destroy(struct mptcp_sock *msk)
 {
+	spin_lock_bh(&msk->pm.lock);
+	msk->pm.status |= BIT(MPTCP_PM_DESTROYING);
+	spin_unlock_bh(&msk->pm.lock);
+
 	mptcp_pm_free_anno_list(msk);
 
 	if (mptcp_pm_is_userspace(msk))
diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c
index 8cbc1920afb49..3ba1f67ba8ed8 100644
--- a/net/mptcp/pm_userspace.c
+++ b/net/mptcp/pm_userspace.c
@@ -54,6 +54,10 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
 	bitmap_zero(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
 
 	spin_lock_bh(&msk->pm.lock);
+	if (msk->pm.status & BIT(MPTCP_PM_DESTROYING)) {
+		ret = -EINVAL;
+		goto append_err;
+	}
 	mptcp_for_each_userspace_pm_addr(msk, e) {
 		addr_match = mptcp_addresses_equal(&e->addr, &entry->addr, true);
 		if (addr_match && entry->addr.id == 0 && needs_id)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e4f5aba24da7d..86a050e22585d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -189,8 +189,11 @@ enum mptcp_pm_status {
 	MPTCP_PM_ESTABLISHED,
 	MPTCP_PM_SUBFLOW_ESTABLISHED,
 	MPTCP_PM_ALREADY_ESTABLISHED,	/* persistent status, set after ESTABLISHED event */
-	MPTCP_PM_MPC_ENDPOINT_ACCOUNTED /* persistent status, set after MPC local address is
-					 * accounted int id_avail_bitmap
+	MPTCP_PM_MPC_ENDPOINT_ACCOUNTED, /* persistent status, set after MPC local address is
+					  * accounted int id_avail_bitmap
+					  */
+	MPTCP_PM_DESTROYING,		/* set under pm.lock by mptcp_pm_destroy() to fence
+					 * out PM list allocs that would be orphaned by splice
 					 */
 };
 
-- 
2.34.1
Re: [PATCH net v2] mptcp: pm: fix memory leak from alloc-during-teardown race
Posted by MPTCP CI 1 day, 18 hours ago
Hi Shardul,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_sockopts ⚠️ 
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26344200521

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/c67bb4bfe409
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1099906


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)