[PATCH net] mptcp: pm: fix memory leak from alloc-during-teardown race

Shardul Bankar posted 1 patch 5 days, 22 hours ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20260519191207.1110003-1-shardul.b@mpiricsoftware.com
There is a newer version of this series
net/mptcp/pm.c           | 6 ++++++
net/mptcp/pm_userspace.c | 4 ++++
net/mptcp/protocol.h     | 7 +++++++
3 files changed, 17 insertions(+)
[PATCH net] mptcp: pm: fix memory leak from alloc-during-teardown race
Posted by Shardul Bankar 5 days, 22 hours ago
mptcp_pm_destroy() drains msk->pm.anno_list via list_splice_init()
under msk->pm.lock and then, for the userspace PM, drains
msk->pm.userspace_pm_local_addr_list under the same lock. Between
the two splices, msk->pm.lock is released.

A concurrent userspace PM genl ANNOUNCE on the same msk holds a
sock reference via mptcp_token_get_sock() and, in
mptcp_pm_nl_announce_doit(), calls
mptcp_userspace_pm_append_new_local_addr() and
mptcp_pm_alloc_anno_list(). Both take msk->pm.lock briefly to
add to their respective lists. Because the genl handler holds a
sock reference, mptcp_pm_destroy() may run on the same msk via
mptcp_disconnect(), which invokes mptcp_destroy_common() without
dropping the sock refcount, before the handler completes.

If the lock acquisitions interleave such that mptcp_pm_destroy()'s
splice runs first, the subsequent alloc paths land on a list head
pointing to itself. Nothing else iterates that list for this msk,
and the added entry leaks. kmemleak reports both 192-byte
mptcp_pm_add_entry objects (from mptcp_pm_alloc_anno_list()) and
64-byte mptcp_pm_addr_entry objects (from
mptcp_userspace_pm_append_new_local_addr()) under sustained
concurrent ANNOUNCE + close load against the userspace PM.

Add a teardown flag set by mptcp_pm_destroy() before its
pm.lock-held splice and checked under pm.lock by the alloc paths.
Either the alloc takes pm.lock first, in which case its entry
will be on the list when destroy's splice runs and will be freed
normally; or destroy takes pm.lock first, in which case the
subsequent alloc observes the flag and refuses. Either ordering
is correct, and the race that orphans an entry is precluded. The
flag is cleared in mptcp_pm_data_init() so the state is correct
on slab reuse.

Found by an MPTCP protocol-flow harness extending BRF
(arXiv:2305.08782). Baseline ~0.3 kmemleak reports/minute on the
userspace PM reduces to zero across multi-hour runs with this
patch applied.

Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
---
 net/mptcp/pm.c           | 6 ++++++
 net/mptcp/pm_userspace.c | 4 ++++
 net/mptcp/protocol.h     | 7 +++++++
 3 files changed, 17 insertions(+)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 3e770c7407e1f..5b5c5e0c38583 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -440,6 +440,9 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
 
 	lockdep_assert_held(&msk->pm.lock);
 
+	if (READ_ONCE(msk->pm.destroying))
+		return false;
+
 	add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
 
 	if (add_entry) {
@@ -1109,6 +1112,8 @@ void mptcp_pm_worker(struct mptcp_sock *msk)
 
 void mptcp_pm_destroy(struct mptcp_sock *msk)
 {
+	WRITE_ONCE(msk->pm.destroying, true);
+
 	mptcp_pm_free_anno_list(msk);
 
 	if (mptcp_pm_is_userspace(msk))
@@ -1149,6 +1154,7 @@ void mptcp_pm_data_init(struct mptcp_sock *msk)
 	spin_lock_init(&msk->pm.lock);
 	INIT_LIST_HEAD(&msk->pm.anno_list);
 	INIT_LIST_HEAD(&msk->pm.userspace_pm_local_addr_list);
+	msk->pm.destroying = false;
 	mptcp_pm_data_reset(msk);
 }
 
diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c
index 8cbc1920afb49..bf53db7307302 100644
--- a/net/mptcp/pm_userspace.c
+++ b/net/mptcp/pm_userspace.c
@@ -54,6 +54,10 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
 	bitmap_zero(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
 
 	spin_lock_bh(&msk->pm.lock);
+	if (READ_ONCE(msk->pm.destroying)) {
+		ret = -EINVAL;
+		goto append_err;
+	}
 	mptcp_for_each_userspace_pm_addr(msk, e) {
 		addr_match = mptcp_addresses_equal(&e->addr, &entry->addr, true);
 		if (addr_match && entry->addr.id == 0 && needs_id)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e4f5aba24da7d..c7a068ea71324 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -222,6 +222,13 @@ struct mptcp_pm_data {
 
 	spinlock_t	lock;		/*protects the whole PM data */
 
+	/* set by mptcp_pm_destroy() before its pm.lock-held splice;
+	 * checked under pm.lock by alloc paths to refuse adds that would
+	 * be orphaned by destroy's splice.  Cleared in mptcp_pm_data_init()
+	 * so the field is correct after slab reuse.
+	 */
+	bool		destroying;
+
 	struct_group(reset,
 
 	u8		addr_signal;
-- 
2.34.1
Re: [PATCH net] mptcp: pm: fix memory leak from alloc-during-teardown race
Posted by Matthieu Baerts 3 days, 15 hours ago
Hi Shardul,

On 20/05/2026 05:12, Shardul Bankar wrote:
> mptcp_pm_destroy() drains msk->pm.anno_list via list_splice_init()
> under msk->pm.lock and then, for the userspace PM, drains
> msk->pm.userspace_pm_local_addr_list under the same lock. Between
> the two splices, msk->pm.lock is released.
> 
> A concurrent userspace PM genl ANNOUNCE on the same msk holds a
> sock reference via mptcp_token_get_sock() and, in
> mptcp_pm_nl_announce_doit(), calls
> mptcp_userspace_pm_append_new_local_addr() and
> mptcp_pm_alloc_anno_list(). Both take msk->pm.lock briefly to
> add to their respective lists. Because the genl handler holds a
> sock reference, mptcp_pm_destroy() may run on the same msk via
> mptcp_disconnect(), which invokes mptcp_destroy_common() without
> dropping the sock refcount, before the handler completes.
> 
> If the lock acquisitions interleave such that mptcp_pm_destroy()'s
> splice runs first, the subsequent alloc paths land on a list head
> pointing to itself. Nothing else iterates that list for this msk,
> and the added entry leaks. kmemleak reports both 192-byte
> mptcp_pm_add_entry objects (from mptcp_pm_alloc_anno_list()) and
> 64-byte mptcp_pm_addr_entry objects (from
> mptcp_userspace_pm_append_new_local_addr()) under sustained
> concurrent ANNOUNCE + close load against the userspace PM.
> 
> Add a teardown flag set by mptcp_pm_destroy() before its
> pm.lock-held splice and checked under pm.lock by the alloc paths.
> Either the alloc takes pm.lock first, in which case its entry
> will be on the list when destroy's splice runs and will be freed
> normally; or destroy takes pm.lock first, in which case the
> subsequent alloc observes the flag and refuses. Either ordering
> is correct, and the race that orphans an entry is precluded. The
> flag is cleared in mptcp_pm_data_init() so the state is correct
> on slab reuse.
> 
> Found by an MPTCP protocol-flow harness extending BRF
> (arXiv:2305.08782). Baseline ~0.3 kmemleak reports/minute on the
> userspace PM reduces to zero across multi-hour runs with this
> patch applied.

Nice, thank you for executing these tests, and fixing issues!

I'm travelling, and I'm not able to do a full detailed review, but here
is already a quick one.

> Assisted-by: Claude:claude-opus-4-7

Thank you for adding this tag. Can you also add a Fixes tag, pleases?

> Signed-off-by: Shardul Bankar <shardul.b@mpiricsoftware.com>
> ---
>  net/mptcp/pm.c           | 6 ++++++
>  net/mptcp/pm_userspace.c | 4 ++++
>  net/mptcp/protocol.h     | 7 +++++++
>  3 files changed, 17 insertions(+)
> 
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index 3e770c7407e1f..5b5c5e0c38583 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -440,6 +440,9 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk,
>  
>  	lockdep_assert_held(&msk->pm.lock);
>  
> +	if (READ_ONCE(msk->pm.destroying))
> +		return false;
> +
>  	add_entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
>  
>  	if (add_entry) {
> @@ -1109,6 +1112,8 @@ void mptcp_pm_worker(struct mptcp_sock *msk)
>  
>  void mptcp_pm_destroy(struct mptcp_sock *msk)
>  {
> +	WRITE_ONCE(msk->pm.destroying, true);
> +
>  	mptcp_pm_free_anno_list(msk);
>  
>  	if (mptcp_pm_is_userspace(msk))
> @@ -1149,6 +1154,7 @@ void mptcp_pm_data_init(struct mptcp_sock *msk)
>  	spin_lock_init(&msk->pm.lock);
>  	INIT_LIST_HEAD(&msk->pm.anno_list);
>  	INIT_LIST_HEAD(&msk->pm.userspace_pm_local_addr_list);
> +	msk->pm.destroying = false;
>  	mptcp_pm_data_reset(msk);

I guess this field should be reset in mptcp_pm_data_reset(): to be able
to re-use the socket after a disconnection (see mptcp_disconnect()).

>  }
>  
> diff --git a/net/mptcp/pm_userspace.c b/net/mptcp/pm_userspace.c
> index 8cbc1920afb49..bf53db7307302 100644
> --- a/net/mptcp/pm_userspace.c
> +++ b/net/mptcp/pm_userspace.c
> @@ -54,6 +54,10 @@ static int mptcp_userspace_pm_append_new_local_addr(struct mptcp_sock *msk,
>  	bitmap_zero(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1);
>  
>  	spin_lock_bh(&msk->pm.lock);
> +	if (READ_ONCE(msk->pm.destroying)) {
> +		ret = -EINVAL;
> +		goto append_err;
> +	}
>  	mptcp_for_each_userspace_pm_addr(msk, e) {
>  		addr_match = mptcp_addresses_equal(&e->addr, &entry->addr, true);
>  		if (addr_match && entry->addr.id == 0 && needs_id)
> diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
> index e4f5aba24da7d..c7a068ea71324 100644
> --- a/net/mptcp/protocol.h
> +++ b/net/mptcp/protocol.h
> @@ -222,6 +222,13 @@ struct mptcp_pm_data {
>  
>  	spinlock_t	lock;		/*protects the whole PM data */
>  
> +	/* set by mptcp_pm_destroy() before its pm.lock-held splice;
> +	 * checked under pm.lock by alloc paths to refuse adds that would
> +	 * be orphaned by destroy's splice.  Cleared in mptcp_pm_data_init()
> +	 * so the field is correct after slab reuse.
> +	 */
> +	bool		destroying;

Would it work to use a new bit in 'status' instead?
You can add a *short* comment next to declaration of the new bit.

> +
>  	struct_group(reset,
>  
>  	u8		addr_signal;

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH net] mptcp: pm: fix memory leak from alloc-during-teardown race
Posted by MPTCP CI 5 days, 21 hours ago
Hi Shardul,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal (except selftest_mptcp_join): Success! ✅
- KVM Validation: normal (only selftest_mptcp_join): Success! ✅
- KVM Validation: debug (except selftest_mptcp_join): Unstable: 1 failed test(s): packetdrill_dss ⚠️ 
- KVM Validation: debug (only selftest_mptcp_join): Success! ✅
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/26120116974

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/9d4635b60d0a
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=1097575


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)