[PATCH mptcp-net v2 1/2] mptcp: pm: avoid sending RM_ADDR over same subflow

Matthieu Baerts (NGI0) posted 2 patches 1 week, 2 days ago
[PATCH mptcp-net v2 1/2] mptcp: pm: avoid sending RM_ADDR over same subflow
Posted by Matthieu Baerts (NGI0) 1 week, 2 days ago
RM_ADDR are sent over an active subflow, the first one in the subflows
list. There is then a high chance the initial subflow is picked. With
the in-kernel PM, when an endpoint is removed, a RM_ADDR is sent, then
linked subflows are closed. This is done for each active MPTCP
connection.

MPTCP endpoints are likely removed because the attached network is no
longer available or usable. In this case, it is better to avoid sending
this RM_ADDR over the subflow that is going to be removed, but prefer
sending it over another active and non stale subflow, if any.

This modification avoids situations where the other end is not notified
when a subflow is no longer usable: typically when the endpoint linked
to the initial subflow is removed, especially on the server side.

Fixes: 8dd5efb1f91b ("mptcp: send ack for rm_addr")
Reported-by: Frank Lorenz
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/612
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Note: in my initial version, I only used one alternative for both
"stale" and "same id" subflows. I guess it is better to send over the
same subflow than a stale one, hence the priority, but there are then a
few more lines of code (but still readable, I think). To be discussed.

v2:
 - reduce one indentation level and s/rlist/rm_list/g
---
 net/mptcp/pm.c | 55 +++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 43 insertions(+), 12 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 8206b0fd2377..daef91e597ae 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -212,9 +212,24 @@ void mptcp_pm_send_ack(struct mptcp_sock *msk,
 	spin_lock_bh(&msk->pm.lock);
 }
 
-void mptcp_pm_addr_send_ack(struct mptcp_sock *msk)
+static bool subflow_in_rm_list(const struct mptcp_subflow_context *subflow,
+			       const struct mptcp_rm_list *rm_list)
 {
-	struct mptcp_subflow_context *subflow, *alt = NULL;
+	u8 i, id = subflow_get_local_id(subflow);
+
+	for (i = 0; i < rm_list->nr; i++) {
+		if (rm_list->ids[i] == id)
+			return true;
+	}
+
+	return false;
+}
+
+static void
+mptcp_pm_addr_send_ack_avoid_list(struct mptcp_sock *msk,
+				  const struct mptcp_rm_list *rm_list)
+{
+	struct mptcp_subflow_context *subflow, *stale = NULL, *same_id = NULL;
 
 	msk_owned_by_me(msk);
 	lockdep_assert_held(&msk->pm.lock);
@@ -224,19 +239,35 @@ void mptcp_pm_addr_send_ack(struct mptcp_sock *msk)
 		return;
 
 	mptcp_for_each_subflow(msk, subflow) {
-		if (__mptcp_subflow_active(subflow)) {
-			if (!subflow->stale) {
-				mptcp_pm_send_ack(msk, subflow, false, false);
-				return;
-			}
+		if (!__mptcp_subflow_active(subflow))
+			continue;
 
-			if (!alt)
-				alt = subflow;
+		if (unlikely(subflow->stale)) {
+			if (!stale)
+				stale = subflow;
+		} else if (unlikely(rm_list &&
+				    subflow_in_rm_list(subflow, rm_list))) {
+			if (!same_id)
+				same_id = subflow;
+		} else {
+			goto send_ack;
 		}
 	}
 
-	if (alt)
-		mptcp_pm_send_ack(msk, alt, false, false);
+	if (same_id)
+		subflow = same_id;
+	else if (stale)
+		subflow = stale;
+	else
+		return;
+
+send_ack:
+	mptcp_pm_send_ack(msk, subflow, false, false);
+}
+
+void mptcp_pm_addr_send_ack(struct mptcp_sock *msk)
+{
+	mptcp_pm_addr_send_ack_avoid_list(msk, NULL);
 }
 
 int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk,
@@ -470,7 +501,7 @@ int mptcp_pm_remove_addr(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_
 	msk->pm.rm_list_tx = *rm_list;
 	rm_addr |= BIT(MPTCP_RM_ADDR_SIGNAL);
 	WRITE_ONCE(msk->pm.addr_signal, rm_addr);
-	mptcp_pm_addr_send_ack(msk);
+	mptcp_pm_addr_send_ack_avoid_list(msk, rm_list);
 	return 0;
 }
 

-- 
2.51.0
Re: [PATCH mptcp-net v2 1/2] mptcp: pm: avoid sending RM_ADDR over same subflow
Posted by Mat Martineau 5 days, 2 hours ago
On Fri, 20 Feb 2026, Matthieu Baerts (NGI0) wrote:

> RM_ADDR are sent over an active subflow, the first one in the subflows
> list. There is then a high chance the initial subflow is picked. With
> the in-kernel PM, when an endpoint is removed, a RM_ADDR is sent, then
> linked subflows are closed. This is done for each active MPTCP
> connection.
>
> MPTCP endpoints are likely removed because the attached network is no
> longer available or usable. In this case, it is better to avoid sending
> this RM_ADDR over the subflow that is going to be removed, but prefer
> sending it over another active and non stale subflow, if any.
>
> This modification avoids situations where the other end is not notified
> when a subflow is no longer usable: typically when the endpoint linked
> to the initial subflow is removed, especially on the server side.
>
> Fixes: 8dd5efb1f91b ("mptcp: send ack for rm_addr")
> Reported-by: Frank Lorenz
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/612
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Note: in my initial version, I only used one alternative for both
> "stale" and "same id" subflows. I guess it is better to send over the
> same subflow than a stale one, hence the priority, but there are then a
> few more lines of code (but still readable, I think). To be discussed.
>
> v2:
> - reduce one indentation level and s/rlist/rm_list/g
> ---
> net/mptcp/pm.c | 55 +++++++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 43 insertions(+), 12 deletions(-)
>
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index 8206b0fd2377..daef91e597ae 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -212,9 +212,24 @@ void mptcp_pm_send_ack(struct mptcp_sock *msk,
> 	spin_lock_bh(&msk->pm.lock);
> }
>
> -void mptcp_pm_addr_send_ack(struct mptcp_sock *msk)
> +static bool subflow_in_rm_list(const struct mptcp_subflow_context *subflow,
> +			       const struct mptcp_rm_list *rm_list)
> {
> -	struct mptcp_subflow_context *subflow, *alt = NULL;
> +	u8 i, id = subflow_get_local_id(subflow);
> +
> +	for (i = 0; i < rm_list->nr; i++) {
> +		if (rm_list->ids[i] == id)
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static void
> +mptcp_pm_addr_send_ack_avoid_list(struct mptcp_sock *msk,
> +				  const struct mptcp_rm_list *rm_list)
> +{
> +	struct mptcp_subflow_context *subflow, *stale = NULL, *same_id = NULL;
>
> 	msk_owned_by_me(msk);
> 	lockdep_assert_held(&msk->pm.lock);
> @@ -224,19 +239,35 @@ void mptcp_pm_addr_send_ack(struct mptcp_sock *msk)
> 		return;
>
> 	mptcp_for_each_subflow(msk, subflow) {
> -		if (__mptcp_subflow_active(subflow)) {
> -			if (!subflow->stale) {
> -				mptcp_pm_send_ack(msk, subflow, false, false);
> -				return;
> -			}
> +		if (!__mptcp_subflow_active(subflow))
> +			continue;
>
> -			if (!alt)
> -				alt = subflow;
> +		if (unlikely(subflow->stale)) {
> +			if (!stale)
> +				stale = subflow;
> +		} else if (unlikely(rm_list &&
> +				    subflow_in_rm_list(subflow, rm_list))) {
> +			if (!same_id)
> +				same_id = subflow;
> +		} else {
> +			goto send_ack;

Hi Matthieu -

This is definitely an improvement over the older code, thanks! It does 
still send RM_ADDR exactly once. It could also RM_ADDR using *all* active 
non-stale subflows (any that are delivered after the first would be 
ignored). In terms of interoperability there is the risk of confusing the 
peer's path manager if it doesn't handle RM_ADDR for a non-existant 
subflow.

Maybe that's more of a mptcp-next feature (if it makes sense to do at 
all).

The v2 patch here is closer to the existing behavior so I'm ok with 
approving it:

Reviewed-by: Mat Martineau <martineau@kernel.org>

> 		}
> 	}
>
> -	if (alt)
> -		mptcp_pm_send_ack(msk, alt, false, false);
> +	if (same_id)
> +		subflow = same_id;
> +	else if (stale)
> +		subflow = stale;
> +	else
> +		return;
> +
> +send_ack:
> +	mptcp_pm_send_ack(msk, subflow, false, false);
> +}
> +
> +void mptcp_pm_addr_send_ack(struct mptcp_sock *msk)
> +{
> +	mptcp_pm_addr_send_ack_avoid_list(msk, NULL);
> }
>
> int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk,
> @@ -470,7 +501,7 @@ int mptcp_pm_remove_addr(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_
> 	msk->pm.rm_list_tx = *rm_list;
> 	rm_addr |= BIT(MPTCP_RM_ADDR_SIGNAL);
> 	WRITE_ONCE(msk->pm.addr_signal, rm_addr);
> -	mptcp_pm_addr_send_ack(msk);
> +	mptcp_pm_addr_send_ack_avoid_list(msk, rm_list);
> 	return 0;
> }
>
>
> -- 
> 2.51.0
>
>
>
Re: [PATCH mptcp-net v2 1/2] mptcp: pm: avoid sending RM_ADDR over same subflow
Posted by Matthieu Baerts 4 days, 18 hours ago
Hi Mat,

Thank you for the review!

On 25/02/2026 05:12, Mat Martineau wrote:
> On Fri, 20 Feb 2026, Matthieu Baerts (NGI0) wrote:
> 
>> RM_ADDR are sent over an active subflow, the first one in the subflows
>> list. There is then a high chance the initial subflow is picked. With
>> the in-kernel PM, when an endpoint is removed, a RM_ADDR is sent, then
>> linked subflows are closed. This is done for each active MPTCP
>> connection.
>>
>> MPTCP endpoints are likely removed because the attached network is no
>> longer available or usable. In this case, it is better to avoid sending
>> this RM_ADDR over the subflow that is going to be removed, but prefer
>> sending it over another active and non stale subflow, if any.
>>
>> This modification avoids situations where the other end is not notified
>> when a subflow is no longer usable: typically when the endpoint linked
>> to the initial subflow is removed, especially on the server side.

(...)

> This is definitely an improvement over the older code, thanks! It does
> still send RM_ADDR exactly once. It could also RM_ADDR using *all*
> active non-stale subflows (any that are delivered after the first would
> be ignored). In terms of interoperability there is the risk of confusing
> the peer's path manager if it doesn't handle RM_ADDR for a non-existant
> subflow.
> 
> Maybe that's more of a mptcp-next feature (if it makes sense to do at all).

I think implementing this would definitively be mptcp-next material. If
we want this, we will also need to change the way the option is added:
for the moment, the rm_list is copied in the msk, and a bit is set
before triggering the ACK, and when sending the ACK, the bit is reset.
So we would need to also record the subflow IDs that should send the
RM_ADDR, and only remove the main bit when all of subflows have sent it.

Now regarding the behaviour, I think it more likely to have concurrent
issues: maybe a subflow could be re-created or an ADD_ADDR could be
received before all RM_ADDR are transmitted, e.g. in case of bufferbloat
on one path?

> The v2 patch here is closer to the existing behavior so I'm ok with
> approving it:
> 
> Reviewed-by: Mat Martineau <martineau@kernel.org>

Thanks! Is this tag also covering patch 2/2?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-net v2 1/2] mptcp: pm: avoid sending RM_ADDR over same subflow
Posted by Mat Martineau 4 days, 13 hours ago
On Wed, 25 Feb 2026, Matthieu Baerts wrote:

> Hi Mat,
>
> Thank you for the review!
>
> On 25/02/2026 05:12, Mat Martineau wrote:
>> On Fri, 20 Feb 2026, Matthieu Baerts (NGI0) wrote:
>>
>>> RM_ADDR are sent over an active subflow, the first one in the subflows
>>> list. There is then a high chance the initial subflow is picked. With
>>> the in-kernel PM, when an endpoint is removed, a RM_ADDR is sent, then
>>> linked subflows are closed. This is done for each active MPTCP
>>> connection.
>>>
>>> MPTCP endpoints are likely removed because the attached network is no
>>> longer available or usable. In this case, it is better to avoid sending
>>> this RM_ADDR over the subflow that is going to be removed, but prefer
>>> sending it over another active and non stale subflow, if any.
>>>
>>> This modification avoids situations where the other end is not notified
>>> when a subflow is no longer usable: typically when the endpoint linked
>>> to the initial subflow is removed, especially on the server side.
>
> (...)
>
>> This is definitely an improvement over the older code, thanks! It does
>> still send RM_ADDR exactly once. It could also RM_ADDR using *all*
>> active non-stale subflows (any that are delivered after the first would
>> be ignored). In terms of interoperability there is the risk of confusing
>> the peer's path manager if it doesn't handle RM_ADDR for a non-existant
>> subflow.
>>
>> Maybe that's more of a mptcp-next feature (if it makes sense to do at all).
>
> I think implementing this would definitively be mptcp-next material. If
> we want this, we will also need to change the way the option is added:
> for the moment, the rm_list is copied in the msk, and a bit is set
> before triggering the ACK, and when sending the ACK, the bit is reset.
> So we would need to also record the subflow IDs that should send the
> RM_ADDR, and only remove the main bit when all of subflows have sent it.
>
> Now regarding the behaviour, I think it more likely to have concurrent
> issues: maybe a subflow could be re-created or an ADD_ADDR could be
> received before all RM_ADDR are transmitted, e.g. in case of bufferbloat
> on one path?
>
>> The v2 patch here is closer to the existing behavior so I'm ok with
>> approving it:
>>
>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>
> Thanks! Is this tag also covering patch 2/2?
>

Yes, I had intended to reply to the cover letter to RvB the series. 
Thanks for clarifying.

- Mat
Re: [PATCH mptcp-net v2 1/2] mptcp: pm: avoid sending RM_ADDR over same subflow
Posted by Matthieu Baerts 3 days, 22 hours ago
Hi Mat,

On 25/02/2026 17:57, Mat Martineau wrote:
> On Wed, 25 Feb 2026, Matthieu Baerts wrote:
> 
>> Hi Mat,
>>
>> Thank you for the review!
>>
>> On 25/02/2026 05:12, Mat Martineau wrote:
>>> On Fri, 20 Feb 2026, Matthieu Baerts (NGI0) wrote:
>>>
>>>> RM_ADDR are sent over an active subflow, the first one in the subflows
>>>> list. There is then a high chance the initial subflow is picked. With
>>>> the in-kernel PM, when an endpoint is removed, a RM_ADDR is sent, then
>>>> linked subflows are closed. This is done for each active MPTCP
>>>> connection.
>>>>
>>>> MPTCP endpoints are likely removed because the attached network is no
>>>> longer available or usable. In this case, it is better to avoid sending
>>>> this RM_ADDR over the subflow that is going to be removed, but prefer
>>>> sending it over another active and non stale subflow, if any.
>>>>
>>>> This modification avoids situations where the other end is not notified
>>>> when a subflow is no longer usable: typically when the endpoint linked
>>>> to the initial subflow is removed, especially on the server side.
>>
>> (...)
>>
>>> This is definitely an improvement over the older code, thanks! It does
>>> still send RM_ADDR exactly once. It could also RM_ADDR using *all*
>>> active non-stale subflows (any that are delivered after the first would
>>> be ignored). In terms of interoperability there is the risk of confusing
>>> the peer's path manager if it doesn't handle RM_ADDR for a non-existant
>>> subflow.
>>>
>>> Maybe that's more of a mptcp-next feature (if it makes sense to do at
>>> all).
>>
>> I think implementing this would definitively be mptcp-next material. If
>> we want this, we will also need to change the way the option is added:
>> for the moment, the rm_list is copied in the msk, and a bit is set
>> before triggering the ACK, and when sending the ACK, the bit is reset.
>> So we would need to also record the subflow IDs that should send the
>> RM_ADDR, and only remove the main bit when all of subflows have sent it.
>>
>> Now regarding the behaviour, I think it more likely to have concurrent
>> issues: maybe a subflow could be re-created or an ADD_ADDR could be
>> received before all RM_ADDR are transmitted, e.g. in case of bufferbloat
>> on one path?
>>
>>> The v2 patch here is closer to the existing behavior so I'm ok with
>>> approving it:
>>>
>>> Reviewed-by: Mat Martineau <martineau@kernel.org>
>>
>> Thanks! Is this tag also covering patch 2/2?
>>
> 
> Yes, I had intended to reply to the cover letter to RvB the series.
> Thanks for clarifying.

Great, just applied:

New patches for t/upstream-net and t/upstream:
- 2f79a7def595: mptcp: pm: avoid sending RM_ADDR over same subflow
- c4a9449065db: selftests: mptcp: join: check RM_ADDR not sent over same
subflow
- Results: 4a4900c134dd..652750a9d6a6 (export-net)
- Results: 5e61492ae392..d3854ef490c9 (export)

Tests are now in progress:

- export-net:
https://github.com/multipath-tcp/mptcp_net-next/commit/f7b905d300ce1fb82367bb2e069fa067c4ae0d49/checks
- export:
https://github.com/multipath-tcp/mptcp_net-next/commit/07e1c275611e95ade08509cf5eeb2c39f181d116/checks

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.