[PATCH mptcp-net 1/2] mptcp: pm: in-kernel: usable client side with C-flag

Matthieu Baerts (NGI0) posted 2 patches 1 day, 22 hours ago
There is a newer version of this series
[PATCH mptcp-net 1/2] mptcp: pm: in-kernel: usable client side with C-flag
Posted by Matthieu Baerts (NGI0) 1 day, 22 hours ago
When servers set the C-flag in their MP_CAPABLE to tell clients not to
create subflows to the initial address and port, clients will likely not
use their other endpoints. That's because the in-kernel path-manager
uses the 'subflow' endpoints to create subflows only to the initial
address and port.

If the limits have not been modified to accept ADD_ADDR, the client
doesn't try to establish new subflows. If the limits accept ADD_ADDR,
the routing routes will be used to select the source IP.

The C-flag is typically set when the server is operating behind a legacy
Layer 4 load balancer, or using anycast IP address. Clients having their
different 'subflow' endpoints setup, don't end up creating multiple
subflows as expected, and causing some deployment issues.

A special case is then added here: when servers set the C-flag in the
MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
The 'subflows' endpoints will then be used with this new remote IP and
port. This exception is only allowed when the ADD_ADDR is sent
immediately after the 3WHS, and makes the client switching to the 'fully
established' mode. After that, 'select_local_address()' will not be able
to find any subflows, because 'id_avail_bitmap' will be filled in
mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
established' mode.

Fixes: df377be38725 ("mptcp: add deny_join_id0 in mptcp_options_received")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Notes:
- I tried to find a simple solution that can hopefully be backported to
  cover the 'CDN' use-case.
- I tried to find a solution respecting the 'accepted add addr' limit,
  but I don't see how to... That's why here I limit the exception a
  maximum: when this limit is 0 (default case), only the first ADD_ADDR,
  etc. Feel free to share what you think about that.
- I have some additional code doing some cleanup, and I started to look
  at #503, which is a bit linked to that.
---
 net/mptcp/pm.c        |  7 +++++--
 net/mptcp/pm_kernel.c | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 02dfb379417e2843301f121039ec0d370b040ef7..fda8d1880224dedbbfb20954f93a04da435346df 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct sock *ssk,
 		} else {
 			__MPTCP_INC_STATS(sock_net((struct sock *)msk), MPTCP_MIB_ADDADDRDROP);
 		}
-	/* id0 should not have a different address */
+	/* id0 should not have a different address ; special case for C-flag */
 	} else if ((addr->id == 0 && !mptcp_pm_is_init_remote_addr(msk, addr)) ||
-		   (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
+		   (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
+		    (!READ_ONCE(msk->pm.remote_deny_join_id0) ||
+		     msk->pm.local_addr_used != 0 ||
+		     mptcp_pm_get_add_addr_accept_max(msk) != 0))) {
 		mptcp_pm_announce_addr(msk, addr, true);
 		mptcp_pm_add_addr_send_ack(msk);
 	} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index f30df8e884a0b3de9fb092e5774cafe08f6350ce..408801e20ed6e874b80ccbf38af76abc13615a3a 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -389,10 +389,15 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	struct mptcp_addr_info mpc_addr;
 	struct pm_nl_pernet *pernet;
 	unsigned int subflows_max;
+	bool deny_join_id0;
 	int i = 0;
 
 	pernet = pm_nl_get_pernet_from_msk(msk);
 	subflows_max = mptcp_pm_get_subflows_max(msk);
+	deny_join_id0 = remote->id && READ_ONCE(msk->pm.remote_deny_join_id0) &&
+			!READ_ONCE(msk->pm.accept_addr) &&
+			msk->pm.local_addr_used == 0 &&
+			mptcp_pm_get_add_addr_accept_max(msk) == 0;
 
 	mptcp_local_address((struct sock_common *)msk, &mpc_addr);
 
@@ -409,6 +414,9 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 			locals[i].flags = entry->flags;
 			locals[i].ifindex = entry->ifindex;
 
+			if (deny_join_id0)
+				__clear_bit(locals[i].addr.id, msk->pm.id_avail_bitmap);
+
 			/* Special case for ID0: set the correct ID */
 			if (mptcp_addresses_equal(&locals[i].addr, &mpc_addr, locals[i].addr.port))
 				locals[i].addr.id = 0;
@@ -419,6 +427,37 @@ static unsigned int fill_local_addresses_vec(struct mptcp_sock *msk,
 	}
 	rcu_read_unlock();
 
+	/* Special case: peer sets the C flag, accept one ADD_ADDR if default
+	 * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints
+	 */
+	if (!i && deny_join_id0) {
+		unsigned int local_addr_max = mptcp_pm_get_local_addr_max(msk);
+
+		while (msk->pm.local_addr_used < local_addr_max &&
+		       msk->pm.subflows < subflows_max) {
+			struct mptcp_pm_local *local = &locals[i];
+
+			if (!select_local_address(pernet, msk, local))
+				break;
+
+			__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
+
+			if (!mptcp_pm_addr_families_match(sk, &local->addr,
+							  remote))
+				continue;
+
+			if (mptcp_addresses_equal(&local->addr, &mpc_addr,
+						  local->addr.port))
+				continue;
+
+			msk->pm.local_addr_used++;
+			msk->pm.subflows++;
+			i++;
+		}
+
+		return i;
+	}
+
 	/* If the array is empty, fill in the single
 	 * 'IPADDRANY' local address
 	 */

-- 
2.51.0
Re: [PATCH mptcp-net 1/2] mptcp: pm: in-kernel: usable client side with C-flag
Posted by Geliang Tang 1 day, 6 hours ago
Hi Matt,

On Mon, 2025-09-15 at 20:11 +0200, Matthieu Baerts (NGI0) wrote:
> When servers set the C-flag in their MP_CAPABLE to tell clients not
> to
> create subflows to the initial address and port, clients will likely
> not
> use their other endpoints. That's because the in-kernel path-manager
> uses the 'subflow' endpoints to create subflows only to the initial
> address and port.
> 
> If the limits have not been modified to accept ADD_ADDR, the client
> doesn't try to establish new subflows. If the limits accept ADD_ADDR,
> the routing routes will be used to select the source IP.
> 
> The C-flag is typically set when the server is operating behind a
> legacy
> Layer 4 load balancer, or using anycast IP address. Clients having
> their
> different 'subflow' endpoints setup, don't end up creating multiple
> subflows as expected, and causing some deployment issues.
> 
> A special case is then added here: when servers set the C-flag in the
> MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
> The 'subflows' endpoints will then be used with this new remote IP
> and
> port. This exception is only allowed when the ADD_ADDR is sent
> immediately after the 3WHS, and makes the client switching to the
> 'fully
> established' mode. After that, 'select_local_address()' will not be
> able
> to find any subflows, because 'id_avail_bitmap' will be filled in
> mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
> established' mode.
> 
> Fixes: df377be38725 ("mptcp: add deny_join_id0 in
> mptcp_options_received")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
> - I tried to find a simple solution that can hopefully be backported
> to
>   cover the 'CDN' use-case.
> - I tried to find a solution respecting the 'accepted add addr'
> limit,
>   but I don't see how to... That's why here I limit the exception a
>   maximum: when this limit is 0 (default case), only the first
> ADD_ADDR,
>   etc. Feel free to share what you think about that.
> - I have some additional code doing some cleanup, and I started to
> look
>   at #503, which is a bit linked to that.
> ---
>  net/mptcp/pm.c        |  7 +++++--
>  net/mptcp/pm_kernel.c | 39 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index
> 02dfb379417e2843301f121039ec0d370b040ef7..fda8d1880224dedbbfb20954f93
> a04da435346df 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct
> sock *ssk,
>  		} else {
>  			__MPTCP_INC_STATS(sock_net((struct sock
> *)msk), MPTCP_MIB_ADDADDRDROP);
>  		}
> -	/* id0 should not have a different address */
> +	/* id0 should not have a different address ; special case
> for C-flag */
>  	} else if ((addr->id == 0 &&
> !mptcp_pm_is_init_remote_addr(msk, addr)) ||
> -		   (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
> +		   (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
> +		    (!READ_ONCE(msk->pm.remote_deny_join_id0) ||
> +		     msk->pm.local_addr_used != 0 ||
> +		     mptcp_pm_get_add_addr_accept_max(msk) != 0))) {
>  		mptcp_pm_announce_addr(msk, addr, true);
>  		mptcp_pm_add_addr_send_ack(msk);
>  	} else if (mptcp_pm_schedule_work(msk,
> MPTCP_PM_ADD_ADDR_RECEIVED)) {
> diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
> index
> f30df8e884a0b3de9fb092e5774cafe08f6350ce..408801e20ed6e874b80ccbf38af
> 76abc13615a3a 100644
> --- a/net/mptcp/pm_kernel.c
> +++ b/net/mptcp/pm_kernel.c
> @@ -389,10 +389,15 @@ static unsigned int
> fill_local_addresses_vec(struct mptcp_sock *msk,
>  	struct mptcp_addr_info mpc_addr;
>  	struct pm_nl_pernet *pernet;
>  	unsigned int subflows_max;
> +	bool deny_join_id0;
>  	int i = 0;
>  
>  	pernet = pm_nl_get_pernet_from_msk(msk);
>  	subflows_max = mptcp_pm_get_subflows_max(msk);
> +	deny_join_id0 = remote->id && READ_ONCE(msk-
> >pm.remote_deny_join_id0) &&
> +			!READ_ONCE(msk->pm.accept_addr) &&
> +			msk->pm.local_addr_used == 0 &&
> +			mptcp_pm_get_add_addr_accept_max(msk) == 0;
>  
>  	mptcp_local_address((struct sock_common *)msk, &mpc_addr);
>  
> @@ -409,6 +414,9 @@ static unsigned int
> fill_local_addresses_vec(struct mptcp_sock *msk,
>  			locals[i].flags = entry->flags;
>  			locals[i].ifindex = entry->ifindex;
>  
> +			if (deny_join_id0)
> +				__clear_bit(locals[i].addr.id, msk-
> >pm.id_avail_bitmap);
> +
>  			/* Special case for ID0: set the correct ID
> */
>  			if (mptcp_addresses_equal(&locals[i].addr,
> &mpc_addr, locals[i].addr.port))
>  				locals[i].addr.id = 0;
> @@ -419,6 +427,37 @@ static unsigned int
> fill_local_addresses_vec(struct mptcp_sock *msk,
>  	}
>  	rcu_read_unlock();
>  
> +	/* Special case: peer sets the C flag, accept one ADD_ADDR
> if default
> +	 * limits are used -- accepting no ADD_ADDR -- and use
> subflow endpoints
> +	 */
> +	if (!i && deny_join_id0) {
> +		unsigned int local_addr_max =
> mptcp_pm_get_local_addr_max(msk);
> +
> +		while (msk->pm.local_addr_used < local_addr_max &&
> +		       msk->pm.subflows < subflows_max) {
> +			struct mptcp_pm_local *local = &locals[i];
> +
> +			if (!select_local_address(pernet, msk,
> local))
> +				break;
> +
> +			__clear_bit(local->addr.id, msk-
> >pm.id_avail_bitmap);
> +
> +			if (!mptcp_pm_addr_families_match(sk,
> &local->addr,
> +							  remote))

Here we need to consider the case of v4mapped addresses, same as in the
section below "/* If the array is empty */".

Thanks,
-Geliang

> +				continue;
> +
> +			if (mptcp_addresses_equal(&local->addr,
> &mpc_addr,
> +						  local->addr.port))
> +				continue;
> +
> +			msk->pm.local_addr_used++;
> +			msk->pm.subflows++;
> +			i++;
> +		}
> +
> +		return i;
> +	}
> +
>  	/* If the array is empty, fill in the single
>  	 * 'IPADDRANY' local address
>  	 */
Re: [PATCH mptcp-net 1/2] mptcp: pm: in-kernel: usable client side with C-flag
Posted by Matthieu Baerts 1 day, 5 hours ago
Hi Geliang,

On 16/09/2025 12:16, Geliang Tang wrote:
> Hi Matt,
> 
> On Mon, 2025-09-15 at 20:11 +0200, Matthieu Baerts (NGI0) wrote:
>> When servers set the C-flag in their MP_CAPABLE to tell clients not
>> to
>> create subflows to the initial address and port, clients will likely
>> not
>> use their other endpoints. That's because the in-kernel path-manager
>> uses the 'subflow' endpoints to create subflows only to the initial
>> address and port.
>>
>> If the limits have not been modified to accept ADD_ADDR, the client
>> doesn't try to establish new subflows. If the limits accept ADD_ADDR,
>> the routing routes will be used to select the source IP.
>>
>> The C-flag is typically set when the server is operating behind a
>> legacy
>> Layer 4 load balancer, or using anycast IP address. Clients having
>> their
>> different 'subflow' endpoints setup, don't end up creating multiple
>> subflows as expected, and causing some deployment issues.
>>
>> A special case is then added here: when servers set the C-flag in the
>> MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
>> The 'subflows' endpoints will then be used with this new remote IP
>> and
>> port. This exception is only allowed when the ADD_ADDR is sent
>> immediately after the 3WHS, and makes the client switching to the
>> 'fully
>> established' mode. After that, 'select_local_address()' will not be
>> able
>> to find any subflows, because 'id_avail_bitmap' will be filled in
>> mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
>> established' mode.

(...)

>> diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
>> index
>> f30df8e884a0b3de9fb092e5774cafe08f6350ce..408801e20ed6e874b80ccbf38af
>> 76abc13615a3a 100644
>> --- a/net/mptcp/pm_kernel.c
>> +++ b/net/mptcp/pm_kernel.c

(...)
>> @@ -419,6 +427,37 @@ static unsigned int
>> fill_local_addresses_vec(struct mptcp_sock *msk,
>>  	}
>>  	rcu_read_unlock();
>>  
>> +	/* Special case: peer sets the C flag, accept one ADD_ADDR
>> if default
>> +	 * limits are used -- accepting no ADD_ADDR -- and use
>> subflow endpoints
>> +	 */
>> +	if (!i && deny_join_id0) {
>> +		unsigned int local_addr_max =
>> mptcp_pm_get_local_addr_max(msk);
>> +
>> +		while (msk->pm.local_addr_used < local_addr_max &&
>> +		       msk->pm.subflows < subflows_max) {
>> +			struct mptcp_pm_local *local = &locals[i];
>> +
>> +			if (!select_local_address(pernet, msk,
>> local))
>> +				break;
>> +
>> +			__clear_bit(local->addr.id, msk-
>>> pm.id_avail_bitmap);
>> +
>> +			if (!mptcp_pm_addr_families_match(sk,
>> &local->addr,
>> +							  remote))
> 
> Here we need to consider the case of v4mapped addresses, same as in the
> section below "/* If the array is empty */".

I'm not sure to understand: here, we pick a local, then
mptcp_pm_addr_families_match() takes care of the v4mapped addresses.

Here under, we define a local, and we want to take one for the same
address family. We still need the check just in case we receive a v6
address via the ADD_ADDR, and v6 is not supported.

So I don't need to change anything here. Or did I miss something?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-net 1/2] mptcp: pm: in-kernel: usable client side with C-flag
Posted by Geliang Tang 1 day, 7 hours ago
Hi Matt,

Thanks for this set. I have some comments below.

On Mon, 2025-09-15 at 20:11 +0200, Matthieu Baerts (NGI0) wrote:
> When servers set the C-flag in their MP_CAPABLE to tell clients not
> to
> create subflows to the initial address and port, clients will likely
> not
> use their other endpoints. That's because the in-kernel path-manager
> uses the 'subflow' endpoints to create subflows only to the initial
> address and port.
> 
> If the limits have not been modified to accept ADD_ADDR, the client
> doesn't try to establish new subflows. If the limits accept ADD_ADDR,
> the routing routes will be used to select the source IP.
> 
> The C-flag is typically set when the server is operating behind a
> legacy
> Layer 4 load balancer, or using anycast IP address. Clients having
> their
> different 'subflow' endpoints setup, don't end up creating multiple
> subflows as expected, and causing some deployment issues.
> 
> A special case is then added here: when servers set the C-flag in the
> MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
> The 'subflows' endpoints will then be used with this new remote IP
> and
> port. This exception is only allowed when the ADD_ADDR is sent
> immediately after the 3WHS, and makes the client switching to the
> 'fully
> established' mode. After that, 'select_local_address()' will not be
> able
> to find any subflows, because 'id_avail_bitmap' will be filled in
> mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
> established' mode.
> 
> Fixes: df377be38725 ("mptcp: add deny_join_id0 in
> mptcp_options_received")
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
> - I tried to find a simple solution that can hopefully be backported
> to
>   cover the 'CDN' use-case.
> - I tried to find a solution respecting the 'accepted add addr'
> limit,
>   but I don't see how to... That's why here I limit the exception a
>   maximum: when this limit is 0 (default case), only the first
> ADD_ADDR,
>   etc. Feel free to share what you think about that.
> - I have some additional code doing some cleanup, and I started to
> look
>   at #503, which is a bit linked to that.
> ---
>  net/mptcp/pm.c        |  7 +++++--
>  net/mptcp/pm_kernel.c | 39 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 44 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index
> 02dfb379417e2843301f121039ec0d370b040ef7..fda8d1880224dedbbfb20954f93
> a04da435346df 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct
> sock *ssk,
>  		} else {
>  			__MPTCP_INC_STATS(sock_net((struct sock
> *)msk), MPTCP_MIB_ADDADDRDROP);
>  		}
> -	/* id0 should not have a different address */
> +	/* id0 should not have a different address ; special case
> for C-flag */
>  	} else if ((addr->id == 0 &&
> !mptcp_pm_is_init_remote_addr(msk, addr)) ||
> -		   (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
> +		   (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
> +		    (!READ_ONCE(msk->pm.remote_deny_join_id0) ||
> +		     msk->pm.local_addr_used != 0 ||
> +		     mptcp_pm_get_add_addr_accept_max(msk) != 0))) {

I think we can define an mptcp_pm_deny_join_id0(struct mptcp_sock *msk)
helper, then use !mptcp_pm_deny_join_id0(msk) here, and use
mptcp_pm_deny_join_id0(msk) in fill_local_addresses_vec().

For example:

static inline bool mptcp_pm_deny_join_id0(struct mptcp_sock *msk)
{
       return (READ_ONCE(msk->pm.remote_deny_join_id0) &&
               msk->pm.local_addr_used == 0 &&
               mptcp_pm_get_add_addr_accept_max(msk) == 0);
}

>  		mptcp_pm_announce_addr(msk, addr, true);
>  		mptcp_pm_add_addr_send_ack(msk);
>  	} else if (mptcp_pm_schedule_work(msk,
> MPTCP_PM_ADD_ADDR_RECEIVED)) {
> diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
> index
> f30df8e884a0b3de9fb092e5774cafe08f6350ce..408801e20ed6e874b80ccbf38af
> 76abc13615a3a 100644
> --- a/net/mptcp/pm_kernel.c
> +++ b/net/mptcp/pm_kernel.c
> @@ -389,10 +389,15 @@ static unsigned int
> fill_local_addresses_vec(struct mptcp_sock *msk,
>  	struct mptcp_addr_info mpc_addr;
>  	struct pm_nl_pernet *pernet;
>  	unsigned int subflows_max;
> +	bool deny_join_id0;
>  	int i = 0;
>  
>  	pernet = pm_nl_get_pernet_from_msk(msk);
>  	subflows_max = mptcp_pm_get_subflows_max(msk);
> +	deny_join_id0 = remote->id && READ_ONCE(msk-
> >pm.remote_deny_join_id0) &&
> +			!READ_ONCE(msk->pm.accept_addr) &&
> +			msk->pm.local_addr_used == 0 &&
> +			mptcp_pm_get_add_addr_accept_max(msk) == 0;

I think it's unnecessary to check msk->pm.accept_addr again here, as it
has already been verified in mptcp_pm_add_addr_received, which should
be sufficient:

deny_join_id0 = remote->id && mptcp_pm_deny_join_id0(msk);

>  
>  	mptcp_local_address((struct sock_common *)msk, &mpc_addr);
>  
> @@ -409,6 +414,9 @@ static unsigned int
> fill_local_addresses_vec(struct mptcp_sock *msk,
>  			locals[i].flags = entry->flags;
>  			locals[i].ifindex = entry->ifindex;
>  
> +			if (deny_join_id0)
> +				__clear_bit(locals[i].addr.id, msk-
> >pm.id_avail_bitmap);
> +
>  			/* Special case for ID0: set the correct ID
> */
>  			if (mptcp_addresses_equal(&locals[i].addr,
> &mpc_addr, locals[i].addr.port))
>  				locals[i].addr.id = 0;
> @@ -419,6 +427,37 @@ static unsigned int
> fill_local_addresses_vec(struct mptcp_sock *msk,
>  	}
>  	rcu_read_unlock();
>  
> +	/* Special case: peer sets the C flag, accept one ADD_ADDR
> if default
> +	 * limits are used -- accepting no ADD_ADDR -- and use
> subflow endpoints
> +	 */
> +	if (!i && deny_join_id0) {
> +		unsigned int local_addr_max =
> mptcp_pm_get_local_addr_max(msk);

How about moving this line to after

	subflows_max = mptcp_pm_get_subflows_max(msk);?

They are similar and can be grouped together.

> +
> +		while (msk->pm.local_addr_used < local_addr_max &&
> +		       msk->pm.subflows < subflows_max) {
> +			struct mptcp_pm_local *local = &locals[i];
> +
> +			if (!select_local_address(pernet, msk,
> local))
> +				break;
> +
> +			__clear_bit(local->addr.id, msk-
> >pm.id_avail_bitmap);
> +
> +			if (!mptcp_pm_addr_families_match(sk,
> &local->addr,
> +							  remote))
> +				continue;
> +
> +			if (mptcp_addresses_equal(&local->addr,
> &mpc_addr,
> +						  local->addr.port))
> +				continue;
> +
> +			msk->pm.local_addr_used++;
> +			msk->pm.subflows++;
> +			i++;
> +		}
> +
> +		return i;

Here, we can avoid using a return statement to allow the subsequent
array empty check to proceed.

> +	}
> +
>  	/* If the array is empty, fill in the single
>  	 * 'IPADDRANY' local address
>  	 */
Re: [PATCH mptcp-net 1/2] mptcp: pm: in-kernel: usable client side with C-flag
Posted by Matthieu Baerts 1 day, 6 hours ago
Hi Geliang,

On 16/09/2025 11:18, Geliang Tang wrote:
> Hi Matt,
> 
> Thanks for this set. I have some comments below.
> 
> On Mon, 2025-09-15 at 20:11 +0200, Matthieu Baerts (NGI0) wrote:
>> When servers set the C-flag in their MP_CAPABLE to tell clients not
>> to
>> create subflows to the initial address and port, clients will likely
>> not
>> use their other endpoints. That's because the in-kernel path-manager
>> uses the 'subflow' endpoints to create subflows only to the initial
>> address and port.
>>
>> If the limits have not been modified to accept ADD_ADDR, the client
>> doesn't try to establish new subflows. If the limits accept ADD_ADDR,
>> the routing routes will be used to select the source IP.
>>
>> The C-flag is typically set when the server is operating behind a
>> legacy
>> Layer 4 load balancer, or using anycast IP address. Clients having
>> their
>> different 'subflow' endpoints setup, don't end up creating multiple
>> subflows as expected, and causing some deployment issues.
>>
>> A special case is then added here: when servers set the C-flag in the
>> MPC and directly sends an ADD_ADDR, this single ADD_ADDR is accepted.
>> The 'subflows' endpoints will then be used with this new remote IP
>> and
>> port. This exception is only allowed when the ADD_ADDR is sent
>> immediately after the 3WHS, and makes the client switching to the
>> 'fully
>> established' mode. After that, 'select_local_address()' will not be
>> able
>> to find any subflows, because 'id_avail_bitmap' will be filled in
>> mptcp_pm_create_subflow_or_signal_addr(), when switching to 'fully
>> established' mode.
>>
>> Fixes: df377be38725 ("mptcp: add deny_join_id0 in
>> mptcp_options_received")
>> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/536
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>> Notes:
>> - I tried to find a simple solution that can hopefully be backported
>> to
>>   cover the 'CDN' use-case.
>> - I tried to find a solution respecting the 'accepted add addr'
>> limit,
>>   but I don't see how to... That's why here I limit the exception a
>>   maximum: when this limit is 0 (default case), only the first
>> ADD_ADDR,
>>   etc. Feel free to share what you think about that.
>> - I have some additional code doing some cleanup, and I started to
>> look
>>   at #503, which is a bit linked to that.
>> ---
>>  net/mptcp/pm.c        |  7 +++++--
>>  net/mptcp/pm_kernel.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 44 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>> index
>> 02dfb379417e2843301f121039ec0d370b040ef7..fda8d1880224dedbbfb20954f93
>> a04da435346df 100644
>> --- a/net/mptcp/pm.c
>> +++ b/net/mptcp/pm.c
>> @@ -637,9 +637,12 @@ void mptcp_pm_add_addr_received(const struct
>> sock *ssk,
>>  		} else {
>>  			__MPTCP_INC_STATS(sock_net((struct sock
>> *)msk), MPTCP_MIB_ADDADDRDROP);
>>  		}
>> -	/* id0 should not have a different address */
>> +	/* id0 should not have a different address ; special case
>> for C-flag */
>>  	} else if ((addr->id == 0 &&
>> !mptcp_pm_is_init_remote_addr(msk, addr)) ||
>> -		   (addr->id > 0 && !READ_ONCE(pm->accept_addr))) {
>> +		   (addr->id > 0 && !READ_ONCE(pm->accept_addr) &&
>> +		    (!READ_ONCE(msk->pm.remote_deny_join_id0) ||
>> +		     msk->pm.local_addr_used != 0 ||
>> +		     mptcp_pm_get_add_addr_accept_max(msk) != 0))) {
> 
> I think we can define an mptcp_pm_deny_join_id0(struct mptcp_sock *msk)
> helper, then use !mptcp_pm_deny_join_id0(msk) here, and use
> mptcp_pm_deny_join_id0(msk) in fill_local_addresses_vec().
> 
> For example:
> 
> static inline bool mptcp_pm_deny_join_id0(struct mptcp_sock *msk)
> {
>        return (READ_ONCE(msk->pm.remote_deny_join_id0) &&
>                msk->pm.local_addr_used == 0 &&
>                mptcp_pm_get_add_addr_accept_max(msk) == 0);
> }

I initially didn't want to add such helper which means modifying more
files: this will cause more troubles for the backports just for that.
But now thinking about that again, I guess we will have issues with the
backports anyway, because the code has been moved around in pm_kernel.c.
So I suppose it is fine to introduce more conflicts, and we can add a
new helper.

> 
>>  		mptcp_pm_announce_addr(msk, addr, true);
>>  		mptcp_pm_add_addr_send_ack(msk);
>>  	} else if (mptcp_pm_schedule_work(msk,
>> MPTCP_PM_ADD_ADDR_RECEIVED)) {
>> diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
>> index
>> f30df8e884a0b3de9fb092e5774cafe08f6350ce..408801e20ed6e874b80ccbf38af
>> 76abc13615a3a 100644
>> --- a/net/mptcp/pm_kernel.c
>> +++ b/net/mptcp/pm_kernel.c
>> @@ -389,10 +389,15 @@ static unsigned int
>> fill_local_addresses_vec(struct mptcp_sock *msk,
>>  	struct mptcp_addr_info mpc_addr;
>>  	struct pm_nl_pernet *pernet;
>>  	unsigned int subflows_max;
>> +	bool deny_join_id0;
>>  	int i = 0;
>>  
>>  	pernet = pm_nl_get_pernet_from_msk(msk);
>>  	subflows_max = mptcp_pm_get_subflows_max(msk);
>> +	deny_join_id0 = remote->id && READ_ONCE(msk-
>>> pm.remote_deny_join_id0) &&
>> +			!READ_ONCE(msk->pm.accept_addr) &&
>> +			msk->pm.local_addr_used == 0 &&
>> +			mptcp_pm_get_add_addr_accept_max(msk) == 0;
> 
> I think it's unnecessary to check msk->pm.accept_addr again here, as it
> has already been verified in mptcp_pm_add_addr_received, which should
> be sufficient:
> 
> deny_join_id0 = remote->id && mptcp_pm_deny_join_id0(msk);

Mmh, we can be in a situation where remote_deny_join_id0 is true, but
accept_addr is true as well, and in this case, we don't want the exception.

But I suppose that mptcp_pm_get_add_addr_accept_max(msk) == 0 implies
accept_addr being false. So I guess we can drop this condition.
> 
>>  
>>  	mptcp_local_address((struct sock_common *)msk, &mpc_addr);
>>  
>> @@ -409,6 +414,9 @@ static unsigned int
>> fill_local_addresses_vec(struct mptcp_sock *msk,
>>  			locals[i].flags = entry->flags;
>>  			locals[i].ifindex = entry->ifindex;
>>  
>> +			if (deny_join_id0)
>> +				__clear_bit(locals[i].addr.id, msk-
>>> pm.id_avail_bitmap);
>> +
>>  			/* Special case for ID0: set the correct ID
>> */
>>  			if (mptcp_addresses_equal(&locals[i].addr,
>> &mpc_addr, locals[i].addr.port))
>>  				locals[i].addr.id = 0;
>> @@ -419,6 +427,37 @@ static unsigned int
>> fill_local_addresses_vec(struct mptcp_sock *msk,
>>  	}
>>  	rcu_read_unlock();
>>  
>> +	/* Special case: peer sets the C flag, accept one ADD_ADDR
>> if default
>> +	 * limits are used -- accepting no ADD_ADDR -- and use
>> subflow endpoints
>> +	 */
>> +	if (!i && deny_join_id0) {
>> +		unsigned int local_addr_max =
>> mptcp_pm_get_local_addr_max(msk);
> 
> How about moving this line to after
> 
> 	subflows_max = mptcp_pm_get_subflows_max(msk);?
> 
> They are similar and can be grouped together.

Simply to restrict the scope to the block where it is used:
local_addr_max is only used here.
> 
>> +
>> +		while (msk->pm.local_addr_used < local_addr_max &&
>> +		       msk->pm.subflows < subflows_max) {
>> +			struct mptcp_pm_local *local = &locals[i];
>> +
>> +			if (!select_local_address(pernet, msk,
>> local))
>> +				break;
>> +
>> +			__clear_bit(local->addr.id, msk-
>>> pm.id_avail_bitmap);
>> +
>> +			if (!mptcp_pm_addr_families_match(sk,
>> &local->addr,
>> +							  remote))
>> +				continue;
>> +
>> +			if (mptcp_addresses_equal(&local->addr,
>> &mpc_addr,
>> +						  local->addr.port))
>> +				continue;
>> +
>> +			msk->pm.local_addr_used++;
>> +			msk->pm.subflows++;
>> +			i++;
>> +		}
>> +
>> +		return i;
> 
> Here, we can avoid using a return statement to allow the subsequent
> array empty check to proceed.

No, we cannot: if we are in the exception (deny_join_id0), it is
possible we are not allowed to create additional subflows using the
'subflow' endpoints, see the conditions in the 'while()'. In this case,
we don't want to create a new subflow below, because the idea is to use
the 'subflow' endpoints if we can.

Or we need to check these conditions before, but that's a lot of
conditions, and it seems safer to return here, no,

> 
>> +	}
>> +
>>  	/* If the array is empty, fill in the single
>>  	 * 'IPADDRANY' local address
>>  	 */

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.