[PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags

Matthieu Baerts (NGI0) posted 4 patches 4 months, 1 week ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20240621-mptcp-pm-avail-v1-0-b692d5eb89b5@kernel.org
There is a newer version of this series
net/mptcp/options.c                             |  3 +-
net/mptcp/pm.c                                  |  3 +-
net/mptcp/pm_netlink.c                          | 20 +++++----
net/mptcp/protocol.h                            |  5 ++-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 55 ++++++++++++++++++-------
5 files changed, 60 insertions(+), 26 deletions(-)
[PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Matthieu Baerts (NGI0) 4 months, 1 week ago
When looking at improving the user experience around the MPTCP endpoints
setup, I noticed that setting an endpoint with both the 'signal' and the
'subflow' flags -- as it has been done in the past by users according to
bug reports we got -- were resulting on only announcing the endpoint,
but not using it to create subflows: the 'subflow' flag was then
ignored.

My initial thought was to modify IPRoute2 to warn the user when the two
flags were set, but it doesn't sound normal to ignore one of them. I
then looked at modifying the kernel not to allow having the two flags
set, but when discussing about that with Mat, we thought it was maybe
not ideal to do that, as there might be use-cases, we might break some
configs, and it was working before apparently. So instead, I fixed the
support by splitting the previously shared 'available IDs' bitmap.

While at it, I added a new selftest (patch 4) to validate this case --
including a modification of the chk_add_nr helper to inverse the sides
were the counters are checked (patch 3) -- and allowed ADD_ADDR echo
just after the MP_JOIN 3WHS.

The selftests modification don't have a Fixes tag, because backporting
will not be easy: on some versions, the backport will not generate
conflicts, but the test will fail because some helpers are not
available. That's OK, many CIs will use the selftests from the last
stable version to validate previous stable releases.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Matthieu Baerts (NGI0) (4):
      mptcp: fully established after ADD_ADDR echo on MPJ
      mptcp: split id_avail_bitmap per target
      selftests: mptcp: join: ability to invert ADD_ADDR check
      selftests: mptcp: join: test both signal & subflow

 net/mptcp/options.c                             |  3 +-
 net/mptcp/pm.c                                  |  3 +-
 net/mptcp/pm_netlink.c                          | 20 +++++----
 net/mptcp/protocol.h                            |  5 ++-
 tools/testing/selftests/net/mptcp/mptcp_join.sh | 55 ++++++++++++++++++-------
 5 files changed, 60 insertions(+), 26 deletions(-)
---
base-commit: e6b62d29168e1aef6aa328f8956e5e0aadffe3d5
change-id: 20240620-mptcp-pm-avail-f5e3957be441

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Paolo Abeni 3 months, 4 weeks ago
Hi,

On Fri, 2024-06-21 at 13:39 +0200, Matthieu Baerts (NGI0) wrote:
> When looking at improving the user experience around the MPTCP endpoints
> setup, I noticed that setting an endpoint with both the 'signal' and the
> 'subflow' flags -- as it has been done in the past by users according to
> bug reports we got -- were resulting on only announcing the endpoint,
> but not using it to create subflows: the 'subflow' flag was then
> ignored.
> 
> My initial thought was to modify IPRoute2 to warn the user when the two
> flags were set, but it doesn't sound normal to ignore one of them. I
> then looked at modifying the kernel not to allow having the two flags
> set, but when discussing about that with Mat, we thought it was maybe
> not ideal to do that, as there might be use-cases, we might break some
> configs, and it was working before apparently. So instead, I fixed the
> support by splitting the previously shared 'available IDs' bitmap.
> 
> While at it, I added a new selftest (patch 4) to validate this case --
> including a modification of the chk_add_nr helper to inverse the sides
> were the counters are checked (patch 3) -- and allowed ADD_ADDR echo
> just after the MP_JOIN 3WHS.
> 
> The selftests modification don't have a Fixes tag, because backporting
> will not be easy: on some versions, the backport will not generate
> conflicts, but the test will fail because some helpers are not
> available. That's OK, many CIs will use the selftests from the last
> stable version to validate previous stable releases.

I'm sorry for the large latency. 

I think the above is more a new feature than a fix, I would avoid
including fixes tag, to prevent stable backports.

I am a bit doubtful about having endpoint with both subflow | signal,
do we want encourage such bogus setup? What if the endpoint specify a
port, too? 

AFAICS it can not work, as the port-based signal endpoint will create a
listener on <IP>:<port> and later on the 'subflow' flag will try to
bind on the same <IP>:<port>, failing, while creating the outgoing
connection.

What about just documenting the existing behavior?

Thanks,

Paolo
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Matthieu Baerts 3 months, 4 weeks ago
Hi Paolo,

On 04/07/2024 17:47, Paolo Abeni wrote:
> Hi,
> 
> On Fri, 2024-06-21 at 13:39 +0200, Matthieu Baerts (NGI0) wrote:
>> When looking at improving the user experience around the MPTCP endpoints
>> setup, I noticed that setting an endpoint with both the 'signal' and the
>> 'subflow' flags -- as it has been done in the past by users according to
>> bug reports we got -- were resulting on only announcing the endpoint,
>> but not using it to create subflows: the 'subflow' flag was then
>> ignored.
>>
>> My initial thought was to modify IPRoute2 to warn the user when the two
>> flags were set, but it doesn't sound normal to ignore one of them. I
>> then looked at modifying the kernel not to allow having the two flags
>> set, but when discussing about that with Mat, we thought it was maybe
>> not ideal to do that, as there might be use-cases, we might break some
>> configs, and it was working before apparently. So instead, I fixed the
>> support by splitting the previously shared 'available IDs' bitmap.
>>
>> While at it, I added a new selftest (patch 4) to validate this case --
>> including a modification of the chk_add_nr helper to inverse the sides
>> were the counters are checked (patch 3) -- and allowed ADD_ADDR echo
>> just after the MP_JOIN 3WHS.
>>
>> The selftests modification don't have a Fixes tag, because backporting
>> will not be easy: on some versions, the backport will not generate
>> conflicts, but the test will fail because some helpers are not
>> available. That's OK, many CIs will use the selftests from the last
>> stable version to validate previous stable releases.
> 
> I'm sorry for the large latency.

No problem, thank you for your reply!

> I think the above is more a new feature than a fix, I would avoid
> including fixes tag, to prevent stable backports.

My main goal here is to reduce frustration from users who are (certainly
"badly") setting both the 'signal' and 'subflow' flags on new endpoints.
In the different doc we have, we mention what does 'signal' and
'subflow', but nothing about using them together. For us, it seems
obvious not to set the two flags together. But if you are new to MPTCP,
I can understand that "signalling doesn't seem to hurt", and both
'subflow' and 'signal' flags are used.

(Note that in the mptcp.org implementation, the fullmesh PM was sending
ADD_ADDR and establishing new subflows, just because "it doesn't hurt to
announce all addresses" :) )

Currently, if we use the two flags together, the 'subflow' flag is
ignored. I added the Fixes tag in patch 2, because it sounds like a bug
not to create subflows when the 'subflow' flag is set. And also because
it was working before the mentioned 'Fixes' commit which has
accidentally removed this support.

I understand this can be seen as a feature, but I'm not sure how to
better fix that to avoid this frustration from users when the 'subflow'
flag is ignored. Or we change the kernel not to accept setting the two
flags together?

> I am a bit doubtful about having endpoint with both subflow | signal,
> do we want encourage such bogus setup? What if the endpoint specify a
> port, too? 
> 
> AFAICS it can not work, as the port-based signal endpoint will create a
> listener on <IP>:<port> and later on the 'subflow' flag will try to
> bind on the same <IP>:<port>, failing, while creating the outgoing
> connection.

What about returning an error when a port is given with the 'subflow' flag?

Currently, we do:

  if (addr.addr.port && !(addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
      return -EINVAL;

We could add "(... || addr.flags & MPTCP_PM_ADDR_FLAG_SUBLFOW)".

> What about just documenting the existing behavior?

I don't think we should document that "setting both the 'signal' and
'subflow' flags is allowed, but in fact the 'subflow' flag is ignored in
>= 5.17 versions", no? :)

I initially looked at adding a warning in IPRoute2, but then I failed to
explain when setting these two flags can be justified. (And the new
warning would take time to arrive in Linux distro currently ignoring the
'subflow' flag in this case.)

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Paolo Abeni 3 months, 4 weeks ago
On Thu, 2024-07-04 at 18:43 +0200, Matthieu Baerts wrote:
> Hi Paolo,
> 
> On 04/07/2024 17:47, Paolo Abeni wrote:
> > Hi,
> > 
> > On Fri, 2024-06-21 at 13:39 +0200, Matthieu Baerts (NGI0) wrote:
> > > When looking at improving the user experience around the MPTCP endpoints
> > > setup, I noticed that setting an endpoint with both the 'signal' and the
> > > 'subflow' flags -- as it has been done in the past by users according to
> > > bug reports we got -- were resulting on only announcing the endpoint,
> > > but not using it to create subflows: the 'subflow' flag was then
> > > ignored.
> > > 
> > > My initial thought was to modify IPRoute2 to warn the user when the two
> > > flags were set, but it doesn't sound normal to ignore one of them. I
> > > then looked at modifying the kernel not to allow having the two flags
> > > set, but when discussing about that with Mat, we thought it was maybe
> > > not ideal to do that, as there might be use-cases, we might break some
> > > configs, and it was working before apparently. So instead, I fixed the
> > > support by splitting the previously shared 'available IDs' bitmap.
> > > 
> > > While at it, I added a new selftest (patch 4) to validate this case --
> > > including a modification of the chk_add_nr helper to inverse the sides
> > > were the counters are checked (patch 3) -- and allowed ADD_ADDR echo
> > > just after the MP_JOIN 3WHS.
> > > 
> > > The selftests modification don't have a Fixes tag, because backporting
> > > will not be easy: on some versions, the backport will not generate
> > > conflicts, but the test will fail because some helpers are not
> > > available. That's OK, many CIs will use the selftests from the last
> > > stable version to validate previous stable releases.
> > 
> > I'm sorry for the large latency.
> 
> No problem, thank you for your reply!
> 
> > I think the above is more a new feature than a fix, I would avoid
> > including fixes tag, to prevent stable backports.
> 
> My main goal here is to reduce frustration from users who are (certainly
> "badly") setting both the 'signal' and 'subflow' flags on new endpoints.
> In the different doc we have, we mention what does 'signal' and
> 'subflow', but nothing about using them together. For us, it seems
> obvious not to set the two flags together. But if you are new to MPTCP,
> I can understand that "signalling doesn't seem to hurt", and both
> 'subflow' and 'signal' flags are used.
> 
> (Note that in the mptcp.org implementation, the fullmesh PM was sending
> ADD_ADDR and establishing new subflows, just because "it doesn't hurt to
> announce all addresses" :) )
> 
> Currently, if we use the two flags together, the 'subflow' flag is
> ignored. I added the Fixes tag in patch 2, because it sounds like a bug
> not to create subflows when the 'subflow' flag is set. And also because
> it was working before the mentioned 'Fixes' commit which has
> accidentally removed this support.

Ah, my bad! I underlooked the commit message and missed this part.

> > I am a bit doubtful about having endpoint with both subflow | signal,
> > do we want encourage such bogus setup? What if the endpoint specify a
> > port, too? 
> > 
> > AFAICS it can not work, as the port-based signal endpoint will create a
> > listener on <IP>:<port> and later on the 'subflow' flag will try to
> > bind on the same <IP>:<port>, failing, while creating the outgoing
> > connection.
> 
> What about returning an error when a port is given with the 'subflow' flag?
> 
> Currently, we do:
> 
>   if (addr.addr.port && !(addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
>       return -EINVAL;
> 
> We could add "(... || addr.flags & MPTCP_PM_ADDR_FLAG_SUBLFOW)".

IMHO if we deny SIGNAL | SUBFLOW for a subset of the possible
configuration, than is better to be consistent and deny them for all
the conf.

> > What about just documenting the existing behavior?
> 
> I don't think we should document that "setting both the 'signal' and
> 'subflow' flags is allowed, but in fact the 'subflow' flag is ignored in
> > = 5.17 versions", no? :)

Yup, I missed the regression part of it.

If possible, I would be happy to reduce complexity by preventing signal
and subflow flags together.

> I initially looked at adding a warning in IPRoute2, but then I failed to
> explain when setting these two flags can be justified. (And the new
> warning would take time to arrive in Linux distro currently ignoring the
> 'subflow' flag in this case.)

Well, distro problems are distro problems, I'm sure we can ignore the
distro lifecycle problems here. Should be addressed by a completely
different set of persons ;)

Indeed the ip-mptcp man page could help some significant rewording,
what about starting from there?

PORT description implicitly refers to 'signal' endpoint without
mentioning such fact. Also it has effect for subflow endpoint, not
mentioned there

/P
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Matthieu Baerts 3 months, 3 weeks ago
4 Jul 2024 19:32:47 Paolo Abeni <pabeni@redhat.com>:

> On Thu, 2024-07-04 at 18:43 +0200, Matthieu Baerts wrote:
>> Hi Paolo,
>>
>> On 04/07/2024 17:47, Paolo Abeni wrote:
>>> Hi,
>>>
>>> On Fri, 2024-06-21 at 13:39 +0200, Matthieu Baerts (NGI0) wrote:
>>>> When looking at improving the user experience around the MPTCP endpoints
>>>> setup, I noticed that setting an endpoint with both the 'signal' and the
>>>> 'subflow' flags -- as it has been done in the past by users according to
>>>> bug reports we got -- were resulting on only announcing the endpoint,
>>>> but not using it to create subflows: the 'subflow' flag was then
>>>> ignored.
>>>>
>>>> My initial thought was to modify IPRoute2 to warn the user when the two
>>>> flags were set, but it doesn't sound normal to ignore one of them. I
>>>> then looked at modifying the kernel not to allow having the two flags
>>>> set, but when discussing about that with Mat, we thought it was maybe
>>>> not ideal to do that, as there might be use-cases, we might break some
>>>> configs, and it was working before apparently. So instead, I fixed the
>>>> support by splitting the previously shared 'available IDs' bitmap.
>>>>
>>>> While at it, I added a new selftest (patch 4) to validate this case --
>>>> including a modification of the chk_add_nr helper to inverse the sides
>>>> were the counters are checked (patch 3) -- and allowed ADD_ADDR echo
>>>> just after the MP_JOIN 3WHS.
>>>>
>>>> The selftests modification don't have a Fixes tag, because backporting
>>>> will not be easy: on some versions, the backport will not generate
>>>> conflicts, but the test will fail because some helpers are not
>>>> available. That's OK, many CIs will use the selftests from the last
>>>> stable version to validate previous stable releases.
>>>
>>> I'm sorry for the large latency.
>>
>> No problem, thank you for your reply!
>>
>>> I think the above is more a new feature than a fix, I would avoid
>>> including fixes tag, to prevent stable backports.
>>
>> My main goal here is to reduce frustration from users who are (certainly
>> "badly") setting both the 'signal' and 'subflow' flags on new endpoints.
>> In the different doc we have, we mention what does 'signal' and
>> 'subflow', but nothing about using them together. For us, it seems
>> obvious not to set the two flags together. But if you are new to MPTCP,
>> I can understand that "signalling doesn't seem to hurt", and both
>> 'subflow' and 'signal' flags are used.
>>
>> (Note that in the mptcp.org implementation, the fullmesh PM was sending
>> ADD_ADDR and establishing new subflows, just because "it doesn't hurt to
>> announce all addresses" :) )
>>
>> Currently, if we use the two flags together, the 'subflow' flag is
>> ignored. I added the Fixes tag in patch 2, because it sounds like a bug
>> not to create subflows when the 'subflow' flag is set. And also because
>> it was working before the mentioned 'Fixes' commit which has
>> accidentally removed this support.
>
> Ah, my bad! I underlooked the commit message and missed this part.
>
>>> I am a bit doubtful about having endpoint with both subflow | signal,
>>> do we want encourage such bogus setup? What if the endpoint specify a
>>> port, too? 
>>>
>>> AFAICS it can not work, as the port-based signal endpoint will create a
>>> listener on <IP>:<port> and later on the 'subflow' flag will try to
>>> bind on the same <IP>:<port>, failing, while creating the outgoing
>>> connection.
>>
>> What about returning an error when a port is given with the 'subflow' flag?
>>
>> Currently, we do:
>>
>>   if (addr.addr.port && !(addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL))
>>       return -EINVAL;
>>
>> We could add "(... || addr.flags & MPTCP_PM_ADDR_FLAG_SUBLFOW)".
>
> IMHO if we deny SIGNAL | SUBFLOW for a subset of the possible
> configuration, than is better to be consistent and deny them for all
> the conf.

I thought about denying them all, but I wonder if there are no use cases currently using that. I didn't want to break them. Do you think it would be OK?

For the moment, using signal + subflow + port doesn't work. But more importantly, it cannot work. This looks similar to signal + fullmesh that is currently denied. So maybe OK to deny only a subset?

>>> What about just documenting the existing behavior?
>>
>> I don't think we should document that "setting both the 'signal' and
>> 'subflow' flags is allowed, but in fact the 'subflow' flag is ignored in
>>> = 5.17 versions", no? :)
>
> Yup, I missed the regression part of it.
>
> If possible, I would be happy to reduce complexity by preventing signal
> and subflow flags together.

Do you think it would be OK to backport that? How to know if this configuration is not already used for valid use cases?

>> I initially looked at adding a warning in IPRoute2, but then I failed to
>> explain when setting these two flags can be justified. (And the new
>> warning would take time to arrive in Linux distro currently ignoring the
>> 'subflow' flag in this case.)
>
> Well, distro problems are distro problems, I'm sure we can ignore the
> distro lifecycle problems here. Should be addressed by a completely
> different set of persons ;)

That's our problem if people report issues to us :-P

> Indeed the ip-mptcp man page could help some significant rewording,
> what about starting from there?

Yes I started to do that, I will look at the v2 :)

https://lore.kernel.org/mptcp/20240605-mptcp-user-feedback-v1-0-d2dc3b399643@kernel.org/T/

> PORT description implicitly refers to 'signal' endpoint without
> mentioning such fact. Also it has effect for subflow endpoint, not
> mentioned there

Good idea to mention that. I will look at adding that in the v2.

Cheers,
Matt
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Paolo Abeni 3 months, 3 weeks ago
On Thu, 2024-07-04 at 22:02 +0200, Matthieu Baerts wrote:
> > > I don't think we should document that "setting both the 'signal' and
> > > 'subflow' flags is allowed, but in fact the 'subflow' flag is ignored in
> > > > = 5.17 versions", no? :)
> > 
> > Yup, I missed the regression part of it.
> > 
> > If possible, I would be happy to reduce complexity by preventing signal
> > and subflow flags together.
> 
> Do you think it would be OK to backport that? How to know if this 
> configuration is not already used for valid use cases?

I guess that what worry me most is the addition of another large field
to the msk "just" to track (again) endpoints.

Thinking again about it, perhaps we don't need it. Something alike the
following should be sufficient to replace patch 2/4 here:

---
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index ea9e5817b9e9..24e18cd43f48 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -513,8 +513,8 @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info)
 
 static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 {
+	struct mptcp_pm_addr_entry *local, *signal_and_subflow = NULL;
 	struct sock *sk = (struct sock *)msk;
-	struct mptcp_pm_addr_entry *local;
 	unsigned int add_addr_signal_max;
 	unsigned int local_addr_max;
 	struct pm_nl_pernet *pernet;
@@ -568,14 +568,27 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
 			return;
 
-		if (local) {
-			if (mptcp_pm_alloc_anno_list(msk, &local->addr)) {
-				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
-				msk->pm.add_addr_signaled++;
-				mptcp_pm_announce_addr(msk, &local->addr, false);
-				mptcp_pm_nl_addr_send_ack(msk);
-			}
-		}
+		if (!local)
+			continue;
+
+		/* If the allocation fails we are on memory pressure, it looks
+		 * not worthy try any forther later action, that will likely fail
+		 * And if local has both the SIGNAL and SUBFLOW flags, and the
+		 * by bad luck the later subflow creation is successfully, the
+		 * msk will end-up not signaling this address; not even
+		 * re-attempting to do it.
+		 * Likely worthy incrementing a new MIB here
+		 */
+		if (!mptcp_pm_alloc_anno_list(msk, &local->addr))
+			return;
+
+		__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
+		msk->pm.add_addr_signaled++;
+		mptcp_pm_announce_addr(msk, &local->addr, false);
+		mptcp_pm_nl_addr_send_ack(msk);
+		if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
+			signal_and_subflow = local;
+		break;
 	}
 
 	/* check if should create a new subflow */
@@ -585,9 +598,14 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		bool fullmesh;
 		int i, nr;
 
-		local = select_local_address(pernet, msk);
-		if (!local)
-			break;
+		if (signal_and_subflow) {
+			local = signal_and_subflow;
+			signal_and_subflow = NULL;
+		} else {
+			local = select_local_address(pernet, msk);
+			if (!local)
+				break;
+		}
 
 		fullmesh = !!(local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH);
---
that is, when we use a signal endpoint, if that has the subflow flag,
use additionally to create the subflows, without doing an additional
search.

WDYT?

Thanks!

Paolo

p.s. Jakubs report the upstream mptcp_connect() st are flacky:

https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh

can you see that in our CI?
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Matthieu Baerts 3 months, 3 weeks ago
Hi Paolo,

On 05/07/2024 10:20, Paolo Abeni wrote:
> On Thu, 2024-07-04 at 22:02 +0200, Matthieu Baerts wrote:
>>>> I don't think we should document that "setting both the 'signal' and
>>>> 'subflow' flags is allowed, but in fact the 'subflow' flag is ignored in
>>>>> = 5.17 versions", no? :)
>>>
>>> Yup, I missed the regression part of it.
>>>
>>> If possible, I would be happy to reduce complexity by preventing signal
>>> and subflow flags together.
>>
>> Do you think it would be OK to backport that? How to know if this 
>> configuration is not already used for valid use cases?
> 
> I guess that what worry me most is the addition of another large field
> to the msk "just" to track (again) endpoints.
> 
> Thinking again about it, perhaps we don't need it. Something alike the
> following should be sufficient to replace patch 2/4 here:
> 
> ---
> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index ea9e5817b9e9..24e18cd43f48 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -513,8 +513,8 @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info)
>  
>  static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>  {
> +	struct mptcp_pm_addr_entry *local, *signal_and_subflow = NULL;
>  	struct sock *sk = (struct sock *)msk;
> -	struct mptcp_pm_addr_entry *local;
>  	unsigned int add_addr_signal_max;
>  	unsigned int local_addr_max;
>  	struct pm_nl_pernet *pernet;
> @@ -568,14 +568,27 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>  		if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
>  			return;
>  
> -		if (local) {
> -			if (mptcp_pm_alloc_anno_list(msk, &local->addr)) {
> -				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
> -				msk->pm.add_addr_signaled++;
> -				mptcp_pm_announce_addr(msk, &local->addr, false);
> -				mptcp_pm_nl_addr_send_ack(msk);
> -			}
> -		}
> +		if (!local)
> +			continue;
> +
> +		/* If the allocation fails we are on memory pressure, it looks
> +		 * not worthy try any forther later action, that will likely fail
> +		 * And if local has both the SIGNAL and SUBFLOW flags, and the
> +		 * by bad luck the later subflow creation is successfully, the
> +		 * msk will end-up not signaling this address; not even
> +		 * re-attempting to do it.

If the subflow creation is successful, the new address will be somehow
advertised to the other peer. Should we not try to create the subflow
anyway? That would reduce the diff as well.

> +		 * Likely worthy incrementing a new MIB here

I think there are many other places where we don't have such MIB
counters for not being able to allocate memory. Do you think we would
need one here?

> +		 */
> +		if (!mptcp_pm_alloc_anno_list(msk, &local->addr))
> +			return;
> +
> +		__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
> +		msk->pm.add_addr_signaled++;
> +		mptcp_pm_announce_addr(msk, &local->addr, false);
> +		mptcp_pm_nl_addr_send_ack(msk);
> +		if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
> +			signal_and_subflow = local;
> +		break;
>  	}
>  
>  	/* check if should create a new subflow */
> @@ -585,9 +598,14 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>  		bool fullmesh;
>  		int i, nr;
>  
> -		local = select_local_address(pernet, msk);
> -		if (!local)
> -			break;
> +		if (signal_and_subflow) {
> +			local = signal_and_subflow;
> +			signal_and_subflow = NULL;
> +		} else {
> +			local = select_local_address(pernet, msk);
> +			if (!local)
> +				break;
> +		}
>  
>  		fullmesh = !!(local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH);
> ---
> that is, when we use a signal endpoint, if that has the subflow flag,
> use additionally to create the subflows, without doing an additional
> search.
> 
> WDYT?

Smart! Indeed, we can do that because we always mark the IDs as being
used for one or the other from the same place!

I will update the v2 with your version!

> Thanks!
> 
> Paolo
> 
> p.s. Jakubs report the upstream mptcp_connect() st are flacky:
> 
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-mptcp-dbg&test=mptcp-connect-sh

I noticed that this morning too, but I didn't have a chance to look at
the logs. Now I just did, and I think they are all linked to:

  https://github.com/multipath-tcp/mptcp_net-next/issues/499

> can you see that in our CI?

No:

  https://ci-results.mptcp.dev/flakes.html?branch=export
  https://ci-results.mptcp.dev/flakes.html?branch=export-net


I only saw these issues on the Netdev CI, in debug mode, when the same
IP is used for both the client and the server. I just updated this #499
issue with the two new occurrences.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Paolo Abeni 3 months, 3 weeks ago
On Fri, 2024-07-05 at 17:29 +0200, Matthieu Baerts wrote:
> On 05/07/2024 10:20, Paolo Abeni wrote:
> > @@ -568,14 +568,27 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
> >  		if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
> >  			return;
> >  
> > -		if (local) {
> > -			if (mptcp_pm_alloc_anno_list(msk, &local->addr)) {
> > -				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
> > -				msk->pm.add_addr_signaled++;
> > -				mptcp_pm_announce_addr(msk, &local->addr, false);
> > -				mptcp_pm_nl_addr_send_ack(msk);
> > -			}
> > -		}
> > +		if (!local)
> > +			continue;
> > +
> > +		/* If the allocation fails we are on memory pressure, it looks
> > +		 * not worthy try any forther later action, that will likely fail
> > +		 * And if local has both the SIGNAL and SUBFLOW flags, and the
> > +		 * by bad luck the later subflow creation is successfully, the
> > +		 * msk will end-up not signaling this address; not even
> > +		 * re-attempting to do it.
> 
> If the subflow creation is successful, the new address will be somehow
> advertised to the other peer. Should we not try to create the subflow
> anyway? That would reduce the diff as well.

The announce list requires a very small amount of memory from a single
slab. A subflow creation will require a tcp socket, an mptcp context,
and likely some lsm context, I think is very unlikely all of them will
be successfully.

The diff reduction will be just mostly for the trimmed comment, am I
correct? If so I think we can keep the code this way. Note that
most/all of comment above could be moved into the commit message
itself.

> > +		 * Likely worthy incrementing a new MIB here
> 
> I think there are many other places where we don't have such MIB
> counters for not being able to allocate memory. Do you think we would
> need one here?

Thinking again about it, we don't need them. The warn in dmsg should be
enough ;)

Thanks!

Paolo
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Matthieu Baerts 3 months, 3 weeks ago
Hi Paolo,

Thank you for your reply!

On 09/07/2024 18:47, Paolo Abeni wrote:
> On Fri, 2024-07-05 at 17:29 +0200, Matthieu Baerts wrote:
>> On 05/07/2024 10:20, Paolo Abeni wrote:
>>> @@ -568,14 +568,27 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
>>>  		if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL))
>>>  			return;
>>>  
>>> -		if (local) {
>>> -			if (mptcp_pm_alloc_anno_list(msk, &local->addr)) {
>>> -				__clear_bit(local->addr.id, msk->pm.id_avail_bitmap);
>>> -				msk->pm.add_addr_signaled++;
>>> -				mptcp_pm_announce_addr(msk, &local->addr, false);
>>> -				mptcp_pm_nl_addr_send_ack(msk);
>>> -			}
>>> -		}
>>> +		if (!local)
>>> +			continue;
>>> +
>>> +		/* If the allocation fails we are on memory pressure, it looks
>>> +		 * not worthy try any forther later action, that will likely fail
>>> +		 * And if local has both the SIGNAL and SUBFLOW flags, and the
>>> +		 * by bad luck the later subflow creation is successfully, the
>>> +		 * msk will end-up not signaling this address; not even
>>> +		 * re-attempting to do it.
>>
>> If the subflow creation is successful, the new address will be somehow
>> advertised to the other peer. Should we not try to create the subflow
>> anyway? That would reduce the diff as well.
> 
> The announce list requires a very small amount of memory from a single
> slab. A subflow creation will require a tcp socket, an mptcp context,
> and likely some lsm context, I think is very unlikely all of them will
> be successfully.

I agree, but I don't know if we should change the behaviour here with a
return.

Note that mptcp_pm_alloc_anno_list() can also return 'false' if the
address was already in the list. It should not happen, but I didn't
check if there were cases we didn't handle correctly there.

> 
> The diff reduction will be just mostly for the trimmed comment, am I
> correct? If so I think we can keep the code this way. Note that
> most/all of comment above could be moved into the commit message
> itself.

For this fix, I would have only added these two lines after having
called mptcp_pm_nl_addr_send_ack():

  if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
  	signal_and_subflow = local;

Or we take this opportunity to reduce the level of indentations, but we
use a "continue" if mptcp_pm_alloc_anno_list() failed. WDYT?

>>> +		 * Likely worthy incrementing a new MIB here
>>
>> I think there are many other places where we don't have such MIB
>> counters for not being able to allocate memory. Do you think we would
>> need one here?
> 
> Thinking again about it, we don't need them. The warn in dmsg should be
> enough ;)

Sounds good to me!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by Paolo Abeni 3 months, 3 weeks ago
On Tue, 2024-07-09 at 19:43 +0200, Matthieu Baerts wrote:
> On 09/07/2024 18:47, Paolo Abeni wrote:
> > The announce list requires a very small amount of memory from a single
> > slab. A subflow creation will require a tcp socket, an mptcp context,
> > and likely some lsm context, I think is very unlikely all of them will
> > be successfully.
> 
> I agree, but I don't know if we should change the behaviour here with a
> return.
> 
> Note that mptcp_pm_alloc_anno_list() can also return 'false' if the
> address was already in the list. It should not happen, but I didn't
> check if there were cases we didn't handle correctly there.

AFAICS, we should never enter that branch. In fact I think we could add
a WARN_ON_ONCE there:

	if (WARN_ON_ONCE(mptcp_pm_is_kernel(msk)))
		return false;

because the bitmap should prevent us from adding an already announce
address.


> 
> > 
> > The diff reduction will be just mostly for the trimmed comment, am I
> > correct? If so I think we can keep the code this way. Note that
> > most/all of comment above could be moved into the commit message
> > itself.
> 
> For this fix, I would have only added these two lines after having
> called mptcp_pm_nl_addr_send_ack():
> 
>   if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)
>   	signal_and_subflow = local;
> 
> Or we take this opportunity to reduce the level of indentations, but we
> use a "continue" if mptcp_pm_alloc_anno_list() failed. WDYT?

I would go for reducing the level of indentation.

I think that in practice the behavior will be the same with both
'continue' and 'return': on 'continue' the next iteration will very
likely fail. Perhaps it's better to be avoid wasting cycles and/or put
extreme memory pressure on the system and consistently 'return'?


Thanks!

Paolo
Re: [PATCH mptcp-next 0/4] mptcp: fix endpoints with 'signal' and 'subflow' flags
Posted by MPTCP CI 4 months, 1 week ago
Hi Matthieu,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9613128523

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/5b1c305d4d2d
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=864250


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)