mptcp: pm: userspace: avoid scheduling in-kernel PM worker

[PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker

Posted by Matthieu Baerts (NGI0) 1 month, 2 weeks ago

When the userspace PM is used, there is no need to schedule the PM
worker for in-kernel specific tasks, e.g. creating new subflows, or
sending more ADD_ADDR.

Now, these tasks will be done only if the in-kernel PM is being used.

Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Notes:
 - I don't know if this should be seen as a fix. Maybe yes because it
   was not supposed to do that from the beginning, and it feels less
   risky not to schedule the worker when it is not needed. I'm open to
   suggestions here.
 - I'm mainly sharing this patch now because Geliang is going to modify
   these helpers to separate PM specific code.
   @Geliang: please rebase your series on top of this one (or include
   this patch if it is easier).
Cc: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/pm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb83328905676759768fb44c859f5682721e3 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct mptcp_sock *msk, const struct sock *ssk)
 	 * racing paths - accept() and check_fully_established()
 	 * be sure to serve this event only once.
 	 */
-	if (READ_ONCE(pm->work_pending) &&
+	if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm->work_pending) &&
 	    !(msk->pm.status & BIT(MPTCP_PM_ALREADY_ESTABLISHED)))
 		mptcp_pm_schedule_work(msk, MPTCP_PM_ESTABLISHED);
 
@@ -166,7 +166,7 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
 
 	pr_debug("msk=%p\n", msk);
 
-	if (!READ_ONCE(pm->work_pending))
+	if (!mptcp_pm_is_kernel(msk) || !READ_ONCE(pm->work_pending))
 		return;
 
 	spin_lock_bh(&pm->lock);
@@ -251,6 +251,9 @@ void mptcp_pm_add_addr_echoed(struct mptcp_sock *msk,
 
 	pr_debug("msk=%p\n", msk);
 
+	if (!mptcp_pm_is_kernel(msk) || !READ_ONCE(pm->work_pending))
+		return;
+
 	spin_lock_bh(&pm->lock);
 
 	if (mptcp_lookup_anno_list_by_saddr(msk, addr) && READ_ONCE(pm->work_pending))
@@ -278,6 +281,9 @@ void mptcp_pm_rm_addr_received(struct mptcp_sock *msk,
 	for (i = 0; i < rm_list->nr; i++)
 		mptcp_event_addr_removed(msk, rm_list->ids[i]);
 
+	if (!mptcp_pm_is_kernel(msk))
+		return;
+
 	spin_lock_bh(&pm->lock);
 	if (mptcp_pm_schedule_work(msk, MPTCP_PM_RM_ADDR_RECEIVED))
 		pm->rm_list_rx = *rm_list;

---
base-commit: 3716d837622ee1baf6fbe7e6a6a7a3df116e75fe
change-id: 20250224-mptcp-userspace-avoid-worker-93f367e39ca6

Best regards,
-- 
Matthieu Baerts (NGI0) <matttbe@kernel.org>

Re: [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker

Posted by Geliang Tang 1 month, 2 weeks ago

Hi Matt,

On Mon, 2025-02-24 at 20:01 +0100, Matthieu Baerts (NGI0) wrote:
> When the userspace PM is used, there is no need to schedule the PM
> worker for in-kernel specific tasks, e.g. creating new subflows, or
> sending more ADD_ADDR.
> 
> Now, these tasks will be done only if the in-kernel PM is being used.
> 
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
>  - I don't know if this should be seen as a fix. Maybe yes because it
>    was not supposed to do that from the beginning, and it feels less
>    risky not to schedule the worker when it is not needed. I'm open
> to
>    suggestions here.
>  - I'm mainly sharing this patch now because Geliang is going to
> modify
>    these helpers to separate PM specific code.
>    @Geliang: please rebase your series on top of this one (or include
>    this patch if it is easier).
> Cc: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/mptcp/pm.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> index
> 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb83328905676759768fb44c
> 859f5682721e3 100644
> --- a/net/mptcp/pm.c
> +++ b/net/mptcp/pm.c
> @@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct mptcp_sock
> *msk, const struct sock *ssk)
>  	 * racing paths - accept() and check_fully_established()
>  	 * be sure to serve this event only once.
>  	 */
> -	if (READ_ONCE(pm->work_pending) &&
> +	if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm->work_pending)
> &&

I think there's no need to modify this. work_pending is set to 0 by 
userspace pm, for all cases where work_pending is 1, the path manager
must be an in-kernel pm.

So there is no need to add mptcp_pm_is_kernel() check here. Just
checking work_pending is sufficient.

>  	    !(msk->pm.status & BIT(MPTCP_PM_ALREADY_ESTABLISHED)))
>  		mptcp_pm_schedule_work(msk, MPTCP_PM_ESTABLISHED);
>  
> @@ -166,7 +166,7 @@ void mptcp_pm_subflow_established(struct
> mptcp_sock *msk)
>  
>  	pr_debug("msk=%p\n", msk);
>  
> -	if (!READ_ONCE(pm->work_pending))
> +	if (!mptcp_pm_is_kernel(msk) || !READ_ONCE(pm-
> >work_pending))
>  		return;

No need to modify this too, work_pending has been checked later in
mptcp_pm_subflow_established:

    if (READ_ONCE(pm->work_pending))
        mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);

>  
>  	spin_lock_bh(&pm->lock);
> @@ -251,6 +251,9 @@ void mptcp_pm_add_addr_echoed(struct mptcp_sock
> *msk,
>  
>  	pr_debug("msk=%p\n", msk);
>  
> +	if (!mptcp_pm_is_kernel(msk) || !READ_ONCE(pm-
> >work_pending))
> +		return;
> +

Same here, work_pending has been checked later in
mptcp_pm_add_addr_echoed:

if (mptcp_lookup_anno_list_by_saddr(msk, addr) && READ_ONCE(pm-
>work_pending))
    mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);

>  	spin_lock_bh(&pm->lock);
>  
>  	if (mptcp_lookup_anno_list_by_saddr(msk, addr) &&
> READ_ONCE(pm->work_pending))
> @@ -278,6 +281,9 @@ void mptcp_pm_rm_addr_received(struct mptcp_sock
> *msk,
>  	for (i = 0; i < rm_list->nr; i++)
>  		mptcp_event_addr_removed(msk, rm_list->ids[i]);
>  
> +	if (!mptcp_pm_is_kernel(msk))
> +		return;

This mptcp_pm_is_kernel() is indeed needed.

WDYT?

Thanks,
-Geliang

> +
>  	spin_lock_bh(&pm->lock);
>  	if (mptcp_pm_schedule_work(msk, MPTCP_PM_RM_ADDR_RECEIVED))
>  		pm->rm_list_rx = *rm_list;
> 
> ---
> base-commit: 3716d837622ee1baf6fbe7e6a6a7a3df116e75fe
> change-id: 20250224-mptcp-userspace-avoid-worker-93f367e39ca6
> 
> Best regards,

Re: [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker

Posted by Matthieu Baerts 1 month, 2 weeks ago

Hi Geliang,

Thank you for your review!

On 25/02/2025 05:21, Geliang Tang wrote:
> Hi Matt,
> 
> On Mon, 2025-02-24 at 20:01 +0100, Matthieu Baerts (NGI0) wrote:
>> When the userspace PM is used, there is no need to schedule the PM
>> worker for in-kernel specific tasks, e.g. creating new subflows, or
>> sending more ADD_ADDR.
>>
>> Now, these tasks will be done only if the in-kernel PM is being used.
>>
>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>> ---
>> Notes:
>>  - I don't know if this should be seen as a fix. Maybe yes because it
>>    was not supposed to do that from the beginning, and it feels less
>>    risky not to schedule the worker when it is not needed. I'm open
>> to
>>    suggestions here.
>>  - I'm mainly sharing this patch now because Geliang is going to
>> modify
>>    these helpers to separate PM specific code.
>>    @Geliang: please rebase your series on top of this one (or include
>>    this patch if it is easier).
>> Cc: Geliang Tang <tanggeliang@kylinos.cn>
>> ---
>>  net/mptcp/pm.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>> index
>> 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb83328905676759768fb44c
>> 859f5682721e3 100644
>> --- a/net/mptcp/pm.c
>> +++ b/net/mptcp/pm.c
>> @@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct mptcp_sock
>> *msk, const struct sock *ssk)
>>  	 * racing paths - accept() and check_fully_established()
>>  	 * be sure to serve this event only once.
>>  	 */
>> -	if (READ_ONCE(pm->work_pending) &&
>> +	if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm->work_pending)
>> &&
> 
> I think there's no need to modify this. work_pending is set to 0 by 
> userspace pm, for all cases where work_pending is 1, the path manager
> must be an in-kernel pm.

Good point, I missed that!

'pm->work_pending' doesn't look like a good name for what it does.
Hopefully, this will be clearer when pm->ops will be used, which will
clearly separate what is done by the different PMs.

Also, is it possible we never set 'pm->work_pending' back to 'true' once
it has been set to 'false'? Does it mean when we reach the limits, and
something happens (e.g. a removed subflow, limits are changed, etc.),
the worker is no longer used to establish new subflows or react when an
ADD_ADDR is received?

> So there is no need to add mptcp_pm_is_kernel() check here. Just
> checking work_pending is sufficient.

Indeed.

>>  	    !(msk->pm.status & BIT(MPTCP_PM_ALREADY_ESTABLISHED)))
>>  		mptcp_pm_schedule_work(msk, MPTCP_PM_ESTABLISHED);
>>  
>> @@ -166,7 +166,7 @@ void mptcp_pm_subflow_established(struct
>> mptcp_sock *msk)
>>  
>>  	pr_debug("msk=%p\n", msk);
>>  
>> -	if (!READ_ONCE(pm->work_pending))
>> +	if (!mptcp_pm_is_kernel(msk) || !READ_ONCE(pm-
>>> work_pending))
>>  		return;
> 
> No need to modify this too, work_pending has been checked later in
> mptcp_pm_subflow_established:
> 
>     if (READ_ONCE(pm->work_pending))
>         mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);

Indeed.

>>  
>>  	spin_lock_bh(&pm->lock);
>> @@ -251,6 +251,9 @@ void mptcp_pm_add_addr_echoed(struct mptcp_sock
>> *msk,
>>  
>>  	pr_debug("msk=%p\n", msk);
>>  
>> +	if (!mptcp_pm_is_kernel(msk) || !READ_ONCE(pm-
>>> work_pending))
>> +		return;
>> +
> 
> Same here, work_pending has been checked later in
> mptcp_pm_add_addr_echoed:
> 
> if (mptcp_lookup_anno_list_by_saddr(msk, addr) && READ_ONCE(pm-
>> work_pending))
>     mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);

Indeed.

>>  	spin_lock_bh(&pm->lock);
>>  
>>  	if (mptcp_lookup_anno_list_by_saddr(msk, addr) &&
>> READ_ONCE(pm->work_pending))
>> @@ -278,6 +281,9 @@ void mptcp_pm_rm_addr_received(struct mptcp_sock
>> *msk,
>>  	for (i = 0; i < rm_list->nr; i++)
>>  		mptcp_event_addr_removed(msk, rm_list->ids[i]);
>>  
>> +	if (!mptcp_pm_is_kernel(msk))
>> +		return;
> 
> This mptcp_pm_is_kernel() is indeed needed.

Yes it is. And 'pm->work_pending' cannot be used, because it will be set
to false when the limits are reached, but a RM_ADDR will potentially
decrement counters.

I will send a v2 with only this change.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker

Posted by Matthieu Baerts 1 month, 2 weeks ago

Hi Geliang,

On 25/02/2025 10:25, Matthieu Baerts wrote:
> Hi Geliang,
> 
> Thank you for your review!
> 
> On 25/02/2025 05:21, Geliang Tang wrote:
>> Hi Matt,
>>
>> On Mon, 2025-02-24 at 20:01 +0100, Matthieu Baerts (NGI0) wrote:
>>> When the userspace PM is used, there is no need to schedule the PM
>>> worker for in-kernel specific tasks, e.g. creating new subflows, or
>>> sending more ADD_ADDR.
>>>
>>> Now, these tasks will be done only if the in-kernel PM is being used.
>>>
>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>> ---
>>> Notes:
>>>  - I don't know if this should be seen as a fix. Maybe yes because it
>>>    was not supposed to do that from the beginning, and it feels less
>>>    risky not to schedule the worker when it is not needed. I'm open
>>> to
>>>    suggestions here.
>>>  - I'm mainly sharing this patch now because Geliang is going to
>>> modify
>>>    these helpers to separate PM specific code.
>>>    @Geliang: please rebase your series on top of this one (or include
>>>    this patch if it is easier).
>>> Cc: Geliang Tang <tanggeliang@kylinos.cn>
>>> ---
>>>  net/mptcp/pm.c | 10 ++++++++--
>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>> index
>>> 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb83328905676759768fb44c
>>> 859f5682721e3 100644
>>> --- a/net/mptcp/pm.c
>>> +++ b/net/mptcp/pm.c
>>> @@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct mptcp_sock
>>> *msk, const struct sock *ssk)
>>>  	 * racing paths - accept() and check_fully_established()
>>>  	 * be sure to serve this event only once.
>>>  	 */
>>> -	if (READ_ONCE(pm->work_pending) &&
>>> +	if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm->work_pending)
>>> &&
>>
>> I think there's no need to modify this. work_pending is set to 0 by 
>> userspace pm, for all cases where work_pending is 1, the path manager
>> must be an in-kernel pm.
> 
> Good point, I missed that!
> 
> 'pm->work_pending' doesn't look like a good name for what it does.
> Hopefully, this will be clearer when pm->ops will be used, which will
> clearly separate what is done by the different PMs.
> 
> Also, is it possible we never set 'pm->work_pending' back to 'true' once
> it has been set to 'false'? Does it mean when we reach the limits, and
> something happens (e.g. a removed subflow, limits are changed, etc.),
> the worker is no longer used to establish new subflows or react when an
> ADD_ADDR is received?

I didn't check, but it feels like we should add something like that:

> diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> index d4328443d844..dee3e3c14ec3 100644
> --- a/net/mptcp/pm_netlink.c
> +++ b/net/mptcp/pm_netlink.c
> @@ -248,6 +248,8 @@ bool mptcp_pm_nl_check_work_pending(struct mptcp_sock *msk)
>                 WRITE_ONCE(msk->pm.work_pending, false);
>                 return false;
>         }
> +
> +       WRITE_ONCE(msk->pm.work_pending, true);
>         return true;
>  }
>  
> @@ -911,6 +913,9 @@ static void mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
>                                 WRITE_ONCE(msk->pm.accept_addr, true);
>                 }
>         }
> +
> +       if (mptcp_pm_is_kernel(msk))
> +               mptcp_pm_nl_check_work_pending(msk);
>  }
>  
>  static void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk)


So to re-enable msk->pm.work_pending when limits have been reached, and
it is no longer the case. WDYT?

We would need a new subtest to confirm this patch is needed, and to
avoid regressions later.

(...)

>>> @@ -278,6 +281,9 @@ void mptcp_pm_rm_addr_received(struct mptcp_sock
>>> *msk,
>>>  	for (i = 0; i < rm_list->nr; i++)
>>>  		mptcp_event_addr_removed(msk, rm_list->ids[i]);
>>>  
>>> +	if (!mptcp_pm_is_kernel(msk))
>>> +		return;
>>
>> This mptcp_pm_is_kernel() is indeed needed.
> 
> Yes it is. And 'pm->work_pending' cannot be used, because it will be set
> to false when the limits are reached, but a RM_ADDR will potentially
> decrement counters.

In fact, it is not needed: from what I see, the corresponding subflows
are removed on the kernel side, the userspace daemon doesn't need to
send a request to remove them. See the modification from 4d25247d3ae4
("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") in
mptcp_pm_nl_rm_addr_or_subflow().

> I will send a v2 with only this change.
I will send a v2 with only an early exit in mptcp_pm_add_addr_echoed()
if there is no need to schedule the worker.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker

Posted by Geliang Tang 1 month, 2 weeks ago

Hi Matt,

On Tue, 2025-02-25 at 19:15 +0100, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 25/02/2025 10:25, Matthieu Baerts wrote:
> > Hi Geliang,
> > 
> > Thank you for your review!
> > 
> > On 25/02/2025 05:21, Geliang Tang wrote:
> > > Hi Matt,
> > > 
> > > On Mon, 2025-02-24 at 20:01 +0100, Matthieu Baerts (NGI0) wrote:
> > > > When the userspace PM is used, there is no need to schedule the
> > > > PM
> > > > worker for in-kernel specific tasks, e.g. creating new
> > > > subflows, or
> > > > sending more ADD_ADDR.
> > > > 
> > > > Now, these tasks will be done only if the in-kernel PM is being
> > > > used.
> > > > 
> > > > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> > > > ---
> > > > Notes:
> > > >  - I don't know if this should be seen as a fix. Maybe yes
> > > > because it
> > > >    was not supposed to do that from the beginning, and it feels
> > > > less
> > > >    risky not to schedule the worker when it is not needed. I'm
> > > > open
> > > > to
> > > >    suggestions here.
> > > >  - I'm mainly sharing this patch now because Geliang is going
> > > > to
> > > > modify
> > > >    these helpers to separate PM specific code.
> > > >    @Geliang: please rebase your series on top of this one (or
> > > > include
> > > >    this patch if it is easier).
> > > > Cc: Geliang Tang <tanggeliang@kylinos.cn>
> > > > ---
> > > >  net/mptcp/pm.c | 10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
> > > > index
> > > > 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb8332890567675976
> > > > 8fb44c
> > > > 859f5682721e3 100644
> > > > --- a/net/mptcp/pm.c
> > > > +++ b/net/mptcp/pm.c
> > > > @@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct
> > > > mptcp_sock
> > > > *msk, const struct sock *ssk)
> > > >  	 * racing paths - accept() and
> > > > check_fully_established()
> > > >  	 * be sure to serve this event only once.
> > > >  	 */
> > > > -	if (READ_ONCE(pm->work_pending) &&
> > > > +	if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm-
> > > > >work_pending)
> > > > &&
> > > 
> > > I think there's no need to modify this. work_pending is set to 0
> > > by 
> > > userspace pm, for all cases where work_pending is 1, the path
> > > manager
> > > must be an in-kernel pm.
> > 
> > Good point, I missed that!
> > 
> > 'pm->work_pending' doesn't look like a good name for what it does.
> > Hopefully, this will be clearer when pm->ops will be used, which
> > will
> > clearly separate what is done by the different PMs.
> > 
> > Also, is it possible we never set 'pm->work_pending' back to 'true'
> > once
> > it has been set to 'false'? Does it mean when we reach the limits,
> > and
> > something happens (e.g. a removed subflow, limits are changed,
> > etc.),
> > the worker is no longer used to establish new subflows or react
> > when an
> > ADD_ADDR is received?
> 
> I didn't check, but it feels like we should add something like that:

Yes, it is. We only set 'work_pending' to 'false' in
mptcp_pm_nl_check_work_pending() but never set it back.

I'll try to fix this later. Please add an issue for it on github.

> 
> > diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
> > index d4328443d844..dee3e3c14ec3 100644
> > --- a/net/mptcp/pm_netlink.c
> > +++ b/net/mptcp/pm_netlink.c
> > @@ -248,6 +248,8 @@ bool mptcp_pm_nl_check_work_pending(struct
> > mptcp_sock *msk)
> >                 WRITE_ONCE(msk->pm.work_pending, false);
> >                 return false;
> >         }
> > +
> > +       WRITE_ONCE(msk->pm.work_pending, true);
> >         return true;
> >  }
> >  
> > @@ -911,6 +913,9 @@ static void
> > mptcp_pm_nl_rm_addr_or_subflow(struct mptcp_sock *msk,
> >                                 WRITE_ONCE(msk->pm.accept_addr,
> > true);
> >                 }
> >         }
> > +
> > +       if (mptcp_pm_is_kernel(msk))
> > +               mptcp_pm_nl_check_work_pending(msk);
> >  }
> >  
> >  static void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk)
> 
> 
> So to re-enable msk->pm.work_pending when limits have been reached,
> and
> it is no longer the case. WDYT?
> 
> We would need a new subtest to confirm this patch is needed, and to
> avoid regressions later.

And add a selftest for this too.

> 
> (...)
> 
> > > > @@ -278,6 +281,9 @@ void mptcp_pm_rm_addr_received(struct
> > > > mptcp_sock
> > > > *msk,
> > > >  	for (i = 0; i < rm_list->nr; i++)
> > > >  		mptcp_event_addr_removed(msk, rm_list-
> > > > >ids[i]);
> > > >  
> > > > +	if (!mptcp_pm_is_kernel(msk))
> > > > +		return;
> > > 
> > > This mptcp_pm_is_kernel() is indeed needed.
> > 
> > Yes it is. And 'pm->work_pending' cannot be used, because it will
> > be set
> > to false when the limits are reached, but a RM_ADDR will
> > potentially
> > decrement counters.
> 
> In fact, it is not needed: from what I see, the corresponding
> subflows
> are removed on the kernel side, the userspace daemon doesn't need to
> send a request to remove them. See the modification from 4d25247d3ae4
> ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs") in
> mptcp_pm_nl_rm_addr_or_subflow().
> 
> > I will send a v2 with only this change.
> I will send a v2 with only an early exit in
> mptcp_pm_add_addr_echoed()
> if there is no need to schedule the worker.

Sure. I agree.

Thanks,
-Geliang

> 
> Cheers,
> Matt

Re: [PATCH mptcp-next] mptcp: pm: userspace: avoid scheduling in-kernel PM worker

Posted by Matthieu Baerts 1 month, 1 week ago

Hi Geliang,

On 26/02/2025 02:05, Geliang Tang wrote:
> Hi Matt,
> 
> On Tue, 2025-02-25 at 19:15 +0100, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> On 25/02/2025 10:25, Matthieu Baerts wrote:
>>> Hi Geliang,
>>>
>>> Thank you for your review!
>>>
>>> On 25/02/2025 05:21, Geliang Tang wrote:
>>>> Hi Matt,
>>>>
>>>> On Mon, 2025-02-24 at 20:01 +0100, Matthieu Baerts (NGI0) wrote:
>>>>> When the userspace PM is used, there is no need to schedule the
>>>>> PM
>>>>> worker for in-kernel specific tasks, e.g. creating new
>>>>> subflows, or
>>>>> sending more ADD_ADDR.
>>>>>
>>>>> Now, these tasks will be done only if the in-kernel PM is being
>>>>> used.
>>>>>
>>>>> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
>>>>> ---
>>>>> Notes:
>>>>>  - I don't know if this should be seen as a fix. Maybe yes
>>>>> because it
>>>>>    was not supposed to do that from the beginning, and it feels
>>>>> less
>>>>>    risky not to schedule the worker when it is not needed. I'm
>>>>> open
>>>>> to
>>>>>    suggestions here.
>>>>>  - I'm mainly sharing this patch now because Geliang is going
>>>>> to
>>>>> modify
>>>>>    these helpers to separate PM specific code.
>>>>>    @Geliang: please rebase your series on top of this one (or
>>>>> include
>>>>>    this patch if it is easier).
>>>>> Cc: Geliang Tang <tanggeliang@kylinos.cn>
>>>>> ---
>>>>>  net/mptcp/pm.c | 10 ++++++++--
>>>>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
>>>>> index
>>>>> 16cacce6c10fe86467aa7ef8e588f9f535b586fb..4eabb8332890567675976
>>>>> 8fb44c
>>>>> 859f5682721e3 100644
>>>>> --- a/net/mptcp/pm.c
>>>>> +++ b/net/mptcp/pm.c
>>>>> @@ -138,7 +138,7 @@ void mptcp_pm_fully_established(struct
>>>>> mptcp_sock
>>>>> *msk, const struct sock *ssk)
>>>>>  	 * racing paths - accept() and
>>>>> check_fully_established()
>>>>>  	 * be sure to serve this event only once.
>>>>>  	 */
>>>>> -	if (READ_ONCE(pm->work_pending) &&
>>>>> +	if (mptcp_pm_is_kernel(msk) && READ_ONCE(pm-
>>>>>> work_pending)
>>>>> &&
>>>>
>>>> I think there's no need to modify this. work_pending is set to 0
>>>> by 
>>>> userspace pm, for all cases where work_pending is 1, the path
>>>> manager
>>>> must be an in-kernel pm.
>>>
>>> Good point, I missed that!
>>>
>>> 'pm->work_pending' doesn't look like a good name for what it does.
>>> Hopefully, this will be clearer when pm->ops will be used, which
>>> will
>>> clearly separate what is done by the different PMs.
>>>
>>> Also, is it possible we never set 'pm->work_pending' back to 'true'
>>> once
>>> it has been set to 'false'? Does it mean when we reach the limits,
>>> and
>>> something happens (e.g. a removed subflow, limits are changed,
>>> etc.),
>>> the worker is no longer used to establish new subflows or react
>>> when an
>>> ADD_ADDR is received?
>>
>> I didn't check, but it feels like we should add something like that:
> 
> Yes, it is. We only set 'work_pending' to 'false' in
> mptcp_pm_nl_check_work_pending() but never set it back.

I looked a bit at that, but I didn't find anything wrong, even if it
feels wrong not to put it back to 'true'.

> I'll try to fix this later. Please add an issue for it on github.

I can, but we will need to proof that there is an interest to set it
back to "true".

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.