[v2] Squash to "Add mptcp_subflow bpf_iter support"

[PATCH mptcp-next v2 2/3] Squash to "bpf: Add mptcp_subflow bpf_iter"

Posted by Geliang Tang 1 year, 2 months ago

From: Geliang Tang <tanggeliang@kylinos.cn>

Address Martin's comments in v1:

- bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock isn't
  held.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 net/mptcp/bpf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 02038db59956..89a7a482368d 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
 	if (!msk)
 		return -EINVAL;
 
-	msk_owned_by_me(msk);
+#ifdef CONFIG_LOCKDEP
+	if (!lockdep_sock_is_held((const struct sock *)msk))
+		return -EINVAL;
+#endif
 
 	kit->pos = &msk->conn_list;
 	return 0;
-- 
2.45.2

Re: [PATCH mptcp-next v2 2/3] Squash to "bpf: Add mptcp_subflow bpf_iter"

Posted by Matthieu Baerts 1 year, 2 months ago

Hi Geliang,

On 21/11/2024 10:36, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> Address Martin's comments in v1:
> 
> - bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock isn't
>   held.
> 
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  net/mptcp/bpf.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index 02038db59956..89a7a482368d 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
>  	if (!msk)
>  		return -EINVAL;
>  
> -	msk_owned_by_me(msk);
> +#ifdef CONFIG_LOCKDEP
> +	if (!lockdep_sock_is_held((const struct sock *)msk))
> +		return -EINVAL;

This feels wrong: lockdep is a debugging tool, the goal is usually to
WARN in case of issue, not to take a different path.

Maybe you want to use spin_is_locked(), but still, I'm not sure whether
that's the direction to take: it means the msk always needs to be locked
when iterating over the different subflow → is it always the case? Maybe
yes if it is used only to modify update it?

(If not, maybe you just need to check the sk_refcnt as a safety measure?
But that's maybe not enough for all cases.)

Did you check what was being done for the BPF TCP CA? I guess they need
a way to prevent calling some kfunc without the required locks or
refcount, no?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next v2 2/3] Squash to "bpf: Add mptcp_subflow bpf_iter"

Posted by Geliang Tang 1 year, 2 months ago

Hi Matt,

On Fri, 2024-11-22 at 11:02 +0100, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 21/11/2024 10:36, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > Address Martin's comments in v1:
> > 
> > - bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock
> > isn't
> >   held.
> > 
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >  net/mptcp/bpf.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index 02038db59956..89a7a482368d 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct
> > bpf_iter_mptcp_subflow *it,

Other bpf_iter have added "sizeof" and "alignof" checks, I think
mptcp_subflow bpf_iter also needs to add them too:

      BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) !=
                   sizeof(struct bpf_iter_mptcp_subflow));
      BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
                   __alignof__(struct bpf_iter_mptcp_subflow));

> >  	if (!msk)
> >  		return -EINVAL;
> >  
> > -	msk_owned_by_me(msk);
> > +#ifdef CONFIG_LOCKDEP
> > +	if (!lockdep_sock_is_held((const struct sock *)msk))
> > +		return -EINVAL;
> 
> This feels wrong: lockdep is a debugging tool, the goal is usually to
> WARN in case of issue, not to take a different path.
> 
> Maybe you want to use spin_is_locked(), but still, I'm not sure
> whether

spin_is_locked() doesn't work, since this msk socket lock isn't a
spinlock, right?

> that's the direction to take: it means the msk always needs to be
> locked
> when iterating over the different subflow → is it always the case?
> Maybe
> yes if it is used only to modify update it?
> 
> (If not, maybe you just need to check the sk_refcnt as a safety
> measure?
> But that's maybe not enough for all cases.)

I think the answer is no, sometimes locking is not necessary when
iterating over the different subflows. Does this mean we can simply
drop msk_owned_by_me() here?

> 
> Did you check what was being done for the BPF TCP CA? I guess they
> need
> a way to prevent calling some kfunc without the required locks or
> refcount, no?

Not found in BPF TCP CA.

Thanks,
-Geliang

> 
> Cheers,
> Matt

Re: [PATCH mptcp-next v2 2/3] Squash to "bpf: Add mptcp_subflow bpf_iter"

Posted by Matthieu Baerts 1 year, 2 months ago

Hi Geliang,

We briefly talked about this patch yesterday with Mat.

On 27/11/2024 03:20, Geliang Tang wrote:
> Hi Matt,
> 
> On Fri, 2024-11-22 at 11:02 +0100, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> On 21/11/2024 10:36, Geliang Tang wrote:
>>> From: Geliang Tang <tanggeliang@kylinos.cn>
>>>
>>> Address Martin's comments in v1:
>>>
>>> - bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock
>>> isn't
>>>   held.
>>>
>>> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
>>> ---
>>>  net/mptcp/bpf.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
>>> index 02038db59956..89a7a482368d 100644
>>> --- a/net/mptcp/bpf.c
>>> +++ b/net/mptcp/bpf.c
>>> @@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct
>>> bpf_iter_mptcp_subflow *it,
> 
> Other bpf_iter have added "sizeof" and "alignof" checks, I think
> mptcp_subflow bpf_iter also needs to add them too:
> 
>       BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) !=
>                    sizeof(struct bpf_iter_mptcp_subflow));
>       BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
>                    __alignof__(struct bpf_iter_mptcp_subflow));

If it is something that is done with other bpf_iter, I guess we should
indeed do that, especially for a check done at build time!

>>>  	if (!msk)
>>>  		return -EINVAL;
>>>  
>>> -	msk_owned_by_me(msk);
>>> +#ifdef CONFIG_LOCKDEP
>>> +	if (!lockdep_sock_is_held((const struct sock *)msk))
>>> +		return -EINVAL;
>>
>> This feels wrong: lockdep is a debugging tool, the goal is usually to
>> WARN in case of issue, not to take a different path.
>>
>> Maybe you want to use spin_is_locked(), but still, I'm not sure
>> whether
> 
> spin_is_locked() doesn't work, since this msk socket lock isn't a
> spinlock, right?

(I was talking about sk->sk_lock.slock, but while not being sure that's
what you need here)

>> that's the direction to take: it means the msk always needs to be
>> locked
>> when iterating over the different subflow → is it always the case?
>> Maybe
>> yes if it is used only to modify update it?
>>
>> (If not, maybe you just need to check the sk_refcnt as a safety
>> measure?
>> But that's maybe not enough for all cases.)
> 
> I think the answer is no, sometimes locking is not necessary when
> iterating over the different subflows. Does this mean we can simply
> drop msk_owned_by_me() here?

I guess it is safer to add a check here, to avoid random BPF programs
calling this kfunc in other context that could cause issues.

At the meeting yesterday, it was not clear what would be best: in the
context of the packet scheduler and path-manager, do you know if this
helper will always be called when the msk is locked? (And owned by the
user?) If it is a hook to modify msk or subflows, I guess yes, but there
are many hooks, we didn't check everything.

So if the msk is locked (and owned by the user?), then that's good to
add such check. (If we don't need the lock, maybe we don't need bpf_iter
because we only need to read stuff?)

In this case, maybe enough to look around sock_owned_by_user_nocheck()
and/or spin_is_locked(&sk->sk_lock.slock) helpers? (We didn't check if
they were making sense in this context, to be checked first!)

Also, we were wondering what was done in other cases with BPF. If a
check is needed to verify a lock is held, how can the verifier make sure
the lock is held by the same context. For example, how can it make sure
it is not going to be released during the kfunc operation. (Maybe that's
more a question for BPF maintainers)

>> Did you check what was being done for the BPF TCP CA? I guess they
>> need
>> a way to prevent calling some kfunc without the required locks or
>> refcount, no?
> 
> Not found in BPF TCP CA.

Strange, I wonder what safeguards they have then.

Or maybe it is possible to say that some kfunc can only be called with a
specific struct ops?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.