From: Geliang Tang <tanggeliang@kylinos.cn>
Address Martin's comments in v1:
- bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock isn't
held.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
net/mptcp/bpf.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 02038db59956..89a7a482368d 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
if (!msk)
return -EINVAL;
- msk_owned_by_me(msk);
+#ifdef CONFIG_LOCKDEP
+ if (!lockdep_sock_is_held((const struct sock *)msk))
+ return -EINVAL;
+#endif
kit->pos = &msk->conn_list;
return 0;
--
2.45.2
Hi Geliang, On 21/11/2024 10:36, Geliang Tang wrote: > From: Geliang Tang <tanggeliang@kylinos.cn> > > Address Martin's comments in v1: > > - bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock isn't > held. > > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> > --- > net/mptcp/bpf.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c > index 02038db59956..89a7a482368d 100644 > --- a/net/mptcp/bpf.c > +++ b/net/mptcp/bpf.c > @@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it, > if (!msk) > return -EINVAL; > > - msk_owned_by_me(msk); > +#ifdef CONFIG_LOCKDEP > + if (!lockdep_sock_is_held((const struct sock *)msk)) > + return -EINVAL; This feels wrong: lockdep is a debugging tool, the goal is usually to WARN in case of issue, not to take a different path. Maybe you want to use spin_is_locked(), but still, I'm not sure whether that's the direction to take: it means the msk always needs to be locked when iterating over the different subflow → is it always the case? Maybe yes if it is used only to modify update it? (If not, maybe you just need to check the sk_refcnt as a safety measure? But that's maybe not enough for all cases.) Did you check what was being done for the BPF TCP CA? I guess they need a way to prevent calling some kfunc without the required locks or refcount, no? Cheers, Matt -- Sponsored by the NGI0 Core fund.
Hi Matt,
On Fri, 2024-11-22 at 11:02 +0100, Matthieu Baerts wrote:
> Hi Geliang,
>
> On 21/11/2024 10:36, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> >
> > Address Martin's comments in v1:
> >
> > - bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock
> > isn't
> > held.
> >
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> > net/mptcp/bpf.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index 02038db59956..89a7a482368d 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct
> > bpf_iter_mptcp_subflow *it,
Other bpf_iter have added "sizeof" and "alignof" checks, I think
mptcp_subflow bpf_iter also needs to add them too:
BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) !=
sizeof(struct bpf_iter_mptcp_subflow));
BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) !=
__alignof__(struct bpf_iter_mptcp_subflow));
> > if (!msk)
> > return -EINVAL;
> >
> > - msk_owned_by_me(msk);
> > +#ifdef CONFIG_LOCKDEP
> > + if (!lockdep_sock_is_held((const struct sock *)msk))
> > + return -EINVAL;
>
> This feels wrong: lockdep is a debugging tool, the goal is usually to
> WARN in case of issue, not to take a different path.
>
> Maybe you want to use spin_is_locked(), but still, I'm not sure
> whether
spin_is_locked() doesn't work, since this msk socket lock isn't a
spinlock, right?
> that's the direction to take: it means the msk always needs to be
> locked
> when iterating over the different subflow → is it always the case?
> Maybe
> yes if it is used only to modify update it?
>
> (If not, maybe you just need to check the sk_refcnt as a safety
> measure?
> But that's maybe not enough for all cases.)
I think the answer is no, sometimes locking is not necessary when
iterating over the different subflows. Does this mean we can simply
drop msk_owned_by_me() here?
>
> Did you check what was being done for the BPF TCP CA? I guess they
> need
> a way to prevent calling some kfunc without the required locks or
> refcount, no?
Not found in BPF TCP CA.
Thanks,
-Geliang
>
> Cheers,
> Matt
Hi Geliang, We briefly talked about this patch yesterday with Mat. On 27/11/2024 03:20, Geliang Tang wrote: > Hi Matt, > > On Fri, 2024-11-22 at 11:02 +0100, Matthieu Baerts wrote: >> Hi Geliang, >> >> On 21/11/2024 10:36, Geliang Tang wrote: >>> From: Geliang Tang <tanggeliang@kylinos.cn> >>> >>> Address Martin's comments in v1: >>> >>> - bpf_iter_mptcp_subflow_new returns -EINVAL when msk socket lock >>> isn't >>> held. >>> >>> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> >>> --- >>> net/mptcp/bpf.c | 5 ++++- >>> 1 file changed, 4 insertions(+), 1 deletion(-) >>> >>> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c >>> index 02038db59956..89a7a482368d 100644 >>> --- a/net/mptcp/bpf.c >>> +++ b/net/mptcp/bpf.c >>> @@ -252,7 +252,10 @@ bpf_iter_mptcp_subflow_new(struct >>> bpf_iter_mptcp_subflow *it, > > Other bpf_iter have added "sizeof" and "alignof" checks, I think > mptcp_subflow bpf_iter also needs to add them too: > > BUILD_BUG_ON(sizeof(struct bpf_iter_mptcp_subflow_kern) != > sizeof(struct bpf_iter_mptcp_subflow)); > BUILD_BUG_ON(__alignof__(struct bpf_iter_mptcp_subflow_kern) != > __alignof__(struct bpf_iter_mptcp_subflow)); If it is something that is done with other bpf_iter, I guess we should indeed do that, especially for a check done at build time! >>> if (!msk) >>> return -EINVAL; >>> >>> - msk_owned_by_me(msk); >>> +#ifdef CONFIG_LOCKDEP >>> + if (!lockdep_sock_is_held((const struct sock *)msk)) >>> + return -EINVAL; >> >> This feels wrong: lockdep is a debugging tool, the goal is usually to >> WARN in case of issue, not to take a different path. >> >> Maybe you want to use spin_is_locked(), but still, I'm not sure >> whether > > spin_is_locked() doesn't work, since this msk socket lock isn't a > spinlock, right? (I was talking about sk->sk_lock.slock, but while not being sure that's what you need here) >> that's the direction to take: it means the msk always needs to be >> locked >> when iterating over the different subflow → is it always the case? >> Maybe >> yes if it is used only to modify update it? >> >> (If not, maybe you just need to check the sk_refcnt as a safety >> measure? >> But that's maybe not enough for all cases.) > > I think the answer is no, sometimes locking is not necessary when > iterating over the different subflows. Does this mean we can simply > drop msk_owned_by_me() here? I guess it is safer to add a check here, to avoid random BPF programs calling this kfunc in other context that could cause issues. At the meeting yesterday, it was not clear what would be best: in the context of the packet scheduler and path-manager, do you know if this helper will always be called when the msk is locked? (And owned by the user?) If it is a hook to modify msk or subflows, I guess yes, but there are many hooks, we didn't check everything. So if the msk is locked (and owned by the user?), then that's good to add such check. (If we don't need the lock, maybe we don't need bpf_iter because we only need to read stuff?) In this case, maybe enough to look around sock_owned_by_user_nocheck() and/or spin_is_locked(&sk->sk_lock.slock) helpers? (We didn't check if they were making sense in this context, to be checked first!) Also, we were wondering what was done in other cases with BPF. If a check is needed to verify a lock is held, how can the verifier make sure the lock is held by the same context. For example, how can it make sure it is not going to be released during the kfunc operation. (Maybe that's more a question for BPF maintainers) >> Did you check what was being done for the BPF TCP CA? I guess they >> need >> a way to prevent calling some kfunc without the required locks or >> refcount, no? > > Not found in BPF TCP CA. Strange, I wonder what safeguards they have then. Or maybe it is possible to say that some kfunc can only be called with a specific struct ops? Cheers, Matt -- Sponsored by the NGI0 Core fund.
© 2016 - 2026 Red Hat, Inc.