[PATCH bpf-next/net 2/5] bpf: Add mptcp_subflow bpf_iter

Matthieu Baerts (NGI0) posted 5 patches 1 month, 2 weeks ago
There is a newer version of this series
[PATCH bpf-next/net 2/5] bpf: Add mptcp_subflow bpf_iter
Posted by Matthieu Baerts (NGI0) 1 month, 2 weeks ago
From: Geliang Tang <tanggeliang@kylinos.cn>

It's necessary to traverse all subflows on the conn_list of an MPTCP
socket and then call kfunc to modify the fields of each subflow. In
kernel space, mptcp_for_each_subflow() helper is used for this:

	mptcp_for_each_subflow(msk, subflow)
		kfunc(subflow);

But in the MPTCP BPF program, this has not yet been implemented. As
Martin suggested recently, this conn_list walking + modify-by-kfunc
usage fits the bpf_iter use case.

So this patch adds a new bpf_iter type named "mptcp_subflow" to do
this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
_destroy(). And register these bpf_iter mptcp_subflow into mptcp
common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
in BPF program like this:

	bpf_for_each(mptcp_subflow, subflow, msk)
		kfunc(subflow);

Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
Notes:
A few versions of this single patch have been previously posted to the
BPF mailing list by Geliang, before continuing to the MPTCP mailing list
only, with other patches of this series. The version of the whole series
has been reset to 1, but here is the ChangeLog for this patch here:
 - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
       drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
 - v3: drop bpf_iter__mptcp_subflow
 - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
       it in _next() (Andrii)
 - v5: use list_is_last() instead of list_entry_is_head() add
       KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
 - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
---
 net/mptcp/bpf.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
index 6f96a5927fd371f8ea92cbf96c875edef9272b98..d107c2865e97e6ccffb9e0720dfbbd232b63a3b8 100644
--- a/net/mptcp/bpf.c
+++ b/net/mptcp/bpf.c
@@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
 	.set   = &bpf_mptcp_fmodret_ids,
 };
 
+struct bpf_iter_mptcp_subflow {
+	__u64 __opaque[2];
+} __aligned(8);
+
+struct bpf_iter_mptcp_subflow_kern {
+	struct mptcp_sock *msk;
+	struct list_head *pos;
+} __aligned(8);
+
 __bpf_kfunc_start_defs();
 
 __bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk)
@@ -48,12 +57,48 @@ bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
 	return mptcp_subflow_tcp_sock(subflow);
 }
 
+__bpf_kfunc static int
+bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+			   struct mptcp_sock *msk)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+	kit->msk = msk;
+	if (!msk)
+		return -EINVAL;
+
+	msk_owned_by_me(msk);
+
+	kit->pos = &msk->conn_list;
+	return 0;
+}
+
+__bpf_kfunc static struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
+{
+	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
+
+	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
+		return NULL;
+
+	kit->pos = kit->pos->next;
+	return list_entry(kit->pos, struct mptcp_subflow_context, node);
+}
+
+__bpf_kfunc static void
+bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
+{
+}
+
 __bpf_kfunc_end_defs();
 
 BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
 BTF_ID_FLAGS(func, bpf_mptcp_sk)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx)
 BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
 BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
 
 static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {

-- 
2.45.2
Re: [PATCH bpf-next/net 2/5] bpf: Add mptcp_subflow bpf_iter
Posted by Martin KaFai Lau 1 month, 2 weeks ago
On 11/8/24 7:52 AM, Matthieu Baerts (NGI0) wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> It's necessary to traverse all subflows on the conn_list of an MPTCP
> socket and then call kfunc to modify the fields of each subflow. In
> kernel space, mptcp_for_each_subflow() helper is used for this:
> 
> 	mptcp_for_each_subflow(msk, subflow)
> 		kfunc(subflow);
> 
> But in the MPTCP BPF program, this has not yet been implemented. As
> Martin suggested recently, this conn_list walking + modify-by-kfunc
> usage fits the bpf_iter use case.
> 
> So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> _destroy(). And register these bpf_iter mptcp_subflow into mptcp
> common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
> in BPF program like this:
> 
> 	bpf_for_each(mptcp_subflow, subflow, msk)
> 		kfunc(subflow);
> 
> Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> Reviewed-by: Mat Martineau <martineau@kernel.org>
> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> ---
> Notes:
> A few versions of this single patch have been previously posted to the
> BPF mailing list by Geliang, before continuing to the MPTCP mailing list
> only, with other patches of this series. The version of the whole series
> has been reset to 1, but here is the ChangeLog for this patch here:
>   - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
>         drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
>   - v3: drop bpf_iter__mptcp_subflow
>   - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
>         it in _next() (Andrii)
>   - v5: use list_is_last() instead of list_entry_is_head() add
>         KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
>   - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
> ---
>   net/mptcp/bpf.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 45 insertions(+)
> 
> diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> index 6f96a5927fd371f8ea92cbf96c875edef9272b98..d107c2865e97e6ccffb9e0720dfbbd232b63a3b8 100644
> --- a/net/mptcp/bpf.c
> +++ b/net/mptcp/bpf.c
> @@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
>   	.set   = &bpf_mptcp_fmodret_ids,
>   };
>   
> +struct bpf_iter_mptcp_subflow {
> +	__u64 __opaque[2];
> +} __aligned(8);
> +
> +struct bpf_iter_mptcp_subflow_kern {
> +	struct mptcp_sock *msk;
> +	struct list_head *pos;
> +} __aligned(8);
> +
>   __bpf_kfunc_start_defs();
>   
>   __bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk)
> @@ -48,12 +57,48 @@ bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
>   	return mptcp_subflow_tcp_sock(subflow);
>   }
>   
> +__bpf_kfunc static int
> +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> +			   struct mptcp_sock *msk)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> +	kit->msk = msk;
> +	if (!msk)
> +		return -EINVAL;
> +
> +	msk_owned_by_me(msk);

I recalled in the earlier revision, a concern had already been brought up about 
needing lock held and using the subflow iter in tracing. This patch still has 
the subflow iter available to tracing [by 
register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC)]. How is it supposed to work? 
Adding msk_owned_by_me(msk) does not help. At best it will give a WARN which is 
not good and then keep going even msk is not locked.

Do you need to use subflow iter in tracing?

The commit message mentioned it needs to modify the subflow. I don't see how 
this modification could work in a tracing program also. It must be some non 
tracing hooks? What is the plan on this hook? Is it a bpf_struct_ops or 
something else?

If it needs to modify the subflow, does it need to take the lock of the subflow?

> +
> +	kit->pos = &msk->conn_list;
> +	return 0;
> +}
> +
> +__bpf_kfunc static struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> +{
> +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> +
> +	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
> +		return NULL;
> +
> +	kit->pos = kit->pos->next;
> +	return list_entry(kit->pos, struct mptcp_subflow_context, node);
> +}
> +
> +__bpf_kfunc static void
> +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> +{
> +}
> +
>   __bpf_kfunc_end_defs();
>   
>   BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
>   BTF_ID_FLAGS(func, bpf_mptcp_sk)
>   BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx)
>   BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
>   BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
>   
>   static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
>
Re: [PATCH bpf-next/net 2/5] bpf: Add mptcp_subflow bpf_iter
Posted by Geliang Tang 1 month, 1 week ago
Hi Martin,

On Mon, Nov 11, 2024 at 04:50:49PM -0800, Martin KaFai Lau wrote:
> On 11/8/24 7:52 AM, Matthieu Baerts (NGI0) wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > It's necessary to traverse all subflows on the conn_list of an MPTCP
> > socket and then call kfunc to modify the fields of each subflow. In
> > kernel space, mptcp_for_each_subflow() helper is used for this:
> > 
> > 	mptcp_for_each_subflow(msk, subflow)
> > 		kfunc(subflow);
> > 
> > But in the MPTCP BPF program, this has not yet been implemented. As
> > Martin suggested recently, this conn_list walking + modify-by-kfunc
> > usage fits the bpf_iter use case.
> > 
> > So this patch adds a new bpf_iter type named "mptcp_subflow" to do
> > this and implements its helpers bpf_iter_mptcp_subflow_new()/_next()/
> > _destroy(). And register these bpf_iter mptcp_subflow into mptcp
> > common kfunc set. Then bpf_for_each() for mptcp_subflow can be used
> > in BPF program like this:
> > 
> > 	bpf_for_each(mptcp_subflow, subflow, msk)
> > 		kfunc(subflow);
> > 
> > Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > Reviewed-by: Mat Martineau <martineau@kernel.org>
> > Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
> > ---
> > Notes:
> > A few versions of this single patch have been previously posted to the
> > BPF mailing list by Geliang, before continuing to the MPTCP mailing list
> > only, with other patches of this series. The version of the whole series
> > has been reset to 1, but here is the ChangeLog for this patch here:
> >   - v2: remove msk->pm.lock in _new() and _destroy() (Martin)
> >         drop DEFINE_BPF_ITER_FUNC, change opaque[3] to opaque[2] (Andrii)
> >   - v3: drop bpf_iter__mptcp_subflow
> >   - v4: if msk is NULL, initialize kit->msk to NULL in _new() and check
> >         it in _next() (Andrii)
> >   - v5: use list_is_last() instead of list_entry_is_head() add
> >         KF_ITER_NEW/NEXT/DESTROY flags add msk_owned_by_me in _new()
> >   - v6: add KF_TRUSTED_ARGS flag (Andrii, Martin)
> > ---
> >   net/mptcp/bpf.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 45 insertions(+)
> > 
> > diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c
> > index 6f96a5927fd371f8ea92cbf96c875edef9272b98..d107c2865e97e6ccffb9e0720dfbbd232b63a3b8 100644
> > --- a/net/mptcp/bpf.c
> > +++ b/net/mptcp/bpf.c
> > @@ -29,6 +29,15 @@ static const struct btf_kfunc_id_set bpf_mptcp_fmodret_set = {
> >   	.set   = &bpf_mptcp_fmodret_ids,
> >   };
> > +struct bpf_iter_mptcp_subflow {
> > +	__u64 __opaque[2];
> > +} __aligned(8);
> > +
> > +struct bpf_iter_mptcp_subflow_kern {
> > +	struct mptcp_sock *msk;
> > +	struct list_head *pos;
> > +} __aligned(8);
> > +
> >   __bpf_kfunc_start_defs();
> >   __bpf_kfunc static struct mptcp_sock *bpf_mptcp_sk(struct sock *sk)
> > @@ -48,12 +57,48 @@ bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
> >   	return mptcp_subflow_tcp_sock(subflow);
> >   }
> > +__bpf_kfunc static int
> > +bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> > +			   struct mptcp_sock *msk)
> > +{
> > +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > +
> > +	kit->msk = msk;
> > +	if (!msk)
> > +		return -EINVAL;
> > +
> > +	msk_owned_by_me(msk);
> 
> I recalled in the earlier revision, a concern had already been brought up
> about needing lock held and using the subflow iter in tracing. This patch
> still has the subflow iter available to tracing [by
> register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC)]. How is it supposed to
> work? Adding msk_owned_by_me(msk) does not help. At best it will give a WARN
> which is not good and then keep going even msk is not locked.

I'll check lockdep_sock_is_held(msk) here in v2, and return NULL if msk
socket is not locked.

> 
> Do you need to use subflow iter in tracing?

No.

> 
> The commit message mentioned it needs to modify the subflow. I don't see how
> this modification could work in a tracing program also. It must be some non
> tracing hooks? What is the plan on this hook? Is it a bpf_struct_ops or
> something else?

We only plan to use it in struct_ops (mptcp bpf path manager [1], and mptcp
packet scheduler [2]) and cgroup sockopt (mptcp setsockopt [3]).

So I'll register this kfunc_set for BPF_PROG_TYPE_STRUCT_OPS and
BPF_PROG_TYPE_CGROUP_SOCKOPT only in v2, not for BPF_PROG_TYPE_UNSPEC.

> 
> If it needs to modify the subflow, does it need to take the lock of the subflow?

We will call the following mptcp_subflow_set_scheduled() kfunc to set the
scheduled field of a subflow:

void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
                                 bool scheduled)
{
        WRITE_ONCE(subflow->scheduled, scheduled);
}

WRITE_ONCE is used here, and no additional lock of the subflow is used.

Thanks,
-Geliang

[1] https://github.com/multipath-tcp/mptcp_net-next/issues/74
[2] https://github.com/multipath-tcp/mptcp_net-next/issues/75
[3] https://github.com/multipath-tcp/mptcp_net-next/issues/484

> 
> > +
> > +	kit->pos = &msk->conn_list;
> > +	return 0;
> > +}
> > +
> > +__bpf_kfunc static struct mptcp_subflow_context *
> > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> > +{
> > +	struct bpf_iter_mptcp_subflow_kern *kit = (void *)it;
> > +
> > +	if (!kit->msk || list_is_last(kit->pos, &kit->msk->conn_list))
> > +		return NULL;
> > +
> > +	kit->pos = kit->pos->next;
> > +	return list_entry(kit->pos, struct mptcp_subflow_context, node);
> > +}
> > +
> > +__bpf_kfunc static void
> > +bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it)
> > +{
> > +}
> > +
> >   __bpf_kfunc_end_defs();
> >   BTF_KFUNCS_START(bpf_mptcp_common_kfunc_ids)
> >   BTF_ID_FLAGS(func, bpf_mptcp_sk)
> >   BTF_ID_FLAGS(func, bpf_mptcp_subflow_ctx)
> >   BTF_ID_FLAGS(func, bpf_mptcp_subflow_tcp_sock)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_next, KF_ITER_NEXT | KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_iter_mptcp_subflow_destroy, KF_ITER_DESTROY)
> >   BTF_KFUNCS_END(bpf_mptcp_common_kfunc_ids)
> >   static const struct btf_kfunc_id_set bpf_mptcp_common_kfunc_set = {
> > 
>