[v9] add mptcp_subflow bpf_iter

[PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog

Posted by Geliang Tang 1 year, 4 months ago

From: Geliang Tang <tanggeliang@kylinos.cn>

This patch adds a ftrace hook for mptcp_sched_get_send() to test the newly
added mptcp_subflow bpf_iter. This test simulates a typical mptcp packet
scheduler, which selects a subflow from multiple subflows of an mptcp
socket to send data.

Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy,
bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h.

Use _acquire() to acquire the msk, then use bpf_for_each(mptcp_subflow) to
walk the subflow list of this msk. Invoke kfuncs mptcp_subflow_active() and
bpf_mptcp_subflow_tcp_sock() in the loop to pick a subsocket. Finally use
bpf_mptcp_subflow_ctx() to get the subflow context of this subsocket and
use mptcp_subflow_set_scheduled() to set it as being scheduled.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
---
 .../testing/selftests/bpf/bpf_experimental.h  |  7 ++++
 tools/testing/selftests/bpf/progs/mptcp_bpf.h |  9 ++++
 .../bpf/progs/mptcp_bpf_iters_subflow.c       | 42 +++++++++++++++++++
 3 files changed, 58 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c

diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index b0668f29f7b3..d43690b17468 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -575,6 +575,13 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
 extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
 extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
 
+struct bpf_iter_mptcp_subflow;
+extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
+				      struct mptcp_sock *msk) __weak __ksym;
+extern struct mptcp_subflow_context *
+bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+extern void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
+
 extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
 extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
 extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
index c3800f986ae1..e18796361394 100644
--- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
@@ -43,9 +43,18 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
 }
 
 /* ksym */
+extern bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) __ksym;
 extern void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
 					bool scheduled) __ksym;
 
+extern struct mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) __ksym;
+extern void bpf_mptcp_sock_release(struct mptcp_sock *msk) __ksym;
+
+extern struct mptcp_subflow_context *
+bpf_mptcp_subflow_ctx(const struct sock *sk) __ksym;
+extern struct sock *
+bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) __ksym;
+
 extern struct mptcp_subflow_context *
 bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos) __ksym;
 
diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
new file mode 100644
index 000000000000..4268e4604c5a
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2024, Kylin Software */
+
+/* vmlinux.h, bpf_helpers.h and other 'define' */
+#include "bpf_tracing_net.h"
+#include "mptcp_bpf.h"
+
+char _license[] SEC("license") = "GPL";
+int subflows;
+int pid;
+
+SEC("fentry/mptcp_sched_get_send")
+int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+	struct sock *ssk = NULL;
+
+	if (bpf_get_current_pid_tgid() >> 32 != pid)
+		return 0;
+
+	msk = bpf_mptcp_sock_acquire(msk);
+	if (!msk)
+		return 0;
+	bpf_for_each(mptcp_subflow, subflow, msk) {
+		if (subflow->token != msk->token)
+			break;
+
+		if (!mptcp_subflow_active(subflow))
+			continue;
+
+		ssk = bpf_mptcp_subflow_tcp_sock(subflow);
+	}
+	bpf_mptcp_sock_release(msk);
+
+	if (!ssk)
+		return 0;
+	subflow = bpf_mptcp_subflow_ctx(ssk);
+	mptcp_subflow_set_scheduled(subflow, true);
+	subflows = subflow->subflow_id;
+
+	return 0;
+}
-- 
2.43.0

Re: [PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog

Posted by Matthieu Baerts 1 year, 3 months ago

Hi Geliang,

On 09/10/2024 11:45, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> This patch adds a ftrace hook for mptcp_sched_get_send() to test the newly
> added mptcp_subflow bpf_iter. This test simulates a typical mptcp packet
> scheduler, which selects a subflow from multiple subflows of an mptcp
> socket to send data.
> 
> Export mptcp_subflow helpers bpf_iter_mptcp_subflow_new/_next/_destroy,
> bpf_mptcp_sock_acquire/_release and other helpers into bpf_experimental.h.
> 
> Use _acquire() to acquire the msk, then use bpf_for_each(mptcp_subflow) to
> walk the subflow list of this msk. Invoke kfuncs mptcp_subflow_active() and
> bpf_mptcp_subflow_tcp_sock() in the loop to pick a subsocket. Finally use
> bpf_mptcp_subflow_ctx() to get the subflow context of this subsocket and
> use mptcp_subflow_set_scheduled() to set it as being scheduled.

Do you think we could have a test not depending on scheduler helpers?
Without this dependence, we could already upstream this series, and get
feedback without having to wait for the new scheduler API.

If you remove the use of mptcp_subflow_active() and
mptcp_subflow_set_scheduled(), that's enough, no?

> 
> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> ---
>  .../testing/selftests/bpf/bpf_experimental.h  |  7 ++++
>  tools/testing/selftests/bpf/progs/mptcp_bpf.h |  9 ++++
>  .../bpf/progs/mptcp_bpf_iters_subflow.c       | 42 +++++++++++++++++++
>  3 files changed, 58 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> 
> diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
> index b0668f29f7b3..d43690b17468 100644
> --- a/tools/testing/selftests/bpf/bpf_experimental.h
> +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> @@ -575,6 +575,13 @@ extern int bpf_iter_css_new(struct bpf_iter_css *it,
>  extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
>  extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
>  
> +struct bpf_iter_mptcp_subflow;
> +extern int bpf_iter_mptcp_subflow_new(struct bpf_iter_mptcp_subflow *it,
> +				      struct mptcp_sock *msk) __weak __ksym;
> +extern struct mptcp_subflow_context *
> +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
> +extern void bpf_iter_mptcp_subflow_destroy(struct bpf_iter_mptcp_subflow *it) __weak __ksym;
> +
>  extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned int flags) __weak __ksym;
>  extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags) __weak __ksym;
>  extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
> diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> index c3800f986ae1..e18796361394 100644
> --- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> @@ -43,9 +43,18 @@ mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow)
>  }
>  
>  /* ksym */
> +extern bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) __ksym;
>  extern void mptcp_subflow_set_scheduled(struct mptcp_subflow_context *subflow,
>  					bool scheduled) __ksym;
>  
> +extern struct mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock *msk) __ksym;
> +extern void bpf_mptcp_sock_release(struct mptcp_sock *msk) __ksym;
> +
> +extern struct mptcp_subflow_context *
> +bpf_mptcp_subflow_ctx(const struct sock *sk) __ksym;
> +extern struct sock *
> +bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context *subflow) __ksym;
> +
>  extern struct mptcp_subflow_context *
>  bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data, unsigned int pos) __ksym;
>  
> diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> new file mode 100644
> index 000000000000..4268e4604c5a
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> @@ -0,0 +1,42 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2024, Kylin Software */
> +
> +/* vmlinux.h, bpf_helpers.h and other 'define' */
> +#include "bpf_tracing_net.h"
> +#include "mptcp_bpf.h"
> +
> +char _license[] SEC("license") = "GPL";
> +int subflows;
> +int pid;
> +
> +SEC("fentry/mptcp_sched_get_send")

If I understand correctly, this hook will be called twice on the client
connection for this test because the client is sending two times one
byte, right?

> +int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk)
> +{
> +	struct mptcp_subflow_context *subflow;
> +	struct sock *ssk = NULL;
> +
> +	if (bpf_get_current_pid_tgid() >> 32 != pid)
> +		return 0;
> +
> +	msk = bpf_mptcp_sock_acquire(msk);
> +	if (!msk)
> +		return 0;
> +	bpf_for_each(mptcp_subflow, subflow, msk) {
> +		if (subflow->token != msk->token)
> +			break;

Out of curiosity, why is this needed? Is it needed for the verifier? Or
just an extra check?
Can you add a comment here explaining why this is there please?

> +
> +		if (!mptcp_subflow_active(subflow))
> +			continue;
> +
> +		ssk = bpf_mptcp_subflow_tcp_sock(subflow);

Do you get 'ssk' just to use bpf_mptcp_subflow_tcp_sock() and
bpf_mptcp_subflow_ctx()?

> +	}
> +	bpf_mptcp_sock_release(msk);
> +
> +	if (!ssk)
> +		return 0;
> +	subflow = bpf_mptcp_subflow_ctx(ssk);
> +	mptcp_subflow_set_scheduled(subflow, true);
> +	subflows = subflow->subflow_id;

Here, it looks like you only check if the last subflow is in the list.

Would it not be better to count the number of subflows in the list? Or
if you want to read something from 'subflow', you could add all subflow_id?

  subflows = 0;

  (...)

  bpf_for_each(mptcp_subflow, subflow, msk)
      subflows += subflow->subflow_id;

By doing that, we will catch if one is missing, and the order is not
important.

If still want to use bpf_mptcp_subflow_tcp_sock() and
bpf_mptcp_subflow_ctx(), maybe you can save other fields from the last
subflow: the token to compare it with the one in the msk? Or the port
number or the address?

> +
> +	return 0;
> +}

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog

Posted by Geliang Tang 1 year, 3 months ago

On Mon, 2024-10-14 at 18:06 +0200, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 09/10/2024 11:45, Geliang Tang wrote:
> > From: Geliang Tang <tanggeliang@kylinos.cn>
> > 
> > This patch adds a ftrace hook for mptcp_sched_get_send() to test
> > the newly
> > added mptcp_subflow bpf_iter. This test simulates a typical mptcp
> > packet
> > scheduler, which selects a subflow from multiple subflows of an
> > mptcp
> > socket to send data.
> > 
> > Export mptcp_subflow helpers
> > bpf_iter_mptcp_subflow_new/_next/_destroy,
> > bpf_mptcp_sock_acquire/_release and other helpers into
> > bpf_experimental.h.
> > 
> > Use _acquire() to acquire the msk, then use
> > bpf_for_each(mptcp_subflow) to
> > walk the subflow list of this msk. Invoke kfuncs
> > mptcp_subflow_active() and
> > bpf_mptcp_subflow_tcp_sock() in the loop to pick a subsocket.
> > Finally use
> > bpf_mptcp_subflow_ctx() to get the subflow context of this
> > subsocket and
> > use mptcp_subflow_set_scheduled() to set it as being scheduled.
> 
> Do you think we could have a test not depending on scheduler helpers?
> Without this dependence, we could already upstream this series, and
> get
> feedback without having to wait for the new scheduler API.

This set can be upstream as is, no need to wait for the new scheduler
API.

If no scheduler helpers are used in this test, no need to add this
mptcp_subflow bpf_iter at all. mptcp_for_each_subflow() helper in
progs/mptcp_bpf.h can do that.

mptcp_for_each_subflow(msk, subflow) {
	subflow = bpf_core_cast(subflow, struct
mptcp_subflow_context);
	subflows += subflow->subflow_id;
}

No need to use this:

bpf_for_each(mptcp_subflow, subflow, msk)
	subflows += subflow->subflow_id;

> 
> If you remove the use of mptcp_subflow_active() and
> mptcp_subflow_set_scheduled(), that's enough, no?

But "subflow" in mptcp_for_each_subflow() loop can't be passed to a
kernel function:

mptcp_for_each_subflow(msk, subflow) {
	subflow = bpf_core_cast(subflow, struct
mptcp_subflow_context);
	mptcp_subflow_active(subflows);
}

This is not allowed by BPF.

So we add this iter to do this:

bpf_for_each(mptcp_subflow, subflow, msk)
	kfunc(subflow);

In this test, I must pick some kfuncs accepted "subflow" argument to do
this test, so mptcp_subflow_active and mptcp_subflow_set_scheduled are
picked.

Also, This iter is for the BPF packet scheduler, why not test the
actual usage of scheduler helpers in this test?

> 
> > 
> > Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
> > ---
> >  .../testing/selftests/bpf/bpf_experimental.h  |  7 ++++
> >  tools/testing/selftests/bpf/progs/mptcp_bpf.h |  9 ++++
> >  .../bpf/progs/mptcp_bpf_iters_subflow.c       | 42
> > +++++++++++++++++++
> >  3 files changed, 58 insertions(+)
> >  create mode 100644
> > tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > 
> > diff --git a/tools/testing/selftests/bpf/bpf_experimental.h
> > b/tools/testing/selftests/bpf/bpf_experimental.h
> > index b0668f29f7b3..d43690b17468 100644
> > --- a/tools/testing/selftests/bpf/bpf_experimental.h
> > +++ b/tools/testing/selftests/bpf/bpf_experimental.h
> > @@ -575,6 +575,13 @@ extern int bpf_iter_css_new(struct
> > bpf_iter_css *it,
> >  extern struct cgroup_subsys_state *bpf_iter_css_next(struct
> > bpf_iter_css *it) __weak __ksym;
> >  extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak
> > __ksym;
> >  
> > +struct bpf_iter_mptcp_subflow;
> > +extern int bpf_iter_mptcp_subflow_new(struct
> > bpf_iter_mptcp_subflow *it,
> > +				      struct mptcp_sock *msk)
> > __weak __ksym;
> > +extern struct mptcp_subflow_context *
> > +bpf_iter_mptcp_subflow_next(struct bpf_iter_mptcp_subflow *it)
> > __weak __ksym;
> > +extern void bpf_iter_mptcp_subflow_destroy(struct
> > bpf_iter_mptcp_subflow *it) __weak __ksym;
> > +
> >  extern int bpf_wq_init(struct bpf_wq *wq, void *p__map, unsigned
> > int flags) __weak __ksym;
> >  extern int bpf_wq_start(struct bpf_wq *wq, unsigned int flags)
> > __weak __ksym;
> >  extern int bpf_wq_set_callback_impl(struct bpf_wq *wq,
> > diff --git a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> > b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> > index c3800f986ae1..e18796361394 100644
> > --- a/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> > +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf.h
> > @@ -43,9 +43,18 @@ mptcp_subflow_tcp_sock(const struct
> > mptcp_subflow_context *subflow)
> >  }
> >  
> >  /* ksym */
> > +extern bool mptcp_subflow_active(struct mptcp_subflow_context
> > *subflow) __ksym;
> >  extern void mptcp_subflow_set_scheduled(struct
> > mptcp_subflow_context *subflow,
> >  					bool scheduled) __ksym;
> >  
> > +extern struct mptcp_sock *bpf_mptcp_sock_acquire(struct mptcp_sock
> > *msk) __ksym;
> > +extern void bpf_mptcp_sock_release(struct mptcp_sock *msk) __ksym;
> > +
> > +extern struct mptcp_subflow_context *
> > +bpf_mptcp_subflow_ctx(const struct sock *sk) __ksym;
> > +extern struct sock *
> > +bpf_mptcp_subflow_tcp_sock(const struct mptcp_subflow_context
> > *subflow) __ksym;
> > +
> >  extern struct mptcp_subflow_context *
> >  bpf_mptcp_subflow_ctx_by_pos(const struct mptcp_sched_data *data,
> > unsigned int pos) __ksym;
> >  
> > diff --git
> > a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > new file mode 100644
> > index 000000000000..4268e4604c5a
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > @@ -0,0 +1,42 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2024, Kylin Software */
> > +
> > +/* vmlinux.h, bpf_helpers.h and other 'define' */
> > +#include "bpf_tracing_net.h"
> > +#include "mptcp_bpf.h"
> > +
> > +char _license[] SEC("license") = "GPL";
> > +int subflows;
> > +int pid;
> > +
> > +SEC("fentry/mptcp_sched_get_send")
> 
> If I understand correctly, this hook will be called twice on the
> client
> connection for this test because the client is sending two times one
> byte, right?

Yes, in the first call, the subflows are not been added yet, so it is
called twice. I will add this check to skip the first call:

       if (msk->pm.server_side || !msk->pm.subflows)
               return 0;

> 
> > +int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk)
> > +{
> > +	struct mptcp_subflow_context *subflow;
> > +	struct sock *ssk = NULL;
> > +
> > +	if (bpf_get_current_pid_tgid() >> 32 != pid)
> > +		return 0;
> > +
> > +	msk = bpf_mptcp_sock_acquire(msk);
> > +	if (!msk)
> > +		return 0;
> > +	bpf_for_each(mptcp_subflow, subflow, msk) {
> > +		if (subflow->token != msk->token)
> > +			break;
> 
> Out of curiosity, why is this needed? Is it needed for the verifier?
> Or
> just an extra check?
> Can you add a comment here explaining why this is there please?

I'll drop it.

> 
> > +
> > +		if (!mptcp_subflow_active(subflow))
> > +			continue;
> > +
> > +		ssk = bpf_mptcp_subflow_tcp_sock(subflow);
> 
> Do you get 'ssk' just to use bpf_mptcp_subflow_tcp_sock() and
> bpf_mptcp_subflow_ctx()?

Yes, almost every BPF scheduler use these helpers, so test them here.

> 
> > +	}
> > +	bpf_mptcp_sock_release(msk);
> > +
> > +	if (!ssk)
> > +		return 0;
> > +	subflow = bpf_mptcp_subflow_ctx(ssk);
> > +	mptcp_subflow_set_scheduled(subflow, true);
> > +	subflows = subflow->subflow_id;
> 
> Here, it looks like you only check if the last subflow is in the
> list.
> 
> Would it not be better to count the number of subflows in the list?
> Or
> if you want to read something from 'subflow', you could add all
> subflow_id?
> 
>   subflows = 0;
> 
>   (...)
> 
>   bpf_for_each(mptcp_subflow, subflow, msk)
>       subflows += subflow->subflow_id;
> 
> By doing that, we will catch if one is missing, and the order is not
> important.

Yes, "subflows += subflow->subflow_id" is much better.

> 
> If still want to use bpf_mptcp_subflow_tcp_sock() and
> bpf_mptcp_subflow_ctx(), maybe you can save other fields from the
> last
> subflow: the token to compare it with the one in the msk? Or the port
> number or the address?

Sorry, I don't fully understand your last paragraph. But I stored the
dport number in the code below. Is it the same as what you thought? If
so, I'll send a squash-to patch for it.

int pid;
int ids;

SEC("fentry/mptcp_sched_get_send")
int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk)
{
        struct mptcp_subflow_context *subflow;
        struct sock *sk = (struct sock *)msk;
        struct sock *ssk = NULL;
        int subflows = 0;
        __be16 dport;

        if (bpf_get_current_pid_tgid() >> 32 != pid)
                return 0;

        if (msk->pm.server_side || !msk->pm.subflows)
                return 0;

        msk = bpf_mptcp_sock_acquire(msk);
        if (!msk)
                return 0;
        bpf_for_each(mptcp_subflow, subflow, msk) {
                if (!mptcp_subflow_active(subflow))
                        continue;

                subflows += subflow->subflow_id;
                ssk = bpf_mptcp_subflow_tcp_sock(subflow);
                dport = ssk->sk_dport;
        }

        if (!ssk)
                goto out;
        subflow = bpf_mptcp_subflow_ctx(ssk);
        if (dport != ssk->sk_dport || dport != sk->sk_dport)
                goto out;
        mptcp_subflow_set_scheduled(subflow, true);
        ids = subflows;

out:
        bpf_mptcp_sock_release(msk);
        return 0;
}

Thanks,
-Geliang

> 
> > +
> > +	return 0;
> > +}
> 
> Cheers,
> Matt

Re: [PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog

Posted by Matthieu Baerts 1 year, 3 months ago

Hi Geliang,

On 15/10/2024 09:59, Geliang Tang wrote:
> On Mon, 2024-10-14 at 18:06 +0200, Matthieu Baerts wrote:
>> Hi Geliang,
>>
>> On 09/10/2024 11:45, Geliang Tang wrote:
>>> From: Geliang Tang <tanggeliang@kylinos.cn>
>>>
>>> This patch adds a ftrace hook for mptcp_sched_get_send() to test
>>> the newly
>>> added mptcp_subflow bpf_iter. This test simulates a typical mptcp
>>> packet
>>> scheduler, which selects a subflow from multiple subflows of an
>>> mptcp
>>> socket to send data.
>>>
>>> Export mptcp_subflow helpers
>>> bpf_iter_mptcp_subflow_new/_next/_destroy,
>>> bpf_mptcp_sock_acquire/_release and other helpers into
>>> bpf_experimental.h.
>>>
>>> Use _acquire() to acquire the msk, then use
>>> bpf_for_each(mptcp_subflow) to
>>> walk the subflow list of this msk. Invoke kfuncs
>>> mptcp_subflow_active() and
>>> bpf_mptcp_subflow_tcp_sock() in the loop to pick a subsocket.
>>> Finally use
>>> bpf_mptcp_subflow_ctx() to get the subflow context of this
>>> subsocket and
>>> use mptcp_subflow_set_scheduled() to set it as being scheduled.
>>
>> Do you think we could have a test not depending on scheduler helpers?
>> Without this dependence, we could already upstream this series, and
>> get
>> feedback without having to wait for the new scheduler API.
> 
> This set can be upstream as is, no need to wait for the new scheduler
> API.
> 
> If no scheduler helpers are used in this test, no need to add this
> mptcp_subflow bpf_iter at all. mptcp_for_each_subflow() helper in
> progs/mptcp_bpf.h can do that.
> 
> mptcp_for_each_subflow(msk, subflow) {
> 	subflow = bpf_core_cast(subflow, struct
> mptcp_subflow_context);
> 	subflows += subflow->subflow_id;
> }
> 
> No need to use this:
> 
> bpf_for_each(mptcp_subflow, subflow, msk)
> 	subflows += subflow->subflow_id;

I agree, but for me, I see this as a preparation step, and then it is
fine if this test is not doing anything useful for the moment. I think
it would be enough to add a comment in the code like:

  /* Here MPTCP-specific kfunc can be called */

And add in the commit message that these kfunc will be added later one,
as a next step.

>> If you remove the use of mptcp_subflow_active() and
>> mptcp_subflow_set_scheduled(), that's enough, no?
> 
> But "subflow" in mptcp_for_each_subflow() loop can't be passed to a
> kernel function:
> 
> mptcp_for_each_subflow(msk, subflow) {
> 	subflow = bpf_core_cast(subflow, struct
> mptcp_subflow_context);
> 	mptcp_subflow_active(subflows);
> }
> 
> This is not allowed by BPF.
> 
> So we add this iter to do this:
> 
> bpf_for_each(mptcp_subflow, subflow, msk)
> 	kfunc(subflow);
> 
> In this test, I must pick some kfuncs accepted "subflow" argument to do
> this test, so mptcp_subflow_active and mptcp_subflow_set_scheduled are
> picked.

I think that can be done later on, the first step is to have these MPTCP
bpf_iter upstreamed I think.

> Also, This iter is for the BPF packet scheduler, why not test the
> actual usage of scheduler helpers in this test?

I understand that, but I think it will be easier to avoid these helpers
for the moment, and make the test as simpler as possible.

(...)

>>> diff --git
>>> a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
>>> b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
>>> new file mode 100644
>>> index 000000000000..4268e4604c5a
>>> --- /dev/null
>>> +++ b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
>>> @@ -0,0 +1,42 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/* Copyright (c) 2024, Kylin Software */
>>> +
>>> +/* vmlinux.h, bpf_helpers.h and other 'define' */
>>> +#include "bpf_tracing_net.h"
>>> +#include "mptcp_bpf.h"
>>> +
>>> +char _license[] SEC("license") = "GPL";
>>> +int subflows;
>>> +int pid;
>>> +
>>> +SEC("fentry/mptcp_sched_get_send")
>>
>> If I understand correctly, this hook will be called twice on the
>> client
>> connection for this test because the client is sending two times one
>> byte, right?
> 
> Yes, in the first call, the subflows are not been added yet, so it is
> called twice. I will add this check to skip the first call:
> 
>        if (msk->pm.server_side || !msk->pm.subflows)
>                return 0;

Good idea! Please add a comment above to explain you added that to only
do the rest once.

If we want to have a simpler test without the scheduler helpers, we can
also use another hook, e.g. just before closing the different subflows?
Up to you.

Also, just to be sure: is this BPF program only used (loaded, running)
by the corresponding test?
Can you remind me why is there a PID check? Is it because all BPF
programs are always loaded for all tests?

>>> +int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock *msk)
>>> +{
>>> +	struct mptcp_subflow_context *subflow;
>>> +	struct sock *ssk = NULL;
>>> +
>>> +	if (bpf_get_current_pid_tgid() >> 32 != pid)
>>> +		return 0;
>>> +
>>> +	msk = bpf_mptcp_sock_acquire(msk);
>>> +	if (!msk)
>>> +		return 0;
>>> +	bpf_for_each(mptcp_subflow, subflow, msk) {
>>> +		if (subflow->token != msk->token)
>>> +			break;
>>
>> Out of curiosity, why is this needed? Is it needed for the verifier?
>> Or
>> just an extra check?
>> Can you add a comment here explaining why this is there please?
> 
> I'll drop it.

Just to be clear: I'm not asking to drop it, but to understand why it is
needed. The main reason is to have other people using it in future
programs, because they saw it was used here. If it is here as an
"assert", something that cannot be wrong, that's fine, as long as it is
marked (by a comment) as is.

>>> +
>>> +		if (!mptcp_subflow_active(subflow))
>>> +			continue;
>>> +
>>> +		ssk = bpf_mptcp_subflow_tcp_sock(subflow);
>>
>> Do you get 'ssk' just to use bpf_mptcp_subflow_tcp_sock() and
>> bpf_mptcp_subflow_ctx()?
> 
> Yes, almost every BPF scheduler use these helpers, so test them here.

For me, that's fine to keep using these helpers, even if their usage is
limited in the test. You can keep it here, and keep
bpf_mptcp_subflow_ctx(ssk) below, then use 'subflow' below to compare
the token with the one of the msk for example. Feel free to add a
comment mentioning that it is just to show these helpers can be used in
the 'bpf_for_each()' and outside.

>>> +	}
>>> +	bpf_mptcp_sock_release(msk);
>>> +
>>> +	if (!ssk)
>>> +		return 0;
>>> +	subflow = bpf_mptcp_subflow_ctx(ssk);
>>> +	mptcp_subflow_set_scheduled(subflow, true);
>>> +	subflows = subflow->subflow_id;
>>
>> Here, it looks like you only check if the last subflow is in the
>> list.
>>
>> Would it not be better to count the number of subflows in the list?
>> Or
>> if you want to read something from 'subflow', you could add all
>> subflow_id?
>>
>>   subflows = 0;
>>
>>   (...)
>>
>>   bpf_for_each(mptcp_subflow, subflow, msk)
>>       subflows += subflow->subflow_id;
>>
>> By doing that, we will catch if one is missing, and the order is not
>> important.
> 
> Yes, "subflows += subflow->subflow_id" is much better.
> 
>>
>> If still want to use bpf_mptcp_subflow_tcp_sock() and
>> bpf_mptcp_subflow_ctx(), maybe you can save other fields from the
>> last
>> subflow: the token to compare it with the one in the msk? Or the port
>> number or the address?
> 
> Sorry, I don't fully understand your last paragraph. But I stored the
> dport number in the code below. Is it the same as what you thought? If
> so, I'll send a squash-to patch for it.

I was just thinking about something you could do to keep using these new
helpers: keep the ref to the last subflow, and compare with data from
the msk, e.g.

  struct mptcp_subflow_context *subflow;
  struct sock *sk = (struct sock *)msk;
  struct sock *ssk = NULL;
  int all_ids = 0;

  /* TODO: why is it needed? */
  if (bpf_get_current_pid_tgid() >> 32 != pid)
      return 0;

  /* to do the test only once: on the client side, at the 2nd send */
  if (msk->pm.server_side || !msk->pm.subflows)
      return 0;

  msk = bpf_mptcp_sock_acquire(msk);
  if (!msk)
      return 0;

  bpf_for_each(mptcp_subflow, subflow, msk) {
      /* Here MPTCP-specific kfunc can be called: this test is not doing
       * anything really useful, only to verify the iteration works.
       */

      /* assert: if not OK, something wrong on the kernel side */
      if (subflow->token != msk->token)
          break;

      /* to check we iterate over all subflows */
      local_ids += subflow->subflow_id;

      /* only to check the following kfunc works */
      ssk = bpf_mptcp_subflow_tcp_sock(subflow);
  }

  if (!ssk)
      goto out;

  /* assert: if not OK, something wrong on the kernel side */
  if (ssk->sk_dport != sk->sk_dport)
      goto out;

  /* only to check the following kfunc works */
  subflow = bpf_mptcp_subflow_ctx(ssk);
  if (subflow->token != msk->token)
      goto out;

  ids = local_ids;


(not tested)

→ So not using anything linked to the packet scheduler, just very simple
check to make sure 'bpf_iter' + bpf_mptcp_subflow_tcp_sock() and
bpf_mptcp_subflow_ctx() work as expected. Like that, it sounds easier to
upstream as it is.

WDYT?

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

Re: [PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog

Posted by Geliang Tang 1 year, 3 months ago

Hi Matt,

On Tue, 2024-10-15 at 12:47 +0200, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 15/10/2024 09:59, Geliang Tang wrote:
> > On Mon, 2024-10-14 at 18:06 +0200, Matthieu Baerts wrote:
> > > Hi Geliang,
> > > 
> > > On 09/10/2024 11:45, Geliang Tang wrote:
> > > > From: Geliang Tang <tanggeliang@kylinos.cn>
> > > > 
> > > > This patch adds a ftrace hook for mptcp_sched_get_send() to
> > > > test
> > > > the newly
> > > > added mptcp_subflow bpf_iter. This test simulates a typical
> > > > mptcp
> > > > packet
> > > > scheduler, which selects a subflow from multiple subflows of an
> > > > mptcp
> > > > socket to send data.
> > > > 
> > > > Export mptcp_subflow helpers
> > > > bpf_iter_mptcp_subflow_new/_next/_destroy,
> > > > bpf_mptcp_sock_acquire/_release and other helpers into
> > > > bpf_experimental.h.
> > > > 
> > > > Use _acquire() to acquire the msk, then use
> > > > bpf_for_each(mptcp_subflow) to
> > > > walk the subflow list of this msk. Invoke kfuncs
> > > > mptcp_subflow_active() and
> > > > bpf_mptcp_subflow_tcp_sock() in the loop to pick a subsocket.
> > > > Finally use
> > > > bpf_mptcp_subflow_ctx() to get the subflow context of this
> > > > subsocket and
> > > > use mptcp_subflow_set_scheduled() to set it as being scheduled.
> > > 
> > > Do you think we could have a test not depending on scheduler
> > > helpers?
> > > Without this dependence, we could already upstream this series,
> > > and
> > > get
> > > feedback without having to wait for the new scheduler API.
> > 
> > This set can be upstream as is, no need to wait for the new
> > scheduler
> > API.
> > 
> > If no scheduler helpers are used in this test, no need to add this
> > mptcp_subflow bpf_iter at all. mptcp_for_each_subflow() helper in
> > progs/mptcp_bpf.h can do that.
> > 
> > mptcp_for_each_subflow(msk, subflow) {
> > 	subflow = bpf_core_cast(subflow, struct
> > mptcp_subflow_context);
> > 	subflows += subflow->subflow_id;
> > }
> > 
> > No need to use this:
> > 
> > bpf_for_each(mptcp_subflow, subflow, msk)
> > 	subflows += subflow->subflow_id;
> 
> I agree, but for me, I see this as a preparation step, and then it is
> fine if this test is not doing anything useful for the moment. I
> think
> it would be enough to add a comment in the code like:
> 
>   /* Here MPTCP-specific kfunc can be called */
> 
> And add in the commit message that these kfunc will be added later
> one,
> as a next step.
> 
> > > If you remove the use of mptcp_subflow_active() and
> > > mptcp_subflow_set_scheduled(), that's enough, no?
> > 
> > But "subflow" in mptcp_for_each_subflow() loop can't be passed to a
> > kernel function:
> > 
> > mptcp_for_each_subflow(msk, subflow) {
> > 	subflow = bpf_core_cast(subflow, struct
> > mptcp_subflow_context);
> > 	mptcp_subflow_active(subflows);
> > }
> > 
> > This is not allowed by BPF.
> > 
> > So we add this iter to do this:
> > 
> > bpf_for_each(mptcp_subflow, subflow, msk)
> > 	kfunc(subflow);
> > 
> > In this test, I must pick some kfuncs accepted "subflow" argument
> > to do
> > this test, so mptcp_subflow_active and mptcp_subflow_set_scheduled
> > are
> > picked.
> 
> I think that can be done later on, the first step is to have these
> MPTCP
> bpf_iter upstreamed I think.
> 
> > Also, This iter is for the BPF packet scheduler, why not test the
> > actual usage of scheduler helpers in this test?
> 
> I understand that, but I think it will be easier to avoid these
> helpers
> for the moment, and make the test as simpler as possible.
> 
> (...)
> 
> > > > diff --git
> > > > a/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > > > b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > > > new file mode 100644
> > > > index 000000000000..4268e4604c5a
> > > > --- /dev/null
> > > > +++
> > > > b/tools/testing/selftests/bpf/progs/mptcp_bpf_iters_subflow.c
> > > > @@ -0,0 +1,42 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/* Copyright (c) 2024, Kylin Software */
> > > > +
> > > > +/* vmlinux.h, bpf_helpers.h and other 'define' */
> > > > +#include "bpf_tracing_net.h"
> > > > +#include "mptcp_bpf.h"
> > > > +
> > > > +char _license[] SEC("license") = "GPL";
> > > > +int subflows;
> > > > +int pid;
> > > > +
> > > > +SEC("fentry/mptcp_sched_get_send")
> > > 
> > > If I understand correctly, this hook will be called twice on the
> > > client
> > > connection for this test because the client is sending two times
> > > one
> > > byte, right?
> > 
> > Yes, in the first call, the subflows are not been added yet, so it
> > is
> > called twice. I will add this check to skip the first call:
> > 
> >        if (msk->pm.server_side || !msk->pm.subflows)
> >                return 0;
> 
> Good idea! Please add a comment above to explain you added that to
> only
> do the rest once.
> 
> If we want to have a simpler test without the scheduler helpers, we
> can
> also use another hook, e.g. just before closing the different
> subflows?
> Up to you.
> 
> Also, just to be sure: is this BPF program only used (loaded,
> running)
> by the corresponding test?
> Can you remind me why is there a PID check? Is it because all BPF
> programs are always loaded for all tests?
> 
> > > > +int BPF_PROG(trace_mptcp_sched_get_send, struct mptcp_sock
> > > > *msk)
> > > > +{
> > > > +	struct mptcp_subflow_context *subflow;
> > > > +	struct sock *ssk = NULL;
> > > > +
> > > > +	if (bpf_get_current_pid_tgid() >> 32 != pid)
> > > > +		return 0;
> > > > +
> > > > +	msk = bpf_mptcp_sock_acquire(msk);
> > > > +	if (!msk)
> > > > +		return 0;
> > > > +	bpf_for_each(mptcp_subflow, subflow, msk) {
> > > > +		if (subflow->token != msk->token)
> > > > +			break;
> > > 
> > > Out of curiosity, why is this needed? Is it needed for the
> > > verifier?
> > > Or
> > > just an extra check?
> > > Can you add a comment here explaining why this is there please?
> > 
> > I'll drop it.
> 
> Just to be clear: I'm not asking to drop it, but to understand why it
> is
> needed. The main reason is to have other people using it in future
> programs, because they saw it was used here. If it is here as an
> "assert", something that cannot be wrong, that's fine, as long as it
> is
> marked (by a comment) as is.
> 
> > > > +
> > > > +		if (!mptcp_subflow_active(subflow))
> > > > +			continue;
> > > > +
> > > > +		ssk = bpf_mptcp_subflow_tcp_sock(subflow);
> > > 
> > > Do you get 'ssk' just to use bpf_mptcp_subflow_tcp_sock() and
> > > bpf_mptcp_subflow_ctx()?
> > 
> > Yes, almost every BPF scheduler use these helpers, so test them
> > here.
> 
> For me, that's fine to keep using these helpers, even if their usage
> is
> limited in the test. You can keep it here, and keep
> bpf_mptcp_subflow_ctx(ssk) below, then use 'subflow' below to compare
> the token with the one of the msk for example. Feel free to add a
> comment mentioning that it is just to show these helpers can be used
> in
> the 'bpf_for_each()' and outside.
> 
> > > > +	}
> > > > +	bpf_mptcp_sock_release(msk);
> > > > +
> > > > +	if (!ssk)
> > > > +		return 0;
> > > > +	subflow = bpf_mptcp_subflow_ctx(ssk);
> > > > +	mptcp_subflow_set_scheduled(subflow, true);
> > > > +	subflows = subflow->subflow_id;
> > > 
> > > Here, it looks like you only check if the last subflow is in the
> > > list.
> > > 
> > > Would it not be better to count the number of subflows in the
> > > list?
> > > Or
> > > if you want to read something from 'subflow', you could add all
> > > subflow_id?
> > > 
> > >   subflows = 0;
> > > 
> > >   (...)
> > > 
> > >   bpf_for_each(mptcp_subflow, subflow, msk)
> > >       subflows += subflow->subflow_id;
> > > 
> > > By doing that, we will catch if one is missing, and the order is
> > > not
> > > important.
> > 
> > Yes, "subflows += subflow->subflow_id" is much better.
> > 
> > > 
> > > If still want to use bpf_mptcp_subflow_tcp_sock() and
> > > bpf_mptcp_subflow_ctx(), maybe you can save other fields from the
> > > last
> > > subflow: the token to compare it with the one in the msk? Or the
> > > port
> > > number or the address?
> > 
> > Sorry, I don't fully understand your last paragraph. But I stored
> > the
> > dport number in the code below. Is it the same as what you thought?
> > If
> > so, I'll send a squash-to patch for it.
> 
> I was just thinking about something you could do to keep using these
> new
> helpers: keep the ref to the last subflow, and compare with data from
> the msk, e.g.
> 
>   struct mptcp_subflow_context *subflow;
>   struct sock *sk = (struct sock *)msk;
>   struct sock *ssk = NULL;
>   int all_ids = 0;
> 
>   /* TODO: why is it needed? */

This test program is a ftrace hook, it may affect other programs, so
the pid limitation is needed. v10 has changed the test program as
"cgroup/getsockopt", no need to add this limitation any more.

>   if (bpf_get_current_pid_tgid() >> 32 != pid)
>       return 0;
> 
>   /* to do the test only once: on the client side, at the 2nd send */
>   if (msk->pm.server_side || !msk->pm.subflows)
>       return 0;
> 
>   msk = bpf_mptcp_sock_acquire(msk);
>   if (!msk)
>       return 0;
> 
>   bpf_for_each(mptcp_subflow, subflow, msk) {
>       /* Here MPTCP-specific kfunc can be called: this test is not
> doing
>        * anything really useful, only to verify the iteration works.
>        */
> 
>       /* assert: if not OK, something wrong on the kernel side */
>       if (subflow->token != msk->token)
>           break;
> 
>       /* to check we iterate over all subflows */
>       local_ids += subflow->subflow_id;
> 
>       /* only to check the following kfunc works */
>       ssk = bpf_mptcp_subflow_tcp_sock(subflow);
>   }
> 
>   if (!ssk)
>       goto out;
> 
>   /* assert: if not OK, something wrong on the kernel side */
>   if (ssk->sk_dport != sk->sk_dport)
>       goto out;
> 
>   /* only to check the following kfunc works */
>   subflow = bpf_mptcp_subflow_ctx(ssk);
>   if (subflow->token != msk->token)
>       goto out;

token is checked twice. I dropped the first one in v10.

Thanks,
-Geliang

> 
>   ids = local_ids;
> 
> 
> (not tested)
> 
> → So not using anything linked to the packet scheduler, just very
> simple
> check to make sure 'bpf_iter' + bpf_mptcp_subflow_tcp_sock() and
> bpf_mptcp_subflow_ctx() work as expected. Like that, it sounds easier
> to
> upstream as it is.
> 
> WDYT?
> 
> Cheers,
> Matt

Re: [PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog

Posted by Matthieu Baerts 1 year, 3 months ago

Hi Geliang,

On 18/10/2024 03:22, Geliang Tang wrote:
> On Tue, 2024-10-15 at 12:47 +0200, Matthieu Baerts wrote:
>>   /* TODO: why is it needed? */
> 
> This test program is a ftrace hook, it may affect other programs, so
> the pid limitation is needed. v10 has changed the test program as
> "cgroup/getsockopt", no need to add this limitation any more.

OK. Do you mean that when running the BPF selftests, all BPF test
programs are loaded? Even the non MPTCP ones? Or only the ftrace ones?

Why is "cgroup/getsockopt" safer? Because it is only executed for a
specific CGroup?

(...)

>>   /* only to check the following kfunc works */
>>   subflow = bpf_mptcp_subflow_ctx(ssk);
>>   if (subflow->token != msk->token)
>>       goto out;
> 
> token is checked twice. I dropped the first one in v10.

OK!

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.

[PATCH mptcp-next v9 1/7] bpf: Register mptcp common kfunc set
[PATCH mptcp-next v9 2/7] bpf: Add mptcp_subflow bpf_iter
[PATCH mptcp-next v9 3/7] bpf: Add mptcp_sock acquire and release helpers
[PATCH mptcp-next v9 4/7] selftests/bpf: Add mptcp_subflow bpf_iter test prog
[PATCH mptcp-next v9 5/7] selftests/bpf: More endpoints for endpoint_init
[PATCH mptcp-next v9 6/7] Squash to "selftests/bpf: Add bpf scheduler test"
[PATCH mptcp-next v9 7/7] selftests/bpf: Add mptcp_subflow bpf_iter subtest