[PATCH mptcp-next RFC 0/4] mptcp: update scheduler API

Gregory Detal posted 4 patches 3 months, 3 weeks ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/20240527-sched._5Fper._5Fpacket-v1-0-09a41d405f7c@gmail.com
include/net/mptcp.h                              | 29 ++++++++++++++++++++++++
net/mptcp/bpf.c                                  | 25 ++++++++++++++++++--
net/mptcp/protocol.c                             | 14 ++++++++----
net/mptcp/protocol.h                             |  9 ++++++++
net/mptcp/sched.c                                | 20 ++++++++++++++++
tools/testing/selftests/bpf/progs/mptcp_bpf.h    |  1 +
tools/testing/selftests/bpf/progs/mptcp_bpf_rr.c | 18 +++++++++++++++
7 files changed, 110 insertions(+), 6 deletions(-)
[PATCH mptcp-next RFC 0/4] mptcp: update scheduler API
Posted by Gregory Detal 3 months, 3 weeks ago
This patchset proposes an updated API for mptcp schedulers to provide
more flexibility for future scheduling strategies. This does not change
the current behavior of get_subflow. It adds an optional hook to control
scheduling behavior after selecting a subflow in get_subflow.

The main idea is to provide a way to perform a scheduling per-MSS or per
chunk of data. To demonstrate this, the mptcp_bpf_rr scheduler has been
modified to send one MSS per subflow in an alternating fashion.

The other potential application of this API could be to define a
scheduler that will proactivelly reinject data on available subflows
based on some heuristics (eg. when one subflow RTT increases when moving
away from WiFi AP). It could also allow to limit the usage of a specific
subflow, eg., in the case of high BDP.

I am mainly looking for feedback to see whether this design make sense.
Matthieu already did a quick pass on it.

The API introduces a push() function (name TBD) that takes the subflow
and data chunk (sequence number and length) as arguments. The scheduler
can then control packet scheduling by:
- Setting data transmission limits (e.g., per-MSS on a subflow or per window).
- Setting flags associated with the data. Currently, only
  MPTCP_SCHED_FLAG_RESCHEDULE is defined, forcing the scheduler to call
  get_subflow after sending the chunk.

I am envisioning another flag MPTCP_SCHED_FLAG_REINJECT that will ensure
that chunk of data can be rescheduled directly on another subflow.

I've written some packetdrill tests to validate the update RR scheduler
the test are available on my fork:

https://github.com/gdetal/packetdrill/tree/mptcp-net-next/gtests/net/mptcp-bpf/round-robin

Open questions:
- Is the current API for setting flags and limits appropriate? What are
  eBPF best practices to pass back information to kernel ?
- should get_scheduler and this new API be merged somehow as to provide
  a single way of scheduling data ? Eg. could pass chunk and return
  flags, limit and subflow to send this data to.

Co-Developed-By: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Gregory Detal <gregory.detal@gmail.com>
---
Gregory Detal (4):
      mptcp: add push sched callback
      mptcp: use new push callback to schedule chunks
      mptcp: bpf: allow to write to mptcp_sched_chunk
      selftests/bpf: mptcp RR: send 1 MSS on each subflow

 include/net/mptcp.h                              | 29 ++++++++++++++++++++++++
 net/mptcp/bpf.c                                  | 25 ++++++++++++++++++--
 net/mptcp/protocol.c                             | 14 ++++++++----
 net/mptcp/protocol.h                             |  9 ++++++++
 net/mptcp/sched.c                                | 20 ++++++++++++++++
 tools/testing/selftests/bpf/progs/mptcp_bpf.h    |  1 +
 tools/testing/selftests/bpf/progs/mptcp_bpf_rr.c | 18 +++++++++++++++
 7 files changed, 110 insertions(+), 6 deletions(-)
---
base-commit: c98f8dc106a6d6c4424f2e612762821e1be14bf0
change-id: 20240503-sched_per_packet-d6e4488e54e8

Best regards,
-- 
Gregory Detal <gregory.detal@gmail.com>
Re: [PATCH mptcp-next RFC 0/4] mptcp: update scheduler API
Posted by Mat Martineau 3 months ago
On Mon, 27 May 2024, Gregory Detal wrote:

> This patchset proposes an updated API for mptcp schedulers to provide
> more flexibility for future scheduling strategies. This does not change
> the current behavior of get_subflow. It adds an optional hook to control
> scheduling behavior after selecting a subflow in get_subflow.
>
> The main idea is to provide a way to perform a scheduling per-MSS or per
> chunk of data. To demonstrate this, the mptcp_bpf_rr scheduler has been
> modified to send one MSS per subflow in an alternating fashion.
>
> The other potential application of this API could be to define a
> scheduler that will proactivelly reinject data on available subflows
> based on some heuristics (eg. when one subflow RTT increases when moving
> away from WiFi AP). It could also allow to limit the usage of a specific
> subflow, eg., in the case of high BDP.
>
> I am mainly looking for feedback to see whether this design make sense.
> Matthieu already did a quick pass on it.
>

Hi Gregory -

Sorry about the delay, thanks for working on this scheduler code!

> The API introduces a push() function (name TBD) that takes the subflow
> and data chunk (sequence number and length) as arguments. The scheduler
> can then control packet scheduling by:
> - Setting data transmission limits (e.g., per-MSS on a subflow or per window).
> - Setting flags associated with the data. Currently, only
>  MPTCP_SCHED_FLAG_RESCHEDULE is defined, forcing the scheduler to call
>  get_subflow after sending the chunk.
>
> I am envisioning another flag MPTCP_SCHED_FLAG_REINJECT that will ensure
> that chunk of data can be rescheduled directly on another subflow.
>
> I've written some packetdrill tests to validate the update RR scheduler
> the test are available on my fork:
>
> https://github.com/gdetal/packetdrill/tree/mptcp-net-next/gtests/net/mptcp-bpf/round-robin
>
> Open questions:
> - Is the current API for setting flags and limits appropriate? What are
>  eBPF best practices to pass back information to kernel ?

I think the approach of using a regular u16 for the flags and preprocessor 
constants like MPTCP_SCHED_FLAG_RESCHEDULE isn't a good fit for eBPF. The 
main reasons I see:

* The user's eBPF code can set any combination of bits in the u16 flags 
without the kernel automatically checking that those bits are 
known/expected.

* The flag values as preprocessor constants can't be represented in BTF 
format to allow compiling the BPF scheduler using type information from 
the kernel (rather than manually setting those #defines in the scheduler 
code).

Even though it's not as space-efficient, I think individual struct members 
would be better than an integer bitfield for this kind of flag. Then the 
struct_access hook can check what's supported at load time.

> - should get_scheduler and this new API be merged somehow as to provide
>  a single way of scheduling data ? Eg. could pass chunk and return
>  flags, limit and subflow to send this data to.

Yes, I think it would be good to consolidate get_scheduler and the new 
API, so two calls out to the BPF scheduler are not required for each 
packet. It's ok to change the internal scheduler APIs, and I think that 
will be required to support reinjection.

- Mat


>
> Co-Developed-By: Matthieu Baerts <matttbe@kernel.org>
> Signed-off-by: Gregory Detal <gregory.detal@gmail.com>
> ---
> Gregory Detal (4):
>      mptcp: add push sched callback
>      mptcp: use new push callback to schedule chunks
>      mptcp: bpf: allow to write to mptcp_sched_chunk
>      selftests/bpf: mptcp RR: send 1 MSS on each subflow
>
> include/net/mptcp.h                              | 29 ++++++++++++++++++++++++
> net/mptcp/bpf.c                                  | 25 ++++++++++++++++++--
> net/mptcp/protocol.c                             | 14 ++++++++----
> net/mptcp/protocol.h                             |  9 ++++++++
> net/mptcp/sched.c                                | 20 ++++++++++++++++
> tools/testing/selftests/bpf/progs/mptcp_bpf.h    |  1 +
> tools/testing/selftests/bpf/progs/mptcp_bpf_rr.c | 18 +++++++++++++++
> 7 files changed, 110 insertions(+), 6 deletions(-)
> ---
> base-commit: c98f8dc106a6d6c4424f2e612762821e1be14bf0
> change-id: 20240503-sched_per_packet-d6e4488e54e8
>
> Best regards,
> -- 
> Gregory Detal <gregory.detal@gmail.com>
>
>
>
Re: [PATCH mptcp-next RFC 0/4] mptcp: update scheduler API
Posted by MPTCP CI 3 months, 3 weeks ago
Hi Gregory,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Success! ✅
- KVM Validation: btf (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/9255729312

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/d60094641280
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=856158


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)