[PATCH mptcp-next v3 0/3] BPF path manager, part 4

Geliang Tang posted 3 patches 1 year ago
git fetch https://github.com/multipath-tcp/mptcp_net-next tags/patchew/cover.1737012662.git.tanggeliang@kylinos.cn
There is a newer version of this series
[PATCH mptcp-next v3 0/3] BPF path manager, part 4
Posted by Geliang Tang 1 year ago
From: Geliang Tang <tanggeliang@kylinos.cn>

v3:
 - rename the 2nd parameter of get_local_id() from 'local' to 'skc'.
 - keep the 'msk_sport' check in mptcp_userspace_pm_get_local_id().
 - return 'err' instead of '0' in userspace_pm_subflow_create().
 - drop 'ret' variable in mptcp_pm_data_reset().
 - fix typos in commit log.

Depends on: "BPF path manager, part 3" v4
Based-on: <cover.1737012165.git.tanggeliang@kylinos.cn>

v2:
 - update get_local_id interface in patch 2.

The get_addr() and dump_addr() interfaces of the BPF userspace PM have
been dropped, as Matt suggested.

In order to implement a BPF userspace path manager, it is necessary to
unify the interfaces of the path managers. This set contains some
cleanups and refactoring to unify these interfaces in kernel space.
Finally, it defines a struct mptcp_pm_ops for a userspace path manager
like this:

struct mptcp_pm_ops {
	int (*address_announce)(struct mptcp_sock *msk,
				struct mptcp_pm_addr_entry *local);
	int (*address_remove)(struct mptcp_sock *msk, u8 id);
	int (*subflow_create)(struct mptcp_sock *msk,
			      struct mptcp_pm_addr_entry *local,
			      struct mptcp_addr_info *remote);
	int (*subflow_destroy)(struct mptcp_sock *msk,
			       struct mptcp_pm_addr_entry *local,
			       struct mptcp_addr_info *remote);
	int (*get_local_id)(struct mptcp_sock *msk,
			    struct mptcp_pm_addr_entry *skc);
	u8 (*get_flags)(struct mptcp_sock *msk,
			struct mptcp_addr_info *skc);
	int (*set_flags)(struct mptcp_sock *msk,
			 struct mptcp_pm_addr_entry *local,
			 struct mptcp_addr_info *remote);

	u8			type;
	struct module		*owner;
	struct list_head	list;

	void (*init)(struct mptcp_sock *msk);
	void (*release)(struct mptcp_sock *msk);
} ____cacheline_aligned_in_smp;

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/74

Geliang Tang (3):
  mptcp: define struct mptcp_pm_ops
  mptcp: register default userspace pm
  mptcp: init and release mptcp_pm_ops

 include/net/mptcp.h      |  27 +++
 net/mptcp/pm.c           |   5 +
 net/mptcp/pm_userspace.c | 374 ++++++++++++++++++++++++++++-----------
 net/mptcp/protocol.c     |   1 +
 net/mptcp/protocol.h     |   9 +
 5 files changed, 313 insertions(+), 103 deletions(-)

-- 
2.43.0
Re: [PATCH mptcp-next v3 0/3] BPF path manager, part 4
Posted by Matthieu Baerts 1 year ago
Hi Geliang,

On 16/01/2025 08:38, Geliang Tang wrote:
> From: Geliang Tang <tanggeliang@kylinos.cn>
> 
> v3:
>  - rename the 2nd parameter of get_local_id() from 'local' to 'skc'.
>  - keep the 'msk_sport' check in mptcp_userspace_pm_get_local_id().
>  - return 'err' instead of '0' in userspace_pm_subflow_create().
>  - drop 'ret' variable inmptcp_pm_data_reset().
>  - fix typos in commit log.
> 
> Depends on: "BPF path manager, part 3" v4
> Based-on: <cover.1737012165.git.tanggeliang@kylinos.cn>
> 
> v2:
>  - update get_local_id interface in patch 2.
> 
> get_addr() and dump_addr() interfaces of BPF userspace pm are dropped
> as Matt suggested.
> 
> In order to implement BPF userspace path manager, it is necessary to
> unify the interfaces of the path manager. This set contains some
> cleanups and refactoring to unify the interfaces in kernel space.
> Finally, define a struct mptcp_pm_ops for a userspace path manager
> like this:

Thank you for this series.

From what I see, when a BPF path-manager is loaded, it will replace the
userspace one, right?

It looks a bit confusing to know when the (netlink) userspace PM is
used, and when the BPF one is. Why not define a new type instead,
similar to what we have with the packet scheduler? net.mptcp.pm_type
could be extended to select a BPF one with a certain ID, no? (Or a new
type for BPF, then another sysctl to select the name?)

Why not, instead, and similar to the scheduler, have a proper
path-manager interface?

For the moment, when the core gets something that might interest the PM,
it will directly call a PM helper, then the right PM is picked thanks to
mptcp_pm_is_userspace() or mptcp_pm_is_kernel(). That starts to get a
bit messy when we want to add more PMs. Instead, the core could simply
use the new ops structure, using a pointer stored in the msk.

In other words, when the core needs to notify the PM, it will simply
call (directly, or via the PM worker): msk->pm.ops->something_happened()
(or via helpers in pm.c that could check if the callback is set, and
call mptcp_event() for example)

That way, we have a clear separation of all the different PMs, and the
interface is clear to understand the interactions between the core and
the PMs.

WDYT?

I hope this doesn't change all your plans, but I think having such an
interface would help with the maintenance in general.


> struct mptcp_pm_ops {
> 	int (*address_announce)(struct mptcp_sock *msk,
> 				struct mptcp_pm_addr_entry *local);
> 	int (*address_remove)(struct mptcp_sock *msk, u8 id);
> 	int (*subflow_create)(struct mptcp_sock *msk,
> 			      struct mptcp_pm_addr_entry *local,
> 			      struct mptcp_addr_info *remote);
> 	int (*subflow_destroy)(struct mptcp_sock *msk,
> 			       struct mptcp_pm_addr_entry *local,
> 			       struct mptcp_addr_info *remote);

Small detail: I would add a 'd' at the end to clearly understand it is
linked to an event, not an action, e.g. a new address has just been
announced, here is a notification for the PM. So:

  address_announced
  address_removed
  subflow_created
  subflow_destroyed

> 	int (*get_local_id)(struct mptcp_sock *msk,
> 			    struct mptcp_pm_addr_entry *skc);
> 	u8 (*get_flags)(struct mptcp_sock *msk,
> 			struct mptcp_addr_info *skc);
> 	int (*set_flags)(struct mptcp_sock *msk,
> 			 struct mptcp_pm_addr_entry *local,
> 			 struct mptcp_addr_info *remote);

I think it should only be get_priority and set_priority (i.e. backup):
the core doesn't need to know the other (internal) flags.

Also, when looking at mptcp_event_type, I think some events are missing
here for a PM to handle the different cases:

  created: a new MPTCP connection has been created, to init some stuff
  established: the connection is fully established, the PM can trigger actions
  closed: to free some stuff

And probably 'listener_created' and 'listener_closed' too.

One last thing: I guess the current 'subflow_created' is more a
'subflow_established', no?
And 'subflow_destroyed' could be 'subflow_closed' to keep the same
naming (and also because I guess this will be called just before the
subflow is actually destroyed).

> 
> 	u8			type;

Either a type (0 for in-kernel, 1 for userspace, >1 for BPF) or a name.

> 	struct module		*owner;
> 	struct list_head	list;
> 
> 	void (*init)(struct mptcp_sock *msk);
> 	void (*release)(struct mptcp_sock *msk);

Is the init/release done only once, or for each MPTCP connection handled
by this PM?

> } ____cacheline_aligned_in_smp;
> 
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/74

Better to wait to have the full BPF interface and an example in the
selftests before closing this I think.


Two last things:

- better keeping the 'mptcp_pm' prefixes for the new helpers, e.g.
mptcp_pm_register instead of mptcp_register_path_manager.

- for the commit title, maybe clearer to prefix them with 'mptcp: pm:',
and even 'mptcp: pm: in-kernel:', 'mptcp: pm: userspace:' and 'mptcp:
pm: bpf' when it makes sense.

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.
Re: [PATCH mptcp-next v3 0/3] BPF path manager, part 4
Posted by Geliang Tang 1 year ago
Hi Matt,

Thanks for your review.

On Thu, 2025-01-23 at 13:43 +0100, Matthieu Baerts wrote:
> Hi Geliang,
> 
> On 16/01/2025 08:38, Geliang Tang wrote:
> > [...]
> 
> Thank you for this series.
> 
> From what I see, when a BPF path-manager is loaded, it will replace
> the
> userspace one, right?
> 
> It looks a bit confusing to know when the (netlink) userspace PM is
> used, and when the BPF one is. Why not defining a new type instead?
> Similar to what we have with the packet scheduler? net.mptcp.pm_type
> could be extended to select a BPF one of a certain ID, no? (or a new
> type for BPF, then another sysctl to select the name?)
> 
> Why not, instead, and similar to the scheduler, have a proper
> path-manager interface?
> 
> For the moment, when the core gets something that might interest the
> PM,
> it will directly call a PM helper, then the right PM is picked thanks
> to
> mptcp_pm_is_userspace() or mptcp_pm_is_kernel(). That starts to get a
> bit messy when we want to add more PMs. Instead, the core could
> simply
> use the new ops structure, using a pointer stored in the msk.
> 
> In other words, when the core needs to notify the PM, it will simply
> call (directly, or via the PM worker): msk->pm.ops-
> >something_happened()
> (or via helpers in pm.c that could check if the callback is set, and
> call mptcp_event() for example)
> 
> That way, we have a clear separation of all the different PMs, and
> the
> interface is clear to understand the interactions between the core
> and
> the PMs.
> 
> WDYT?
> 
> I hope this doesn't change all your plans. But I think having such
> interface would help for the maintenance in general.

It would be best if struct mptcp_pm_ops could be used to represent both
the userspace PM and the in-kernel PM, but it's not easy to represent
the in-kernel PM right now, as the code in pm_netlink.c would need to
be adjusted. So this set still uses struct mptcp_pm_ops to implement
MPTCP_PM_TYPE_BPF_USERSPACE first, and uses the
mptcp_pm_is_userspace() function to distinguish the different PMs.

In the future, if we feel it's necessary, we can add an
MPTCP_PM_TYPE_BPF_KERNEL PM type to implement a BPF in-kernel PM, and
then gradually remove the use of mptcp_pm_is_userspace() and
mptcp_pm_is_kernel().

Thanks,
-Geliang

Re: [PATCH mptcp-next v3 0/3] BPF path manager, part 4
Posted by Geliang Tang 11 months, 4 weeks ago
Hi Matt,

On Fri, 2025-02-07 at 19:21 +0800, Geliang Tang wrote:
> Hi Matt,
> 
> Thanks for your review.
> 
> On Thu, 2025-01-23 at 13:43 +0100, Matthieu Baerts wrote:
> > [...]
> 
> It would be best if struct mptcp_pm_ops could be used to represent
> both
> userspace pm and in-kernel pm, but it's not easy to represent in-
> kernel
> pm right now, and the code in pm_netlink.c needs to be adjusted. So I
> still use struct mptcp_pm_ops to implement
> MPTCP_PM_TYPE_BPF_USERSPACE
> first in the set, and use mptcp_pm_is_userspace() function to
> distinguish different PMs.

I have made some new progress recently: I adjusted the code in
pm_netlink.c and implemented the get_local_id(), get_priority() and
set_priority() interfaces for the in-kernel PM. It seems that we can
indeed use struct mptcp_pm_ops to represent both the userspace PM and
the in-kernel PM, and then remove the use of mptcp_pm_is_userspace().
I will post a new version of this set in the near future.

Thanks,
-Geliang

Re: [PATCH mptcp-next v3 0/3] BPF path manager, part 4
Posted by MPTCP CI 1 year ago
Hi Geliang,

Thank you for your modifications, that's great!

Our CI did some validations and here is its report:

- KVM Validation: normal: Success! ✅
- KVM Validation: debug: Unstable: 1 failed test(s): selftest_mptcp_join - Critical: 1 Call Trace(s) ❌
- KVM Validation: btf-normal (only bpftest_all): Success! ✅
- KVM Validation: btf-debug (only bpftest_all): Success! ✅
- Task: https://github.com/multipath-tcp/mptcp_net-next/actions/runs/12804456330

Initiator: Patchew Applier
Commits: https://github.com/multipath-tcp/mptcp_net-next/commits/50059925b7f3
Patchwork: https://patchwork.kernel.org/project/mptcp/list/?series=925962


If there are some issues, you can reproduce them using the same environment as
the one used by the CI thanks to a docker image, e.g.:

    $ cd [kernel source code]
    $ docker run -v "${PWD}:${PWD}:rw" -w "${PWD}" --privileged --rm -it \
        --pull always mptcp/mptcp-upstream-virtme-docker:latest \
        auto-normal

For more details:

    https://github.com/multipath-tcp/mptcp-upstream-virtme-docker


Please note that despite all the efforts that have been already done to have a
stable tests suite when executed on a public CI like here, it is possible some
reported issues are not due to your modifications. Still, do not hesitate to
help us improve that ;-)

Cheers,
MPTCP GH Action bot
Bot operated by Matthieu Baerts (NGI0 Core)