:p
atchew
Login
Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag is not used), the in-kernel PM will create new subflows using the local address the routing configuration will pick. It would be easier to pick local addresses from a selected list of endpoints, and use it only once, than relying on routing rules. Use case: both the client (C) and the server (S) have two addresses (a and b). The client establishes the connection between C(a) and S(a). Once established, the server announces its additional address S(b). Once received, the client connects to it using its second address C(b). Compared to a situation without the 'address' endpoint for C(b), the client didn't use this address C(b) to establish a subflow to the server's primary address S(a). So at the end, we have: C S C(a) --- S(a) C(b) --- S(b) In case of a 3rd address on each side (C(c) and S(c)), upon the reception of an ADD_ADDR with S(c), the client should not pick C(b) because it has already been used. C(c) should then be used. Note that this situation is currently possible if C doesn't add any endpoint, but configure the routing in order to pick C(b) for the route to S(b), and pick C(c) for the route to S(c). That doesn't sound very practical because it means knowing in advance the IP addresses that will be used and announced by the server. Patches 1 & 2: some clean-ups and refactoring. Patch 3: "standardisation" and small perf improvement. Patch 4: squash to patches for a commit queued for net-next. Patch 5: new 'address' endpoints. Patch 6: validation using selftests. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Matthieu Baerts (NGI0) (6): mptcp: pm: in-kernel: remove stale_loss_cnt mptcp: pm: in-kernel: reduce pernet struct size mptcp: pm: in-kernel: compare IDs instead of addresses Squash to "mptcp: pm: in-kernel: usable client side with C-flag" mptcp: pm: in-kernel: add 'address' endpoints selftests: mptcp: join: validate new 'address' endpoints include/uapi/linux/mptcp.h | 6 +- net/mptcp/pm_kernel.c | 228 ++++++++++++++++-------- net/mptcp/protocol.h | 9 +- net/mptcp/sockopt.c | 2 + tools/testing/selftests/net/mptcp/mptcp_join.sh | 56 ++++++ tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 + 6 files changed, 229 insertions(+), 81 deletions(-) --- base-commit: 77807b94c731081ef3d97e96dabcea7aae2bfe15 change-id: 20250918-pm-kern-endp-add_addr-new-a20893e45389 Best regards, -- Matthieu Baerts (NGI0) <matttbe@kernel.org>
It is currently not used. It was in fact never used since its introduction in commit ff5a0b421cb2 ("mptcp: faster active backup recovery"). It was probably initially added to struct pm_nl_pernet during the development of this commit, before being added to struct mptcp_pernet in ctrl.c, but not removed from the first place. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ struct pm_nl_pernet { spinlock_t lock; struct list_head endp_list; unsigned int endpoints; - unsigned int stale_loss_cnt; unsigned int endp_signal_max; unsigned int endp_subflow_max; unsigned int limit_add_addr_accepted; @@ -XXX,XX +XXX,XX @@ static int __net_init pm_nl_init_net(struct net *net) /* Cit. 2 subflows ought to be enough for anybody. */ pernet->limit_extra_subflows = 2; pernet->next_id = 1; - pernet->stale_loss_cnt = 4; spin_lock_init(&pernet->lock); /* No need to initialize other pernet fields, the struct is zeroed at -- 2.51.0
All the 'unsigned int' variables from the 'pm_nl_pernet' structure are bounded to MPTCP_PM_ADDR_MAX, currently set to 8. The endpoint ID is also bounded by the protocol to 8-bit. MPTCP_PM_ADDR_MAX, if extended later, will never over 8-bit. So no need to use 'unsigned int' variables, 'u8' is enough. Note that the exposed counters in MPTCP_INFO are already limited to 8-bit, same for pm->extra_subflows, and others. So it seems even better to limit them to 8-bit. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 59 ++++++++++++++++++++------------------------------- net/mptcp/protocol.h | 8 +++---- 2 files changed, 27 insertions(+), 40 deletions(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ struct pm_nl_pernet { /* protects pernet updates */ spinlock_t lock; struct list_head endp_list; - unsigned int endpoints; - unsigned int endp_signal_max; - unsigned int endp_subflow_max; - unsigned int limit_add_addr_accepted; - unsigned int limit_extra_subflows; - unsigned int next_id; + u8 endpoints; + u8 endp_signal_max; + u8 endp_subflow_max; + u8 limit_add_addr_accepted; + u8 limit_extra_subflows; + u8 next_id; DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); }; @@ -XXX,XX +XXX,XX @@ static struct pm_nl_pernet *genl_info_pm_nl(struct genl_info *info) return pm_nl_get_pernet(genl_info_net(info)); } -unsigned int mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk) +u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk) { const struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ unsigned int mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_signal_max); -unsigned int mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) +u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ unsigned int mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_subflow_max); -unsigned int mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) +u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ unsigned int mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_limit_add_addr_accepted); -unsigned int mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk) +u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ fill_remote_addresses_fullmesh(struct mptcp_sock *msk, struct mptcp_addr_info *local, struct mptcp_addr_info *addrs) { + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); bool deny_id0 = READ_ONCE(msk->pm.remote_deny_join_id0); DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); struct sock *sk = (struct sock *)msk, *ssk; struct mptcp_subflow_context *subflow; - unsigned int limit_extra_subflows; int i = 0; - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); - /* Forbid creation of new subflows matching existing ones, possibly * already created by incoming ADD_ADDR */ @@ -XXX,XX +XXX,XX @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); + struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + u8 endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); + u8 endp_signal_max = mptcp_pm_get_endp_signal_max(msk); struct sock *sk = (struct sock *)msk; - unsigned int limit_extra_subflows; bool signal_and_subflow = false; - unsigned int endp_subflow_max; - unsigned int endp_signal_max; - struct pm_nl_pernet *pernet; struct mptcp_pm_local local; - pernet = pm_nl_get_pernet(sock_net(sk)); - - endp_signal_max = mptcp_pm_get_endp_signal_max(msk); - endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); - /* do lazy endpoint usage accounting for the MPC subflows */ if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(msk->first); @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, struct mptcp_pm_local *locals, bool c_flag_case) { + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; - unsigned int limit_extra_subflows; struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; mptcp_local_address((struct sock_common *)msk, &mpc_addr); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->endp_list, list) { @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct mptcp_addr_info *remote, struct mptcp_pm_local *locals) { - unsigned int endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + u8 endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); struct sock *sk = (struct sock *)msk; - unsigned int limit_extra_subflows; struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; mptcp_local_address((struct sock_common *)msk, &mpc_addr); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); while (msk->pm.local_addr_used < endp_subflow_max) { local = &locals[i]; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote, static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) { + u8 limit_add_addr_accepted = mptcp_pm_get_limit_add_addr_accepted(msk); + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); struct mptcp_pm_local locals[MPTCP_PM_ADDR_MAX]; struct sock *sk = (struct sock *)msk; - unsigned int limit_extra_subflows; - unsigned int limit_add_addr_accepted; struct mptcp_addr_info remote; bool sf_created = false; int i, nr; - limit_add_addr_accepted = mptcp_pm_get_limit_add_addr_accepted(msk); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); - pr_debug("accepted %d:%d remote family %d\n", msk->pm.add_addr_accepted, limit_add_addr_accepted, msk->pm.remote.family); @@ -XXX,XX +XXX,XX @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) void mptcp_pm_nl_rm_addr(struct mptcp_sock *msk, u8 rm_id) { if (rm_id && WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) { - unsigned int limit_add_addr_accepted = + u8 limit_add_addr_accepted = mptcp_pm_get_limit_add_addr_accepted(msk); /* Note: if the subflow has been closed before, this @@ -XXX,XX +XXX,XX @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, bool needs_id, bool replace) { struct mptcp_pm_addr_entry *cur, *del_entry = NULL; - unsigned int addr_max; int ret = -EINVAL; + u8 addr_max; spin_lock_bh(&pernet->lock); /* to keep the code simple, don't do IDR-like allocation for address ID, @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); struct mptcp_pm_addr_entry addr, *entry; - unsigned int addr_max; struct nlattr *attr; + u8 addr_max; int ret; if (GENL_REQ_ATTR_CHECK(info, MPTCP_PM_ENDPOINT_ADDR)) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -XXX,XX +XXX,XX @@ void __init mptcp_pm_userspace_register(void); void __init mptcp_pm_nl_init(void); void mptcp_pm_worker(struct mptcp_sock *msk); void __mptcp_pm_kernel_worker(struct mptcp_sock *msk); -unsigned int mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); -unsigned int mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); -unsigned int mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); -unsigned int mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); +u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); /* called under PM lock */ static inline void __mptcp_pm_close_subflow(struct mptcp_sock *msk) -- 2.51.0
When receiving an ADD_ADDR right after the 3WHS, the connection will switch to 'fully established'. It means the MPTCP worker will be called to treat two events, in this order: ADD_ADDR_RECEIVED, PM_ESTABLISHED. The MPTCP endpoints cannot have the ID 0, because it is reserved to the address and port used by the initial subflow. To be able to deal with this case in different places, msk->mpc_endpoint_id contains the endpoint ID linked to the initial subflow. This variable was only set when treating the first PM_ESTABLISHED event, after ADD_ADDR_RECEIVED. That's why in fill_local_addresses_vec(), the endpoint addresses were compared with the one of the initial subflow, instead of only comparing the IDs. Instead, msk->mpc_endpoint_id is now set when treating ADD_ADDR_RECEIVED as well, if needed, then the IDs can be compared. To be able to do so, the code doing that is now in a dedicated helper, and called from the functions linked to the two actions. While at it, mptcp_endp_get_local_id() has also been moved up, next to this new helper, because they are linked, and to be able to use it in fill_local_addresses_vec() in the next commit. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 81 +++++++++++++++++++++++++++------------------------ 1 file changed, 43 insertions(+), 38 deletions(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) return NULL; } +static u8 mptcp_endp_get_local_id(struct mptcp_sock *msk, + const struct mptcp_addr_info *addr) +{ + return msk->mpc_endpoint_id == addr->id ? 0 : addr->id; +} + +static void mptcp_set_mpc_endpoint_id(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_pm_addr_entry *entry; + struct mptcp_addr_info mpc_addr; + struct pm_nl_pernet *pernet; + bool backup = false; + + /* do lazy endpoint usage accounting for the MPC subflows */ + if (likely(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED)) || + !msk->first) + return; + + subflow = mptcp_subflow_ctx(msk->first); + pernet = pm_nl_get_pernet_from_msk(msk); + + mptcp_local_address((struct sock_common *)msk->first, &mpc_addr); + rcu_read_lock(); + entry = __lookup_addr(pernet, &mpc_addr); + if (entry) { + __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); + msk->mpc_endpoint_id = entry->addr.id; + backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); + } + rcu_read_unlock(); + + if (backup) + mptcp_pm_send_ack(msk, subflow, true, backup); + + msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED); +} + static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); @@ -XXX,XX +XXX,XX @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) bool signal_and_subflow = false; struct mptcp_pm_local local; - /* do lazy endpoint usage accounting for the MPC subflows */ - if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) { - struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(msk->first); - struct mptcp_pm_addr_entry *entry; - struct mptcp_addr_info mpc_addr; - bool backup = false; - - mptcp_local_address((struct sock_common *)msk->first, &mpc_addr); - rcu_read_lock(); - entry = __lookup_addr(pernet, &mpc_addr); - if (entry) { - __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); - msk->mpc_endpoint_id = entry->addr.id; - backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); - } - rcu_read_unlock(); - - if (backup) - mptcp_pm_send_ack(msk, subflow, true, backup); - - msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED); - } + mptcp_set_mpc_endpoint_id(msk); pr_debug("local %d:%d signal %d:%d subflows %d:%d\n", msk->pm.local_addr_used, endp_subflow_max, @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; - struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; - mptcp_local_address((struct sock_common *)msk, &mpc_addr); - rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->endp_list, list) { if (!(entry->flags & MPTCP_PM_ADDR_FLAG_FULLMESH)) @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); /* Special case for ID0: set the correct ID */ - if (mptcp_addresses_equal(&local->addr, &mpc_addr, - local->addr.port)) + if (local->addr.id == msk->mpc_endpoint_id) local->addr.id = 0; msk->pm.extra_subflows++; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); u8 endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); struct sock *sk = (struct sock *)msk; - struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; - mptcp_local_address((struct sock_common *)msk, &mpc_addr); - while (msk->pm.local_addr_used < endp_subflow_max) { local = &locals[i]; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, if (!mptcp_pm_addr_families_match(sk, &local->addr, remote)) continue; - if (mptcp_addresses_equal(&local->addr, &mpc_addr, - local->addr.port)) + if (local->addr.id == msk->mpc_endpoint_id) continue; msk->pm.local_addr_used++; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote, bool c_flag_case = remote->id && mptcp_pm_add_addr_c_flag_case(msk); int i; + mptcp_set_mpc_endpoint_id(msk); + /* If there is at least one MPTCP endpoint with a fullmesh flag */ i = fill_local_addresses_vec_fullmesh(msk, remote, locals, c_flag_case); if (i) @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info) return ret; } -static u8 mptcp_endp_get_local_id(struct mptcp_sock *msk, - const struct mptcp_addr_info *addr) -{ - return msk->mpc_endpoint_id == addr->id ? 0 : addr->id; -} - static bool mptcp_pm_remove_anno_addr(struct mptcp_sock *msk, const struct mptcp_addr_info *addr, bool force) -- 2.51.0
In this special case (fullmesh + subflow + c-flag), local_addr_used should be incremented for new subflows not involving local ID0. Similar to what is done when receiving an ADD_ADR in the non-fullmesh case, or in the subflow only case. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, local->flags = entry->flags; local->ifindex = entry->ifindex; - if (c_flag_case && (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) + if (c_flag_case && (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) { __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + if (local->addr.id != msk->mpc_endpoint_id) + msk->pm.local_addr_used++; + } + /* Special case for ID0: set the correct ID */ if (local->addr.id == msk->mpc_endpoint_id) local->addr.id = 0; -- 2.51.0
Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag is not used), the in-kernel PM will create new subflows using the local address the routing configuration will pick. It would be easier to pick local addresses from a selected list of endpoints, and use it only once, than relying on routing rules. Use case: both the client (C) and the server (S) have two addresses (a and b). The client establishes the connection between C(a) and S(a). Once established, the server announces its additional address S(b). Once received, the client connects to it using its second address C(b). Compared to a situation without the 'address' endpoint for C(b), the client didn't use this address C(b) to establish a subflow to the server's primary address S(a). So at the end, we have: C S C(a) --- S(a) C(b) --- S(b) In case of a 3rd address on each side (C(c) and S(c)), upon the reception of an ADD_ADDR with S(c), the client should not pick C(b) because it has already been used. C(c) should then be used. Note that this situation is currently possible if C doesn't add any endpoint, but configure the routing in order to pick C(b) for the route to S(b), and pick C(c) for the route to S(c). That doesn't sound very practical because it means knowing in advance the IP addresses that will be used and announced by the server. In the code, the new endpoint type is added. Similar to the other subflow types, an MPTCP_INFO counter is added. While at it, hole are now commented in struct mptcp_info, to remember next time that these holes can no longer be used. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/503 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- include/uapi/linux/mptcp.h | 6 +++- net/mptcp/pm_kernel.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.h | 1 + net/mptcp/sockopt.c | 2 ++ 4 files changed, 90 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h index XXXXXXX..XXXXXXX 100644 --- a/include/uapi/linux/mptcp.h +++ b/include/uapi/linux/mptcp.h @@ -XXX,XX +XXX,XX @@ #define MPTCP_PM_ADDR_FLAG_BACKUP _BITUL(2) #define MPTCP_PM_ADDR_FLAG_FULLMESH _BITUL(3) #define MPTCP_PM_ADDR_FLAG_IMPLICIT _BITUL(4) +#define MPTCP_PM_ADDR_FLAG_ADDRESS _BITUL(5) struct mptcp_info { __u8 mptcpi_subflows; @@ -XXX,XX +XXX,XX @@ struct mptcp_info { #define mptcpi_endp_signal_max mptcpi_add_addr_signal_max __u8 mptcpi_add_addr_accepted_max; #define mptcpi_limit_add_addr_accepted mptcpi_add_addr_accepted_max + /* 16-bit hole that can no longer be filled */ __u32 mptcpi_flags; __u32 mptcpi_token; __u64 mptcpi_write_seq; @@ -XXX,XX +XXX,XX @@ struct mptcp_info { __u8 mptcpi_local_addr_max; #define mptcpi_endp_subflow_max mptcpi_local_addr_max __u8 mptcpi_csum_enabled; + /* 8-bit hole that can no longer be filled */ __u32 mptcpi_retransmits; __u64 mptcpi_bytes_retrans; __u64 mptcpi_bytes_sent; __u64 mptcpi_bytes_received; __u64 mptcpi_bytes_acked; __u8 mptcpi_subflows_total; - __u8 reserved[3]; + __u8 mptcpi_endp_address_max; + __u8 reserved[2]; __u32 mptcpi_last_data_sent; __u32 mptcpi_last_data_recv; __u32 mptcpi_last_ack_recv; diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ struct pm_nl_pernet { u8 endpoints; u8 endp_signal_max; u8 endp_subflow_max; + u8 endp_address_max; u8 limit_add_addr_accepted; u8 limit_extra_subflows; u8 next_id; @@ -XXX,XX +XXX,XX @@ u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_subflow_max); +u8 mptcp_pm_get_endp_address_max(const struct mptcp_sock *msk) +{ + struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + + return READ_ONCE(pernet->endp_address_max); +} +EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_address_max); + u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, return i; } +static unsigned int +fill_local_addresses_vec_address(struct mptcp_sock *msk, + struct mptcp_addr_info *remote, + struct mptcp_pm_local *locals) +{ + struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); + struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; + struct mptcp_pm_addr_entry *entry; + struct mptcp_pm_local *local; + int i = 0; + + /* Forbid creation of new subflows matching existing ones, possibly + * already created by 'subflow' endpoints + */ + bitmap_zero(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + if ((1 << inet_sk_state_load(ssk)) & + (TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING | TCPF_CLOSE)) + continue; + + __set_bit(READ_ONCE(subflow->local_id), unavail_id); + } + + rcu_read_lock(); + list_for_each_entry_rcu(entry, &pernet->endp_list, list) { + if (!(entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS)) + continue; + + if (!mptcp_pm_addr_families_match(sk, &entry->addr, remote)) + continue; + + if (test_bit(mptcp_endp_get_local_id(msk, &entry->addr), + unavail_id)) + continue; + + local = &locals[i]; + local->addr = entry->addr; + local->flags = entry->flags; + local->ifindex = entry->ifindex; + + if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { + __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + + if (local->addr.id != msk->mpc_endpoint_id) + msk->pm.local_addr_used++; + } + + msk->pm.extra_subflows++; + i++; + break; + } + rcu_read_unlock(); + + return i; +} + static unsigned int fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct mptcp_addr_info *remote, @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote, if (i) return i; + /* If there is at least one MPTCP endpoint with an address flag */ + if (mptcp_pm_get_endp_address_max(msk)) + return fill_local_addresses_vec_address(msk, remote, locals); + /* Special case: peer sets the C flag, accept one ADD_ADDR if default * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints */ @@ -XXX,XX +XXX,XX @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, addr_max = pernet->endp_subflow_max; WRITE_ONCE(pernet->endp_subflow_max, addr_max + 1); } + if (entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + addr_max = pernet->endp_address_max; + WRITE_ONCE(pernet->endp_address_max, addr_max + 1); + } pernet->endpoints++; if (!entry->addr.port) @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info) addr_max = pernet->endp_subflow_max; WRITE_ONCE(pernet->endp_subflow_max, addr_max - 1); } + if (entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + addr_max = pernet->endp_address_max; + WRITE_ONCE(pernet->endp_address_max, addr_max - 1); + } pernet->endpoints--; list_del_rcu(&entry->list); @@ -XXX,XX +XXX,XX @@ static void __reset_counters(struct pm_nl_pernet *pernet) { WRITE_ONCE(pernet->endp_signal_max, 0); WRITE_ONCE(pernet->endp_subflow_max, 0); + WRITE_ONCE(pernet->endp_address_max, 0); pernet->endpoints = 0; } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -XXX,XX +XXX,XX @@ void mptcp_pm_worker(struct mptcp_sock *msk); void __mptcp_pm_kernel_worker(struct mptcp_sock *msk); u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_address_max(const struct mptcp_sock *msk); u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -XXX,XX +XXX,XX @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info) mptcp_pm_get_limit_add_addr_accepted(msk); info->mptcpi_endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); + info->mptcpi_endp_address_max = + mptcp_pm_get_endp_address_max(msk); } if (__mptcp_check_fallback(msk)) -- 2.51.0
Here are a few sub-tests for mptcp_join.sh, validating the new 'address' endpoint type. In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, some cases are verified: - Without any 'address' endpoints: no new subflows are created. - With one 'address' endpoints: a second subflow is created. - With multiple 'address' endpoints: 2 IPv4 subflows are created. - With one 'address' endpoints, but the server announcing a second IP address, only one subflow is created. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 56 +++++++++++++++++++++++++ tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 ++++ 2 files changed, 65 insertions(+) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index XXXXXXX..XXXXXXX 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -XXX,XX +XXX,XX @@ signal_address_tests() fi } +address_endp_tests() +{ + # no address endpoints: routing rules are used + if reset_with_tcp_filter "without address endpoint" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + run_tests $ns1 $ns2 10.0.1.1 + join_syn_tx=1 \ + chk_join_nr 0 0 0 + chk_add_nr 1 1 + fi + + # address endpoints: this endpoint is used + if reset_with_tcp_filter "with address endpoint" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns2 10.0.3.2 flags address + run_tests $ns1 $ns2 10.0.1.1 + chk_join_nr 1 1 1 + chk_add_nr 1 1 + fi + + # address endpoints: these endpoints are used + if reset_with_tcp_filter "with multiple address endpoints" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns1 10.0.3.1 flags signal + pm_nl_add_endpoint $ns2 dead:beef:3::2 flags address + pm_nl_add_endpoint $ns2 10.0.3.2 flags address + pm_nl_add_endpoint $ns2 10.0.4.2 flags address + run_tests $ns1 $ns2 10.0.1.1 + chk_join_nr 2 2 2 + chk_add_nr 2 2 + fi + + # address endpoints: only one endpoint is used + if reset_with_tcp_filter "single address endpoints" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns1 10.0.3.1 flags signal + pm_nl_add_endpoint $ns2 10.0.3.2 flags address + run_tests $ns1 $ns2 10.0.1.1 + chk_join_nr 1 1 1 + chk_add_nr 2 2 + fi +} + link_failure_tests() { # accept and use add_addr with additional subflows and link loss @@ -XXX,XX +XXX,XX @@ all_tests_sorted=( f@subflows_tests e@subflows_error_tests s@signal_address_tests + A@address_endp_tests l@link_failure_tests t@add_addr_timeout_tests r@remove_tests diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c index XXXXXXX..XXXXXXX 100644 --- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c +++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c @@ -XXX,XX +XXX,XX @@ int add_addr(int fd, int pm_family, int argc, char *argv[]) flags |= MPTCP_PM_ADDR_FLAG_SUBFLOW; else if (!strcmp(tok, "signal")) flags |= MPTCP_PM_ADDR_FLAG_SIGNAL; + else if (!strcmp(tok, "address")) + flags |= MPTCP_PM_ADDR_FLAG_ADDRESS; else if (!strcmp(tok, "backup")) flags |= MPTCP_PM_ADDR_FLAG_BACKUP; else if (!strcmp(tok, "fullmesh")) @@ -XXX,XX +XXX,XX @@ static void print_addr(struct rtattr *attrs, int len) printf(","); } + if (flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + printf("address"); + flags &= ~MPTCP_PM_ADDR_FLAG_ADDRESS; + if (flags) + printf(","); + } + if (flags & MPTCP_PM_ADDR_FLAG_BACKUP) { printf("backup"); flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; -- 2.51.0
Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag is not used), the in-kernel PM will create new subflows using the local address the routing configuration will pick. It would be easier to pick local addresses from a selected list of endpoints, and use it only once, than relying on routing rules. Use case: both the client (C) and the server (S) have two addresses (a and b). The client establishes the connection between C(a) and S(a). Once established, the server announces its additional address S(b). Once received, the client connects to it using its second address C(b). Compared to a situation without the 'address' endpoint for C(b), the client didn't use this address C(b) to establish a subflow to the server's primary address S(a). So at the end, we have: C S C(a) --- S(a) C(b) --- S(b) In case of a 3rd address on each side (C(c) and S(c)), upon the reception of an ADD_ADDR with S(c), the client should not pick C(b) because it has already been used. C(c) should then be used. Note that this situation is currently possible if C doesn't add any endpoint, but configure the routing in order to pick C(b) for the route to S(b), and pick C(c) for the route to S(c). That doesn't sound very practical because it means knowing in advance the IP addresses that will be used and announced by the server. Patches 1 & 2: some clean-ups and refactoring. Patch 3: "standardisation" and small perf improvement. Patch 4: squash to patches for a commit queued for net-next. Patch 5: new 'address' endpoints. Patch 6: validation using selftests. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- Changes in v2: - patch 3: rename helper, move where it is called, comments (Mat) - patch 5: rename var and function (Mat) - Link to v1: https://lore.kernel.org/r/20250923-pm-kern-endp-add_addr-new-v1-0-60e3a8968f45@kernel.org --- Matthieu Baerts (NGI0) (6): mptcp: pm: in-kernel: remove stale_loss_cnt mptcp: pm: in-kernel: reduce pernet struct size mptcp: pm: in-kernel: compare IDs instead of addresses Squash to "mptcp: pm: in-kernel: usable client side with C-flag" mptcp: pm: in-kernel: add 'address' endpoints selftests: mptcp: join: validate new 'address' endpoints include/uapi/linux/mptcp.h | 6 +- net/mptcp/pm_kernel.c | 229 ++++++++++++++++-------- net/mptcp/protocol.h | 9 +- net/mptcp/sockopt.c | 2 + tools/testing/selftests/net/mptcp/mptcp_join.sh | 56 ++++++ tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 + 6 files changed, 230 insertions(+), 81 deletions(-) --- base-commit: 77807b94c731081ef3d97e96dabcea7aae2bfe15 change-id: 20250918-pm-kern-endp-add_addr-new-a20893e45389 Best regards, -- Matthieu Baerts (NGI0) <matttbe@kernel.org>
It is currently not used. It was in fact never used since its introduction in commit ff5a0b421cb2 ("mptcp: faster active backup recovery"). It was probably initially added to struct pm_nl_pernet during the development of this commit, before being added to struct mptcp_pernet in ctrl.c, but not removed from the first place. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ struct pm_nl_pernet { spinlock_t lock; struct list_head endp_list; unsigned int endpoints; - unsigned int stale_loss_cnt; unsigned int endp_signal_max; unsigned int endp_subflow_max; unsigned int limit_add_addr_accepted; @@ -XXX,XX +XXX,XX @@ static int __net_init pm_nl_init_net(struct net *net) /* Cit. 2 subflows ought to be enough for anybody. */ pernet->limit_extra_subflows = 2; pernet->next_id = 1; - pernet->stale_loss_cnt = 4; spin_lock_init(&pernet->lock); /* No need to initialize other pernet fields, the struct is zeroed at -- 2.51.0
All the 'unsigned int' variables from the 'pm_nl_pernet' structure are bounded to MPTCP_PM_ADDR_MAX, currently set to 8. The endpoint ID is also bounded by the protocol to 8-bit. MPTCP_PM_ADDR_MAX, if extended later, will never over 8-bit. So no need to use 'unsigned int' variables, 'u8' is enough. Note that the exposed counters in MPTCP_INFO are already limited to 8-bit, same for pm->extra_subflows, and others. So it seems even better to limit them to 8-bit. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 59 ++++++++++++++++++++------------------------------- net/mptcp/protocol.h | 8 +++---- 2 files changed, 27 insertions(+), 40 deletions(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ struct pm_nl_pernet { /* protects pernet updates */ spinlock_t lock; struct list_head endp_list; - unsigned int endpoints; - unsigned int endp_signal_max; - unsigned int endp_subflow_max; - unsigned int limit_add_addr_accepted; - unsigned int limit_extra_subflows; - unsigned int next_id; + u8 endpoints; + u8 endp_signal_max; + u8 endp_subflow_max; + u8 limit_add_addr_accepted; + u8 limit_extra_subflows; + u8 next_id; DECLARE_BITMAP(id_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); }; @@ -XXX,XX +XXX,XX @@ static struct pm_nl_pernet *genl_info_pm_nl(struct genl_info *info) return pm_nl_get_pernet(genl_info_net(info)); } -unsigned int mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk) +u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk) { const struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ unsigned int mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_signal_max); -unsigned int mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) +u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ unsigned int mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_subflow_max); -unsigned int mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) +u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ unsigned int mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_limit_add_addr_accepted); -unsigned int mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk) +u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ fill_remote_addresses_fullmesh(struct mptcp_sock *msk, struct mptcp_addr_info *local, struct mptcp_addr_info *addrs) { + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); bool deny_id0 = READ_ONCE(msk->pm.remote_deny_join_id0); DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); struct sock *sk = (struct sock *)msk, *ssk; struct mptcp_subflow_context *subflow; - unsigned int limit_extra_subflows; int i = 0; - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); - /* Forbid creation of new subflows matching existing ones, possibly * already created by incoming ADD_ADDR */ @@ -XXX,XX +XXX,XX @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); + struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + u8 endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); + u8 endp_signal_max = mptcp_pm_get_endp_signal_max(msk); struct sock *sk = (struct sock *)msk; - unsigned int limit_extra_subflows; bool signal_and_subflow = false; - unsigned int endp_subflow_max; - unsigned int endp_signal_max; - struct pm_nl_pernet *pernet; struct mptcp_pm_local local; - pernet = pm_nl_get_pernet(sock_net(sk)); - - endp_signal_max = mptcp_pm_get_endp_signal_max(msk); - endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); - /* do lazy endpoint usage accounting for the MPC subflows */ if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(msk->first); @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, struct mptcp_pm_local *locals, bool c_flag_case) { + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; - unsigned int limit_extra_subflows; struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; mptcp_local_address((struct sock_common *)msk, &mpc_addr); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->endp_list, list) { @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct mptcp_addr_info *remote, struct mptcp_pm_local *locals) { - unsigned int endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + u8 endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); struct sock *sk = (struct sock *)msk; - unsigned int limit_extra_subflows; struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; mptcp_local_address((struct sock_common *)msk, &mpc_addr); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); while (msk->pm.local_addr_used < endp_subflow_max) { local = &locals[i]; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote, static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) { + u8 limit_add_addr_accepted = mptcp_pm_get_limit_add_addr_accepted(msk); + u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); struct mptcp_pm_local locals[MPTCP_PM_ADDR_MAX]; struct sock *sk = (struct sock *)msk; - unsigned int limit_extra_subflows; - unsigned int limit_add_addr_accepted; struct mptcp_addr_info remote; bool sf_created = false; int i, nr; - limit_add_addr_accepted = mptcp_pm_get_limit_add_addr_accepted(msk); - limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); - pr_debug("accepted %d:%d remote family %d\n", msk->pm.add_addr_accepted, limit_add_addr_accepted, msk->pm.remote.family); @@ -XXX,XX +XXX,XX @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) void mptcp_pm_nl_rm_addr(struct mptcp_sock *msk, u8 rm_id) { if (rm_id && WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) { - unsigned int limit_add_addr_accepted = + u8 limit_add_addr_accepted = mptcp_pm_get_limit_add_addr_accepted(msk); /* Note: if the subflow has been closed before, this @@ -XXX,XX +XXX,XX @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, bool needs_id, bool replace) { struct mptcp_pm_addr_entry *cur, *del_entry = NULL; - unsigned int addr_max; int ret = -EINVAL; + u8 addr_max; spin_lock_bh(&pernet->lock); /* to keep the code simple, don't do IDR-like allocation for address ID, @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info) { struct pm_nl_pernet *pernet = genl_info_pm_nl(info); struct mptcp_pm_addr_entry addr, *entry; - unsigned int addr_max; struct nlattr *attr; + u8 addr_max; int ret; if (GENL_REQ_ATTR_CHECK(info, MPTCP_PM_ENDPOINT_ADDR)) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -XXX,XX +XXX,XX @@ void __init mptcp_pm_userspace_register(void); void __init mptcp_pm_nl_init(void); void mptcp_pm_worker(struct mptcp_sock *msk); void __mptcp_pm_kernel_worker(struct mptcp_sock *msk); -unsigned int mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); -unsigned int mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); -unsigned int mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); -unsigned int mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); +u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); /* called under PM lock */ static inline void __mptcp_pm_close_subflow(struct mptcp_sock *msk) -- 2.51.0
When receiving an ADD_ADDR right after the 3WHS, the connection will switch to 'fully established'. It means the MPTCP worker will be called to treat two events, in this order: ADD_ADDR_RECEIVED, PM_ESTABLISHED. The MPTCP endpoints cannot have the ID 0, because it is reserved to the address and port used by the initial subflow. To be able to deal with this case in different places, msk->mpc_endpoint_id contains the endpoint ID linked to the initial subflow. This variable was only set when treating the first PM_ESTABLISHED event, after ADD_ADDR_RECEIVED. That's why in fill_local_addresses_vec(), the endpoint addresses were compared with the one of the initial subflow, instead of only comparing the IDs. Instead, msk->mpc_endpoint_id is now set when treating ADD_ADDR_RECEIVED as well, if needed, then the IDs can be compared. To be able to do so, the code doing that is now in a dedicated helper, and called from the functions linked to the two actions. While at it, mptcp_endp_get_local_id() has also been moved up, next to this new helper, because they are linked, and to be able to use it in fill_local_addresses_vec() in the next commit. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- v2: - rename new helper to mptcp_mpc_endpoint_setup (Mat) - call it from mptcp_pm_nl_add_addr_received instead of fill_vec (Mat) - add comments mentioning the MP_PRIO operation. --- net/mptcp/pm_kernel.c | 82 +++++++++++++++++++++++++++------------------------ 1 file changed, 44 insertions(+), 38 deletions(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ __lookup_addr(struct pm_nl_pernet *pernet, const struct mptcp_addr_info *info) return NULL; } +static u8 mptcp_endp_get_local_id(struct mptcp_sock *msk, + const struct mptcp_addr_info *addr) +{ + return msk->mpc_endpoint_id == addr->id ? 0 : addr->id; +} + +/* Set mpc_endpoint_id, and send MP_PRIO for ID0 if needed */ +static void mptcp_mpc_endpoint_setup(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_pm_addr_entry *entry; + struct mptcp_addr_info mpc_addr; + struct pm_nl_pernet *pernet; + bool backup = false; + + /* do lazy endpoint usage accounting for the MPC subflows */ + if (likely(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED)) || + !msk->first) + return; + + subflow = mptcp_subflow_ctx(msk->first); + pernet = pm_nl_get_pernet_from_msk(msk); + + mptcp_local_address((struct sock_common *)msk->first, &mpc_addr); + rcu_read_lock(); + entry = __lookup_addr(pernet, &mpc_addr); + if (entry) { + __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); + msk->mpc_endpoint_id = entry->addr.id; + backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); + } + rcu_read_unlock(); + + /* Send MP_PRIO */ + if (backup) + mptcp_pm_send_ack(msk, subflow, true, backup); + + msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED); +} + static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { u8 limit_extra_subflows = mptcp_pm_get_limit_extra_subflows(msk); @@ -XXX,XX +XXX,XX @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) bool signal_and_subflow = false; struct mptcp_pm_local local; - /* do lazy endpoint usage accounting for the MPC subflows */ - if (unlikely(!(msk->pm.status & BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED))) && msk->first) { - struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(msk->first); - struct mptcp_pm_addr_entry *entry; - struct mptcp_addr_info mpc_addr; - bool backup = false; - - mptcp_local_address((struct sock_common *)msk->first, &mpc_addr); - rcu_read_lock(); - entry = __lookup_addr(pernet, &mpc_addr); - if (entry) { - __clear_bit(entry->addr.id, msk->pm.id_avail_bitmap); - msk->mpc_endpoint_id = entry->addr.id; - backup = !!(entry->flags & MPTCP_PM_ADDR_FLAG_BACKUP); - } - rcu_read_unlock(); - - if (backup) - mptcp_pm_send_ack(msk, subflow, true, backup); - - msk->pm.status |= BIT(MPTCP_PM_MPC_ENDPOINT_ACCOUNTED); - } + mptcp_mpc_endpoint_setup(msk); pr_debug("local %d:%d signal %d:%d subflows %d:%d\n", msk->pm.local_addr_used, endp_subflow_max, @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; - struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; - mptcp_local_address((struct sock_common *)msk, &mpc_addr); - rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->endp_list, list) { if (!(entry->flags & MPTCP_PM_ADDR_FLAG_FULLMESH)) @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); /* Special case for ID0: set the correct ID */ - if (mptcp_addresses_equal(&local->addr, &mpc_addr, - local->addr.port)) + if (local->addr.id == msk->mpc_endpoint_id) local->addr.id = 0; msk->pm.extra_subflows++; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); u8 endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); struct sock *sk = (struct sock *)msk; - struct mptcp_addr_info mpc_addr; struct mptcp_pm_local *local; int i = 0; - mptcp_local_address((struct sock_common *)msk, &mpc_addr); - while (msk->pm.local_addr_used < endp_subflow_max) { local = &locals[i]; @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, if (!mptcp_pm_addr_families_match(sk, &local->addr, remote)) continue; - if (mptcp_addresses_equal(&local->addr, &mpc_addr, - local->addr.port)) + if (local->addr.id == msk->mpc_endpoint_id) continue; msk->pm.local_addr_used++; @@ -XXX,XX +XXX,XX @@ static void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) remote = msk->pm.remote; mptcp_pm_announce_addr(msk, &remote, true); mptcp_pm_addr_send_ack(msk); + mptcp_mpc_endpoint_setup(msk); if (lookup_subflow_by_daddr(&msk->conn_list, &remote)) return; @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_add_addr_doit(struct sk_buff *skb, struct genl_info *info) return ret; } -static u8 mptcp_endp_get_local_id(struct mptcp_sock *msk, - const struct mptcp_addr_info *addr) -{ - return msk->mpc_endpoint_id == addr->id ? 0 : addr->id; -} - static bool mptcp_pm_remove_anno_addr(struct mptcp_sock *msk, const struct mptcp_addr_info *addr, bool force) -- 2.51.0
In this special case (fullmesh + subflow + c-flag), local_addr_used should be incremented for new subflows not involving local ID0. Similar to what is done when receiving an ADD_ADR in the non-fullmesh case, or in the subflow only case. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- net/mptcp/pm_kernel.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, local->flags = entry->flags; local->ifindex = entry->ifindex; - if (c_flag_case && (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) + if (c_flag_case && (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) { __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + if (local->addr.id != msk->mpc_endpoint_id) + msk->pm.local_addr_used++; + } + /* Special case for ID0: set the correct ID */ if (local->addr.id == msk->mpc_endpoint_id) local->addr.id = 0; -- 2.51.0
Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag is not used), the in-kernel PM will create new subflows using the local address the routing configuration will pick. It would be easier to pick local addresses from a selected list of endpoints, and use it only once, than relying on routing rules. Use case: both the client (C) and the server (S) have two addresses (a and b). The client establishes the connection between C(a) and S(a). Once established, the server announces its additional address S(b). Once received, the client connects to it using its second address C(b). Compared to a situation without the 'address' endpoint for C(b), the client didn't use this address C(b) to establish a subflow to the server's primary address S(a). So at the end, we have: C S C(a) --- S(a) C(b) --- S(b) In case of a 3rd address on each side (C(c) and S(c)), upon the reception of an ADD_ADDR with S(c), the client should not pick C(b) because it has already been used. C(c) should then be used. Note that this situation is currently possible if C doesn't add any endpoint, but configure the routing in order to pick C(b) for the route to S(b), and pick C(c) for the route to S(c). That doesn't sound very practical because it means knowing in advance the IP addresses that will be used and announced by the server. In the code, the new endpoint type is added. Similar to the other subflow types, an MPTCP_INFO counter is added. While at it, hole are now commented in struct mptcp_info, to remember next time that these holes can no longer be used. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/503 Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- v2: - rename var and function names to state 1 address will be filled (Mat) --- include/uapi/linux/mptcp.h | 6 +++- net/mptcp/pm_kernel.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++ net/mptcp/protocol.h | 1 + net/mptcp/sockopt.c | 2 ++ 4 files changed, 90 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h index XXXXXXX..XXXXXXX 100644 --- a/include/uapi/linux/mptcp.h +++ b/include/uapi/linux/mptcp.h @@ -XXX,XX +XXX,XX @@ #define MPTCP_PM_ADDR_FLAG_BACKUP _BITUL(2) #define MPTCP_PM_ADDR_FLAG_FULLMESH _BITUL(3) #define MPTCP_PM_ADDR_FLAG_IMPLICIT _BITUL(4) +#define MPTCP_PM_ADDR_FLAG_ADDRESS _BITUL(5) struct mptcp_info { __u8 mptcpi_subflows; @@ -XXX,XX +XXX,XX @@ struct mptcp_info { #define mptcpi_endp_signal_max mptcpi_add_addr_signal_max __u8 mptcpi_add_addr_accepted_max; #define mptcpi_limit_add_addr_accepted mptcpi_add_addr_accepted_max + /* 16-bit hole that can no longer be filled */ __u32 mptcpi_flags; __u32 mptcpi_token; __u64 mptcpi_write_seq; @@ -XXX,XX +XXX,XX @@ struct mptcp_info { __u8 mptcpi_local_addr_max; #define mptcpi_endp_subflow_max mptcpi_local_addr_max __u8 mptcpi_csum_enabled; + /* 8-bit hole that can no longer be filled */ __u32 mptcpi_retransmits; __u64 mptcpi_bytes_retrans; __u64 mptcpi_bytes_sent; __u64 mptcpi_bytes_received; __u64 mptcpi_bytes_acked; __u8 mptcpi_subflows_total; - __u8 reserved[3]; + __u8 mptcpi_endp_address_max; + __u8 reserved[2]; __u32 mptcpi_last_data_sent; __u32 mptcpi_last_data_recv; __u32 mptcpi_last_ack_recv; diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -XXX,XX +XXX,XX @@ struct pm_nl_pernet { u8 endpoints; u8 endp_signal_max; u8 endp_subflow_max; + u8 endp_address_max; u8 limit_add_addr_accepted; u8 limit_extra_subflows; u8 next_id; @@ -XXX,XX +XXX,XX @@ u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_subflow_max); +u8 mptcp_pm_get_endp_address_max(const struct mptcp_sock *msk) +{ + struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + + return READ_ONCE(pernet->endp_address_max); +} +EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_address_max); + u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *msk, return i; } +static unsigned int +fill_local_address_endp(struct mptcp_sock *msk, struct mptcp_addr_info *remote, + struct mptcp_pm_local *locals) +{ + struct pm_nl_pernet *pernet = pm_nl_get_pernet_from_msk(msk); + DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); + struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; + struct mptcp_pm_addr_entry *entry; + struct mptcp_pm_local *local; + int found = 0; + + /* Forbid creation of new subflows matching existing ones, possibly + * already created by 'subflow' endpoints + */ + bitmap_zero(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + if ((1 << inet_sk_state_load(ssk)) & + (TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING | + TCPF_CLOSE)) + continue; + + __set_bit(READ_ONCE(subflow->local_id), unavail_id); + } + + rcu_read_lock(); + list_for_each_entry_rcu(entry, &pernet->endp_list, list) { + if (!(entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS)) + continue; + + if (!mptcp_pm_addr_families_match(sk, &entry->addr, remote)) + continue; + + if (test_bit(mptcp_endp_get_local_id(msk, &entry->addr), + unavail_id)) + continue; + + local = &locals[0]; + local->addr = entry->addr; + local->flags = entry->flags; + local->ifindex = entry->ifindex; + + if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { + __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + + if (local->addr.id != msk->mpc_endpoint_id) + msk->pm.local_addr_used++; + } + + msk->pm.extra_subflows++; + found = 1; + break; + } + rcu_read_unlock(); + + return found; +} + static unsigned int fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct mptcp_addr_info *remote, @@ -XXX,XX +XXX,XX @@ fill_local_addresses_vec(struct mptcp_sock *msk, struct mptcp_addr_info *remote, if (i) return i; + /* If there is at least one MPTCP endpoint with an address flag */ + if (mptcp_pm_get_endp_address_max(msk)) + return fill_local_address_endp(msk, remote, locals); + /* Special case: peer sets the C flag, accept one ADD_ADDR if default * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints */ @@ -XXX,XX +XXX,XX @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, addr_max = pernet->endp_subflow_max; WRITE_ONCE(pernet->endp_subflow_max, addr_max + 1); } + if (entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + addr_max = pernet->endp_address_max; + WRITE_ONCE(pernet->endp_address_max, addr_max + 1); + } pernet->endpoints++; if (!entry->addr.port) @@ -XXX,XX +XXX,XX @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, struct genl_info *info) addr_max = pernet->endp_subflow_max; WRITE_ONCE(pernet->endp_subflow_max, addr_max - 1); } + if (entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + addr_max = pernet->endp_address_max; + WRITE_ONCE(pernet->endp_address_max, addr_max - 1); + } pernet->endpoints--; list_del_rcu(&entry->list); @@ -XXX,XX +XXX,XX @@ static void __reset_counters(struct pm_nl_pernet *pernet) { WRITE_ONCE(pernet->endp_signal_max, 0); WRITE_ONCE(pernet->endp_subflow_max, 0); + WRITE_ONCE(pernet->endp_address_max, 0); pernet->endpoints = 0; } diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -XXX,XX +XXX,XX @@ void mptcp_pm_worker(struct mptcp_sock *msk); void __mptcp_pm_kernel_worker(struct mptcp_sock *msk); u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_address_max(const struct mptcp_sock *msk); u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index XXXXXXX..XXXXXXX 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -XXX,XX +XXX,XX @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info) mptcp_pm_get_limit_add_addr_accepted(msk); info->mptcpi_endp_subflow_max = mptcp_pm_get_endp_subflow_max(msk); + info->mptcpi_endp_address_max = + mptcp_pm_get_endp_address_max(msk); } if (__mptcp_check_fallback(msk)) -- 2.51.0
Here are a few sub-tests for mptcp_join.sh, validating the new 'address' endpoint type. In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, some cases are verified: - Without any 'address' endpoints: no new subflows are created. - With one 'address' endpoints: a second subflow is created. - With multiple 'address' endpoints: 2 IPv4 subflows are created. - With one 'address' endpoints, but the server announcing a second IP address, only one subflow is created. Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 56 +++++++++++++++++++++++++ tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 9 ++++ 2 files changed, 65 insertions(+) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index XXXXXXX..XXXXXXX 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -XXX,XX +XXX,XX @@ signal_address_tests() fi } +address_endp_tests() +{ + # no address endpoints: routing rules are used + if reset_with_tcp_filter "without address endpoint" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + run_tests $ns1 $ns2 10.0.1.1 + join_syn_tx=1 \ + chk_join_nr 0 0 0 + chk_add_nr 1 1 + fi + + # address endpoints: this endpoint is used + if reset_with_tcp_filter "with address endpoint" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns2 10.0.3.2 flags address + run_tests $ns1 $ns2 10.0.1.1 + chk_join_nr 1 1 1 + chk_add_nr 1 1 + fi + + # address endpoints: these endpoints are used + if reset_with_tcp_filter "with multiple address endpoints" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns1 10.0.3.1 flags signal + pm_nl_add_endpoint $ns2 dead:beef:3::2 flags address + pm_nl_add_endpoint $ns2 10.0.3.2 flags address + pm_nl_add_endpoint $ns2 10.0.4.2 flags address + run_tests $ns1 $ns2 10.0.1.1 + chk_join_nr 2 2 2 + chk_add_nr 2 2 + fi + + # address endpoints: only one endpoint is used + if reset_with_tcp_filter "single address endpoints" ns1 10.0.2.2 REJECT && + mptcp_lib_kallsyms_has "mptcp_pm_get_endp_address_max$"; then + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 2 2 + pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns1 10.0.3.1 flags signal + pm_nl_add_endpoint $ns2 10.0.3.2 flags address + run_tests $ns1 $ns2 10.0.1.1 + chk_join_nr 1 1 1 + chk_add_nr 2 2 + fi +} + link_failure_tests() { # accept and use add_addr with additional subflows and link loss @@ -XXX,XX +XXX,XX @@ all_tests_sorted=( f@subflows_tests e@subflows_error_tests s@signal_address_tests + A@address_endp_tests l@link_failure_tests t@add_addr_timeout_tests r@remove_tests diff --git a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c index XXXXXXX..XXXXXXX 100644 --- a/tools/testing/selftests/net/mptcp/pm_nl_ctl.c +++ b/tools/testing/selftests/net/mptcp/pm_nl_ctl.c @@ -XXX,XX +XXX,XX @@ int add_addr(int fd, int pm_family, int argc, char *argv[]) flags |= MPTCP_PM_ADDR_FLAG_SUBFLOW; else if (!strcmp(tok, "signal")) flags |= MPTCP_PM_ADDR_FLAG_SIGNAL; + else if (!strcmp(tok, "address")) + flags |= MPTCP_PM_ADDR_FLAG_ADDRESS; else if (!strcmp(tok, "backup")) flags |= MPTCP_PM_ADDR_FLAG_BACKUP; else if (!strcmp(tok, "fullmesh")) @@ -XXX,XX +XXX,XX @@ static void print_addr(struct rtattr *attrs, int len) printf(","); } + if (flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + printf("address"); + flags &= ~MPTCP_PM_ADDR_FLAG_ADDRESS; + if (flags) + printf(","); + } + if (flags & MPTCP_PM_ADDR_FLAG_BACKUP) { printf("backup"); flags &= ~MPTCP_PM_ADDR_FLAG_BACKUP; -- 2.51.0