From nobody Sat Oct 11 08:04:34 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F41A530E0D8 for ; Tue, 23 Sep 2025 09:33:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758619990; cv=none; b=JM5cV9gvqv6nSGOr2WlnI00PL+2CDEWXOWt4XNxlWfxO7AlJDAdt9M2SQJsM6EZ4bEFqpjHDJNJAL6tn5KnFx38aQ1LkmtR5d/ikZCLUDsCzdFO5y7p/2CPmtzbFL+sqvqnYuz04LuX/M1QS3NGaSTvDeYT38hc1xie/uKrtoyE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758619990; c=relaxed/simple; bh=IPTmBXy/Tar6Ks8W5hbH2qvDPsOxMd/ABnDHBa3sIWE=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=k3iSJVOimYQIGGRUCT7Bwg84emtjxrptAWtNyq5RskbM1kRH461vz0c/gpzqv969QQtGjR91J6yYXhWxU668aT9r4g/TQxIPi43RwGEw0bYBWAeP8+na01+Qy9iySUWvHzetjnXSw4Y3KLMCSqmj7wAw/W4YPliC1JrXvqMKobg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=EZ2k0d3Q; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="EZ2k0d3Q" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0B696C4CEF5; Tue, 23 Sep 2025 09:33:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1758619989; bh=IPTmBXy/Tar6Ks8W5hbH2qvDPsOxMd/ABnDHBa3sIWE=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=EZ2k0d3QoevEIQRfkTxbXQiFfIMgGRj/TID8Pjhp4KnqDX1qwNnDK6jn53JsLYp4j 4uPOsoDDRmivQPqVSXLAbVSdAVue9WefnXkQSn1uCSoRASRFJVZaOSdxkoi11qLcMX Qjlp5WZhxIKWom6rA1lk3s93Hnz/OedlCU+6mEW16i2gnbvViZgZf/+glVMWAA3+cQ 7/14tY4UJMm9oY3/btUsi/tqvfXEeieZRbae+2j0HUlhJU0w6rKDvzUuQTxEMB2gIU Ivc9r329oxEs7E+9GnwG0WN0z3yYP8SkzQAm+UlY9/R5AoqudDtjthI6nXC5KWkXaJ TQG1JqyjB1cig== From: "Matthieu Baerts (NGI0)" Date: Tue, 23 Sep 2025 11:32:51 +0200 Subject: [PATCH mptcp-next v2 5/6] mptcp: pm: in-kernel: add 'address' endpoints Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20250923-pm-kern-endp-add_addr-new-v2-5-ee5369dad569@kernel.org> References: <20250923-pm-kern-endp-add_addr-new-v2-0-ee5369dad569@kernel.org> In-Reply-To: <20250923-pm-kern-endp-add_addr-new-v2-0-ee5369dad569@kernel.org> To: MPTCP Upstream Cc: "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.2 X-Developer-Signature: v=1; a=openpgp-sha256; l=9124; i=matttbe@kernel.org; h=from:subject:message-id; bh=IPTmBXy/Tar6Ks8W5hbH2qvDPsOxMd/ABnDHBa3sIWE=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIuZfox93+YN8erX3XmOlmhA1fm8/v8u5/UtL1+wvPV9 eJPN2wx7ChlYRDjYpAVU2SRbovMn/m8irfEy88CZg4rE8gQBi5OAZgIEzsjw9Qou93vDYtOhoRs Wq/eKdp6fX0199+FnC6XZjPVMNdvi2Fk2OhZn3Hw4D+PtfJCPuv/CVz18A08nzbPU6E29PdBsTZ JfgA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 Currently, upon the reception of an ADD_ADDR (and when the fullmesh flag is not used), the in-kernel PM will create new subflows using the local address the routing configuration will pick. It would be easier to pick local addresses from a selected list of endpoints, and use it only once, than relying on routing rules. Use case: both the client (C) and the server (S) have two addresses (a and b). The client establishes the connection between C(a) and S(a). Once established, the server announces its additional address S(b). Once received, the client connects to it using its second address C(b). Compared to a situation without the 'address' endpoint for C(b), the client didn't use this address C(b) to establish a subflow to the server's primary address S(a). So at the end, we have: C S C(a) --- S(a) C(b) --- S(b) In case of a 3rd address on each side (C(c) and S(c)), upon the reception of an ADD_ADDR with S(c), the client should not pick C(b) because it has already been used. C(c) should then be used. Note that this situation is currently possible if C doesn't add any endpoint, but configure the routing in order to pick C(b) for the route to S(b), and pick C(c) for the route to S(c). That doesn't sound very practical because it means knowing in advance the IP addresses that will be used and announced by the server. In the code, the new endpoint type is added. Similar to the other subflow types, an MPTCP_INFO counter is added. While at it, hole are now commented in struct mptcp_info, to remember next time that these holes can no longer be used. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/503 Signed-off-by: Matthieu Baerts (NGI0) --- v2: - rename var and function names to state 1 address will be filled (Mat) --- include/uapi/linux/mptcp.h | 6 +++- net/mptcp/pm_kernel.c | 82 ++++++++++++++++++++++++++++++++++++++++++= ++++ net/mptcp/protocol.h | 1 + net/mptcp/sockopt.c | 2 ++ 4 files changed, 90 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/mptcp.h b/include/uapi/linux/mptcp.h index 5ec996977b3fa2351222e6d01b814770b34348e9..65dc069e9063325ad2e1ffb1da2= 1cc4a4b6efd32 100644 --- a/include/uapi/linux/mptcp.h +++ b/include/uapi/linux/mptcp.h @@ -39,6 +39,7 @@ #define MPTCP_PM_ADDR_FLAG_BACKUP _BITUL(2) #define MPTCP_PM_ADDR_FLAG_FULLMESH _BITUL(3) #define MPTCP_PM_ADDR_FLAG_IMPLICIT _BITUL(4) +#define MPTCP_PM_ADDR_FLAG_ADDRESS _BITUL(5) =20 struct mptcp_info { __u8 mptcpi_subflows; @@ -51,6 +52,7 @@ struct mptcp_info { #define mptcpi_endp_signal_max mptcpi_add_addr_signal_max __u8 mptcpi_add_addr_accepted_max; #define mptcpi_limit_add_addr_accepted mptcpi_add_addr_accepted_max + /* 16-bit hole that can no longer be filled */ __u32 mptcpi_flags; __u32 mptcpi_token; __u64 mptcpi_write_seq; @@ -60,13 +62,15 @@ struct mptcp_info { __u8 mptcpi_local_addr_max; #define mptcpi_endp_subflow_max mptcpi_local_addr_max __u8 mptcpi_csum_enabled; + /* 8-bit hole that can no longer be filled */ __u32 mptcpi_retransmits; __u64 mptcpi_bytes_retrans; __u64 mptcpi_bytes_sent; __u64 mptcpi_bytes_received; __u64 mptcpi_bytes_acked; __u8 mptcpi_subflows_total; - __u8 reserved[3]; + __u8 mptcpi_endp_address_max; + __u8 reserved[2]; __u32 mptcpi_last_data_sent; __u32 mptcpi_last_data_recv; __u32 mptcpi_last_ack_recv; diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index 116d62ed86f78b0242a33a67f77ff875ba00ae30..13c575e477833303c8f030e37a2= 809ece3c30ab4 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -21,6 +21,7 @@ struct pm_nl_pernet { u8 endpoints; u8 endp_signal_max; u8 endp_subflow_max; + u8 endp_address_max; u8 limit_add_addr_accepted; u8 limit_extra_subflows; u8 next_id; @@ -61,6 +62,14 @@ u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock= *msk) } EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_subflow_max); =20 +u8 mptcp_pm_get_endp_address_max(const struct mptcp_sock *msk) +{ + struct pm_nl_pernet *pernet =3D pm_nl_get_pernet_from_msk(msk); + + return READ_ONCE(pernet->endp_address_max); +} +EXPORT_SYMBOL_GPL(mptcp_pm_get_endp_address_max); + u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk) { struct pm_nl_pernet *pernet =3D pm_nl_get_pernet_from_msk(msk); @@ -453,6 +462,66 @@ fill_local_addresses_vec_fullmesh(struct mptcp_sock *m= sk, return i; } =20 +static unsigned int +fill_local_address_endp(struct mptcp_sock *msk, struct mptcp_addr_info *re= mote, + struct mptcp_pm_local *locals) +{ + struct pm_nl_pernet *pernet =3D pm_nl_get_pernet_from_msk(msk); + DECLARE_BITMAP(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); + struct mptcp_subflow_context *subflow; + struct sock *sk =3D (struct sock *)msk; + struct mptcp_pm_addr_entry *entry; + struct mptcp_pm_local *local; + int found =3D 0; + + /* Forbid creation of new subflows matching existing ones, possibly + * already created by 'subflow' endpoints + */ + bitmap_zero(unavail_id, MPTCP_PM_MAX_ADDR_ID + 1); + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk =3D mptcp_subflow_tcp_sock(subflow); + + if ((1 << inet_sk_state_load(ssk)) & + (TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 | TCPF_CLOSING | + TCPF_CLOSE)) + continue; + + __set_bit(READ_ONCE(subflow->local_id), unavail_id); + } + + rcu_read_lock(); + list_for_each_entry_rcu(entry, &pernet->endp_list, list) { + if (!(entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS)) + continue; + + if (!mptcp_pm_addr_families_match(sk, &entry->addr, remote)) + continue; + + if (test_bit(mptcp_endp_get_local_id(msk, &entry->addr), + unavail_id)) + continue; + + local =3D &locals[0]; + local->addr =3D entry->addr; + local->flags =3D entry->flags; + local->ifindex =3D entry->ifindex; + + if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { + __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + + if (local->addr.id !=3D msk->mpc_endpoint_id) + msk->pm.local_addr_used++; + } + + msk->pm.extra_subflows++; + found =3D 1; + break; + } + rcu_read_unlock(); + + return found; +} + static unsigned int fill_local_addresses_vec_c_flag(struct mptcp_sock *msk, struct mptcp_addr_info *remote, @@ -527,6 +596,10 @@ fill_local_addresses_vec(struct mptcp_sock *msk, struc= t mptcp_addr_info *remote, if (i) return i; =20 + /* If there is at least one MPTCP endpoint with an address flag */ + if (mptcp_pm_get_endp_address_max(msk)) + return fill_local_address_endp(msk, remote, locals); + /* Special case: peer sets the C flag, accept one ADD_ADDR if default * limits are used -- accepting no ADD_ADDR -- and use subflow endpoints */ @@ -702,6 +775,10 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm= _nl_pernet *pernet, addr_max =3D pernet->endp_subflow_max; WRITE_ONCE(pernet->endp_subflow_max, addr_max + 1); } + if (entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + addr_max =3D pernet->endp_address_max; + WRITE_ONCE(pernet->endp_address_max, addr_max + 1); + } =20 pernet->endpoints++; if (!entry->addr.port) @@ -1096,6 +1173,10 @@ int mptcp_pm_nl_del_addr_doit(struct sk_buff *skb, s= truct genl_info *info) addr_max =3D pernet->endp_subflow_max; WRITE_ONCE(pernet->endp_subflow_max, addr_max - 1); } + if (entry->flags & MPTCP_PM_ADDR_FLAG_ADDRESS) { + addr_max =3D pernet->endp_address_max; + WRITE_ONCE(pernet->endp_address_max, addr_max - 1); + } =20 pernet->endpoints--; list_del_rcu(&entry->list); @@ -1178,6 +1259,7 @@ static void __reset_counters(struct pm_nl_pernet *per= net) { WRITE_ONCE(pernet->endp_signal_max, 0); WRITE_ONCE(pernet->endp_subflow_max, 0); + WRITE_ONCE(pernet->endp_address_max, 0); pernet->endpoints =3D 0; } =20 diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 027d717ef7cffe150f8de7b3b404916a1899537a..57e4db26e0ae1c5e82bc5a262cc= b9d5e36508543 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -1179,6 +1179,7 @@ void mptcp_pm_worker(struct mptcp_sock *msk); void __mptcp_pm_kernel_worker(struct mptcp_sock *msk); u8 mptcp_pm_get_endp_signal_max(const struct mptcp_sock *msk); u8 mptcp_pm_get_endp_subflow_max(const struct mptcp_sock *msk); +u8 mptcp_pm_get_endp_address_max(const struct mptcp_sock *msk); u8 mptcp_pm_get_limit_add_addr_accepted(const struct mptcp_sock *msk); u8 mptcp_pm_get_limit_extra_subflows(const struct mptcp_sock *msk); =20 diff --git a/net/mptcp/sockopt.c b/net/mptcp/sockopt.c index 92a2a274262732a345b9ab185efd7da1f0a5773a..3cdc35323cc18de3585169fe729= a51cab25a4cba 100644 --- a/net/mptcp/sockopt.c +++ b/net/mptcp/sockopt.c @@ -980,6 +980,8 @@ void mptcp_diag_fill_info(struct mptcp_sock *msk, struc= t mptcp_info *info) mptcp_pm_get_limit_add_addr_accepted(msk); info->mptcpi_endp_subflow_max =3D mptcp_pm_get_endp_subflow_max(msk); + info->mptcpi_endp_address_max =3D + mptcp_pm_get_endp_address_max(msk); } =20 if (__mptcp_check_fallback(msk)) --=20 2.51.0