From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C4312EF673; Fri, 21 Nov 2025 17:02:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744552; cv=none; b=ESPryXhelyC2BdPmpbSI43qk/em50tFl7Pr5aQAirKdxUjDDk7g5D4mjyG0NmFpG7JpGSkbjeCfk92YwI8UWSuZhXokXyDixiGftSZhWNXfO0etpe5RgfGm6x3te4OrwFk1qWlrg5Rx2he3EixbEIKBRICBfyteS3Clgyhcn3p4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744552; c=relaxed/simple; bh=E5wcI+z/Iyhniomwivx/9qPvXu9SKA6nPNoFpiR5cmI=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=jowCLOzn/fZ5ruHuDBYCeoCMpkCN5LfBWxUjUDqXyD7PEthZES69AaiBcsoF68D8SVlG3XbsVA4PyFmCiv1fkPVK6ioZ4RD75x8//HGtgc6dacJ5fo1XFynWadujPngbv+/v8T6skLhU3Nc3SGa51UL7UcLHNVq6OOlodTitEPE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LjrsNAMk; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LjrsNAMk" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B1246C4CEF1; Fri, 21 Nov 2025 17:02:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744551; bh=E5wcI+z/Iyhniomwivx/9qPvXu9SKA6nPNoFpiR5cmI=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=LjrsNAMk7EqdJ1rDyrSyFSMyqEcHyuhF3RGHfQMRWC97UPZA1jRUCce9HF9zwL6V+ hUipubhu/Y6aKi2tanvmAzNIFk3j2vFJQUzNJhGMHpBmLsR4nFQQ5Dte9rulCycedt NO/5hiJ1dmZsEL6o50i+74MSw+6cK6nW6sFeGgG+1FdDj8FIhfSoemILQaXQof/6Lr 
EI+64s/qCmE+VzXBW3dzL/J/HAEgFlU8hbz+gaj+BWdaTNCWNypyqhSoIZ57IjtZE6 0HFg46rXYzCrkp6j+WW4y2G0tDncymJduI73rTjStQu5Mr9L6IKE9M98D9o5/TBpOD 7c8SrE7IvKZGg== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:00 +0100 Subject: [PATCH net-next 01/14] net: factor-out __sk_charge() helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-1-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=2662; i=matttbe@kernel.org; h=from:subject:message-id; bh=hK0a1iO/ogfUyqYhDGeAlo+Zr9nsYO9S3nxWufSEKio=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZgswZf5axP6QzecYl1HAWjGR2x4zj/BcOSbKtfTbw 3BzN97kjlIWBjEuBlkxRRbptsj8mc+reEu8/Cxg5rAygQxh4OIUgImkHWZk2BCU5tAQzybe7Ryb Y73ki4yvJ0OnAftP3lJeeY/NPzLeMvzTzNjg+PP6Nl/d3kWCrE8/Se9Nc8/YtoxvHtc/rlCmpAg WAA== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni Move the code that charges a newly accepted socket to memcg out of __inet_accept(). MPTCP will soon use it on a per-subflow basis, in different contexts. No functional changes intended. 
Signed-off-by: Paolo Abeni Acked-by: Geliang Tang Acked-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- include/net/sock.h | 2 ++ net/core/sock.c | 18 ++++++++++++++++++ net/ipv4/af_inet.c | 17 +---------------- 3 files changed, 21 insertions(+), 16 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index a5f36ea9d46f..38d48cfe0741 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1631,6 +1631,8 @@ static inline void sk_mem_uncharge(struct sock *sk, i= nt size) sk_mem_reclaim(sk); } =20 +void __sk_charge(struct sock *sk, gfp_t gfp); + #if IS_ENABLED(CONFIG_PROVE_LOCKING) && IS_ENABLED(CONFIG_MODULES) static inline void sk_owner_set(struct sock *sk, struct module *owner) { diff --git a/net/core/sock.c b/net/core/sock.c index 3b74fc71f51c..b26a6cdc9bcd 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -3448,6 +3448,24 @@ void __sk_mem_reclaim(struct sock *sk, int amount) } EXPORT_SYMBOL(__sk_mem_reclaim); =20 +void __sk_charge(struct sock *sk, gfp_t gfp) +{ + int amt; + + gfp |=3D __GFP_NOFAIL; + if (mem_cgroup_from_sk(sk)) { + /* The socket has not been accepted yet, no need + * to look at newsk->sk_wmem_queued. + */ + amt =3D sk_mem_pages(sk->sk_forward_alloc + + atomic_read(&sk->sk_rmem_alloc)); + if (amt) + mem_cgroup_sk_charge(sk, amt, gfp); + } + + kmem_cache_charge(sk, gfp); +} + int sk_set_peek_off(struct sock *sk, int val) { WRITE_ONCE(sk->sk_peek_off, val); diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index a31b94ce8968..08d811f11896 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -756,23 +756,8 @@ EXPORT_SYMBOL(inet_stream_connect); void __inet_accept(struct socket *sock, struct socket *newsock, struct soc= k *newsk) { if (mem_cgroup_sockets_enabled) { - gfp_t gfp =3D GFP_KERNEL | __GFP_NOFAIL; - mem_cgroup_sk_alloc(newsk); - - if (mem_cgroup_from_sk(newsk)) { - int amt; - - /* The socket has not been accepted yet, no need - * to look at newsk->sk_wmem_queued. 
- */ - amt =3D sk_mem_pages(newsk->sk_forward_alloc + - atomic_read(&newsk->sk_rmem_alloc)); - if (amt) - mem_cgroup_sk_charge(newsk, amt, gfp); - } - - kmem_cache_charge(newsk, gfp); + __sk_charge(newsk, GFP_KERNEL); } =20 sock_rps_record_flow(newsk); --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B6BE32ED15D; Fri, 21 Nov 2025 17:02:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744555; cv=none; b=mCFzFmJlravIgZkOWErRU6pa59/DApe6Pd0QoXjJRnpXKJS2qDDpxoKiRhr84Stsd75RUZTukJMpZGpIjxcVGdSMkV9HurC7urjOrLskWf0QbHS+2ZoJTjh0JzU1rmUTT0/YNHWk4XKfUG3eROmhwnYTlYB6bhE1KZeqOBLLEK8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744555; c=relaxed/simple; bh=0S2Dl/LwDiEUosVGC/2CWwzwI/BJ/rbvoLABLOoz4Jo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=TvEHiV3PuQrV8R9btG8cL6giRYVds4WfjNM5I50W/DHgnMzKG0r95QYRQMFsVYCHcaoIN1uTvrC+M2thN8C1qh9JdhTusV1qT5VoNuFuK+BHpHudrjlSSooGyjruTkUJWEN5ZDA9S+4sZiH0umaL9XHYQ6F0kLw6XeaNeFXvFxo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=edUiglc1; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="edUiglc1" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 23F60C116C6; Fri, 21 Nov 2025 17:02:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744555; bh=0S2Dl/LwDiEUosVGC/2CWwzwI/BJ/rbvoLABLOoz4Jo=; 
h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=edUiglc18eTAHAXbz4xdaykCLWbkZGRNecOm9BKlIarDcfw5QdlErvL1rApCqvmpW ctGdrPQTlKk1N4SNtwFMXF5VyMCQPUTohzpK/7khpnFvKddJb6zTrAwkwIuZfskyRy Cfp6FWQwUQde1dUbxoDt+V8K1z4Kw3M4XnU9RRy3SYy5O+biqXMgh0NaTxPzcIOMf3 cCIFRjIclM6eNKZzrfOM60U2Qd/VreArsMshOi8eIF45Ig37rrWI6M95ZHsfRdkJV3 p6rrJ/gclgEVDn+xbiqeG1TJuCcJpCLavLtQ509PEN1YfpDDnMl1Dw7/5U5bEPB/x/ NFuKwRa4iof+A== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:01 +0100 Subject: [PATCH net-next 02/14] mptcp: factor-out cgroup data inherit helper Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-2-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=2371; i=matttbe@kernel.org; h=from:subject:message-id; bh=5JqcFvBV+VrzUQAyfBZzw40qaie6ZII/3JwjK0LcD90=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZgsJ3P3Osrfil7HAhK7ZBTNkNgTkZpkGrHf4J+T9M 0b/guLTjlIWBjEuBlkxRRbptsj8mc+reEu8/Cxg5rAygQxh4OIUgIlYFjP8s2krufNN/V+Y8e05 KyZsP77uAqvdt3nhz/UTNyVe/zvJKpKR4VUxw/KZfr2Gu328Q/3kH5g8fti758Tj8IjPn8OPp3+ fwgEA X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni MPTCP will soon need the same functionality for passive sockets; factor it out into a common helper. 
No functional change intended. Signed-off-by: Paolo Abeni Reviewed-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.h | 2 ++ net/mptcp/subflow.c | 20 ++++++++++++-------- 2 files changed, 14 insertions(+), 8 deletions(-) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a23780ff670f..6d9de13c1f9c 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -707,6 +707,8 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_act= ion *delegated) return ret; } =20 +void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk); + int mptcp_is_enabled(const struct net *net); unsigned int mptcp_get_add_addr_timeout(const struct net *net); int mptcp_is_checksum_enabled(const struct net *net); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index ddd0fc6fcf45..d98d151392d2 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1712,21 +1712,25 @@ int __mptcp_subflow_connect(struct sock *sk, const = struct mptcp_pm_local *local, return err; } =20 -static void mptcp_attach_cgroup(struct sock *parent, struct sock *child) +void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk) { #ifdef CONFIG_SOCK_CGROUP_DATA - struct sock_cgroup_data *parent_skcd =3D &parent->sk_cgrp_data, - *child_skcd =3D &child->sk_cgrp_data; + struct sock_cgroup_data *sk_cd =3D &sk->sk_cgrp_data, + *ssk_cd =3D &ssk->sk_cgrp_data; =20 /* only the additional subflows created by kworkers have to be modified */ - if (cgroup_id(sock_cgroup_ptr(parent_skcd)) !=3D - cgroup_id(sock_cgroup_ptr(child_skcd))) { - cgroup_sk_free(child_skcd); - *child_skcd =3D *parent_skcd; - cgroup_sk_clone(child_skcd); + if (cgroup_id(sock_cgroup_ptr(sk_cd)) !=3D + cgroup_id(sock_cgroup_ptr(ssk_cd))) { + cgroup_sk_free(ssk_cd); + *ssk_cd =3D *sk_cd; + cgroup_sk_clone(sk_cd); } #endif /* CONFIG_SOCK_CGROUP_DATA */ +} =20 +static void mptcp_attach_cgroup(struct sock *parent, struct sock *child) +{ + 
__mptcp_inherit_cgrp_data(parent, child); if (mem_cgroup_sockets_enabled) mem_cgroup_sk_inherit(parent, child); } --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC8913451A3; Fri, 21 Nov 2025 17:02:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744559; cv=none; b=D+dyM4f4420+ozdtjPaK8aZG4A0LUtSg3Pa4HJrIXtCg1mj97tGrSz+CtWlxqaze3bDxOJvybqnto4z+5a5gqKjiwW6NsmPJCRXHAT/hB5/PtlLv7gxRtpyp0UCmyJnMcUL9vEbM+PJNQNds4TGftEKGYQJrTt5T61/AmsAdOqg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744559; c=relaxed/simple; bh=jQ0UDB4U29x/KlnOx2L558m9xp5Y4bGvMVx4QgXsvAY=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=X3PGst/2kjkSZ8KzeyN0a5i70U0VQiJtzX9Rnja/EtTNOay9UbI/j8ufr6RCSg7LxQl0abuBCAvXX1Op4uc/GufKu9ZIXja8LHbbdu2jEnQfBFGHMFbw1lLVW+PvyCqlEPXC99W861TkzLIGCnNsrK8+SyS71S52l4LTjdvjMB8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ilEO6GAI; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ilEO6GAI" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 895B0C4CEF1; Fri, 21 Nov 2025 17:02:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744558; bh=jQ0UDB4U29x/KlnOx2L558m9xp5Y4bGvMVx4QgXsvAY=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=ilEO6GAIQEmvi73CO+PkIZFQJQVe0bSLd+kKfmOnlWX2iZL87CqdFE+GZVWihWM8d 
gXnDXPKmJKj5bLUvMJsneqO52HhCRT9E/+XM0BzdW1qJIiznpcshFrE0KKkcPi8tye 2Qpymrjrfg34+SIwF1b3ZNkysGT0TDdQmDPRWiJoc8TSec4QmNmlmVap419R8tKw4V GfP1ubAx9A7sFa2kCcl3RciRw47kxo0y2zQ0FkM77nE3A7kZKcPZXAuiAI7zOrFM+1 yK9kaJVddyIyw4LTzqp17Og5b5TT040P88fGIPEmOCTYFmOmyM/+vrsBD6KR52Py+a 1D+un71sEVX1A== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:02 +0100 Subject: [PATCH net-next 03/14] mptcp: grafting MPJ subflow earlier Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-3-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=2932; i=matttbe@kernel.org; h=from:subject:message-id; bh=TzG7leIFSjvo22mwXd7Xqen2NE60+AKXWYBJRpqRFhM=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZgsnx991bsrfwRHTa9TEnH1z7h9DpjkBfY8286umR nQcM9vYUcrCIMbFICumyCLdFpk/83kVb4mXnwXMHFYmkCEMXJwCMJFTkxkZLkY0c/LkvL721P3S p0+P10nz5tfkrdo/my1+FcOWI+K32Bn+Gf7iTGx767X9cc5NvYzjb7c8rLmqruO+eP+sDQpWjxd YcQIA X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni Later patches need to ensure that all MPJ subflows are grafted to the msk socket before accept() completion. 
Currently the grafting happens under the msk socket lock, potentially at msk release_cb time, which makes satisfying the above condition a bit tricky. Move the MPJ subflow grafting earlier, under the msk data lock, so that we can use that lock as a synchronization point. Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 30 +++++++++++++++++++++++------- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 75bb1199bed9..2104ab1eda1d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -895,12 +895,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk= , struct sock *ssk) mptcp_subflow_joined(msk, ssk); spin_unlock_bh(&msk->fallback_lock); =20 - /* attach to msk socket only after we are sure we will deal with it - * at close time - */ - if (sk->sk_socket && !ssk->sk_socket) - mptcp_sock_graft(ssk, sk->sk_socket); - mptcp_subflow_ctx(ssk)->subflow_id =3D msk->subflow_id++; mptcp_sockopt_sync_locked(msk, ssk); mptcp_stop_tout_timer(sk); @@ -3647,6 +3641,20 @@ void mptcp_sock_graft(struct sock *sk, struct socket= *parent) write_unlock_bh(&sk->sk_callback_lock); } =20 +/* Can be called without holding the msk socket lock; use the callback lock + * to avoid {READ_,WRITE_}ONCE annotations on sk_socket. + */ +static void mptcp_sock_check_graft(struct sock *sk, struct sock *ssk) +{ + struct socket *sock; + + write_lock_bh(&sk->sk_callback_lock); + sock =3D sk->sk_socket; + write_unlock_bh(&sk->sk_callback_lock); + if (sock) + mptcp_sock_graft(ssk, sock); +} + bool mptcp_finish_join(struct sock *ssk) { struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); @@ -3662,7 +3670,9 @@ bool mptcp_finish_join(struct sock *ssk) return false; } =20 - /* active subflow, already present inside the conn_list */ + /* Active subflow, already present inside the conn_list; is grafted + * either by __mptcp_subflow_connect() or accept. 
+ */ if (!list_empty(&subflow->node)) { spin_lock_bh(&msk->fallback_lock); if (!msk->allow_subflows) { @@ -3689,11 +3699,17 @@ bool mptcp_finish_join(struct sock *ssk) if (ret) { sock_hold(ssk); list_add_tail(&subflow->node, &msk->conn_list); + mptcp_sock_check_graft(parent, ssk); } } else { sock_hold(ssk); list_add_tail(&subflow->node, &msk->join_list); __set_bit(MPTCP_FLUSH_JOIN_LIST, &msk->cb_flags); + + /* In case of later failures, __mptcp_flush_join_list() will + * properly orphan the ssk via mptcp_close_ssk(). + */ + mptcp_sock_check_graft(parent, ssk); } mptcp_data_unlock(parent); =20 --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 786DD340293; Fri, 21 Nov 2025 17:02:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744562; cv=none; b=mYwFBWAvh6rXW4RIttcr4kiTYzKpoYYkhqCTVtbWpPJtTVhpzeJTiN7MoANB1lJ4BsutvC+qHPSWZgBeCGPRhSBBuXbFGO7ulQZ/pyr1kvQpSNsl/Fix59yzxYImdSHMpNdhvtz7iAljcYKtCt7GXHN0fUFBWUuO7QzrXpM1nZw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744562; c=relaxed/simple; bh=JqiyVRxMHEYtyT/rjFS/LpRedROqGTJ7vuKAa7M89V0=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=QSPmIB7s9AapUrGCbPJ92z2lDJ1JdiRQtGlR5DDjC6rV20wFmEaMLW5VLDLZ9GEP46RJEQjrPr5ZRZFBWF2SAQ0ZwLnYbHPQqzaTyMH9cUUTqbUHeZIZZJMTsOrZv/hHpXKqf3Cs1wDWr/mYYSjgy5aKyHFem/s2UHKg5UJC5Yo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=rypruDlL; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) 
header.d=kernel.org header.i=@kernel.org header.b="rypruDlL" Received: by smtp.kernel.org (Postfix) with ESMTPSA id EF7D7C116D0; Fri, 21 Nov 2025 17:02:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744562; bh=JqiyVRxMHEYtyT/rjFS/LpRedROqGTJ7vuKAa7M89V0=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=rypruDlL4gS0xw+0N3B2/mLhGOZTuUqPTGK7eBk+c8rsdGFwKXnGFBYqDcUKqRidt Fccxo38dAP0e3U9THPLjJepQb46j9fKO5hplXBgcxlsnebTPAOQIsiPsitM4FdSwBR 0CEZ1RAzTTg+uj3egKYoTahPIarjVPV9y2L4+lB8JPBT0qpUZCWJJENIgVR1wtmmQO 3usRl/JCCfpjysLc3/ho9KyijRgdGg7YdZePeo2j+xwC5ozRQUD0rD4VMRU84S4fp9 QtUnba0GdLI7fX246n7WbQ/gg3PyRJx03EXQRJI4ye+JtEoNK2sMA5eLCf9rB3CLYG 2v6NmUUig9Kqw== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:03 +0100 Subject: [PATCH net-next 04/14] mptcp: fix memcg accounting for passive sockets Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-4-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. 
Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=4066; i=matttbe@kernel.org; h=from:subject:message-id; bh=TRMQesoJ1teDIYpT9/7kCY3YsKcNcxDZavMqNyv2chc=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZotwzwg9YZ3fZ3lO4h77TMG0r4mXGlc+thOa4Rw6p VsnY7tTRykLgxgXg6yYIot0W2T+zOdVvCVefhYwc1iZQIYwcHEKwER2PmRkWNR9P6p5T+WX/wWn r2nuEr8nvfXUPQ41kSK+43cWr3me1svIsGCtwP845kSDj4nHkrRqwn/ffCltcOWsM09kQ2//GwF GZgA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni The passive sockets never got proper memcg accounting: the msk socket is associated with the memcg at accept time, but the passive subflows never got it right. At accept time, traverse the subflows list and associate each of them with the msk memcg, and try to do the same at join completion time, if the msk has been already accepted. 
Fixes: cf7da0d66cc1 ("mptcp: Create SUBFLOW socket for incoming connections= ") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/298 Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/597 Signed-off-by: Paolo Abeni Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 38 +++++++++++++++++++++++++++----------- net/mptcp/protocol.h | 1 + net/mptcp/subflow.c | 10 ++++++++++ 3 files changed, 38 insertions(+), 11 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 2104ab1eda1d..67732d3c502c 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3651,8 +3651,11 @@ static void mptcp_sock_check_graft(struct sock *sk, = struct sock *ssk) write_lock_bh(&sk->sk_callback_lock); sock =3D sk->sk_socket; write_unlock_bh(&sk->sk_callback_lock); - if (sock) + if (sock) { mptcp_sock_graft(ssk, sock); + __mptcp_inherit_cgrp_data(sk, ssk); + __mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC); + } } =20 bool mptcp_finish_join(struct sock *ssk) @@ -3970,6 +3973,28 @@ static int mptcp_listen(struct socket *sock, int bac= klog) return err; } =20 +static void mptcp_graft_subflows(struct sock *sk) +{ + struct mptcp_subflow_context *subflow; + struct mptcp_sock *msk =3D mptcp_sk(sk); + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk =3D mptcp_subflow_tcp_sock(subflow); + + lock_sock(ssk); + + /* Set ssk->sk_socket of accept()ed flows to mptcp socket. + * This is needed so NOSPACE flag can be set from tcp stack. 
+ */ + if (!ssk->sk_socket) + mptcp_sock_graft(ssk, sk->sk_socket); + + __mptcp_inherit_cgrp_data(sk, ssk); + __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL); + release_sock(ssk); + } +} + static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, struct proto_accept_arg *arg) { @@ -4017,16 +4042,7 @@ static int mptcp_stream_accept(struct socket *sock, = struct socket *newsock, msk =3D mptcp_sk(newsk); msk->in_accept_queue =3D 0; =20 - /* set ssk->sk_socket of accept()ed flows to mptcp socket. - * This is needed so NOSPACE flag can be set from tcp stack. - */ - mptcp_for_each_subflow(msk, subflow) { - struct sock *ssk =3D mptcp_subflow_tcp_sock(subflow); - - if (!ssk->sk_socket) - mptcp_sock_graft(ssk, newsock); - } - + mptcp_graft_subflows(newsk); mptcp_rps_record_subflows(msk); =20 /* Do late cleanup for the first subflow as necessary. Also diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 6d9de13c1f9c..8c27f4b1789f 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -707,6 +707,7 @@ mptcp_subflow_delegated_next(struct mptcp_delegated_act= ion *delegated) return ret; } =20 +void __mptcp_inherit_memcg(struct sock *sk, struct sock *ssk, gfp_t gfp); void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk); =20 int mptcp_is_enabled(const struct net *net); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index d98d151392d2..72b7efe388db 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1712,6 +1712,16 @@ int __mptcp_subflow_connect(struct sock *sk, const s= truct mptcp_pm_local *local, return err; } =20 +void __mptcp_inherit_memcg(struct sock *sk, struct sock *ssk, gfp_t gfp) +{ + /* Only if the msk has been accepted already (and not orphaned).*/ + if (!mem_cgroup_sockets_enabled || !sk->sk_socket) + return; + + mem_cgroup_sk_inherit(sk, ssk); + __sk_charge(ssk, gfp); +} + void __mptcp_inherit_cgrp_data(struct sock *sk, struct sock *ssk) { #ifdef CONFIG_SOCK_CGROUP_DATA --=20 2.51.0 From nobody Thu Nov 
27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C74E934FF41; Fri, 21 Nov 2025 17:02:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744565; cv=none; b=EPW9+M/np4pNITmKJ76g3ZRAw9wv7FiM0fnGdbhJjA0Lsgw1umHYygl8577vKLmQC0XXp5GhgdUZN/wI+Hp32FN09+uf3lXOvBbkcMSZ19nnKR6PJ3Psm+sSwOxh+adYNqgUVIcm2RXOruEkR5LRBBhoUWbwm8A34xjbEouhIj8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744565; c=relaxed/simple; bh=yR90RXZ3QrEYVCUxbKbZqVjv/h5Ac8YPBbnH0DCdS84=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Ob70GpSPcNdPGJoucrLqTuFA+xbXHFvOGU/WStmamYESW4oRUR7Gui+B+xhEW2xbqMzf8RUBGv83W9wsNULBF0Mje1vKWlaLrDIg6kAu4f19TpUml/nu8Hn6lU4JS2ibLTbyX5KhacqKsKoa8MImyEC8ubBXt/KA24cVT1zFba0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=LR2Sp7nC; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="LR2Sp7nC" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 60DC6C4CEF1; Fri, 21 Nov 2025 17:02:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744565; bh=yR90RXZ3QrEYVCUxbKbZqVjv/h5Ac8YPBbnH0DCdS84=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=LR2Sp7nCzmQrZHf+Erqbt7xwGjNeGmtru2mfEH+z3ajOp0Fb02Kc3pWNW0nNwzY5L fUejlBa1gbb7qxII5nrjg0jCerHdB3CXzUO6xZZjQS7DtcQtz2BlwRxHkPskVBUWFZ qOt5KquUqXkDjztzoVHWuBEC/vX8400ZTp1GnXBLj8NSv8UqSk+q5BTnXu2ExyTRqS mmPfinWyixBTqdDmVOO8TvblsV98xZru/qzjoaPeNjwRD9kFHwbFksApU1vlq8+TCi 
ery2ZKLLbng6sBiIrT2n+tzUTOC3JU+DABxisO/AnMKK5JFijvNa8NepHljyaNKFjd XmCx8f5New9vw== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:04 +0100 Subject: [PATCH net-next 05/14] mptcp: cleanup fallback data fin reception Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-5-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=1827; i=matttbe@kernel.org; h=from:subject:message-id; bh=VEHGF89C18Lt+T6mAhQaFTy0GJ+jzNFrpHskh7oH5Ek=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZot9jFcS9LqzP3Hl9cLERDPTntl/LixbcIv78Pdg8 +2J9nopHaUsDGJcDLJiiizSbZH5M59X8ZZ4+VnAzGFlAhnCwMUpABPZk8/wz8Ty4Pn5xnHW5VlZ SntdWw8ZJB/5dbHmltpBQx7h7GSRrYwMW9SOeIjecDLwm/Y/dZ8nPw8Xo9Q9rUn8+ZbmdSd+1Vs wAgA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni MPTCP currently generates a dummy data_fin for fallback sockets, using the current ack_seq, when the fallback subflow has completed data reception. We are going to introduce backlog usage for the msk soon, even for fallback sockets: the ack_seq value will not match the most recent sequence number seen by the fallback subflow socket, as it will ignore data_seq sitting in the backlog. 
Instead use the last map sequence number to set the data_fin, as fallback (dummy) map sequences are always in sequence. Reviewed-by: Geliang Tang Tested-by: Geliang Tang Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/subflow.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 72b7efe388db..1f7311afd48d 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1285,6 +1285,7 @@ static bool subflow_is_done(const struct sock *sk) /* sched mptcp worker for subflow cleanup if no more data is pending */ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct so= ck *ssk) { + const struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); struct sock *sk =3D (struct sock *)msk; =20 if (likely(ssk->sk_state !=3D TCP_CLOSE && @@ -1303,7 +1304,8 @@ static void subflow_sched_work_if_closed(struct mptcp= _sock *msk, struct sock *ss */ if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) && msk->first =3D=3D ssk && - mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true)) + mptcp_update_rcv_data_fin(msk, subflow->map_seq + + subflow->map_data_len, true)) mptcp_schedule_work(sk); } =20 --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3DB4635292C; Fri, 21 Nov 2025 17:02:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744569; cv=none; b=PVUNCMGn+7pcNmK5l6ZV8zo3g1E9v4r561xIUf052kCmtMRKiEj8zqRuDWz9KfAsxJaX/4pSAfIo+M8B1kj0PyLSvSi+Nd9dswLh9xUulKAHIy87L6K9WZi5THelKruutQ57ShM49W1MM1j1zgquw7Qq476X61lwQXstN0/hwG8= ARC-Message-Signature: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744569; c=relaxed/simple; bh=c1FiB03N4TjtbuLBNxU7XN7WYPiLFo1I2pUiGPKArCg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=PFNFxpSsokelUKGkxpp3Jq7BgO4+g+jpz3O59aNJhTDQwQla0qm/RVeZoJs1ruQeJkOhTxwbk7WF+6b53FLuP2Nt8qfkc3BEDfv0dlBaLeMbCOMe2ue4BGOYn07GvrG0ORRG75Uq4M743fp9KuH52XeQw44jKdV2/+l9KMKS4P4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=pjE5iKip; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="pjE5iKip" Received: by smtp.kernel.org (Postfix) with ESMTPSA id C8D19C19423; Fri, 21 Nov 2025 17:02:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744568; bh=c1FiB03N4TjtbuLBNxU7XN7WYPiLFo1I2pUiGPKArCg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=pjE5iKip0xlBCQI3455FWo+6Y6iVwxeVwVT6Ca6K5BejKYbzWUyM8iEH49iaROXzo 2sCYXuiVtSvXyLegLfMsIE0yxRwkHmNdHKwdUqG0cJMWXsHR8eI4a9i/K60s0jqix1 ym/LOgZNNZoFxdfotVyphkPqFj1Taoo/x8bn9XB2LlPqAo7cTJq4q99hBWRQREODs2 y4G3AcFPxOo114IL37YVfNQeSnCdCBWJZc/9CmvS7+M7pxOobodLHGibJ3diSQzUlo NaprwnlyO0zKJHjIC+mEb2SSYU5afHw7Go7Q7QEGBtOhQgE9iveyk1R/0vSYrdNt0z pgY2wkA2WmYxg== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:05 +0100 Subject: [PATCH net-next 06/14] mptcp: cleanup fallback dummy mapping generation Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-6-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , 
Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=2363; i=matttbe@kernel.org; h=from:subject:message-id; bh=+D4DMCB5Vu2ppaXiUX6i79cxbGMbyyjuyyjizxfl+Qg=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZouzrFzNGueWuFOe/dVJMcHiV43rvNJvKX/RFI3tE 1jWvm1nRykLgxgXg6yYIot0W2T+zOdVvCVefhYwc1iZQIYwcHEKwEQSpRgZftwqEenO52Dl315x 89vO3Ry77j3lnKQjPU9ml6Gxq31EDyNDz7repCT2KxeOH/zj27tdyHadc/DqncvNg1MtG+2eeZj wAAA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni MPTCP currently accesses ack_seq outside the msk socket lock scope to generate the dummy mapping for fallback sockets. Soon we are going to introduce backlog usage, and even for fallback sockets the ack_seq value will be significantly off outside of the msk socket lock scope. Avoid relying on ack_seq for dummy mapping generation, and use the subflow sequence number instead. Note that in case of disconnect() and (re)connect() we must ensure that any previous state is reset. 
Signed-off-by: Paolo Abeni Reviewed-by: Geliang Tang Tested-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 3 +++ net/mptcp/subflow.c | 8 +++++++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 67732d3c502c..df4be41ed3fe 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3274,6 +3274,9 @@ static int mptcp_disconnect(struct sock *sk, int flag= s) msk->bytes_retrans =3D 0; msk->rcvspace_init =3D 0; =20 + /* for fallback's sake */ + WRITE_ONCE(msk->ack_seq, 0); + WRITE_ONCE(sk->sk_shutdown, 0); sk_error_report(sk); return 0; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 1f7311afd48d..86ce58ae533d 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -491,6 +491,9 @@ static void subflow_set_remote_key(struct mptcp_sock *m= sk, mptcp_crypto_key_sha(subflow->remote_key, NULL, &subflow->iasn); subflow->iasn++; =20 + /* for fallback's sake */ + subflow->map_seq =3D subflow->iasn; + WRITE_ONCE(msk->remote_key, subflow->remote_key); WRITE_ONCE(msk->ack_seq, subflow->iasn); WRITE_ONCE(msk->can_ack, true); @@ -1435,9 +1438,12 @@ static bool subflow_check_data_avail(struct sock *ss= k) =20 skb =3D skb_peek(&ssk->sk_receive_queue); subflow->map_valid =3D 1; - subflow->map_seq =3D READ_ONCE(msk->ack_seq); subflow->map_data_len =3D skb->len; subflow->map_subflow_seq =3D tcp_sk(ssk)->copied_seq - subflow->ssn_offse= t; + subflow->map_seq =3D __mptcp_expand_seq(subflow->map_seq, + subflow->iasn + + TCP_SKB_CB(skb)->seq - + subflow->ssn_offset - 1); WRITE_ONCE(subflow->data_avail, true); return true; } --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9DDCC3546F7; Fri, 21 Nov 
2025 17:02:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744572; cv=none; b=OSurfrILb/xuLt+KBu4AFR4EdSbc9D5fvAclryErJi+/AwiDci+VS2XYbxw+avMGjmiEnChQvn11hK1c8794xm9+MuzJW6RNIf1DRrt3Eq+iW9BO3fmDm6TiDX9JCvTj8tld4xAg8Y6eHVkGkj+9OJAwnYK2NMcOLvdkoRSLvF8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744572; c=relaxed/simple; bh=jIdiLX3VWzgNrqf7EpPm0F5vpLI5qnCVaoAPIq4F4Gw=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=C4HaDyoNpjTzue56hQc+gVmU0r2vTbj7MSbkFqN841DlJss4j286zYGojwqQ0s8KG3X1HpXm82bhgV2AuyxktNLURbigbcPtPjEqFjYVMPY1OCvmbEWqXTkdw2q0FXXpAt/Fp2TutbqwXnLePNaxWb3vpr/U2gFXNYyy7W2HztY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=obc0CNTm; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="obc0CNTm" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3C2CEC116C6; Fri, 21 Nov 2025 17:02:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744572; bh=jIdiLX3VWzgNrqf7EpPm0F5vpLI5qnCVaoAPIq4F4Gw=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=obc0CNTmHNgimNG9AMF+2xHb5vpsiz7As8MBfZ++aEfoSuOX3OEHolL68mEm85Z4L NjO6uCEySYBNcQlOtCC67wh/E6yJKQltlvz09TCABI4tLFnnEsWUwDJWXVldqIKCI4 bg8bL8Q2EdL/x+vmRzW7BruH9YbYAs2KhEvFoX6928dgBniTxcAIPLNOFy6UaZ3H3b BDPlssUX3xOJdGzVE37iMkmi6IxMXDyHBM9+F4oKwH1f3drarg99mrqt5IQ41Oipzu N3ljAi9J0hd79qNbkVNVv8fdGKZSldyUzlAH9IN49/gQz52yolXDUSVe5Jrdw1y0cG rmODV7vwHH1bQ== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:06 +0100 Subject: [PATCH net-next 07/14] mptcp: ensure the kernel PM does not take action too late Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev 
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-7-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=1949; i=matttbe@kernel.org; h=from:subject:message-id; bh=tsDinnysBE7H5NIp/IYbumBnwwyuBfamcgQFOo66t24=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZktM8dN4YCUZpzirVzLlha/F6qa1nU+kVm88Eezw8 Iz8SjuljlIWBjEuBlkxRRbptsj8mc+reEu8/Cxg5rAygQxh4OIUgIlUXGD4n68hYHY0IOHP8qev Wssj93jOWfLwwk6lqcyiXBy74z3FDjMyPMjQkDzO26gvzDZB9pfm4u6A6dtDtt5S+Bv5ku+hwuv /LAA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni The PM hooks can currently run when the msk is already shutting down. Subflow creation will fail, thanks to the existing check at join time, but we can avoid starting operations that are bound to fail in the first place. 
Signed-off-by: Paolo Abeni Reviewed-by: Geliang Tang Tested-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/pm.c | 4 +++- net/mptcp/pm_kernel.c | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c index 9604b91902b8..e2040c327af6 100644 --- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -594,6 +594,7 @@ void mptcp_pm_subflow_established(struct mptcp_sock *ms= k) void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, const struct mptcp_subflow_context *subflow) { + struct sock *sk =3D (struct sock *)msk; struct mptcp_pm_data *pm =3D &msk->pm; bool update_subflows; =20 @@ -617,7 +618,8 @@ void mptcp_pm_subflow_check_next(struct mptcp_sock *msk, /* Even if this subflow is not really established, tell the PM to try * to pick the next ones, if possible. */ - if (mptcp_pm_nl_check_work_pending(msk)) + if (mptcp_is_fully_established(sk) && + mptcp_pm_nl_check_work_pending(msk)) mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED); =20 spin_unlock_bh(&pm->lock); diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c index 5c1dc13efa94..57570a44e418 100644 --- a/net/mptcp/pm_kernel.c +++ b/net/mptcp/pm_kernel.c @@ -337,6 +337,8 @@ static void mptcp_pm_create_subflow_or_signal_addr(stru= ct mptcp_sock *msk) struct mptcp_pm_local local; =20 mptcp_mpc_endpoint_setup(msk); + if (!mptcp_is_fully_established(sk)) + return; =20 pr_debug("local %d:%d signal %d:%d subflows %d:%d\n", msk->pm.local_addr_used, endp_subflow_max, --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 120323546F7; Fri, 21 Nov 2025 17:02:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; 
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744576; cv=none; b=AecPw9pj6+XhhFiZT6IBySwJ0A1W6dG8lhp0ioZfDwPQmE6SqAmZt3XfgPXN8iEYvGYY9GlqpJnX9XdQTTB0Jw1zYKdVkx4ZVLl6Kct7MznKECNMO3iUFLADW58ZoVfZ7twDZCwprAhWHPNhTWVbUwpWY2u+paOl9fs5oBOiOpk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744576; c=relaxed/simple; bh=gF+aKM/DWD8pO3TJStmjtkBHNQi5DzGMJDiohuFYPlo=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=swwSg40Qan8LjDMPNcSuwygCxxqUOTNB8t2WuIfFnoa3uOBsIlmQNAlrjVsovtlsCmbocnHRoscRtKkALgT0RRHpsuXIpBTI5nkWIszokPHiFYd41T+EUKlAHJN+zEjWLFPkjB4I4M2t44ZFopS+nHGwubUc6xNQ6B9NjpNCIZM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=fSX+Q8lg; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="fSX+Q8lg" Received: by smtp.kernel.org (Postfix) with ESMTPSA id A2C5BC4CEF1; Fri, 21 Nov 2025 17:02:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744575; bh=gF+aKM/DWD8pO3TJStmjtkBHNQi5DzGMJDiohuFYPlo=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=fSX+Q8lgcIsD01JHS1mY26BjGAxJbhFEztXjpe3M+BV8YsACqdWou2U6gbuL92RvH iE0u/kqUW87FADV1CLVIZwkvy+8vg9LTcqEEaBpGM+l+Xud3RI287ZJ4s/RNTcGtzO P5c7ETF0EOTJh14PG4u91bYETQyzPAA0NrEEWtbDFrJ8Vz4aW1WVRUlIkDv1UR0P1w tN4McXtpRO6dAbcYlOs8WwRmjGuL43xr8HirRh9xDmieKH5hC2rOya7ScsfM6MZ2kh Hfd2T5hVwQ71wE1smZcFCeSp6Ulcr5kZRcV6ruTr8FuyPQakIlIIDSlEnlgq5f6c6Z S9RExzHbxBaOQ== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:07 +0100 Subject: [PATCH net-next 08/14] mptcp: do not miss early first subflow close event notification Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 
quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-8-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=1462; i=matttbe@kernel.org; h=from:subject:message-id; bh=gtdZKYTiUr42xJypfzFgeFyAQf6xGlVY7RStOQVlKdE=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZkv2Laxy1Y83KBFr2qbWxRyWcqD4lVWH/Keq+Cl3n eoLNwp3lLIwiHExyIopski3RebPfF7FW+LlZwEzh5UJZAgDF6cATMTLj5HhRMFq1tSIGN4inz+3 X/DEVpt1v5rVd6C/jof52Yy47crXGRmWO7yprbDcu8Or4c7yVUZT+799n8bF9uyk+4/EupOb38x lBgA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni The MPTCP protocol is not currently emitting the NL event when the first subflow is closed before msk accept() time. By replacing the close helper in use in such a scenario, implicitly introduce the missing notification. Note that in such a scenario we want to be sure that mptcp_close_ssk() will not trigger any PM work: move the msk state change update earlier, so that the previous patch will offer such a guarantee. 
Signed-off-by: Paolo Abeni Reviewed-by: Geliang Tang Tested-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index df4be41ed3fe..2ee76c8c5167 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -4052,10 +4052,10 @@ static int mptcp_stream_accept(struct socket *sock,= struct socket *newsock, * deal with bad peers not doing a complete shutdown. */ if (unlikely(inet_sk_state_load(msk->first) =3D=3D TCP_CLOSE)) { - __mptcp_close_ssk(newsk, msk->first, - mptcp_subflow_ctx(msk->first), 0); if (unlikely(list_is_singular(&msk->conn_list))) mptcp_set_state(newsk, TCP_CLOSE); + mptcp_close_ssk(newsk, msk->first, + mptcp_subflow_ctx(msk->first)); } } else { tcpfallback: --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 36625355808; Fri, 21 Nov 2025 17:02:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744579; cv=none; b=l0MQNcYpC3AOnMCTxxxi/9G/f2M7FSbSIl3ggWrSZVwkznD/6NQtYdO3xmmSCu1ha7cmRfiHFv9K8MlKAZ63uwDjPmf7EZl5rQSoR25dNeWJV/vfJfbEP3dWMtknz8tVl+6y9UEj6VB5M4g62V+c3yYzBAv4B9ceyuRAzUci8/g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744579; c=relaxed/simple; bh=XLpfj1iPnlktIIm2wFnxDVZ3YyEzVgDp9AF5Vh/cX6s=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=h0V+0WNquXPL8lL0hoDLutup2W/k25e7CEp66DPMwcHyxhnYCwOePujpE8mojfLVmj/4kIwsQgN4XlFgnh8k2vDjDPWIIACPyN27VlGtOZTR6yASihfwFL8DB87jSi0fgPAEkHCZsuWhRrmh9tXQQbC12g5YuCX8bfCgjZeY7M4= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=GQ5eIvcN; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="GQ5eIvcN" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 14721C116C6; Fri, 21 Nov 2025 17:02:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744579; bh=XLpfj1iPnlktIIm2wFnxDVZ3YyEzVgDp9AF5Vh/cX6s=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=GQ5eIvcNTjb7yHsIJWzj/dJTzJaBI/aSgWUrsN/HEtr9MfmaTd+rFyNUxxY3GJ9Tf 0SGqC2SpyQ6+8KL/gqqLXpAvnIrQpZoNgUQGZ3iQXG2wvPA1HDOk30tdqmaf5F6JjO RUqjhXHIOSf7gPjNknxuPOAlxYu0eUQB/xIu1CbBHlvd64nMBN2Lko9za/2Hjqdpco 9nH+SWF194j7z4N6oWYI6iOWa87FxEYXqo6epiicgqg/la+51Vi3VzLytugkWkOzrN B7bkfstkxzF0zJpdQEppx9WY4IMd1Bl45o3xCAP8KhK+SJr6GW+cVvBMfWqS0Yg9vP gjqALyivD2/QA== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:08 +0100 Subject: [PATCH net-next 09/14] mptcp: make mptcp_destroy_common() static Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-9-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. 
Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=3042; i=matttbe@kernel.org; h=from:subject:message-id; bh=b+ZCnRM0+Jr50ht0mP9uLm5IRFiMFB1h4Upl9hCGY8E=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZkvtDL/n9KG0SLnevl9vbqOgy9UQd/Vq6UTxjuWX8 zUCFi/tKGVhEONikBVTZJFui8yf+byKt8TLzwJmDisTyBAGLk4BmMibuwz/lGZM3a+iHFe51cmL Q/GNjUHLSb491h/MnkxconBL+4THY0aG2dpT1D7tYPqavfBcvfmfZp5mzk0ui6SO/Tb2Yfneotf CDwA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni This function is only used inside protocol.c, so there is no need to expose it to the whole stack. Note that the function definition must be moved earlier to avoid a forward declaration. Signed-off-by: Paolo Abeni Reviewed-by: Geliang Tang Tested-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 42 +++++++++++++++++++++--------------------- net/mptcp/protocol.h | 2 -- 2 files changed, 21 insertions(+), 23 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 2ee76c8c5167..29e5bda0e913 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3222,6 +3222,27 @@ static void mptcp_copy_inaddrs(struct sock *msk, con= st struct sock *ssk) inet_sk(msk)->inet_rcv_saddr =3D inet_sk(ssk)->inet_rcv_saddr; } =20 +static void mptcp_destroy_common(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow, *tmp; + struct sock *sk =3D (struct sock *)msk; + + __mptcp_clear_xmit(sk); + + /* join list will be eventually flushed (with rst) at sock lock release t= ime */ + mptcp_for_each_subflow_safe(msk, subflow, tmp) + __mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, 0); + + 
__skb_queue_purge(&sk->sk_receive_queue); + skb_rbtree_purge(&msk->out_of_order_queue); + + /* move all the rx fwd alloc into the sk_mem_reclaim_final in + * inet_sock_destruct() will dispose it + */ + mptcp_token_destroy(msk); + mptcp_pm_destroy(msk); +} + static int mptcp_disconnect(struct sock *sk, int flags) { struct mptcp_sock *msk =3D mptcp_sk(sk); @@ -3427,27 +3448,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, co= nst struct sock *ssk) msk->rcvq_space.space =3D TCP_INIT_CWND * TCP_MSS_DEFAULT; } =20 -void mptcp_destroy_common(struct mptcp_sock *msk) -{ - struct mptcp_subflow_context *subflow, *tmp; - struct sock *sk =3D (struct sock *)msk; - - __mptcp_clear_xmit(sk); - - /* join list will be eventually flushed (with rst) at sock lock release t= ime */ - mptcp_for_each_subflow_safe(msk, subflow, tmp) - __mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, 0); - - __skb_queue_purge(&sk->sk_receive_queue); - skb_rbtree_purge(&msk->out_of_order_queue); - - /* move all the rx fwd alloc into the sk_mem_reclaim_final in - * inet_sock_destruct() will dispose it - */ - mptcp_token_destroy(msk); - mptcp_pm_destroy(msk); -} - static void mptcp_destroy(struct sock *sk) { struct mptcp_sock *msk =3D mptcp_sk(sk); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 8c27f4b1789f..3d2892cc0ef2 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -980,8 +980,6 @@ static inline void mptcp_propagate_sndbuf(struct sock *= sk, struct sock *ssk) local_bh_enable(); } =20 -void mptcp_destroy_common(struct mptcp_sock *msk); - #define MPTCP_TOKEN_MAX_RETRIES 4 =20 void __init mptcp_token_init(void); --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D063355808; Fri, 21 Nov 2025 17:03:02 +0000 
(UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744583; cv=none; b=GEh/ATGEh0IV3/R+31szA3YROalajZ8EuRXZFgT6Oq78oyBMqOENasedPdjsU4NqtN9OEnEuHCl4v+E4aQXSN9Vtn9fJQyKvg459WzYbDH2MdROSTJc/VEaXgagI7sxw+8nXN8K6ILRddOzh5egSoCSQh+D9VrmO62lefWmRClA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744583; c=relaxed/simple; bh=Y+xXgx+Uq+p1JTHqxHVETDhmBVd89PnFCtfD8cUF40M=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=CvxxFfC+s5NYDaxLUbUqrp+/7MltSrHMV/Su8l44i6Okpq/jXcMaAg/o1Zcd3WZXOoM/a2AIK1mJ0L+CWbEg2ZaNp1mydzDxQDVEXZJJGQFAeDoOaq3A3kj7vD7rrXn7Z3gi6L5oxqa6xTl2n5Kk7bBRMn7JO+DsOUYIShF3LQ8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=UtSL3LxX; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="UtSL3LxX" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7B23AC4CEF1; Fri, 21 Nov 2025 17:02:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744582; bh=Y+xXgx+Uq+p1JTHqxHVETDhmBVd89PnFCtfD8cUF40M=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=UtSL3LxX6bQJh5maA4a3OIZCmGL0n4Bif/vKXwZ2ZIdvaCZzpVH61Mq+sYR+kwAhp xGmGnp63QYyX+3h7cK+eZ+oTjwysD0VkcJfOBZDsByjF7oTcpMn1g9e7xX15zfPfU/ 3BDZksuT/0L3uxVLH+lBx702CjIQ47419yktDvzIOw/Ly3/y9ruL+hSCfuALZU498v 1BuoMsLNXLRUnm8aPGYW5faA2PY8gCZx4Wj+wrlw1iHA/4ql4tbzeiI/6c52DDPqo8 MXdoITA6Rr1cJImRy8+ax0p+V6g3fjnqcaidvVVO7IM6DQBdlX8OTN/CAMWu/3Iszy h1X/MYc83fEFg== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:09 +0100 Subject: [PATCH net-next 10/14] mptcp: drop the __mptcp_data_ready() helper Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: 
List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-10-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=1794; i=matttbe@kernel.org; h=from:subject:message-id; bh=hZOTQX3O/F+vkYeO9KCBSATOUH3wCz2peNWf1iIQb6s=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZsvcE/u8O9dTd9KLBa2fl7w+MPkZw+3w051OdvIK4 lNPHr3xsqOUhUGMi0FWTJFFui0yf+bzKt4SLz8LmDmsTCBDGLg4BWAi3cEMf6Xznjxcs8TG/+lH eaW1Jf8XmeSIfMxkse0PuCpQpnd9nj0jw5myedLqR2uXaW/sy+fdEpCsELjMZcJDGyVur5qXn39 PYgMA X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni It adds little clarity and there is a single user of such helper, just inline it in the caller. 
Signed-off-by: Paolo Abeni Reviewed-by: Geliang Tang Tested-by: Geliang Tang Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 29e5bda0e913..ba1237853ebf 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -845,18 +845,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, = struct sock *ssk) return moved; } =20 -static void __mptcp_data_ready(struct sock *sk, struct sock *ssk) -{ - struct mptcp_sock *msk =3D mptcp_sk(sk); - - /* Wake-up the reader only for in-sequence data */ - if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) - sk->sk_data_ready(sk); -} - void mptcp_data_ready(struct sock *sk, struct sock *ssk) { struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); + struct mptcp_sock *msk =3D mptcp_sk(sk); =20 /* The peer can send data while we are shutting down this * subflow at msk destruction time, but we must avoid enqueuing @@ -866,10 +858,13 @@ void mptcp_data_ready(struct sock *sk, struct sock *s= sk) return; =20 mptcp_data_lock(sk); - if (!sock_owned_by_user(sk)) - __mptcp_data_ready(sk, ssk); - else + if (!sock_owned_by_user(sk)) { + /* Wake-up the reader only for in-sequence data */ + if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) + sk->sk_data_ready(sk); + } else { __set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags); + } mptcp_data_unlock(sk); } =20 --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8942522A1D5; Fri, 21 Nov 2025 17:03:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1763744586; cv=none; b=sMKkt1W3oTsmrFpgEjWUtUUFbhPtDmGkMu3jLX8+u6rkUTNGsVYHgE56o47k9QdkVGT951gPNuxeFctth1rewLFHuavdkDV42V9PwIwzZt7wal4urejdkPVgMv1TwM6zEhuQcOgQgJvPzQ+QNZxky23O+abakS0GtJQcwuBy2X0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744586; c=relaxed/simple; bh=veFLr+o0B+jOAG0p/wdPi9gfx4GiO7j8scPu9YY5M9s=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=WaYR6d9vc43F5Fp8PzjepX58GQoWrU/Qpeav8j4WF/A7f9jbk+jg4yaJQDr1YpXWig26sOlOBf8EKlX5wvVNv8wIsFUouk/hgZy1g39IATnzAq+7533/oTjjdYqISzDxvKOolawvrSk5lgGg7atPC2S/1klp2k6PHPeeLEf9EL8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=Syw60q6P; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="Syw60q6P" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0D330C116C6; Fri, 21 Nov 2025 17:03:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744586; bh=veFLr+o0B+jOAG0p/wdPi9gfx4GiO7j8scPu9YY5M9s=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=Syw60q6PNG0xt2JIa29xtc7P2TalQlomatFxB1Vq1CA0ghACGQkW1ETNAptdc5z2S zTR9iztGlqbLTPBCfommg95Pog41SpXnkNQ+TGLI3J0Qh5UZdx5o0OE93NB/zWq4Fi xPYfU6N1xl1gmAd5gnX3xLQEgaan5+aFLfpvgZzLdYJTzHmsJZhZNo3zYmRk0UsVga e+/ixJLf5lmm2+TUeAgwH3sE/pNW0hnjJVP6PTajdIrFRTVP8+tMnPYHvrYDJX3WOg bcT9951ExwevobGEyKszvoyRHoZ3a7dIEqkN9EO/KS61hHlfTN2uuJ1x1x6obysFoh fg8diW3h9Qybw== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:10 +0100 Subject: [PATCH net-next 11/14] mptcp: handle first subflow closing consistently Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: 
<20251121-net-next-mptcp-memcg-backlog-imp-v1-11-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=3870; i=matttbe@kernel.org; h=from:subject:message-id; bh=S64ANmhGtgjjf0LZRwCi2Y2h4lDr+CHuAO5tZ6lN/+U=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZst+Cf/4x1+uLyr3i3cQY8byr76t1x8smuQT67Mn8 JPT5LezOkpZGMS4GGTFFFmk2yLzZz6v4i3x8rOAmcPKBDKEgYtTACby1pDhD/eMMxsPrn/3+PR0 o4/yRa+1Wie5f69fel944oZrB2offk9gZPi0kev+fo5bR7pS7zQ+Ff7MlJ7Hu9zooMJcc/9JLe9 eH2YAAA== X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073 From: Paolo Abeni Currently, as soon as the PM closes a subflow, the msk stops accepting data from it, even if the TCP socket could still be formally open in the incoming direction, with the notable exception of the first subflow. The root cause of such behavior is that the code currently piggybacks two separate semantics on the subflow->disposable bit: the subflow context must be released, and the subflow must stop accepting incoming data. The first subflow is never disposed, so it also never stops accepting incoming data. Use a separate bit to mark the latter status and set such bit in __mptcp_close_ssk() for all subflows. Beyond making per-subflow behaviour more consistent, this will also simplify the next patch. 
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 14 +++++++++----- net/mptcp/protocol.h | 3 ++- 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index ba1237853ebf..d22f792f4760 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -851,10 +851,10 @@ void mptcp_data_ready(struct sock *sk, struct sock *s= sk) struct mptcp_sock *msk =3D mptcp_sk(sk); =20 /* The peer can send data while we are shutting down this - * subflow at msk destruction time, but we must avoid enqueuing + * subflow at subflow destruction time, but we must avoid enqueuing * more data to the msk receive queue */ - if (unlikely(subflow->disposable)) + if (unlikely(subflow->closing)) return; =20 mptcp_data_lock(sk); @@ -2437,6 +2437,13 @@ static void __mptcp_close_ssk(struct sock *sk, struc= t sock *ssk, struct mptcp_sock *msk =3D mptcp_sk(sk); bool dispose_it, need_push =3D false; =20 + /* Do not pass RX data to the msk, even if the subflow socket is not + * going to be freed (i.e. even for the first subflow on graceful + * subflow close. + */ + lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); + subflow->closing =3D 1; + /* If the first subflow moved to a close state before accept, e.g. 
due * to an incoming reset or listener shutdown, the subflow socket is * already deleted by inet_child_forget() and the mptcp socket can't @@ -2447,7 +2454,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct= sock *ssk, /* ensure later check in mptcp_worker() will dispose the msk */ sock_set_flag(sk, SOCK_DEAD); mptcp_set_close_tout(sk, tcp_jiffies32 - (mptcp_close_timeout(sk) + 1)); - lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); mptcp_subflow_drop_ctx(ssk); goto out_release; } @@ -2456,8 +2462,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct= sock *ssk, if (dispose_it) list_del(&subflow->node); =20 - lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); - if (subflow->send_fastclose && ssk->sk_state !=3D TCP_CLOSE) tcp_set_state(ssk, TCP_CLOSE); =20 diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 3d2892cc0ef2..d30806b287d2 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -536,12 +536,13 @@ struct mptcp_subflow_context { send_infinite_map : 1, remote_key_valid : 1, /* received the peer key from */ disposable : 1, /* ctx can be free at ulp release time */ + closing : 1, /* must not pass rx data to msk anymore */ stale : 1, /* unable to snd/rcv data, do not use for xmit */ valid_csum_seen : 1, /* at least one csum validated */ is_mptfo : 1, /* subflow is doing TFO */ close_event_done : 1, /* has done the post-closed part */ mpc_drop : 1, /* the MPC option has been dropped in a rtx */ - __unused : 9; + __unused : 8; bool data_avail; bool scheduled; bool pm_listener; /* a listener managed by the kernel PM? 
*/ --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9BF2635770E; Fri, 21 Nov 2025 17:03:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744589; cv=none; b=MVR+2Bv/oLzyx7lPnwe+m1sdPpQfpLyMax6fJFhyHXk/byIdr5Qh2lrQfxBTVehZIH923IODMEc8f9QC7CRDjvunDE7R5TrhJCNVVc13tNEs2osCvUs4hpIXsAIxdjphNEI0yDgCCScF4FOYSXSVeDU/R5aON2rDxfl6mWOQg44= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744589; c=relaxed/simple; bh=exipAAAG44c1ya/2FIftGqLb+lrF8/NOW6yQn4gOd8g=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=NQBcIae+wRVXbs0zMjrxAa1MDYpSfU+bHLb6xGYf5k+BsLrrH0nui0mMjkKcUgGKFbwGENuu9SgerqmTwr1HcxgBTYtgcVUFRwCTTE8DARwOxjFPNUDzM+olT0Ff8Mowz7+FyzIMIhgSTG/KrbpGBx9Hz9ejl77/0arnCqfOiQc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=tvH5CwQw; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="tvH5CwQw" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 75B47C4CEF1; Fri, 21 Nov 2025 17:03:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744589; bh=exipAAAG44c1ya/2FIftGqLb+lrF8/NOW6yQn4gOd8g=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=tvH5CwQwNHgt31gWGOdn8L7LEbE2/j7HMBgaRyLtIXqHCltYLIMiDFAOPGDGaYP2T aSHM43cz+5wCme1FJLgxn6MnmiNcSKROR4lC4UjGAdzzbv7QQj3cYQh7FAde6RFhIx T7uBoD2dcPEDFJiH+K9pAgOiH7WE2aRFGenbk9c1XkYCoM0rYLaCJmOh9TavTP4D+P 
m1O0seT5504F/xFzj1BWR+b05YSoUi8x/EqBOaf1u8XSyBaVOfwk8cQ22b4O21us9P pBaCE+yAUQNMDPvbVmK0d3K1DVn9mMYgT+ugJPQyDBINSqBtVXEPgw2XH21fx2aJNd 8i5BedE4dyqBQ== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:11 +0100 Subject: [PATCH net-next 12/14] mptcp: borrow forward memory from subflow Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-12-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=7831; i=matttbe@kernel.org; h=from:subject:message-id; bh=hY44Oq6q0TXEmmxU5JFPD8A8RXoTf8Ee1IhPcJGt60c=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZssdYv+yYuKcTN3UxWfPnlrLwWc693Zlx3wBFdFcw 9zWtTzCHaUsDGJcDLJiiizSbZH5M59X8ZZ4+VnAzGFlAhnCwMUpABOx6mBk2Lclbnvue0FzrqS2 0LNHHE/8f3Xi6JZm5WtrbuTdnHnC/TjDPyWZPsvLniXTJfSblB/FTqn/KMYfYaBm8PY9z+cHl2o kOAA= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073
From: Paolo Abeni
In the MPTCP receive path, we release the subflow allocated fwd memory just to allocate it again shortly after for the msk.
That could increase the chance of failure, especially once backlog processing is added, as other actions could consume the just-released memory before the msk socket has a chance to do the rcv allocation.
Replace the skb_orphan() call with an open-coded variant that explicitly borrows the fwd memory from the subflow socket instead of releasing it.
The borrowed memory does not have PAGE_SIZE granularity; rounding to the page size would make the fwd allocated memory higher than what is strictly required and could make the incoming subflow fwd mem consistently negative. Instead, keep track of the accumulated frag and borrow the full page at subflow close time.
This allows removing the last drop in the TCP to MPTCP transition and the associated, now unused, MIB.
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/fastopen.c | 4 +++- net/mptcp/mib.c | 1 - net/mptcp/mib.h | 1 - net/mptcp/protocol.c | 23 +++++++++++++++-------- net/mptcp/protocol.h | 28 ++++++++++++++++++++++++++++ 5 files changed, 46 insertions(+), 11 deletions(-) diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c index b9e451197902..82ec15bcfd7f 100644 --- a/net/mptcp/fastopen.c +++ b/net/mptcp/fastopen.c @@ -32,7 +32,8 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptc= p_subflow_context *subf /* dequeue the skb from sk receive queue */ __skb_unlink(skb, &ssk->sk_receive_queue); skb_ext_reset(skb); - skb_orphan(skb); + + mptcp_subflow_lend_fwdmem(subflow, skb); =20 /* We copy the fastopen data, but that don't belong to the mptcp sequence * space, need to offset it in the subflow sequence, see mptcp_subflow_ge= t_map_offset() @@ -50,6 +51,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptc= p_subflow_context *subf mptcp_data_lock(sk); DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk)); =20 + mptcp_borrow_fwdmem(sk, skb); skb_set_owner_r(skb, sk); __skb_queue_tail(&sk->sk_receive_queue, skb); mptcp_sk(sk)->bytes_received +=3D skb->len; diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c index 171643815076..f23fda0c55a7 100644 --- a/net/mptcp/mib.c +++ b/net/mptcp/mib.c @@ -71,7 +71,6 @@ static const struct snmp_mib
mptcp_snmp_list[] =3D { SNMP_MIB_ITEM("MPFastcloseRx", MPTCP_MIB_MPFASTCLOSERX), SNMP_MIB_ITEM("MPRstTx", MPTCP_MIB_MPRSTTX), SNMP_MIB_ITEM("MPRstRx", MPTCP_MIB_MPRSTRX), - SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED), SNMP_MIB_ITEM("SubflowStale", MPTCP_MIB_SUBFLOWSTALE), SNMP_MIB_ITEM("SubflowRecover", MPTCP_MIB_SUBFLOWRECOVER), SNMP_MIB_ITEM("SndWndShared", MPTCP_MIB_SNDWNDSHARED), diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h index a1d3e9369fbb..812218b5ed2b 100644 --- a/net/mptcp/mib.h +++ b/net/mptcp/mib.h @@ -70,7 +70,6 @@ enum linux_mptcp_mib_field { MPTCP_MIB_MPFASTCLOSERX, /* Received a MP_FASTCLOSE */ MPTCP_MIB_MPRSTTX, /* Transmit a MP_RST */ MPTCP_MIB_MPRSTRX, /* Received a MP_RST */ - MPTCP_MIB_RCVPRUNED, /* Incoming packet dropped due to memory limit */ MPTCP_MIB_SUBFLOWSTALE, /* Subflows entered 'stale' status */ MPTCP_MIB_SUBFLOWRECOVER, /* Subflows returned to active status after bei= ng stale */ MPTCP_MIB_SNDWNDSHARED, /* Subflow snd wnd is overridden by msk's one */ diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index d22f792f4760..f5526855a2e5 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -358,7 +358,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk= , struct sk_buff *skb) static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offs= et, int copy_len) { - const struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); + struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); bool has_rxtstamp =3D TCP_SKB_CB(skb)->has_rxtstamp; =20 /* the skb map_seq accounts for the skb offset: @@ -383,11 +383,7 @@ static bool __mptcp_move_skb(struct sock *sk, struct s= k_buff *skb) struct mptcp_sock *msk =3D mptcp_sk(sk); struct sk_buff *tail; =20 - /* try to fetch required memory from subflow */ - if (!sk_rmem_schedule(sk, skb, skb->truesize)) { - MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED); - goto drop; - } + mptcp_borrow_fwdmem(sk, skb); =20 if (MPTCP_SKB_CB(skb)->map_seq 
=3D=3D msk->ack_seq) { /* in sequence */ @@ -409,7 +405,6 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk= _buff *skb) * will retransmit as needed, if needed. */ MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DUPDATA); -drop: mptcp_drop(sk, skb); return false; } @@ -710,7 +705,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp= _sock *msk, size_t len =3D skb->len - offset; =20 mptcp_init_skb(ssk, skb, offset, len); - skb_orphan(skb); + mptcp_subflow_lend_fwdmem(subflow, skb); ret =3D __mptcp_move_skb(sk, skb) || ret; seq +=3D len; =20 @@ -2436,6 +2431,7 @@ static void __mptcp_close_ssk(struct sock *sk, struct= sock *ssk, { struct mptcp_sock *msk =3D mptcp_sk(sk); bool dispose_it, need_push =3D false; + int fwd_remaining; =20 /* Do not pass RX data to the msk, even if the subflow socket is not * going to be freed (i.e. even for the first subflow on graceful @@ -2444,6 +2440,17 @@ static void __mptcp_close_ssk(struct sock *sk, struc= t sock *ssk, lock_sock_nested(ssk, SINGLE_DEPTH_NESTING); subflow->closing =3D 1; =20 + /* Borrow the fwd allocated page left-over; fwd memory for the subflow + * could be negative at this point, but will reach zero soon - when + * the data allocated using such fragment is freed. + */ + if (subflow->lent_mem_frag) { + fwd_remaining =3D PAGE_SIZE - subflow->lent_mem_frag; + sk_forward_alloc_add(sk, fwd_remaining); + sk_forward_alloc_add(ssk, -fwd_remaining); + subflow->lent_mem_frag =3D 0; + } + /* If the first subflow moved to a close state before accept, e.g. due * to an incoming reset or listener shutdown, the subflow socket is * already deleted by inet_child_forget() and the mptcp socket can't diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index d30806b287d2..5e2749d92a49 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -547,6 +547,7 @@ struct mptcp_subflow_context { bool scheduled; bool pm_listener; /* a listener managed by the kernel PM?
*/ bool fully_established; /* path validated */ + u32 lent_mem_frag; u32 remote_nonce; u64 thmac; u32 local_nonce; @@ -646,6 +647,33 @@ mptcp_send_active_reset_reason(struct sock *sk) tcp_send_active_reset(sk, GFP_ATOMIC, reason); } =20 +/* Made the fwd mem carried by the given skb available to the msk, + * To be paired with a previous mptcp_subflow_lend_fwdmem() before freeing + * the skb or setting the skb ownership. + */ +static inline void mptcp_borrow_fwdmem(struct sock *sk, struct sk_buff *sk= b) +{ + struct sock *ssk =3D skb->sk; + + /* The subflow just lend the skb fwd memory, and we know that the skb + * is only accounted on the incoming subflow rcvbuf. + */ + DEBUG_NET_WARN_ON_ONCE(skb->destructor); + skb->sk =3D NULL; + sk_forward_alloc_add(sk, skb->truesize); + atomic_sub(skb->truesize, &ssk->sk_rmem_alloc); +} + +static inline void +mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow, + struct sk_buff *skb) +{ + int frag =3D (subflow->lent_mem_frag + skb->truesize) & (PAGE_SIZE - 1); + + skb->destructor =3D NULL; + subflow->lent_mem_frag =3D frag; +} + static inline u64 mptcp_subflow_get_map_offset(const struct mptcp_subflow_context *subflow) { --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4F863357A37; Fri, 21 Nov 2025 17:03:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744593; cv=none; b=YRJIxrDXQ9v4xmVu6AywqiV0WScMYfV947fVW19C9P1ENEP3Vgd9NktQ8ek4bqf0f62UF43kSlDfN6kp/9xrJHja5Qct0Stl7owKHY9qq37WkiNHNefaeKebnGEIIUcnXb1Uu18FUhZxv9OiLVxFy/aCbWBo194aC5gMC4C4E5E= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744593; 
c=relaxed/simple; bh=4lc9o5hypIw06c+bc58A+FpLlk7rv5SCZaysugtGJHg=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=GE9eNF7IsDNf/OQCdjpySdUXykafGjqN7tzokEVGoK55oFjt5+KVMldROMs8qItLCfYpLKqPfPXvl8COJsZ4skuyMHkbot4+4RR1SggnzqsLX97CSt3XbF4qdfnSgcTjhhaPG1UPuXzvKunY23L5xahp1McnHrvbQlP2rNftjnA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=SADS0S9j; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="SADS0S9j" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DF4D1C116D0; Fri, 21 Nov 2025 17:03:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744592; bh=4lc9o5hypIw06c+bc58A+FpLlk7rv5SCZaysugtGJHg=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=SADS0S9jwOeC0CaeONHZeZbmPg9Qy1MbHSZi+z9l0zW6JKKQuRPOotkJNu1X5YoWE dAEhT8I7JmV6VJjrf1eyovXD3x9Jx8Qwj0dbF9c+AuftSoIQePFnTBkJ8hqRVydXbr KgTHpfECWYDXd7qGostOX99zhAvrbArP8NyuKUz4BMRI6VzS5JUFLkR3FliFiaqrWr zKSlu3klvvGwLxYWze34ZhAEfXn4jh6U6a053lNfmCGiUTh/ErF0cPY4eqD0bBesOZ AJ8I18TrO0WRN7B9bJY0M3Yrg9pYdfhm5UxYJvpO9c89xgsLUF/Oy7OK16lfTKWotF pHyZ7MhP2qjwQ== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:12 +0100 Subject: [PATCH net-next 13/14] mptcp: introduce mptcp-level backlog Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-13-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de Bruijn , "David S. 
Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=8346; i=matttbe@kernel.org; h=from:subject:message-id; bh=Ms4xwAkncS62P20+iLp6vtlwxgVmggWBcGXR1o8kOlY=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZsu7c7QedZ0WGL0r2CuY46+W+dW/k0u/7r91L4nhZ aom48fajlIWBjEuBlkxRRbptsj8mc+reEu8/Cxg5rAygQxh4OIUgInYKjP8jxG+vNPzkdWGZQ8Z 9Z4rrVUX4koWZVrKcvz9B9MPLbU6Rxn+KZ5bl3sj0bT69qsga68pfyIPt3+6HD3t1YwKvxWzZ88 6zA0A X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073
From: Paolo Abeni
The backlog will soon be used for incoming data processing. MPTCP can't leverage the sk_backlog, as the latter is processed before the release callback, and that callback for MPTCP releases and re-acquires the socket spinlock, breaking the sk_backlog processing assumption.
Add a skb backlog list inside the mptcp sock struct, and implement basic helpers to transfer packets to that list and to purge it.
Packets in the backlog are memory accounted and still use the incoming subflow receive memory, to allow back-pressure. The backlog size is implicitly bounded to the sum of subflows rcvbuf.
When a subflow is closed, references from the backlog to that sock are removed.
No packet is currently added to the backlog, so no functional changes intended here.
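The accounting described above can be sketched in userspace C. This is only an illustration under stated assumptions, not the kernel code: `struct pkt`, `struct backlog`, `backlog_add()`, `backlog_purge()` and `rcv_space()` are hypothetical names, locking is left out, and the window scaling that the real `__mptcp_space()` applies is omitted.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb: only the accounted size matters here. */
struct pkt {
	struct pkt *next;
	unsigned int truesize;
};

/* Toy model of the msk-level backlog: a FIFO plus its accounted length. */
struct backlog {
	struct pkt *head;
	struct pkt *tail;
	unsigned int len;	/* sum of truesize of the queued packets */
};

/* Queue a packet at the tail and account its size. Returns 0 on success. */
static int backlog_add(struct backlog *bl, unsigned int truesize)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->next = NULL;
	p->truesize = truesize;
	if (bl->tail)
		bl->tail->next = p;
	else
		bl->head = p;
	bl->tail = p;
	bl->len += truesize;
	return 0;
}

/* Free every queued packet; returns the number of bytes released. */
static unsigned int backlog_purge(struct backlog *bl)
{
	unsigned int freed = bl->len;
	struct pkt *p = bl->head;

	while (p) {
		struct pkt *next = p->next;

		free(p);
		p = next;
	}
	bl->head = bl->tail = NULL;
	bl->len = 0;
	return freed;
}

/* Receive space left: backlogged bytes count against rcvbuf as well. */
static int rcv_space(int rcvbuf, int rmem_alloc, const struct backlog *bl)
{
	return rcvbuf - (int)bl->len - rmem_alloc;
}
```

With a 4096-byte rcvbuf, 1024 bytes in the receive queue and 1500 bytes backlogged, `rcv_space()` reports 1572 bytes, which is how the backlog exerts back-pressure before any packet reaches the receive queue.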
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/mptcp_diag.c | 3 +- net/mptcp/protocol.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++= ++-- net/mptcp/protocol.h | 25 ++++++++++++---- 3 files changed, 97 insertions(+), 9 deletions(-) diff --git a/net/mptcp/mptcp_diag.c b/net/mptcp/mptcp_diag.c index ac974299de71..136c2d05c0ee 100644 --- a/net/mptcp/mptcp_diag.c +++ b/net/mptcp/mptcp_diag.c @@ -195,7 +195,8 @@ static void mptcp_diag_get_info(struct sock *sk, struct= inet_diag_msg *r, struct mptcp_sock *msk =3D mptcp_sk(sk); struct mptcp_info *info =3D _info; =20 - r->idiag_rqueue =3D sk_rmem_alloc_get(sk); + r->idiag_rqueue =3D sk_rmem_alloc_get(sk) + + READ_ONCE(mptcp_sk(sk)->backlog_len); r->idiag_wqueue =3D sk_wmem_alloc_get(sk); =20 if (inet_sk_state_load(sk) =3D=3D TCP_LISTEN) { diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f5526855a2e5..dfed036e0591 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -659,6 +659,39 @@ static void mptcp_dss_corruption(struct mptcp_sock *ms= k, struct sock *ssk) } } =20 +static void __mptcp_add_backlog(struct sock *sk, + struct mptcp_subflow_context *subflow, + struct sk_buff *skb) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + struct sk_buff *tail =3D NULL; + bool fragstolen; + int delta; + + if (unlikely(sk->sk_state =3D=3D TCP_CLOSE)) { + kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE); + return; + } + + /* Try to coalesce with the last skb in our backlog */ + if (!list_empty(&msk->backlog_list)) + tail =3D list_last_entry(&msk->backlog_list, struct sk_buff, list); + + if (tail && MPTCP_SKB_CB(skb)->map_seq =3D=3D MPTCP_SKB_CB(tail)->end_seq= && + skb->sk =3D=3D tail->sk && + __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) { + skb->truesize -=3D delta; + kfree_skb_partial(skb, fragstolen); + __mptcp_subflow_lend_fwdmem(subflow, delta); + WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta); + return; + } + + 
list_add_tail(&skb->list, &msk->backlog_list); + mptcp_subflow_lend_fwdmem(subflow, skb); + WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize); +} + static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, struct sock *ssk) { @@ -705,8 +738,13 @@ static bool __mptcp_move_skbs_from_subflow(struct mptc= p_sock *msk, size_t len =3D skb->len - offset; =20 mptcp_init_skb(ssk, skb, offset, len); - mptcp_subflow_lend_fwdmem(subflow, skb); - ret =3D __mptcp_move_skb(sk, skb) || ret; + + if (true) { + mptcp_subflow_lend_fwdmem(subflow, skb); + ret |=3D __mptcp_move_skb(sk, skb); + } else { + __mptcp_add_backlog(sk, subflow, skb); + } seq +=3D len; =20 if (unlikely(map_remaining < len)) { @@ -2531,6 +2569,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct= sock *ssk, void mptcp_close_ssk(struct sock *sk, struct sock *ssk, struct mptcp_subflow_context *subflow) { + struct mptcp_sock *msk =3D mptcp_sk(sk); + struct sk_buff *skb; + /* The first subflow can already be closed and still in the list */ if (subflow->close_event_done) return; @@ -2540,6 +2581,17 @@ void mptcp_close_ssk(struct sock *sk, struct sock *s= sk, if (sk->sk_state =3D=3D TCP_ESTABLISHED) mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL); =20 + /* Remove any reference from the backlog to this ssk; backlog skbs consume + * space in the msk receive queue, no need to touch sk->sk_rmem_alloc + */ + list_for_each_entry(skb, &msk->backlog_list, list) { + if (skb->sk !=3D ssk) + continue; + + atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc); + skb->sk =3D NULL; + } + /* subflow aborted before reaching the fully_established status * attempt the creation of the next subflow */ @@ -2769,12 +2821,31 @@ static void mptcp_mp_fail_no_response(struct mptcp_= sock *msk) unlock_sock_fast(ssk, slow); } =20 +static void mptcp_backlog_purge(struct sock *sk) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + struct sk_buff *tmp, *skb; + LIST_HEAD(backlog); + + mptcp_data_lock(sk); + 
list_splice_init(&msk->backlog_list, &backlog); + msk->backlog_len =3D 0; + mptcp_data_unlock(sk); + + list_for_each_entry_safe(skb, tmp, &backlog, list) { + mptcp_borrow_fwdmem(sk, skb); + kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE); + } + sk_mem_reclaim(sk); +} + static void mptcp_do_fastclose(struct sock *sk) { struct mptcp_subflow_context *subflow, *tmp; struct mptcp_sock *msk =3D mptcp_sk(sk); =20 mptcp_set_state(sk, TCP_CLOSE); + mptcp_backlog_purge(sk); =20 /* Explicitly send the fastclose reset as need */ if (__mptcp_check_fallback(msk)) @@ -2853,11 +2924,13 @@ static void __mptcp_init_sock(struct sock *sk) INIT_LIST_HEAD(&msk->conn_list); INIT_LIST_HEAD(&msk->join_list); INIT_LIST_HEAD(&msk->rtx_queue); + INIT_LIST_HEAD(&msk->backlog_list); INIT_WORK(&msk->work, mptcp_worker); msk->out_of_order_queue =3D RB_ROOT; msk->first_pending =3D NULL; msk->timer_ival =3D TCP_RTO_MIN; msk->scaling_ratio =3D TCP_DEFAULT_SCALING_RATIO; + msk->backlog_len =3D 0; =20 WRITE_ONCE(msk->first, NULL); inet_csk(sk)->icsk_sync_mss =3D mptcp_sync_mss; @@ -3234,6 +3307,7 @@ static void mptcp_destroy_common(struct mptcp_sock *m= sk) struct sock *sk =3D (struct sock *)msk; =20 __mptcp_clear_xmit(sk); + mptcp_backlog_purge(sk); =20 /* join list will be eventually flushed (with rst) at sock lock release t= ime */ mptcp_for_each_subflow_safe(msk, subflow, tmp) diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 5e2749d92a49..fe0dca4122f2 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -357,6 +357,9 @@ struct mptcp_sock { * allow_infinite_fallback and * allow_join */ + + struct list_head backlog_list; /* protected by the data lock */ + u32 backlog_len; }; =20 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock) @@ -407,6 +410,7 @@ static inline int mptcp_space_from_win(const struct soc= k *sk, int win) static inline int __mptcp_space(const struct sock *sk) { return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - + 
READ_ONCE(mptcp_sk(sk)->backlog_len) - sk_rmem_alloc_get(sk)); } =20 @@ -655,23 +659,32 @@ static inline void mptcp_borrow_fwdmem(struct sock *s= k, struct sk_buff *skb) { struct sock *ssk =3D skb->sk; =20 - /* The subflow just lend the skb fwd memory, and we know that the skb - * is only accounted on the incoming subflow rcvbuf. + /* The subflow just lend the skb fwd memory; if the subflow meanwhile + * closed, mptcp_close_ssk() already released the ssk rcv memory. */ DEBUG_NET_WARN_ON_ONCE(skb->destructor); - skb->sk =3D NULL; sk_forward_alloc_add(sk, skb->truesize); + if (!ssk) + return; + atomic_sub(skb->truesize, &ssk->sk_rmem_alloc); + skb->sk =3D NULL; +} + +static inline void +__mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow, int siz= e) +{ + int frag =3D (subflow->lent_mem_frag + size) & (PAGE_SIZE - 1); + + subflow->lent_mem_frag =3D frag; } =20 static inline void mptcp_subflow_lend_fwdmem(struct mptcp_subflow_context *subflow, struct sk_buff *skb) { - int frag =3D (subflow->lent_mem_frag + skb->truesize) & (PAGE_SIZE - 1); - + __mptcp_subflow_lend_fwdmem(subflow, skb->truesize); skb->destructor =3D NULL; - subflow->lent_mem_frag =3D frag; } =20 static inline u64 --=20 2.51.0 From nobody Thu Nov 27 12:37:17 2025 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A422F3587D5; Fri, 21 Nov 2025 17:03:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1763744596; cv=none; b=qOWto9tr3nHditSfeitgzsEUUl1Q+F6GU5+MqfGIfFbbKR/cj1H7RUAE+iDFox+lPmcZvrgLd4xtoPD4XrdjYyRoesnUz8354ceVKPzLTqp9PdINvfHJzpo8Ucztp6AR6EAvjWngAdr9VQyoIyRXh/L7brfz2q6C6ZmTJktTf1Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
t=1763744596; c=relaxed/simple; bh=yxIE8l3JZQR5JCywo+A5MPHo33+yYPDBOugaYFYC8vU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=bfbEUnZTP2FbfZ5kZA5tf+7FK+hGxRpLbAuAhCiQRPHUEsjpXKzoYtGFUf2pGyELcZcV5Peg9JE2Y9XOKtBbfLXqZPSEuAdpnBnWnvKeYVxRNRNJiNTYwpx/s6TJKqj4SjDnwdsAppujj6L9Eaf+Baw5r6nLCpbXkUz1hi/ntCQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=h8yJKG12; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="h8yJKG12" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 53308C116C6; Fri, 21 Nov 2025 17:03:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1763744596; bh=yxIE8l3JZQR5JCywo+A5MPHo33+yYPDBOugaYFYC8vU=; h=From:Date:Subject:References:In-Reply-To:To:Cc:From; b=h8yJKG12y7OUoUeypsziIg8Yxrgmvno2Uej5G5zLX/qleVWx/HMzRgfVzbciFYj1t tVUWpbRuZCQThIc983/MGYsMaUTeU+H2ud2iiHceBks09RNhZbEMlYdKsZT25mH+2Z a579Hsdu22e3RkynJB31u5Tzn0VrxbxC7YT7OZ8RRjZKJbBnrFxFnJiGhRGlWlLf3j UWht0gpn3pEbTAAHMYiCfeUs0npfO8TGze7ZyYEoPQl28XH/APWjzeatdPuf7KaIVY 5Y6S3uThBr2SiAUs8aSearEUZg9e5Rxj08T4TOiIKMcpaurRU/k5ORZyr201RHXRV/ qsal4Y8KXw09w== From: "Matthieu Baerts (NGI0)" Date: Fri, 21 Nov 2025 18:02:13 +0100 Subject: [PATCH net-next 14/14] mptcp: leverage the backlog for RX packet processing Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Message-Id: <20251121-net-next-mptcp-memcg-backlog-imp-v1-14-1f34b6c1e0b1@kernel.org> References: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> In-Reply-To: <20251121-net-next-mptcp-memcg-backlog-imp-v1-0-1f34b6c1e0b1@kernel.org> To: Eric Dumazet , Kuniyuki Iwashima , Paolo Abeni , Willem de 
Bruijn , "David S. Miller" , Jakub Kicinski , Simon Horman , David Ahern , Mat Martineau , Geliang Tang , Peter Krystad , Florian Westphal , Christoph Paasch Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, mptcp@lists.linux.dev, Davide Caratti , "Matthieu Baerts (NGI0)" X-Mailer: b4 0.14.3 X-Developer-Signature: v=1; a=openpgp-sha256; l=11951; i=matttbe@kernel.org; h=from:subject:message-id; bh=EO6UaLraU7c+G3zNJSh1ZDma+EoTxHJngur0al91bNo=; b=owGbwMvMwCVWo/Th0Gd3rumMp9WSGDIVZiuseFX30OCNwvfLTRUXYjbzSXS/b6i5ekPicN+hg jq9W9sfdZSyMIhxMciKKbJIt0Xmz3xexVvi5WcBM4eVCWQIAxenAEzEqY2R4cIqbwPDCRUslaev LaouUT8h8Nf9y4WqR2zLVObqPatME2f4w5/00P76tGKb+/0eYt/8omwXuZw5bbt4ptJs+6fb0+X UuQE= X-Developer-Key: i=matttbe@kernel.org; a=openpgp; fpr=E8CB85F76877057A6E27F77AF6B7824F4269A073
From: Paolo Abeni
When the msk socket is owned or the msk receive buffer is full, move the incoming skbs to a msk-level backlog list. This avoids traversing the joined subflows and acquiring the subflow-level socket lock at reception time, improving RX performance.
When processing the backlog, use the fwd alloc memory borrowed from the incoming subflow. skbs exceeding the msk receive space are not dropped; instead they are kept in the backlog until the receive buffer is freed.
Dropping packets already acked at the TCP level is explicitly discouraged by the RFC and would corrupt the data stream for fallback sockets.
Special care is needed to avoid adding skbs to the backlog of a closed msk and to avoid leaving dangling references in the backlog at subflow closing time.
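The queue-or-backlog decision and the "stop, don't drop" spooling rule above can be modelled in a few lines of userspace C. This is a rough sketch under stated assumptions, not the patch itself: `struct msk_model`, `rx_packet()`, `spool_backlog()` and `BACKLOG_MAX` are hypothetical, packets are reduced to their sizes, and locking, coalescing and memcg accounting are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define BACKLOG_MAX 16	/* toy bound; the real backlog is bounded by subflow rcvbuf */

/* Hypothetical, simplified msk receive state; all sizes are in bytes. */
struct msk_model {
	int rcvbuf;			/* receive buffer limit */
	int rmem_alloc;			/* bytes already in the receive queue */
	int backlog[BACKLOG_MAX];	/* FIFO of backlogged packet sizes */
	int backlog_cnt;
	int backlog_len;		/* sum of the sizes in backlog[] */
};

/* RX path. own_msk is true when the caller may touch the receive queue,
 * i.e. no user context currently owns the msk. Returns true when the
 * packet went straight to the receive queue; otherwise it is parked in
 * the backlog. Nothing is dropped either way.
 */
static bool rx_packet(struct msk_model *m, int truesize, bool own_msk)
{
	if (own_msk && m->rmem_alloc < m->rcvbuf) {
		m->rmem_alloc += truesize;
		return true;
	}

	assert(m->backlog_cnt < BACKLOG_MAX);
	m->backlog[m->backlog_cnt++] = truesize;
	m->backlog_len += truesize;
	return false;
}

/* Move backlogged packets to the receive queue, oldest first. Stop, without
 * dropping, once the receive queue exceeds rcvbuf. Returns the bytes moved.
 */
static int spool_backlog(struct msk_model *m)
{
	int moved = 0, done = 0, i;

	while (done < m->backlog_cnt && m->rmem_alloc <= m->rcvbuf) {
		m->rmem_alloc += m->backlog[done];
		moved += m->backlog[done];
		done++;
	}

	/* Keep whatever did not fit, preserving arrival order. */
	for (i = done; i < m->backlog_cnt; i++)
		m->backlog[i - done] = m->backlog[i];
	m->backlog_cnt -= done;
	m->backlog_len -= moved;
	return moved;
}
```

In this model a packet that arrives while the msk is owned always lands in the backlog, and a later `spool_backlog()` call leaves the excess queued until the reader drains the receive queue, mirroring the intent that already-acked data is delayed rather than discarded.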
Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau Signed-off-by: Matthieu Baerts (NGI0) --- net/mptcp/protocol.c | 191 +++++++++++++++++++++++++++++++++++------------= ---- net/mptcp/protocol.h | 2 +- 2 files changed, 132 insertions(+), 61 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index dfed036e0591..e4ccc57b6f57 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -665,6 +665,7 @@ static void __mptcp_add_backlog(struct sock *sk, { struct mptcp_sock *msk =3D mptcp_sk(sk); struct sk_buff *tail =3D NULL; + struct sock *ssk =3D skb->sk; bool fragstolen; int delta; =20 @@ -678,22 +679,30 @@ static void __mptcp_add_backlog(struct sock *sk, tail =3D list_last_entry(&msk->backlog_list, struct sk_buff, list); =20 if (tail && MPTCP_SKB_CB(skb)->map_seq =3D=3D MPTCP_SKB_CB(tail)->end_seq= && - skb->sk =3D=3D tail->sk && + ssk =3D=3D tail->sk && __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) { skb->truesize -=3D delta; kfree_skb_partial(skb, fragstolen); __mptcp_subflow_lend_fwdmem(subflow, delta); - WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta); - return; + goto account; } =20 list_add_tail(&skb->list, &msk->backlog_list); mptcp_subflow_lend_fwdmem(subflow, skb); - WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize); + delta =3D skb->truesize; + +account: + WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta); + + /* Possibly not accept()ed yet, keep track of memory not CG + * accounted, mptcp_graft_subflows() will handle it. 
+ */ + if (!mem_cgroup_from_sk(ssk)) + msk->backlog_unaccounted +=3D delta; } =20 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, - struct sock *ssk) + struct sock *ssk, bool own_msk) { struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); struct sock *sk =3D (struct sock *)msk; @@ -709,9 +718,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp= _sock *msk, struct sk_buff *skb; bool fin; =20 - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) - break; - /* try to move as much data as available */ map_remaining =3D subflow->map_data_len - mptcp_subflow_get_map_offset(subflow); @@ -739,7 +745,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp= _sock *msk, =20 mptcp_init_skb(ssk, skb, offset, len); =20 - if (true) { + if (own_msk && sk_rmem_alloc_get(sk) < sk->sk_rcvbuf) { mptcp_subflow_lend_fwdmem(subflow, skb); ret |=3D __mptcp_move_skb(sk, skb); } else { @@ -863,7 +869,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, st= ruct sock *ssk) struct sock *sk =3D (struct sock *)msk; bool moved; =20 - moved =3D __mptcp_move_skbs_from_subflow(msk, ssk); + moved =3D __mptcp_move_skbs_from_subflow(msk, ssk, true); __mptcp_ofo_queue(msk); if (unlikely(ssk->sk_err)) __mptcp_subflow_error_report(sk, ssk); @@ -896,7 +902,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk) if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) sk->sk_data_ready(sk); } else { - __set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags); + __mptcp_move_skbs_from_subflow(msk, ssk, false); } mptcp_data_unlock(sk); } @@ -2136,60 +2142,80 @@ static void mptcp_rcv_space_adjust(struct mptcp_soc= k *msk, int copied) msk->rcvq_space.time =3D mstamp; } =20 -static struct mptcp_subflow_context * -__mptcp_first_ready_from(struct mptcp_sock *msk, - struct mptcp_subflow_context *subflow) +static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32= *delta) { - struct mptcp_subflow_context *start_subflow =3D subflow; - - while 
(!READ_ONCE(subflow->data_avail)) { - subflow =3D mptcp_next_subflow(msk, subflow); - if (subflow =3D=3D start_subflow) - return NULL; - } - return subflow; -} - -static bool __mptcp_move_skbs(struct sock *sk) -{ - struct mptcp_subflow_context *subflow; + struct sk_buff *skb =3D list_first_entry(skbs, struct sk_buff, list); struct mptcp_sock *msk =3D mptcp_sk(sk); - bool ret =3D false; + bool moved =3D false; =20 - if (list_empty(&msk->conn_list)) - return false; - - subflow =3D list_first_entry(&msk->conn_list, - struct mptcp_subflow_context, node); - for (;;) { - struct sock *ssk; - bool slowpath; - - /* - * As an optimization avoid traversing the subflows list - * and ev. acquiring the subflow socket lock before baling out - */ + *delta =3D 0; + while (1) { + /* If the msk recvbuf is full stop, don't drop */ if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) break; =20 - subflow =3D __mptcp_first_ready_from(msk, subflow); - if (!subflow) + prefetch(skb->next); + list_del(&skb->list); + *delta +=3D skb->truesize; + + moved |=3D __mptcp_move_skb(sk, skb); + if (list_empty(skbs)) break; =20 - ssk =3D mptcp_subflow_tcp_sock(subflow); - slowpath =3D lock_sock_fast(ssk); - ret =3D __mptcp_move_skbs_from_subflow(msk, ssk) || ret; - if (unlikely(ssk->sk_err)) - __mptcp_error_report(sk); - unlock_sock_fast(ssk, slowpath); - - subflow =3D mptcp_next_subflow(msk, subflow); + skb =3D list_first_entry(skbs, struct sk_buff, list); } =20 __mptcp_ofo_queue(msk); - if (ret) + if (moved) mptcp_check_data_fin((struct sock *)msk); - return ret; + return moved; +} + +static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skb= s) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + + /* After CG initialization, subflows should never add skb before + * gaining the CG themself. + */ + DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket && + mem_cgroup_from_sk(sk)); + + /* Don't spool the backlog if the rcvbuf is full. 
*/ + if (list_empty(&msk->backlog_list) || + sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) + return false; + + INIT_LIST_HEAD(skbs); + list_splice_init(&msk->backlog_list, skbs); + return true; +} + +static void mptcp_backlog_spooled(struct sock *sk, u32 moved, + struct list_head *skbs) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + + WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved); + list_splice(skbs, &msk->backlog_list); +} + +static bool mptcp_move_skbs(struct sock *sk) +{ + struct list_head skbs; + bool enqueued =3D false; + u32 moved; + + mptcp_data_lock(sk); + while (mptcp_can_spool_backlog(sk, &skbs)) { + mptcp_data_unlock(sk); + enqueued |=3D __mptcp_move_skbs(sk, &skbs, &moved); + + mptcp_data_lock(sk); + mptcp_backlog_spooled(sk, moved, &skbs); + } + mptcp_data_unlock(sk); + return enqueued; } =20 static unsigned int mptcp_inq_hint(const struct sock *sk) @@ -2255,7 +2281,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msgh= dr *msg, size_t len, =20 copied +=3D bytes_read; =20 - if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk)) + if (!list_empty(&msk->backlog_list) && mptcp_move_skbs(sk)) continue; =20 /* only the MPTCP socket status is relevant here. 
The exit @@ -3556,8 +3582,7 @@ void __mptcp_check_push(struct sock *sk, struct sock = *ssk) =20 #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \ BIT(MPTCP_RETRANSMIT) | \ - BIT(MPTCP_FLUSH_JOIN_LIST) | \ - BIT(MPTCP_DEQUEUE)) + BIT(MPTCP_FLUSH_JOIN_LIST)) =20 /* processes deferred events and flush wmem */ static void mptcp_release_cb(struct sock *sk) @@ -3567,9 +3592,12 @@ static void mptcp_release_cb(struct sock *sk) =20 for (;;) { unsigned long flags =3D (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED); - struct list_head join_list; + struct list_head join_list, skbs; + bool spool_bl; + u32 moved; =20 - if (!flags) + spool_bl =3D mptcp_can_spool_backlog(sk, &skbs); + if (!flags && !spool_bl) break; =20 INIT_LIST_HEAD(&join_list); @@ -3591,7 +3619,7 @@ static void mptcp_release_cb(struct sock *sk) __mptcp_push_pending(sk, 0); if (flags & BIT(MPTCP_RETRANSMIT)) __mptcp_retrans(sk); - if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) { + if (spool_bl && __mptcp_move_skbs(sk, &skbs, &moved)) { /* notify ack seq update */ mptcp_cleanup_rbuf(msk, 0); sk->sk_data_ready(sk); @@ -3599,6 +3627,8 @@ static void mptcp_release_cb(struct sock *sk) =20 cond_resched(); spin_lock_bh(&sk->sk_lock.slock); + if (spool_bl) + mptcp_backlog_spooled(sk, moved, &skbs); } =20 if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags)) @@ -3856,7 +3886,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int = *karg) return -EINVAL; =20 lock_sock(sk); - if (__mptcp_move_skbs(sk)) + if (mptcp_move_skbs(sk)) mptcp_cleanup_rbuf(msk, 0); *karg =3D mptcp_inq_hint(sk); release_sock(sk); @@ -4061,6 +4091,22 @@ static void mptcp_graft_subflows(struct sock *sk) struct mptcp_subflow_context *subflow; struct mptcp_sock *msk =3D mptcp_sk(sk); =20 + if (mem_cgroup_sockets_enabled) { + LIST_HEAD(join_list); + + /* Subflows joining after __inet_accept() will get the + * mem CG properly initialized at mptcp_finish_join() time, + * but subflows pending in join_list need explicit + * 
initialization before flushing `backlog_unaccounted` + * or MPTCP can later unexpectedly observe unaccounted memory. + */ + mptcp_data_lock(sk); + list_splice_init(&msk->join_list, &join_list); + mptcp_data_unlock(sk); + + __mptcp_flush_join_list(sk, &join_list); + } + mptcp_for_each_subflow(msk, subflow) { struct sock *ssk =3D mptcp_subflow_tcp_sock(subflow); =20 @@ -4072,10 +4118,35 @@ static void mptcp_graft_subflows(struct sock *sk) if (!ssk->sk_socket) mptcp_sock_graft(ssk, sk->sk_socket); =20 + if (!mem_cgroup_sk_enabled(sk)) + goto unlock; + __mptcp_inherit_cgrp_data(sk, ssk); __mptcp_inherit_memcg(sk, ssk, GFP_KERNEL); + +unlock: release_sock(ssk); } + + if (mem_cgroup_sk_enabled(sk)) { + gfp_t gfp =3D GFP_KERNEL | __GFP_NOFAIL; + int amt; + + /* Account the backlog memory; prior accept() is aware of + * fwd and rmem only. + */ + mptcp_data_lock(sk); + amt =3D sk_mem_pages(sk->sk_forward_alloc + + msk->backlog_unaccounted + + atomic_read(&sk->sk_rmem_alloc)) - + sk_mem_pages(sk->sk_forward_alloc + + atomic_read(&sk->sk_rmem_alloc)); + msk->backlog_unaccounted =3D 0; + mptcp_data_unlock(sk); + + if (amt) + mem_cgroup_sk_charge(sk, amt, gfp); + } } =20 static int mptcp_stream_accept(struct socket *sock, struct socket *newsock, diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index fe0dca4122f2..313da78e2b75 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -124,7 +124,6 @@ #define MPTCP_FLUSH_JOIN_LIST 5 #define MPTCP_SYNC_STATE 6 #define MPTCP_SYNC_SNDBUF 7 -#define MPTCP_DEQUEUE 8 =20 struct mptcp_skb_cb { u64 map_seq; @@ -360,6 +359,7 @@ struct mptcp_sock { =20 struct list_head backlog_list; /* protected by the data lock */ u32 backlog_len; + u32 backlog_unaccounted; }; =20 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock) --=20 2.51.0