From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 1/3] mptcp: grafting MPJ subflow earlier
Date: Fri, 14 Nov 2025 10:17:12 +0100

Later patches need to ensure that all MPJ subflows are grafted to the
msk socket before accept() completion. Currently the grafting happens
under the msk socket lock, potentially at msk release_cb time, which
makes satisfying the above condition a bit tricky.

Move the MPJ subflow grafting earlier, under the msk data lock, so that
we can use such lock as a synchronization point.

Signed-off-by: Paolo Abeni
Reviewed-by: Matthieu Baerts (NGI0)
---
v3 -> v4:
 - clarified it's not a fix
 - move the graft under the msk socket lock
 - no need to graft for active subflows
---
 net/mptcp/protocol.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 78ac8ba80e59..4a4cb9952596 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -933,12 +933,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 	mptcp_subflow_joined(msk, ssk);
 	spin_unlock_bh(&msk->fallback_lock);
 
-	/* attach to msk socket only after we are sure we will deal with it
-	 * at close time
-	 */
-	if (sk->sk_socket && !ssk->sk_socket)
-		mptcp_sock_graft(ssk, sk->sk_socket);
-
 	mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
 	mptcp_sockopt_sync_locked(msk, ssk);
 	mptcp_stop_tout_timer(sk);
@@ -3760,6 +3754,20 @@ void mptcp_sock_graft(struct sock *sk, struct socket *parent)
 	write_unlock_bh(&sk->sk_callback_lock);
 }
 
+/* Can be called without holding the msk socket lock; use the callback lock
+ * to avoid {READ_,WRITE_}ONCE annotations on sk_socket.
+ */
+static void mptcp_sock_check_graft(struct sock *sk, struct sock *ssk)
+{
+	struct socket *sock;
+
+	write_lock_bh(&sk->sk_callback_lock);
+	sock = sk->sk_socket;
+	write_unlock_bh(&sk->sk_callback_lock);
+	if (sock)
+		mptcp_sock_graft(ssk, sock);
+}
+
 bool mptcp_finish_join(struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
@@ -3775,7 +3783,9 @@ bool mptcp_finish_join(struct sock *ssk)
 		return false;
 	}
 
-	/* active subflow, already present inside the conn_list */
+	/* Active subflow, already present inside the conn_list; is grafted
+	 * either by __mptcp_subflow_connect() or accept.
+	 */
 	if (!list_empty(&subflow->node)) {
 		spin_lock_bh(&msk->fallback_lock);
 		if (!msk->allow_subflows) {
@@ -3802,11 +3812,17 @@ bool mptcp_finish_join(struct sock *ssk)
 		if (ret) {
 			sock_hold(ssk);
 			list_add_tail(&subflow->node, &msk->conn_list);
+			mptcp_sock_check_graft(parent, ssk);
 		}
 	} else {
 		sock_hold(ssk);
 		list_add_tail(&subflow->node, &msk->join_list);
 		__set_bit(MPTCP_FLUSH_JOIN_LIST, &msk->cb_flags);
+
+		/* In case of later failures, __mptcp_flush_join_list() will
+		 * properly orphan the ssk via mptcp_close_ssk().
+		 */
+		mptcp_sock_check_graft(parent, ssk);
 	}
 	mptcp_data_unlock(parent);
 
--
2.51.1
From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 2/3] Squash-to: "mptcp: fix memcg accounting for passive sockets"
Date: Fri, 14 Nov 2025 10:17:13 +0100
Message-ID: <3a879a5ea9b186b7966c373d6d18f35165133a40.1763111767.git.pabeni@redhat.com>

__mptcp_inherit_memcg() is currently invoked by mptcp_graph_subflows()
with the wrong GFP flags, as lock_sock_fast() can yield atomic scope.
Since this is not the most extreme fast path, use plain lock_sock()
instead.

Additionally move the subflow CG initialization earlier, under the msk
data lock, so that the next patch can use the latter as a
synchronization point to ensure all subflows are CG accounted.

Finally, fix a typo in the mentioned helper name.

Signed-off-by: Paolo Abeni
Reviewed-by: Matthieu Baerts (NGI0)
---
v3 -> v4:
 - rebased
---
 net/mptcp/protocol.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4a4cb9952596..2364c144bf4f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -936,8 +936,6 @@ static bool __mptcp_finish_join(struct mptcp_sock *msk, struct sock *ssk)
 	mptcp_subflow_ctx(ssk)->subflow_id = msk->subflow_id++;
 	mptcp_sockopt_sync_locked(msk, ssk);
 	mptcp_stop_tout_timer(sk);
-	__mptcp_inherit_cgrp_data(sk, ssk);
-	__mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
 	__mptcp_propagate_sndbuf(sk, ssk);
 	return true;
 }
@@ -3764,8 +3762,11 @@ static void mptcp_sock_check_graft(struct sock *sk, struct sock *ssk)
 	write_lock_bh(&sk->sk_callback_lock);
 	sock = sk->sk_socket;
 	write_unlock_bh(&sk->sk_callback_lock);
-	if (sock)
+	if (sock) {
 		mptcp_sock_graft(ssk, sock);
+		__mptcp_inherit_cgrp_data(sk, ssk);
+		__mptcp_inherit_memcg(sk, ssk, GFP_ATOMIC);
+	}
 }
 
 bool mptcp_finish_join(struct sock *ssk)
@@ -4083,18 +4084,17 @@ static int mptcp_listen(struct socket *sock, int backlog)
 	return err;
 }
 
-static void mptcp_graph_subflows(struct sock *sk)
+static void mptcp_graft_subflows(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-		bool slow;
 
-		slow = lock_sock_fast(ssk);
+		lock_sock(ssk);
 
-		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
+		/* Set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
 		 */
 		if (!ssk->sk_socket)
@@ -4102,7 +4102,7 @@ static void mptcp_graph_subflows(struct sock *sk)
 
 		__mptcp_inherit_cgrp_data(sk, ssk);
 		__mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
-		unlock_sock_fast(ssk, slow);
+		release_sock(ssk);
 	}
 }
 
@@ -4153,7 +4153,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	msk = mptcp_sk(newsk);
 	msk->in_accept_queue = 0;
 
-	mptcp_graph_subflows(newsk);
+	mptcp_graft_subflows(newsk);
 	mptcp_rps_record_subflows(msk);
 
 	/* Do late cleanup for the first subflow as necessary. Also
--
2.51.1
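[Editor's note: the motivation for swapping lock_sock_fast() for lock_sock() is the allocation context: a fast-locked section may still be in atomic (non-sleeping) scope, where GFP_KERNEL allocations are illegal, while lock_sock() always leaves the caller able to sleep. A tiny userspace sketch of that rule — the enum values and the pick_gfp() helper are made up for illustration; the real kernel flags are GFP_ATOMIC and GFP_KERNEL:]

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's gfp_t flag values. */
enum fake_gfp { FAKE_GFP_ATOMIC, FAKE_GFP_KERNEL };

/* Sleeping (GFP_KERNEL-like) allocations are only legal when the
 * caller may sleep; atomic scope must fall back to GFP_ATOMIC-like
 * flags. lock_sock() guarantees sleepable context, lock_sock_fast()
 * does not (it may only have taken the spinlock).
 */
static enum fake_gfp pick_gfp(int may_sleep)
{
	return may_sleep ? FAKE_GFP_KERNEL : FAKE_GFP_ATOMIC;
}
```

With plain lock_sock() in mptcp_graft_subflows(), the GFP_KERNEL flags passed to __mptcp_inherit_memcg() there are safe; the GFP_ATOMIC copy stays in mptcp_sock_check_graft(), which runs under the data lock.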
From nobody Thu Nov 27 12:35:52 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 3/3] Squash-to: "mptcp: leverage the backlog for RX packet processing"
Date: Fri, 14 Nov 2025 10:17:14 +0100
Message-ID: <89f8e00a62b8255050d498b861c7e4785c375775.1763111767.git.pabeni@redhat.com>

If a subflow receives data before gaining the memcg while the msk
socket lock is held at accept time, or the PM locks the msk socket
while it is still unaccepted and subflows push data to it at the same
time, mptcp_graph_subflows() can complete with a non-empty backlog.
The msk will try to borrow such memory, but some of the skbs there
were not memcg charged. When the msk finally returns such memory, we
should hit the same splat as #597.

[even if so far I was unable to replicate this scenario]

This patch tries to address such a potential issue by:
 - explicitly keeping track of the amount of memory added to the
   backlog that is not CG accounted
 - additionally accounting for such memory at accept time
 - preventing any subflow from adding memory to the backlog not CG
   accounted after the above flush

Signed-off-by: Paolo Abeni
Reviewed-by: Matthieu Baerts (NGI0)
---
v3 -> v4:
 - fixed a bunch of typos
 - fixed build error when CG are not enabled
---
 net/mptcp/protocol.c | 64 +++++++++++++++++++++++++++++++++++++++++---
 net/mptcp/protocol.h |  1 +
 2 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2364c144bf4f..ad3c43a9c3f4 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -678,6 +678,7 @@ static void __mptcp_add_backlog(struct sock *sk,
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *tail = NULL;
+	struct sock *ssk = skb->sk;
 	bool fragstolen;
 	int delta;
 
@@ -691,18 +692,26 @@ static void __mptcp_add_backlog(struct sock *sk,
 		tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
 
 	if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
-	    skb->sk == tail->sk &&
+	    ssk == tail->sk &&
 	    __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
 		skb->truesize -= delta;
 		kfree_skb_partial(skb, fragstolen);
 		__mptcp_subflow_lend_fwdmem(subflow, delta);
-		WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
-		return;
+		goto account;
 	}
 
 	list_add_tail(&skb->list, &msk->backlog_list);
 	mptcp_subflow_lend_fwdmem(subflow, skb);
-	WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
+	delta = skb->truesize;
+
+account:
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
+
+	/* Possibly not accept()ed yet, keep track of memory not CG
+	 * accounted, mptcp_graft_subflows() will handle it.
+	 */
+	if (!mem_cgroup_from_sk(ssk))
+		msk->backlog_unaccounted += delta;
 }
 
 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
@@ -2179,6 +2188,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
+	/* After CG initialization, subflows should never add skb before
+	 * gaining the CG themself.
+	 */
+	DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
+			       mem_cgroup_from_sk(sk));
+
 	/* Don't spool the backlog if the rcvbuf is full. */
 	if (list_empty(&msk->backlog_list) ||
 	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
@@ -4089,6 +4104,22 @@ static void mptcp_graft_subflows(struct sock *sk)
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 
+	if (mem_cgroup_sockets_enabled) {
+		LIST_HEAD(join_list);
+
+		/* Subflows joining after __inet_accept() will get the
+		 * mem CG properly initialized at mptcp_finish_join() time,
+		 * but subflows pending in join_list need explicit
+		 * initialization before flushing `backlog_unaccounted`
+		 * or MPTCP can later unexpectedly observe unaccounted memory.
+		 */
+		mptcp_data_lock(sk);
+		list_splice_init(&msk->join_list, &join_list);
+		mptcp_data_unlock(sk);
+
+		__mptcp_flush_join_list(sk, &join_list);
+	}
+
 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
 
@@ -4100,10 +4131,35 @@ static void mptcp_graft_subflows(struct sock *sk)
 		if (!ssk->sk_socket)
 			mptcp_sock_graft(ssk, sk->sk_socket);
 
+		if (!mem_cgroup_sk_enabled(sk))
+			goto unlock;
+
 		__mptcp_inherit_cgrp_data(sk, ssk);
 		__mptcp_inherit_memcg(sk, ssk, GFP_KERNEL);
+
+unlock:
 		release_sock(ssk);
 	}
+
+	if (mem_cgroup_sk_enabled(sk)) {
+		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
+		int amt;
+
+		/* Account the backlog memory; prior accept() is aware of
+		 * fwd and rmem only.
+		 */
+		mptcp_data_lock(sk);
+		amt = sk_mem_pages(sk->sk_forward_alloc +
+				   msk->backlog_unaccounted +
+				   atomic_read(&sk->sk_rmem_alloc)) -
+		      sk_mem_pages(sk->sk_forward_alloc +
+				   atomic_read(&sk->sk_rmem_alloc));
+		msk->backlog_unaccounted = 0;
+		mptcp_data_unlock(sk);
+
+		if (amt)
+			mem_cgroup_sk_charge(sk, amt, gfp);
+	}
 }
 
 static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 161b704be16b..199f28f3dd5e 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -360,6 +360,7 @@ struct mptcp_sock {
 
 	struct list_head backlog_list;	/* protected by the data lock */
 	u32 backlog_len;
+	u32 backlog_unaccounted;
 };
 
 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
--
2.51.1
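[Editor's note: the charge amount computed in mptcp_graft_subflows() above is a page-level delta: pages needed to cover fwd_alloc + backlog_unaccounted + rmem, minus the pages fwd_alloc + rmem already occupy. A userspace model of that arithmetic — mem_pages() mirrors the kernel's sk_mem_pages(), i.e. bytes rounded up to whole pages; the PAGE_SHIFT value of 12 (4 KiB pages) is an assumption:]

```c
#include <assert.h>

#define FAKE_PAGE_SHIFT 12
#define FAKE_PAGE_SIZE  (1 << FAKE_PAGE_SHIFT)

/* Userspace model of sk_mem_pages(): bytes rounded up to whole pages. */
static int mem_pages(int amt)
{
	return (amt + FAKE_PAGE_SIZE - 1) >> FAKE_PAGE_SHIFT;
}

/* Pages the unaccounted backlog adds on top of what forward alloc and
 * receive-queue memory already cover; this is the amount charged to
 * the memcg at accept time by the patch above.
 */
static int backlog_charge(int fwd_alloc, int rmem, int backlog_unaccounted)
{
	return mem_pages(fwd_alloc + backlog_unaccounted + rmem) -
	       mem_pages(fwd_alloc + rmem);
}
```

For example, with 1000 bytes of forward alloc, 3000 bytes of rmem, and 6000 unaccounted backlog bytes: 10000 bytes needs 3 pages, 4000 bytes needs 1, so only the 2-page delta is charged. Computing the difference of two rounded totals, rather than rounding the backlog alone, avoids double-charging the partial page already paid for by fwd_alloc + rmem.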