From nobody Sat Oct 11 05:52:54 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 1/8] mptcp: borrow forward memory from subflow
Date: Fri, 3 Oct 2025 16:01:39 +0200
Precedence: bulk
X-Mailing-List: mptcp@lists.linux.dev
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In the MPTCP receive
path, we release the subflow-allocated fwd memory just to allocate it
again shortly after for the msk. That increases the chances of
allocation failures, especially during backlog processing, when other
actions could consume the just-released memory before the msk socket
has a chance to do the rcv allocation.

Replace the skb_orphan() call with an open-coded variant that
explicitly borrows, at PAGE_SIZE granularity, the fwd memory from the
subflow socket instead of releasing it. During backlog processing the
borrowed memory is accounted at release_cb time.

Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
v1 -> v2:
 - rebased
 - explain why skb_orphan is removed
---
 net/mptcp/protocol.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 574a1e222d9cf..34661ab979158 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -337,11 +337,12 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 	mptcp_rcvbuf_grow(sk);
 }
 
-static void mptcp_init_skb(struct sock *ssk,
-			   struct sk_buff *skb, int offset, int copy_len)
+static int mptcp_init_skb(struct sock *ssk,
+			  struct sk_buff *skb, int offset, int copy_len)
 {
 	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	bool has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
+	int borrowed;
 
 	/* the skb map_seq accounts for the skb offset:
 	 * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq
@@ -357,6 +358,13 @@ static void mptcp_init_skb(struct sock *ssk,
 
 	skb_ext_reset(skb);
 	skb_dst_drop(skb);
+
+	/* "borrow" the fwd memory from the subflow, instead of reclaiming it */
+	skb->destructor = NULL;
+	borrowed = ssk->sk_forward_alloc - sk_unused_reserved_mem(ssk);
+	borrowed &= ~(PAGE_SIZE - 1);
+	sk_forward_alloc_add(ssk, skb->truesize - borrowed);
+	return borrowed;
 }
 
 static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
@@ -690,9 +698,12 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 	if (offset < skb->len) {
 		size_t len = skb->len - offset;
+		int bmem;
 
-		mptcp_init_skb(ssk, skb, offset, len);
-		skb_orphan(skb);
+		bmem = mptcp_init_skb(ssk, skb, offset, len);
+		skb->sk = NULL;
+		sk_forward_alloc_add(sk, bmem);
+		atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
 		ret = __mptcp_move_skb(sk, skb) || ret;
 		seq += len;
 
-- 
2.51.0
From nobody Sat Oct 11 05:52:54 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 2/8] mptcp: cleanup fallback data fin reception
Date: Fri, 3 Oct 2025 16:01:40 +0200
Message-ID:
 <82ab62f0ec0c02bbb90bd119c0d4021ce6165ce0.1759499837.git.pabeni@redhat.com>

MPTCP currently generates a dummy data_fin for fallback sockets, using
the current ack_seq, when the fallback subflow has completed data
reception.

We are soon going to introduce backlog usage for the msk, even for
fallback sockets: the ack_seq value will no longer match the most
recent sequence number seen by the fallback subflow socket, as it
ignores data_seq values sitting in the backlog.

Instead, use the last map sequence number to set the data_fin, as
fallback (dummy) map sequences are always in sequence.
Reviewed-by: Geliang Tang
Tested-by: Geliang Tang
Signed-off-by: Paolo Abeni
---
v2 -> v3:
 - keep the close check in subflow_sched_work_if_closed, fix CI failures
---
 net/mptcp/subflow.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e8325890a3223..b9455c04e8a46 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1285,6 +1285,7 @@ static bool subflow_is_done(const struct sock *sk)
 /* sched mptcp worker for subflow cleanup if no more data is pending */
 static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk)
 {
+	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = (struct sock *)msk;
 
 	if (likely(ssk->sk_state != TCP_CLOSE &&
@@ -1303,7 +1304,8 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
 	 */
 	if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) &&
 	    msk->first == ssk &&
-	    mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
+	    mptcp_update_rcv_data_fin(msk, subflow->map_seq +
+				      subflow->map_data_len, true))
 		mptcp_schedule_work(sk);
 }
 
-- 
2.51.0
From nobody Sat Oct 11 05:52:54 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 3/8] mptcp: cleanup fallback dummy mapping generation
Date: Fri, 3 Oct 2025 16:01:41 +0200
Message-ID: <97aa8b51782fb47cbc1a20c3ad0243f80929b1ee.1759499837.git.pabeni@redhat.com>

MPTCP currently accesses ack_seq outside the msk socket lock scope to
generate the dummy mapping for fallback sockets.

We are soon going to introduce backlog usage, and even for fallback
sockets the ack_seq value will be significantly off outside of the msk
socket lock scope.

Avoid relying on ack_seq for dummy mapping generation, using instead
the subflow sequence number. Note that in case of disconnect() and
(re)connect() we must ensure that any previous state is reset.
Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
v2 -> v3:
 - reordered before the backlog introduction to avoid transiently
   breaking the fallback
 - explicitly reset ack_seq
---
 net/mptcp/protocol.c | 3 +++
 net/mptcp/subflow.c  | 8 +++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 34661ab979158..12f201aa81f43 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3234,6 +3234,9 @@ static int mptcp_disconnect(struct sock *sk, int flags)
 	msk->bytes_retrans = 0;
 	msk->rcvspace_init = 0;
 
+	/* for fallback's sake */
+	WRITE_ONCE(msk->ack_seq, 0);
+
 	WRITE_ONCE(sk->sk_shutdown, 0);
 	sk_error_report(sk);
 	return 0;
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index b9455c04e8a46..ac8616e7521e8 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -491,6 +491,9 @@ static void subflow_set_remote_key(struct mptcp_sock *msk,
 	mptcp_crypto_key_sha(subflow->remote_key, NULL, &subflow->iasn);
 	subflow->iasn++;
 
+	/* for fallback's sake */
+	subflow->map_seq = subflow->iasn;
+
 	WRITE_ONCE(msk->remote_key, subflow->remote_key);
 	WRITE_ONCE(msk->ack_seq, subflow->iasn);
 	WRITE_ONCE(msk->can_ack, true);
@@ -1435,9 +1438,12 @@ static bool subflow_check_data_avail(struct sock *ssk)
 
 	skb = skb_peek(&ssk->sk_receive_queue);
 	subflow->map_valid = 1;
-	subflow->map_seq = READ_ONCE(msk->ack_seq);
 	subflow->map_data_len = skb->len;
 	subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq - subflow->ssn_offset;
+	subflow->map_seq = __mptcp_expand_seq(subflow->map_seq,
+					      subflow->iasn +
+					      TCP_SKB_CB(skb)->seq -
+					      subflow->ssn_offset - 1);
 	WRITE_ONCE(subflow->data_avail, true);
 	return true;
 }
-- 
2.51.0
From nobody Sat Oct 11 05:52:54 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 4/8] mptcp: fix MSG_PEEK stream corruption
Date: Fri, 3 Oct 2025 16:01:42 +0200
Message-ID: <1f161922203105181ea1cad8ccc6d55328f03a01.1759499837.git.pabeni@redhat.com>

If a MSG_PEEK | MSG_WAITALL read operation consumes all the bytes in
the receive queue and recvmsg() needs to wait for more data - i.e.
it's a blocking one - upon arrival of the next packet the MPTCP
protocol will again start copying from the oldest data present in the
receive queue, corrupting the data stream.

Address the issue by explicitly tracking the peeked sequence number,
restarting from the last peeked byte.

Fixes: ca4fb892579f ("mptcp: add MSG_PEEK support")
Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
This may sound quite esoteric, but it will soon become very easy to
reproduce with mptcp_connect, thanks to the backlog.
---
 net/mptcp/protocol.c | 38 +++++++++++++++++++++++++-------------
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 12f201aa81f43..ce1238f620c33 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1947,22 +1947,36 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 
 static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied);
 
-static int __mptcp_recvmsg_mskq(struct sock *sk,
-				struct msghdr *msg,
-				size_t len, int flags,
+static int __mptcp_recvmsg_mskq(struct sock *sk, struct msghdr *msg,
+				size_t len, int flags, int copied_total,
 				struct scm_timestamping_internal *tss,
 				int *cmsg_flags)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *skb, *tmp;
+	int total_data_len = 0;
 	int copied = 0;
 
 	skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp) {
-		u32 offset = MPTCP_SKB_CB(skb)->offset;
+		u32 delta, offset = MPTCP_SKB_CB(skb)->offset;
 		u32 data_len = skb->len - offset;
-		u32 count = min_t(size_t, len - copied, data_len);
+		u32 count;
 		int err;
 
+		if (flags & MSG_PEEK) {
+			/* skip already peeked skbs */
+			if (total_data_len + data_len <= copied_total) {
+				total_data_len += data_len;
+				continue;
+			}
+
+			/* skip the already peeked data in the current skb */
+			delta = copied_total - total_data_len;
+			offset += delta;
+			data_len -= delta;
+		}
+
+		count = min_t(size_t, len - copied, data_len);
 		if (!(flags & MSG_TRUNC)) {
 			err = skb_copy_datagram_msg(skb, offset, msg, count);
 			if (unlikely(err < 0)) {
@@ -1979,16 +1993,14 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 
 		copied += count;
 
-		if (count < data_len) {
-			if (!(flags & MSG_PEEK)) {
+		if (!(flags & MSG_PEEK)) {
+			msk->bytes_consumed += count;
+			if (count < data_len) {
 				MPTCP_SKB_CB(skb)->offset += count;
 				MPTCP_SKB_CB(skb)->map_seq += count;
-				msk->bytes_consumed += count;
+				break;
 			}
-			break;
-		}
 
-		if (!(flags & MSG_PEEK)) {
 			/* avoid the indirect call, we know the destructor is sock_rfree */
 			skb->destructor = NULL;
 			skb->sk = NULL;
@@ -1996,7 +2008,6 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 			sk_mem_uncharge(sk, skb->truesize);
 			__skb_unlink(skb, &sk->sk_receive_queue);
 			skb_attempt_defer_free(skb);
-			msk->bytes_consumed += count;
 		}
 
 		if (copied >= len)
@@ -2194,7 +2205,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	while (copied < len) {
 		int err, bytes_read;
 
-		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, &tss, &cmsg_flags);
+		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags,
+						  copied, &tss, &cmsg_flags);
 		if (unlikely(bytes_read < 0)) {
 			if (!copied)
 				copied = bytes_read;
-- 
2.51.0
From nobody Sat Oct 11 05:52:54 2025
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 5/8] mptcp: ensure the kernel PM does not take action too late
Date: Fri, 3 Oct 2025 16:01:43 +0200
Message-ID: <8793483bbd42a8be40ccef7a94c0a2a447ffe828.1759499837.git.pabeni@redhat.com>

The PM hooks can currently take place when the msk is already shutting
down. Subflow creation will fail, thanks to the existing check at join
time, but we can avoid starting such doomed operations entirely.
Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
 net/mptcp/pm.c        | 4 +++-
 net/mptcp/pm_kernel.c | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index daf6dcb806843..eade530d38e01 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -588,6 +588,7 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk)
 void mptcp_pm_subflow_check_next(struct mptcp_sock *msk,
 				 const struct mptcp_subflow_context *subflow)
 {
+	struct sock *sk = (struct sock *)msk;
 	struct mptcp_pm_data *pm = &msk->pm;
 	bool update_subflows;
 
@@ -611,7 +612,8 @@ void mptcp_pm_subflow_check_next(struct mptcp_sock *msk,
 	/* Even if this subflow is not really established, tell the PM to try
 	 * to pick the next ones, if possible.
 	 */
-	if (mptcp_pm_nl_check_work_pending(msk))
+	if (mptcp_is_fully_established(sk) &&
+	    mptcp_pm_nl_check_work_pending(msk))
 		mptcp_pm_schedule_work(msk, MPTCP_PM_SUBFLOW_ESTABLISHED);
 
 	spin_unlock_bh(&pm->lock);
diff --git a/net/mptcp/pm_kernel.c b/net/mptcp/pm_kernel.c
index da431da16ae04..07b5142004e73 100644
--- a/net/mptcp/pm_kernel.c
+++ b/net/mptcp/pm_kernel.c
@@ -328,6 +328,8 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 	struct mptcp_pm_local local;
 
 	mptcp_mpc_endpoint_setup(msk);
+	if (!mptcp_is_fully_established(sk))
+		return;
 
 	pr_debug("local %d:%d signal %d:%d subflows %d:%d\n",
 		 msk->pm.local_addr_used, endp_subflow_max,
-- 
2.51.0
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 6/8] mptcp: do not miss early first subflow close event notification
Date: Fri, 3 Oct 2025 16:01:44 +0200
Message-ID: <3f9dad349afe9c3fd1302da825e7ed907343c2d5.1759499837.git.pabeni@redhat.com>

The MPTCP protocol currently does not emit the NL event when the first
subflow is closed before msk accept() time. Replacing the close helper
used in that scenario implicitly introduces the missing notification.

Note that in such a scenario we want to be sure that mptcp_close_ssk()
will not trigger any PM work; move the msk state change earlier, so
that the previous patch provides that guarantee.
Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
 net/mptcp/protocol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ce1238f620c33..6ae5ab7595272 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3988,10 +3988,10 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 		 * deal with bad peers not doing a complete shutdown.
 		 */
 		if (unlikely(inet_sk_state_load(msk->first) == TCP_CLOSE)) {
-			__mptcp_close_ssk(newsk, msk->first,
-					  mptcp_subflow_ctx(msk->first), 0);
 			if (unlikely(list_is_singular(&msk->conn_list)))
 				mptcp_set_state(newsk, TCP_CLOSE);
+			mptcp_close_ssk(newsk, msk->first,
+					mptcp_subflow_ctx(msk->first));
 		}
 	} else {
 tcpfallback:
-- 
2.51.0
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 7/8] mptcp: make mptcp_destroy_common() static
Date: Fri, 3 Oct 2025 16:01:45 +0200
Message-ID: <59879dfb364477686da1ae5b6ab0ddc3478f84cc.1759499837.git.pabeni@redhat.com>

This function is only used inside protocol.c; there is no need to
expose it to the whole stack. Note that the function definition must be
moved earlier to avoid a forward declaration.

Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
 net/mptcp/protocol.c | 42 +++++++++++++++++++++---------------------
 net/mptcp/protocol.h |  2 --
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 6ae5ab7595272..e354f16f4a79f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3195,6 +3195,27 @@ static void mptcp_copy_inaddrs(struct sock *msk, const struct sock *ssk)
 	inet_sk(msk)->inet_rcv_saddr = inet_sk(ssk)->inet_rcv_saddr;
 }
 
+static void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
+{
+	struct mptcp_subflow_context *subflow, *tmp;
+	struct sock *sk = (struct sock *)msk;
+
+	__mptcp_clear_xmit(sk);
+
+	/* join list will be eventually flushed (with rst) at sock lock release time */
+	mptcp_for_each_subflow_safe(msk, subflow, tmp)
+		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
+
+	__skb_queue_purge(&sk->sk_receive_queue);
+	skb_rbtree_purge(&msk->out_of_order_queue);
+
+	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
+	 * inet_sock_destruct() will dispose it
+	 */
+	mptcp_token_destroy(msk);
+	mptcp_pm_destroy(msk);
+}
+
 static int mptcp_disconnect(struct sock *sk, int flags)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -3399,27 +3420,6 @@ void mptcp_rcv_space_init(struct mptcp_sock *msk, const struct sock *ssk)
 	msk->rcvq_space.space = TCP_INIT_CWND * TCP_MSS_DEFAULT;
 }
 
-void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
-{
-	struct mptcp_subflow_context *subflow, *tmp;
-	struct sock *sk = (struct sock *)msk;
-
-	__mptcp_clear_xmit(sk);
-
-	/* join list will be eventually flushed (with rst) at sock lock release time */
-	mptcp_for_each_subflow_safe(msk, subflow, tmp)
-		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
-
-	__skb_queue_purge(&sk->sk_receive_queue);
-	skb_rbtree_purge(&msk->out_of_order_queue);
-
-	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
-	 * inet_sock_destruct() will dispose it
-	 */
-	mptcp_token_destroy(msk);
-	mptcp_pm_destroy(msk);
-}
-
 static void mptcp_destroy(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 0545eab231250..46d8432c72ee7 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -979,8 +979,6 @@ static inline void mptcp_propagate_sndbuf(struct sock *sk, struct sock *ssk)
 	local_bh_enable();
 }
 
-void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags);
-
 #define MPTCP_TOKEN_MAX_RETRIES	4
 
 void __init mptcp_token_init(void);
-- 
2.51.0
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v4 mptcp-next 8/8] mptcp: leverage the backlog for RX packet processing
Date: Fri, 3 Oct 2025 16:01:46 +0200

When the msk socket is owned, or the msk receive buffer is full, move
the incoming skbs to an msk-level backlog list. This avoids traversing
the joined subflows and acquiring the subflow-level socket lock at
reception time, improving RX performance.
The skbs in the backlog keep using the incoming subflow receive space,
to allow backpressure via the subflow flow control. When processing the
backlog, skbs exceeding the msk receive space are not dropped but
re-inserted for later backlog processing: dropping packets already
acked at the TCP level is explicitly discouraged by the RFC and would
corrupt the data stream for fallback sockets.

As a drawback, special care is needed to avoid adding skbs to the
backlog of a closed msk, and to avoid leaving dangling references in
the backlog at subflow close time.

Note that we can't use sk_backlog, as that list is processed before
release_cb(), and the latter can release and re-acquire the msk-level
socket spin lock. That would cause msk-level OoO packets, which in turn
are fatal in case of fallback.

Signed-off-by: Paolo Abeni
Reviewed-by: Geliang Tang
---
 net/mptcp/protocol.c | 204 ++++++++++++++++++++++++++++---------------
 net/mptcp/protocol.h |   5 +-
 2 files changed, 136 insertions(+), 73 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e354f16f4a79f..1fcdb26b8e0a0 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -654,8 +654,35 @@ static void mptcp_dss_corruption(struct mptcp_sock *msk, struct sock *ssk)
 	}
 }
 
+static void __mptcp_add_backlog(struct sock *sk, struct sk_buff *skb)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *tail = NULL;
+	bool fragstolen;
+	int delta;
+
+	if (unlikely(sk->sk_state == TCP_CLOSE))
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
+
+	/* Try to coalesce with the last skb in our backlog */
+	if (!list_empty(&msk->backlog_list))
+		tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
+
+	if (tail && MPTCP_SKB_CB(skb)->map_seq == MPTCP_SKB_CB(tail)->end_seq &&
+	    skb->sk == tail->sk &&
+	    __mptcp_try_coalesce(sk, tail, skb, &fragstolen, &delta)) {
+		atomic_sub(skb->truesize - delta, &skb->sk->sk_rmem_alloc);
+		kfree_skb_partial(skb, fragstolen);
+		WRITE_ONCE(msk->backlog_len, msk->backlog_len + delta);
+		return;
+	}
+
+	list_add_tail(&skb->list, &msk->backlog_list);
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len + skb->truesize);
+}
+
 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
-					   struct sock *ssk)
+					   struct sock *ssk, bool own_msk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct sock *sk = (struct sock *)msk;
@@ -671,9 +698,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 		struct sk_buff *skb;
 		bool fin;
 
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-			break;
-
 		/* try to move as much data as available */
 		map_remaining = subflow->map_data_len -
 				mptcp_subflow_get_map_offset(subflow);
@@ -701,10 +725,18 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 			int bmem;
 
 			bmem = mptcp_init_skb(ssk, skb, offset, len);
-			skb->sk = NULL;
-			sk_forward_alloc_add(sk, bmem);
-			atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
-			ret = __mptcp_move_skb(sk, skb) || ret;
+			if (own_msk)
+				sk_forward_alloc_add(sk, bmem);
+			else
+				msk->borrowed_mem += bmem;
+
+			if (own_msk && sk_rmem_alloc_get(sk) < sk->sk_rcvbuf) {
+				skb->sk = NULL;
+				atomic_sub(skb->truesize, &ssk->sk_rmem_alloc);
+				ret |= __mptcp_move_skb(sk, skb);
+			} else {
+				__mptcp_add_backlog(sk, skb);
+			}
 			seq += len;
 
 			if (unlikely(map_remaining < len)) {
@@ -823,7 +855,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	struct sock *sk = (struct sock *)msk;
 	bool moved;
 
-	moved = __mptcp_move_skbs_from_subflow(msk, ssk);
+	moved = __mptcp_move_skbs_from_subflow(msk, ssk, true);
 	__mptcp_ofo_queue(msk);
 	if (unlikely(ssk->sk_err))
 		__mptcp_subflow_error_report(sk, ssk);
@@ -838,18 +870,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved;
 }
 
-static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	/* Wake-up the reader only for in-sequence data */
-	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
-		sk->sk_data_ready(sk);
-}
-
 void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
+	struct mptcp_sock *msk = mptcp_sk(sk);
 
 	/* The peer can send data while we are shutting down this
 	 * subflow at msk destruction time, but we must avoid enqueuing
@@ -859,10 +883,13 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		return;
 
 	mptcp_data_lock(sk);
-	if (!sock_owned_by_user(sk))
-		__mptcp_data_ready(sk, ssk);
-	else
-		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
+	if (!sock_owned_by_user(sk)) {
+		/* Wake-up the reader only for in-sequence data */
+		if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
+			sk->sk_data_ready(sk);
+	} else {
+		__mptcp_move_skbs_from_subflow(msk, ssk, false);
+	}
 	mptcp_data_unlock(sk);
 }
 
@@ -2096,60 +2123,61 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
-static struct mptcp_subflow_context *
-__mptcp_first_ready_from(struct mptcp_sock *msk,
-			 struct mptcp_subflow_context *subflow)
+static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delta)
 {
-	struct mptcp_subflow_context *start_subflow = subflow;
-
-	while (!READ_ONCE(subflow->data_avail)) {
-		subflow = mptcp_next_subflow(msk, subflow);
-		if (subflow == start_subflow)
-			return NULL;
-	}
-	return subflow;
-}
-
-static bool __mptcp_move_skbs(struct sock *sk)
-{
-	struct mptcp_subflow_context *subflow;
+	struct sk_buff *skb = list_first_entry(skbs, struct sk_buff, list);
 	struct mptcp_sock *msk = mptcp_sk(sk);
-	bool ret = false;
-
-	if (list_empty(&msk->conn_list))
-		return false;
-
-	subflow = list_first_entry(&msk->conn_list,
-				   struct mptcp_subflow_context, node);
-	for (;;) {
-		struct sock *ssk;
-		bool slowpath;
+	bool moved = false;
 
-		/*
-		 * As an optimization avoid traversing the subflows list
-		 * and ev. acquiring the subflow socket lock before baling out
-		 */
+	while (1) {
+		/* If the msk recvbuf is full stop, don't drop */
 		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
 			break;
 
-		subflow = __mptcp_first_ready_from(msk, subflow);
-		if (!subflow)
-			break;
+		prefetch(skb->next);
+		list_del(&skb->list);
+		*delta += skb->truesize;
 
-		ssk = mptcp_subflow_tcp_sock(subflow);
-		slowpath = lock_sock_fast(ssk);
-		ret = __mptcp_move_skbs_from_subflow(msk, ssk) || ret;
-		if (unlikely(ssk->sk_err))
-			__mptcp_error_report(sk);
-		unlock_sock_fast(ssk, slowpath);
+		/* Release the memory allocated on the incoming subflow before
+		 * moving it to the msk
+		 */
+		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
+		skb->sk = NULL;
+		moved |= __mptcp_move_skb(sk, skb);
+		if (list_empty(skbs))
+			break;
 
-		subflow = mptcp_next_subflow(msk, subflow);
+		skb = list_first_entry(skbs, struct sk_buff, list);
 	}
 
 	__mptcp_ofo_queue(msk);
-	if (ret)
+	if (moved)
 		mptcp_check_data_fin((struct sock *)msk);
-	return ret;
+	return moved;
+}
+
+static bool mptcp_move_skbs(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	bool moved = false;
+	LIST_HEAD(skbs);
+	u32 delta = 0;
+
+	mptcp_data_lock(sk);
+	while (!list_empty(&msk->backlog_list)) {
+		list_splice_init(&msk->backlog_list, &skbs);
+		mptcp_data_unlock(sk);
+		moved |= __mptcp_move_skbs(sk, &skbs, &delta);
+
+		mptcp_data_lock(sk);
+		if (!list_empty(&skbs)) {
+			list_splice(&skbs, &msk->backlog_list);
+			break;
+		}
+	}
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len - delta);
+	mptcp_data_unlock(sk);
+	return moved;
 }
 
 static unsigned int mptcp_inq_hint(const struct sock *sk)
@@ -2215,7 +2243,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 
 		copied += bytes_read;
 
-		if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk))
+		if (!list_empty(&msk->backlog_list) && mptcp_move_skbs(sk))
 			continue;
 
 		/* only the MPTCP socket status is relevant here. The exit
@@ -2520,6 +2548,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 		     struct mptcp_subflow_context *subflow)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *skb;
+
 	/* The first subflow can already be closed and still in the list */
 	if (subflow->close_event_done)
 		return;
@@ -2529,6 +2560,18 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
 	if (sk->sk_state == TCP_ESTABLISHED)
 		mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);
 
+	/* Remove any reference from the backlog to this ssk, accounting the
+	 * related skb directly to the main socket
+	 */
+	list_for_each_entry(skb, &msk->backlog_list, list) {
+		if (skb->sk != ssk)
+			continue;
+
+		atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
+		atomic_add(skb->truesize, &sk->sk_rmem_alloc);
+		skb->sk = sk;
+	}
+
 	/* subflow aborted before reaching the fully_established status
 	 * attempt the creation of the next subflow
 	 */
@@ -2761,8 +2804,11 @@ static void mptcp_do_fastclose(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow, *tmp;
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct sk_buff *skb;
 
 	mptcp_set_state(sk, TCP_CLOSE);
+	list_for_each_entry(skb, &msk->backlog_list, list)
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
 		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow),
 				  subflow, MPTCP_CF_FASTCLOSE);
@@ -2820,6 +2866,7 @@ static void __mptcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&msk->conn_list);
 	INIT_LIST_HEAD(&msk->join_list);
 	INIT_LIST_HEAD(&msk->rtx_queue);
+	INIT_LIST_HEAD(&msk->backlog_list);
 	INIT_WORK(&msk->work, mptcp_worker);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
@@ -3199,9 +3246,13 @@ static void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 {
 	struct mptcp_subflow_context *subflow, *tmp;
 	struct sock *sk = (struct sock *)msk;
+	struct sk_buff *skb;
 
 	__mptcp_clear_xmit(sk);
 
+	list_for_each_entry(skb, &msk->backlog_list, list)
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_CLOSE);
+
 	/* join list will be eventually flushed (with rst) at sock lock release time */
 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
 		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
@@ -3451,23 +3502,29 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
 
 #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
 				      BIT(MPTCP_RETRANSMIT) | \
-				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
-				      BIT(MPTCP_DEQUEUE))
+				      BIT(MPTCP_FLUSH_JOIN_LIST))
 
 /* processes deferred events and flush wmem */
 static void mptcp_release_cb(struct sock *sk)
 	__must_hold(&sk->sk_lock.slock)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 delta = 0;
 
 	for (;;) {
 		unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
-		struct list_head join_list;
+		LIST_HEAD(join_list);
+		LIST_HEAD(skbs);
+
+		sk_forward_alloc_add(sk, msk->borrowed_mem);
+		msk->borrowed_mem = 0;
+
+		if (sk_rmem_alloc_get(sk) < sk->sk_rcvbuf)
+			list_splice_init(&msk->backlog_list, &skbs);
 
-		if (!flags)
+		if (!flags && list_empty(&skbs))
 			break;
 
-		INIT_LIST_HEAD(&join_list);
 		list_splice_init(&msk->join_list, &join_list);
 
 		/* the following actions acquire the subflow socket lock
@@ -3486,7 +3543,8 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_push_pending(sk, 0);
 		if (flags & BIT(MPTCP_RETRANSMIT))
 			__mptcp_retrans(sk);
-		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) {
+		if (!list_empty(&skbs) &&
+		    __mptcp_move_skbs(sk, &skbs, &delta)) {
 			/* notify ack seq update */
 			mptcp_cleanup_rbuf(msk, 0);
 			sk->sk_data_ready(sk);
@@ -3494,7 +3552,9 @@ static void mptcp_release_cb(struct sock *sk)
 
 		cond_resched();
 		spin_lock_bh(&sk->sk_lock.slock);
+		list_splice(&skbs, &msk->backlog_list);
 	}
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len - delta);
 
 	if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
 		__mptcp_clean_una_wakeup(sk);
@@ -3726,7 +3786,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int *karg)
 		return -EINVAL;
 
 	lock_sock(sk);
-	if (__mptcp_move_skbs(sk))
+	if (mptcp_move_skbs(sk))
 		mptcp_cleanup_rbuf(msk, 0);
 	*karg = mptcp_inq_hint(sk);
 	release_sock(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 46d8432c72ee7..c9c6582b4e1c4 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -124,7 +124,6 @@
 #define MPTCP_FLUSH_JOIN_LIST	5
 #define MPTCP_SYNC_STATE	6
 #define MPTCP_SYNC_SNDBUF	7
-#define MPTCP_DEQUEUE		8
 
 struct mptcp_skb_cb {
 	u64 map_seq;
@@ -301,6 +300,7 @@ struct mptcp_sock {
 	u32		last_ack_recv;
 	unsigned long	timer_ival;
 	u32		token;
+	u32		borrowed_mem;
 	unsigned long	flags;
 	unsigned long	cb_flags;
 	bool		recovery;		/* closing subflow write queue reinjected */
@@ -358,6 +358,8 @@ struct mptcp_sock {
 						 * allow_infinite_fallback and
 						 * allow_join
 						 */
+	struct list_head backlog_list;	/* protected by the data lock */
+	u32		backlog_len;
 };
 
@@ -408,6 +410,7 @@ static inline int mptcp_space_from_win(const struct sock *sk, int win)
 static inline int __mptcp_space(const struct sock *sk)
 {
 	return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) -
+				    READ_ONCE(mptcp_sk(sk)->backlog_len) -
 				    sk_rmem_alloc_get(sk));
 }
 
-- 
2.51.0