From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: yangang@kylinos.cn, geliang@kernel.org, matttbe@kernel.org
Subject: [PATCH mptcp-next v1 6/9] mptcp: implemented OoO queue pruning
Date: Fri, 24 Apr 2026 16:08:39 +0200
Message-ID: <24d5c31a9257e2c247f53db9eadec9ddc2f47db4.1777038888.git.pabeni@redhat.com>
Precedence: bulk
X-Mailing-List: mptcp@lists.linux.dev
MIME-Version: 1.0

Leverage the hybrid helper to implement receive queue and OoO queue
collapsing at ingress time when reaching memory bounds.

If the msk is owned by user-space when the skb arrives, perform the
pruning in the release_cb.

The prune check is additionally performed when the skb reaches the
msk-level queues.

Pruning is not needed for fallback sockets, as their MPTCP-level OoO
queue must always be empty: remove the ingress check for such scenario
and rely on the TCP-level one.

Signed-off-by: Paolo Abeni
---
RFC -> v1:
 - use data_seq only when available
 - avoid ack_seq lockless access
 - drop limit on fallback
 - collapse rcvqueue, too
 - drop only when pruning is not possible and over rcvbuf * 2

Notes:
 - Similarly to patch 'mptcp: move checks vs rcvbuf size earlier in the
   RX path', some cleanup/tuning in mptcp_over_limit() will be needed
 - Pruning in the release_cb() is likely not needed, should probably be
   removed (after more testing).
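
For readers skimming the thread, the ingress decision described above can be
modelled in user space roughly as follows. This is only a sketch, not kernel
code: `struct rx_model`, `model_prune()`, `model_over_limit()` and
`model_release_cb()` are hypothetical names, and the single `prunable`
counter is an assumed stand-in for whatever collapsing and OoO pruning could
actually free.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the msk state used by the check. */
struct rx_model {
	uint64_t rmem_alloc;	/* bytes currently charged to the socket */
	uint64_t rcvbuf;	/* configured receive buffer size */
	uint64_t prunable;	/* assumed: bytes collapse/prune could free */
	bool owned_by_user;	/* socket currently owned by user-space */
	bool prune_pending;	/* models the MPTCP_PRUNE cb flag */
};

/* Models collapse + OoO pruning: free memory until below rcvbuf,
 * or until nothing prunable is left.
 */
static void model_prune(struct rx_model *m)
{
	uint64_t excess, freed;

	if (m->rmem_alloc < m->rcvbuf)
		return;

	excess = m->rmem_alloc - m->rcvbuf + 1;
	freed = excess < m->prunable ? excess : m->prunable;
	if (freed > m->rmem_alloc)
		freed = m->rmem_alloc;
	m->rmem_alloc -= freed;
	m->prunable -= freed;
}

/* Returns true when the incoming skb should be dropped. */
static bool model_over_limit(struct rx_model *m, uint32_t payload_len)
{
	if (payload_len == 0)		/* no data: never dropped here */
		return false;
	if (m->rmem_alloc < m->rcvbuf)	/* below the limit: accept */
		return false;

	if (!m->owned_by_user) {
		/* Prune right away, then re-check against rcvbuf. */
		model_prune(m);
		return m->rmem_alloc > m->rcvbuf;
	}

	/* Pruning is deferred to the release callback: allow up to
	 * rcvbuf * 2 in the meantime.
	 */
	m->prune_pending = true;
	return m->rmem_alloc > (m->rcvbuf << 1);
}

/* Models the deferred pruning done when user-space releases the socket. */
static void model_release_cb(struct rx_model *m)
{
	if (m->prune_pending) {
		m->prune_pending = false;
		model_prune(m);
	}
}
```

With `rcvbuf` at 100 and 150 bytes charged, the model accepts the skb when at
least 51 bytes are prunable and the socket is not owned; when the socket is
owned, the same skb is accepted under the doubled limit and the pruning runs
later from `model_release_cb()`.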
---
 net/mptcp/mib.c      |  3 ++
 net/mptcp/mib.h      |  3 ++
 net/mptcp/options.c  | 40 +++++++++++++++++++++++---
 net/mptcp/protocol.c | 68 ++++++++++++++++++++++++++++++++++++++++++++
 net/mptcp/protocol.h |  2 ++
 5 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index f23fda0c55a7..5128feec942c 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -85,6 +85,9 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK),
 	SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED),
 	SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
+	SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
+	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
+	SNMP_MIB_ITEM("RcvCollapsed", MPTCP_MIB_RCVCOLLAPSED),
 };
 
 /* mptcp_mib_alloc - allocate percpu mib counters
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 812218b5ed2b..2f8f68e33ac5 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -88,6 +88,9 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_SIMULTCONNFALLBACK,	/* Simultaneous connect */
 	MPTCP_MIB_FALLBACKFAILED,	/* Can't fallback due to msk status */
 	MPTCP_MIB_WINPROBE,		/* MPTCP-level zero window probe */
+	MPTCP_MIB_OFO_PRUNED,		/* MPTCP-level OoO queue pruned */
+	MPTCP_MIB_RCVPRUNED,		/* Dropped due to memory constraints */
+	MPTCP_MIB_RCVCOLLAPSED,		/* Collapsed due to memory pressure */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 14afeee8ca5f..a49cb03954e5 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1158,12 +1158,40 @@ static bool add_addr_hmac_valid(struct mptcp_sock *msk,
 	return hmac == mp_opt->ahmac;
 }
 
-static bool mptcp_over_limit(const struct sock *sk, struct sk_buff *skb)
+static bool mptcp_over_limit(struct sock *sk, struct sk_buff *skb,
+			     const struct mptcp_options_received *mp_opt)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	bool ret;
+
 	if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)
 		return false;
 
-	return sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf);
+	/* Allow some slack for backlog processing */
+	if (sk_rmem_alloc_get(sk) < READ_ONCE(sk->sk_rcvbuf))
+		return false;
+
+	mptcp_data_lock(sk);
+	if (!sock_owned_by_user(sk)) {
+		/* When the data sequence is not (yet) available for the
+		 * incoming skb, allow pruning the whole OoO queue
+		 */
+		u32 seq = !mp_opt->use_map || mp_opt->mpc_map ? msk->ack_seq :
+								mp_opt->data_seq;
+
+		__mptcp_check_prune(sk, seq);
+		ret = sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf);
+	} else {
+		u64 limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1;
+
+		/* Pruning will take place later in the RX path, allow
+		 * some extra slack.
+		 */
+		ret = sk_rmem_alloc_get(sk) > limit;
+		__set_bit(MPTCP_PRUNE, &msk->cb_flags);
+	}
+	mptcp_data_unlock(sk);
+	return ret;
 }
 
 /* Return false when the caller must drop the packet, i.e. in case of error,
@@ -1194,7 +1222,11 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		__mptcp_data_acked(subflow->conn);
 		mptcp_data_unlock(subflow->conn);
 
-		if (mptcp_over_limit(subflow->conn, skb))
+		/* Will use ack_seq as limit for OoO pruning; any value would do
+		 * as OoO queue must be empty.
+		 */
+		mp_opt.use_map = 0;
+		if (mptcp_over_limit(subflow->conn, skb, &mp_opt))
 			return false;
 		return true;
 	}
@@ -1274,7 +1306,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		return true;
 	}
 
-	if (mptcp_over_limit(subflow->conn, skb))
+	if (mptcp_over_limit(subflow->conn, skb, &mp_opt))
 		return false;
 
 	mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 49e62f817fd6..0c57561ee046 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -363,6 +363,66 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset)
 	skb_dst_drop(skb);
 }
 
+/* "Inspired" from the TCP version */
+static void mptcp_prune_ofo_queue(struct sock *sk, u32 seq)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct rb_node *node, *prev;
+	bool pruned = false;
+
+	if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
+		return;
+
+	node = &msk->ooo_last_skb->rbnode;
+
+	do {
+		struct sk_buff *skb = rb_to_skb(node);
+
+		/* If incoming skb would land last in ofo queue, stop pruning. */
+		if (after(seq, MPTCP_SKB_CB(skb)->map_seq))
+			break;
+
+		pruned = true;
+		prev = rb_prev(node);
+		rb_erase(node, &msk->out_of_order_queue);
+		mptcp_drop(sk, skb);
+		msk->ooo_last_skb = rb_to_skb(prev);
+		if (atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf)
+			break;
+
+		node = prev;
+	} while (node);
+
+	if (pruned)
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_OFO_PRUNED);
+}
+
+bool __mptcp_check_prune(struct sock *sk, u32 seq)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	unsigned int dropped;
+
+	if (likely(atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf))
+		return false;
+
+	dropped = xtcp_collapse_ofo_queue(sk, &msk->out_of_order_queue,
+					  &msk->ooo_last_skb, msk->scaling_ratio);
+	if (!skb_queue_empty(&sk->sk_receive_queue))
+		dropped += xtcp_collapse(sk, &sk->sk_receive_queue, NULL,
+					 skb_peek(&sk->sk_receive_queue),
+					 NULL,
+					 msk->copied_seq, msk->ack_seq,
+					 msk->scaling_ratio);
+
+	if (dropped)
+		MPTCP_ADD_STATS(sock_net(sk), MPTCP_MIB_RCVCOLLAPSED, dropped);
+	if (likely(atomic_read(&sk->sk_rmem_alloc) < sk->sk_rcvbuf))
+		return false;
+
+	mptcp_prune_ofo_queue(sk, seq);
+	return atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf;
+}
+
 static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 {
 	u32 copy_len = MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq;
@@ -372,6 +432,12 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 
 	mptcp_borrow_fwdmem(sk, skb);
 
+	if (__mptcp_check_prune(sk, MPTCP_SKB_CB(skb)->map_seq)) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
+		mptcp_drop(sk, skb);
+		return false;
+	}
+
 	if (MPTCP_SKB_CB(skb)->map_seq == ack_seq) {
 		/* in sequence */
 		msk->bytes_received += copy_len;
@@ -3679,6 +3745,8 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_error_report(sk);
 		if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags))
 			__mptcp_sync_sndbuf(sk);
+		if (__test_and_clear_bit(MPTCP_PRUNE, &msk->cb_flags))
+			__mptcp_check_prune(sk, msk->ack_seq - 1);
 	}
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e541f42fca25..a6b7eedf36cf 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -124,6 +124,7 @@
 #define MPTCP_FLUSH_JOIN_LIST	5
 #define MPTCP_SYNC_STATE	6
 #define MPTCP_SYNC_SNDBUF	7
+#define MPTCP_PRUNE		8
 
 struct mptcp_skb_cb {
 	u32 map_seq;
@@ -829,6 +830,7 @@ bool __mptcp_close(struct sock *sk, long timeout);
 void mptcp_cancel_work(struct sock *sk);
 void __mptcp_unaccepted_force_close(struct sock *sk);
 void mptcp_set_state(struct sock *sk, int state);
+bool __mptcp_check_prune(struct sock *sk, u32 seq);
 
 bool mptcp_addresses_equal(const struct mptcp_addr_info *a,
 			   const struct mptcp_addr_info *b, bool use_port);
-- 
2.53.0