From nobody Mon May 25 18:04:42 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3C38034040F for ; Tue, 19 May 2026 17:01:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210114; cv=none; b=QYcoBTQc2ZSJFyakaIeYWcoExaBfzb+c+iut7WivOwM3VsUt40KLnwGpR5JzjhJciBfc7tOcXBJ3/uldziqPcHqHn/dGAweTqmr6AaIME/TjQ/+Jgnm5vrOkVGpPqrK2erawDl9d3gysil82abNn2rOlWx/Seu2uRlmYPlF76eE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210114; c=relaxed/simple; bh=AhQyW2yrDd7+Ro+y4tNxqHjpUxnVJl37rTzaC17lcic=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=pmLu7r+TK1+p6b6FlDktcU7mnZcWrFP0A9mt0CX13qvbz8a9xhkQ4/9WFUUnq6UbK0hKdXDpj5zr5JQZbgQQkyE6CGGyahVI11m81maKq3pMkhXnoaUTOZ6HIVUYFweUQueb7DurDCJWe8Ks1YtT7Zr//A29rnuvit82/Kvbg/k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=ZH+OmP/2; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="ZH+OmP/2" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1779210112; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=H4DOF1c46C/r7O3CnWLWVV79RegeLFnzYT9iVFoYJe0=; b=ZH+OmP/2F0lfMj0UZpkHFXPXI9Lth9JTpUZls8oJPfFSDEv5/x2mYC3NEfg5RAilnwX002 nuw7EgnyD6ezP/zhY1/yXt0LW1APPI3h+dWgffRag/szqyH4pikfLmagxvERgpTMvCGmNT 7iQJAsoNXiR2sPUYS/oahT58GSE+Ozs= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-339-J60y1PfCPA67-2ktye3WrQ-1; Tue, 19 May 2026 13:01:50 -0400 X-MC-Unique: J60y1PfCPA67-2ktye3WrQ-1 X-Mimecast-MFC-AGG-ID: J60y1PfCPA67-2ktye3WrQ_1779210109 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 1AAD7180061A; Tue, 19 May 2026 17:01:49 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.47]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 85943180075B; Tue, 19 May 2026 17:01:47 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: Geliang Tang , gang.yan@linux.dev Subject: [PATCH v7 mptcp-next 1/7] mptcp: fix missing wakeups in edge scenarios Date: Tue, 19 May 2026 19:01:35 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 8IV3B9OAuGrwh-5DdjT67cxwRtHqGv8g_c-WKXq_HnA_1779210109 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; 
x-default="true"

mptcp_recvmsg() can fill the MPTCP socket receive queue via
mptcp_move_skbs(), but currently it does not try to wake up any
listener, because the same process is going to check the receive queue
soon.

When multiple threads are reading from the same fd, the above can cause
a stall. Add the missing wakeup.

Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Signed-off-by: Paolo Abeni
---
v6 -> v7
 - use mptcp_epollin_ready()

Notes:
 - sashiko may raise concerns about thundering herd problems. If an
   application uses multiple threads to read from the same socket, that
   could/will happen. Applications should not use multiple threads to
   read from the same socket if they want good performance.
---
 net/mptcp/protocol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ce8372fb3c6a..f14572fb1975 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2276,6 +2276,8 @@ static bool mptcp_move_skbs(struct sock *sk)
 		mptcp_backlog_spooled(sk, moved, &skbs);
 	}
 	mptcp_data_unlock(sk);
+	if (enqueued && mptcp_epollin_ready(sk))
+		sk->sk_data_ready(sk);
 	return enqueued;
 }
 
-- 
2.54.0

From nobody Mon May 25 18:04:42 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v7 mptcp-next 2/7] mptcp: explicitly drop over memory limits
Date: Tue, 19 May 2026 19:01:36 +0200
Message-ID: <6a4ffe65b8eae8aaae3306e318bfc533f871ec67.1779210016.git.pabeni@redhat.com>

Currently the rcvbuf constraint is enforced when moving the skbs into
the msk receive or OoO queue, keeping the incoming skbs in the subflow
queue when over limit.

Under significant memory pressure the above can cause permanent data
transfer stalls.

Enforce the memory limits strictly as early as possible, before the
skbs even land in the subflow queues, and refine the check when owning
the msk socket lock.

Note that a fallback socket must not drop on the later checks, as the
incoming skb is already acked, and such a drop would break the stream.
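For readers who want to experiment with the early check outside the
kernel, the following is a rough user-space sketch of the idea and not
the kernel code. All names (struct rx_account, struct rx_pkt, rx_mem(),
rx_over_limit(), seq_after()) are hypothetical stand-ins chosen for
illustration. It assumes the limit is never negative, sums both memory
counters in 64 bits, and exempts pure acks, FIN and segments carrying
no new data from the drop decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the msk receive accounting. */
struct rx_account {
	int rmem_alloc;		/* bytes charged to the receive queue */
	uint32_t backlog_len;	/* bytes parked in the backlog */
	int rcvbuf;		/* configured limit, assumed >= 0 */
};

/* Hypothetical view of the sequence data of an incoming segment. */
struct rx_pkt {
	uint32_t seq, end_seq;	/* segment sequence range */
	uint32_t rcv_nxt;	/* next sequence expected by the receiver */
	bool fin;
};

/* Wrap-safe "a is after b" comparison on 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Sum both counters in 64 bits. Casting the signed counter to unsigned
 * before widening avoids sign extension.
 */
static uint64_t rx_mem(const struct rx_account *a)
{
	uint64_t mem = (unsigned int)a->rmem_alloc;

	return mem + a->backlog_len;
}

/* Return true when the segment should be dropped for memory reasons. */
static bool rx_over_limit(const struct rx_account *a, const struct rx_pkt *p)
{
	assert(a->rcvbuf >= 0);

	if (rx_mem(a) <= (uint64_t)a->rcvbuf)
		return false;

	/* Never drop pure acks, FIN, or segments without new data. */
	if (p->seq == p->end_seq || p->fin ||
	    !seq_after(p->end_seq, p->rcv_nxt))
		return false;

	return true;
}
```

The exemptions matter because dropping such segments would not free any
meaningful amount of memory, while it could hide state changes from the
receiver.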
Signed-off-by: Paolo Abeni
---
v6 -> v7:
 - fix sign extension issues

v4 -> v5:
 - fix possible u32 overflow in mptcp_over_limit

v3 -> v4:
 - schedule TCP ack on drop
 - enforce limits in __mptcp_move_skb() and __mptcp_add_backlog(), too
   but only if not fallback.

v1 -> v2:
 - deal correctly with tcp fin and zero win probe

RFC -> v1:
 - limit vs actual buffer size
 - use CB info instead of skb->len

Note that:
 - this needs the follow-up patches to really fix the stall
 - sashiko can assume a ZWP carries unacked data and may be silently
   dropped. AFAIK that is false.
 - sashiko apparently can't grasp that mptcp subflows never hit the tcp
   rx fastpath and that, once mptcp_incoming_options in
   tcp_rcv_state_process is hit, the peer can't transmit any more data.
 - the memory comparison is intentionally very rough, as the msk socket
   lock is not currently held where the condition is now enforced.
   This should require some refinement, shared as-is to avoid more
   latency on my side
---
 net/mptcp/options.c  | 31 +++++++++++++++++++++++++++++--
 net/mptcp/protocol.c | 29 +++++++++++++++++++++--------
 net/mptcp/protocol.h |  8 ++++++++
 3 files changed, 58 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4cc583fdc7a9..35db19fd6b7a 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1158,8 +1158,29 @@ static bool add_addr_hmac_valid(struct mptcp_sock *msk,
 	return hmac == mp_opt->ahmac;
 }
 
-/* Return false in case of error (or subflow has been reset),
- * else return true.
+static bool mptcp_over_limit(struct sock *sk, struct sock *ssk,
+			     const struct sk_buff *skb)
+{
+	u64 mem = mptcp_mem(sk);
+
+	/* sk_rcvbuf is ensured to be >= 0 and <= MAX_INT. */
+	if (likely(mem <= READ_ONCE(sk->sk_rcvbuf)))
+		return false;
+
+	/* Avoid silently dropping pure acks, fin or zero win probes.
*/
+	if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq ||
+	    TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN ||
+	    !after(TCP_SKB_CB(skb)->end_seq, tcp_sk(ssk)->rcv_nxt))
+		return false;
+
+	/* Dropped due to memory constraints, schedule an ack. */
+	inet_csk(ssk)->icsk_ack.pending |= ICSK_ACK_NOMEM | ICSK_ACK_NOW;
+	inet_csk_schedule_ack(ssk);
+	return true;
+}
+
+/* Return false when the caller must drop the packet, i.e. in case of error,
+ * subflow has been reset, or over memory limits.
  */
 bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 {
@@ -1185,6 +1206,9 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 
 		__mptcp_data_acked(subflow->conn);
 		mptcp_data_unlock(subflow->conn);
+
+		if (mptcp_over_limit(subflow->conn, sk, skb))
+			return false;
 		return true;
 	}
 
@@ -1263,6 +1287,9 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		return true;
 	}
 
+	if (mptcp_over_limit(subflow->conn, sk, skb))
+		return false;
+
 	mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
 	if (!mpext)
 		return false;
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f14572fb1975..e53abd1f814f 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -381,6 +381,15 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 
 	mptcp_borrow_fwdmem(sk, skb);
 
+	/* Can't drop packets for fallback socket this late, or the stream
+	 * will break.
+	 */
+	if (unlikely(sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) &&
+	    !__mptcp_check_fallback(msk)) {
+		mptcp_drop(sk, skb);
+		return false;
+	}
+
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
 		msk->bytes_received += copy_len;
@@ -675,6 +684,7 @@ static void __mptcp_add_backlog(struct sock *sk,
 	struct sk_buff *tail = NULL;
 	struct sock *ssk = skb->sk;
 	bool fragstolen;
+	u64 limit;
 	int delta;
 
 	if (unlikely(sk->sk_state == TCP_CLOSE)) {
@@ -682,6 +692,15 @@ static void __mptcp_add_backlog(struct sock *sk,
 		return;
 	}
 
+	/* Similar additional allowance as plain TCP. */
+	limit = READ_ONCE(sk->sk_rcvbuf);
+	limit += (limit >> 1) + 64 * 1024;
+	limit = min_t(u64, limit, UINT_MAX);
+	if (msk->backlog_len > limit && !__mptcp_check_fallback(msk)) {
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF);
+		return;
+	}
+
 	/* Try to coalesce with the last skb in our backlog */
 	if (!list_empty(&msk->backlog_list))
 		tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
@@ -753,7 +772,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 			mptcp_init_skb(ssk, skb, offset, len);
 
-			if (own_msk && sk_rmem_alloc_get(sk) < sk->sk_rcvbuf) {
+			if (own_msk) {
 				mptcp_subflow_lend_fwdmem(subflow, skb);
 				ret |= __mptcp_move_skb(sk, skb);
 			} else {
@@ -2211,10 +2230,6 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
 
 	*delta = 0;
 	while (1) {
-		/* If the msk recvbuf is full stop, don't drop */
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-			break;
-
 		prefetch(skb->next);
 		list_del(&skb->list);
 		*delta += skb->truesize;
@@ -2242,9 +2257,7 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 	DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
 			       mem_cgroup_from_sk(sk));
 
-	/* Don't spool the backlog if the rcvbuf is full.
*/
-	if (list_empty(&msk->backlog_list) ||
-	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+	if (list_empty(&msk->backlog_list))
 		return false;
 
 	INIT_LIST_HEAD(skbs);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 661600f8b573..ef878ceae577 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -413,6 +413,14 @@ static inline void msk_owned_by_me(const struct mptcp_sock *msk)
 #define mptcp_sk(ptr) container_of_const(ptr, struct mptcp_sock, sk.icsk_inet.sk)
 #endif
 
+/* Be careful vs sign extension. */
+static inline u64 mptcp_mem(const struct sock *sk)
+{
+	u64 mem = (unsigned int)sk_rmem_alloc_get(sk);
+
+	return mem + (unsigned int)READ_ONCE(mptcp_sk(sk)->backlog_len);
+}
+
 static inline int mptcp_win_from_space(const struct sock *sk, int space)
 {
 	return __tcp_win_from_space(mptcp_sk(sk)->scaling_ratio, space);
-- 
2.54.0

From nobody Mon May 25 18:04:42 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v7 mptcp-next 3/7] mptcp: enforce hard limit on backlog flushing
Date: Tue, 19 May 2026 19:01:37 +0200

Currently a wild producer could keep the backlog flushing operation
spinning for an unbounded time. Since the previous patch, the amount of
data present in the backlog is hard-limited. Move the backlog length
update to the end of the flush loop to prevent it from spinning forever.

Also, there is no need to splice the remaining skbs list back into the
backlog, as that list is always empty after each backlog processing
loop.
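To illustrate why deferring the length update bounds the loop, here is
a small single-threaded model, offered as a sketch under assumptions
rather than as the kernel logic. The names struct backlog,
backlog_add() and backlog_flush() are hypothetical. The "wild producer"
is simulated by a burst of enqueue attempts after each spooling round,
at the point where the real code would have dropped the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the backlog; field names are illustrative. */
struct backlog {
	uint32_t len;		/* accounted bytes, updated after the flush */
	uint32_t queued;	/* bytes currently sitting in the list */
	uint32_t limit;		/* admission threshold */
};

/* Producer side: refuse new data once the accounted length is over limit. */
static bool backlog_add(struct backlog *b, uint32_t size)
{
	if (b->len > b->limit)
		return false;

	b->len += size;
	b->queued += size;
	return true;
}

/* Consumer side: spool until the list is empty. After each round a
 * producer tries to enqueue `burst` packets of `pkt` bytes each.
 * Returns the number of spooling rounds performed.
 */
static unsigned int backlog_flush(struct backlog *b, unsigned int burst,
				  uint32_t pkt)
{
	unsigned int rounds = 0;
	uint32_t moved = 0;

	while (b->queued) {
		moved += b->queued;
		b->queued = 0;
		rounds++;

		for (unsigned int i = 0; i < burst; i++)
			backlog_add(b, pkt);
	}

	/* Single deferred update: while the loop runs, the accounted
	 * length only grows, so the producer is eventually refused.
	 */
	assert(moved <= b->len);
	b->len -= moved;
	return rounds;
}
```

In this model, had the length been decremented inside the loop, every
round would reopen room for the producer and the number of rounds would
depend only on how long the producer keeps sending.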
Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e53abd1f814f..0e41596b6397 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2228,7 +2228,6 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool moved = false;
 
-	*delta = 0;
 	while (1) {
 		prefetch(skb->next);
 		list_del(&skb->list);
@@ -2265,20 +2264,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 	return true;
 }
 
-static void mptcp_backlog_spooled(struct sock *sk, u32 moved,
-				  struct list_head *skbs)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
-	list_splice(skbs, &msk->backlog_list);
-}
-
 static bool mptcp_move_skbs(struct sock *sk)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct list_head skbs;
 	bool enqueued = false;
-	u32 moved;
+	u32 moved = 0;
 
 	mptcp_data_lock(sk);
 	while (mptcp_can_spool_backlog(sk, &skbs)) {
@@ -2286,8 +2277,8 @@ static bool mptcp_move_skbs(struct sock *sk)
 		enqueued |= __mptcp_move_skbs(sk, &skbs, &moved);
 
 		mptcp_data_lock(sk);
-		mptcp_backlog_spooled(sk, moved, &skbs);
 	}
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
 	mptcp_data_unlock(sk);
 	if (enqueued && mptcp_epollin_ready(sk))
 		sk->sk_data_ready(sk);
@@ -3672,12 +3663,12 @@ static void mptcp_release_cb(struct sock *sk)
 	__must_hold(&sk->sk_lock.slock)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 moved = 0;
 
 	for (;;) {
 		unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
 		struct list_head join_list, skbs;
 		bool spool_bl;
-		u32 moved;
 
 		spool_bl = mptcp_can_spool_backlog(sk, &skbs);
 		if (!flags && !spool_bl)
@@ -3710,9 +3701,9 @@ static void mptcp_release_cb(struct sock *sk)
 
 		cond_resched();
 		spin_lock_bh(&sk->sk_lock.slock);
-		if (spool_bl)
-
mptcp_backlog_spooled(sk, moved, &skbs);
 	}
+	if (moved)
+		WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
 
 	if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
 		__mptcp_clean_una_wakeup(sk);
-- 
2.54.0

From nobody Mon May 25 18:04:42 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v7 mptcp-next 4/7] mptcp: implemented OoO queue pruning
Date: Tue, 19 May 2026 19:01:38 +0200
Message-ID: <9c0a479fab5bf36d8d8b098bf5c65fac4352fc42.1779210016.git.pabeni@redhat.com>

Leverage the hybrid helpers to implement receive queue and OoO queue
collapsing at ingress time when reaching memory bounds.

If the msk is owned by user space when the skb arrives, perform the
pruning in the release_cb.

The prune check is additionally performed when the skb reaches the
msk-level queues.

Signed-off-by: Paolo Abeni
---
v6 -> v7:
 - fix u64 -> u32 truncation

v2 -> v3:
 - deal with unsynced TFO skb at prune time - only possible when
   pruning in mptcp_over_limit()

v1 -> v2:
 - collapse rcv queue, too
 - deal with MPC map, too
 - drop left-over sentence in the commit message

RFC -> v1:
 - use data_seq only when available
 - avoid ack_seq lockless access
 - drop limit on fallback
 - collapse rcvqueue, too
 - drop only when pruning is not possible and over rcvbuf * 2

Note:
 - sashiko can be confused about the fwd memory lifecycle (I can
   understand that :). Any excess amount of fwd allocated memory is
   always released by the next sk_mem_uncharge() - i.e. fwd memory is
   not tied to the current skb.
 - AFAICS KASAN handles bitmap variables in a sane way, and sashiko
   doesn't know about that
---
 net/mptcp/mib.c      |  2 ++
 net/mptcp/mib.h      |  2 ++
 net/mptcp/options.c  | 44 +++++++++++++++++++++++++++++++--------
 net/mptcp/protocol.c | 49 ++++++++++++++++++++++++++++++++++++++++++--
 net/mptcp/protocol.h |  1 +
 5 files changed, 88 insertions(+), 10 deletions(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index f23fda0c55a7..bdc863c3a952 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -85,6 +85,8 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK),
 	SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED),
 	SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
+	SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
+	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
 };
 
 /* mptcp_mib_alloc - allocate percpu mib counters
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index 812218b5ed2b..8ec314847fc3 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -88,6 +88,8 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_SIMULTCONNFALLBACK,	/* Simultaneous connect */
 	MPTCP_MIB_FALLBACKFAILED,	/* Can't fallback due to msk status */
 	MPTCP_MIB_WINPROBE,		/* MPTCP-level zero window probe */
+	MPTCP_MIB_OFO_PRUNED,		/* MPTCP-level OoO queue pruned */
+	MPTCP_MIB_RCVPRUNED,		/* Dropped due to memory constraints */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 35db19fd6b7a..e5d5a50a907f 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1159,9 +1159,12 @@ static bool add_addr_hmac_valid(struct mptcp_sock *msk,
 }
 
 static bool mptcp_over_limit(struct sock *sk, struct sock *ssk,
-			     const struct sk_buff *skb)
+			     const struct sk_buff *skb,
+			     const struct mptcp_options_received *mp_opt)
 {
-	u64 mem = mptcp_mem(sk);
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	u64 limit, mem = mptcp_mem(sk);
+	bool ret;
 
 	/* sk_rcvbuf is ensured to be >= 0 and <= MAX_INT.
*/
 	if (likely(mem <= READ_ONCE(sk->sk_rcvbuf)))
@@ -1173,10 +1176,31 @@ static bool mptcp_over_limit(struct sock *sk, struct sock *ssk,
 	    !after(TCP_SKB_CB(skb)->end_seq, tcp_sk(ssk)->rcv_nxt))
 		return false;
 
-	/* Dropped due to memory constraints, schedule an ack. */
-	inet_csk(ssk)->icsk_ack.pending |= ICSK_ACK_NOMEM | ICSK_ACK_NOW;
-	inet_csk_schedule_ack(ssk);
-	return true;
+	mptcp_data_lock(sk);
+	if (!sock_owned_by_user(sk)) {
+		/* When the data sequence is not (yet) available for the
+		 * incoming skb, allow pruning the whole OoO queue.
+		 */
+		u64 seq = (!mp_opt->use_map || mp_opt->mpc_map) ?
+			  msk->ack_seq : mp_opt->data_seq;
+
+		limit = sk->sk_rcvbuf;
+		mptcp_prune_ofo_queue(sk, true, seq);
+	} else {
+		/* Pruning will take place later in the RX path, allow
+		 * some extra slack.
+		 */
+		limit = ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1;
+	}
+	ret = mptcp_mem(sk) > limit;
+	mptcp_data_unlock(sk);
+
+	if (ret) {
+		/* Dropped due to memory constraints, schedule an ack. */
+		inet_csk(ssk)->icsk_ack.pending |= ICSK_ACK_NOMEM | ICSK_ACK_NOW;
+		inet_csk_schedule_ack(ssk);
+	}
+	return ret;
 }
 
 /* Return false when the caller must drop the packet, i.e. in case of error,
@@ -1207,7 +1231,11 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		__mptcp_data_acked(subflow->conn);
 		mptcp_data_unlock(subflow->conn);
 
-		if (mptcp_over_limit(subflow->conn, sk, skb))
+		/* Will use ack_seq as limit for OoO pruning; any value would do
+		 * as OoO queue must be empty.
+		 */
+		mp_opt.use_map = 0;
+		if (mptcp_over_limit(subflow->conn, sk, skb, &mp_opt))
 			return false;
 		return true;
 	}
@@ -1287,7 +1315,7 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 		return true;
 	}
 
-	if (mptcp_over_limit(subflow->conn, sk, skb))
+	if (mptcp_over_limit(subflow->conn, sk, skb, &mp_opt))
 		return false;
 
 	mpext = skb_ext_add(skb, SKB_EXT_MPTCP);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0e41596b6397..f207f001c97e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -373,6 +373,47 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
 	skb_dst_drop(skb);
 }
 
+/* "Inspired" from the TCP version */
+void mptcp_prune_ofo_queue(struct sock *sk, bool use_bl, u64 seq)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct rb_node *node, *prev;
+	bool pruned = false;
+	u64 mem;
+
+	if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
+		return;
+
+	node = &msk->ooo_last_skb->rbnode;
+
+	do {
+		struct sk_buff *skb = rb_to_skb(node);
+
+		/* Stop pruning if the incoming skb would land in OoO tail. */
+		if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
+			break;
+
+		pruned = true;
+		prev = rb_prev(node);
+		rb_erase(node, &msk->out_of_order_queue);
+		mptcp_drop(sk, skb);
+		msk->ooo_last_skb = rb_to_skb(prev);
+
+		/* On early pruning be more strict.
*/ + if (use_bl) + mem =3D mptcp_mem(sk); + else + mem =3D (unsigned int)atomic_read(&sk->sk_rmem_alloc); + if (mem < sk->sk_rcvbuf) + break; + + node =3D prev; + } while (node); + + if (pruned) + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_OFO_PRUNED); +} + static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb) { u64 copy_len =3D MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq; @@ -386,8 +427,12 @@ static bool __mptcp_move_skb(struct sock *sk, struct s= k_buff *skb) */ if (unlikely(sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) && !__mptcp_check_fallback(msk)) { - mptcp_drop(sk, skb); - return false; + mptcp_prune_ofo_queue(sk, false, MPTCP_SKB_CB(skb)->map_seq); + if (sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) { + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED); + mptcp_drop(sk, skb); + return false; + } } =20 if (MPTCP_SKB_CB(skb)->map_seq =3D=3D msk->ack_seq) { diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index ef878ceae577..816210ed2630 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -835,6 +835,7 @@ bool __mptcp_close(struct sock *sk, long timeout); void mptcp_cancel_work(struct sock *sk); void __mptcp_unaccepted_force_close(struct sock *sk); void mptcp_set_state(struct sock *sk, int state); +void mptcp_prune_ofo_queue(struct sock *sk, bool use_bl, u64 seq); =20 bool mptcp_addresses_equal(const struct mptcp_addr_info *a, const struct mptcp_addr_info *b, bool use_port); --=20 2.54.0 From nobody Mon May 25 18:04:42 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6908D367B7C for ; Tue, 19 May 2026 17:02:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210122; 
cv=none; b=rGXp7c4N43BI8EEWBLItgOqO01S8/XEsS1xYA0xa3ngpF/Y9PCr/D3TtvVH6u1saPeb8Piaq9m63oVFw+OdTSgpd3H4f10I1jkpr/ftFbHcjjtQqcrDyNyki2YGSBFIRK04lfe5oOSt2kuXAmnAzx9hBDqgStvycvgtLTxquM+4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210122; c=relaxed/simple; bh=FWqTxJ0XLghGmcD1hgWHaD1IYJEXr5gCWeIYa6D5Oas=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=TBZUSxmGJoawrYdG4ahbRIOoojoePcXXA7+6L8UHRpzw1dXkeaqRiBDnIPILhoj6SNd35l5uHZGq60wdrjDgODw9+xHk2a7gWNzsEAg6B47AqAYL1Ty1BllV7L4TY2WlcRFCeCV/TRtb3aD8QNsRwdxhP50kJI029yNUriJRBzU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=DR4s+u2+; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DR4s+u2+" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1779210120; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=gTs/fa+YsmB7b1xyoh0ERPqAN/BncFlXOmD857oeSK4=; b=DR4s+u2+Pn0RDrVvoD9Kpko5GedvuSiFjJDAQcI5WIEWwEepQDDqu9T+Gfg9KRmqN3NDWx wGokSSjB4UuFUBa1D6lWF0qx3ZBPlY+RaprjVtaASuy9v4a9hgUSc/mskSrAwUAXN7ujVz HZvgTKZPYqh7TpHQRrbEJytVVUz6egk= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, 
cipher=TLS_AES_256_GCM_SHA384) id us-mta-607-ei3N5cDfNtmVr5ms4ts5Mg-1; Tue, 19 May 2026 13:01:57 -0400 X-MC-Unique: ei3N5cDfNtmVr5ms4ts5Mg-1 X-Mimecast-MFC-AGG-ID: ei3N5cDfNtmVr5ms4ts5Mg_1779210116 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id B524819560AE; Tue, 19 May 2026 17:01:56 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.47]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 69187180056E; Tue, 19 May 2026 17:01:55 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: Geliang Tang , gang.yan@linux.dev Subject: [PATCH v7 mptcp-next 5/7] mptcp: track prune recovery status Date: Tue, 19 May 2026 19:01:39 +0200 Message-ID: <083f266c0b67226ddad01571a422346d9c0d5852.1779210016.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 7VGg3sfxQnra072oBZFNt7VSFDQ5AY7JPeZcIz9Pz8s_1779210116 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" After dropping any data already acked at the TCP level, MPTCP must avoid inducing TCP-level retransmissions until the pruned data has been successfully acked at the MPTCP level. Otherwise the subflows could keep retransmitting skbs carrying OoO MPTCP data, preventing reinjections and completely stalling the data transfer. 
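The recovery rule can be sketched in user space as follows. This is only a model, not kernel code: `struct msk_model` and the `model_*()` functions are hypothetical names standing in for the `ack_seq`/`pruned_seq` fields and the helpers touched by this patch, and the memory figures are passed in by the caller rather than read from the socket. The idea is to remember the highest MPTCP-level sequence dropped so far and to suppress memory-driven drops while `ack_seq` is still before it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe 64-bit sequence comparisons, modeled on the kernel's
 * before64()/after64() helpers.
 */
static bool before64(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

static bool after64(uint64_t a, uint64_t b)
{
	return before64(b, a);
}

/* Hypothetical stand-in for the two mptcp_sock fields involved. */
struct msk_model {
	uint64_t ack_seq;	/* next in-order MPTCP-level sequence */
	uint64_t pruned_seq;	/* highest end sequence dropped so far */
};

/* Record a drop: only ever move pruned_seq forward. */
static void model_pruned(struct msk_model *msk, uint64_t end_seq)
{
	if (after64(end_seq, msk->pruned_seq))
		msk->pruned_seq = end_seq;
}

/* In-order data advanced ack_seq; keep pruned_seq from lagging behind. */
static void model_ack_advance(struct msk_model *msk, uint64_t new_ack)
{
	msk->ack_seq = new_ack;
	model_pruned(msk, msk->ack_seq);
}

/* Return true when the incoming skb should be dropped for memory reasons.
 * While pruned data is still missing (ack_seq before pruned_seq), never
 * drop, so that no TCP-level retransmission is induced.
 */
static bool model_over_limit(const struct msk_model *msk, uint64_t mem,
			     uint64_t limit)
{
	if (before64(msk->ack_seq, msk->pruned_seq))
		return false;
	return mem > limit;
}
```

With `ack_seq == pruned_seq` the memory check behaves as before; once something past `ack_seq` has been pruned, drops are suppressed until `ack_seq` catches up.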
Explicitly keep track of the highest pruned MPTCP-level seq number and stop dropping at TCP level until such sequence has been acked. Signed-off-by: Paolo Abeni --- v6 -> v7: - add missing READ_ONCE() on ack_seq (sashiko was right) - introduce mptcp_pruned() helper, use it in __mptcp_add_backlog(), too - fix u64 -> u32 truncation --- net/mptcp/options.c | 7 +++++++ net/mptcp/protocol.c | 7 +++++++ net/mptcp/protocol.h | 9 +++++++++ net/mptcp/subflow.c | 1 + 4 files changed, 24 insertions(+) diff --git a/net/mptcp/options.c b/net/mptcp/options.c index e5d5a50a907f..f149b74cc15a 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -1193,6 +1193,13 @@ static bool mptcp_over_limit(struct sock *sk, struct= sock *ssk, limit =3D ((u64)READ_ONCE(sk->sk_rcvbuf)) << 1; } ret =3D mptcp_mem(sk) > limit; + + /* After pruning any packets ensure that MPTCP-driven drops do not + * cause TCP-level retransmission. + */ + if (before64(READ_ONCE(msk->ack_seq), READ_ONCE(msk->pruned_seq))) + ret =3D false; + mptcp_data_unlock(sk); =20 if (ret) { diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f207f001c97e..8da4f43543e3 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -396,6 +396,7 @@ void mptcp_prune_ofo_queue(struct sock *sk, bool use_bl= , u64 seq) pruned =3D true; prev =3D rb_prev(node); rb_erase(node, &msk->out_of_order_queue); + mptcp_pruned(msk, MPTCP_SKB_CB(skb)->end_seq); mptcp_drop(sk, skb); msk->ooo_last_skb =3D rb_to_skb(prev); =20 @@ -429,6 +430,8 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk= _buff *skb) !__mptcp_check_fallback(msk)) { mptcp_prune_ofo_queue(sk, false, MPTCP_SKB_CB(skb)->map_seq); if (sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) { + mptcp_pruned(msk, MPTCP_SKB_CB(skb)->end_seq); + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED); mptcp_drop(sk, skb); return false; @@ -742,6 +745,7 @@ static void __mptcp_add_backlog(struct sock *sk, limit +=3D (limit >> 1) + 64 * 1024; limit =3D min_t(u64, limit, 
UINT_MAX); if (msk->backlog_len > limit && !__mptcp_check_fallback(msk)) { + mptcp_pruned(msk, MPTCP_SKB_CB(skb)->end_seq); kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_RCVBUFF); return; } @@ -890,6 +894,8 @@ static bool __mptcp_ofo_queue(struct mptcp_sock *msk) WRITE_ONCE(msk->ack_seq, end_seq); moved =3D true; } + + mptcp_pruned(msk, msk->ack_seq); return moved; } =20 @@ -3539,6 +3545,7 @@ static int mptcp_disconnect(struct sock *sk, int flag= s) /* for fallback's sake */ WRITE_ONCE(msk->ack_seq, 0); atomic64_set(&msk->rcv_wnd_sent, 0); + WRITE_ONCE(msk->pruned_seq, 0); =20 WRITE_ONCE(sk->sk_shutdown, 0); sk_error_report(sk); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 816210ed2630..2590aceb7c98 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -303,6 +303,9 @@ struct mptcp_sock { u64 bytes_acked; u64 snd_una; u64 wnd_end; + u64 pruned_seq; /* If strictly above ack_seq, + * the highest seq pruned. + */ u32 last_data_sent; u32 last_data_recv; u32 last_ack_recv; @@ -837,6 +840,12 @@ void __mptcp_unaccepted_force_close(struct sock *sk); void mptcp_set_state(struct sock *sk, int state); void mptcp_prune_ofo_queue(struct sock *sk, bool use_bl, u64 seq); =20 +static inline void mptcp_pruned(struct mptcp_sock *msk, u64 seq) +{ + if (after64(seq, msk->pruned_seq)) + WRITE_ONCE(msk->pruned_seq, seq); +} + bool mptcp_addresses_equal(const struct mptcp_addr_info *a, const struct mptcp_addr_info *b, bool use_port); void mptcp_local_address(const struct sock_common *skc, diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index d562e149606f..cc75d914c1b5 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -494,6 +494,7 @@ static void subflow_set_remote_key(struct mptcp_sock *m= sk, =20 WRITE_ONCE(msk->remote_key, subflow->remote_key); WRITE_ONCE(msk->ack_seq, subflow->iasn); + WRITE_ONCE(msk->pruned_seq, subflow->iasn); WRITE_ONCE(msk->can_ack, true); atomic64_set(&msk->rcv_wnd_sent, subflow->iasn); } --=20 2.54.0 From nobody Mon 
May 25 18:04:42 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CF463367B8E for ; Tue, 19 May 2026 17:02:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210126; cv=none; b=VhHOhRd7N9ymv3UMdHfChj3myQBfQ+FcxWOGnWth6t+o77jXFEJHRU2BCrpaoB08NEEXP0DV7tZw2+Il5M1tkl0yOgg+BR75vFCUI8E1g9oJoImb5yEjnx+b6fQ/cHyKn92VFIvfvvjaWmhoOHdy60ih+YA7ngbdz2hFXFEb2yQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210126; c=relaxed/simple; bh=2Bvksey8OhxCKHqa0OS6OfCtYmOzxXKE422bdoauIWk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=aLL9QFGn2B2hBiWN0nGDmQ2lsC5xcmDjekCTo8plj9ImF70Nrfz4NHMYvJcjoaaIPO2UHe/V1oHagsCHPdNABfmwiS7fhxBG7mlrMgD+YTb9T5l6wRt/FEGiNz/LD3dkV1InYkuRWoBY0IQStJc7F8iOPgH+hO4NM2qZd+0mVVM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=eJt7E8Fn; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eJt7E8Fn" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1779210124; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4k4pG4LBah+XUZBWy12ZdyphauYr4mCKzBG72l6FZsc=; b=eJt7E8FnxH42NA8wsy9uVKCGMmt4x9EdQyX8vEmd6t9veqAvrDBzaM02ATkc8kUzQrxFZJ utu5i0m/cf5Syb2LGe3j5yHIHDtWR4xHMtbp/lzkZIjmeJAAM9LVvi4WwtEKH0TzTzmJ43 3JfTJwlT5LpoYV4YgY3qqCJz8xkiM6A= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-661-ZJ6l275wP6-iiG9EOIMpiA-1; Tue, 19 May 2026 13:01:59 -0400 X-MC-Unique: ZJ6l275wP6-iiG9EOIMpiA-1 X-Mimecast-MFC-AGG-ID: ZJ6l275wP6-iiG9EOIMpiA_1779210118 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 7B16119560BA; Tue, 19 May 2026 17:01:58 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.47]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 3546B180056E; Tue, 19 May 2026 17:01:56 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: Geliang Tang , gang.yan@linux.dev Subject: [PATCH v7 mptcp-next 6/7] mptcp: move the retrans loop to a separate helper Date: Tue, 19 May 2026 19:01:40 +0200 Message-ID: <8e8512fa1af1914720f678c9d9fe67f7e3b46a21.1779210016.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 5tv6VCVzdMgYh3HAzT5tTpnODa4SaYth1ASIwn50fY8_1779210118 X-Mimecast-Originator: redhat.com 
Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" This is a cleanup in order to make the next patch simpler. No functional change intended. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 74 +++++++++++++++++++++++++------------------- 1 file changed, 43 insertions(+), 31 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 8da4f43543e3..a30461f86aeb 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2833,41 +2833,14 @@ static void mptcp_check_fastclose(struct mptcp_sock= *msk) sk_error_report(sk); } =20 -static void __mptcp_retrans(struct sock *sk) +/* Retransmit the specified data fragment on all the selected subflows. */ +static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *d= frag) { struct mptcp_sendmsg_info info =3D { .data_lock_held =3D true, }; struct mptcp_sock *msk =3D mptcp_sk(sk); struct mptcp_subflow_context *subflow; - struct mptcp_data_frag *dfrag; struct sock *ssk; - int ret, err; - u16 len =3D 0; - - mptcp_clean_una_wakeup(sk); - - /* first check ssk: need to kick "stale" logic */ - err =3D mptcp_sched_get_retrans(msk); - dfrag =3D mptcp_rtx_head(sk); - if (!dfrag) { - if (mptcp_data_fin_enabled(msk)) { - struct inet_connection_sock *icsk =3D inet_csk(sk); - - WRITE_ONCE(icsk->icsk_retransmits, - icsk->icsk_retransmits + 1); - mptcp_set_datafin_timeout(sk); - mptcp_send_ack(msk); - - goto reset_timer; - } - - if (!mptcp_send_head(sk)) - goto clear_scheduled; - - goto reset_timer; - } - - if (err) - goto reset_timer; + int ret, len =3D 0; =20 mptcp_for_each_subflow(msk, subflow) { if (READ_ONCE(subflow->scheduled)) { @@ -2895,7 +2868,7 @@ static void __mptcp_retrans(struct sock *sk) !msk->allow_subflows) { spin_unlock_bh(&msk->fallback_lock); release_sock(ssk); - goto clear_scheduled; + return -1; } =20 while (info.sent < info.limit) { @@ -2918,6 +2891,45 @@ static void __mptcp_retrans(struct sock *sk) release_sock(ssk); } } + return 
len; +} + +static void __mptcp_retrans(struct sock *sk) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + struct mptcp_subflow_context *subflow; + struct mptcp_data_frag *dfrag; + int err, len; + + mptcp_clean_una_wakeup(sk); + + /* first check ssk: need to kick "stale" logic */ + err =3D mptcp_sched_get_retrans(msk); + dfrag =3D mptcp_rtx_head(sk); + if (!dfrag) { + if (mptcp_data_fin_enabled(msk)) { + struct inet_connection_sock *icsk =3D inet_csk(sk); + + WRITE_ONCE(icsk->icsk_retransmits, + icsk->icsk_retransmits + 1); + mptcp_set_datafin_timeout(sk); + mptcp_send_ack(msk); + + goto reset_timer; + } + + if (!mptcp_send_head(sk)) + goto clear_scheduled; + + goto reset_timer; + } + + if (err) + goto reset_timer; + + len =3D __mptcp_push_retrans(sk, dfrag); + if (len < 0) + goto clear_scheduled; =20 msk->bytes_retrans +=3D len; dfrag->already_sent =3D max(dfrag->already_sent, len); --=20 2.54.0 From nobody Mon May 25 18:04:42 2026 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 73870367B80 for ; Tue, 19 May 2026 17:02:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210124; cv=none; b=q0NiCB/5+wsXTswyBTqadNx1/U7yT76MlQZSNKXX+Er/Y2oRu9qQgRpcn/cge2/r3fpTOSbQYymptmNFmb0BbB9rM0eaGTe1fr1t6zpmSp1KOddr/LswMSH3/PDBYX3PHxaWvsz8bEtL55FSTcmQYFvV1DMg4tVXYPkVVUemdgg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779210124; c=relaxed/simple; bh=q9lZWd9B38KaeoKawm7Q/8JLuag3PuDKNJ7Ixtg9TBQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; 
b=r9ZrUZ/9wHPZi3Ty6nGTGMlih0pdW1YNkuG2zw5an+Fh2gphB/Xmjy5Y2un6Tg70HNhv6rr/Ca/EH7eAHuOG4fQPr67lzuYoHb2Y5UnO7OHTqzmqxdbbom7+xfdlHmEeQfOGfukoS/ZBiOjWqv6q2ROLon7gOBPuqwWmzHU95Fo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=aTfp5kir; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aTfp5kir" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1779210122; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=IvcODoihm801GYFULYkypHoUGjdVcfTjsQDR+a6bIHI=; b=aTfp5kirbv7cj7jRpNF6g3Njy0MF9TpClAx9J0TeU+wQb2qlqaG84grebFn7OZ/qdPCjpA ZZZsso2YDGI/2yVMijsbL6+aYmRoJd4pyi8VnTFUOWKPmBbdk+js6eRG17A2QstTw6NinP RbgdyZ+V9FYZIdJc3JQ9nMB7GDeJhDY= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-605-ilOdeHmHN7OqiyMhGBplRQ-1; Tue, 19 May 2026 13:02:01 -0400 X-MC-Unique: ilOdeHmHN7OqiyMhGBplRQ-1 X-Mimecast-MFC-AGG-ID: ilOdeHmHN7OqiyMhGBplRQ_1779210120 Received: from mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.93]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 
bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3D0EE18002D6; Tue, 19 May 2026 17:02:00 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.33.47]) by mx-prod-int-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id EF9D218004A3; Tue, 19 May 2026 17:01:58 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Cc: Geliang Tang , gang.yan@linux.dev Subject: [PATCH v7 mptcp-next 7/7] mptcp: let the retrans scheduler do its job. Date: Tue, 19 May 2026 19:01:41 +0200 Message-ID: <81d5f314af320ed93d594948448ea7ffe2de9c93.1779210016.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.30.177.93 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 9-xkkTwHKfugezVX09k-dO8i6vfOgU0dNhoE6q64JFA_1779210120 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" Currently the MPTCP core enforces that, when the MPTCP-level retrans timer fires, at most a single dfrag is retransmitted. In some corner cases it may be necessary to retransmit multiple dfrags, and the MPTCP socket then has to wait for multiple retrans timeouts to accomplish that. Remove this constraint, allowing multiple dfrags to be transmitted per retrans period, as long as the scheduler keeps selecting subflows for retransmissions and pending data is available in the rtx queue. The default scheduler will transmit a dfrag per available subflow. 
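The intended behavior can be illustrated with a simplified user-space model. Everything below is an assumption made for illustration: `struct frag_model` and `model_retrans()` are hypothetical, the scheduler is reduced to a count of successful subflow selections (`picks`) with a fixed byte budget each (`per_pick`), and plain integer comparisons replace the wrap-safe sequence helpers. Locking, acks arriving mid-loop and fallback handling are all left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a data fragment in the rtx queue. */
struct frag_model {
	uint64_t data_seq;	/* first MPTCP-level sequence of the fragment */
	uint16_t data_len;	/* total bytes in the fragment */
	uint16_t already_sent;	/* bytes already sent at least once */
};

/* Walk the rtx queue starting at snd_una and return the number of bytes
 * retransmitted in one timer period.
 *
 * Precondition (assumed): snd_una falls inside q[0].
 * With picks == 1 this degenerates to "at most one dfrag per timeout".
 */
static uint64_t model_retrans(const struct frag_model *q, size_t n,
			      uint64_t snd_una, unsigned int picks,
			      uint16_t per_pick)
{
	uint64_t retrans_seq = snd_una;
	uint64_t total = 0;
	size_t i = 0;

	for (; i < n && picks > 0; picks--) {
		const struct frag_model *f = &q[i];
		uint64_t limit = f->data_seq + f->already_sent;
		uint64_t len;

		/* Only bytes already sent once can be retransmitted. */
		if (!f->already_sent || retrans_seq >= limit)
			break;

		len = limit - retrans_seq;
		if (len > per_pick)
			len = per_pick;

		retrans_seq += len;
		total += len;

		/* Attempt the next fragment only if the current one is
		 * completely retransmitted.
		 */
		if (retrans_seq < f->data_seq + f->data_len)
			break;
		i++;
	}
	return total;
}
```

For a queue of two fully sent 100-byte fragments followed by a partially sent one, three selections cover all sent bytes in a single period, while one selection stops after the first fragment.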
Signed-off-by: Paolo Abeni --- v4 -> v5: - fixed already_sent update v3 -> v4: - avoid quadratic behavior, fix retrans_seq update - fix rtx timer re-schedule miss v2 -> v3: - fix infinite loop issue (should address tls tests failures) v1 -> v2: - fix retrans sequence update (sashiko) Note: - sashiko see issues when dfrag =3D mptcp_rtx_head(sk) !=3D NULL and dfrag->already_sent =3D=3D 0. That condition should not possible: if mptcp_rtx_head() is not NULL there should be some data already sent. --- net/mptcp/protocol.c | 105 +++++++++++++++++++++++++++++++------------ 1 file changed, 77 insertions(+), 28 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index a30461f86aeb..4a219fc0fa89 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1206,13 +1206,6 @@ static void __mptcp_clean_una_wakeup(struct sock *sk) mptcp_write_space(sk); } =20 -static void mptcp_clean_una_wakeup(struct sock *sk) -{ - mptcp_data_lock(sk); - __mptcp_clean_una_wakeup(sk); - mptcp_data_unlock(sk); -} - static void mptcp_enter_memory_pressure(struct sock *sk) { struct mptcp_subflow_context *subflow; @@ -2833,8 +2826,12 @@ static void mptcp_check_fastclose(struct mptcp_sock = *msk) sk_error_report(sk); } =20 -/* Retransmit the specified data fragment on all the selected subflows. 
*/ -static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *d= frag) +/* + * Retransmit the specified data fragment on all the selected subflows, + * starting from the specified sequence + */ +static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *d= frag, + u64 sent_seq) { struct mptcp_sendmsg_info info =3D { .data_lock_held =3D true, }; struct mptcp_sock *msk =3D mptcp_sk(sk); @@ -2844,6 +2841,7 @@ static int __mptcp_push_retrans(struct sock *sk, stru= ct mptcp_data_frag *dfrag) =20 mptcp_for_each_subflow(msk, subflow) { if (READ_ONCE(subflow->scheduled)) { + u16 offset =3D sent_seq - dfrag->data_seq; u16 copied =3D 0; =20 mptcp_subflow_set_scheduled(subflow, false); @@ -2853,9 +2851,12 @@ static int __mptcp_push_retrans(struct sock *sk, str= uct mptcp_data_frag *dfrag) lock_sock(ssk); =20 /* limit retransmission to the bytes already sent on some subflows */ - info.sent =3D 0; + info.sent =3D offset; info.limit =3D READ_ONCE(msk->csum_enabled) ? dfrag->data_len : dfrag->already_sent; + DEBUG_NET_WARN_ON_ONCE(!before64(sent_seq, + dfrag->data_seq + + info.limit)); =20 /* * make the whole retrans decision, xmit, disallow @@ -2899,41 +2900,89 @@ static void __mptcp_retrans(struct sock *sk) struct mptcp_sock *msk =3D mptcp_sk(sk); struct mptcp_subflow_context *subflow; struct mptcp_data_frag *dfrag; + bool retransmitted =3D false; + u64 retrans_seq; int err, len; =20 - mptcp_clean_una_wakeup(sk); - - /* first check ssk: need to kick "stale" logic */ - err =3D mptcp_sched_get_retrans(msk); + mptcp_data_lock(sk); + __mptcp_clean_una_wakeup(sk); + retrans_seq =3D msk->snd_una; dfrag =3D mptcp_rtx_head(sk); + mptcp_data_unlock(sk); + if (!dfrag) + goto check_data_fin; + + for (;;) { + bool already_retrans; + + /* The scheduler may clean the RTX queue. */ + get_page(dfrag->page); + + /* The default scheduler will kick "stale" logic. 
*/ + err =3D mptcp_sched_get_retrans(msk); + if (err) { + put_page(dfrag->page); + break; + } + + /* Incoming acks can have moved retrans sequence after + * the current dfrag, if so try to start again from RTX head. + */ + mptcp_data_lock(sk); + already_retrans =3D !dfrag->already_sent || + !before64(msk->snd_una, dfrag->data_seq + + dfrag->already_sent); + put_page(dfrag->page); + if (already_retrans) { + __mptcp_clean_una_wakeup(sk); + retrans_seq =3D msk->snd_una; + dfrag =3D mptcp_rtx_head(sk); + } + mptcp_data_unlock(sk); + if (!dfrag) + break; + + len =3D __mptcp_push_retrans(sk, dfrag, retrans_seq); + if (len < 0) + goto clear_scheduled; + + retransmitted =3D true; + retrans_seq +=3D len; + msk->bytes_retrans +=3D len; + dfrag->already_sent =3D max_t(u16, dfrag->already_sent, + retrans_seq - dfrag->data_seq); + + /* Attempt the next fragment only if the current one is + * completely retransmitted. + */ + if (before64(retrans_seq, dfrag->data_seq + dfrag->data_len)) + break; + + dfrag =3D list_is_last(&dfrag->list, &msk->rtx_queue) ? + NULL : list_next_entry(dfrag, list); + if (!dfrag || !dfrag->already_sent) + break; + } + + /* Data fin retransmission needed only if no data retransmission took + * place, and RTX queue is empty. + */ +check_data_fin: if (!dfrag) { - if (mptcp_data_fin_enabled(msk)) { + if (!retransmitted && mptcp_data_fin_enabled(msk)) { struct inet_connection_sock *icsk =3D inet_csk(sk); =20 WRITE_ONCE(icsk->icsk_retransmits, icsk->icsk_retransmits + 1); mptcp_set_datafin_timeout(sk); mptcp_send_ack(msk); - goto reset_timer; } =20 if (!mptcp_send_head(sk)) goto clear_scheduled; - - goto reset_timer; } =20 - if (err) - goto reset_timer; - - len =3D __mptcp_push_retrans(sk, dfrag); - if (len < 0) - goto clear_scheduled; - - msk->bytes_retrans +=3D len; - dfrag->already_sent =3D max(dfrag->already_sent, len); - reset_timer: mptcp_check_and_set_pending(sk); =20 --=20 2.54.0