From nobody Mon May 25 04:16:59 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 1/9] mptcp: fix missing wakeups in edge scenarios
Date: Fri, 22 May 2026 23:43:42 +0200
Content-Type: text/plain;
 charset="utf-8"; x-default="true"

mptcp_recvmsg() can fill the MPTCP socket receive queue via
mptcp_move_skbs(), but it currently does not try to wake up any
listener, because the same process is going to check the receive queue
soon.

When multiple threads are reading from the same fd, the above can cause
a stall. Add the missing wakeup.

Fixes: 6771bfd9ee24 ("mptcp: update mptcp ack sequence from work queue")
Signed-off-by: Paolo Abeni
---
v6 -> v7
  - use mptcp_epollin_ready()

Notes:
  - sashiko may raise concerns about thundering herd problems. If an
    application uses multiple threads to read on the same socket, that
    could/will happen. Applications should not use multiple threads to
    read from the same socket if they want good performance.
---
 net/mptcp/protocol.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index ce8372fb3c6a..f14572fb1975 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2276,6 +2276,8 @@ static bool mptcp_move_skbs(struct sock *sk)
 		mptcp_backlog_spooled(sk, moved, &skbs);
 	}
 	mptcp_data_unlock(sk);
+	if (enqueued && mptcp_epollin_ready(sk))
+		sk->sk_data_ready(sk);
 	return enqueued;
 }
 
-- 
2.54.0

From nobody Mon May 25 04:16:59 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 2/9] mptcp: fix retransmission loop when csum is enabled
Date: Fri, 22 May 2026 23:43:43 +0200
Content-Type: text/plain; charset="utf-8"; x-default="true"

Sashiko noted that retransmission with csum enabled can actually
transmit new data, but the relevant code currently does not update
snd_nxt accordingly. This may cause incoming acks to be dropped and an
endless retransmission loop.

Address the issue by incrementing snd_nxt as needed.
Fixes: 4e14867d5e91 ("mptcp: tune re-injections for csum enabled mode")
Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f14572fb1975..4f2b2868031a 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2867,6 +2867,10 @@ static void __mptcp_retrans(struct sock *sk)
 	msk->bytes_retrans += len;
 	dfrag->already_sent = max(dfrag->already_sent, len);
 
+	/* With csum enabled retransmission can send new data. */
+	if (after64(dfrag->already_sent + dfrag->data_seq, msk->snd_nxt))
+		msk->snd_nxt = dfrag->already_sent + dfrag->data_seq;
+
 reset_timer:
 	mptcp_check_and_set_pending(sk);
 
-- 
2.54.0

From nobody Mon May 25 04:16:59 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 3/9] mptcp: close TOCTOU race while computing rcv_wnd
Date: Fri, 22 May 2026 23:43:44 +0200
Content-Type: text/plain; charset="utf-8"; x-default="true"

The MPTCP output path locklessly reads the MPTCP-level ack_seq multiple
times. As a result, the data_ack in the DSS option and the announced
rcv wnd of the same packet can be computed from different values.

Refactor the code to avoid inconsistencies which may confuse the peer.

Also ensure that the MPTCP-level rcv wnd is updated only when the
egress packet actually contains a DSS ack.

Fixes: fa3fe2b15031 ("mptcp: track window announced to peer")
Signed-off-by: Paolo Abeni
---
 net/mptcp/options.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4cc583fdc7a9..4d72f286a485 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -570,7 +570,6 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 	struct mptcp_ext *mpext;
 	unsigned int ack_size;
 	bool ret = false;
-	u64 ack_seq;
 
 	opts->csum_reqd = READ_ONCE(msk->csum_enabled);
 	mpext = skb ? mptcp_get_ext(skb) : NULL;
@@ -601,14 +600,11 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 		return ret;
 	}
 
-	ack_seq = READ_ONCE(msk->ack_seq);
 	if (READ_ONCE(msk->use_64bit_ack)) {
 		ack_size = TCPOLEN_MPTCP_DSS_ACK64;
-		opts->ext_copy.data_ack = ack_seq;
 		opts->ext_copy.ack64 = 1;
 	} else {
 		ack_size = TCPOLEN_MPTCP_DSS_ACK32;
-		opts->ext_copy.data_ack32 = (uint32_t)ack_seq;
 		opts->ext_copy.ack64 = 0;
 	}
 	opts->ext_copy.use_ack = 1;
@@ -1298,19 +1294,14 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	return true;
 }
 
-static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
+static u64 mptcp_set_rwin(struct mptcp_sock *msk, struct tcp_sock *tp,
+			  struct tcphdr *th, u64 ack_seq)
 {
 	const struct sock *ssk = (const struct sock *)tp;
-	struct mptcp_subflow_context *subflow;
-	u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
-	struct mptcp_sock *msk;
+	u64 rcv_wnd_old, rcv_wnd_new;
 	u32 new_win;
 	u64 win;
 
-	subflow = mptcp_subflow_ctx(ssk);
-	msk = mptcp_sk(subflow->conn);
-
-	ack_seq = READ_ONCE(msk->ack_seq);
 	rcv_wnd_new = ack_seq + tp->rcv_wnd;
 
 	rcv_wnd_old = atomic64_read(&msk->rcv_wnd_sent);
@@ -1363,7 +1354,7 @@ static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 
 update_wspace:
 	WRITE_ONCE(msk->old_wspace, tp->rcv_wnd);
-	subflow->rcv_wnd_sent = rcv_wnd_new;
+	return rcv_wnd_new;
 }
 
 static void mptcp_track_rwin(struct tcp_sock *tp)
@@ -1475,13 +1466,25 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 		*ptr++ = mptcp_option(MPTCPOPT_DSS, len, 0, flags);
 
 		if (mpext->use_ack) {
+			struct mptcp_sock *msk;
+			u64 ack_seq;
+
+			/* DSS option is set only by mptcp_established_option,
+			 * the caller is __tcp_transmit_skb() and ssk is always
+			 * not NULL.
+			 */
+			subflow = mptcp_subflow_ctx(ssk);
+			msk = mptcp_sk(subflow->conn);
+			ack_seq = READ_ONCE(msk->ack_seq);
 			if (mpext->ack64) {
-				put_unaligned_be64(mpext->data_ack, ptr);
+				put_unaligned_be64(ack_seq, ptr);
 				ptr += 2;
 			} else {
-				put_unaligned_be32(mpext->data_ack32, ptr);
+				put_unaligned_be32(ack_seq, ptr);
 				ptr += 1;
 			}
+			subflow->rcv_wnd_sent = mptcp_set_rwin(msk, tp, th,
+							       ack_seq);
 		}
 
 		if (mpext->use_map) {
@@ -1709,9 +1712,6 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			i += 4;
 		}
 	}
-
-	if (tp)
-		mptcp_set_rwin(tp, th);
 }
 
 __be32 mptcp_get_reset_option(const struct sk_buff *skb)
-- 
2.54.0

From nobody Mon May 25 04:16:59 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 4/9] mptcp: allow subflow rcv wnd to shrink
Date: Fri, 22 May 2026 23:43:45 +0200
Message-ID: <661603fa10737a624bba854fb2515c7abb129a80.1779485511.git.pabeni@redhat.com>
Content-Type: text/plain; charset="utf-8"; x-default="true"

In an MPTCP connection, the `window` field in the TCP header refers to
the MPTCP-level rcv_nxt, and its right edge should not move backward.
This constraint is enforced at DSS option generation time.

At the same time, the TCP stack independently ensures that the right
edge of the TCP-level rcv wnd does not move backward. That in turn
artificially inflates the MPTCP rcv window when incoming data is acked
at the TCP level but is out-of-order (OoO) in the MPTCP sequence space
(or lands in the backlog).

As a consequence, incoming traffic can exceed the receiver's rcvbuf
size even when the sender is not misbehaving.

Prevent this scenario by forcibly allowing the TCP subflow to shrink
the TCP-level rcv wnd, regardless of the current netns setting.
Fixes: f3589be0c420 ("mptcp: never shrink offered window")
Signed-off-by: Paolo Abeni
---
 net/mptcp/options.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 4d72f286a485..97ea4aa37b33 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -566,6 +566,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned int dss_size = 0;
 	struct mptcp_ext *mpext;
 	unsigned int ack_size;
@@ -614,6 +615,12 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 	if (dss_size == 0)
 		ack_size += TCPOLEN_MPTCP_DSS_BASE;
 
+	/* The caller is __tcp_transmit_skb(), and will compute the new rcv
+	 * wnd soon: ensure that the window can shrink.
+	 */
+	if (skb)
+		tp->rcv_wnd = tp->rcv_nxt - tp->rcv_wup;
+
 	dss_size += ack_size;
 
 	*size = ALIGN(dss_size, 4);
-- 
2.54.0

From nobody Mon May 25 04:16:59 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 5/9] mptcp: explicitly drop over memory limits
Date: Fri, 22 May 2026 23:43:46 +0200
Message-ID: <999dc99ccc3d98f3a668fe784114e3f5a814ec12.1779485511.git.pabeni@redhat.com>
Content-Type: text/plain; charset="utf-8"; x-default="true"

Currently the rcvbuf constraint is enforced when moving skbs into the
msk receive or OoO queue: when over the limit, incoming skbs are kept
in the subflow queue.

Under significant memory pressure this can cause permanent data
transfer stalls, as the skb needed to make forward progress can be
stuck in a subflow queue.

When over the memory limits, drop the incoming skb, relying on
MPTCP-level retransmissions.

Note that fallback sockets must enforce the limit before the skb
reaches the subflow-level queue, as dropping an in-sequence, already
acked skb would break the stream.

This is not a complete fix for the stall issue, as the drop strategy
needs refinements that will come in the next patches.
Signed-off-by: Paolo Abeni
---
v7 -> v8:
  - removed the non-fallback check in mptcp_incoming_options(): that is a
    tput optimization (avoid rejections) for a slowpath case (sender is
    misbehaving) and needs too much additional complexity.
  - moved the MIB definitions here from later patches
v6 -> v7:
  - fix sign extension issues
v4 -> v5:
  - fix possible u32 overflow in mptcp_over_limit
v3 -> v4:
  - schedule TCP ack on drop
  - enforce limits in __mptcp_move_skb() and __mptcp_add_backlog(), too,
    but only if not fallback.
v1 -> v2:
  - deal correctly with tcp fin and zero win probe
RFC -> v1:
  - limit vs actual buffer size
  - use CB info instead of skb->len

Note that:
  - this needs the follow-up patches to really fix the stall
  - sashiko may assume that ZWPs carry unacked data that may be silently
    dropped. AFAIK that is false.
  - sashiko apparently can't grasp that MPTCP subflows never hit the TCP
    rx fast path, and that when mptcp_incoming_options() is hit from
    tcp_rcv_state_process(), the peer can't transmit any more data.
  - the memory comparison is intentionally very rough, as the msk socket
    lock is not currently held where the condition is now enforced.
This should require some refinement, shared as-is to avoid more latency on my side --- net/mptcp/mib.c | 2 ++ net/mptcp/mib.h | 2 ++ net/mptcp/options.c | 28 +++++++++++++++++++++++++--- net/mptcp/protocol.c | 31 +++++++++++++++++++++++-------- 4 files changed, 52 insertions(+), 11 deletions(-) diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c index f23fda0c55a7..ef65e2df709f 100644 --- a/net/mptcp/mib.c +++ b/net/mptcp/mib.c @@ -85,6 +85,8 @@ static const struct snmp_mib mptcp_snmp_list[] =3D { SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK), SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED), SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE), + SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP), + SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED), }; =20 /* mptcp_mib_alloc - allocate percpu mib counters diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h index 812218b5ed2b..c84eb853d499 100644 --- a/net/mptcp/mib.h +++ b/net/mptcp/mib.h @@ -88,6 +88,8 @@ enum linux_mptcp_mib_field { MPTCP_MIB_SIMULTCONNFALLBACK, /* Simultaneous connect */ MPTCP_MIB_FALLBACKFAILED, /* Can't fallback due to msk status */ MPTCP_MIB_WINPROBE, /* MPTCP-level zero window probe */ + MPTCP_MIB_BACKLOGDROP, /* Backlog over memory limit */ + MPTCP_MIB_RCVPRUNED, /* Dropped due to memory constrains */ __MPTCP_MIB_MAX }; =20 diff --git a/net/mptcp/options.c b/net/mptcp/options.c index 97ea4aa37b33..2b35bdc113a5 100644 --- a/net/mptcp/options.c +++ b/net/mptcp/options.c @@ -1161,8 +1161,30 @@ static bool add_addr_hmac_valid(struct mptcp_sock *m= sk, return hmac =3D=3D mp_opt->ahmac; } =20 -/* Return false in case of error (or subflow has been reset), - * else return true. 
+static bool mptcp_over_limit(struct sock *sk, struct sock *ssk, + const struct sk_buff *skb) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + u64 mem =3D sk_rmem_alloc_get(sk); + + mem +=3D READ_ONCE(msk->backlog_len); + if (likely(mem <=3D READ_ONCE(sk->sk_rcvbuf))) + return false; + + /* Avoid silently dropping pure acks, fin or zero win probes. */ + if (TCP_SKB_CB(skb)->seq =3D=3D TCP_SKB_CB(skb)->end_seq || + TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN || + !after(TCP_SKB_CB(skb)->end_seq, tcp_sk(ssk)->rcv_nxt)) + return false; + + /* Dropped due to memory constraints, schedule an ack. */ + inet_csk(ssk)->icsk_ack.pending |=3D ICSK_ACK_NOMEM | ICSK_ACK_NOW; + inet_csk_schedule_ack(ssk); + return true; +} + +/* Return false when the caller must drop the packet, i.e. in case of erro= r, + * subflow has been reset, or over memory limits. */ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb) { @@ -1188,7 +1210,7 @@ bool mptcp_incoming_options(struct sock *sk, struct s= k_buff *skb) =20 __mptcp_data_acked(subflow->conn); mptcp_data_unlock(subflow->conn); - return true; + return !mptcp_over_limit(subflow->conn, sk, skb); } =20 mptcp_get_options(skb, &mp_opt); diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 4f2b2868031a..1d498c601145 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -381,6 +381,16 @@ static bool __mptcp_move_skb(struct sock *sk, struct s= k_buff *skb) =20 mptcp_borrow_fwdmem(sk, skb); =20 + /* Can't drop packets for fallback socket this late, or the stream + * will break. 
+	 */
+	if (unlikely(sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) &&
+	    !__mptcp_check_fallback(msk)) {
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
+		mptcp_drop(sk, skb);
+		return false;
+	}
+
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
 		/* in sequence */
 		msk->bytes_received += copy_len;
@@ -675,6 +685,7 @@ static void __mptcp_add_backlog(struct sock *sk,
 	struct sk_buff *tail = NULL;
 	struct sock *ssk = skb->sk;
 	bool fragstolen;
+	u64 limit;
 	int delta;
 
 	if (unlikely(sk->sk_state == TCP_CLOSE)) {
@@ -682,6 +693,16 @@ static void __mptcp_add_backlog(struct sock *sk,
 		return;
 	}
 
+	/* Similar additional allowance as plain TCP. */
+	limit = READ_ONCE(sk->sk_rcvbuf);
+	limit += (limit >> 1) + 64 * 1024;
+	limit = min_t(u64, limit, UINT_MAX);
+	if (msk->backlog_len > limit && !__mptcp_check_fallback(msk)) {
+		__MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_BACKLOGDROP);
+		kfree_skb_reason(skb, SKB_DROP_REASON_SOCKET_BACKLOG);
+		return;
+	}
+
 	/* Try to coalesce with the last skb in our backlog */
 	if (!list_empty(&msk->backlog_list))
 		tail = list_last_entry(&msk->backlog_list, struct sk_buff, list);
@@ -753,7 +774,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 
 		mptcp_init_skb(ssk, skb, offset, len);
 
-		if (own_msk && sk_rmem_alloc_get(sk) < sk->sk_rcvbuf) {
+		if (own_msk) {
 			mptcp_subflow_lend_fwdmem(subflow, skb);
 			ret |= __mptcp_move_skb(sk, skb);
 		} else {
@@ -2211,10 +2232,6 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
 
 	*delta = 0;
 	while (1) {
-		/* If the msk recvbuf is full stop, don't drop */
-		if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
-			break;
-
 		prefetch(skb->next);
 		list_del(&skb->list);
 		*delta += skb->truesize;
@@ -2242,9 +2259,7 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 	DEBUG_NET_WARN_ON_ONCE(msk->backlog_unaccounted && sk->sk_socket &&
 			       mem_cgroup_from_sk(sk));
 
-	/* Don't spool the backlog if the rcvbuf is full. */
-	if (list_empty(&msk->backlog_list) ||
-	    sk_rmem_alloc_get(sk) > sk->sk_rcvbuf)
+	if (list_empty(&msk->backlog_list))
 		return false;
 
 	INIT_LIST_HEAD(skbs);
-- 
2.54.0

From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 6/9] mptcp: enforce hard limit on backlog flushing
Date: Fri, 22 May 2026 23:43:47 +0200

Currently, a wild producer could keep the backlog flushing operation
spinning for an unbounded amount of time.

Since the previous patch, the amount of data present in the backlog is
hard-limited. Move the backlog_len update to the end of the flush loop
to prevent it from spinning forever: data arriving while the backlog is
being flushed still counts against the limit, so the work done by a
single flush is bounded.

Also, there is no need to splice the remaining skbs list back into the
backlog, as that list is always empty after each backlog processing
loop.

Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1d498c601145..03d6f8658467 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2230,7 +2230,6 @@ static bool __mptcp_move_skbs(struct sock *sk, struct list_head *skbs, u32 *delt
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool moved = false;
 
-	*delta = 0;
 	while (1) {
 		prefetch(skb->next);
 		list_del(&skb->list);
@@ -2267,20 +2266,12 @@ static bool mptcp_can_spool_backlog(struct sock *sk, struct list_head *skbs)
 	return true;
 }
 
-static void mptcp_backlog_spooled(struct sock *sk, u32 moved,
-				  struct list_head *skbs)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
-	list_splice(skbs, &msk->backlog_list);
-}
-
 static bool mptcp_move_skbs(struct sock *sk)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct list_head skbs;
 	bool enqueued = false;
-	u32 moved;
+	u32 moved = 0;
 
 	mptcp_data_lock(sk);
 	while (mptcp_can_spool_backlog(sk, &skbs)) {
@@ -2288,8 +2279,8 @@ static bool mptcp_move_skbs(struct sock *sk)
 		enqueued |= __mptcp_move_skbs(sk, &skbs, &moved);
 
 		mptcp_data_lock(sk);
-		mptcp_backlog_spooled(sk, moved, &skbs);
 	}
+	WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
 	mptcp_data_unlock(sk);
 	if (enqueued && mptcp_epollin_ready(sk))
 		sk->sk_data_ready(sk);
@@ -3678,12 +3669,12 @@ static void mptcp_release_cb(struct sock *sk)
 	__must_hold(&sk->sk_lock.slock)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
+	u32 moved = 0;
 
 	for (;;) {
 		unsigned long flags = (msk->cb_flags & MPTCP_FLAGS_PROCESS_CTX_NEED);
 		struct list_head join_list, skbs;
 		bool spool_bl;
-		u32 moved;
 
 		spool_bl = mptcp_can_spool_backlog(sk, &skbs);
 		if (!flags && !spool_bl)
@@ -3716,9 +3707,9 @@ static void mptcp_release_cb(struct sock *sk)
 
 		cond_resched();
 		spin_lock_bh(&sk->sk_lock.slock);
-		if (spool_bl)
-			mptcp_backlog_spooled(sk, moved, &skbs);
 	}
+	if (moved)
+		WRITE_ONCE(msk->backlog_len, msk->backlog_len - moved);
 
 	if (__test_and_clear_bit(MPTCP_CLEAN_UNA, &msk->cb_flags))
 		__mptcp_clean_una_wakeup(sk);
-- 
2.54.0

From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 7/9] mptcp: implemented OoO queue pruning
Date: Fri, 22 May 2026 23:43:48 +0200

Leverage the hybrid helpers to implement receive queue and OoO queue
collapsing at ingress time, when memory bounds are reached. If the msk
is owned by user space when the skb arrives, perform the pruning in the
release_cb. The prune check is additionally performed when the skb
reaches the msk-level queues.

Signed-off-by: Paolo Abeni
---
v6 -> v7:
- fix u64 -> u32 truncation

v2 -> v3:
- deal with unsynced TFO skb at prune time
  - only possible when pruning in mptcp_over_limit()

v1 -> v2:
- collapse rcv queue, too
- deal with MPC map, too
- drop left-over sentence in the commit message

RFC -> v1:
- use data_seq only when available
- avoid ack_seq lockless access
- drop limit on fallback
- collapse rcvqueue, too
- drop only when pruning is not possible and over rcvbuf * 2

Note:
- sashiko can be confused about the fwd memory lifecycle (I can
  understand that :). Any excess fwd-allocated memory is always
  released by the next sk_mem_uncharge(), i.e. fwd memory is not tied
  to the current skb.
- AFAICS KASAN handles bitmap variables in a sane way, and sashiko
  doesn't know about that.
---
 net/mptcp/mib.c      |  1 +
 net/mptcp/mib.h      |  1 +
 net/mptcp/protocol.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c
index ef65e2df709f..d9bd4f4afcc0 100644
--- a/net/mptcp/mib.c
+++ b/net/mptcp/mib.c
@@ -87,6 +87,7 @@ static const struct snmp_mib mptcp_snmp_list[] = {
 	SNMP_MIB_ITEM("WinProbe", MPTCP_MIB_WINPROBE),
 	SNMP_MIB_ITEM("BacklogDrop", MPTCP_MIB_BACKLOGDROP),
 	SNMP_MIB_ITEM("RcvPruned", MPTCP_MIB_RCVPRUNED),
+	SNMP_MIB_ITEM("OfoPruned", MPTCP_MIB_OFO_PRUNED),
 };
 
 /* mptcp_mib_alloc - allocate percpu mib counters
diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h
index c84eb853d499..18f35f7e0a2d 100644
--- a/net/mptcp/mib.h
+++ b/net/mptcp/mib.h
@@ -90,6 +90,7 @@ enum linux_mptcp_mib_field {
 	MPTCP_MIB_WINPROBE,		/* MPTCP-level zero window probe */
 	MPTCP_MIB_BACKLOGDROP,		/* Backlog over memory limit */
 	MPTCP_MIB_RCVPRUNED,		/* Dropped due to memory constrains */
+	MPTCP_MIB_OFO_PRUNED,		/* MPTCP-level OoO queue pruned */
 	__MPTCP_MIB_MAX
 };
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 03d6f8658467..f446e22148b9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -373,6 +373,43 @@ static void mptcp_init_skb(struct sock *ssk, struct sk_buff *skb, int offset,
 	skb_dst_drop(skb);
 }
 
+/* "Inspired" from the TCP version */
+static void mptcp_prune_ofo_queue(struct sock *sk, u64 seq)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct rb_node *node, *prev;
+	bool pruned = false;
+	u64 mem;
+
+	if (RB_EMPTY_ROOT(&msk->out_of_order_queue))
+		return;
+
+	node = &msk->ooo_last_skb->rbnode;
+
+	do {
+		struct sk_buff *skb = rb_to_skb(node);
+
+		/* Stop pruning if the incoming skb would land in OoO tail. */
+		if (after64(seq, MPTCP_SKB_CB(skb)->map_seq))
+			break;
+
+		pruned = true;
+		prev = rb_prev(node);
+		rb_erase(node, &msk->out_of_order_queue);
+		mptcp_drop(sk, skb);
+		msk->ooo_last_skb = rb_to_skb(prev);
+
+		mem = (unsigned int)atomic_read(&sk->sk_rmem_alloc);
+		if (mem < sk->sk_rcvbuf)
+			break;
+
+		node = prev;
+	} while (node);
+
+	if (pruned)
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_OFO_PRUNED);
+}
+
 static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 {
 	u64 copy_len = MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq;
@@ -386,9 +423,12 @@ static bool __mptcp_move_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	if (unlikely(sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) &&
 	    !__mptcp_check_fallback(msk)) {
-		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
-		mptcp_drop(sk, skb);
-		return false;
+		mptcp_prune_ofo_queue(sk, MPTCP_SKB_CB(skb)->map_seq);
+		if (sk_rmem_alloc_get(sk) > READ_ONCE(sk->sk_rcvbuf)) {
+			MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
+			mptcp_drop(sk, skb);
+			return false;
+		}
 	}
 
 	if (MPTCP_SKB_CB(skb)->map_seq == msk->ack_seq) {
-- 
2.54.0

From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 8/9] mptcp: move the retrans loop to a separate helper
Date: Fri, 22 May 2026 23:43:49 +0200

This is a cleanup to make the next patch simpler.

No functional change intended.

Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 74 +++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f446e22148b9..4ebe45e8a3d2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2824,41 +2824,14 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	sk_error_report(sk);
 }
 
-static void __mptcp_retrans(struct sock *sk)
+/* Retransmit the specified data fragment on all the selected subflows. */
+static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 {
 	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_subflow_context *subflow;
-	struct mptcp_data_frag *dfrag;
 	struct sock *ssk;
-	int ret, err;
-	u16 len = 0;
-
-	mptcp_clean_una_wakeup(sk);
-
-	/* first check ssk: need to kick "stale" logic */
-	err = mptcp_sched_get_retrans(msk);
-	dfrag = mptcp_rtx_head(sk);
-	if (!dfrag) {
-		if (mptcp_data_fin_enabled(msk)) {
-			struct inet_connection_sock *icsk = inet_csk(sk);
-
-			WRITE_ONCE(icsk->icsk_retransmits,
-				   icsk->icsk_retransmits + 1);
-			mptcp_set_datafin_timeout(sk);
-			mptcp_send_ack(msk);
-
-			goto reset_timer;
-		}
-
-		if (!mptcp_send_head(sk))
-			goto clear_scheduled;
-
-		goto reset_timer;
-	}
-
-	if (err)
-		goto reset_timer;
+	int ret, len = 0;
 
 	mptcp_for_each_subflow(msk, subflow) {
 		if (READ_ONCE(subflow->scheduled)) {
@@ -2886,7 +2859,7 @@ static void __mptcp_retrans(struct sock *sk)
 			    !msk->allow_subflows) {
 				spin_unlock_bh(&msk->fallback_lock);
 				release_sock(ssk);
-				goto clear_scheduled;
+				return -1;
 			}
 
 			while (info.sent < info.limit) {
@@ -2909,6 +2882,45 @@ static void __mptcp_retrans(struct sock *sk)
 			release_sock(ssk);
 		}
 	}
+	return len;
+}
+
+static void __mptcp_retrans(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct mptcp_data_frag *dfrag;
+	int err, len;
+
+	mptcp_clean_una_wakeup(sk);
+
+	/* first check ssk: need to kick "stale" logic */
+	err = mptcp_sched_get_retrans(msk);
+	dfrag = mptcp_rtx_head(sk);
+	if (!dfrag) {
+		if (mptcp_data_fin_enabled(msk)) {
+			struct inet_connection_sock *icsk = inet_csk(sk);
+
+			WRITE_ONCE(icsk->icsk_retransmits,
+				   icsk->icsk_retransmits + 1);
+			mptcp_set_datafin_timeout(sk);
+			mptcp_send_ack(msk);
+
+			goto reset_timer;
+		}
+
+		if (!mptcp_send_head(sk))
+			goto clear_scheduled;
+
+		goto reset_timer;
+	}
+
+	if (err)
+		goto reset_timer;
+
+	len = __mptcp_push_retrans(sk, dfrag);
+	if (len < 0)
+		goto clear_scheduled;
 
 	msk->bytes_retrans += len;
 	dfrag->already_sent = max(dfrag->already_sent, len);
-- 
2.54.0

From: Paolo Abeni
To: mptcp@lists.linux.dev
Cc: Matthieu Baerts, Geliang Tang, gang.yan@linux.dev
Subject: [PATCH v8 mptcp-next 9/9] mptcp: let the retrans scheduler do its job.
Date: Fri, 22 May 2026 23:43:50 +0200

Currently, the MPTCP core retransmits at most a single dfrag each time
the MPTCP-level retransmission timer fires. In some corner cases,
multiple dfrags may need to be retransmitted, and the MPTCP socket has
to wait for multiple retransmission timeouts to accomplish that.

Remove this constraint, allowing multiple dfrags to be transmitted per
retransmission period, as long as the scheduler keeps selecting
subflows for retransmission and pending data is available in the RTX
queue.

The default scheduler will transmit one dfrag per available subflow.

Signed-off-by: Paolo Abeni
---
v7 -> v8:
- fix corner-case retrans_seq update

v4 -> v5:
- fixed already_sent update

v3 -> v4:
- avoid quadratic behavior, fix retrans_seq update
- fix rtx timer re-schedule miss

v2 -> v3:
- fix infinite loop issue (should address TLS test failures)

v1 -> v2:
- fix retrans sequence update (sashiko)

Note:
- sashiko sees issues when dfrag = mptcp_rtx_head(sk) != NULL and
  dfrag->already_sent == 0. That condition should not be possible: if
  mptcp_rtx_head() is not NULL, some data should already have been
  sent.
---
 net/mptcp/protocol.c | 117 +++++++++++++++++++++++++++++++------------
 1 file changed, 85 insertions(+), 32 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4ebe45e8a3d2..892fc44dffac 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1197,13 +1197,6 @@ static void __mptcp_clean_una_wakeup(struct sock *sk)
 	mptcp_write_space(sk);
 }
 
-static void mptcp_clean_una_wakeup(struct sock *sk)
-{
-	mptcp_data_lock(sk);
-	__mptcp_clean_una_wakeup(sk);
-	mptcp_data_unlock(sk);
-}
-
 static void mptcp_enter_memory_pressure(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -2824,8 +2817,12 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	sk_error_report(sk);
 }
 
-/* Retransmit the specified data fragment on all the selected subflows. */
-static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
+/*
+ * Retransmit the specified data fragment on all the selected subflows,
+ * starting from the specified sequence
+ */
+static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag,
+				u64 sent_seq)
 {
 	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2835,6 +2832,7 @@ static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 
 	mptcp_for_each_subflow(msk, subflow) {
 		if (READ_ONCE(subflow->scheduled)) {
+			u16 offset = sent_seq - dfrag->data_seq;
 			u16 copied = 0;
 
 			mptcp_subflow_set_scheduled(subflow, false);
@@ -2844,9 +2842,12 @@ static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 			lock_sock(ssk);
 
 			/* limit retransmission to the bytes already sent on some subflows */
-			info.sent = 0;
+			info.sent = offset;
 			info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
 								    dfrag->already_sent;
+			DEBUG_NET_WARN_ON_ONCE(!before64(sent_seq,
+							 dfrag->data_seq +
+							 info.limit));
 
 			/*
 			 * make the whole retrans decision, xmit, disallow
@@ -2890,45 +2891,97 @@ static void __mptcp_retrans(struct sock *sk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_data_frag *dfrag;
+	bool retransmitted = false;
+	u64 retrans_seq;
 	int err, len;
 
-	mptcp_clean_una_wakeup(sk);
-
-	/* first check ssk: need to kick "stale" logic */
-	err = mptcp_sched_get_retrans(msk);
+	mptcp_data_lock(sk);
+	__mptcp_clean_una_wakeup(sk);
+	retrans_seq = msk->snd_una;
 	dfrag = mptcp_rtx_head(sk);
+	mptcp_data_unlock(sk);
+	if (!dfrag)
+		goto check_data_fin;
+
+	for (;;) {
+		bool already_retrans;
+		u64 sent_seq;
+
+		/* The scheduler may clean the RTX queue. */
+		get_page(dfrag->page);
+
+		/* The default scheduler will kick "stale" logic. */
+		err = mptcp_sched_get_retrans(msk);
+		if (err) {
+			put_page(dfrag->page);
+			break;
+		}
+
+		/* Incoming acks can have moved retrans sequence after
+		 * the current dfrag, if so try to start again from RTX head.
+		 */
+		mptcp_data_lock(sk);
+		already_retrans = !dfrag->already_sent ||
+				  !before64(msk->snd_una, dfrag->data_seq +
+							  dfrag->already_sent);
+		put_page(dfrag->page);
+		if (already_retrans) {
+			__mptcp_clean_una_wakeup(sk);
+			retrans_seq = msk->snd_una;
+			dfrag = mptcp_rtx_head(sk);
+		} else if (after64(msk->snd_una, retrans_seq)) {
+			retrans_seq = msk->snd_una;
+		}
+		mptcp_data_unlock(sk);
+		if (!dfrag)
+			break;
+
+		len = __mptcp_push_retrans(sk, dfrag, retrans_seq);
+		if (len < 0)
+			goto clear_scheduled;
+
+		retransmitted = true;
+		retrans_seq += len;
+		msk->bytes_retrans += len;
+		dfrag->already_sent = max_t(u16, dfrag->already_sent,
+					    retrans_seq - dfrag->data_seq);
+
+		/* With csum enabled retransmission can send new data. */
+		sent_seq = dfrag->already_sent + dfrag->data_seq;
+		if (after64(sent_seq, msk->snd_nxt))
+			msk->snd_nxt = sent_seq;
+
+		/* Attempt the next fragment only if the current one is
+		 * completely retransmitted.
+		 */
+		if (before64(retrans_seq, dfrag->data_seq + dfrag->data_len))
+			break;
+
+		dfrag = list_is_last(&dfrag->list, &msk->rtx_queue) ?
+			NULL : list_next_entry(dfrag, list);
+		if (!dfrag || !dfrag->already_sent)
+			break;
+	}
+
+	/* Data fin retransmission needed only if no data retransmission took
+	 * place, and RTX queue is empty.
+	 */
+check_data_fin:
 	if (!dfrag) {
-		if (mptcp_data_fin_enabled(msk)) {
+		if (!retransmitted && mptcp_data_fin_enabled(msk)) {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 
 			WRITE_ONCE(icsk->icsk_retransmits,
 				   icsk->icsk_retransmits + 1);
 			mptcp_set_datafin_timeout(sk);
 			mptcp_send_ack(msk);
-
 			goto reset_timer;
 		}
 
 		if (!mptcp_send_head(sk))
 			goto clear_scheduled;
-
-		goto reset_timer;
 	}
 
-	if (err)
-		goto reset_timer;
-
-	len = __mptcp_push_retrans(sk, dfrag);
-	if (len < 0)
-		goto clear_scheduled;
-
-	msk->bytes_retrans += len;
-	dfrag->already_sent = max(dfrag->already_sent, len);
-
-	/* With csum enabled retransmission can send new data. */
-	if (after64(dfrag->already_sent + dfrag->data_seq, msk->snd_nxt))
-		msk->snd_nxt = dfrag->already_sent + dfrag->data_seq;
-
 reset_timer:
 	mptcp_check_and_set_pending(sk);
 
-- 
2.54.0