From nobody Mon Jun 8 07:21:48 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v12 mptcp-next 1/2] mptcp: move the retrans loop to a separate helper
Date: Wed, 3 Jun 2026 11:47:54 +0200
Message-ID: <7c84c058582a179aa7f6b4de3ab9657bdaa7d1d8.1780479980.git.pabeni@redhat.com>

This is a cleanup in order to make the next patch simpler.

No functional change intended.

Tested-by: Gang Yan
Tested-by: Geliang Tang
Acked-by: Geliang Tang
Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 74 +++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f1d74d4b28cf..18c505605dae 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2830,41 +2830,14 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	sk_error_report(sk);
 }
 
-static void __mptcp_retrans(struct sock *sk)
+/* Retransmit the specified data fragment on all the selected subflows. */
+static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 {
 	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_subflow_context *subflow;
-	struct mptcp_data_frag *dfrag;
 	struct sock *ssk;
-	int ret, err;
-	u16 len = 0;
-
-	mptcp_clean_una_wakeup(sk);
-
-	/* first check ssk: need to kick "stale" logic */
-	err = mptcp_sched_get_retrans(msk);
-	dfrag = mptcp_rtx_head(sk);
-	if (!dfrag) {
-		if (mptcp_data_fin_enabled(msk)) {
-			struct inet_connection_sock *icsk = inet_csk(sk);
-
-			WRITE_ONCE(icsk->icsk_retransmits,
-				   icsk->icsk_retransmits + 1);
-			mptcp_set_datafin_timeout(sk);
-			mptcp_send_ack(msk);
-
-			goto reset_timer;
-		}
-
-		if (!mptcp_send_head(sk))
-			goto clear_scheduled;
-
-		goto reset_timer;
-	}
-
-	if (err)
-		goto reset_timer;
+	int ret, len = 0;
 
 	mptcp_for_each_subflow(msk, subflow) {
 		if (READ_ONCE(subflow->scheduled)) {
@@ -2892,7 +2865,7 @@ static void __mptcp_retrans(struct sock *sk)
 			    !msk->allow_subflows) {
 				spin_unlock_bh(&msk->fallback_lock);
 				release_sock(ssk);
-				goto clear_scheduled;
+				return -1;
 			}
 
 			while (info.sent < info.limit) {
@@ -2915,6 +2888,45 @@ static void __mptcp_retrans(struct sock *sk)
 			release_sock(ssk);
 		}
 	}
+	return len;
+}
+
+static void __mptcp_retrans(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct mptcp_data_frag *dfrag;
+	int err, len;
+
+	mptcp_clean_una_wakeup(sk);
+
+	/* first check ssk: need to kick "stale" logic */
+	err = mptcp_sched_get_retrans(msk);
+	dfrag = mptcp_rtx_head(sk);
+	if (!dfrag) {
+		if (mptcp_data_fin_enabled(msk)) {
+			struct inet_connection_sock *icsk = inet_csk(sk);
+
+			WRITE_ONCE(icsk->icsk_retransmits,
+				   icsk->icsk_retransmits + 1);
+			mptcp_set_datafin_timeout(sk);
+			mptcp_send_ack(msk);
+
+			goto reset_timer;
+		}
+
+		if (!mptcp_send_head(sk))
+			goto clear_scheduled;
+
+		goto reset_timer;
+	}
+
+	if (err)
+		goto reset_timer;
+
+	len = __mptcp_push_retrans(sk, dfrag);
+	if (len < 0)
+		goto clear_scheduled;
 
 	msk->bytes_retrans += len;
 	dfrag->already_sent = max(dfrag->already_sent, len);
-- 
2.54.0

From nobody Mon Jun 8 07:21:48 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v12 mptcp-next 2/2] mptcp: let the retrans scheduler do its job.
Date: Wed, 3 Jun 2026 11:47:55 +0200
Message-ID: <706c7074dd80f14da83828d566dc5967c94c5d8e.1780479980.git.pabeni@redhat.com>

Currently the MPTCP core enforces that when the MPTCP-level retrans timer
fires, at most a single dfrag is retransmitted.

In some corner cases it may be necessary to retransmit multiple dfrags,
and the MPTCP socket will need to wait for multiple retrans timeouts to
accomplish that.

Remove the mentioned constraint, allowing multiple dfrags to be
transmitted per retrans period, as long as the scheduler keeps selecting
subflows for retransmissions and pending data is available in the rtx
queue.

The default scheduler will transmit a dfrag per available subflow.

Tested-by: Gang Yan
Tested-by: Geliang Tang
Acked-by: Geliang Tang
Signed-off-by: Paolo Abeni
---
v11 -> v12:
 - avoid infinite loop (sashiko)
v10 -> v11:
 - avoid WARNING when retransmitting dfrag with 0 already_sent,
   as such status can happen, as reported by Geliang. Instead
   explicitly check for the bad condition and skip.
v9 -> v10:
 - simpler handling for data-fin rtx
v7 -> v8:
 - fix corner-case retrans_seq update
v4 -> v5:
 - fixed already_sent update
v3 -> v4:
 - avoid quadratic behavior, fix retrans_seq update
 - fix rtx timer re-schedule miss
v2 -> v3:
 - fix infinite loop issue (should address tls tests failures)
v1 -> v2:
 - fix retrans sequence update (sashiko)

Note:
 - sashiko may see missing data-fin rtx when the initial `dfrag` is not
   NULL. data-fin RTX is NOT needed in such scenario.
---
 net/mptcp/protocol.c | 119 +++++++++++++++++++++++++++++++------------
 1 file changed, 87 insertions(+), 32 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 18c505605dae..a4f7e99b30db 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1201,13 +1201,6 @@ static void __mptcp_clean_una_wakeup(struct sock *sk)
 	mptcp_write_space(sk);
 }
 
-static void mptcp_clean_una_wakeup(struct sock *sk)
-{
-	mptcp_data_lock(sk);
-	__mptcp_clean_una_wakeup(sk);
-	mptcp_data_unlock(sk);
-}
-
 static void mptcp_enter_memory_pressure(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -2830,8 +2823,12 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	sk_error_report(sk);
 }
 
-/* Retransmit the specified data fragment on all the selected subflows. */
-static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
+/*
+ * Retransmit the specified data fragment on all the selected subflows,
+ * starting from the specified sequence
+ */
+static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag,
+				u64 sent_seq)
 {
 	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2841,6 +2838,7 @@ static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 
 	mptcp_for_each_subflow(msk, subflow) {
 		if (READ_ONCE(subflow->scheduled)) {
+			u16 offset = sent_seq - dfrag->data_seq;
 			u16 copied = 0;
 
 			mptcp_subflow_set_scheduled(subflow, false);
@@ -2850,7 +2848,7 @@ static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 			lock_sock(ssk);
 
 			/* limit retransmission to the bytes already sent on some subflows */
-			info.sent = 0;
+			info.sent = offset;
 			info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
 								    dfrag->already_sent;
 
@@ -2896,14 +2894,88 @@ static void __mptcp_retrans(struct sock *sk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_data_frag *dfrag;
+	bool need_retrans;
+	u64 retrans_seq;
 	int err, len;
 
-	mptcp_clean_una_wakeup(sk);
-
-	/* first check ssk: need to kick "stale" logic */
-	err = mptcp_sched_get_retrans(msk);
+	mptcp_data_lock(sk);
+	__mptcp_clean_una_wakeup(sk);
+	retrans_seq = msk->snd_una;
 	dfrag = mptcp_rtx_head(sk);
-	if (!dfrag) {
+	need_retrans = !!dfrag;
+	mptcp_data_unlock(sk);
+	if (!dfrag)
+		goto check_data_fin;
+
+	for (;;) {
+		bool already_retrans;
+		u64 sent_seq;
+
+		/* The default scheduler will kick "stale" logic, that in
+		 * turn can process incoming acks and clean the RTX queue;
+		 * ensure that the current dfrag will still be around
+		 * afterwards.
+		 */
+		get_page(dfrag->page);
+		err = mptcp_sched_get_retrans(msk);
+		if (err) {
+			put_page(dfrag->page);
+			break;
+		}
+
+		/* Incoming acks can have moved retrans sequence after
+		 * the current dfrag, if so try to start again from RTX head.
+		 */
+		mptcp_data_lock(sk);
+		already_retrans = !before64(msk->snd_una, dfrag->data_seq +
+					    dfrag->already_sent);
+		put_page(dfrag->page);
+		if (already_retrans) {
+			__mptcp_clean_una_wakeup(sk);
+			retrans_seq = msk->snd_una;
+			dfrag = mptcp_rtx_head(sk);
+			need_retrans = !!dfrag;
+		} else if (after64(msk->snd_una, retrans_seq)) {
+			retrans_seq = msk->snd_una;
+		}
+		mptcp_data_unlock(sk);
+
+		/* `already_sent` can be 0 for `dfrag` belonging to the RTX
+		 * queue due to __mptcp_retransmit_pending_data().
+		 */
+		if (!dfrag || !dfrag->already_sent)
+			break;
+
+		/* Can fail only in case of fallback. */
+		len = __mptcp_push_retrans(sk, dfrag, retrans_seq);
+		if (len < 0)
+			goto clear_scheduled;
+
+		retrans_seq += len;
+		msk->bytes_retrans += len;
+		dfrag->already_sent = max_t(u16, dfrag->already_sent,
+					    retrans_seq - dfrag->data_seq);
+
+		/* With csum enabled retransmission can send new data. */
+		sent_seq = dfrag->already_sent + dfrag->data_seq;
+		if (after64(sent_seq, msk->snd_nxt))
+			WRITE_ONCE(msk->snd_nxt, sent_seq);
+
+		/* Attempt the next fragment only if the current one is
+		 * completely retransmitted.
+		 */
+		if (before64(retrans_seq, dfrag->data_seq + dfrag->data_len))
+			break;
+
+		dfrag = list_is_last(&dfrag->list, &msk->rtx_queue) ?
+			NULL : list_next_entry(dfrag, list);
+		if (!dfrag)
+			break;
+	}
+
+	/* Attempt data-fin retransmission only when the RTX queue is empty. */
+	if (!need_retrans) {
+check_data_fin:
 		if (mptcp_data_fin_enabled(msk)) {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -2911,30 +2983,13 @@ static void __mptcp_retrans(struct sock *sk)
 				   icsk->icsk_retransmits + 1);
 			mptcp_set_datafin_timeout(sk);
 			mptcp_send_ack(msk);
-
 			goto reset_timer;
 		}
 
 		if (!mptcp_send_head(sk))
 			goto clear_scheduled;
-
-		goto reset_timer;
 	}
 
-	if (err)
-		goto reset_timer;
-
-	len = __mptcp_push_retrans(sk, dfrag);
-	if (len < 0)
-		goto clear_scheduled;
-
-	msk->bytes_retrans += len;
-	dfrag->already_sent = max(dfrag->already_sent, len);
-
-	/* With csum enabled retransmission can send new data. */
-	if (after64(dfrag->already_sent + dfrag->data_seq, msk->snd_nxt))
-		WRITE_ONCE(msk->snd_nxt, dfrag->already_sent + dfrag->data_seq);
-
 reset_timer:
 	mptcp_check_and_set_pending(sk);
 
-- 
2.54.0
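
Illustrative note, not part of the series: the control flow that patch 2/2 gives __mptcp_retrans() can be approximated by the userspace sketch below. Everything in it is an assumption made for illustration: `struct frag`, `push_retrans()`, `retrans_timer_fired()`, and the `rounds` and `budget` parameters are hypothetical stand-ins for the kernel structures, the scheduler and the subflow send path. It also ignores locking, checksums, fallback, the restart from the RTX head after incoming acks, and 64-bit sequence wraparound (the kernel uses before64()/after64()).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for struct mptcp_data_frag. */
struct frag {
	uint64_t data_seq;	/* first sequence number in the fragment */
	uint16_t data_len;	/* total bytes in the fragment */
	uint16_t already_sent;	/* bytes transmitted at least once */
};

/*
 * Hypothetical stand-in for one __mptcp_push_retrans() call: resend at most
 * `budget` bytes of the fragment, starting from sent_seq and never going
 * past the bytes that were already sent once.
 */
static int push_retrans(const struct frag *f, uint64_t sent_seq, int budget)
{
	int offset, avail;

	assert(sent_seq >= f->data_seq);
	offset = (int)(sent_seq - f->data_seq);
	avail = (int)f->already_sent - offset;
	if (avail <= 0)
		return 0;
	return avail < budget ? avail : budget;
}

/*
 * Model of one retrans timer expiry. `rounds` is the number of times the
 * (hypothetical) scheduler still selects a subflow; the kernel asks
 * mptcp_sched_get_retrans() instead. Returns the bytes retransmitted.
 */
static uint64_t retrans_timer_fired(const struct frag *queue, int nr_frags,
				    uint64_t snd_una, int rounds, int budget)
{
	uint64_t retrans_seq = snd_una;
	uint64_t total = 0;
	int i = 0;

	while (i < nr_frags && rounds-- > 0) {
		const struct frag *f = &queue[i];
		int len;

		/* Nothing was ever sent from this fragment: stop here. */
		if (!f->already_sent)
			break;

		len = push_retrans(f, retrans_seq, budget);
		retrans_seq += len;
		total += len;

		/* Move on only if the current fragment is fully retransmitted. */
		if (retrans_seq < f->data_seq + f->data_len)
			break;
		i++;
	}
	return total;
}
```

In this model, take a queue of two fully sent fragments (100 bytes at sequence 1000, 50 bytes at sequence 1100) with snd_una at 1020. If the scheduler keeps selecting subflows, one timer expiry resends both fragments. With a single scheduler round it resends only the remainder of the first fragment, which is roughly the pre-series behaviour of one dfrag per timer.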