From nobody Mon Jun 8 09:48:01 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v11 mptcp-next 1/2] mptcp: move the retrans loop to a separate helper
Date: Sat, 30 May 2026 16:59:48 +0200
Message-ID: <67e9b29a7a56df345124111775b8336cff52d30e.1780152380.git.pabeni@redhat.com>

This is a cleanup in order to make the next patch simpler.
No functional change intended.

Tested-by: Gang Yan
Tested-by: Geliang Tang
Acked-by: Geliang Tang
Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 74 +++++++++++++++++++++++++-------------------
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 03234e8cc26c..51756800edc2 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2830,41 +2830,14 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	sk_error_report(sk);
 }
 
-static void __mptcp_retrans(struct sock *sk)
+/* Retransmit the specified data fragment on all the selected subflows. */
+static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 {
 	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_subflow_context *subflow;
-	struct mptcp_data_frag *dfrag;
 	struct sock *ssk;
-	int ret, err;
-	u16 len = 0;
-
-	mptcp_clean_una_wakeup(sk);
-
-	/* first check ssk: need to kick "stale" logic */
-	err = mptcp_sched_get_retrans(msk);
-	dfrag = mptcp_rtx_head(sk);
-	if (!dfrag) {
-		if (mptcp_data_fin_enabled(msk)) {
-			struct inet_connection_sock *icsk = inet_csk(sk);
-
-			WRITE_ONCE(icsk->icsk_retransmits,
-				   icsk->icsk_retransmits + 1);
-			mptcp_set_datafin_timeout(sk);
-			mptcp_send_ack(msk);
-
-			goto reset_timer;
-		}
-
-		if (!mptcp_send_head(sk))
-			goto clear_scheduled;
-
-		goto reset_timer;
-	}
-
-	if (err)
-		goto reset_timer;
+	int ret, len = 0;
 
 	mptcp_for_each_subflow(msk, subflow) {
 		if (READ_ONCE(subflow->scheduled)) {
@@ -2892,7 +2865,7 @@ static void __mptcp_retrans(struct sock *sk)
 			    !msk->allow_subflows) {
 				spin_unlock_bh(&msk->fallback_lock);
 				release_sock(ssk);
-				goto clear_scheduled;
+				return -1;
 			}
 
 			while (info.sent < info.limit) {
@@ -2915,6 +2888,45 @@ static void __mptcp_retrans(struct sock *sk)
 			release_sock(ssk);
 		}
 	}
+	return len;
+}
+
+static void __mptcp_retrans(struct sock *sk)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	struct mptcp_subflow_context *subflow;
+	struct mptcp_data_frag *dfrag;
+	int err, len;
+
+	mptcp_clean_una_wakeup(sk);
+
+	/* first check ssk: need to kick "stale" logic */
+	err = mptcp_sched_get_retrans(msk);
+	dfrag = mptcp_rtx_head(sk);
+	if (!dfrag) {
+		if (mptcp_data_fin_enabled(msk)) {
+			struct inet_connection_sock *icsk = inet_csk(sk);
+
+			WRITE_ONCE(icsk->icsk_retransmits,
+				   icsk->icsk_retransmits + 1);
+			mptcp_set_datafin_timeout(sk);
+			mptcp_send_ack(msk);
+
+			goto reset_timer;
+		}
+
+		if (!mptcp_send_head(sk))
+			goto clear_scheduled;
+
+		goto reset_timer;
+	}
+
+	if (err)
+		goto reset_timer;
+
+	len = __mptcp_push_retrans(sk, dfrag);
+	if (len < 0)
+		goto clear_scheduled;
 
 	msk->bytes_retrans += len;
 	dfrag->already_sent = max(dfrag->already_sent, len);
-- 
2.54.0

From nobody Mon Jun 8 09:48:01 2026
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH v11 mptcp-next 2/2] mptcp: let the retrans scheduler do its job.
Date: Sat, 30 May 2026 16:59:49 +0200
Message-ID: <7c5a1a19204f267fb4b0daa4286d8229cd791c9e.1780152380.git.pabeni@redhat.com>

Currently, when the MPTCP-level retransmission timer fires, the MPTCP core
retransmits at most a single dfrag. In some corner cases it may be necessary
to retransmit multiple dfrags, and the MPTCP socket then has to wait for
multiple retransmission timeouts to accomplish that.

Remove this constraint, allowing multiple dfrags to be retransmitted per
retransmission period, as long as the scheduler keeps selecting subflows for
retransmission and pending data is available in the RTX queue.

The default scheduler will transmit one dfrag per available subflow.

Tested-by: Gang Yan
Tested-by: Geliang Tang
Acked-by: Geliang Tang
Signed-off-by: Paolo Abeni
---
v10 -> v11:
  - avoid a WARNING when retransmitting a dfrag with 0 already_sent, since
    such a state can happen, as reported by Geliang. Instead, explicitly
    check for the bad condition and skip.
v9 -> v10:
  - simpler handling for data-fin rtx
v7 -> v8:
  - fix corner-case retrans_seq update
v4 -> v5:
  - fixed already_sent update
v3 -> v4:
  - avoid quadratic behavior, fix retrans_seq update
  - fix rtx timer re-schedule miss
v2 -> v3:
  - fix infinite loop issue (should address TLS test failures)
v1 -> v2:
  - fix retrans sequence update (sashiko)

Note:
  - sashiko may flag a missing data-fin rtx when the initial `dfrag` is not
    NULL. data-fin RTX is NOT needed in such a scenario.
---
 net/mptcp/protocol.c | 120 +++++++++++++++++++++++++++++++------------
 1 file changed, 88 insertions(+), 32 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 51756800edc2..264a13bc6f3e 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1201,13 +1201,6 @@ static void __mptcp_clean_una_wakeup(struct sock *sk)
 	mptcp_write_space(sk);
 }
 
-static void mptcp_clean_una_wakeup(struct sock *sk)
-{
-	mptcp_data_lock(sk);
-	__mptcp_clean_una_wakeup(sk);
-	mptcp_data_unlock(sk);
-}
-
 static void mptcp_enter_memory_pressure(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -2830,8 +2823,12 @@ static void mptcp_check_fastclose(struct mptcp_sock *msk)
 	sk_error_report(sk);
 }
 
-/* Retransmit the specified data fragment on all the selected subflows. */
-static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
+/*
+ * Retransmit the specified data fragment on all the selected subflows,
+ * starting from the specified sequence
+ */
+static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag,
+				u64 sent_seq)
 {
 	struct mptcp_sendmsg_info info = { .data_lock_held = true, };
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2841,6 +2838,7 @@ static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 
 	mptcp_for_each_subflow(msk, subflow) {
 		if (READ_ONCE(subflow->scheduled)) {
+			u16 offset = sent_seq - dfrag->data_seq;
 			u16 copied = 0;
 
 			mptcp_subflow_set_scheduled(subflow, false);
@@ -2850,7 +2848,7 @@ static int __mptcp_push_retrans(struct sock *sk, struct mptcp_data_frag *dfrag)
 			lock_sock(ssk);
 
 			/* limit retransmission to the bytes already sent on some subflows */
-			info.sent = 0;
+			info.sent = offset;
 			info.limit = READ_ONCE(msk->csum_enabled) ? dfrag->data_len :
 								    dfrag->already_sent;
 
@@ -2896,14 +2894,89 @@ static void __mptcp_retrans(struct sock *sk)
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct mptcp_subflow_context *subflow;
 	struct mptcp_data_frag *dfrag;
+	bool need_retrans;
+	u64 retrans_seq;
 	int err, len;
 
-	mptcp_clean_una_wakeup(sk);
-
-	/* first check ssk: need to kick "stale" logic */
-	err = mptcp_sched_get_retrans(msk);
+	mptcp_data_lock(sk);
+	__mptcp_clean_una_wakeup(sk);
+	retrans_seq = msk->snd_una;
 	dfrag = mptcp_rtx_head(sk);
-	if (!dfrag) {
+	need_retrans = !!dfrag;
+	mptcp_data_unlock(sk);
+	if (!dfrag)
+		goto check_data_fin;
+
+	for (;;) {
+		bool already_retrans;
+		u64 sent_seq;
+
+		/* The default scheduler will kick "stale" logic, that in
+		 * turn can process incoming acks and clean the RTX queue;
+		 * ensure that the current dfrag will still be around
+		 * afterwards.
+		 */
+		get_page(dfrag->page);
+		err = mptcp_sched_get_retrans(msk);
+		if (err) {
+			put_page(dfrag->page);
+			break;
+		}
+
+		/* Incoming acks can have moved retrans sequence after
+		 * the current dfrag, if so try to start again from RTX head.
+		 */
+		mptcp_data_lock(sk);
+		already_retrans = !dfrag->already_sent ||
+				  !before64(msk->snd_una, dfrag->data_seq +
+						dfrag->already_sent);
+		put_page(dfrag->page);
+		if (already_retrans) {
+			__mptcp_clean_una_wakeup(sk);
+			retrans_seq = msk->snd_una;
+			dfrag = mptcp_rtx_head(sk);
+			need_retrans = !!dfrag;
+		} else if (after64(msk->snd_una, retrans_seq)) {
+			retrans_seq = msk->snd_una;
+		}
+		mptcp_data_unlock(sk);
+
+		/* `already_sent` can be 0 for `dfrag` belonging to the RTX
+		 * queue due to __mptcp_retransmit_pending_data().
+		 */
+		if (!dfrag || !dfrag->already_sent)
+			break;
+
+		/* Can fail only in case of fallback. */
+		len = __mptcp_push_retrans(sk, dfrag, retrans_seq);
+		if (len < 0)
+			goto clear_scheduled;
+
+		retrans_seq += len;
+		msk->bytes_retrans += len;
+		dfrag->already_sent = max_t(u16, dfrag->already_sent,
+					    retrans_seq - dfrag->data_seq);
+
+		/* With csum enabled retransmission can send new data. */
+		sent_seq = dfrag->already_sent + dfrag->data_seq;
+		if (after64(sent_seq, msk->snd_nxt))
+			WRITE_ONCE(msk->snd_nxt, sent_seq);
+
+		/* Attempt the next fragment only if the current one is
+		 * completely retransmitted.
+		 */
+		if (before64(retrans_seq, dfrag->data_seq + dfrag->data_len))
+			break;
+
+		dfrag = list_is_last(&dfrag->list, &msk->rtx_queue) ?
+			NULL : list_next_entry(dfrag, list);
+		if (!dfrag)
+			break;
+	}
+
+	/* Attempt data-fin retransmission only when the RTX queue is empty. */
+	if (!need_retrans) {
+check_data_fin:
 		if (mptcp_data_fin_enabled(msk)) {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 
@@ -2911,30 +2984,13 @@ static void __mptcp_retrans(struct sock *sk)
 				   icsk->icsk_retransmits + 1);
 			mptcp_set_datafin_timeout(sk);
 			mptcp_send_ack(msk);
-
 			goto reset_timer;
 		}
 
 		if (!mptcp_send_head(sk))
 			goto clear_scheduled;
-
-		goto reset_timer;
 	}
 
-	if (err)
-		goto reset_timer;
-
-	len = __mptcp_push_retrans(sk, dfrag);
-	if (len < 0)
-		goto clear_scheduled;
-
-	msk->bytes_retrans += len;
-	dfrag->already_sent = max(dfrag->already_sent, len);
-
-	/* With csum enabled retransmission can send new data. */
-	if (after64(dfrag->already_sent + dfrag->data_seq, msk->snd_nxt))
-		WRITE_ONCE(msk->snd_nxt, dfrag->already_sent + dfrag->data_seq);
-
 reset_timer:
 	mptcp_check_and_set_pending(sk);
 
-- 
2.54.0
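The retransmission loop in patch 2/2 depends on wraparound-safe 64-bit sequence comparisons and on a 16-bit offset into the current fragment. The user-space sketch below illustrates both, under stated assumptions: before64()/after64() are re-implemented here assuming they follow the signed-difference comparison of the MPTCP helpers of the same name, while struct frag, frag_already_retrans() and frag_offset() are hypothetical stand-ins for struct mptcp_data_frag and for the inline checks in the patch, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the MPTCP helpers: seq1 precedes seq2 when the signed
 * difference is negative, so the test keeps working across wraparound
 * (relies on two's-complement conversion, as the kernel does). */
static inline bool before64(uint64_t seq1, uint64_t seq2)
{
	return (int64_t)(seq1 - seq2) < 0;
}

/* after64(a, b) is simply before64(b, a). */
static inline bool after64(uint64_t seq1, uint64_t seq2)
{
	return before64(seq2, seq1);
}

/* Hypothetical, reduced stand-in for struct mptcp_data_frag. */
struct frag {
	uint64_t data_seq;	/* MPTCP-level sequence of the first byte */
	uint16_t data_len;	/* bytes stored in the fragment */
	uint16_t already_sent;	/* bytes sent at least once so far */
};

/* Hypothetical helper mirroring the patch's "already_retrans" test: true
 * when the fragment was never sent, or when snd_una already covers every
 * byte sent so far, i.e. when the loop must re-read the RTX head. */
static bool frag_already_retrans(const struct frag *f, uint64_t snd_una)
{
	return !f->already_sent ||
	       !before64(snd_una, f->data_seq + f->already_sent);
}

/* Hypothetical helper: offset inside the fragment from which
 * retransmission resumes. The 16-bit truncation is only valid while
 * retrans_seq lies inside the fragment, which the caller must ensure. */
static uint16_t frag_offset(const struct frag *f, uint64_t retrans_seq)
{
	assert(!before64(retrans_seq, f->data_seq));
	return (uint16_t)(retrans_seq - f->data_seq);
}
```

For example, for a fragment starting at sequence 1000 with 60 bytes already sent, an snd_una of 1030 leaves frag_already_retrans() false and gives an offset of 30, while an snd_una of 1060 makes it true.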
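To illustrate the behavior change described in patch 2/2, here is a deliberately simplified user-space model of one retransmission period. It is a sketch, not the kernel code: struct frag, struct sched, sched_get_retrans(), push_retrans() and retrans_period() are hypothetical, and the model ignores locking, page refcounting, csum mode (where the limit is data_len rather than already_sent), acks arriving while the loop runs, and multiple subflows selected in one pick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct mptcp_data_frag. */
struct frag {
	uint64_t data_seq;
	uint16_t data_len;
	uint16_t already_sent;
};

/* Signed-difference comparison, assumed to match the MPTCP before64(). */
static bool before64(uint64_t seq1, uint64_t seq2)
{
	return (int64_t)(seq1 - seq2) < 0;
}

/* Hypothetical scheduler: selects a subflow `picks` times; each selected
 * subflow can push at most `capacity` bytes (e.g. limited by sndbuf). */
struct sched {
	int picks;
	uint16_t capacity;
};

static bool sched_get_retrans(struct sched *s)
{
	if (s->picks <= 0)
		return false;
	s->picks--;
	return true;
}

/* Resend bytes of @f from @retrans_seq up to the already-sent limit. */
static uint16_t push_retrans(const struct frag *f, uint64_t retrans_seq,
			     uint16_t capacity)
{
	uint16_t offset, avail;

	assert(!before64(retrans_seq, f->data_seq));
	offset = (uint16_t)(retrans_seq - f->data_seq);
	if (offset >= f->already_sent)
		return 0;
	avail = f->already_sent - offset;
	return avail < capacity ? avail : capacity;
}

/* One retransmission period over the RTX queue @q, where q[0] must hold
 * @snd_una. Returns the number of bytes retransmitted. */
static uint64_t retrans_period(const struct frag *q, size_t n,
			       uint64_t snd_una, struct sched *s)
{
	uint64_t retrans_seq = snd_una;
	uint64_t total = 0;
	size_t i = 0;

	while (i < n) {
		const struct frag *f = &q[i];
		uint16_t len;

		/* Keep going only while the scheduler selects a subflow. */
		if (!sched_get_retrans(s))
			break;

		/* A fragment never sent is not a retransmission candidate. */
		if (!f->already_sent)
			break;

		len = push_retrans(f, retrans_seq, s->capacity);
		retrans_seq += len;
		total += len;

		/* Advance only once the whole fragment has been resent. */
		if (before64(retrans_seq, f->data_seq + f->data_len))
			break;
		i++;
	}
	return total;
}
```

With a single scheduler pick the model resends only the head fragment, which corresponds to the old one-dfrag-per-timeout behavior; with more picks it walks the queue and stops after the first fragment it cannot resend completely, mirroring the `before64(retrans_seq, dfrag->data_seq + dfrag->data_len)` check in the patch.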