From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFC59371EB2 for ; Tue, 16 Sep 2025 16:27:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040054; cv=none; b=dHB6g5KOK4yc7XSEIPREAhGbh/WcPA0zZ14Zgv3vr7hk6tVsDEVtEFQwa2LikOzbAivSsS9vEYNcSB7Rhzl4kwEbytYXOspXnS9AKzUv89AqqMfKM2RxbHjzlFPtpwTjJdspYytTjicbXtI6ac61e7oh9VrPgw+LFrOm8o1OBNY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040054; c=relaxed/simple; bh=2Z2yknyfCbepGGBX5iwovAXZ9csRYcEcN+ucgXLdVsI=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=iOkhUdJBcYhDo40Ft9q8Go1VVt1dNih68l04dBnJ+wvtQxv8rG3LNHS9jh7MLzlNk3UNoMupO6E36NLa0pzvYhyNBOffmsbfB4Qd3bENia+LTwVV5nmn2IakzKzW4W2LwUMANDo4JpHiDhK156zWmYKgdNmdmNOAj6YOQ6Ph9LU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=crpsvyKt; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="crpsvyKt" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040051; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: 
content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6C79Vquxdp59KsW/DAVb9ZIpmsK+9KCNv3rVcxOyLfg=; b=crpsvyKtU/BRtOAoZteIXfAbg/szFIEnY8yPyMdK1TVZq30pdc5yJIRcBc68lX4tccE81V YvapjXY1qlbjrMXh++bdOCxcfSUI2b4PS92M91x5gGaCF96e+x9WCyD5mL0dUbtNSmwgqQ Qyof6iQe/rtu89ODltGATovUQOLvBfA= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-675-Ie-edlZQPgas1Gkgv7KVHw-1; Tue, 16 Sep 2025 12:27:30 -0400 X-MC-Unique: Ie-edlZQPgas1Gkgv7KVHw-1 X-Mimecast-MFC-AGG-ID: Ie-edlZQPgas1Gkgv7KVHw_1758040049 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 552271800576 for ; Tue, 16 Sep 2025 16:27:29 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 734DB19560B8 for ; Tue, 16 Sep 2025 16:27:28 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 01/12] mptcp: leverage skb deferral free Date: Tue, 16 Sep 2025 18:27:11 +0200 Message-ID: <91fd56b60fb3766c3568013a123baf1968985869.1758039775.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: xb8Li5uFQxZJu9k5zLPFZ1GguuEDKDBA6ICu8EsUjwA_1758040049 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: 
text/plain; charset="utf-8"; x-default="true" Usage of the skb deferral API is straightforward; with multiple active subflows this allows moving part of the received application load onto multiple CPUs. Also fix a typo in the related comment. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f2e7282394804..c51aede20779d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1913,12 +1913,13 @@ static int __mptcp_recvmsg_mskq(struct sock *sk, } =20 if (!(flags & MSG_PEEK)) { - /* avoid the indirect call, we know the destructor is sock_wfree */ + /* avoid the indirect call, we know the destructor is sock_rfree */ skb->destructor =3D NULL; + skb->sk =3D NULL; atomic_sub(skb->truesize, &sk->sk_rmem_alloc); sk_mem_uncharge(sk, skb->truesize); __skb_unlink(skb, &sk->sk_receive_queue); - __kfree_skb(skb); + skb_attempt_defer_free(skb); msk->bytes_consumed +=3D count; } =20 --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 74740370598 for ; Tue, 16 Sep 2025 16:27:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040056; cv=none; b=ug5WlzGmt2I5VGJqCoD6sM/+G35XY6RxRyUUCnYJJw7OAPuK7PRuWHJUB/xfxEjNK54giwvyBfZ/Qurpt+54pIlEimML8zP3qK9UE9a/mxpvlo9Huq9nIHioktepDlKYJ2HMshGENXUHicQ3BKXdoVTABYa8J6EeKUz4PE2buGg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040056; c=relaxed/simple; bh=gyssziIGGEdsOXefNv/wfttrrrysWJBAmDQrvxcN8Q4=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; 
b=f/sDt/AGAgdjvCl13YoYx3tWBVsy/jRzKb42C6/xon6uF3Z35VTNNFlPSJnyaD4HMs/S4mQEiKmpzPVMh4RjKpP2HXlwGq1wYnAGREXfj/UlUSUL7AMx/wqrgbyxr0uRRoQVBz6Z6MyDv+AFyG3fnYp8aRCjkaGqUDLyEwJuAKU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=inZuKTvu; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="inZuKTvu" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040053; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=mODELDa4hnLanuBFOLGRgzvjm8PQk5jvijnPBkcUn78=; b=inZuKTvubUCpkEB0wPkw57dD92MByoK/GI2M74z7vgIrtEhSbAWNeanoAtC/qbrtOTPjYo O5nM+HGP7+bXuhZlG9bvUox0ONjvh7egFqkOarx62fMRhLH+5P4oA/4LMn2WugjYnA4pAR JpokzlZhlh5fdWS0KpjkzrVMiIULAf8= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-695-OS3swH79MieErlguErFc6Q-1; Tue, 16 Sep 2025 12:27:31 -0400 X-MC-Unique: OS3swH79MieErlguErFc6Q-1 X-Mimecast-MFC-AGG-ID: OS3swH79MieErlguErFc6Q_1758040050 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 
bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C4B96195609F for ; Tue, 16 Sep 2025 16:27:30 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id ECFC519560B8 for ; Tue, 16 Sep 2025 16:27:29 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 02/12] tcp: make tcp_rcvbuf_grow() accessible to mptcp code Date: Tue, 16 Sep 2025 18:27:12 +0200 Message-ID: <53625108d8a976163497ad5cd44cce2eebab119c.1758039775.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: -Lhh2ZyPdBgv2NmVJ1VmULRVy0bdpSTVhvv8SS63weo_1758040050 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" To leverage the auto-tuning improvements brought by commit 2da35e4b4df9 ("Merge branch 'tcp-receive-side-improvements'"), the MPTCP stack needs to access the mentioned helper. 
Signed-off-by: Paolo Abeni --- include/net/tcp.h | 1 + net/ipv4/tcp_input.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 2936b8175950f..8298029f2d0f7 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -346,6 +346,7 @@ void tcp_delack_timer_handler(struct sock *sk); int tcp_ioctl(struct sock *sk, int cmd, int *karg); enum skb_drop_reason tcp_rcv_state_process(struct sock *sk, struct sk_buff= *skb); void tcp_rcv_established(struct sock *sk, struct sk_buff *skb); +void tcp_rcvbuf_grow(struct sock *sk); void tcp_rcv_space_adjust(struct sock *sk); int tcp_twsk_unique(struct sock *sk, struct sock *sktw, void *twp); void tcp_twsk_destructor(struct sock *sk); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a52a747d8a55e..f9f5705390430 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -744,7 +744,7 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *= sk, } } =20 -static void tcp_rcvbuf_grow(struct sock *sk) +void tcp_rcvbuf_grow(struct sock *sk) { const struct net *net =3D sock_net(sk); struct tcp_sock *tp =3D tcp_sk(sk); --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98078371EB2 for ; Tue, 16 Sep 2025 16:27:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040057; cv=none; b=Y2j0Xw5vWl+ghL06m1bUG0B4+75HmkE/JnvoddJrxFVwJZdmrLR4XyJ+ics7osbCKj0svuHg7HjLGXPm83rWTlnrQilm5pJ/w842WD7U57Hb1APsEhhZcLsD96wPcreEzMVOxDfN0s6UXVgjwRSiXE6hYlESwl6SdVegpIXNClI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040057; c=relaxed/simple; 
bh=Q2INywZt/aKkUcdXYBYNLzbqVCUMIuLt/RDMp8uvix8=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=VceK8UyhxVXdzAWqwK8dOfZCyvYp4QyvyDCJY8jPyQi5krqCgGzsF8ner7gM50MKO2jNMIMFkXcFGmghxKYZ+XArSV9PCxd0AY9ybxm0SwHeokcOKUlQRXhfod77tFzbhBjfOEw8zWsAZ6iwUMW1si8GhAP3qaSrgAeUUlL4C1k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=CHdhH89C; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="CHdhH89C" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040054; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=leevm05Lx0IpAHQgB2Gr714aqU5lgDMSwqYLJ/kqFVo=; b=CHdhH89CVZ+CSljGTTdbnB1Fl5FFm+luDBPQETSM6Wvp0fiUvhIg5HkfUIV48vS/qVv60b wUoKaFycpNnzQgjVApG8NK80ycgB6LoAoa/myNRw4g7e3d6iIlLRSVUQ0W4H56qjPUAtwf 135dTD6C7sACFej7jeb+RhetHaZ/KG4= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-646-SYMyiZGQO5Kj-8fU9YnHKw-1; Tue, 16 Sep 2025 12:27:33 -0400 X-MC-Unique: SYMyiZGQO5Kj-8fU9YnHKw-1 X-Mimecast-MFC-AGG-ID: SYMyiZGQO5Kj-8fU9YnHKw_1758040052 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com 
[10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3F0741956048 for ; Tue, 16 Sep 2025 16:27:32 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 6759419560B8 for ; Tue, 16 Sep 2025 16:27:31 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 03/12] mptcp: rcvbuf auto-tuning improvement Date: Tue, 16 Sep 2025 18:27:13 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: LOuEttaC3IEU5oK6jgmuoo_WoGwvv-l_GWvUrZGIyLY_1758040052 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" Apply to the MPTCP auto-tuning the same improvements introduced for the TCP protocol by the merge commit 2da35e4b4df9 ("Merge branch 'tcp-receive-side-improvements'"). The main difference is that the TCP subflows and the main MPTCP socket need to account separately for OoO: MPTCP does not care about TCP-level OoO and vice versa. The above additionally allows dropping the msk receive buffer update at receive time, as the latter was only intended to cope with subflow receive buffer increases due to OoO packets. 
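The sizing rule that this patch adds in mptcp_rcvbuf_grow() can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct rcv_model and both model_* functions are hypothetical names, the window-to-space conversion is simplified to a plain doubling (the kernel helper mptcp_space_from_win() is not reproduced here), and the sysctl_tcp_moderate_rcvbuf and SOCK_RCVBUF_LOCK checks are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the msk fields the patch reads. */
struct rcv_model {
	uint32_t space;    /* bytes copied in the last measurement window */
	uint64_t ack_seq;  /* next in-order data sequence expected */
	uint64_t ooo_end;  /* end_seq of the last OoO skb, 0 if queue is empty */
	uint32_t rcvbuf;   /* current receive buffer size */
	uint32_t rmem_max; /* models the sysctl_tcp_rmem[2] upper bound */
};

/* Assumption: buffer space is simply twice the window. */
static uint64_t model_space_from_win(uint64_t win)
{
	return win * 2;
}

/* Returns true only when the buffer actually grew. */
static bool model_rcvbuf_grow(struct rcv_model *m)
{
	uint64_t rcvwin, rcvbuf;

	assert(m->rmem_max > 0);

	/* twice the data consumed in the last window ... */
	rcvwin = (uint64_t)m->space << 1;

	/* ... plus the span covered by MPTCP-level out-of-order data */
	if (m->ooo_end > m->ack_seq)
		rcvwin += m->ooo_end - m->ack_seq;

	/* convert to buffer space and clamp to the configured maximum */
	rcvbuf = model_space_from_win(rcvwin);
	if (rcvbuf > m->rmem_max)
		rcvbuf = m->rmem_max;

	/* the buffer never shrinks */
	if (rcvbuf > m->rcvbuf) {
		m->rcvbuf = (uint32_t)rcvbuf;
		return true;
	}
	return false;
}
```

The boolean return value mirrors the patch: the caller propagates the new measurement to the subflows only when the msk-level buffer grew.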
Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 95 +++++++++++++++++++++----------------------- net/mptcp/protocol.h | 4 +- 2 files changed, 47 insertions(+), 52 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index c51aede20779d..671c51cb9539c 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -178,6 +178,33 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *= msk, struct sk_buff *to, return mptcp_try_coalesce((struct sock *)msk, to, from); } =20 +static bool mptcp_rcvbuf_grow(struct sock *sk) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + int rcvwin, rcvbuf; + + if (!READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) || + (sk->sk_userlocks & SOCK_RCVBUF_LOCK)) + return false; + + rcvwin =3D ((u64)msk->rcvq_space.space << 1); + + if (!RB_EMPTY_ROOT(&msk->out_of_order_queue)) + rcvwin +=3D MPTCP_SKB_CB(msk->ooo_last_skb)->end_seq - msk->ack_seq; + + rcvbuf =3D min_t(u64, mptcp_space_from_win(sk, rcvwin), + READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); + + if (rcvbuf > sk->sk_rcvbuf) { + u32 window_clamp; + + window_clamp =3D mptcp_win_from_space(sk, rcvbuf); + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); + return true; + } + return false; +} + /* "inspired" by tcp_data_queue_ofo(), main differences: * - use mptcp seqs * - don't cope with sacks @@ -291,6 +318,9 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk= , struct sk_buff *skb) end: skb_condense(skb); skb_set_owner_r(skb, sk); + /* do not grow rcvbuf for not-yet-accepted or orphaned sockets. 
*/ + if (sk->sk_socket) + mptcp_rcvbuf_grow(sk); } =20 static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, @@ -770,18 +800,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, = struct sock *ssk) return moved; } =20 -static void __mptcp_rcvbuf_update(struct sock *sk, struct sock *ssk) -{ - if (unlikely(ssk->sk_rcvbuf > sk->sk_rcvbuf)) - WRITE_ONCE(sk->sk_rcvbuf, ssk->sk_rcvbuf); -} - static void __mptcp_data_ready(struct sock *sk, struct sock *ssk) { struct mptcp_sock *msk =3D mptcp_sk(sk); =20 - __mptcp_rcvbuf_update(sk, ssk); - /* Wake-up the reader only for in-sequence data */ if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) sk->sk_data_ready(sk); @@ -1984,48 +2006,26 @@ static void mptcp_rcv_space_adjust(struct mptcp_soc= k *msk, int copied) if (msk->rcvq_space.copied <=3D msk->rcvq_space.space) goto new_measure; =20 - if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && - !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { - u64 rcvwin, grow; - int rcvbuf; - - rcvwin =3D ((u64)msk->rcvq_space.copied << 1) + 16 * advmss; - - grow =3D rcvwin * (msk->rcvq_space.copied - msk->rcvq_space.space); - - do_div(grow, msk->rcvq_space.space); - rcvwin +=3D (grow << 1); - - rcvbuf =3D min_t(u64, mptcp_space_from_win(sk, rcvwin), - READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); - - if (rcvbuf > sk->sk_rcvbuf) { - u32 window_clamp; - - window_clamp =3D mptcp_win_from_space(sk, rcvbuf); - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); + msk->rcvq_space.space =3D msk->rcvq_space.copied; + if (mptcp_rcvbuf_grow(sk)) { =20 - /* Make subflows follow along. If we do not do this, we - * get drops at subflow level if skbs can't be moved to - * the mptcp rx queue fast enough (announced rcv_win can - * exceed ssk->sk_rcvbuf). - */ - mptcp_for_each_subflow(msk, subflow) { - struct sock *ssk; - bool slow; + /* Make subflows follow along. 
If we do not do this, we + * get drops at subflow level if skbs can't be moved to + * the mptcp rx queue fast enough (announced rcv_win can + * exceed ssk->sk_rcvbuf). + */ + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk; + bool slow; =20 - ssk =3D mptcp_subflow_tcp_sock(subflow); - slow =3D lock_sock_fast(ssk); - WRITE_ONCE(ssk->sk_rcvbuf, rcvbuf); - WRITE_ONCE(tcp_sk(ssk)->window_clamp, window_clamp); - if (tcp_can_send_ack(ssk)) - tcp_cleanup_rbuf(ssk, 1); - unlock_sock_fast(ssk, slow); - } + ssk =3D mptcp_subflow_tcp_sock(subflow); + slow =3D lock_sock_fast(ssk); + tcp_sk(ssk)->rcvq_space.space =3D msk->rcvq_space.copied; + tcp_rcvbuf_grow(ssk); + unlock_sock_fast(ssk, slow); } } =20 - msk->rcvq_space.space =3D msk->rcvq_space.copied; new_measure: msk->rcvq_space.copied =3D 0; msk->rcvq_space.time =3D mstamp; @@ -2054,11 +2054,6 @@ static bool __mptcp_move_skbs(struct sock *sk) if (list_empty(&msk->conn_list)) return false; =20 - /* verify we can move any data from the subflow, eventually updating */ - if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) - mptcp_for_each_subflow(msk, subflow) - __mptcp_rcvbuf_update(sk, subflow->tcp_sock); - subflow =3D list_first_entry(&msk->conn_list, struct mptcp_subflow_context, node); for (;;) { diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a1787a1344ac1..128baea5b496e 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -341,8 +341,8 @@ struct mptcp_sock { struct mptcp_pm_data pm; struct mptcp_sched_ops *sched; struct { - u32 space; /* bytes copied in last measurement window */ - u32 copied; /* bytes copied in this measurement window */ + int space; /* bytes copied in last measurement window */ + int copied; /* bytes copied in this measurement window */ u64 time; /* start time of measurement window */ u64 rtt_us; /* last maximum rtt of subflows */ } rcvq_space; --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com 
(us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C46F93728BB for ; Tue, 16 Sep 2025 16:27:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040058; cv=none; b=CIamzZ07x/LQQ8ikO2KNGPp52AC5wWI2ikrWFHMmFhIQbWqBQcwyc5I3EQWxdmKPm0e+I5D7hBjpE+p/D3KfFVHlri0lrsYqgk7FfV5ACeDT1duPIqb6oRsSht51veJ5+pUpQj3vxoukXUv+tfUPATSL0rrxx31M3freb8bO6jw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040058; c=relaxed/simple; bh=0yAQeNL9S7huaiEkZqoNfIS8+QbBeUYeY9R+rGC/OE4=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=o/c3YItFsc5AXZ0QcNty1oVDGiECou6zMpkfdiwMSBkG+b1QMeeb9/DmHtw7qCXm2ssd+f0Rd99E6UegMN0mHowSONWOC5G8KxW8Ev0tEdL6yftDqiGDz+ZFGYKsUUBTkC/9IQNVxtO2Hdfrut61X2kzjF0xq37ejzX/+FMkGpA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=EI3QXGrh; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EI3QXGrh" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040055; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=2MwOWkdJt7xZFcquPO37uZ9jZzo6PlW4CuZ15oYasL8=; b=EI3QXGrh6ioAxqTqSXoIB9qc+RsHQGB/eCdAtOefCY6HpZEBpDC0iTJD8mepWz+rsf4e68 0y01Zp8skEms0RkMlRY0wMTTG3PMW5duXGnlSZ+6JI6wtrB9G2LnYiMUZHEhDMseJlJuSv BXiO8UshgU3+GzEVedqYJwl3gBvNUkA= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-690-dUL_hv9ZNHSLwG8zRcwqBg-1; Tue, 16 Sep 2025 12:27:34 -0400 X-MC-Unique: dUL_hv9ZNHSLwG8zRcwqBg-1 X-Mimecast-MFC-AGG-ID: dUL_hv9ZNHSLwG8zRcwqBg_1758040053 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 981091800293 for ; Tue, 16 Sep 2025 16:27:33 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id BC3EF19560B8 for ; Tue, 16 Sep 2025 16:27:32 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 04/12] mptcp: introduce the mptcp_init_skb helper. Date: Tue, 16 Sep 2025 18:27:14 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: QZucPffE-cTZHOUH_cJOCBR9EM4NtRYVIDSu3XPikpE_1758040053 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" Factor out all the skb initialization steps into a new helper and use it. 
Note that this change moves the MPTCP CB initialization earlier: we can perform this step as soon as the skb leaves the subflow socket receive queues. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 46 ++++++++++++++++++++++++-------------------- 1 file changed, 25 insertions(+), 21 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 671c51cb9539c..879157a1f4fb1 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -323,27 +323,11 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *m= sk, struct sk_buff *skb) mptcp_rcvbuf_grow(sk); } =20 -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, - struct sk_buff *skb, unsigned int offset, - size_t copy_len) +static void mptcp_init_skb(struct sock *ssk, + struct mptcp_subflow_context *subflow, + struct sk_buff *skb, int offset, int copy_len) { - struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); - struct sock *sk =3D (struct sock *)msk; - struct sk_buff *tail; - bool has_rxtstamp; - - __skb_unlink(skb, &ssk->sk_receive_queue); - - skb_ext_reset(skb); - skb_orphan(skb); - - /* try to fetch required memory from subflow */ - if (!sk_rmem_schedule(sk, skb, skb->truesize)) { - MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED); - goto drop; - } - - has_rxtstamp =3D TCP_SKB_CB(skb)->has_rxtstamp; + bool has_rxtstamp =3D TCP_SKB_CB(skb)->has_rxtstamp; =20 /* the skb map_seq accounts for the skb offset: * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq @@ -355,6 +339,24 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, s= truct sock *ssk, MPTCP_SKB_CB(skb)->has_rxtstamp =3D has_rxtstamp; MPTCP_SKB_CB(skb)->cant_coalesce =3D 0; =20 + __skb_unlink(skb, &ssk->sk_receive_queue); + + skb_ext_reset(skb); + skb_dst_drop(skb); +} + +static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sk_buff *skb) +{ + u64 copy_len =3D MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq; + struct sock *sk =3D (struct sock *)msk; + struct 
sk_buff *tail; + + /* try to fetch required memory from subflow */ + if (!sk_rmem_schedule(sk, skb, skb->truesize)) { + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED); + goto drop; + } + if (MPTCP_SKB_CB(skb)->map_seq =3D=3D msk->ack_seq) { /* in sequence */ msk->bytes_received +=3D copy_len; @@ -662,7 +664,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp= _sock *msk, if (offset < skb->len) { size_t len =3D skb->len - offset; =20 - ret =3D __mptcp_move_skb(msk, ssk, skb, offset, len) || ret; + mptcp_init_skb(ssk, subflow, skb, offset, len); + skb_orphan(skb); + ret =3D __mptcp_move_skb(msk, skb) || ret; seq +=3D len; =20 if (unlikely(map_remaining < len)) { --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 083EF3629AC for ; Tue, 16 Sep 2025 16:27:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040060; cv=none; b=oIHbYGimfGwHTBMKalmUOg/lvD2VwoTDRzaxn8XLZz2vre7w5nQcuU5sZiONDjvnqBZmLFWM1FnZ/AinqqIWDxzBPyIgphfv241h86H9iUbEgxq0ElQ7CyrThn6JHToAdjSDbAI8NQZU8k51K5OCUkXopIfFPG7+rSVLjztGriM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040060; c=relaxed/simple; bh=UrN2bCtaGuZZZlsrb7Sues0FQIRls8b3Id+lIBxhb5s=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=Tom6/qEFpO2iMeCmvSgnAphdP0d0IV3jYE6qJsx1jBd/bg2ss48xBYxq4c1AwOXYFnclU5GBWv1BfrhIdUhJj/RaYD7TNak9yCE3rSGKDWffzfPXPzcnmwhgJQPwdOf10Ksn9TRRwnrT/16Zw4mycna3/ZLnn8oXRI9u+eSzr4Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit 
key) header.d=redhat.com header.i=@redhat.com header.b=YO8/AQ3J; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YO8/AQ3J" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hmPrS0XVlJrinw2xqry3vXCE+9sjdJPdx2HjxpLbHHQ=; b=YO8/AQ3JFH/3wLQf0QY3+DgOFIqZrcgreGXUKJWeVd1frVESf35vIvgl4X7BCOmfry0PyD 3vOhLxP5ZeWXWjB/Q85TRWtnLpMfkaSP0nJZ1dCbJe3h1Lo/imxVs/bwL93wC3+ON8JM6u jFOdl9iz6rThg8SOipA/4/DVQhCuCvY= Received: from mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-94-rDTxUEbNMJydj86wqC_C5Q-1; Tue, 16 Sep 2025 12:27:35 -0400 X-MC-Unique: rDTxUEbNMJydj86wqC_C5Q-1 X-Mimecast-MFC-AGG-ID: rDTxUEbNMJydj86wqC_C5Q_1758040055 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-01.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 02ABE1956096 for ; Tue, 16 Sep 2025 16:27:35 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 20FD019560B8 
for ; Tue, 16 Sep 2025 16:27:33 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 05/12] mptcp: remove unneeded mptcp_move_skb() Date: Tue, 16 Sep 2025 18:27:15 +0200 Message-ID: <5df965245b1785907e6f54c6ea9b63b74dde9d0e.1758039775.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: XoOalvxalZJGTFF3dDGEhhP-1aCHnrpGWv1iDWNwWtQ_1758040055 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" Since commit b7535cfed223 ("mptcp: drop legacy code around RX EOF"), sk_shutdown can't change during the main recvmsg loop, so we can drop the related race breaker. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 879157a1f4fb1..58744823effc8 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2173,14 +2173,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msg= hdr *msg, size_t len, break; } =20 - if (sk->sk_shutdown & RCV_SHUTDOWN) { - /* race breaker: the shutdown could be after the - * previous receive queue check - */ - if (__mptcp_move_skbs(sk)) - continue; + if (sk->sk_shutdown & RCV_SHUTDOWN) break; - } =20 if (sk->sk_state =3D=3D TCP_CLOSE) { copied =3D -ENOTCONN; --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 69868374279 for ; Tue, 16 Sep 2025 16:27:39 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: 
i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040060; cv=none; b=qCV42pJQN7AjTYI6PyZ3hiISLFuixz8oorZNSeJbvn0hADN5GWmtZ7ZXSWIGKV3/Vf5qwW9xP0spjpB1x02jZuQAueviJYxOyMuUE53EoYjfJyC36QBwefsxKZEsigQQ36zQqbm+GkWanzvD1TpF4ukHhw3BHrA0TG36WUJKiZ8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040060; c=relaxed/simple; bh=+AEhzCnelXlzSTvMeEtRWt5Ke4jhJX/pG4yLqdJfx9U=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=hvPXD1b+o9iKlKb0iBfxi4kJf+sFvhm8j6LL9JgXmWI9dpRdfcgZK5KUC65phCNZCzDGxA4kmSlGhhsI1toxIFfKNBv7nXkHsarZKA7tlf6cmJasJH0c3ePmTqUigFkT+IXYq63yTBAS09Tm+88FFFN7sRWLdaa7ThSkVmpGt4k= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=aVOdP7f9; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aVOdP7f9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040058; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9BgdZXOV9jOuHR2lFR2miw2wuXgEo9Gl3x3DleKQNXg=; b=aVOdP7f9IPc5cu/I0RwXC6Iemgr/uSNG93dxTCr6umwzD4U5g4nm1vinjwph9c0TPetcI6 14bXp5bhdah4Oh8x+2kw5HEyPn3H3l2xTJAzdQuo5AO8tLNE1b/AvjmM3IdIkLiuUKDxDf XUNrZNVzd3PE2EpOl6AUZ89Hgipee8c= Received: from mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by 
relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-170-cKZtV1mMP2S33Jghpt58qg-1; Tue, 16 Sep 2025 12:27:37 -0400 X-MC-Unique: cKZtV1mMP2S33Jghpt58qg-1 X-Mimecast-MFC-AGG-ID: cKZtV1mMP2S33Jghpt58qg_1758040056 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-02.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 5C07B195609F for ; Tue, 16 Sep 2025 16:27:36 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 7F7A019560B8 for ; Tue, 16 Sep 2025 16:27:35 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 06/12] mptcp: factor out a basic skb coalesce helper Date: Tue, 16 Sep 2025 18:27:16 +0200 Message-ID: <11c0bc6ea9c43b0da423bf5abe44834a887fd2d7.1758039775.git.pabeni@redhat.com> In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 2yvrMSj1U9WXn6Ao96lUnilH7z4ynOrQDsP8X9-mtTA_1758040056 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" The upcoming patch will introduce backlog processing for MPTCP sockets, and we want to leverage coalescing in that data path. Factor out the relevant bits that do not touch memory accounting to cover that use case. 
Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 58744823effc8..e904571bb94e6 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -141,22 +141,34 @@ static void mptcp_drop(struct sock *sk, struct sk_buf= f *skb) __kfree_skb(skb); } =20 -static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, - struct sk_buff *from) +static int __mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, + struct sk_buff *from, bool *fragstolen) { - bool fragstolen; + int limit =3D READ_ONCE(sk->sk_rcvbuf); int delta; =20 if (unlikely(MPTCP_SKB_CB(to)->cant_coalesce) || MPTCP_SKB_CB(from)->offset || - ((to->len + from->len) > (sk->sk_rcvbuf >> 3)) || - !skb_try_coalesce(to, from, &fragstolen, &delta)) - return false; + ((to->len + from->len) > (limit >> 3)) || + !skb_try_coalesce(to, from, fragstolen, &delta)) + return 0; =20 pr_debug("colesced seq %llx into %llx new len %d new end seq %llx\n", MPTCP_SKB_CB(from)->map_seq, MPTCP_SKB_CB(to)->map_seq, to->len, MPTCP_SKB_CB(from)->end_seq); MPTCP_SKB_CB(to)->end_seq =3D MPTCP_SKB_CB(from)->end_seq; + return delta; +} + +static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, + struct sk_buff *from) +{ + bool fragstolen; + int delta; + + delta =3D __mptcp_try_coalesce(sk, to, from, &fragstolen); + if (!delta) + return false; =20 /* note the fwd memory can reach a negative value after accounting * for the delta, but the later skb free will restore a non --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0E41937288F for ; Tue, 16 Sep 2025 16:27:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; 
arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040062; cv=none; b=X7JcFLeAItLB0mq88FThgRU0eabW0B8IBDrRlFby56PAjYcPBxUQ3/tSUhI2aM3SmAmoPg/vhQf2D5CmwCwkW9krqj4fxU7Y5Xn7iRsgQqnEVfujhgI50Fa9+eDEeBtNIRxrRZbslJE1vfUfwsFsKGN7uqmU5B2TAzIIdVSWMBw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040062; c=relaxed/simple; bh=PJtgScV8dzM5VgAM+a+8gNGgfjKjAFecr98H5bv0vH0=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=q9WEf2Ert8dAWGTX1ePIyxdR1qBf9S1QkYbPfIMrlRIVaGhPonycKBpoxugeO5FuLVZ88y+RsjO98/gAUL6BbWh0qqDbp6E75mWzXAog2tqzxfS1Pl65vqWk6geE4CDir6qwCXbNgubfW6HnHDnTj6GY7RsH3hpvp8cuu0RavFQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=YHj2JLuL; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="YHj2JLuL" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040060; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=5o3xrUX/cHr+FIAFtYZDDKD6Fq96KT2JfZrc8infc3U=; b=YHj2JLuL0LhvtMMqICf7WsBibNDByLQATxy910pL97LK/bZY2OkrlvgwQsf4rSiRNHg0a+ iuvU/59fhXwf1mk2LtdZCtCAgLu6Myky83D5AFJG5/1fR298sOIPDvlqz9WiOYK7/nkntF 2SvNI6f0KxMUsn1Z/4dvDeFN+LLkOcM= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com 
(ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-288-vIZRQ4EtPMKSJWkUGa7yQQ-1; Tue, 16 Sep 2025 12:27:38 -0400 X-MC-Unique: vIZRQ4EtPMKSJWkUGa7yQQ-1 X-Mimecast-MFC-AGG-ID: vIZRQ4EtPMKSJWkUGa7yQQ_1758040058 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id DF7FE1800578 for ; Tue, 16 Sep 2025 16:27:37 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 0829319560B8 for ; Tue, 16 Sep 2025 16:27:36 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 07/12] mptcp: minor move_skbs_to_msk() cleanup Date: Tue, 16 Sep 2025 18:27:17 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 0BNPMNDUVDA1yJRNvFP4Uxu4p9p657yDXlq8ttwvWaY_1758040058 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" move_skbs_to_msk() is called only by __mptcp_data_ready(), which in turn is always invoked when the msk is not owned by the user: we can drop the redundant, related check. Additionally, MPTCP needs to propagate the socket error only for the current subflow. 
Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index e904571bb94e6..251760183118a 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -799,12 +799,8 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, s= truct sock *ssk) =20 moved =3D __mptcp_move_skbs_from_subflow(msk, ssk); __mptcp_ofo_queue(msk); - if (unlikely(ssk->sk_err)) { - if (!sock_owned_by_user(sk)) - __mptcp_error_report(sk); - else - __set_bit(MPTCP_ERROR_REPORT, &msk->cb_flags); - } + if (unlikely(ssk->sk_err)) + __mptcp_subflow_error_report(sk, ssk); =20 /* If the moves have caught up with the DATA_FIN sequence number * it's time to ack the DATA_FIN and change socket state, but --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 76ACF374279 for ; Tue, 16 Sep 2025 16:27:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040064; cv=none; b=PJCzD705rd9jHImzdqSJuhh45Q4A3DIOgSEH3iWCK06tJ0FCdDy+dbr34yHlE0xJNqYpeYfKi1sNxPa0H2TrcL2VNtL1RkHJOPPYiux+4rz5iiLZu6/xqZLjIvBzR54Zj9wn8fokooEMmpKvxtuDZnoFWLPDHGekppJtr8EEXN4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040064; c=relaxed/simple; bh=BIJ8IVHYsAVzv8+xsF+ojFBqIaY5kmu6LjJkLZMWlvw=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=NWYJDrew70KEkB4psNgKZGlT+VhizjB9ZIzJ2ksilsN0WqKNwFBgqzgcjFubGM0r8TKNGEiLwcRJ5yybPt5d1Si/8Qh/PkAFNkM8fRFJ1IYJA8urNFJ855EcPH+mggCOjbPIwdS4n6gwJ8h0aAj8dM6sNVDSvLca6z0HiTfKOdg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=DZ7G5r4U; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="DZ7G5r4U" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040061; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=wDUcih6KYkyyo6nbKLPecOvpIZ7zrypSdMPS6+Z6AbU=; b=DZ7G5r4UVaYSIPlnVqdNOnyrs2+TvBkqEGh/n37AG47QDX525Ad9Uta2eu5io4x/ojIWu7 NkZzs8rh5GHuG1hF35hE7ptbYJRuoQI3Nd+kn41DJC2uv7+Rkwl8JevHLXgXqexXgpud5A h8tvKWxWVDLZWoJxG8LKw42yRUvelsg= Received: from mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-414-xr8e80JUOjyIHYc4pGf_7g-1; Tue, 16 Sep 2025 12:27:40 -0400 X-MC-Unique: xr8e80JUOjyIHYc4pGf_7g-1 X-Mimecast-MFC-AGG-ID: xr8e80JUOjyIHYc4pGf_7g_1758040059 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-08.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 41D60180034D for ; Tue, 16 Sep 2025 16:27:39 +0000 (UTC) Received: from gerbillo.redhat.com 
(unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 6BE2E19560B8 for ; Tue, 16 Sep 2025 16:27:38 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 08/12] mptcp: cleanup fallback data fin reception Date: Tue, 16 Sep 2025 18:27:18 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: b8UEdpYEDbHChGYHFiGeEEjZgeXwA-jX8_X6LwLxXcI_1758040059 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" MPTCP currently generates a dummy data_fin for fallback sockets when the fallback subflow has completed data reception, using the current ack_seq. We are going to introduce backlog usage for the msk soon, even for fallback sockets: the above condition will no longer be correct, as it will ignore the data_seq sitting in the backlog. Instead, generate the dummy data_fin when the last data packet is extracted from the fallback subflow. The scenario with a fallback socket receiving a reset while the receive queue is empty is caught via the generic 'all subflows closed' timeout: ensure such timeout is zero for fallback sockets. 
Signed-off-by: Paolo Abeni --- net/mptcp/ctrl.c | 2 ++ net/mptcp/subflow.c | 16 ++++++++-------- 2 files changed, 10 insertions(+), 8 deletions(-) diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c index fed40dae5583a..4f7795968abe2 100644 --- a/net/mptcp/ctrl.c +++ b/net/mptcp/ctrl.c @@ -74,6 +74,8 @@ unsigned int mptcp_stale_loss_cnt(const struct net *net) =20 unsigned int mptcp_close_timeout(const struct sock *sk) { + if (__mptcp_check_fallback(mptcp_sk(sk))) + return 0; if (sock_flag(sk, SOCK_DEAD)) return TCP_TIMEWAIT_LEN; return mptcp_get_pernet(sock_net(sk))->close_timeout; diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index c8a7e4b59db11..5339a00528a7a 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1293,14 +1293,6 @@ static void subflow_sched_work_if_closed(struct mptc= p_sock *msk, struct sock *ss =20 if (!test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags)) mptcp_schedule_work(sk); - - /* when the fallback subflow closes the rx side, trigger a 'dummy' - * ingress data fin, so that the msk state will follow along - */ - if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) && - msk->first =3D=3D ssk && - mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true)) - mptcp_schedule_work(sk); } =20 static bool mptcp_subflow_fail(struct mptcp_sock *msk, struct sock *ssk) @@ -1433,6 +1425,14 @@ static bool subflow_check_data_avail(struct sock *ss= k) subflow->map_data_len =3D skb->len; subflow->map_subflow_seq =3D tcp_sk(ssk)->copied_seq - subflow->ssn_offse= t; WRITE_ONCE(subflow->data_avail, true); + + /* last skb in closed fallback subflow: we are at data fin */ + if (subflow_is_done(ssk) && ssk =3D=3D msk->first && + skb =3D=3D skb_peek_tail(&ssk->sk_receive_queue)) { + mptcp_update_rcv_data_fin(msk, subflow->map_seq + + subflow->map_data_len, true); + subflow->map_data_fin =3D 1; + } return true; } =20 --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com 
(us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AC84B328582 for ; Tue, 16 Sep 2025 16:27:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040065; cv=none; b=P8dJbry7YOMFvYYonWY1jWTM15PbhQFOLNq0DJ5CrlM3QtUEnBorGAo85Ko3nqe0t1W4csqWAx30Ct21T8QuEhdm9AEV2vTy4t64PkfKRIF4ArMDh1W96R8wWsPePeTHehMyvPpeznJCs2+oglXRxHVXLtTdbeqmycWrUJ2pPFQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040065; c=relaxed/simple; bh=9BncvxTViMGzsEy3ANw+LHwRTdUzKIxctNWj/Kj8hM8=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=EXnKexFYBhm151FEMJ1FKO2EAXEtvgR7T2x/MlTE+S7LqLE7ISXTl4IDELW1agwi+1F93hEJoRGlWlM1t5cUjKLU6GhDClFiQCUWJFtPNnexZwdQUfITceKuZ/2B1FzLtv0j+dWyl3moIEUrBvz/E/O8+sEuL5kszv/Tn+11/jA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=b++xWsLs; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="b++xWsLs" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040062; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; 
bh=9NsK4qMH9IUCod5TVzrgjdBJhqbVkn7dFZjpSCH5lRs=; b=b++xWsLsNRAYmej39RwWEqokImOT9ckkFKHhoiBKVOi6vUTgZQR0Sp0vtERPrFBST2Wt0P Ot4wsyBGvFu0ffc2uz4v4uyaq+egDfax5UIss3Jjsck6UBG3Twrd8E6cfRX4ubs2Fd+xhh SBO66S531gJoE0w86OuwLhd7t0SOZEo= Received: from mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-400-N3ivrzKRNRe18bWIrMhOMQ-1; Tue, 16 Sep 2025 12:27:41 -0400 X-MC-Unique: N3ivrzKRNRe18bWIrMhOMQ-1 X-Mimecast-MFC-AGG-ID: N3ivrzKRNRe18bWIrMhOMQ_1758040060 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-05.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id A14CA19560B2 for ; Tue, 16 Sep 2025 16:27:40 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id C205F19560B8 for ; Tue, 16 Sep 2025 16:27:39 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 09/12] mptcp: leverage the sk backlog for RX packet processing. 
Date: Tue, 16 Sep 2025 18:27:19 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: har8U2nhrrR6TV0CIl7CKdpOmdSeJaktXFOZ4h79gu8_1758040060 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" This streamlines the RX path implementation and improves RX performance by reducing the subflow-level locking and the amount of work done under the msk socket lock; the implementation closely mirrors the TCP backlog processing. Note that MPTCP now needs to traverse the existing subflows looking for data that was left there because the msk receive buffer was full, only after recvmsg completely empties the receive queue. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 107 ++++++++++++++++++++++++++++++------------- net/mptcp/protocol.h | 2 +- 2 files changed, 75 insertions(+), 34 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 251760183118a..9c3baed948d1d 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -357,10 +357,31 @@ static void mptcp_init_skb(struct sock *ssk, skb_dst_drop(skb); } =20 -static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sk_buff *skb) +static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk, + struct sk_buff *skb) +{ + struct sk_buff *tail =3D sk->sk_backlog.tail; + bool fragstolen; + int delta; + + if (tail && MPTCP_SKB_CB(skb)->map_seq =3D=3D MPTCP_SKB_CB(tail)->end_seq= ) { + delta =3D __mptcp_try_coalesce(sk, tail, skb, &fragstolen); + if (delta) { + sk->sk_backlog.len +=3D delta; + kfree_skb_partial(skb, fragstolen); + return; + } + } + + /* mptcp checks the limit before adding the skb to the backlog */ + __sk_add_backlog(sk, skb); + sk->sk_backlog.len +=3D skb->truesize; +} + +static bool 
__mptcp_move_skb(struct sock *sk, struct sk_buff *skb) { u64 copy_len =3D MPTCP_SKB_CB(skb)->end_seq - MPTCP_SKB_CB(skb)->map_seq; - struct sock *sk =3D (struct sock *)msk; + struct mptcp_sock *msk =3D mptcp_sk(sk); struct sk_buff *tail; =20 /* try to fetch required memory from subflow */ @@ -632,7 +653,7 @@ static void mptcp_dss_corruption(struct mptcp_sock *msk= , struct sock *ssk) } =20 static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, - struct sock *ssk) + struct sock *ssk, bool own_msk) { struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); struct sock *sk =3D (struct sock *)msk; @@ -643,12 +664,13 @@ static bool __mptcp_move_skbs_from_subflow(struct mpt= cp_sock *msk, pr_debug("msk=3D%p ssk=3D%p\n", msk, ssk); tp =3D tcp_sk(ssk); do { + int mem =3D own_msk ? sk_rmem_alloc_get(sk) : sk->sk_backlog.len; u32 map_remaining, offset; u32 seq =3D tp->copied_seq; struct sk_buff *skb; bool fin; =20 - if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) + if (mem > READ_ONCE(sk->sk_rcvbuf)) break; =20 /* try to move as much data as available */ @@ -678,7 +700,11 @@ static bool __mptcp_move_skbs_from_subflow(struct mptc= p_sock *msk, =20 mptcp_init_skb(ssk, subflow, skb, offset, len); skb_orphan(skb); - ret =3D __mptcp_move_skb(msk, skb) || ret; + + if (own_msk) + ret |=3D __mptcp_move_skb(sk, skb); + else + __mptcp_add_backlog(sk, ssk, skb); seq +=3D len; =20 if (unlikely(map_remaining < len)) { @@ -699,7 +725,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp= _sock *msk, =20 } while (more_data_avail); =20 - if (ret) + if (ret && own_msk) msk->last_data_recv =3D tcp_jiffies32; return ret; } @@ -797,7 +823,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, st= ruct sock *ssk) struct sock *sk =3D (struct sock *)msk; bool moved; =20 - moved =3D __mptcp_move_skbs_from_subflow(msk, ssk); + moved =3D __mptcp_move_skbs_from_subflow(msk, ssk, true); __mptcp_ofo_queue(msk); if (unlikely(ssk->sk_err)) __mptcp_subflow_error_report(sk, ssk); 
@@ -812,18 +838,10 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, = struct sock *ssk) return moved; } =20 -static void __mptcp_data_ready(struct sock *sk, struct sock *ssk) -{ - struct mptcp_sock *msk =3D mptcp_sk(sk); - - /* Wake-up the reader only for in-sequence data */ - if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) - sk->sk_data_ready(sk); -} - void mptcp_data_ready(struct sock *sk, struct sock *ssk) { struct mptcp_subflow_context *subflow =3D mptcp_subflow_ctx(ssk); + struct mptcp_sock *msk =3D mptcp_sk(sk); =20 /* The peer can send data while we are shutting down this * subflow at msk destruction time, but we must avoid enqueuing @@ -833,13 +851,33 @@ void mptcp_data_ready(struct sock *sk, struct sock *s= sk) return; =20 mptcp_data_lock(sk); - if (!sock_owned_by_user(sk)) - __mptcp_data_ready(sk, ssk); - else - __set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags); + if (!sock_owned_by_user(sk)) { + /* Wake-up the reader only for in-sequence data */ + if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk)) + sk->sk_data_ready(sk); + } else { + __mptcp_move_skbs_from_subflow(msk, ssk, false); + if (unlikely(ssk->sk_err)) + __set_bit(MPTCP_ERROR_REPORT, &msk->cb_flags); + } mptcp_data_unlock(sk); } =20 +static int mptcp_move_skb(struct sock *sk, struct sk_buff *skb) +{ + struct mptcp_sock *msk =3D mptcp_sk(sk); + + if (__mptcp_move_skb(sk, skb)) { + msk->last_data_recv =3D tcp_jiffies32; + __mptcp_ofo_queue(msk); + /* notify ack seq update */ + mptcp_cleanup_rbuf(msk, 0); + mptcp_check_data_fin(sk); + sk->sk_data_ready(sk); + } + return 0; +} + static void mptcp_subflow_joined(struct mptcp_sock *msk, struct sock *ssk) { mptcp_subflow_ctx(ssk)->map_seq =3D READ_ONCE(msk->ack_seq); @@ -2085,7 +2123,7 @@ static bool __mptcp_move_skbs(struct sock *sk) =20 ssk =3D mptcp_subflow_tcp_sock(subflow); slowpath =3D lock_sock_fast(ssk); - ret =3D __mptcp_move_skbs_from_subflow(msk, ssk) || ret; + ret =3D __mptcp_move_skbs_from_subflow(msk, ssk, true) 
|| ret; if (unlikely(ssk->sk_err)) __mptcp_error_report(sk); unlock_sock_fast(ssk, slowpath); @@ -2159,8 +2197,12 @@ static int mptcp_recvmsg(struct sock *sk, struct msg= hdr *msg, size_t len, =20 copied +=3D bytes_read; =20 - if (skb_queue_empty(&sk->sk_receive_queue) && __mptcp_move_skbs(sk)) - continue; + if (skb_queue_empty(&sk->sk_receive_queue)) { + __sk_flush_backlog(sk); + if (!skb_queue_empty(&sk->sk_receive_queue) || + __mptcp_move_skbs(sk)) + continue; + } =20 /* only the MPTCP socket status is relevant here. The exit * conditions mirror closely tcp_recvmsg() @@ -2508,7 +2550,6 @@ static void __mptcp_close_subflow(struct sock *sk) =20 mptcp_close_ssk(sk, ssk, subflow); } - } =20 static bool mptcp_close_tout_expired(const struct sock *sk) @@ -3092,6 +3133,13 @@ bool __mptcp_close(struct sock *sk, long timeout) pr_debug("msk=3D%p state=3D%d\n", sk, sk->sk_state); mptcp_pm_connection_closed(msk); =20 + /* process the backlog; note that it never destroies the msk */ + local_bh_disable(); + bh_lock_sock(sk); + __release_sock(sk); + bh_unlock_sock(sk); + local_bh_enable(); + if (sk->sk_state =3D=3D TCP_CLOSE) { __mptcp_destroy_sock(sk); do_cancel_work =3D true; @@ -3392,8 +3440,7 @@ void __mptcp_check_push(struct sock *sk, struct sock = *ssk) =20 #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \ BIT(MPTCP_RETRANSMIT) | \ - BIT(MPTCP_FLUSH_JOIN_LIST) | \ - BIT(MPTCP_DEQUEUE)) + BIT(MPTCP_FLUSH_JOIN_LIST)) =20 /* processes deferred events and flush wmem */ static void mptcp_release_cb(struct sock *sk) @@ -3427,11 +3474,6 @@ static void mptcp_release_cb(struct sock *sk) __mptcp_push_pending(sk, 0); if (flags & BIT(MPTCP_RETRANSMIT)) __mptcp_retrans(sk); - if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk)) { - /* notify ack seq update */ - mptcp_cleanup_rbuf(msk, 0); - sk->sk_data_ready(sk); - } =20 cond_resched(); spin_lock_bh(&sk->sk_lock.slock); @@ -3668,8 +3710,6 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int = *karg) return 
-EINVAL; =20 lock_sock(sk); - if (__mptcp_move_skbs(sk)) - mptcp_cleanup_rbuf(msk, 0); *karg =3D mptcp_inq_hint(sk); release_sock(sk); break; @@ -3781,6 +3821,7 @@ static struct proto mptcp_prot =3D { .sendmsg =3D mptcp_sendmsg, .ioctl =3D mptcp_ioctl, .recvmsg =3D mptcp_recvmsg, + .backlog_rcv =3D mptcp_move_skb, .release_cb =3D mptcp_release_cb, .hash =3D mptcp_hash, .unhash =3D mptcp_unhash, diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 128baea5b496e..a6e775d6412e5 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -124,7 +124,6 @@ #define MPTCP_FLUSH_JOIN_LIST 5 #define MPTCP_SYNC_STATE 6 #define MPTCP_SYNC_SNDBUF 7 -#define MPTCP_DEQUEUE 8 =20 struct mptcp_skb_cb { u64 map_seq; @@ -407,6 +406,7 @@ static inline int mptcp_space_from_win(const struct soc= k *sk, int win) static inline int __mptcp_space(const struct sock *sk) { return mptcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf) - + READ_ONCE(sk->sk_backlog.len) - sk_rmem_alloc_get(sk)); } =20 --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 973B52F7AA6 for ; Tue, 16 Sep 2025 16:27:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040071; cv=none; b=t+tTYtOcShIoyvBhirQvp5bDivPYJ1MCitv93kbLNG0DwFYOrKTtuF2ZPtBkjqHgA2HRFmmg3Mui5W4yjSHZFTtrneSpL7tjZbeIqXkxSPzfUgTHBcVXBVUGWb+wpNLzeKCDB3i3HXjCs6i89cMI2jquq220680EF2vZLkFpaxE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040071; c=relaxed/simple; bh=vlTAfd2YF5vOaQohY0F6PvXTTCfn58beu7u8KzirW9s=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; 
b=JM23McdMTlbyyxppb40+oD7Iqt/knEuv/lKG5emWOEWw1iwRdiIYdhsO4Hznwds3h3IGJHmZBbHRVYYufpKJp7N+gfwTEOKgkLilPskBJHSkbzWj+b1P8DZvmUZdRrOO4CffDBGOVRP7FYgz/6IzgzbT//tDWHQbKopKjjL1aU8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=FDdFD21U; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="FDdFD21U" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040068; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hrsxKyYgpVFXf9TFixAIDt7teXpvh2As8MNaCoOdS6s=; b=FDdFD21UXBbQysR5PBEnCUU4KTFetf+DtyZH8Eyuw5u3WJjcEG+LB29yZqAUomhEk6IVNO V/RcbQT8Rtr+5unsCmymnVWYKLLuUJYhAgYbUczZPYIFohLRUlrnRaFzN8vsW6+acgJdSQ FqURSI1W/qCDIXnzkm+ojqswj9rSlKk= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-211-yOBPHOg7P-SskSu8OuQdBg-1; Tue, 16 Sep 2025 12:27:44 -0400 X-MC-Unique: yOBPHOg7P-SskSu8OuQdBg-1 X-Mimecast-MFC-AGG-ID: yOBPHOg7P-SskSu8OuQdBg_1758040062 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 
bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 093031800299 for ; Tue, 16 Sep 2025 16:27:42 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 2C6C419560B8 for ; Tue, 16 Sep 2025 16:27:40 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 10/12] mptcp: prevent __mptcp_move_skbs() interfering with the fastpath Date: Tue, 16 Sep 2025 18:27:20 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: 2x6yNHOPbGlxGG7XIRi_wA_rkCPxrWpXfR6kRofiQ4o_1758040062 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" skbs are left waiting in a subflow only in exceptional cases. To avoid disturbing the fast path, __mptcp_move_skbs() must not unintentionally process packets that landed in the subflows after the last check. Use a separate flag to mark delayed skbs and only process subflows with that flag set. Also add new MIBs to track the exceptional events. 
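The flag-based filtering described above can be modelled in user space. The following is only a rough sketch under stated assumptions: struct toy_subflow, toy_move_from_subflow() and toy_move_skbs() are hypothetical names invented for illustration, memory is counted in abstract units rather than skb truesize, and no locking is modelled. It is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mptcp_subflow_context (hypothetical). */
struct toy_subflow {
	bool data_delayed;	/* set when a move stopped on a full rcvbuf */
	int pending;		/* units of data still queued in the subflow */
};

/* Models the per-subflow move loop: the flag is re-evaluated on every pass. */
static int toy_move_from_subflow(struct toy_subflow *sf, int *rmem, int rcvbuf)
{
	int moved = 0;

	assert(sf && rmem);
	for (;;) {
		bool over_limit = *rmem > rcvbuf;

		sf->data_delayed = over_limit;
		if (over_limit || sf->pending == 0)
			break;
		sf->pending--;
		(*rmem)++;
		moved++;
	}
	return moved;
}

/* Models the slow path: only subflows flagged as delayed are visited. */
static int toy_move_skbs(struct toy_subflow *sfs, size_t n, int *rmem, int rcvbuf)
{
	int moved = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (*rmem > rcvbuf)
			break;
		if (!sfs[i].data_delayed)
			continue;	/* fresh data is left to the fast path */
		moved += toy_move_from_subflow(&sfs[i], rmem, rcvbuf);
	}
	return moved;
}
```

In this model, a subflow that has pending data but no flag is skipped by toy_move_skbs(), which mirrors the intent of leaving freshly arrived packets to the fast path.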
Signed-off-by: Paolo Abeni --- net/mptcp/mib.c | 2 ++ net/mptcp/mib.h | 4 ++++ net/mptcp/protocol.c | 40 ++++++++++++---------------------------- net/mptcp/protocol.h | 1 + 4 files changed, 19 insertions(+), 28 deletions(-) diff --git a/net/mptcp/mib.c b/net/mptcp/mib.c index cf879c188ca26..7af9d35cde884 100644 --- a/net/mptcp/mib.c +++ b/net/mptcp/mib.c @@ -85,6 +85,8 @@ static const struct snmp_mib mptcp_snmp_list[] =3D { SNMP_MIB_ITEM("DssFallback", MPTCP_MIB_DSSFALLBACK), SNMP_MIB_ITEM("SimultConnectFallback", MPTCP_MIB_SIMULTCONNFALLBACK), SNMP_MIB_ITEM("FallbackFailed", MPTCP_MIB_FALLBACKFAILED), + SNMP_MIB_ITEM("RcvDelayed", MPTCP_MIB_RCVDELAYED), + SNMP_MIB_ITEM("DelayedProcess", MPTCP_MIB_DELAYED_PROCESS), SNMP_MIB_SENTINEL }; =20 diff --git a/net/mptcp/mib.h b/net/mptcp/mib.h index 309bac6fea325..f6d0eaea463e5 100644 --- a/net/mptcp/mib.h +++ b/net/mptcp/mib.h @@ -88,6 +88,10 @@ enum linux_mptcp_mib_field { MPTCP_MIB_DSSFALLBACK, /* Bad or missing DSS */ MPTCP_MIB_SIMULTCONNFALLBACK, /* Simultaneous connect */ MPTCP_MIB_FALLBACKFAILED, /* Can't fallback due to msk status */ + MPTCP_MIB_RCVDELAYED, /* Data move from subflow is delayed due to msk + * receive buffer full + */ + MPTCP_MIB_DELAYED_PROCESS, /* Delayed data moved in slowpath */ __MPTCP_MIB_MAX }; =20 diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 9c3baed948d1d..f211a939daf39 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -665,13 +665,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mpt= cp_sock *msk, tp =3D tcp_sk(ssk); do { int mem =3D own_msk ? 
sk_rmem_alloc_get(sk) : sk->sk_backlog.len; + bool over_limit =3D mem > READ_ONCE(sk->sk_rcvbuf); u32 map_remaining, offset; u32 seq =3D tp->copied_seq; struct sk_buff *skb; bool fin; =20 - if (mem > READ_ONCE(sk->sk_rcvbuf)) + WRITE_ONCE(subflow->data_delayed, over_limit); + if (subflow->data_delayed) { + MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVDELAYED); break; + } =20 /* try to move as much data as available */ map_remaining =3D subflow->map_data_len - @@ -2081,32 +2085,13 @@ static void mptcp_rcv_space_adjust(struct mptcp_soc= k *msk, int copied) msk->rcvq_space.time =3D mstamp; } =20 -static struct mptcp_subflow_context * -__mptcp_first_ready_from(struct mptcp_sock *msk, - struct mptcp_subflow_context *subflow) -{ - struct mptcp_subflow_context *start_subflow =3D subflow; - - while (!READ_ONCE(subflow->data_avail)) { - subflow =3D mptcp_next_subflow(msk, subflow); - if (subflow =3D=3D start_subflow) - return NULL; - } - return subflow; -} - static bool __mptcp_move_skbs(struct sock *sk) { struct mptcp_subflow_context *subflow; struct mptcp_sock *msk =3D mptcp_sk(sk); bool ret =3D false; =20 - if (list_empty(&msk->conn_list)) - return false; - - subflow =3D list_first_entry(&msk->conn_list, - struct mptcp_subflow_context, node); - for (;;) { + mptcp_for_each_subflow(msk, subflow) { struct sock *ssk; bool slowpath; =20 @@ -2117,23 +2102,22 @@ static bool __mptcp_move_skbs(struct sock *sk) if (sk_rmem_alloc_get(sk) > sk->sk_rcvbuf) break; =20 - subflow =3D __mptcp_first_ready_from(msk, subflow); - if (!subflow) - break; + if (!subflow->data_delayed) + continue; =20 ssk =3D mptcp_subflow_tcp_sock(subflow); slowpath =3D lock_sock_fast(ssk); - ret =3D __mptcp_move_skbs_from_subflow(msk, ssk, true) || ret; + ret |=3D __mptcp_move_skbs_from_subflow(msk, ssk, true); if (unlikely(ssk->sk_err)) __mptcp_error_report(sk); unlock_sock_fast(ssk, slowpath); - - subflow =3D mptcp_next_subflow(msk, subflow); } =20 __mptcp_ofo_queue(msk); - if (ret) + if (ret) { + 
MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_DELAYED_PROCESS); mptcp_check_data_fin((struct sock *)msk); + } return ret; } =20 diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index a6e775d6412e5..2905e4b250070 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -559,6 +559,7 @@ struct mptcp_subflow_context { u8 reset_transient:1; u8 reset_reason:4; u8 stale_count; + bool data_delayed; =20 u32 subflow_id; =20 --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 95DB9328582 for ; Tue, 16 Sep 2025 16:27:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=170.10.133.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040068; cv=none; b=dCFNfcN+ckmZl+fbTnZEYOrqlOHTugLZknWQUlxPdJCxctkz6RYoPxHzzT11I8xoKEpE+G2Hu5VNBcsrylY+ko8s2WnwMvqgyrNVLTJ3fxPiU8ZhweKXVkkTxI94H8hO7w3lydKMp7C7WHkvNgst+c1XZ1IGaSgtk4TZ4IEhkxI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040068; c=relaxed/simple; bh=WP9/kZlIOsylZwrQ9Z55IMaTVHcZG5H4am5Ydp+klBc=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=T7sdb8OXi4mjgn/FGjJVe5epc+ZS6bj4MrL5QO6OpBtEnRQC3BoGMlpVFhDknEReQt6VFkFlhSflcrH1/xQsEKreexYQnynY/SNv8pdPizZPE5o12lXio1CKEpwxzYt/STvqy02pHPqSImaMN0VDhEUT1L7v6WFZ/z8JO0cJmt0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=e2Zr7b8n; arc=none smtp.client-ip=170.10.133.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="e2Zr7b8n" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040065; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=crHEmzEwLcvNIVKD9ZopN/7/nNTf/0EfsKiEGs/aLdI=; b=e2Zr7b8nSEmPMhqFcx3EtIo6FXMji1znOw+XUkhfI/oWC+udSCsjKEfh7/Mtbn8gQJ57dR RojvjrP7O1isVv0uuc9uHmVEIE4zB7LNE0HtR7NmD4m45F8bRSUA9ebEKROkstFEGoFRv8 PYK24fYjKDfvYwzWSIurnqmPbmq/QLM= Received: from mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-520-UBhJA6S2O9OF0JvXkbtKGA-1; Tue, 16 Sep 2025 12:27:44 -0400 X-MC-Unique: UBhJA6S2O9OF0JvXkbtKGA-1 X-Mimecast-MFC-AGG-ID: UBhJA6S2O9OF0JvXkbtKGA_1758040063 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-04.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 6977D1955DB8 for ; Tue, 16 Sep 2025 16:27:43 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id 8845C19560B8 for ; Tue, 16 Sep 2025 16:27:42 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 11/12] mptcp: borrow forward memory from subflow Date: Tue, 16 Sep 2025 18:27:21 +0200 Message-ID: In-Reply-To: References: 
Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: IKo-5y_H6Am0yL4nFric_7D5zdyYDvq0bGU5VB8iWEM_1758040063 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" In the MPTCP receive path, we release the fwd memory allocated by the subflow just to allocate it again shortly after for the msk. That could increase the chance of failures, especially during backlog processing, when other actions could consume the just-released memory before the msk socket has a chance to do the rcv allocation. Explicitly borrow the fwd memory from the subflow socket, with PAGE_SIZE granularity, instead of releasing it. During backlog processing the borrowed memory is accounted at release_cb time. Signed-off-by: Paolo Abeni --- net/mptcp/protocol.c | 29 ++++++++++++++++++++++------- net/mptcp/protocol.h | 1 + 2 files changed, 23 insertions(+), 7 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index f211a939daf39..b883e8548dcb8 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -335,11 +335,12 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *m= sk, struct sk_buff *skb) mptcp_rcvbuf_grow(sk); } =20 -static void mptcp_init_skb(struct sock *ssk, - struct mptcp_subflow_context *subflow, - struct sk_buff *skb, int offset, int copy_len) +static int mptcp_init_skb(struct sock *ssk, + struct mptcp_subflow_context *subflow, + struct sk_buff *skb, int offset, int copy_len) { bool has_rxtstamp =3D TCP_SKB_CB(skb)->has_rxtstamp; + int borrowed; =20 /* the skb map_seq accounts for the skb offset: * mptcp_subflow_get_mapped_dsn() is based on the current tp->copied_seq @@ -355,6 +356,15 @@ static void mptcp_init_skb(struct sock *ssk, =20 skb_ext_reset(skb); skb_dst_drop(skb); + + /* "borrow" the fwd memory from the subflow, 
instead of reclaiming it */ + skb->destructor =3D NULL; + skb->sk =3D NULL; + atomic_sub(skb->truesize, &ssk->sk_rmem_alloc); + borrowed =3D ssk->sk_forward_alloc - sk_unused_reserved_mem(ssk); + borrowed &=3D ~(PAGE_SIZE - 1); + sk_forward_alloc_add(ssk, skb->truesize - borrowed); + return borrowed; } =20 static void __mptcp_add_backlog(struct sock *sk, struct sock *ssk, @@ -701,14 +711,17 @@ static bool __mptcp_move_skbs_from_subflow(struct mpt= cp_sock *msk, =20 if (offset < skb->len) { size_t len =3D skb->len - offset; + int bmem; =20 - mptcp_init_skb(ssk, subflow, skb, offset, len); - skb_orphan(skb); + bmem =3D mptcp_init_skb(ssk, subflow, skb, offset, len); =20 - if (own_msk) + if (own_msk) { + sk_forward_alloc_add(sk, bmem); ret |=3D __mptcp_move_skb(sk, skb); - else + } else { + msk->borrowed_fwd_mem +=3D bmem; __mptcp_add_backlog(sk, ssk, skb); + } seq +=3D len; =20 if (unlikely(map_remaining < len)) { @@ -3477,6 +3490,8 @@ static void mptcp_release_cb(struct sock *sk) if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags)) __mptcp_sync_sndbuf(sk); } + sk_forward_alloc_add(sk, msk->borrowed_fwd_mem); + msk->borrowed_fwd_mem =3D 0; } =20 /* MP_JOIN client subflow must wait for 4th ack before sending any data: diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 2905e4b250070..93e7b1b3fe359 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -297,6 +297,7 @@ struct mptcp_sock { u32 last_data_sent; u32 last_data_recv; u32 last_ack_recv; + int borrowed_fwd_mem; unsigned long timer_ival; u32 token; unsigned long flags; --=20 2.51.0 From nobody Wed Sep 17 16:11:42 2025 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C24892F7AA6 for ; Tue, 16 Sep 2025 16:27:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; 
arc=none smtp.client-ip=170.10.129.124 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040069; cv=none; b=NBlAj9LwKAt0vncQ8rd+qDk0Y0qYiXnhZbabvjHY/wGXaKiChiLRlqqHPyrvjr0LRZPZa3K5yPJ0l1lmddRvKOuKctj6g9XLVi1k9OCHsgDbYEE/tVcO3AvAVEpC/sd4AQN2HUSCPtNTJfv2Knu3qxz3MOhkDDGi+f+T5lgFO5k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758040069; c=relaxed/simple; bh=pJyzsL3Ix0u5cgdVaKyoArqSER4UMTDQpnulcFnl64o=; h=From:To:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:content-type; b=Qpg7dua4430XrFYCr0ygme31SWUJ9TFHBbDtt8hzcpEUuC2qVoWwTHzo34D5ffBGbhtKACXvPwIdZpFHbShdWPyZc6IKZ7+nQe6uurkEmAXQd2YOLVYh06rHHtn8XApBp0v1mB1P2ZRynJhtdSNuwtD7z+JFGlZcAtE62oCM590= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com; spf=pass smtp.mailfrom=redhat.com; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b=UlfPjlrk; arc=none smtp.client-ip=170.10.129.124 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="UlfPjlrk" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1758040066; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cRm3UYvPTTHEruegNJeNiN2HTqTNZ83y0Qlvq+PF4T0=; b=UlfPjlrkwgQ9NhnAE5zs0Ny/YHZLCgfFsgUMnd57XsaJMSuN+lYICBsKtKGt7MCvvPSRy6 OAxdATrKsYONiWaMNFTG6IlYtKpmPgYlRnjiljtimGIVMeRZ8eAq0HfGAXZwxgkWBYmkZD Wr93mivOJjf3vKusnTLHb5rdy32FCog= Received: from mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com 
(ec2-54-186-198-63.us-west-2.compute.amazonaws.com [54.186.198.63]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-59-bcljUIhWMXKxMZo3aox7Bw-1; Tue, 16 Sep 2025 12:27:45 -0400 X-MC-Unique: bcljUIhWMXKxMZo3aox7Bw-1 X-Mimecast-MFC-AGG-ID: bcljUIhWMXKxMZo3aox7Bw_1758040064 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id C063019560BB for ; Tue, 16 Sep 2025 16:27:44 +0000 (UTC) Received: from gerbillo.redhat.com (unknown [10.44.32.160]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id E64EF19560B8 for ; Tue, 16 Sep 2025 16:27:43 +0000 (UTC) From: Paolo Abeni To: mptcp@lists.linux.dev Subject: [MPTCP next 12/12] mptcp: make fallback backlog aware Date: Tue, 16 Sep 2025 18:27:22 +0200 Message-ID: In-Reply-To: References: Precedence: bulk X-Mailing-List: mptcp@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: wAejePXujbKSqiSwwjvGsZWoU_bbC5jMLNsXjXXfKvA_1758040064 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"; x-default="true" MPTCP can't rely on ack_seq outside the msk socket lock scope. With packets pending in the backlog, that value can be quite far from the correct sequence value for the dummy map. Fallback dummy mappings are in sequence by definition, so generate the data sequence from the subflow sequence. 
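As a rough illustration of deriving the data sequence from the subflow sequence, the sketch below recomputes the dummy map sequence in user space. Several things here are assumptions, not taken from the kernel: expand_seq() is a simplified stand-in for __mptcp_expand_seq() (it picks the 64-bit value nearest to the previous one whose low 32 bits match), fallback_map_seq() is a hypothetical helper, and the first data byte is assumed to carry TCP sequence ssn_offset + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for __mptcp_expand_seq() (assumption): return the
 * 64-bit value closest to old_seq whose low 32 bits equal those of cur_seq.
 */
static uint64_t expand_seq(uint64_t old_seq, uint64_t cur_seq)
{
	/* signed 32-bit distance between the two low halves */
	int32_t delta = (int32_t)((uint32_t)cur_seq - (uint32_t)old_seq);

	return old_seq + (uint64_t)(int64_t)delta;
}

/* Hypothetical helper: data sequence of a fallback skb, derived only from
 * per-subflow state. Only the low 32 bits of the raw value are trusted,
 * because the TCP sequence number may have wrapped.
 */
static uint64_t fallback_map_seq(uint64_t prev_map_seq, uint64_t iasn,
				 uint32_t tcp_seq, uint32_t ssn_offset)
{
	uint64_t raw = iasn + tcp_seq - ssn_offset - 1;

	assert(prev_map_seq >= iasn);	/* map_seq starts at iasn */
	return expand_seq(prev_map_seq, raw);
}
```

With iasn = 1000 and ssn_offset = 5000, an skb starting at TCP sequence 5001 maps to data sequence 1000, and one starting at 6461 maps to 2460, regardless of how far msk->ack_seq has moved in the meantime.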
Signed-off-by: Paolo Abeni --- net/mptcp/subflow.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 5339a00528a7a..02bf89f6b5a1a 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -491,6 +491,9 @@ static void subflow_set_remote_key(struct mptcp_sock *m= sk, mptcp_crypto_key_sha(subflow->remote_key, NULL, &subflow->iasn); subflow->iasn++; =20 + /* for fallback's sake */ + subflow->map_seq =3D subflow->iasn; + WRITE_ONCE(msk->remote_key, subflow->remote_key); WRITE_ONCE(msk->ack_seq, subflow->iasn); WRITE_ONCE(msk->can_ack, true); @@ -1421,9 +1424,12 @@ static bool subflow_check_data_avail(struct sock *ss= k) =20 skb =3D skb_peek(&ssk->sk_receive_queue); subflow->map_valid =3D 1; - subflow->map_seq =3D READ_ONCE(msk->ack_seq); subflow->map_data_len =3D skb->len; subflow->map_subflow_seq =3D tcp_sk(ssk)->copied_seq - subflow->ssn_offse= t; + subflow->map_seq =3D __mptcp_expand_seq(subflow->map_seq, + subflow->iasn + + TCP_SKB_CB(skb)->seq - + subflow->ssn_offset - 1); WRITE_ONCE(subflow->data_avail, true); =20 /* last skb in closed fallback subflow: we are at data fin */ --=20 2.51.0