From nobody Thu Sep 18 12:40:05 2025
From: Paolo Abeni
To: netdev@vger.kernel.org
Cc: "David S. Miller", Jakub Kicinski, Mat Martineau, Ayush Sawal, Eric Dumazet, mptcp@lists.linux.dev
Subject: [RFC PATCH 3/5] mptcp: stop relying on tcp_tx_skb_cache
Date: Fri, 17 Sep 2021 17:38:38 +0200
Message-Id: <2a69b0b2231e0c7126c8448381c980d3307b28be.1631888517.git.pabeni@redhat.com>

We want to revert the skb TX cache, but MPTCP is currently using it
unconditionally. Rework the MPTCP tx code so that tcp_tx_skb_cache is
no longer needed: do the whole coalescing check, skb allocation and
skb initialization/update inside mptcp_sendmsg_frag(), closely
mirroring the current TCP code.
Reviewed-by: Mat Martineau
Signed-off-by: Paolo Abeni
---
 net/mptcp/protocol.c | 137 ++++++++++++++++++++++++-------------------
 1 file changed, 77 insertions(+), 60 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 2602f1386160..95503dadab55 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1224,6 +1224,7 @@ static struct sk_buff *__mptcp_do_alloc_tx_skb(struct sock *sk, gfp_t gfp)
 		if (likely(__mptcp_add_ext(skb, gfp))) {
 			skb_reserve(skb, MAX_TCP_HEADER);
 			skb->reserved_tailroom = skb->end - skb->tail;
+			INIT_LIST_HEAD(&skb->tcp_tsorted_anchor);
 			return skb;
 		}
 		__kfree_skb(skb);
@@ -1233,31 +1234,23 @@ static struct sk_buff *__mptcp_do_alloc_tx_skb(struct sock *sk, gfp_t gfp)
 	return NULL;
 }
 
-static bool __mptcp_alloc_tx_skb(struct sock *sk, struct sock *ssk, gfp_t gfp)
+static struct sk_buff *__mptcp_alloc_tx_skb(struct sock *sk, struct sock *ssk, gfp_t gfp)
 {
 	struct sk_buff *skb;
 
-	if (ssk->sk_tx_skb_cache) {
-		skb = ssk->sk_tx_skb_cache;
-		if (unlikely(!skb_ext_find(skb, SKB_EXT_MPTCP) &&
-			     !__mptcp_add_ext(skb, gfp)))
-			return false;
-		return true;
-	}
-
 	skb = __mptcp_do_alloc_tx_skb(sk, gfp);
 	if (!skb)
-		return false;
+		return NULL;
 
 	if (likely(sk_wmem_schedule(ssk, skb->truesize))) {
-		ssk->sk_tx_skb_cache = skb;
-		return true;
+		skb_entail(ssk, skb);
+		return skb;
 	}
 	kfree_skb(skb);
-	return false;
+	return NULL;
 }
 
-static bool mptcp_alloc_tx_skb(struct sock *sk, struct sock *ssk, bool data_lock_held)
+static struct sk_buff *mptcp_alloc_tx_skb(struct sock *sk, struct sock *ssk, bool data_lock_held)
 {
 	gfp_t gfp = data_lock_held ? GFP_ATOMIC : sk->sk_allocation;
 
@@ -1287,23 +1280,29 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 			      struct mptcp_sendmsg_info *info)
 {
 	u64 data_seq = dfrag->data_seq + info->sent;
+	int offset = dfrag->offset + info->sent;
 	struct mptcp_sock *msk = mptcp_sk(sk);
 	bool zero_window_probe = false;
 	struct mptcp_ext *mpext = NULL;
-	struct sk_buff *skb, *tail;
-	bool must_collapse = false;
-	int size_bias = 0;
-	int avail_size;
-	size_t ret = 0;
+	bool can_coalesce = false;
+	bool reuse_skb = true;
+	struct sk_buff *skb;
+	size_t copy;
+	int i;
 
 	pr_debug("msk=%p ssk=%p sending dfrag at seq=%llu len=%u already sent=%u",
 		 msk, ssk, dfrag->data_seq, dfrag->data_len, info->sent);
 
+	if (WARN_ON_ONCE(info->sent > info->limit ||
+			 info->limit > dfrag->data_len))
+		return 0;
+
 	/* compute send limit */
 	info->mss_now = tcp_send_mss(ssk, &info->size_goal, info->flags);
-	avail_size = info->size_goal;
+	copy = info->size_goal;
+
 	skb = tcp_write_queue_tail(ssk);
-	if (skb) {
+	if (skb && copy > skb->len) {
 		/* Limit the write to the size available in the
 		 * current skb, if any, so that we create at most a new skb.
 		 * Explicitly tells TCP internals to avoid collapsing on later
@@ -1316,62 +1315,80 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 			goto alloc_skb;
 		}
 
-		must_collapse = (info->size_goal - skb->len > 0) &&
-				(skb_shinfo(skb)->nr_frags < sysctl_max_skb_frags);
-		if (must_collapse) {
-			size_bias = skb->len;
-			avail_size = info->size_goal - skb->len;
+		i = skb_shinfo(skb)->nr_frags;
+		can_coalesce = skb_can_coalesce(skb, i, dfrag->page, offset);
+		if (!can_coalesce && i >= sysctl_max_skb_frags) {
+			tcp_mark_push(tcp_sk(ssk), skb);
+			goto alloc_skb;
 		}
-	}
 
+		copy -= skb->len;
+	} else {
 alloc_skb:
-	if (!must_collapse && !ssk->sk_tx_skb_cache &&
-	    !mptcp_alloc_tx_skb(sk, ssk, info->data_lock_held))
-		return 0;
+		skb = mptcp_alloc_tx_skb(sk, ssk, info->data_lock_held);
+		if (!skb)
+			return -ENOMEM;
+
+		i = skb_shinfo(skb)->nr_frags;
+		reuse_skb = false;
+		mpext = skb_ext_find(skb, SKB_EXT_MPTCP);
+	}
 
 	/* Zero window and all data acked? Probe.
 	 */
-	avail_size = mptcp_check_allowed_size(msk, data_seq, avail_size);
-	if (avail_size == 0) {
+	copy = mptcp_check_allowed_size(msk, data_seq, copy);
+	if (copy == 0) {
 		u64 snd_una = READ_ONCE(msk->snd_una);
 
-		if (skb || snd_una != msk->snd_nxt)
+		if (snd_una != msk->snd_nxt) {
+			tcp_remove_empty_skb(ssk, tcp_write_queue_tail(ssk));
 			return 0;
+		}
+
 		zero_window_probe = true;
 		data_seq = snd_una - 1;
-		avail_size = 1;
-	}
+		copy = 1;
 
-	if (WARN_ON_ONCE(info->sent > info->limit ||
-			 info->limit > dfrag->data_len))
-		return 0;
+		/* all mptcp-level data is acked, no skbs should be present into the
+		 * ssk write queue
+		 */
+		WARN_ON_ONCE(reuse_skb);
+	}
 
-	ret = info->limit - info->sent;
-	tail = tcp_build_frag(ssk, avail_size + size_bias, info->flags,
-			      dfrag->page, dfrag->offset + info->sent, &ret);
-	if (!tail) {
-		tcp_remove_empty_skb(sk, tcp_write_queue_tail(ssk));
+	copy = min_t(size_t, copy, info->limit - info->sent);
+	if (!sk_wmem_schedule(ssk, copy)) {
+		tcp_remove_empty_skb(ssk, tcp_write_queue_tail(ssk));
 		return -ENOMEM;
 	}
 
-	/* if the tail skb is still the cached one, collapsing really happened.
-	 */
-	if (skb == tail) {
-		TCP_SKB_CB(tail)->tcp_flags &= ~TCPHDR_PSH;
-		mpext->data_len += ret;
+	if (can_coalesce) {
+		skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
+	} else {
+		get_page(dfrag->page);
+		skb_fill_page_desc(skb, i, dfrag->page, offset, copy);
+	}
+
+	skb->len += copy;
+	skb->data_len += copy;
+	skb->truesize += copy;
+	sk_wmem_queued_add(ssk, copy);
+	sk_mem_charge(ssk, copy);
+	skb->ip_summed = CHECKSUM_PARTIAL;
+	WRITE_ONCE(tcp_sk(ssk)->write_seq, tcp_sk(ssk)->write_seq + copy);
+	TCP_SKB_CB(skb)->end_seq += copy;
+	tcp_skb_pcount_set(skb, 0);
+
+	/* on skb reuse we just need to update the DSS len */
+	if (reuse_skb) {
+		TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
+		mpext->data_len += copy;
 		WARN_ON_ONCE(zero_window_probe);
 		goto out;
 	}
 
-	mpext = skb_ext_find(tail, SKB_EXT_MPTCP);
-	if (WARN_ON_ONCE(!mpext)) {
-		/* should never reach here, stream corrupted */
-		return -EINVAL;
-	}
-
 	memset(mpext, 0, sizeof(*mpext));
 	mpext->data_seq = data_seq;
 	mpext->subflow_seq = mptcp_subflow_ctx(ssk)->rel_write_seq;
-	mpext->data_len = ret;
+	mpext->data_len = copy;
 	mpext->use_map = 1;
 	mpext->dsn64 = 1;
 
@@ -1380,18 +1397,18 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 		 mpext->dsn64);
 
 	if (zero_window_probe) {
-		mptcp_subflow_ctx(ssk)->rel_write_seq += ret;
+		mptcp_subflow_ctx(ssk)->rel_write_seq += copy;
 		mpext->frozen = 1;
 		if (READ_ONCE(msk->csum_enabled))
-			mptcp_update_data_checksum(tail, ret);
+			mptcp_update_data_checksum(skb, copy);
 		tcp_push_pending_frames(ssk);
 		return 0;
 	}
 out:
 	if (READ_ONCE(msk->csum_enabled))
-		mptcp_update_data_checksum(tail, ret);
-	mptcp_subflow_ctx(ssk)->rel_write_seq += ret;
-	return ret;
+		mptcp_update_data_checksum(skb, copy);
+	mptcp_subflow_ctx(ssk)->rel_write_seq += copy;
+	return copy;
}
 
 #define MPTCP_SEND_BURST_SIZE		((1 << 16) - \
-- 
2.26.3