From: Paolo Abeni <pabeni@redhat.com>
To: mptcp@lists.linux.dev
Cc: fwestpha@redhat.com
Subject: [RFC PATCH] mptcp: allocate fwd memory separately on the rx and tx path
Date: Tue, 5 Oct 2021 18:39:08 +0200

The whole MPTCP receive path is protected by the msk socket spinlock. As a
consequence, the tx path has to play a few tricks to allocate the forward
memory without acquiring the spinlock multiple times, making the overall
tx path quite complex.

This patch cleans up the tx path by using completely separate forward
memory allocation for the rx and the tx path.

The forward memory allocated in the rx path is now accounted in
msk->rmem_fwd_alloc and is (still) protected by the msk socket spinlock.
To cope with the above, we provide a few MPTCP-specific variants of the
helpers to charge, uncharge, reclaim and free the forward memory in the
receive path.

msk->sk_forward_alloc now accounts only the forward memory for the tx
path, so we can use the plain core sock helpers to manipulate it and drop
quite a bit of complexity.

On memory pressure, reclaim both the rx and the tx forward memory.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: this would additionally require a core change to properly fetch
the forward allocated memory in inet_sk_diag_fill(). I think a new
'struct proto' ops would do:

struct proto {
	// ...
	int (*forward_alloc)(const struct sock *sk);
	// ...
};

static inline int sk_forward_alloc(const struct sock *sk)
{
	if (!sk->sk_prot->forward_alloc)
		return sk->sk_forward_alloc;
	return sk->sk_prot->forward_alloc(sk);
}
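For MPTCP, that ops could then report the sum of the two pools introduced
by this patch. A rough sketch, assuming the rmem_fwd_alloc field added
below (the helper name is illustrative, not part of this patch):

/* Total forward allocated memory for diag purposes: the tx pool
 * (sk->sk_forward_alloc) plus the new rx pool (msk->rmem_fwd_alloc).
 */
static int mptcp_forward_alloc_get(const struct sock *sk)
{
	return sk->sk_forward_alloc + mptcp_sk(sk)->rmem_fwd_alloc;
}

With that in place, inet_sk_diag_fill() would go through the
sk_forward_alloc() accessor above instead of reading the field directly.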
---
 net/mptcp/protocol.c | 219 ++++++++++++++++++-------------------------
 net/mptcp/protocol.h |  15 +--
 2 files changed, 90 insertions(+), 144 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 21716392e754..1d26462a1daf 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -126,6 +126,11 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
+static void mptcp_rmem_charge(struct sock *sk, int size)
+{
+	mptcp_sk(sk)->rmem_fwd_alloc -= size;
+}
+
 static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 			       struct sk_buff *from)
 {
@@ -142,7 +147,7 @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 		MPTCP_SKB_CB(to)->end_seq = MPTCP_SKB_CB(from)->end_seq;
 		kfree_skb_partial(from, fragstolen);
 		atomic_add(delta, &sk->sk_rmem_alloc);
-		sk_mem_charge(sk, delta);
+		mptcp_rmem_charge(sk, delta);
 		return true;
 	}
 
@@ -155,6 +160,50 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to,
 	return mptcp_try_coalesce((struct sock *)msk, to, from);
 }
 
+void __mptcp_rmem_reclaim(struct sock *sk, int amount)
+{
+	amount >>= SK_MEM_QUANTUM_SHIFT;
+	mptcp_sk(sk)->rmem_fwd_alloc -= amount << SK_MEM_QUANTUM_SHIFT;
+	__sk_mem_reduce_allocated(sk, amount);
+}
+
+static void mptcp_rmem_uncharge(struct sock *sk, int size)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	int reclaimable;
+
+	msk->rmem_fwd_alloc += size;
+	reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
+
+	/* Avoid a possible overflow.
+	 * TCP send queues can make this happen, if sk_mem_reclaim()
+	 * is not called and more than 2 GBytes are released at once.
+	 *
+	 * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
+	 * no need to hold that much forward allocation anyway.
+	 */
+	if (unlikely(reclaimable >= 1 << 21))
+		__mptcp_rmem_reclaim(sk, (1 << 20));
+}
+
+static void mptcp_rfree(struct sk_buff *skb)
+{
+	unsigned int len = skb->truesize;
+	struct sock *sk = skb->sk;
+
+	atomic_sub(len, &sk->sk_rmem_alloc);
+	mptcp_rmem_uncharge(sk, len);
+}
+
+static void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk)
+{
+	skb_orphan(skb);
+	skb->sk = sk;
+	skb->destructor = mptcp_rfree;
+	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
+	mptcp_rmem_charge(sk, skb->truesize);
+}
+
 /* "inspired" by tcp_data_queue_ofo(), main differences:
  * - use mptcp seqs
  * - don't cope with sacks
@@ -267,7 +316,29 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 
 end:
 	skb_condense(skb);
-	skb_set_owner_r(skb, sk);
+	mptcp_set_owner_r(skb, sk);
+}
+
+static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size)
+{
+	struct mptcp_sock *msk = mptcp_sk(sk);
+	int amt, amount;
+
+	if (size < msk->rmem_fwd_alloc)
+		return true;
+
+	amt = sk_mem_pages(size);
+	amount = amt << SK_MEM_QUANTUM_SHIFT;
+	msk->rmem_fwd_alloc += amount;
+	if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV)) {
+		if (ssk->sk_forward_alloc < amount) {
+			msk->rmem_fwd_alloc -= amount;
+			return false;
+		}
+
+		ssk->sk_forward_alloc -= amount;
+	}
+	return true;
 }
 
 static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
@@ -285,15 +356,8 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	skb_orphan(skb);
 
 	/* try to fetch required memory from subflow */
-	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
-		int amount = sk_mem_pages(skb->truesize) << SK_MEM_QUANTUM_SHIFT;
-
-		if (ssk->sk_forward_alloc < amount)
-			goto drop;
-
-		ssk->sk_forward_alloc -= amount;
-		sk->sk_forward_alloc += amount;
-	}
+	if (!mptcp_rmem_schedule(sk, ssk, skb->truesize))
+		goto drop;
 
 	has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
 
@@ -313,7 +377,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 		if (tail && mptcp_try_coalesce(sk, tail, skb))
 			return true;
 
-		skb_set_owner_r(skb, sk);
+		mptcp_set_owner_r(skb, sk);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		return true;
 	} else if (after64(MPTCP_SKB_CB(skb)->map_seq, msk->ack_seq)) {
@@ -908,122 +972,20 @@ static bool mptcp_frag_can_collapse_to(const struct mptcp_sock *msk,
 		df->data_seq + df->data_len == msk->write_seq;
 }
 
-static int mptcp_wmem_with_overhead(int size)
-{
-	return size + ((sizeof(struct mptcp_data_frag) * size) >> PAGE_SHIFT);
-}
-
-static void __mptcp_wmem_reserve(struct sock *sk, int size)
-{
-	int amount = mptcp_wmem_with_overhead(size);
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	WARN_ON_ONCE(msk->wmem_reserved);
-	if (WARN_ON_ONCE(amount < 0))
-		amount = 0;
-
-	if (amount <= sk->sk_forward_alloc)
-		goto reserve;
-
-	/* under memory pressure try to reserve at most a single page
-	 * otherwise try to reserve the full estimate and fallback
-	 * to a single page before entering the error path
-	 */
-	if ((tcp_under_memory_pressure(sk) && amount > PAGE_SIZE) ||
-	    !sk_wmem_schedule(sk, amount)) {
-		if (amount <= PAGE_SIZE)
-			goto nomem;
-
-		amount = PAGE_SIZE;
-		if (!sk_wmem_schedule(sk, amount))
-			goto nomem;
-	}
-
-reserve:
-	msk->wmem_reserved = amount;
-	sk->sk_forward_alloc -= amount;
-	return;
-
-nomem:
-	/* we will wait for memory on next allocation */
-	msk->wmem_reserved = -1;
-}
-
-static void __mptcp_update_wmem(struct sock *sk)
+static void __mptcp_mem_reclaim_partial(struct sock *sk)
 {
-	struct mptcp_sock *msk = mptcp_sk(sk);
+	int reclaimable = mptcp_sk(sk)->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
 
 	lockdep_assert_held_once(&sk->sk_lock.slock);
 
-	if (!msk->wmem_reserved)
-		return;
-
-	if (msk->wmem_reserved < 0)
-		msk->wmem_reserved = 0;
-	if (msk->wmem_reserved > 0) {
-		sk->sk_forward_alloc += msk->wmem_reserved;
-		msk->wmem_reserved = 0;
-	}
-}
-
-static bool mptcp_wmem_alloc(struct sock *sk, int size)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	/* check for pre-existing error condition */
-	if (msk->wmem_reserved < 0)
-		return false;
-
-	if (msk->wmem_reserved >= size)
-		goto account;
-
-	mptcp_data_lock(sk);
-	if (!sk_wmem_schedule(sk, size)) {
-		mptcp_data_unlock(sk);
-		return false;
-	}
-
-	sk->sk_forward_alloc -= size;
-	msk->wmem_reserved += size;
-	mptcp_data_unlock(sk);
-
-account:
-	msk->wmem_reserved -= size;
-	return true;
-}
-
-static void mptcp_wmem_uncharge(struct sock *sk, int size)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	if (msk->wmem_reserved < 0)
-		msk->wmem_reserved = 0;
-	msk->wmem_reserved += size;
-}
-
-static void __mptcp_mem_reclaim_partial(struct sock *sk)
-{
-	lockdep_assert_held_once(&sk->sk_lock.slock);
-	__mptcp_update_wmem(sk);
+	__mptcp_rmem_reclaim(sk, reclaimable - 1);
 	sk_mem_reclaim_partial(sk);
 }
 
 static void mptcp_mem_reclaim_partial(struct sock *sk)
 {
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	/* if we are experiencing a transint allocation error,
-	 * the forward allocation memory has been already
-	 * released
-	 */
-	if (msk->wmem_reserved < 0)
-		return;
-
 	mptcp_data_lock(sk);
-	sk->sk_forward_alloc += msk->wmem_reserved;
-	sk_mem_reclaim_partial(sk);
-	msk->wmem_reserved = sk->sk_forward_alloc;
-	sk->sk_forward_alloc = 0;
+	__mptcp_mem_reclaim_partial(sk);
 	mptcp_data_unlock(sk);
 }
 
@@ -1664,7 +1626,6 @@ static void __mptcp_subflow_push_pending(struct sock *sk, struct sock *ssk)
 	/* __mptcp_alloc_tx_skb could have released some wmem and we are
 	 * not going to flush it via release_sock()
 	 */
-	__mptcp_update_wmem(sk);
 	if (copied) {
 		tcp_push(ssk, 0, info.mss_now, tcp_sk(ssk)->nonagle,
 			 info.size_goal);
@@ -1701,7 +1662,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	/* silently ignore everything else */
 	msg->msg_flags &= MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL;
 
-	mptcp_lock_sock(sk, __mptcp_wmem_reserve(sk, min_t(size_t, 1 << 20, len)));
+	lock_sock(sk);
 
 	timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 
@@ -1749,17 +1710,17 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 		psize = min_t(size_t, psize, msg_data_left(msg));
 		total_ts = psize + frag_truesize;
 
-		if (!mptcp_wmem_alloc(sk, total_ts))
+		if (!sk_wmem_schedule(sk, total_ts))
 			goto wait_for_memory;
 
 		if (copy_page_from_iter(dfrag->page, offset, psize,
 					&msg->msg_iter) != psize) {
-			mptcp_wmem_uncharge(sk, psize + frag_truesize);
 			ret = -EFAULT;
 			goto out;
 		}
 
 		/* data successfully copied into the write queue */
+		sk->sk_forward_alloc -= total_ts;
 		copied += psize;
 		dfrag->data_len += psize;
 		frag_truesize += psize;
@@ -1956,7 +1917,7 @@ static void __mptcp_update_rmem(struct sock *sk)
 		return;
 
 	atomic_sub(msk->rmem_released, &sk->sk_rmem_alloc);
-	sk_mem_uncharge(sk, msk->rmem_released);
+	mptcp_rmem_uncharge(sk, msk->rmem_released);
 	WRITE_ONCE(msk->rmem_released, 0);
 }
 
@@ -2024,7 +1985,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	if (unlikely(flags & MSG_ERRQUEUE))
 		return inet_recv_error(sk, msg, len, addr_len);
 
-	mptcp_lock_sock(sk, __mptcp_splice_receive_queue(sk));
+	lock_sock(sk);
 	if (unlikely(sk->sk_state == TCP_LISTEN)) {
 		copied = -ENOTCONN;
 		goto out_err;
@@ -2504,7 +2465,6 @@ static int __mptcp_init_sock(struct sock *sk)
 	__skb_queue_head_init(&msk->receive_queue);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
-	msk->wmem_reserved = 0;
 	WRITE_ONCE(msk->rmem_released, 0);
 	msk->timer_ival = TCP_RTO_MIN;
 
@@ -2719,7 +2679,7 @@ static void __mptcp_destroy_sock(struct sock *sk)
 
 	sk->sk_prot->destroy(sk);
 
-	WARN_ON_ONCE(msk->wmem_reserved);
+	WARN_ON_ONCE(msk->rmem_fwd_alloc);
 	WARN_ON_ONCE(msk->rmem_released);
 	sk_stream_kill_queues(sk);
 	xfrm_sk_free_policy(sk);
@@ -2954,6 +2914,9 @@ void mptcp_destroy_common(struct mptcp_sock *msk)
 
 	/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
 	skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
+	__skb_queue_purge(&sk->sk_receive_queue);
+	sk->sk_forward_alloc += msk->rmem_fwd_alloc;
+	msk->rmem_fwd_alloc = 0;
 
 	skb_rbtree_purge(&msk->out_of_order_queue);
 	mptcp_token_destroy(msk);
@@ -3037,10 +3000,6 @@ static void mptcp_release_cb(struct sock *sk)
 	if (test_and_clear_bit(MPTCP_ERROR_REPORT, &mptcp_sk(sk)->flags))
 		__mptcp_error_report(sk);
 
-	/* push_pending may touch wmem_reserved, ensure we do the cleanup
-	 * later
-	 */
-	__mptcp_update_wmem(sk);
 	__mptcp_update_rmem(sk);
 }
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7379ab580a7e..cfb374634a83 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -227,7 +227,7 @@ struct mptcp_sock {
 	u64		ack_seq;
 	u64		rcv_wnd_sent;
 	u64		rcv_data_fin_seq;
-	int		wmem_reserved;
+	int		rmem_fwd_alloc;
 	struct sock	*last_snd;
 	int		snd_burst;
 	int		old_wspace;
@@ -272,19 +272,6 @@ struct mptcp_sock {
 	char		ca_name[TCP_CA_NAME_MAX];
 };
 
-#define mptcp_lock_sock(___sk, cb) do {					\
-	struct sock *__sk = (___sk); /* silence macro reuse warning */	\
-	might_sleep();							\
-	spin_lock_bh(&__sk->sk_lock.slock);				\
-	if (__sk->sk_lock.owned)					\
-		__lock_sock(__sk);					\
-	cb;								\
-	__sk->sk_lock.owned = 1;					\
-	spin_unlock(&__sk->sk_lock.slock);				\
-	mutex_acquire(&__sk->sk_lock.dep_map, 0, 0, _RET_IP_);		\
-	local_bh_enable();						\
-} while (0)
-
 #define mptcp_data_lock(sk) spin_lock_bh(&(sk)->sk_lock.slock)
 #define mptcp_data_unlock(sk) spin_unlock_bh(&(sk)->sk_lock.slock)
 
-- 
2.26.3