From nobody Thu Dec 26 20:24:29 2024
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH mptcp-next-next 1/3] mptcp: consolidate subflow cleanup
Date: Fri, 29 Nov 2024 18:45:03 +0100
Message-ID: <4a640805e72901a59f0029d61a3eacf2068f0ca8.1732902181.git.pabeni@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Consolidate all the cleanup actions requiring the worker into a single
helper and ensure the dummy data fin creation for fallback sockets is
performed only when the TCP rx queue is empty.

There are no functional changes intended, but this will simplify the
next patch, where spooling of the TCP rx queue could be delayed until
release_cb time.

Signed-off-by: Paolo Abeni
---
 net/mptcp/subflow.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index fd021cf8286e..2926bdf88e42 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1271,7 +1271,12 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
 	subflow->map_valid = 0;
 }
 
-/* sched mptcp worker to remove the subflow if no more data is pending */
+static bool subflow_is_done(const struct sock *sk)
+{
+	return sk->sk_shutdown & RCV_SHUTDOWN || sk->sk_state == TCP_CLOSE;
+}
+
+/* sched mptcp worker for subflow cleanup if no more data is pending */
 static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ssk)
 {
 	struct sock *sk = (struct sock *)msk;
@@ -1281,8 +1286,18 @@ static void subflow_sched_work_if_closed(struct mptcp_sock *msk, struct sock *ss
 		     inet_sk_state_load(sk) != TCP_ESTABLISHED)))
 		return;
 
-	if (skb_queue_empty(&ssk->sk_receive_queue) &&
-	    !test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
+	if (!skb_queue_empty(&ssk->sk_receive_queue))
+		return;
+
+	if (!test_and_set_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
+		mptcp_schedule_work(sk);
+
+	/* when the fallback subflow closes the rx side, trigger a 'dummy'
+	 * ingress data fin, so that the msk state will follow along
+	 */
+	if (__mptcp_check_fallback(msk) && subflow_is_done(ssk) &&
+	    msk->first == ssk &&
+	    mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
 		mptcp_schedule_work(sk);
 }
 
@@ -1842,11 +1857,6 @@ static void __subflow_state_change(struct sock *sk)
 	rcu_read_unlock();
 }
 
-static bool subflow_is_done(const struct sock *sk)
-{
-	return sk->sk_shutdown & RCV_SHUTDOWN || sk->sk_state == TCP_CLOSE;
-}
-
 static void subflow_state_change(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
@@ -1873,13 +1883,6 @@ static void subflow_state_change(struct sock *sk)
 		subflow_error_report(sk);
 
 	subflow_sched_work_if_closed(mptcp_sk(parent), sk);
-
-	/* when the fallback subflow closes the rx side, trigger a 'dummy'
-	 * ingress data fin, so that the msk state will follow along
-	 */
-	if (__mptcp_check_fallback(msk) && subflow_is_done(sk) && msk->first == sk &&
-	    mptcp_update_rcv_data_fin(msk, READ_ONCE(msk->ack_seq), true))
-		mptcp_schedule_work(parent);
 }
 
 void mptcp_subflow_queue_clean(struct sock *listener_sk, struct sock *listener_ssk)
-- 
2.45.2

From nobody Thu Dec 26 20:24:29 2024
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH mptcp-next-next 2/3] mptcp: move the whole rx path under msk socket lock protection
Date: Fri, 29 Nov 2024 18:45:04 +0100
Message-ID: <19310c3f96743f3298c029ec5a89b4e4f1ed10b9.1732902181.git.pabeni@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

After commit c2e6048fa1cf ("mptcp: fix race in release_cb"), it's fairly
straightforward to move the whole MPTCP rx path under the msk socket
lock, leveraging the release_cb.

We can drop a bunch of spin_lock pairs in the receive functions, use a
single receive queue and invoke __mptcp_move_skbs only when subflows ask
for it.
This will allow more cleanup in the next patch.

Signed-off-by: Paolo Abeni
---
 net/mptcp/fastopen.c |  2 ++
 net/mptcp/protocol.c | 76 +++++++++++++++++++-------------------
 net/mptcp/protocol.h |  2 +-
 3 files changed, 36 insertions(+), 44 deletions(-)

diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index a29ff901df75..fb945c0d50bf 100644
--- a/net/mptcp/fastopen.c
+++ b/net/mptcp/fastopen.c
@@ -49,6 +49,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	MPTCP_SKB_CB(skb)->has_rxtstamp = TCP_SKB_CB(skb)->has_rxtstamp;
 
 	mptcp_data_lock(sk);
+	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));
 
 	mptcp_set_owner_r(skb, sk);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
@@ -65,6 +66,7 @@ void __mptcp_fastopen_gen_msk_ackseq(struct mptcp_sock *msk, struct mptcp_subflo
 	struct sock *sk = (struct sock *)msk;
 	struct sk_buff *skb;
 
+	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));
 	skb = skb_peek_tail(&sk->sk_receive_queue);
 	if (skb) {
 		WARN_ON_ONCE(MPTCP_SKB_CB(skb)->end_seq);
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index f768aa4473fb..159add48f6d9 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -845,7 +845,7 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk)
 	return moved > 0;
 }
 
-void mptcp_data_ready(struct sock *sk, struct sock *ssk)
+static void __mptcp_data_ready(struct sock *sk, struct sock *ssk)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -868,9 +868,17 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		return;
 
 	/* Wake-up the reader only for in-sequence data */
-	mptcp_data_lock(sk);
 	if (move_skbs_to_msk(msk, ssk) && mptcp_epollin_ready(sk))
 		sk->sk_data_ready(sk);
+}
+
+void mptcp_data_ready(struct sock *sk, struct sock *ssk)
+{
+	mptcp_data_lock(sk);
+	if (!sock_owned_by_user(sk))
+		__mptcp_data_ready(sk, ssk);
+	else
+		__set_bit(MPTCP_DEQUEUE, &mptcp_sk(sk)->cb_flags);
 	mptcp_data_unlock(sk);
 }
 
@@ -1077,9 +1085,7 @@ static void __mptcp_clean_una_wakeup(struct sock *sk)
 
 static void mptcp_clean_una_wakeup(struct sock *sk)
 {
-	mptcp_data_lock(sk);
 	__mptcp_clean_una_wakeup(sk);
-	mptcp_data_unlock(sk);
 }
 
 static void mptcp_enter_memory_pressure(struct sock *sk)
@@ -1939,16 +1945,22 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
 	goto out;
 }
 
-static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
+static bool __mptcp_move_skbs(struct sock *sk);
+
+static int __mptcp_recvmsg_mskq(struct sock *sk,
 				struct msghdr *msg,
 				size_t len, int flags,
 				struct scm_timestamping_internal *tss,
 				int *cmsg_flags)
 {
+	struct mptcp_sock *msk = mptcp_sk(sk);
 	struct sk_buff *skb, *tmp;
 	int copied = 0;
 
-	skb_queue_walk_safe(&msk->receive_queue, skb, tmp) {
+	if (skb_queue_empty(&sk->sk_receive_queue) && !__mptcp_move_skbs(sk))
+		return 0;
+
+	skb_queue_walk_safe(&sk->sk_receive_queue, skb, tmp) {
 		u32 offset = MPTCP_SKB_CB(skb)->offset;
 		u32 data_len = skb->len - offset;
 		u32 count = min_t(size_t, len - copied, data_len);
@@ -1983,7 +1995,7 @@ static int __mptcp_recvmsg_mskq(struct mptcp_sock *msk,
 			/* we will bulk release the skb memory later */
 			skb->destructor = NULL;
 			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
-			__skb_unlink(skb, &msk->receive_queue);
+			__skb_unlink(skb, &sk->sk_receive_queue);
 			__kfree_skb(skb);
 			msk->bytes_consumed += count;
 		}
@@ -2107,16 +2119,9 @@ static void __mptcp_update_rmem(struct sock *sk)
 	WRITE_ONCE(msk->rmem_released, 0);
 }
 
-static void __mptcp_splice_receive_queue(struct sock *sk)
+static bool __mptcp_move_skbs(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	skb_queue_splice_tail_init(&sk->sk_receive_queue, &msk->receive_queue);
-}
-
-static bool __mptcp_move_skbs(struct mptcp_sock *msk)
-{
-	struct sock *sk = (struct sock *)msk;
 	unsigned int moved = 0;
 	bool ret, done;
 
@@ -2124,37 +2129,27 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 		struct sock *ssk = mptcp_subflow_recv_lookup(msk);
 		bool slowpath;
 
-		/* we can have data pending in the subflows only if the msk
-		 * receive buffer was full at subflow_data_ready() time,
-		 * that is an unlikely slow path.
-		 */
-		if (likely(!ssk))
+		if (unlikely(!ssk))
 			break;
 
 		slowpath = lock_sock_fast(ssk);
-		mptcp_data_lock(sk);
 		__mptcp_update_rmem(sk);
 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
-		mptcp_data_unlock(sk);
 
 		if (unlikely(ssk->sk_err))
 			__mptcp_error_report(sk);
 		unlock_sock_fast(ssk, slowpath);
 	} while (!done);
 
-	/* acquire the data lock only if some input data is pending */
 	ret = moved > 0;
 	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) ||
-	    !skb_queue_empty_lockless(&sk->sk_receive_queue)) {
-		mptcp_data_lock(sk);
+	    !skb_queue_empty(&sk->sk_receive_queue)) {
 		__mptcp_update_rmem(sk);
 		ret |= __mptcp_ofo_queue(msk);
-		__mptcp_splice_receive_queue(sk);
-		mptcp_data_unlock(sk);
 	}
 	if (ret)
 		mptcp_check_data_fin((struct sock *)msk);
-	return !skb_queue_empty(&msk->receive_queue);
+	return ret;
 }
 
@@ -2162,7 +2157,7 @@ static unsigned int mptcp_inq_hint(const struct sock *sk)
 	const struct mptcp_sock *msk = mptcp_sk(sk);
 	const struct sk_buff *skb;
 
-	skb = skb_peek(&msk->receive_queue);
+	skb = skb_peek(&sk->sk_receive_queue);
 	if (skb) {
 		u64 hint_val = READ_ONCE(msk->ack_seq) - MPTCP_SKB_CB(skb)->map_seq;
 
@@ -2208,7 +2203,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	while (copied < len) {
 		int err, bytes_read;
 
-		bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied, flags, &tss, &cmsg_flags);
+		bytes_read = __mptcp_recvmsg_mskq(sk, msg, len - copied, flags, &tss, &cmsg_flags);
 		if (unlikely(bytes_read < 0)) {
 			if (!copied)
 				copied = bytes_read;
@@ -2220,8 +2215,6 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		/* be sure to advertise window change */
 		mptcp_cleanup_rbuf(msk);
 
-		if (skb_queue_empty(&msk->receive_queue) && __mptcp_move_skbs(msk))
-			continue;
 
 		/* only the MPTCP socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
@@ -2246,7 +2239,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 				/* race breaker: the shutdown could be after the
 				 * previous receive queue check
 				 */
-				if (__mptcp_move_skbs(msk))
+				if (__mptcp_move_skbs(sk))
 					continue;
 				break;
 			}
@@ -2290,9 +2283,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		}
 	}
 
-	pr_debug("msk=%p rx queue empty=%d:%d copied=%d\n",
-		 msk, skb_queue_empty_lockless(&sk->sk_receive_queue),
-		 skb_queue_empty(&msk->receive_queue), copied);
+	pr_debug("msk=%p rx queue empty=%d copied=%d",
+		 msk, skb_queue_empty(&sk->sk_receive_queue), copied);
 
 	release_sock(sk);
 	return copied;
@@ -2819,7 +2811,6 @@ static void __mptcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&msk->join_list);
 	INIT_LIST_HEAD(&msk->rtx_queue);
 	INIT_WORK(&msk->work, mptcp_worker);
-	__skb_queue_head_init(&msk->receive_queue);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
 	WRITE_ONCE(msk->rmem_fwd_alloc, 0);
@@ -3402,12 +3393,8 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 	mptcp_for_each_subflow_safe(msk, subflow, tmp)
 		__mptcp_close_ssk(sk, mptcp_subflow_tcp_sock(subflow), subflow, flags);
 
-	/* move to sk_receive_queue, sk_stream_kill_queues will purge it */
-	mptcp_data_lock(sk);
-	skb_queue_splice_tail_init(&msk->receive_queue, &sk->sk_receive_queue);
 	__skb_queue_purge(&sk->sk_receive_queue);
 	skb_rbtree_purge(&msk->out_of_order_queue);
-	mptcp_data_unlock(sk);
 
 	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
 	 * inet_sock_destruct() will dispose it
@@ -3450,7 +3437,8 @@ void __mptcp_check_push(struct sock *sk, struct sock *ssk)
 
 #define MPTCP_FLAGS_PROCESS_CTX_NEED (BIT(MPTCP_PUSH_PENDING) | \
 				      BIT(MPTCP_RETRANSMIT) | \
-				      BIT(MPTCP_FLUSH_JOIN_LIST))
+				      BIT(MPTCP_FLUSH_JOIN_LIST) | \
+				      BIT(MPTCP_DEQUEUE))
 
 /* processes deferred events and flush wmem */
 static void mptcp_release_cb(struct sock *sk)
@@ -3484,6 +3472,8 @@ static void mptcp_release_cb(struct sock *sk)
 			__mptcp_push_pending(sk, 0);
 		if (flags & BIT(MPTCP_RETRANSMIT))
 			__mptcp_retrans(sk);
+		if ((flags & BIT(MPTCP_DEQUEUE)) && __mptcp_move_skbs(sk))
+			sk->sk_data_ready(sk);
 
 		cond_resched();
 		spin_lock_bh(&sk->sk_lock.slock);
@@ -3721,7 +3711,7 @@ static int mptcp_ioctl(struct sock *sk, int cmd, int *karg)
 			return -EINVAL;
 
 		lock_sock(sk);
-		__mptcp_move_skbs(msk);
+		__mptcp_move_skbs(sk);
 		*karg = mptcp_inq_hint(sk);
 		release_sock(sk);
 		break;
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index b4c72a73594f..ad940cc1f26f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -124,6 +124,7 @@
 #define MPTCP_FLUSH_JOIN_LIST	5
 #define MPTCP_SYNC_STATE	6
 #define MPTCP_SYNC_SNDBUF	7
+#define MPTCP_DEQUEUE		8
 
 struct mptcp_skb_cb {
 	u64 map_seq;
@@ -322,7 +323,6 @@ struct mptcp_sock {
 	struct work_struct	work;
 	struct sk_buff		*ooo_last_skb;
 	struct rb_root		out_of_order_queue;
-	struct sk_buff_head	receive_queue;
 	struct list_head	conn_list;
 	struct list_head	rtx_queue;
 	struct mptcp_data_frag	*first_pending;
-- 
2.45.2

From nobody Thu Dec 26 20:24:29 2024
From: Paolo Abeni
To: mptcp@lists.linux.dev
Subject: [PATCH mptcp-next-next 3/3] mptcp: cleanup mem accounting.
Date: Fri, 29 Nov 2024 18:45:05 +0100
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

After the previous patch, updating sk_forward_memory is cheap and we can
drop a lot of complexity from the MPTCP memory accounting, removing the
custom fwd mem allocations for rmem.
Signed-off-by: Paolo Abeni
---
 net/mptcp/fastopen.c |   2 +-
 net/mptcp/protocol.c | 128 ++++---------------------------------------
 net/mptcp/protocol.h |   4 +-
 3 files changed, 13 insertions(+), 121 deletions(-)

diff --git a/net/mptcp/fastopen.c b/net/mptcp/fastopen.c
index fb945c0d50bf..b0f1dddfb143 100644
--- a/net/mptcp/fastopen.c
+++ b/net/mptcp/fastopen.c
@@ -51,7 +51,7 @@ void mptcp_fastopen_subflow_synack_set_params(struct mptcp_subflow_context *subf
 	mptcp_data_lock(sk);
 	DEBUG_NET_WARN_ON_ONCE(sock_owned_by_user_nocheck(sk));
 
-	mptcp_set_owner_r(skb, sk);
+	skb_set_owner_r(skb, sk);
 	__skb_queue_tail(&sk->sk_receive_queue, skb);
 	mptcp_sk(sk)->bytes_received += skb->len;
 
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 159add48f6d9..426acc03a932 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -118,17 +118,6 @@ static void mptcp_drop(struct sock *sk, struct sk_buff *skb)
 	__kfree_skb(skb);
 }
 
-static void mptcp_rmem_fwd_alloc_add(struct sock *sk, int size)
-{
-	WRITE_ONCE(mptcp_sk(sk)->rmem_fwd_alloc,
-		   mptcp_sk(sk)->rmem_fwd_alloc + size);
-}
-
-static void mptcp_rmem_charge(struct sock *sk, int size)
-{
-	mptcp_rmem_fwd_alloc_add(sk, -size);
-}
-
 static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 			       struct sk_buff *from)
 {
@@ -149,7 +138,7 @@ static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to,
 	 * negative one
 	 */
 	atomic_add(delta, &sk->sk_rmem_alloc);
-	mptcp_rmem_charge(sk, delta);
+	sk_mem_charge(sk, delta);
 	kfree_skb_partial(from, fragstolen);
 
 	return true;
@@ -164,44 +153,6 @@ static bool mptcp_ooo_try_coalesce(struct mptcp_sock *msk, struct sk_buff *to,
 	return mptcp_try_coalesce((struct sock *)msk, to, from);
 }
 
-static void __mptcp_rmem_reclaim(struct sock *sk, int amount)
-{
-	amount >>= PAGE_SHIFT;
-	mptcp_rmem_charge(sk, amount << PAGE_SHIFT);
-	__sk_mem_reduce_allocated(sk, amount);
-}
-
-static void mptcp_rmem_uncharge(struct sock *sk, int size)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-	int reclaimable;
-
-	mptcp_rmem_fwd_alloc_add(sk, size);
-	reclaimable = msk->rmem_fwd_alloc - sk_unused_reserved_mem(sk);
-
-	/* see sk_mem_uncharge() for the rationale behind the following schema */
-	if (unlikely(reclaimable >= PAGE_SIZE))
-		__mptcp_rmem_reclaim(sk, reclaimable);
-}
-
-static void mptcp_rfree(struct sk_buff *skb)
-{
-	unsigned int len = skb->truesize;
-	struct sock *sk = skb->sk;
-
-	atomic_sub(len, &sk->sk_rmem_alloc);
-	mptcp_rmem_uncharge(sk, len);
-}
-
-void mptcp_set_owner_r(struct sk_buff *skb, struct sock *sk)
-{
-	skb_orphan(skb);
-	skb->sk = sk;
-	skb->destructor = mptcp_rfree;
-	atomic_add(skb->truesize, &sk->sk_rmem_alloc);
-	mptcp_rmem_charge(sk, skb->truesize);
-}
-
 /* "inspired" by tcp_data_queue_ofo(), main differences:
  * - use mptcp seqs
  * - don't cope with sacks
@@ -314,25 +265,7 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 
 end:
 	skb_condense(skb);
-	mptcp_set_owner_r(skb, sk);
-}
-
-static bool mptcp_rmem_schedule(struct sock *sk, struct sock *ssk, int size)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-	int amt, amount;
-
-	if (size <= msk->rmem_fwd_alloc)
-		return true;
-
-	size -= msk->rmem_fwd_alloc;
-	amt = sk_mem_pages(size);
-	amount = amt << PAGE_SHIFT;
-	if (!__sk_mem_raise_allocated(sk, size, amt, SK_MEM_RECV))
-		return false;
-
-	mptcp_rmem_fwd_alloc_add(sk, amount);
-	return true;
+	skb_set_owner_r(skb, sk);
 }
 
 static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
@@ -350,7 +283,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 	skb_orphan(skb);
 
 	/* try to fetch required memory from subflow */
-	if (!mptcp_rmem_schedule(sk, ssk, skb->truesize)) {
+	if (!sk_rmem_schedule(sk, skb, skb->truesize)) {
 		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_RCVPRUNED);
 		goto drop;
 	}
@@ -374,7 +307,7 @@ static bool __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk,
 		if (tail && mptcp_try_coalesce(sk, tail, skb))
 			return true;
 
-		mptcp_set_owner_r(skb, sk);
+		skb_set_owner_r(skb, sk);
 		__skb_queue_tail(&sk->sk_receive_queue, skb);
 		return true;
 	} else if (after64(MPTCP_SKB_CB(skb)->map_seq, msk->ack_seq)) {
@@ -1077,17 +1010,10 @@ static void __mptcp_clean_una(struct sock *sk)
 
 static void __mptcp_clean_una_wakeup(struct sock *sk)
 {
-	lockdep_assert_held_once(&sk->sk_lock.slock);
-
 	__mptcp_clean_una(sk);
 	mptcp_write_space(sk);
 }
 
-static void mptcp_clean_una_wakeup(struct sock *sk)
-{
-	__mptcp_clean_una_wakeup(sk);
-}
-
 static void mptcp_enter_memory_pressure(struct sock *sk)
 {
 	struct mptcp_subflow_context *subflow;
@@ -1992,9 +1918,10 @@ static int __mptcp_recvmsg_mskq(struct sock *sk,
 		}
 
 		if (!(flags & MSG_PEEK)) {
-			/* we will bulk release the skb memory later */
+			/* avoid the indirect call, we know the destructor is sock_wfree */
 			skb->destructor = NULL;
-			WRITE_ONCE(msk->rmem_released, msk->rmem_released + skb->truesize);
+			atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
+			sk_mem_uncharge(sk, skb->truesize);
 			__skb_unlink(skb, &sk->sk_receive_queue);
 			__kfree_skb(skb);
 			msk->bytes_consumed += count;
@@ -2107,18 +2034,6 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }
 
-static void __mptcp_update_rmem(struct sock *sk)
-{
-	struct mptcp_sock *msk = mptcp_sk(sk);
-
-	if (!msk->rmem_released)
-		return;
-
-	atomic_sub(msk->rmem_released, &sk->sk_rmem_alloc);
-	mptcp_rmem_uncharge(sk, msk->rmem_released);
-	WRITE_ONCE(msk->rmem_released, 0);
-}
-
 static bool __mptcp_move_skbs(struct sock *sk)
 {
 	struct mptcp_sock *msk = mptcp_sk(sk);
@@ -2133,7 +2048,6 @@ static bool __mptcp_move_skbs(struct sock *sk)
 			break;
 
 		slowpath = lock_sock_fast(ssk);
-		__mptcp_update_rmem(sk);
 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
 
 		if (unlikely(ssk->sk_err))
@@ -2143,10 +2057,9 @@ static bool __mptcp_move_skbs(struct sock *sk)
 
 	ret = moved > 0;
 	if (!RB_EMPTY_ROOT(&msk->out_of_order_queue) ||
-	    !skb_queue_empty(&sk->sk_receive_queue)) {
-		__mptcp_update_rmem(sk);
+	    !skb_queue_empty(&sk->sk_receive_queue))
 		ret |= __mptcp_ofo_queue(msk);
-	}
+
 	if (ret)
 		mptcp_check_data_fin((struct sock *)msk);
 	return ret;
@@ -2371,17 +2284,13 @@ bool __mptcp_retransmit_pending_data(struct sock *sk)
 	 * some data in the mptcp rtx queue has not really xmitted yet.
	 * keep it simple and re-inject the whole mptcp level rtx queue
	 */
-	mptcp_data_lock(sk);
 	__mptcp_clean_una_wakeup(sk);
 	rtx_head = mptcp_rtx_head(sk);
-	if (!rtx_head) {
-		mptcp_data_unlock(sk);
+	if (!rtx_head)
 		return false;
-	}
 
 	msk->recovery_snd_nxt = msk->snd_nxt;
 	msk->recovery = true;
-	mptcp_data_unlock(sk);
 
 	msk->first_pending = rtx_head;
 	msk->snd_burst = 0;
@@ -2640,7 +2549,7 @@ static void __mptcp_retrans(struct sock *sk)
 	int ret, err;
 	u16 len = 0;
 
-	mptcp_clean_una_wakeup(sk);
+	__mptcp_clean_una_wakeup(sk);
 
 	/* first check ssk: need to kick "stale" logic */
 	err = mptcp_sched_get_retrans(msk);
@@ -2813,8 +2722,6 @@ static void __mptcp_init_sock(struct sock *sk)
 	INIT_WORK(&msk->work, mptcp_worker);
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;
-	WRITE_ONCE(msk->rmem_fwd_alloc, 0);
-	WRITE_ONCE(msk->rmem_released, 0);
 	msk->timer_ival = TCP_RTO_MIN;
 	msk->scaling_ratio = TCP_DEFAULT_SCALING_RATIO;
 
@@ -3040,8 +2947,6 @@ static void __mptcp_destroy_sock(struct sock *sk)
 
 	sk->sk_prot->destroy(sk);
 
-	WARN_ON_ONCE(READ_ONCE(msk->rmem_fwd_alloc));
-	WARN_ON_ONCE(msk->rmem_released);
 	sk_stream_kill_queues(sk);
 	xfrm_sk_free_policy(sk);
 
@@ -3399,8 +3304,6 @@ void mptcp_destroy_common(struct mptcp_sock *msk, unsigned int flags)
 	/* move all the rx fwd alloc into the sk_mem_reclaim_final in
	 * inet_sock_destruct() will dispose it
	 */
-	sk_forward_alloc_add(sk, msk->rmem_fwd_alloc);
-	WRITE_ONCE(msk->rmem_fwd_alloc, 0);
 	mptcp_token_destroy(msk);
 	mptcp_pm_free_anno_list(msk);
 	mptcp_free_local_addr_list(msk);
@@ -3493,8 +3396,6 @@ static void mptcp_release_cb(struct sock *sk)
 		if (__test_and_clear_bit(MPTCP_SYNC_SNDBUF, &msk->cb_flags))
 			__mptcp_sync_sndbuf(sk);
 	}
-
-	__mptcp_update_rmem(sk);
 }
 
 /* MP_JOIN client subflow must wait for 4th ack before sending any data:
@@ -3665,12 +3566,6 @@ static void mptcp_shutdown(struct sock *sk, int how)
 		__mptcp_wr_shutdown(sk);
 }
 
-static int mptcp_forward_alloc_get(const struct sock *sk)
-{
-	return READ_ONCE(sk->sk_forward_alloc) +
-	       READ_ONCE(mptcp_sk(sk)->rmem_fwd_alloc);
-}
-
 static int mptcp_ioctl_outq(const struct mptcp_sock *msk, u64 v)
 {
 	const struct sock *sk = (void *)msk;
@@ -3828,7 +3723,6 @@ static struct proto mptcp_prot = {
 	.hash		= mptcp_hash,
 	.unhash		= mptcp_unhash,
 	.get_port	= mptcp_get_port,
-	.forward_alloc_get	= mptcp_forward_alloc_get,
 	.stream_memory_free	= mptcp_stream_memory_free,
 	.sockets_allocated	= &mptcp_sockets_allocated,
 
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index ad940cc1f26f..a0d46b69746d 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -278,7 +278,6 @@ struct mptcp_sock {
 	u64		rcv_data_fin_seq;
 	u64		bytes_retrans;
 	u64		bytes_consumed;
-	int		rmem_fwd_alloc;
 	int		snd_burst;
 	int		old_wspace;
 	u64		recovery_snd_nxt;	/* in recovery mode accept up to this seq;
@@ -293,7 +292,6 @@ struct mptcp_sock {
 	u32		last_ack_recv;
 	unsigned long	timer_ival;
 	u32		token;
-	int		rmem_released;
 	unsigned long	flags;
 	unsigned long	cb_flags;
 	bool		recovery;		/* closing subflow write queue reinjected */
@@ -384,7 +382,7 @@ static inline void msk_owned_by_me(const struct mptcp_sock *msk)
  */
 static inline int __mptcp_rmem(const struct sock *sk)
 {
-	return atomic_read(&sk->sk_rmem_alloc) - READ_ONCE(mptcp_sk(sk)->rmem_released);
+	return atomic_read(&sk->sk_rmem_alloc);
 }
 
 static inline int mptcp_win_from_space(const struct sock *sk, int space)
-- 
2.45.2