From nobody Thu Apr 25 23:31:55 2024
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, davem@davemloft.net, kuba@kernel.org,
 matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 1/3] mptcp: enforce HoL-blocking estimation
Date: Fri, 17 Dec 2021 15:37:00 -0800
Message-Id: <20211217233702.299461-2-mathew.j.martineau@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20211217233702.299461-1-mathew.j.martineau@linux.intel.com>
References: <20211217233702.299461-1-mathew.j.martineau@linux.intel.com>
X-Mailing-List: mptcp@lists.linux.dev

From: Paolo Abeni

The MPTCP packet scheduler has sub-optimal behavior with asymmetric
subflows: if the faster subflow-level cwin is closed, the packet
scheduler can enqueue "too much" data on a slower subflow.

Once all the data on the faster subflow is acked, if the mptcp-level
cwin is closed, the faster subflow has nothing left to send and link
utilization becomes suboptimal.

The solution is implementing blest-like[1] HoL-blocking estimation,
transmitting only on the subflow with the shorter estimated time to
flush the queued memory. If that subflow's cwin is closed, we wait
even if other subflows are available.

This is considerably simpler than the original blest implementation,
as we leverage the pacing rate provided by the TCP socket. To get a
more accurate estimation of the subflow linger-time, we maintain a
per-subflow weighted average of the pacing rate.

Additionally, drop magic numbers in favor of newly defined macros and
use more meaningful names for the status variables.
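The estimation described above can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code: `struct subflow_model` and the helper names `linger_time`, `pick_subflow` and `update_avg_pacing_rate` are hypothetical, and returning UINT64_MAX for a subflow with no known rate is a simplification of the patch, which skips such subflows.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical user-space model of one subflow. The fields loosely mirror
 * the values the patch reads; the struct itself is an assumption.
 */
struct subflow_model {
	uint32_t wmem_queued;     /* bytes already queued for transmission */
	uint64_t pacing_rate;     /* current pacing rate, bytes per second */
	uint64_t avg_pacing_rate; /* weighted average of the pacing rate */
};

/* Estimated time to flush the queue: queued bytes / average pacing rate.
 * The << 32 scaling keeps precision in integer math, as in the patch.
 * Returns UINT64_MAX when no rate is known yet (simplification).
 */
static uint64_t linger_time(const struct subflow_model *sf)
{
	if (!sf->avg_pacing_rate)
		return UINT64_MAX;
	return ((uint64_t)sf->wmem_queued << 32) / sf->avg_pacing_rate;
}

/* Index of the subflow with the shortest linger time, or -1 if none. */
static int pick_subflow(const struct subflow_model *sf, size_t n)
{
	uint64_t best = UINT64_MAX;
	int idx = -1;

	for (size_t i = 0; i < n; i++) {
		uint64_t t = linger_time(&sf[i]);

		if (t < best) {
			best = t;
			idx = (int)i;
		}
	}
	return idx;
}

/* Fold the current pacing rate into the average: the old average is
 * weighted by the bytes already queued, the current rate by the burst
 * about to be sent.
 */
static void update_avg_pacing_rate(struct subflow_model *sf, uint32_t burst)
{
	uint64_t weight = (uint64_t)burst + sf->wmem_queued;

	if (!weight)
		return;
	sf->avg_pacing_rate = (sf->avg_pacing_rate * sf->wmem_queued +
			       sf->pacing_rate * burst) / weight;
}
```

With equal queue sizes, the subflow with the higher average rate gets the shorter linger time and is picked. The weighting means a large backlog makes the average move slowly, while a nearly empty queue lets the current socket rate dominate.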
[1] http://dl.ifip.org/db/conf/networking/networking2016/1570234725.pdf

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/137
Reviewed-by: Matthieu Baerts
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/protocol.c | 72 +++++++++++++++++++++++++++++---------------
 net/mptcp/protocol.h |  1 +
 2 files changed, 48 insertions(+), 25 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3e549f6190c0..df5a0cf431c1 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -1372,7 +1372,7 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk,
 
 struct subflow_send_info {
 	struct sock *ssk;
-	u64 ratio;
+	u64 linger_time;
 };
 
 void mptcp_subflow_set_active(struct mptcp_subflow_context *subflow)
@@ -1397,20 +1397,24 @@ bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
 	return __mptcp_subflow_active(subflow);
 }
 
+#define SSK_MODE_ACTIVE	0
+#define SSK_MODE_BACKUP	1
+#define SSK_MODE_MAX	2
+
 /* implement the mptcp packet scheduler;
  * returns the subflow that will transmit the next DSS
  * additionally updates the rtx timeout
  */
 static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 {
-	struct subflow_send_info send_info[2];
+	struct subflow_send_info send_info[SSK_MODE_MAX];
 	struct mptcp_subflow_context *subflow;
 	struct sock *sk = (struct sock *)msk;
+	u32 pace, burst, wmem;
 	int i, nr_active = 0;
 	struct sock *ssk;
+	u64 linger_time;
 	long tout = 0;
-	u64 ratio;
-	u32 pace;
 
 	sock_owned_by_me(sk);
 
@@ -1429,10 +1433,11 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 	}
 
 	/* pick the subflow with the lower wmem/wspace ratio */
-	for (i = 0; i < 2; ++i) {
+	for (i = 0; i < SSK_MODE_MAX; ++i) {
 		send_info[i].ssk = NULL;
-		send_info[i].ratio = -1;
+		send_info[i].linger_time = -1;
 	}
+
 	mptcp_for_each_subflow(msk, subflow) {
 		trace_mptcp_subflow_get_send(subflow);
 		ssk = mptcp_subflow_tcp_sock(subflow);
@@ -1441,34 +1446,51 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk)
 
 		tout = max(tout, mptcp_timeout_from_subflow(subflow));
 		nr_active += !subflow->backup;
-		if (!sk_stream_memory_free(subflow->tcp_sock) || !tcp_sk(ssk)->snd_wnd)
-			continue;
-
-		pace = READ_ONCE(ssk->sk_pacing_rate);
-		if (!pace)
-			continue;
+		pace = subflow->avg_pacing_rate;
+		if (unlikely(!pace)) {
+			/* init pacing rate from socket */
+			subflow->avg_pacing_rate = READ_ONCE(ssk->sk_pacing_rate);
+			pace = subflow->avg_pacing_rate;
+			if (!pace)
+				continue;
+		}
 
-		ratio = div_u64((u64)READ_ONCE(ssk->sk_wmem_queued) << 32,
-				pace);
-		if (ratio < send_info[subflow->backup].ratio) {
+		linger_time = div_u64((u64)READ_ONCE(ssk->sk_wmem_queued) << 32, pace);
+		if (linger_time < send_info[subflow->backup].linger_time) {
 			send_info[subflow->backup].ssk = ssk;
-			send_info[subflow->backup].ratio = ratio;
+			send_info[subflow->backup].linger_time = linger_time;
 		}
 	}
 	__mptcp_set_timeout(sk, tout);
 
 	/* pick the best backup if no other subflow is active */
 	if (!nr_active)
-		send_info[0].ssk = send_info[1].ssk;
-
-	if (send_info[0].ssk) {
-		msk->last_snd = send_info[0].ssk;
-		msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE,
-				       tcp_sk(msk->last_snd)->snd_wnd);
-		return msk->last_snd;
-	}
+		send_info[SSK_MODE_ACTIVE].ssk = send_info[SSK_MODE_BACKUP].ssk;
+
+	/* According to the blest algorithm, to avoid HoL blocking for the
+	 * faster flow, we need to:
+	 * - estimate the faster flow linger time
+	 * - use the above to estimate the amount of bytes transferred
+	 *   by the faster flow
+	 * - check that the amount of queued data is greater than the above,
+	 *   otherwise do not use the picked, slower, subflow
+	 * We select the subflow with the shorter estimated time to flush
+	 * the queued mem, which basically ensures the above. We just need
+	 * to check that subflow has a non empty cwin.
+	 */
+	ssk = send_info[SSK_MODE_ACTIVE].ssk;
+	if (!ssk || !sk_stream_memory_free(ssk) || !tcp_sk(ssk)->snd_wnd)
+		return NULL;
 
-	return NULL;
+	burst = min_t(int, MPTCP_SEND_BURST_SIZE, tcp_sk(ssk)->snd_wnd);
+	wmem = READ_ONCE(ssk->sk_wmem_queued);
+	subflow = mptcp_subflow_ctx(ssk);
+	subflow->avg_pacing_rate = div_u64((u64)subflow->avg_pacing_rate * wmem +
+					   READ_ONCE(ssk->sk_pacing_rate) * burst,
+					   burst + wmem);
+	msk->last_snd = ssk;
+	msk->snd_burst = burst;
+	return ssk;
 }
 
 static void mptcp_push_release(struct sock *ssk, struct mptcp_sendmsg_info *info)
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index e1469155fb15..0486c9f5b38b 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -395,6 +395,7 @@ DECLARE_PER_CPU(struct mptcp_delegated_action, mptcp_delegated_actions);
 /* MPTCP subflow context */
 struct mptcp_subflow_context {
 	struct list_head node;/* conn_list of subflows */
+	unsigned long avg_pacing_rate; /* protected by msk socket lock */
 	u64 local_key;
 	u64 remote_key;
 	u64 idsn;
-- 
2.34.1

From nobody Thu Apr 25 23:31:55 2024
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Florian Westphal, davem@davemloft.net, kuba@kernel.org,
 matthieu.baerts@tessares.net, mptcp@lists.linux.dev, Mat Martineau
Subject: [PATCH net-next 2/3] selftests: mptcp: try to set mptcp ulp mode in different sk states
Date: Fri, 17 Dec 2021 15:37:01 -0800
Message-Id: <20211217233702.299461-3-mathew.j.martineau@linux.intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20211217233702.299461-1-mathew.j.martineau@linux.intel.com>
References: <20211217233702.299461-1-mathew.j.martineau@linux.intel.com>
X-Mailing-List: mptcp@lists.linux.dev

From: Florian Westphal

The kernel will crash without the "mptcp: clear 'kern' flag from
fallback sockets" change.

Since this doesn't slow down testing in a noticeable way, run this
unconditionally.

The explicit test did not catch this, because the check was done for
the TCP socket returned by 'socket(.. IPPROTO_TCP)' rather than for a
TCP socket returned by accept() on an MPTCP listen fd.
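For readers unfamiliar with the probe, attempting to set the ULP on a socket looks roughly like the sketch below. The helper name `try_set_ulp` is hypothetical and is not the selftest's own code; the fallback `TCP_ULP` define is only there for older libc headers and uses the Linux uapi value.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

#ifndef TCP_ULP
#define TCP_ULP 31 /* Linux uapi value; fallback for older libc headers */
#endif

/* Hypothetical helper: ask the kernel to attach the named ULP to fd.
 * Returns 0 on success, or a negative errno value on failure.
 */
static int try_set_ulp(int fd, const char *name)
{
	if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, name,
		       (socklen_t)strlen(name)) < 0)
		return -errno;
	return 0;
}
```

The same call can be issued on an fd in any state, for instance right after socket(), after connect(), or on an fd returned by accept(). Per the message above, the accept() case on an MPTCP listener is the one the earlier explicit test missed. The test presumably expects try_set_ulp(fd, "mptcp") to be rejected from user space in every state, but that expectation is an assumption here.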
Signed-off-by: Florian Westphal
Signed-off-by: Mat Martineau
---
 .../selftests/net/mptcp/mptcp_connect.c       | 97 ++++++++++---------
 .../selftests/net/mptcp/mptcp_connect.sh      | 20 ----
 2 files changed, 51 insertions(+), 66 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 98de28ac3ba8..a30e93c5c549 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -59,7 +59,6 @@ static enum cfg_peek cfg_peek = CFG_NONE_PEEK;
 static const char *cfg_host;
 static const char *cfg_port = "12000";
 static int cfg_sock_proto = IPPROTO_MPTCP;
-static bool tcpulp_audit;
 static int pf = AF_INET;
 static int cfg_sndbuf;
 static int cfg_rcvbuf;
@@ -103,7 +102,6 @@ static void die_usage(void)
 	fprintf(stderr, "\t-s [MPTCP|TCP] -- use mptcp(default) or tcp sockets\n");
 	fprintf(stderr, "\t-m [poll|mmap|sendfile] -- use poll(default)/mmap+write/sendfile\n");
 	fprintf(stderr, "\t-M mark -- set socket packet mark\n");
-	fprintf(stderr, "\t-u -- check mptcp ulp\n");
 	fprintf(stderr, "\t-w num -- wait num sec before closing the socket\n");
 	fprintf(stderr, "\t-c cmsg -- test cmsg type \n");
 	fprintf(stderr, "\t-o option -- test sockopt