From nobody Fri Oct 3 05:27:00 2025
From: Halil Pasic
To: Jakub Kicinski, Paolo Abeni, Simon Horman, "D. Wythe", Dust Li,
	Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi, Tony Lu, Wen Gu,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org
Cc: Halil Pasic
Subject: [PATCH net-next 1/2] net/smc: make wr buffer count configurable
Date: Thu, 4 Sep 2025 23:12:52 +0200
Message-ID: <20250904211254.1057445-2-pasic@linux.ibm.com>
In-Reply-To: <20250904211254.1057445-1-pasic@linux.ibm.com>
References: <20250904211254.1057445-1-pasic@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Think SMC_WR_BUF_CNT_SEND := SMC_WR_BUF_CNT, used in the send context,
and SMC_WR_BUF_CNT_RECV := 3 * SMC_WR_BUF_CNT, used in the recv context.
Those get replaced with lgr->max_send_wr and lgr->max_recv_wr
respectively.

While at it, let us also remove a confusing comment that is either not
about the context in which it resides (it describes
qp_attr.cap.max_send_wr and qp_attr.cap.max_recv_wr) or no longer
applicable once these values become configurable.

Signed-off-by: Halil Pasic
Reviewed-by: Wenjia Zhang
---
 Documentation/networking/smc-sysctl.rst | 37 +++++++++++++++++++++++++
 net/smc/smc.h                           |  2 ++
 net/smc/smc_core.h                      |  4 +++
 net/smc/smc_ib.c                        |  7 ++---
 net/smc/smc_llc.c                       |  2 ++
 net/smc/smc_sysctl.c                    | 22 +++++++++++++++
 net/smc/smc_wr.c                        | 32 +++++++++++----------
 net/smc/smc_wr.h                        |  2 --
 8 files changed, 86 insertions(+), 22 deletions(-)

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index a874d007f2db..c687092329e3 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -71,3 +71,40 @@ smcr_max_conns_per_lgr - INTEGER
 	acceptable value ranges from 16 to 255. Only for SMC-R v2.1 and later.

 	Default: 255
+
+smcr_max_send_wr - INTEGER
+	So-called work request buffers are SMC-R link (and RDMA queue pair)
+	level resources necessary for performing RDMA operations. Since up to
+	255 connections can share a link group, and thus also a link, and the
+	number of work request buffers is decided when the link is allocated,
+	depending on the workload this can be a bottleneck in the sense that
+	threads have to wait for work request buffers to become available.
+	Before the introduction of this control, the maximal number of work
+	request buffers available on the send path used to be hard-coded to
+	16. With this control it becomes configurable. The acceptable range
+	is between 2 and 2048.
+
+	Please be aware that all the buffers need to be allocated as one
+	physically contiguous array in which each element is a single buffer
+	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we give up
+	much like before having this control.
+
+	Default: 16
+
+smcr_max_recv_wr - INTEGER
+	So-called work request buffers are SMC-R link (and RDMA queue pair)
+	level resources necessary for performing RDMA operations. Since up to
+	255 connections can share a link group, and thus also a link, and the
+	number of work request buffers is decided when the link is allocated,
+	depending on the workload this can be a bottleneck in the sense that
+	threads have to wait for work request buffers to become available.
+	Before the introduction of this control, the maximal number of work
+	request buffers available on the receive path used to be hard-coded
+	to 48 (3 * 16). With this control it becomes configurable. The
+	acceptable range is between 2 and 2048.
+
+	Please be aware that all the buffers need to be allocated as one
+	physically contiguous array in which each element is a single buffer
+	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we give up
+	much like before having this control.
+
+	Default: 48

diff --git a/net/smc/smc.h b/net/smc/smc.h
index 2c9084963739..ffe48253fa1f 100644
--- a/net/smc/smc.h
+++ b/net/smc/smc.h
@@ -33,6 +33,8 @@

 extern struct proto smc_proto;
 extern struct proto smc_proto6;
+extern unsigned int smc_ib_sysctl_max_send_wr;
+extern unsigned int smc_ib_sysctl_max_recv_wr;

 extern struct smc_hashinfo smc_v4_hashinfo;
 extern struct smc_hashinfo smc_v6_hashinfo;
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 48a1b1dcb576..b883f43fc206 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -361,6 +361,10 @@ struct smc_link_group {
 			/* max conn can be assigned to lgr */
 			u8 max_links;
 			/* max links can be added in lgr */
+			u16 max_send_wr;
+			/* number of WR buffers on send */
+			u16 max_recv_wr;
+			/* number of WR buffers on recv */
 		};
 		struct { /* SMC-D */
 			struct smcd_gid peer_gid;
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 0052f02756eb..e8d35c22c525 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -669,11 +669,6 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 		.recv_cq = lnk->smcibdev->roce_cq_recv,
 		.srq = NULL,
 		.cap = {
-			/* include unsolicited rdma_writes as well,
-			 * there are max. 2 RDMA_WRITE per 1 WR_SEND
-			 */
-			.max_send_wr = SMC_WR_BUF_CNT * 3,
-			.max_recv_wr = SMC_WR_BUF_CNT * 3,
 			.max_send_sge = SMC_IB_MAX_SEND_SGE,
 			.max_recv_sge = lnk->wr_rx_sge_cnt,
 			.max_inline_data = 0,
@@ -683,6 +678,8 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 	};
 	int rc;

+	qp_attr.cap.max_send_wr = 3 * lnk->lgr->max_send_wr;
+	qp_attr.cap.max_recv_wr = lnk->lgr->max_recv_wr;
 	lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_qp);
 	if (IS_ERR(lnk->roce_qp))
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index f865c58c3aa7..91c936bf7336 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -2157,6 +2157,8 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc)
 	init_waitqueue_head(&lgr->llc_msg_waiter);
 	init_rwsem(&lgr->llc_conf_mutex);
 	lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time);
+	lgr->max_send_wr = (u16)(READ_ONCE(smc_ib_sysctl_max_send_wr));
+	lgr->max_recv_wr = (u16)(READ_ONCE(smc_ib_sysctl_max_recv_wr));
 }

 /* called after lgr was removed from lgr_list */
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 2fab6456f765..01da1297e150 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -29,6 +29,10 @@ static int links_per_lgr_min = SMC_LINKS_ADD_LNK_MIN;
 static int links_per_lgr_max = SMC_LINKS_ADD_LNK_MAX;
 static int conns_per_lgr_min = SMC_CONN_PER_LGR_MIN;
 static int conns_per_lgr_max = SMC_CONN_PER_LGR_MAX;
+unsigned int smc_ib_sysctl_max_send_wr = 16;
+unsigned int smc_ib_sysctl_max_recv_wr = 48;
+static unsigned int smc_ib_sysctl_max_wr_min = 2;
+static unsigned int smc_ib_sysctl_max_wr_max = 2048;

 static struct ctl_table smc_table[] = {
 	{
@@ -99,6 +103,24 @@ static struct ctl_table smc_table[] = {
 		.extra1 = SYSCTL_ZERO,
 		.extra2 = SYSCTL_ONE,
 	},
+	{
+		.procname = "smcr_max_send_wr",
+		.data = &smc_ib_sysctl_max_send_wr,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = &smc_ib_sysctl_max_wr_min,
+		.extra2 = &smc_ib_sysctl_max_wr_max,
+	},
+	{
+		.procname = "smcr_max_recv_wr",
+		.data = &smc_ib_sysctl_max_recv_wr,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = &smc_ib_sysctl_max_wr_min,
+		.extra2 = &smc_ib_sysctl_max_wr_max,
+	},
 };

 int __net_init smc_sysctl_net_init(struct net *net)
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index b04a21b8c511..85ebc65f1546 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -34,6 +34,7 @@
 #define SMC_WR_MAX_POLL_CQE 10	/* max. # of compl. queue elements in 1 poll */

 #define SMC_WR_RX_HASH_BITS 4
+
 static DEFINE_HASHTABLE(smc_wr_rx_hash, SMC_WR_RX_HASH_BITS);
 static DEFINE_SPINLOCK(smc_wr_rx_hash_lock);

@@ -547,9 +548,9 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 			       IB_QP_DEST_QPN,
 			       &init_attr);

-	lnk->wr_tx_cnt = min_t(size_t, SMC_WR_BUF_CNT,
+	lnk->wr_tx_cnt = min_t(size_t, lnk->lgr->max_send_wr,
 			       lnk->qp_attr.cap.max_send_wr);
-	lnk->wr_rx_cnt = min_t(size_t, SMC_WR_BUF_CNT * 3,
+	lnk->wr_rx_cnt = min_t(size_t, lnk->lgr->max_recv_wr,
 			       lnk->qp_attr.cap.max_recv_wr);
 }

@@ -741,50 +742,51 @@ int smc_wr_alloc_lgr_mem(struct smc_link_group *lgr)
 int smc_wr_alloc_link_mem(struct smc_link *link)
 {
 	/* allocate link related memory */
-	link->wr_tx_bufs = kcalloc(SMC_WR_BUF_CNT, SMC_WR_BUF_SIZE, GFP_KERNEL);
+	link->wr_tx_bufs = kcalloc(link->lgr->max_send_wr,
+				   SMC_WR_BUF_SIZE, GFP_KERNEL);
 	if (!link->wr_tx_bufs)
 		goto no_mem;
-	link->wr_rx_bufs = kcalloc(SMC_WR_BUF_CNT * 3, link->wr_rx_buflen,
+	link->wr_rx_bufs = kcalloc(link->lgr->max_recv_wr, SMC_WR_BUF_SIZE,
 				   GFP_KERNEL);
 	if (!link->wr_rx_bufs)
 		goto no_mem_wr_tx_bufs;
-	link->wr_tx_ibs = kcalloc(SMC_WR_BUF_CNT, sizeof(link->wr_tx_ibs[0]),
-				  GFP_KERNEL);
+	link->wr_tx_ibs = kcalloc(link->lgr->max_send_wr,
+				  sizeof(link->wr_tx_ibs[0]), GFP_KERNEL);
 	if (!link->wr_tx_ibs)
 		goto no_mem_wr_rx_bufs;
-	link->wr_rx_ibs = kcalloc(SMC_WR_BUF_CNT * 3,
+	link->wr_rx_ibs = kcalloc(link->lgr->max_recv_wr,
 				  sizeof(link->wr_rx_ibs[0]),
 				  GFP_KERNEL);
 	if (!link->wr_rx_ibs)
 		goto no_mem_wr_tx_ibs;
-	link->wr_tx_rdmas = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_rdmas = kcalloc(link->lgr->max_send_wr,
 				    sizeof(link->wr_tx_rdmas[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_rdmas)
 		goto no_mem_wr_rx_ibs;
-	link->wr_tx_rdma_sges = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_rdma_sges = kcalloc(link->lgr->max_send_wr,
 					sizeof(link->wr_tx_rdma_sges[0]),
 					GFP_KERNEL);
 	if (!link->wr_tx_rdma_sges)
 		goto no_mem_wr_tx_rdmas;
-	link->wr_tx_sges = kcalloc(SMC_WR_BUF_CNT, sizeof(link->wr_tx_sges[0]),
+	link->wr_tx_sges = kcalloc(link->lgr->max_send_wr, sizeof(link->wr_tx_sges[0]),
 				   GFP_KERNEL);
 	if (!link->wr_tx_sges)
 		goto no_mem_wr_tx_rdma_sges;
-	link->wr_rx_sges = kcalloc(SMC_WR_BUF_CNT * 3,
+	link->wr_rx_sges = kcalloc(link->lgr->max_recv_wr,
 				   sizeof(link->wr_rx_sges[0]) * link->wr_rx_sge_cnt,
 				   GFP_KERNEL);
 	if (!link->wr_rx_sges)
 		goto no_mem_wr_tx_sges;
-	link->wr_tx_mask = bitmap_zalloc(SMC_WR_BUF_CNT, GFP_KERNEL);
+	link->wr_tx_mask = bitmap_zalloc(link->lgr->max_send_wr, GFP_KERNEL);
 	if (!link->wr_tx_mask)
 		goto no_mem_wr_rx_sges;
-	link->wr_tx_pends = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_pends = kcalloc(link->lgr->max_send_wr,
 				    sizeof(link->wr_tx_pends[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_pends)
 		goto no_mem_wr_tx_mask;
-	link->wr_tx_compl = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_compl = kcalloc(link->lgr->max_send_wr,
 				    sizeof(link->wr_tx_compl[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_compl)
@@ -905,7 +907,7 @@ int smc_wr_create_link(struct smc_link *lnk)
 		goto dma_unmap;
 	}
 	smc_wr_init_sge(lnk);
-	bitmap_zero(lnk->wr_tx_mask, SMC_WR_BUF_CNT);
+	bitmap_zero(lnk->wr_tx_mask, lnk->lgr->max_send_wr);
 	init_waitqueue_head(&lnk->wr_tx_wait);
 	rc = percpu_ref_init(&lnk->wr_tx_refs, smcr_wr_tx_refs_free, 0, GFP_KERNEL);
 	if (rc)
diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h
index f3008dda222a..aa4533af9122 100644
--- a/net/smc/smc_wr.h
+++ b/net/smc/smc_wr.h
@@ -19,8 +19,6 @@
 #include "smc.h"
 #include "smc_core.h"

-#define SMC_WR_BUF_CNT 16	/* # of ctrl buffers per link */
-
 #define SMC_WR_TX_WAIT_FREE_SLOT_TIME	(10 * HZ)

 #define SMC_WR_TX_SIZE 44 /* actual size of wr_send data (<=SMC_WR_BUF_SIZE) */
--
2.48.1

From nobody Fri Oct 3 05:27:00 2025
From: Halil Pasic
To: Jakub Kicinski, Paolo Abeni, Simon Horman, "D. Wythe", Dust Li,
	Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi, Tony Lu, Wen Gu,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org
Cc: Halil Pasic
Subject: [PATCH net-next 2/2] net/smc: handle -ENOMEM from smc_wr_alloc_link_mem gracefully
Date: Thu, 4 Sep 2025 23:12:53 +0200
Message-ID: <20250904211254.1057445-3-pasic@linux.ibm.com>
In-Reply-To: <20250904211254.1057445-1-pasic@linux.ibm.com>
References: <20250904211254.1057445-1-pasic@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Currently a -ENOMEM from smc_wr_alloc_link_mem() is handled by giving up
and going the way of a TCP fallback. This was reasonable while the sizes
of the allocations there were compile-time constants and reasonably
small. But now those are actually configurable. So instead of giving up,
keep retrying with half of the requested size unless we dip below the
old static sizes -- then give up!

Signed-off-by: Halil Pasic
Reviewed-by: Wenjia Zhang
---
 Documentation/networking/smc-sysctl.rst |  9 ++++---
 net/smc/smc_core.c                      | 34 +++++++++++++++++--------
 net/smc/smc_core.h                      |  2 ++
 net/smc/smc_wr.c                        | 28 ++++++++++----------
 4 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index c687092329e3..c8dbe7ac8bdf 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -85,9 +85,10 @@ smcr_max_send_wr - INTEGER

 	Please be aware that all the buffers need to be allocated as one
 	physically contiguous array in which each element is a single buffer
-	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we give up
-	much like before having this control.
+	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep
+	retrying with half of the buffer count until we are either successful
+	or (unlikely) dip below the old hard-coded value, which is 16, where
+	we give up much like before having this control.

 	Default: 16

@@ -104,7 +105,9 @@ smcr_max_recv_wr - INTEGER

 	Please be aware that all the buffers need to be allocated as one
 	physically contiguous array in which each element is a single buffer
-	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we give up
-	much like before having this control.
+	of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep
+	retrying with half of the buffer count until we are either successful
+	or (unlikely) dip below the old hard-coded value, which is 48, where
+	we give up much like before having this control.

 	Default: 48
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 262746e304dd..da2bde99ebc6 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -810,6 +810,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	lnk->clearing = 0;
 	lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
 	lnk->link_id = smcr_next_link_id(lgr);
+	lnk->max_send_wr = lgr->max_send_wr;
+	lnk->max_recv_wr = lgr->max_recv_wr;
 	lnk->lgr = lgr;
 	smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
 	lnk->link_idx = link_idx;
@@ -836,27 +838,39 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	rc = smc_llc_link_init(lnk);
 	if (rc)
 		goto out;
-	rc = smc_wr_alloc_link_mem(lnk);
-	if (rc)
-		goto clear_llc_lnk;
 	rc = smc_ib_create_protection_domain(lnk);
 	if (rc)
-		goto free_link_mem;
-	rc = smc_ib_create_queue_pair(lnk);
-	if (rc)
-		goto dealloc_pd;
+		goto clear_llc_lnk;
+	do {
+		rc = smc_ib_create_queue_pair(lnk);
+		if (rc)
+			goto dealloc_pd;
+		rc = smc_wr_alloc_link_mem(lnk);
+		if (!rc)
+			break;
+		else if (rc != -ENOMEM) /* give up */
+			goto destroy_qp;
+		/* retry with smaller ... */
+		lnk->max_send_wr /= 2;
+		lnk->max_recv_wr /= 2;
+		/* ... unless dropping below the old SMC_WR_BUF_CNT based counts */
+		if (lnk->max_send_wr < 16 || lnk->max_recv_wr < 48)
+			goto destroy_qp;
+		smc_ib_destroy_queue_pair(lnk);
+	} while (1);
+
 	rc = smc_wr_create_link(lnk);
 	if (rc)
-		goto destroy_qp;
+		goto free_link_mem;
 	lnk->state = SMC_LNK_ACTIVATING;
 	return 0;

+free_link_mem:
+	smc_wr_free_link_mem(lnk);
 destroy_qp:
 	smc_ib_destroy_queue_pair(lnk);
 dealloc_pd:
 	smc_ib_dealloc_protection_domain(lnk);
-free_link_mem:
-	smc_wr_free_link_mem(lnk);
 clear_llc_lnk:
 	smc_llc_link_clear(lnk, false);
 out:
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index b883f43fc206..92d70c57d23d 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -172,6 +172,8 @@ struct smc_link {
 	struct completion	llc_testlink_resp; /* wait for rx of testlink */
 	int			llc_testlink_time; /* testlink interval */
 	atomic_t		conn_cnt; /* connections on this link */
+	u16			max_send_wr;
+	u16			max_recv_wr;
 };

 /* For now we just allow one parallel link per link group.
 The SMC protocol
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 85ebc65f1546..4759041d3b02 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -548,9 +548,9 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 			       IB_QP_DEST_QPN,
 			       &init_attr);

-	lnk->wr_tx_cnt = min_t(size_t, lnk->lgr->max_send_wr,
+	lnk->wr_tx_cnt = min_t(size_t, lnk->max_send_wr,
 			       lnk->qp_attr.cap.max_send_wr);
-	lnk->wr_rx_cnt = min_t(size_t, lnk->lgr->max_recv_wr,
+	lnk->wr_rx_cnt = min_t(size_t, lnk->max_recv_wr,
 			       lnk->qp_attr.cap.max_recv_wr);
 }

@@ -742,51 +742,51 @@ int smc_wr_alloc_lgr_mem(struct smc_link_group *lgr)
 int smc_wr_alloc_link_mem(struct smc_link *link)
 {
 	/* allocate link related memory */
-	link->wr_tx_bufs = kcalloc(link->lgr->max_send_wr,
+	link->wr_tx_bufs = kcalloc(link->max_send_wr,
 				   SMC_WR_BUF_SIZE, GFP_KERNEL);
 	if (!link->wr_tx_bufs)
 		goto no_mem;
-	link->wr_rx_bufs = kcalloc(link->lgr->max_recv_wr, SMC_WR_BUF_SIZE,
+	link->wr_rx_bufs = kcalloc(link->max_recv_wr, SMC_WR_BUF_SIZE,
 				   GFP_KERNEL);
 	if (!link->wr_rx_bufs)
 		goto no_mem_wr_tx_bufs;
-	link->wr_tx_ibs = kcalloc(link->lgr->max_send_wr,
+	link->wr_tx_ibs = kcalloc(link->max_send_wr,
 				  sizeof(link->wr_tx_ibs[0]), GFP_KERNEL);
 	if (!link->wr_tx_ibs)
 		goto no_mem_wr_rx_bufs;
-	link->wr_rx_ibs = kcalloc(link->lgr->max_recv_wr,
+	link->wr_rx_ibs = kcalloc(link->max_recv_wr,
 				  sizeof(link->wr_rx_ibs[0]),
 				  GFP_KERNEL);
 	if (!link->wr_rx_ibs)
 		goto no_mem_wr_tx_ibs;
-	link->wr_tx_rdmas = kcalloc(link->lgr->max_send_wr,
+	link->wr_tx_rdmas = kcalloc(link->max_send_wr,
 				    sizeof(link->wr_tx_rdmas[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_rdmas)
 		goto no_mem_wr_rx_ibs;
-	link->wr_tx_rdma_sges = kcalloc(link->lgr->max_send_wr,
+	link->wr_tx_rdma_sges = kcalloc(link->max_send_wr,
 					sizeof(link->wr_tx_rdma_sges[0]),
 					GFP_KERNEL);
 	if (!link->wr_tx_rdma_sges)
 		goto no_mem_wr_tx_rdmas;
-	link->wr_tx_sges = kcalloc(link->lgr->max_send_wr, sizeof(link->wr_tx_sges[0]),
+	link->wr_tx_sges = kcalloc(link->max_send_wr, sizeof(link->wr_tx_sges[0]),
 				   GFP_KERNEL);
 	if (!link->wr_tx_sges)
 		goto no_mem_wr_tx_rdma_sges;
-	link->wr_rx_sges = kcalloc(link->lgr->max_recv_wr,
+	link->wr_rx_sges = kcalloc(link->max_recv_wr,
 				   sizeof(link->wr_rx_sges[0]) * link->wr_rx_sge_cnt,
 				   GFP_KERNEL);
 	if (!link->wr_rx_sges)
 		goto no_mem_wr_tx_sges;
-	link->wr_tx_mask = bitmap_zalloc(link->lgr->max_send_wr, GFP_KERNEL);
+	link->wr_tx_mask = bitmap_zalloc(link->max_send_wr, GFP_KERNEL);
 	if (!link->wr_tx_mask)
 		goto no_mem_wr_rx_sges;
-	link->wr_tx_pends = kcalloc(link->lgr->max_send_wr,
+	link->wr_tx_pends = kcalloc(link->max_send_wr,
 				    sizeof(link->wr_tx_pends[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_pends)
 		goto no_mem_wr_tx_mask;
-	link->wr_tx_compl = kcalloc(link->lgr->max_send_wr,
+	link->wr_tx_compl = kcalloc(link->max_send_wr,
 				    sizeof(link->wr_tx_compl[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_compl)
@@ -907,7 +907,7 @@ int smc_wr_create_link(struct smc_link *lnk)
 		goto dma_unmap;
 	}
 	smc_wr_init_sge(lnk);
-	bitmap_zero(lnk->wr_tx_mask, lnk->lgr->max_send_wr);
+	bitmap_zero(lnk->wr_tx_mask, lnk->max_send_wr);
 	init_waitqueue_head(&lnk->wr_tx_wait);
 	rc = percpu_ref_init(&lnk->wr_tx_refs, smcr_wr_tx_refs_free, 0, GFP_KERNEL);
 	if (rc)
--
2.48.1