From nobody Wed Sep 10 01:55:22 2025
From: Halil Pasic
To: Jakub Kicinski, Paolo Abeni, Simon Horman, "D. Wythe", Dust Li,
	Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi, Tony Lu, Wen Gu,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org
Cc: Halil Pasic
Subject: [PATCH net-next v2 1/2] net/smc: make wr buffer count configurable
Date: Tue, 9 Sep 2025 00:01:49 +0200
Message-ID: <20250908220150.3329433-2-pasic@linux.ibm.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250908220150.3329433-1-pasic@linux.ibm.com>
References: <20250908220150.3329433-1-pasic@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Think SMC_WR_BUF_CNT_SEND := SMC_WR_BUF_CNT used in the send context
and SMC_WR_BUF_CNT_RECV := 3 * SMC_WR_BUF_CNT used in the recv context.
Those get replaced with lgr->pref_send_wr and lgr->pref_recv_wr
respectively.

While at it, let us also remove a confusing comment that is either not
about the context in which it resides (it describes
qp_attr.cap.max_send_wr and qp_attr.cap.max_recv_wr) or no longer
applicable once these values become configurable.
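To illustrate the clamping that smc_wr_remember_qp_attr() keeps doing
with min_t() after this change, here is a minimal user-space sketch
(illustrative names and values, not kernel API): the preferred WR count
coming from the sysctl is capped by whatever the device actually
granted for the queue pair.

    /* sketch: clamp the preferred WR count against the QP capability */
    #include <stdio.h>

    static unsigned int effective_wr_cnt(unsigned int preferred,
                                         unsigned int qp_granted)
    {
            return preferred < qp_granted ? preferred : qp_granted;
    }

    int main(void)
    {
            /* assume smcr_pref_send_wr was raised to 256, but the
             * RDMA device grants only 128 send WRs on this QP
             */
            printf("wr_tx_cnt = %u\n", effective_wr_cnt(256, 128));
            return 0;
    }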
Signed-off-by: Halil Pasic
---
 Documentation/networking/smc-sysctl.rst | 37 +++++++++++++++++++++++++
 include/net/netns/smc.h                 |  2 ++
 net/smc/smc_core.h                      |  6 ++++
 net/smc/smc_ib.c                        |  7 ++---
 net/smc/smc_llc.c                       |  2 ++
 net/smc/smc_sysctl.c                    | 22 +++++++++++++++
 net/smc/smc_sysctl.h                    |  2 ++
 net/smc/smc_wr.c                        | 32 +++++++++++----------
 net/smc/smc_wr.h                        |  2 --
 9 files changed, 90 insertions(+), 22 deletions(-)

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index a874d007f2db..d533830df28f 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -71,3 +71,40 @@ smcr_max_conns_per_lgr - INTEGER
   acceptable value ranges from 16 to 255. Only for SMC-R v2.1 and later.
 
   Default: 255
+
+smcr_pref_send_wr - INTEGER
+  So-called work request buffers are SMC-R link (and RDMA queue pair) level
+  resources necessary for performing RDMA operations. Since up to 255
+  connections can share a link group, and thus also a link, and the number
+  of work request buffers is decided when the link is allocated, depending
+  on the workload this can be a bottleneck in the sense that threads have
+  to wait for work request buffers to become available. Before the
+  introduction of this control the maximal number of work request buffers
+  available on the send path used to be hard-coded to 16. With this control
+  it becomes configurable. The acceptable range is between 2 and 2048.
+
+  Please be aware that all the buffers need to be allocated as a physically
+  contiguous array in which each element is a single buffer and has the size
+  of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails we give up much
+  like before having this control.
+  this control.
+
+  Default: 16
+
+smcr_pref_recv_wr - INTEGER
+  So-called work request buffers are SMC-R link (and RDMA queue pair) level
+  resources necessary for performing RDMA operations. Since up to 255
+  connections can share a link group, and thus also a link, and the number
+  of work request buffers is decided when the link is allocated, depending
+  on the workload this can be a bottleneck in the sense that threads have
+  to wait for work request buffers to become available. Before the
+  introduction of this control the maximal number of work request buffers
+  available on the receive path used to be hard-coded to 48. With this
+  control it becomes configurable. The acceptable range is between 2 and 2048.
+
+  Please be aware that all the buffers need to be allocated as a physically
+  contiguous array in which each element is a single buffer and has the size
+  of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails we give up much
+  like before having this control.
+
+  Default: 48
diff --git a/include/net/netns/smc.h b/include/net/netns/smc.h
index fc752a50f91b..830817fc7fd7 100644
--- a/include/net/netns/smc.h
+++ b/include/net/netns/smc.h
@@ -24,5 +24,7 @@ struct netns_smc {
 	int sysctl_rmem;
 	int sysctl_max_links_per_lgr;
 	int sysctl_max_conns_per_lgr;
+	unsigned int sysctl_smcr_pref_send_wr;
+	unsigned int sysctl_smcr_pref_recv_wr;
 };
 #endif
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 48a1b1dcb576..78d5bcefa1b8 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -33,6 +33,8 @@
  * distributions may modify it to a value between
  * 16-255 as needed.
 */
+#define SMCR_MAX_SEND_WR_DEF	16	/* Default number of work requests per send queue */
+#define SMCR_MAX_RECV_WR_DEF	48	/* Default number of work requests per recv queue */
 
 struct smc_lgr_list {			/* list of link group definition */
 	struct list_head	list;
@@ -361,6 +363,10 @@ struct smc_link_group {
 			/* max conn can be assigned to lgr */
 			u8			max_links;
 			/* max links can be added in lgr */
+			u16			pref_send_wr;
+			/* number of WR buffers on send */
+			u16			pref_recv_wr;
+			/* number of WR buffers on recv */
 		};
 		struct { /* SMC-D */
 			struct smcd_gid		peer_gid;
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 0052f02756eb..2f8f214fc634 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -669,11 +669,6 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 		.recv_cq = lnk->smcibdev->roce_cq_recv,
 		.srq = NULL,
 		.cap = {
-				/* include unsolicited rdma_writes as well,
-				 * there are max. 2 RDMA_WRITE per 1 WR_SEND
-				 */
-			.max_send_wr = SMC_WR_BUF_CNT * 3,
-			.max_recv_wr = SMC_WR_BUF_CNT * 3,
 			.max_send_sge = SMC_IB_MAX_SEND_SGE,
 			.max_recv_sge = lnk->wr_rx_sge_cnt,
 			.max_inline_data = 0,
@@ -683,6 +678,8 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 	};
 	int rc;
 
+	qp_attr.cap.max_send_wr = 3 * lnk->lgr->pref_send_wr;
+	qp_attr.cap.max_recv_wr = lnk->lgr->pref_recv_wr;
 	lnk->roce_qp = ib_create_qp(lnk->roce_pd, &qp_attr);
 	rc = PTR_ERR_OR_ZERO(lnk->roce_qp);
 	if (IS_ERR(lnk->roce_qp))
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index f865c58c3aa7..1098bdc3557b 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -2157,6 +2157,8 @@ void smc_llc_lgr_init(struct smc_link_group *lgr, struct smc_sock *smc)
 	init_waitqueue_head(&lgr->llc_msg_waiter);
 	init_rwsem(&lgr->llc_conf_mutex);
 	lgr->llc_testlink_time = READ_ONCE(net->smc.sysctl_smcr_testlink_time);
+	lgr->pref_send_wr = (u16)(READ_ONCE(net->smc.sysctl_smcr_pref_send_wr));
+	lgr->pref_recv_wr = (u16)(READ_ONCE(net->smc.sysctl_smcr_pref_recv_wr));
 }
 
 /* called after lgr was removed from lgr_list */
diff --git a/net/smc/smc_sysctl.c b/net/smc/smc_sysctl.c
index 2fab6456f765..f320443e563b 100644
--- a/net/smc/smc_sysctl.c
+++ b/net/smc/smc_sysctl.c
@@ -29,6 +29,8 @@ static int links_per_lgr_min = SMC_LINKS_ADD_LNK_MIN;
 static int links_per_lgr_max = SMC_LINKS_ADD_LNK_MAX;
 static int conns_per_lgr_min = SMC_CONN_PER_LGR_MIN;
 static int conns_per_lgr_max = SMC_CONN_PER_LGR_MAX;
+static unsigned int smcr_max_wr_min = 2;
+static unsigned int smcr_max_wr_max = 2048;
 
 static struct ctl_table smc_table[] = {
 	{
@@ -99,6 +101,24 @@ static struct ctl_table smc_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
 	},
+	{
+		.procname	= "smcr_pref_send_wr",
+		.data		= &init_net.smc.sysctl_smcr_pref_send_wr,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &smcr_max_wr_min,
+		.extra2		= &smcr_max_wr_max,
+	},
+	{
+		.procname	= "smcr_pref_recv_wr",
+		.data		= &init_net.smc.sysctl_smcr_pref_recv_wr,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &smcr_max_wr_min,
+		.extra2		= &smcr_max_wr_max,
+	},
 };
 
 int __net_init smc_sysctl_net_init(struct net *net)
@@ -130,6 +150,8 @@ int __net_init smc_sysctl_net_init(struct net *net)
 	WRITE_ONCE(net->smc.sysctl_rmem, net_smc_rmem_init);
 	net->smc.sysctl_max_links_per_lgr = SMC_LINKS_PER_LGR_MAX_PREFER;
 	net->smc.sysctl_max_conns_per_lgr = SMC_CONN_PER_LGR_PREFER;
+	net->smc.sysctl_smcr_pref_send_wr = SMCR_MAX_SEND_WR_DEF;
+	net->smc.sysctl_smcr_pref_recv_wr = SMCR_MAX_RECV_WR_DEF;
 	/* disable handshake limitation by default */
 	net->smc.limit_smc_hs = 0;
 
diff --git a/net/smc/smc_sysctl.h b/net/smc/smc_sysctl.h
index eb2465ae1e15..5d17c6082cc2 100644
--- a/net/smc/smc_sysctl.h
+++ b/net/smc/smc_sysctl.h
@@ -25,6 +25,8 @@ static inline int smc_sysctl_net_init(struct net *net)
 	net->smc.sysctl_autocorking_size = SMC_AUTOCORKING_DEFAULT_SIZE;
 	net->smc.sysctl_max_links_per_lgr = SMC_LINKS_PER_LGR_MAX_PREFER;
 	net->smc.sysctl_max_conns_per_lgr = SMC_CONN_PER_LGR_PREFER;
+	net->smc.sysctl_smcr_pref_send_wr = SMCR_MAX_SEND_WR_DEF;
+	net->smc.sysctl_smcr_pref_recv_wr = SMCR_MAX_RECV_WR_DEF;
 	return 0;
 }
 
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index b04a21b8c511..606fe0bec4ef 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -34,6 +34,7 @@
 #define SMC_WR_MAX_POLL_CQE 10	/* max. # of compl. queue elements in 1 poll */
 
 #define SMC_WR_RX_HASH_BITS 4
+
 static DEFINE_HASHTABLE(smc_wr_rx_hash, SMC_WR_RX_HASH_BITS);
 static DEFINE_SPINLOCK(smc_wr_rx_hash_lock);
 
@@ -547,9 +548,9 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 			       IB_QP_DEST_QPN,
 			       &init_attr);
 
-	lnk->wr_tx_cnt = min_t(size_t, SMC_WR_BUF_CNT,
+	lnk->wr_tx_cnt = min_t(size_t, lnk->lgr->pref_send_wr,
 			       lnk->qp_attr.cap.max_send_wr);
-	lnk->wr_rx_cnt = min_t(size_t, SMC_WR_BUF_CNT * 3,
+	lnk->wr_rx_cnt = min_t(size_t, lnk->lgr->pref_recv_wr,
 			       lnk->qp_attr.cap.max_recv_wr);
 }
 
@@ -741,50 +742,51 @@ int smc_wr_alloc_lgr_mem(struct smc_link_group *lgr)
 int smc_wr_alloc_link_mem(struct smc_link *link)
 {
 	/* allocate link related memory */
-	link->wr_tx_bufs = kcalloc(SMC_WR_BUF_CNT, SMC_WR_BUF_SIZE, GFP_KERNEL);
+	link->wr_tx_bufs = kcalloc(link->lgr->pref_send_wr,
+				   SMC_WR_BUF_SIZE, GFP_KERNEL);
 	if (!link->wr_tx_bufs)
 		goto no_mem;
-	link->wr_rx_bufs = kcalloc(SMC_WR_BUF_CNT * 3, link->wr_rx_buflen,
+	link->wr_rx_bufs = kcalloc(link->lgr->pref_recv_wr, SMC_WR_BUF_SIZE,
 				   GFP_KERNEL);
 	if (!link->wr_rx_bufs)
 		goto no_mem_wr_tx_bufs;
-	link->wr_tx_ibs = kcalloc(SMC_WR_BUF_CNT, sizeof(link->wr_tx_ibs[0]),
-				  GFP_KERNEL);
+	link->wr_tx_ibs = kcalloc(link->lgr->pref_send_wr,
+				  sizeof(link->wr_tx_ibs[0]), GFP_KERNEL);
 	if (!link->wr_tx_ibs)
 		goto no_mem_wr_rx_bufs;
-	link->wr_rx_ibs = kcalloc(SMC_WR_BUF_CNT * 3,
+	link->wr_rx_ibs = kcalloc(link->lgr->pref_recv_wr,
 				  sizeof(link->wr_rx_ibs[0]),
 				  GFP_KERNEL);
 	if (!link->wr_rx_ibs)
 		goto no_mem_wr_tx_ibs;
-	link->wr_tx_rdmas = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_rdmas = kcalloc(link->lgr->pref_send_wr,
 				    sizeof(link->wr_tx_rdmas[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_rdmas)
 		goto no_mem_wr_rx_ibs;
-	link->wr_tx_rdma_sges = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_rdma_sges = kcalloc(link->lgr->pref_send_wr,
 					sizeof(link->wr_tx_rdma_sges[0]),
 					GFP_KERNEL);
 	if (!link->wr_tx_rdma_sges)
 		goto no_mem_wr_tx_rdmas;
-	link->wr_tx_sges = kcalloc(SMC_WR_BUF_CNT, sizeof(link->wr_tx_sges[0]),
+	link->wr_tx_sges = kcalloc(link->lgr->pref_send_wr, sizeof(link->wr_tx_sges[0]),
 				   GFP_KERNEL);
 	if (!link->wr_tx_sges)
 		goto no_mem_wr_tx_rdma_sges;
-	link->wr_rx_sges = kcalloc(SMC_WR_BUF_CNT * 3,
+	link->wr_rx_sges = kcalloc(link->lgr->pref_recv_wr,
 				   sizeof(link->wr_rx_sges[0]) * link->wr_rx_sge_cnt,
 				   GFP_KERNEL);
 	if (!link->wr_rx_sges)
 		goto no_mem_wr_tx_sges;
-	link->wr_tx_mask = bitmap_zalloc(SMC_WR_BUF_CNT, GFP_KERNEL);
+	link->wr_tx_mask = bitmap_zalloc(link->lgr->pref_send_wr, GFP_KERNEL);
 	if (!link->wr_tx_mask)
 		goto no_mem_wr_rx_sges;
-	link->wr_tx_pends = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_pends = kcalloc(link->lgr->pref_send_wr,
 				    sizeof(link->wr_tx_pends[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_pends)
 		goto no_mem_wr_tx_mask;
-	link->wr_tx_compl = kcalloc(SMC_WR_BUF_CNT,
+	link->wr_tx_compl = kcalloc(link->lgr->pref_send_wr,
 				    sizeof(link->wr_tx_compl[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_compl)
@@ -905,7 +907,7 @@ int smc_wr_create_link(struct smc_link *lnk)
 		goto dma_unmap;
 	}
 	smc_wr_init_sge(lnk);
-	bitmap_zero(lnk->wr_tx_mask, SMC_WR_BUF_CNT);
+	bitmap_zero(lnk->wr_tx_mask, lnk->lgr->pref_send_wr);
 	init_waitqueue_head(&lnk->wr_tx_wait);
 	rc = percpu_ref_init(&lnk->wr_tx_refs, smcr_wr_tx_refs_free, 0, GFP_KERNEL);
 	if (rc)
diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h
index f3008dda222a..aa4533af9122 100644
--- a/net/smc/smc_wr.h
+++ b/net/smc/smc_wr.h
@@ -19,8 +19,6 @@
 #include "smc.h"
 #include "smc_core.h"
 
-#define SMC_WR_BUF_CNT 16	/* # of ctrl buffers per link */
-
 #define SMC_WR_TX_WAIT_FREE_SLOT_TIME	(10 * HZ)
 
 #define SMC_WR_TX_SIZE 44 /* actual size of wr_send data (<=SMC_WR_BUF_SIZE) */
-- 
2.48.1
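As a usage sketch for the patch above (assuming the new knobs surface
under /proc/sys/net/smc/ alongside the existing SMC sysctls; the helper
below is ours, not kernel or libc API beyond stdio):

    /* sketch: raise the WR buffer counts before SMC-R links are set up */
    #include <stdio.h>

    static int write_sysctl(const char *path, unsigned int val)
    {
            FILE *f = fopen(path, "w");

            if (!f)
                    return -1;
            fprintf(f, "%u\n", val);
            return fclose(f);
    }

    int main(void)
    {
            /* give a send-heavy workload more in-flight WR buffers */
            write_sysctl("/proc/sys/net/smc/smcr_pref_send_wr", 64);
            write_sysctl("/proc/sys/net/smc/smcr_pref_recv_wr", 192);
            return 0;
    }

New link groups created after the write pick up the values via
smc_llc_lgr_init(); existing link groups keep the counts they were
allocated with.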
From nobody Wed Sep 10 01:55:22 2025
From: Halil Pasic
To: Jakub Kicinski, Paolo Abeni, Simon Horman, "D. Wythe", Dust Li,
	Sidraya Jayagond, Wenjia Zhang, Mahanta Jambigi, Tony Lu, Wen Gu,
	netdev@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-s390@vger.kernel.org
Cc: Halil Pasic
Subject: [PATCH net-next v2 2/2] net/smc: handle -ENOMEM from smc_wr_alloc_link_mem gracefully
Date: Tue, 9 Sep 2025 00:01:50 +0200
Message-ID: <20250908220150.3329433-3-pasic@linux.ibm.com>
X-Mailer: git-send-email 2.48.1
In-Reply-To: <20250908220150.3329433-1-pasic@linux.ibm.com>
References: <20250908220150.3329433-1-pasic@linux.ibm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Currently a -ENOMEM from smc_wr_alloc_link_mem() is handled by giving
up and going the way of a TCP fallback. This was reasonable while the
sizes of the allocations there were compile-time constants and
reasonably small. But now those are actually configurable. So instead
of giving up, keep retrying with half of the requested size unless we
dip below the old static sizes -- then give up!

Signed-off-by: Halil Pasic
Reviewed-by: Wenjia Zhang
Reviewed-by: Mahanta Jambigi
---
 Documentation/networking/smc-sysctl.rst |  9 ++++---
 net/smc/smc_core.c                      | 34 +++++++++++++++++--------
 net/smc/smc_core.h                      |  2 ++
 net/smc/smc_wr.c                        | 28 ++++++++++----------
 4 files changed, 46 insertions(+), 27 deletions(-)

diff --git a/Documentation/networking/smc-sysctl.rst b/Documentation/networking/smc-sysctl.rst
index d533830df28f..846fdea87c84 100644
--- a/Documentation/networking/smc-sysctl.rst
+++ b/Documentation/networking/smc-sysctl.rst
@@ -85,9 +85,10 @@ smcr_pref_send_wr - INTEGER
 
   Please be aware that all the buffers need to be allocated as a physically
   contiguous array in which each element is a single buffer and has the size
-  of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails we give up much
+  of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep retrying
+  with half of the buffer count until it is either successful or (unlikely)
+  we dip below the old hard-coded value, which is 16, where we give up much
   like before having this control.
-  this control.
 
   Default: 16
 
@@ -104,7 +105,9 @@ smcr_pref_recv_wr - INTEGER
 
   Please be aware that all the buffers need to be allocated as a physically
   contiguous array in which each element is a single buffer and has the size
-  of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails we give up much
+  of SMC_WR_BUF_SIZE (48) bytes. If the allocation fails, we keep retrying
+  with half of the buffer count until it is either successful or (unlikely)
+  we dip below the old hard-coded value, which is 48, where we give up much
   like before having this control.
 
   Default: 48
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 262746e304dd..d55511d79cc2 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -810,6 +810,8 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	lnk->clearing = 0;
 	lnk->path_mtu = lnk->smcibdev->pattr[lnk->ibport - 1].active_mtu;
 	lnk->link_id = smcr_next_link_id(lgr);
+	lnk->pref_send_wr = lgr->pref_send_wr;
+	lnk->pref_recv_wr = lgr->pref_recv_wr;
 	lnk->lgr = lgr;
 	smc_lgr_hold(lgr); /* lgr_put in smcr_link_clear() */
 	lnk->link_idx = link_idx;
@@ -836,27 +838,39 @@ int smcr_link_init(struct smc_link_group *lgr, struct smc_link *lnk,
 	rc = smc_llc_link_init(lnk);
 	if (rc)
 		goto out;
-	rc = smc_wr_alloc_link_mem(lnk);
-	if (rc)
-		goto clear_llc_lnk;
 	rc = smc_ib_create_protection_domain(lnk);
 	if (rc)
-		goto free_link_mem;
-	rc = smc_ib_create_queue_pair(lnk);
-	if (rc)
-		goto dealloc_pd;
+		goto clear_llc_lnk;
+	do {
+		rc = smc_ib_create_queue_pair(lnk);
+		if (rc)
+			goto dealloc_pd;
+		rc = smc_wr_alloc_link_mem(lnk);
+		if (!rc)
+			break;
+		else if (rc != -ENOMEM) /* give up */
+			goto destroy_qp;
+		/* retry with smaller ... */
+		lnk->pref_send_wr /= 2;
+		lnk->pref_recv_wr /= 2;
+		/* ... unless dropping below the old hard-coded counts */
+		if (lnk->pref_send_wr < 16 || lnk->pref_recv_wr < 48)
+			goto destroy_qp;
+		smc_ib_destroy_queue_pair(lnk);
+	} while (1);
+
 	rc = smc_wr_create_link(lnk);
 	if (rc)
-		goto destroy_qp;
+		goto free_link_mem;
 	lnk->state = SMC_LNK_ACTIVATING;
 	return 0;
 
+free_link_mem:
+	smc_wr_free_link_mem(lnk);
 destroy_qp:
 	smc_ib_destroy_queue_pair(lnk);
 dealloc_pd:
 	smc_ib_dealloc_protection_domain(lnk);
-free_link_mem:
-	smc_wr_free_link_mem(lnk);
 clear_llc_lnk:
 	smc_llc_link_clear(lnk, false);
 out:
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 78d5bcefa1b8..18ba0364ff52 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -174,6 +174,8 @@ struct smc_link {
 	struct completion	llc_testlink_resp; /* wait for rx of testlink */
 	int			llc_testlink_time; /* testlink interval */
 	atomic_t		conn_cnt; /* connections on this link */
+	u16			pref_send_wr;
+	u16			pref_recv_wr;
 };
 
 /* For now we just allow one parallel link per link group. The SMC protocol
 
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 606fe0bec4ef..632d095599ed 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -548,9 +548,9 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 			       IB_QP_DEST_QPN,
 			       &init_attr);
 
-	lnk->wr_tx_cnt = min_t(size_t, lnk->lgr->pref_send_wr,
+	lnk->wr_tx_cnt = min_t(size_t, lnk->pref_send_wr,
 			       lnk->qp_attr.cap.max_send_wr);
-	lnk->wr_rx_cnt = min_t(size_t, lnk->lgr->pref_recv_wr,
+	lnk->wr_rx_cnt = min_t(size_t, lnk->pref_recv_wr,
 			       lnk->qp_attr.cap.max_recv_wr);
 }
 
@@ -742,51 +742,51 @@ int smc_wr_alloc_lgr_mem(struct smc_link_group *lgr)
 int smc_wr_alloc_link_mem(struct smc_link *link)
 {
 	/* allocate link related memory */
-	link->wr_tx_bufs = kcalloc(link->lgr->pref_send_wr,
+	link->wr_tx_bufs = kcalloc(link->pref_send_wr,
 				   SMC_WR_BUF_SIZE, GFP_KERNEL);
 	if (!link->wr_tx_bufs)
 		goto no_mem;
-	link->wr_rx_bufs = kcalloc(link->lgr->pref_recv_wr, SMC_WR_BUF_SIZE,
+	link->wr_rx_bufs = kcalloc(link->pref_recv_wr, SMC_WR_BUF_SIZE,
 				   GFP_KERNEL);
 	if (!link->wr_rx_bufs)
 		goto no_mem_wr_tx_bufs;
-	link->wr_tx_ibs = kcalloc(link->lgr->pref_send_wr,
+	link->wr_tx_ibs = kcalloc(link->pref_send_wr,
 				  sizeof(link->wr_tx_ibs[0]), GFP_KERNEL);
 	if (!link->wr_tx_ibs)
 		goto no_mem_wr_rx_bufs;
-	link->wr_rx_ibs = kcalloc(link->lgr->pref_recv_wr,
+	link->wr_rx_ibs = kcalloc(link->pref_recv_wr,
 				  sizeof(link->wr_rx_ibs[0]),
 				  GFP_KERNEL);
 	if (!link->wr_rx_ibs)
 		goto no_mem_wr_tx_ibs;
-	link->wr_tx_rdmas = kcalloc(link->lgr->pref_send_wr,
+	link->wr_tx_rdmas = kcalloc(link->pref_send_wr,
 				    sizeof(link->wr_tx_rdmas[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_rdmas)
 		goto no_mem_wr_rx_ibs;
-	link->wr_tx_rdma_sges = kcalloc(link->lgr->pref_send_wr,
+	link->wr_tx_rdma_sges = kcalloc(link->pref_send_wr,
 					sizeof(link->wr_tx_rdma_sges[0]),
 					GFP_KERNEL);
 	if (!link->wr_tx_rdma_sges)
 		goto no_mem_wr_tx_rdmas;
-	link->wr_tx_sges = kcalloc(link->lgr->pref_send_wr, sizeof(link->wr_tx_sges[0]),
+	link->wr_tx_sges = kcalloc(link->pref_send_wr, sizeof(link->wr_tx_sges[0]),
 				   GFP_KERNEL);
 	if (!link->wr_tx_sges)
 		goto no_mem_wr_tx_rdma_sges;
-	link->wr_rx_sges = kcalloc(link->lgr->pref_recv_wr,
+	link->wr_rx_sges = kcalloc(link->pref_recv_wr,
 				   sizeof(link->wr_rx_sges[0]) * link->wr_rx_sge_cnt,
 				   GFP_KERNEL);
 	if (!link->wr_rx_sges)
 		goto no_mem_wr_tx_sges;
-	link->wr_tx_mask = bitmap_zalloc(link->lgr->pref_send_wr, GFP_KERNEL);
+	link->wr_tx_mask = bitmap_zalloc(link->pref_send_wr, GFP_KERNEL);
 	if (!link->wr_tx_mask)
 		goto no_mem_wr_rx_sges;
-	link->wr_tx_pends = kcalloc(link->lgr->pref_send_wr,
+	link->wr_tx_pends = kcalloc(link->pref_send_wr,
 				    sizeof(link->wr_tx_pends[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_pends)
 		goto no_mem_wr_tx_mask;
-	link->wr_tx_compl = kcalloc(link->lgr->pref_send_wr,
+	link->wr_tx_compl = kcalloc(link->pref_send_wr,
 				    sizeof(link->wr_tx_compl[0]),
 				    GFP_KERNEL);
 	if (!link->wr_tx_compl)
@@ -907,7 +907,7 @@ int smc_wr_create_link(struct smc_link *lnk)
 		goto dma_unmap;
 	}
 	smc_wr_init_sge(lnk);
-	bitmap_zero(lnk->wr_tx_mask, lnk->lgr->pref_send_wr);
+	bitmap_zero(lnk->wr_tx_mask, lnk->pref_send_wr);
 	init_waitqueue_head(&lnk->wr_tx_wait);
 	rc = percpu_ref_init(&lnk->wr_tx_refs, smcr_wr_tx_refs_free, 0, GFP_KERNEL);
 	if (rc)
-- 
2.48.1
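The retry strategy of this patch, reduced to a minimal user-space
sketch (illustrative names; a plain calloc() stands in for the kernel
allocations, and the floor value is the old hard-coded send count):

    /* sketch: halve the requested WR count on allocation failure until
     * it would fall below the old hard-coded floor, then give up
     */
    #include <stdio.h>
    #include <stdlib.h>

    #define WR_BUF_SIZE 48 /* SMC_WR_BUF_SIZE: bytes per WR buffer */

    static void *alloc_wr_bufs(unsigned int *cnt, unsigned int floor)
    {
            while (*cnt >= floor) {
                    void *bufs = calloc(*cnt, WR_BUF_SIZE);

                    if (bufs)
                            return bufs; /* success at current count */
                    *cnt /= 2;           /* retry with half as many */
            }
            return NULL; /* below the old floor: fall back like before */
    }

    int main(void)
    {
            unsigned int cnt = 2048; /* e.g. smcr_pref_send_wr via sysctl */
            void *bufs = alloc_wr_bufs(&cnt, 16);

            if (bufs) {
                    printf("allocated %u WR buffers\n", cnt);
                    free(bufs);
            }
            return 0;
    }

The kernel variant differs in one important detail: because the queue
pair is sized from the same counts, smcr_link_init() also destroys and
recreates the QP on each retry so that the WR arrays and the QP caps
stay in sync.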