From nobody Sun Feb  8 05:36:59 2026
From: Guangguan Wang <guangguan.wang@linux.alibaba.com>
To: kgraul@linux.ibm.com, davem@davemloft.net, edumazet@google.com,
        kuba@kernel.org, pabeni@redhat.com, leon@kernel.org,
        tonylu@linux.alibaba.com
Cc: linux-s390@vger.kernel.org, netdev@vger.kernel.org,
        linux-kernel@vger.kernel.org, kernel test robot <lkp@intel.com>
Subject: [PATCH net-next v3 1/2] net/smc: send cdc msg inline if qp has
 sufficient inline space
Date: Mon, 16 May 2022 13:51:36 +0800
Message-Id: <20220516055137.51873-2-guangguan.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To: <20220516055137.51873-1-guangguan.wang@linux.alibaba.com>
References: <20220516055137.51873-1-guangguan.wang@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

A CDC message is only 44 bytes long, so it can be sent inline on
most RDMA devices, which helps reduce send latency.

In my test environment, two VMs running on the same physical host
with NICs (ConnectX-4 Lx) in SR-IOV mode, qperf shows a latency
improvement of 0.4 us to 0.7 us.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results are shown below:
msgsize     before       after
1B          11.9 us      11.2 us (-0.7 us)
2B          11.7 us      11.2 us (-0.5 us)
4B          11.7 us      11.3 us (-0.4 us)
8B          11.6 us      11.2 us (-0.4 us)
16B         11.7 us      11.3 us (-0.4 us)
32B         11.7 us      11.3 us (-0.4 us)
64B         11.7 us      11.2 us (-0.5 us)
128B        11.6 us      11.2 us (-0.4 us)
256B        11.8 us      11.2 us (-0.6 us)
512B        11.8 us      11.4 us (-0.4 us)
1KB         11.9 us      11.4 us (-0.5 us)
2KB         12.1 us      11.5 us (-0.6 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
---
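For readers unfamiliar with inline sends, the decision made in
smc_wr_init_sge() can be sketched in plain userspace C. This is only
an illustration under stated assumptions: the sketch_* names, the
struct layout and the flag bit are hypothetical stand-ins, not the
kernel's definitions. Only the 44-byte message size and the strict
"greater than" comparison are taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define SKETCH_WR_TX_SIZE	44u		/* CDC message length from the commit message */
#define SKETCH_SEND_INLINE	(1u << 3)	/* placeholder flag bit */

struct sketch_sge {
	uint64_t addr;
	uint32_t length;
};

/* Inline is chosen only if the QP reports more inline space than one message. */
static bool sketch_can_send_inline(uint32_t max_inline_data)
{
	return max_inline_data > SKETCH_WR_TX_SIZE;
}

/*
 * Fill one send SGE and return the extra send flags.
 * An inline send is copied by the CPU when the work request is posted,
 * so it takes the buffer's virtual address. Otherwise the device
 * fetches the payload itself and needs the DMA address.
 */
static unsigned int sketch_init_sge(struct sketch_sge *sge, const void *cpu_buf,
				    uint64_t dma_addr, uint32_t max_inline_data)
{
	bool send_inline = sketch_can_send_inline(max_inline_data);

	assert(sge != NULL);
	sge->addr = send_inline ? (uint64_t)(uintptr_t)cpu_buf : dma_addr;
	sge->length = SKETCH_WR_TX_SIZE;
	return send_inline ? SKETCH_SEND_INLINE : 0;
}
```

With a reported max_inline_data of 64 the sketch picks the CPU address
and returns the inline flag. With exactly 44 it falls back to the DMA
address, because the comparison is strict.
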
 net/smc/smc_ib.c | 1 +
 net/smc/smc_wr.c | 5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index a3e2d3b89568..dcda4165d107 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -671,6 +671,7 @@ int smc_ib_create_queue_pair(struct smc_link *lnk)
 			.max_recv_wr = SMC_WR_BUF_CNT * 3,
 			.max_send_sge = SMC_IB_MAX_SEND_SGE,
 			.max_recv_sge = sges_per_buf,
+			.max_inline_data = 0,
 		},
 		.sq_sig_type = IB_SIGNAL_REQ_WR,
 		.qp_type = IB_QPT_RC,
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 24be1d03fef9..26f8f240d9e8 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -554,10 +554,11 @@ void smc_wr_remember_qp_attr(struct smc_link *lnk)
 static void smc_wr_init_sge(struct smc_link *lnk)
 {
 	int sges_per_buf = (lnk->lgr->smc_version == SMC_V2) ? 2 : 1;
+	bool send_inline = (lnk->qp_attr.cap.max_inline_data > SMC_WR_TX_SIZE);
 	u32 i;
 
 	for (i = 0; i < lnk->wr_tx_cnt; i++) {
-		lnk->wr_tx_sges[i].addr =
+		lnk->wr_tx_sges[i].addr = send_inline ? (uintptr_t)(&lnk->wr_tx_bufs[i]) :
 			lnk->wr_tx_dma_addr + i * SMC_WR_BUF_SIZE;
 		lnk->wr_tx_sges[i].length = SMC_WR_TX_SIZE;
 		lnk->wr_tx_sges[i].lkey = lnk->roce_pd->local_dma_lkey;
@@ -575,6 +576,8 @@ static void smc_wr_init_sge(struct smc_link *lnk)
 		lnk->wr_tx_ibs[i].opcode = IB_WR_SEND;
 		lnk->wr_tx_ibs[i].send_flags =
 			IB_SEND_SIGNALED | IB_SEND_SOLICITED;
+		if (send_inline)
+			lnk->wr_tx_ibs[i].send_flags |= IB_SEND_INLINE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[1].wr.opcode = IB_WR_RDMA_WRITE;
 		lnk->wr_tx_rdmas[i].wr_tx_rdma[0].wr.sg_list =
-- 
2.24.3 (Apple Git-128)
From nobody Sun Feb  8 05:36:59 2026
From: Guangguan Wang <guangguan.wang@linux.alibaba.com>
To: kgraul@linux.ibm.com, davem@davemloft.net, edumazet@google.com,
        kuba@kernel.org, pabeni@redhat.com, leon@kernel.org,
        tonylu@linux.alibaba.com
Cc: linux-s390@vger.kernel.org, netdev@vger.kernel.org,
        linux-kernel@vger.kernel.org, kernel test robot <lkp@intel.com>
Subject: [PATCH net-next v3 2/2] net/smc: rdma write inline if qp has
 sufficient inline space
Date: Mon, 16 May 2022 13:51:37 +0800
Message-Id: <20220516055137.51873-3-guangguan.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.24.3 (Apple Git-128)
In-Reply-To: <20220516055137.51873-1-guangguan.wang@linux.alibaba.com>
References: <20220516055137.51873-1-guangguan.wang@linux.alibaba.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Issuing the RDMA write with the inline flag when the data to
send is shorter than the QP's max_inline_data can help reduce
latency.

In my test environment, two VMs running on the same physical host
with NICs (ConnectX-4 Lx) in SR-IOV mode, qperf shows a latency
improvement of 0.5 us to 0.7 us for messages of up to 32B. Larger
messages are essentially unchanged.

Test command:
server: smc_run taskset -c 1 qperf
client: smc_run taskset -c 1 qperf <server ip> -oo \
		msg_size:1:2K:*2 -t 30 -vu tcp_lat

The results are shown below:
msgsize     before       after
1B          11.2 us      10.6 us (-0.6 us)
2B          11.2 us      10.7 us (-0.5 us)
4B          11.3 us      10.7 us (-0.6 us)
8B          11.2 us      10.6 us (-0.6 us)
16B         11.3 us      10.7 us (-0.6 us)
32B         11.3 us      10.6 us (-0.7 us)
64B         11.2 us      11.2 us (0 us)
128B        11.2 us      11.2 us (0 us)
256B        11.2 us      11.2 us (0 us)
512B        11.4 us      11.3 us (-0.1 us)
1KB         11.4 us      11.5 us (+0.1 us)
2KB         11.5 us      11.5 us (0 us)

Signed-off-by: Guangguan Wang <guangguan.wang@linux.alibaba.com>
Reviewed-by: Tony Lu <tonylu@linux.alibaba.com>
Tested-by: kernel test robot <lkp@intel.com>
Acked-by: Karsten Graul <kgraul@linux.ibm.com>
---
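The per-chunk decision in smcr_tx_rdma_writes() can be sketched in
userspace C as follows. This is a simplified illustration, not the
kernel code: the sketch_* names, the single-address work request and
the flag bit are hypothetical, and only the "dst_len less than
max_inline_data" test and the set-or-clear handling of the flag mirror
the patch. The flag is cleared explicitly in the non-inline case,
presumably because the same work request structures are reused for
later writes of a different size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SEND_INLINE	(1u << 3)	/* placeholder flag bit for illustration */

/* Hypothetical, heavily reduced stand-in for an RDMA write work request. */
struct sketch_wr {
	unsigned int send_flags;
	uint64_t sge_addr;
};

/*
 * Pick the base address for one destination chunk.
 * Small chunks are sent inline from the CPU address of the send buffer,
 * larger ones are fetched by the device from the DMA address. The flag
 * is updated in both branches so a stale value never survives.
 */
static void sketch_prepare_write(struct sketch_wr *wr, size_t dst_len,
				 uint32_t max_inline_data,
				 const void *cpu_addr, uint64_t dma_addr,
				 size_t src_off)
{
	uint64_t base_addr = dma_addr;

	assert(wr != NULL);
	if (dst_len < max_inline_data) {
		base_addr = (uint64_t)(uintptr_t)cpu_addr;
		wr->send_flags |= SKETCH_SEND_INLINE;
	} else {
		wr->send_flags &= ~SKETCH_SEND_INLINE;
	}
	wr->sge_addr = base_addr + src_off;
}
```

Calling the helper twice on the same sketch_wr, first with a 16-byte
chunk and then with a 64-byte chunk against an assumed limit of 60,
sets the flag on the first call and clears it again on the second.
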
 net/smc/smc_tx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc_tx.c b/net/smc/smc_tx.c
index 98ca9229fe87..805a546e8c04 100644
--- a/net/smc/smc_tx.c
+++ b/net/smc/smc_tx.c
@@ -391,12 +391,20 @@ static int smcr_tx_rdma_writes(struct smc_connection *conn, size_t len,
 	int rc;
 
 	for (dstchunk = 0; dstchunk < 2; dstchunk++) {
-		struct ib_sge *sge =
-			wr_rdma_buf->wr_tx_rdma[dstchunk].wr.sg_list;
+		struct ib_rdma_wr *wr = &wr_rdma_buf->wr_tx_rdma[dstchunk];
+		struct ib_sge *sge = wr->wr.sg_list;
+		u64 base_addr = dma_addr;
+
+		if (dst_len < link->qp_attr.cap.max_inline_data) {
+			base_addr = (uintptr_t)conn->sndbuf_desc->cpu_addr;
+			wr->wr.send_flags |= IB_SEND_INLINE;
+		} else {
+			wr->wr.send_flags &= ~IB_SEND_INLINE;
+		}
 
 		num_sges = 0;
 		for (srcchunk = 0; srcchunk < 2; srcchunk++) {
-			sge[srcchunk].addr = dma_addr + src_off;
+			sge[srcchunk].addr = base_addr + src_off;
 			sge[srcchunk].length = src_len;
 			num_sges++;
 
@@ -410,8 +418,7 @@ static int smcr_tx_rdma_writes(struct smc_connection *conn, size_t len,
 			src_len = dst_len - src_len; /* remainder */
 			src_len_sum += src_len;
 		}
-		rc = smc_tx_rdma_write(conn, dst_off, num_sges,
-				       &wr_rdma_buf->wr_tx_rdma[dstchunk]);
+		rc = smc_tx_rdma_write(conn, dst_off, num_sges, wr);
 		if (rc)
 			return rc;
 		if (dst_len_sum == len)
-- 
2.24.3 (Apple Git-128)