[PATCH net-next 3/3] net/smc: optimize MTTE consumption for SMC-R buffers

D. Wythe posted 3 patches 2 weeks, 1 day ago
Posted by D. Wythe 2 weeks, 1 day ago
SMC-R buffers currently use 4KB page mapping for IB registration.
Each page consumes one MTTE, which is inefficient and quickly depletes
limited IB hardware resources for large buffers.

For virtually contiguous buffers, switch to vmalloc_huge() to leverage
huge page support. Using larger page sizes during IB MR registration
drastically reduces MTTE consumption.

For physically contiguous buffers, the entire buffer now requires only
a single MTTE.
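
To make the saving concrete, here is a rough userspace sketch of the
arithmetic behind the "1 << (order - page_order)" expression used in the
diff below. It is only an illustration, not part of the patch: the names
mtte_count() and map_page_size() are hypothetical, and the 4 KiB base
page (shift 12) and the order-9 (2 MiB) huge page are assumptions that
match a typical x86-64 configuration.

```c
#include <assert.h>

/* Assumed base page shift (4 KiB pages); not taken from the patch. */
#define BASE_PAGE_SHIFT 12

/*
 * Hypothetical helper: number of MR page entries needed for a buffer of
 * (1 << buf_order) base pages when it is registered with a page size of
 * (1 << map_order) base pages. Mirrors "1 << (order - page_order)".
 */
static unsigned long mtte_count(unsigned int buf_order, unsigned int map_order)
{
	/* The mapping page cannot be larger than the buffer itself. */
	assert(map_order <= buf_order);
	return 1UL << (buf_order - map_order);
}

/* Hypothetical helper: registration page size in bytes for a mapping order. */
static unsigned long map_page_size(unsigned int map_order)
{
	return 1UL << (BASE_PAGE_SHIFT + map_order);
}
```

Under these assumptions, a 2 MiB buffer (order 9) needs
mtte_count(9, 0) = 512 entries with 4 KiB mappings, and
mtte_count(9, 9) = 1 entry when it is backed by a single 2 MiB huge
page. If vmalloc_huge() falls back to base pages, the page order is 0
and the count is the same as before the change.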

Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
---
 net/smc/smc_core.c |  3 ++-
 net/smc/smc_ib.c   | 23 ++++++++++++++++++++---
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 6219db498976..8aca5dc54be7 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -2348,7 +2348,8 @@ static struct smc_buf_desc *smcr_new_buf_create(struct smc_link_group *lgr,
 			goto out;
 		fallthrough;	// try virtually contiguous buf
 	case SMCR_VIRT_CONT_BUFS:
-		buf_desc->cpu_addr = vzalloc(PAGE_SIZE << buf_desc->order);
+		buf_desc->cpu_addr = vmalloc_huge(PAGE_SIZE << buf_desc->order,
+						  GFP_KERNEL | __GFP_ZERO);
 		if (!buf_desc->cpu_addr)
 			goto out;
 		buf_desc->pages = NULL;
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 1154907c5c05..67211d44a1db 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -20,6 +20,7 @@
 #include <linux/wait.h>
 #include <linux/mutex.h>
 #include <linux/inetdevice.h>
+#include <linux/vmalloc.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_cache.h>
 
@@ -697,6 +698,18 @@ void smc_ib_put_memory_region(struct ib_mr *mr)
 	ib_dereg_mr(mr);
 }
 
+static inline int smc_buf_get_vm_page_order(struct smc_buf_desc *buf_slot)
+{
+#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
+	struct vm_struct *vm;
+
+	vm = find_vm_area(buf_slot->cpu_addr);
+	return vm ? vm->page_order : 0;
+#else
+	return 0;
+#endif
+}
+
 static int smc_ib_map_mr_sg(struct smc_buf_desc *buf_slot, u8 link_idx)
 {
 	unsigned int offset = 0;
@@ -706,8 +719,9 @@ static int smc_ib_map_mr_sg(struct smc_buf_desc *buf_slot, u8 link_idx)
 	sg_num = ib_map_mr_sg(buf_slot->mr[link_idx],
 			      buf_slot->sgt[link_idx].sgl,
 			      buf_slot->sgt[link_idx].orig_nents,
-			      &offset, PAGE_SIZE);
-
+			      &offset,
+			      buf_slot->is_vm ? PAGE_SIZE << smc_buf_get_vm_page_order(buf_slot) :
+			      PAGE_SIZE << buf_slot->order);
 	return sg_num;
 }
 
@@ -719,7 +733,10 @@ int smc_ib_get_memory_region(struct ib_pd *pd, int access_flags,
 		return 0; /* already done */
 
 	buf_slot->mr[link_idx] =
-		ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, 1 << buf_slot->order);
+		ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG,
+			    buf_slot->is_vm ?
+			    1 << (buf_slot->order - smc_buf_get_vm_page_order(buf_slot)) : 1);
+
 	if (IS_ERR(buf_slot->mr[link_idx])) {
 		int rc;
 
-- 
2.45.0
Re: [PATCH net-next 3/3] net/smc: optimize MTTE consumption for SMC-R buffers
Posted by Christoph Hellwig 2 weeks ago
On Fri, Jan 23, 2026 at 04:23:49PM +0800, D. Wythe wrote:
> +static inline int smc_buf_get_vm_page_order(struct smc_buf_desc *buf_slot)
> +{
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> +	struct vm_struct *vm;
> +
> +	vm = find_vm_area(buf_slot->cpu_addr);
> +	return vm ? vm->page_order : 0;
> +#else
> +	return 0;
> +#endif

You might want to encapsulate this logic in a vmalloc_order or similar
helper in vmalloc.c.
Re: [PATCH net-next 3/3] net/smc: optimize MTTE consumption for SMC-R buffers
Posted by D. Wythe 2 weeks ago
On Fri, Jan 23, 2026 at 06:52:55AM -0800, Christoph Hellwig wrote:
> On Fri, Jan 23, 2026 at 04:23:49PM +0800, D. Wythe wrote:
> > +static inline int smc_buf_get_vm_page_order(struct smc_buf_desc *buf_slot)
> > +{
> > +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> > +	struct vm_struct *vm;
> > +
> > +	vm = find_vm_area(buf_slot->cpu_addr);
> > +	return vm ? vm->page_order : 0;
> > +#else
> > +	return 0;
> > +#endif
> 
> You might want to encapsulate this logic in a vmalloc_order or similar
> helper in vmalloc.c.

Hi Christoph,

That's a great suggestion. Encapsulating this logic into a helper like
vmalloc_page_order() (or similar) within vmalloc.c is indeed much
cleaner than exporting find_vm_area().

I'll implement this helper in V2 and use it in the SMC code. Thanks for
pointing this out!

Thanks,
D. Wythe