[PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Daisuke Matsuda 6 months, 3 weeks ago
Drivers such as rxe, which use virtual DMA, must not call into the DMA
mapping core since they lack physical DMA capabilities. Otherwise, a NULL
pointer dereference is observed as shown below. This patch ensures the RDMA
core handles virtual and physical DMA paths appropriately.

This fixes the following kernel oops:

 BUG: kernel NULL pointer dereference, address: 00000000000002fc
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 1028eb067 P4D 1028eb067 PUD 105da0067 PMD 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 3 UID: 1000 PID: 1854 Comm: python3 Tainted: G        W           6.15.0-rc1+ #11 PREEMPT(voluntary)
 Tainted: [W]=WARN
 Hardware name: Trigkey Key N/Key N, BIOS KEYN101 09/02/2024
 RIP: 0010:hmm_dma_map_alloc+0x25/0x100
 Code: 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 d6 49 c1 e6 0c 41 55 41 54 53 49 39 ce 0f 82 c6 00 00 00 49 89 fc <f6> 87 fc 02 00 00 20 0f 84 af 00 00 00 49 89 f5 48 89 d3 49 89 cf
 RSP: 0018:ffffd3d3420eb830 EFLAGS: 00010246
 RAX: 0000000000001000 RBX: ffff8b727c7f7400 RCX: 0000000000001000
 RDX: 0000000000000001 RSI: ffff8b727c7f74b0 RDI: 0000000000000000
 RBP: ffffd3d3420eb858 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
 R13: 00007262a622a000 R14: 0000000000001000 R15: ffff8b727c7f74b0
 FS:  00007262a62a1080(0000) GS:ffff8b762ac3e000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000002fc CR3: 000000010a1f0004 CR4: 0000000000f72ef0
 PKRU: 55555554
 Call Trace:
  <TASK>
  ib_init_umem_odp+0xb6/0x110 [ib_uverbs]
  ib_umem_odp_get+0xf0/0x150 [ib_uverbs]
  rxe_odp_mr_init_user+0x71/0x170 [rdma_rxe]
  rxe_reg_user_mr+0x217/0x2e0 [rdma_rxe]
  ib_uverbs_reg_mr+0x19e/0x2e0 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd9/0x150 [ib_uverbs]
  ib_uverbs_cmd_verbs+0xd19/0xee0 [ib_uverbs]
  ? mmap_region+0x63/0xd0
  ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
  ib_uverbs_ioctl+0xba/0x130 [ib_uverbs]
  __x64_sys_ioctl+0xa4/0xe0
  x64_sys_call+0x1178/0x2660
  do_syscall_64+0x7e/0x170
  ? syscall_exit_to_user_mode+0x4e/0x250
  ? do_syscall_64+0x8a/0x170
  ? do_syscall_64+0x8a/0x170
  ? syscall_exit_to_user_mode+0x4e/0x250
  ? do_syscall_64+0x8a/0x170
  ? syscall_exit_to_user_mode+0x4e/0x250
  ? do_syscall_64+0x8a/0x170
  ? do_user_addr_fault+0x1d2/0x8d0
  ? irqentry_exit_to_user_mode+0x43/0x250
  ? irqentry_exit+0x43/0x50
  ? exc_page_fault+0x93/0x1d0
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7262a6124ded
 Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
 RSP: 002b:00007fffd08c3960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 00007fffd08c39f0 RCX: 00007262a6124ded
 RDX: 00007fffd08c3a10 RSI: 00000000c0181b01 RDI: 0000000000000007
 RBP: 00007fffd08c39b0 R08: 0000000014107820 R09: 00007fffd08c3b44
 R10: 000000000000000c R11: 0000000000000246 R12: 00007fffd08c3b44
 R13: 000000000000000c R14: 00007fffd08c3b58 R15: 0000000014107960
  </TASK>

Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
---
 drivers/infiniband/core/device.c   | 17 +++++++++++++++++
 drivers/infiniband/core/umem_odp.c | 11 ++++++++---
 include/rdma/ib_verbs.h            |  4 ++++
 3 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index b4e3e4beb7f4..abb8fed292c0 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2864,6 +2864,23 @@ int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
 	return nents;
 }
 EXPORT_SYMBOL(ib_dma_virt_map_sg);
+int ib_dma_virt_map_alloc(struct hmm_dma_map *map, size_t nr_entries,
+			  size_t dma_entry_size)
+{
+	if (!(nr_entries * PAGE_SIZE / dma_entry_size))
+		return -EINVAL;
+
+	map->dma_entry_size = dma_entry_size;
+	map->pfn_list = kvcalloc(nr_entries, sizeof(*map->pfn_list),
+				 GFP_KERNEL | __GFP_NOWARN);
+	if (!map->pfn_list)
+		return -ENOMEM;
+
+	map->dma_list = NULL;
+
+	return 0;
+}
+EXPORT_SYMBOL(ib_dma_virt_map_alloc);
 #endif /* CONFIG_INFINIBAND_VIRT_DMA */
 
 static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = {
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 51d518989914..a5b17be0894a 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -75,9 +75,14 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
 	if (unlikely(end < page_size))
 		return -EOVERFLOW;
 
-	ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
-				(end - start) >> PAGE_SHIFT,
-				1 << umem_odp->page_shift);
+	if (ib_uses_virt_dma(dev))
+		ret = ib_dma_virt_map_alloc(&umem_odp->map,
+					    (end - start) >> PAGE_SHIFT,
+					    1 << umem_odp->page_shift);
+	else
+		ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
+					(end - start) >> PAGE_SHIFT,
+					1 << umem_odp->page_shift);
 	if (ret)
 		return ret;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index b06a0ed81bdd..9ea41f288736 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -36,6 +36,7 @@
 #include <linux/irqflags.h>
 #include <linux/preempt.h>
 #include <linux/dim.h>
+#include <linux/hmm-dma.h>
 #include <uapi/rdma/ib_user_verbs.h>
 #include <rdma/rdma_counter.h>
 #include <rdma/restrack.h>
@@ -4221,6 +4222,9 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
 				   dma_attrs);
 }
 
+int ib_dma_virt_map_alloc(struct hmm_dma_map *map, size_t nr_entries,
+			  size_t dma_entry_size);
+
 /**
  * ib_dma_map_sgtable_attrs - Map a scatter/gather table to DMA addresses
  * @dev: The device for which the DMA addresses are to be created
-- 
2.43.0
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Leon Romanovsky 6 months, 3 weeks ago
On Sat, 24 May 2025 14:43:28 +0000, Daisuke Matsuda wrote:
> Drivers such as rxe, which use virtual DMA, must not call into the DMA
> mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> pointer dereference is observed as shown below. This patch ensures the RDMA
> core handles virtual and physical DMA paths appropriately.
> 
> This fixes the following kernel oops:
> 
> [...]

Applied, thanks!

[1/1] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
      https://git.kernel.org/rdma/rdma/c/5071e94f47af42

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Christoph Hellwig 6 months, 3 weeks ago
On Sun, May 25, 2025 at 03:51:08AM -0400, Leon Romanovsky wrote:
> 
> On Sat, 24 May 2025 14:43:28 +0000, Daisuke Matsuda wrote:
> > Drivers such as rxe, which use virtual DMA, must not call into the DMA
> > mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> > pointer dereference is observed as shown below. This patch ensures the RDMA
> > core handles virtual and physical DMA paths appropriately.
> > 
> > This fixes the following kernel oops:
> > 
> > [...]
> 
> Applied, thanks!

So while this version looks correct, the idea of open coding the
virtual device version of hmm_dma_map directly in the ODP code
is a nasty leaky abstraction.  Please pull it into a proper ib_dma_*
wrapper.
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Jason Gunthorpe 6 months, 3 weeks ago
On Sun, May 25, 2025 at 11:57:14PM -0700, Christoph Hellwig wrote:
> On Sun, May 25, 2025 at 03:51:08AM -0400, Leon Romanovsky wrote:
> > 
> > On Sat, 24 May 2025 14:43:28 +0000, Daisuke Matsuda wrote:
> > > Drivers such as rxe, which use virtual DMA, must not call into the DMA
> > > mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> > > pointer dereference is observed as shown below. This patch ensures the RDMA
> > > core handles virtual and physical DMA paths appropriately.
> > > 
> > > This fixes the following kernel oops:
> > > 
> > > [...]
> > 
> > Applied, thanks!
> 
> So while this version looks correct, the idea of open coding the
> virtual device version of hmm_dma_map directly in the ODP code
> is a nasty leaky abstraction.  Please pull it into a proper ib_dma_*
> wrapper.

IMHO the ib_dma_* family was intended to be a ULP facing API so that
verbs using drivers can pass the right information through the WQE
APIs. Inside a driver it should not be calling these functions.

I think the bigger issue is that the virt drivers all expect to be
working in terms of struct page. It doesn't make any sense that rxe
would be using struct hmm_dma_map *at all*.

Indeed maybe it shouldn't even be using ib_umem_odp at all, since
basically everything it needs to do is different.

The nasty bit here is trying to make umem_odp overload struct
hmm_dma_map to also handle a struct page * array for the virtual
drivers.

Jason
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Leon Romanovsky 6 months, 3 weeks ago
On Sun, May 25, 2025 at 11:57:14PM -0700, Christoph Hellwig wrote:
> On Sun, May 25, 2025 at 03:51:08AM -0400, Leon Romanovsky wrote:
> > 
> > On Sat, 24 May 2025 14:43:28 +0000, Daisuke Matsuda wrote:
> > > Drivers such as rxe, which use virtual DMA, must not call into the DMA
> > > mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> > > pointer dereference is observed as shown below. This patch ensures the RDMA
> > > core handles virtual and physical DMA paths appropriately.
> > > 
> > > This fixes the following kernel oops:
> > > 
> > > [...]
> > 
> > Applied, thanks!
> 
> So while this version looks correct, the idea of open coding the
> virtual device version of hmm_dma_map directly in the ODP code
> is a nasty leaky abstraction.  Please pull it into a proper ib_dma_*
> wrapper.

I did it on purpose as these ib_dma_* are used by all IB users (core, drivers
and ULPs) and at this point I don't want them to use any of that specific API.

Thanks
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Leon Romanovsky 6 months, 3 weeks ago
On Sat, May 24, 2025 at 02:43:28PM +0000, Daisuke Matsuda wrote:
> Drivers such as rxe, which use virtual DMA, must not call into the DMA
> mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> pointer dereference is observed as shown below. This patch ensures the RDMA
> core handles virtual and physical DMA paths appropriately.
> 
> This fixes the following kernel oops:
> 
>  [...]
> 
> Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
> Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
> ---
>  drivers/infiniband/core/device.c   | 17 +++++++++++++++++
>  drivers/infiniband/core/umem_odp.c | 11 ++++++++---
>  include/rdma/ib_verbs.h            |  4 ++++
>  3 files changed, 29 insertions(+), 3 deletions(-)

Please include changelogs when you submit vX patches next time.

I ended up with the following patch:

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 51d518989914e..cf16549919e02 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -60,9 +60,11 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
 {
 	struct ib_device *dev = umem_odp->umem.ibdev;
 	size_t page_size = 1UL << umem_odp->page_shift;
+	struct hmm_dma_map *map;
 	unsigned long start;
 	unsigned long end;
-	int ret;
+	size_t nr_entries;
+	int ret = 0;
 
 	umem_odp->umem.is_odp = 1;
 	mutex_init(&umem_odp->umem_mutex);
@@ -75,9 +77,20 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
 	if (unlikely(end < page_size))
 		return -EOVERFLOW;
 
-	ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
-				(end - start) >> PAGE_SHIFT,
-				1 << umem_odp->page_shift);
+	nr_entries = (end - start) >> PAGE_SHIFT;
+	if (!(nr_entries * PAGE_SIZE / page_size))
+		return -EINVAL;
+
+	nap = &umem_odp->map;
+	if (ib_uses_virt_dma(dev)) {
+		map->pfn_list = kvcalloc(nr_entries, sizeof(*map->pfn_list),
+					 GFP_KERNEL | __GFP_NOWARN);
+		if (!map->pfn_list)
+			ret = -ENOMEM;
+	} else
+		ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
+					(end - start) >> PAGE_SHIFT,
+					1 << umem_odp->page_shift);
 	if (ret)
 		return ret;
 
@@ -90,7 +103,10 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
 	return 0;
 
 out_free_map:
-	hmm_dma_map_free(dev->dma_device, &umem_odp->map);
+	if (ib_uses_virt_dma(dev))
+		kfree(map->pfn_list);
+	else
+		hmm_dma_map_free(dev->dma_device, &umem_odp->map);
 	return ret;
 }
 
@@ -259,7 +275,10 @@ static void ib_umem_odp_free(struct ib_umem_odp *umem_odp)
 				    ib_umem_end(umem_odp));
 	mutex_unlock(&umem_odp->umem_mutex);
 	mmu_interval_notifier_remove(&umem_odp->notifier);
-	hmm_dma_map_free(dev->dma_device, &umem_odp->map);
+	if (ib_uses_virt_dma(dev))
+		kfree(umem_odp->map->pfn_list);
+	else
+		hmm_dma_map_free(dev->dma_device, &umem_odp->map);
 }
 
 void ib_umem_odp_release(struct ib_umem_odp *umem_odp)
diff --git a/mm/hmm.c b/mm/hmm.c
index a8bf097677f39..feac86196a65f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -640,8 +640,7 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 	bool dma_need_sync = false;
 	bool use_iova;
 
-	if (!(nr_entries * PAGE_SIZE / dma_entry_size))
-		return -EINVAL;
+	WARN_ON_ONCE(!(nr_entries * PAGE_SIZE / dma_entry_size));
 
 	/*
 	 * The HMM API violates our normal DMA buffer ownership rules and can't
-- 
2.49.0
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Daisuke Matsuda 6 months, 3 weeks ago
Hi Leon,

Thank you for amending the patch.
I've run the test and confirmed that the bug has been resolved.

We still have two build errors. Please see my reply below.

On 2025/05/25 16:08, Leon Romanovsky wrote:
> On Sat, May 24, 2025 at 02:43:28PM +0000, Daisuke Matsuda wrote:
<...>
>>   drivers/infiniband/core/device.c   | 17 +++++++++++++++++
>>   drivers/infiniband/core/umem_odp.c | 11 ++++++++---
>>   include/rdma/ib_verbs.h            |  4 ++++
>>   3 files changed, 29 insertions(+), 3 deletions(-)
> 
> Please include changelogs when you submit vX patches next time.

Oh, sorry. I will be careful from next time.

> 
> I ended up with the following patch:
> 
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 51d518989914e..cf16549919e02 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -60,9 +60,11 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
>   {
>   	struct ib_device *dev = umem_odp->umem.ibdev;
>   	size_t page_size = 1UL << umem_odp->page_shift;
> +	struct hmm_dma_map *map;
>   	unsigned long start;
>   	unsigned long end;
> -	int ret;
> +	size_t nr_entries;
> +	int ret = 0;
>   
>   	umem_odp->umem.is_odp = 1;
>   	mutex_init(&umem_odp->umem_mutex);
> @@ -75,9 +77,20 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
>   	if (unlikely(end < page_size))
>   		return -EOVERFLOW;
>   
> -	ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
> -				(end - start) >> PAGE_SHIFT,
> -				1 << umem_odp->page_shift);
> +	nr_entries = (end - start) >> PAGE_SHIFT;
> +	if (!(nr_entries * PAGE_SIZE / page_size))
> +		return -EINVAL;
> +
> +	nap = &umem_odp->map;

BUILD ERROR: 'nap' should be 'map'

> +	if (ib_uses_virt_dma(dev)) {
> +		map->pfn_list = kvcalloc(nr_entries, sizeof(*map->pfn_list),
> +					 GFP_KERNEL | __GFP_NOWARN);
> +		if (!map->pfn_list)
> +			ret = -ENOMEM;
> +	} else
> +		ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,

OPTIONAL: Perhaps we can just pass 'map' for the 2nd arg?

> +					(end - start) >> PAGE_SHIFT,
> +					1 << umem_odp->page_shift);
>   	if (ret)
>   		return ret;
>   
> @@ -90,7 +103,10 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
>   	return 0;
>   
>   out_free_map:
> -	hmm_dma_map_free(dev->dma_device, &umem_odp->map);
> +	if (ib_uses_virt_dma(dev))
> +		kfree(map->pfn_list);
> +	else
> +		hmm_dma_map_free(dev->dma_device, &umem_odp->map);

OPTIONAL: Here too.

>   	return ret;
>   }
>   
> @@ -259,7 +275,10 @@ static void ib_umem_odp_free(struct ib_umem_odp *umem_odp)
>   				    ib_umem_end(umem_odp));
>   	mutex_unlock(&umem_odp->umem_mutex);
>   	mmu_interval_notifier_remove(&umem_odp->notifier);
> -	hmm_dma_map_free(dev->dma_device, &umem_odp->map);
> +	if (ib_uses_virt_dma(dev))
> +		kfree(umem_odp->map->pfn_list);

BUILD ERROR:     'umem_odp->map.pfn_list' is correct.

Thanks again,
Daisuke
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Leon Romanovsky 6 months, 3 weeks ago
On Sun, May 25, 2025 at 06:34:56PM +0900, Daisuke Matsuda wrote:
> Hi Leon,
> 
> Thank you for amending the patch.
> I've run the test and confirmed that the bug has been resolved.
> 
> We still have two build errors. Please see my reply below.
> 
> On 2025/05/25 16:08, Leon Romanovsky wrote:
> > On Sat, May 24, 2025 at 02:43:28PM +0000, Daisuke Matsuda wrote:
> <...>
> > >   drivers/infiniband/core/device.c   | 17 +++++++++++++++++
> > >   drivers/infiniband/core/umem_odp.c | 11 ++++++++---
> > >   include/rdma/ib_verbs.h            |  4 ++++
> > >   3 files changed, 29 insertions(+), 3 deletions(-)
> > 
> > Please include changelogs when you submit vX patches next time.
> 
> Oh, sorry. I will be careful from next time.
> 
> > 
> > I ended up with the following patch:
> > 
> > diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> > index 51d518989914e..cf16549919e02 100644
> > --- a/drivers/infiniband/core/umem_odp.c
> > +++ b/drivers/infiniband/core/umem_odp.c
> > @@ -60,9 +60,11 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
> >   {
> >   	struct ib_device *dev = umem_odp->umem.ibdev;
> >   	size_t page_size = 1UL << umem_odp->page_shift;
> > +	struct hmm_dma_map *map;
> >   	unsigned long start;
> >   	unsigned long end;
> > -	int ret;
> > +	size_t nr_entries;
> > +	int ret = 0;
> >   	umem_odp->umem.is_odp = 1;
> >   	mutex_init(&umem_odp->umem_mutex);
> > @@ -75,9 +77,20 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
> >   	if (unlikely(end < page_size))
> >   		return -EOVERFLOW;
> > -	ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
> > -				(end - start) >> PAGE_SHIFT,
> > -				1 << umem_odp->page_shift);
> > +	nr_entries = (end - start) >> PAGE_SHIFT;
> > +	if (!(nr_entries * PAGE_SIZE / page_size))
> > +		return -EINVAL;
> > +
> > +	nap = &umem_odp->map;
> 
> BUILD ERROR: 'nap' should be 'map'

Sorry, I pushed the code too fast.

> 
> > +	if (ib_uses_virt_dma(dev)) {
> > +		map->pfn_list = kvcalloc(nr_entries, sizeof(*map->pfn_list),
> > +					 GFP_KERNEL | __GFP_NOWARN);
> > +		if (!map->pfn_list)
> > +			ret = -ENOMEM;
> > +	} else
> > +		ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
> 
> OPTIONAL: Perhaps we can just pass 'map' for the 2nd arg?


<...>

> > +		hmm_dma_map_free(dev->dma_device, &umem_odp->map);
> 
> OPTIONAL: Here too.

Sure, will change.

> 
> >   	return ret;
> >   }
> > @@ -259,7 +275,10 @@ static void ib_umem_odp_free(struct ib_umem_odp *umem_odp)
> >   				    ib_umem_end(umem_odp));
> >   	mutex_unlock(&umem_odp->umem_mutex);
> >   	mmu_interval_notifier_remove(&umem_odp->notifier);
> > -	hmm_dma_map_free(dev->dma_device, &umem_odp->map);
> > +	if (ib_uses_virt_dma(dev))
> > +		kfree(umem_odp->map->pfn_list);
> 
> BUILD ERROR:     'umem_odp->map.pfn_list' is correct.

Sorry about that too.

> 
> Thanks again,
> Daisuke
> 
>
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Zhu Yanjun 6 months, 3 weeks ago
On 2025/5/24 16:43, Daisuke Matsuda wrote:
> Drivers such as rxe, which use virtual DMA, must not call into the DMA
> mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> pointer dereference is observed as shown below. This patch ensures the RDMA
> core handles virtual and physical DMA paths appropriately.
> 
> This fixes the following kernel oops:
> 
>   [...]
> 
> Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
> Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>

I tried to apply this commit to the rdma branch below,
but it failed with the following error:

"
Applying: RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
error: patch failed: drivers/infiniband/core/umem_odp.c:75
error: drivers/infiniband/core/umem_odp.c: patch does not apply
Patch failed at 0001 RDMA/core: Avoid hmm_dma_map_alloc() for virtual 
DMA devices
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

"

The remote branch is remotes/rdma/rdma-next
The head commit is 3b6a1e410c7f RDMA/mlx5: Fix CC counters query for MPV

I am not sure whether I am using the correct repository and branch.

Best Regards,
Zhu Yanjun

> ---
> [...]
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Daisuke Matsuda 6 months, 3 weeks ago

On 2025/05/25 14:22, Zhu Yanjun wrote:
> On 2025/5/24 14:22, Zhu Yanjun wrote:
>> Drivers such as rxe, which use virtual DMA, must not call into the DMA
>> mapping core since they lack physical DMA capabilities. Otherwise, a NULL
>> pointer dereference is observed as shown below. This patch ensures the RDMA
>> core handles virtual and physical DMA paths appropriately.
>>
>> This fixes the following kernel oops:
>>
>>   BUG: kernel NULL pointer dereference, address: 00000000000002fc
>>   #PF: supervisor read access in kernel mode
>>   #PF: error_code(0x0000) - not-present page
>>   PGD 1028eb067 P4D 1028eb067 PUD 105da0067 PMD 0
>>   Oops: Oops: 0000 [#1] SMP NOPTI
>>   CPU: 3 UID: 1000 PID: 1854 Comm: python3 Tainted: G        W           6.15.0-rc1+ #11 PREEMPT(voluntary)
>>   Tainted: [W]=WARN
>>   Hardware name: Trigkey Key N/Key N, BIOS KEYN101 09/02/2024
>>   RIP: 0010:hmm_dma_map_alloc+0x25/0x100
>>   Code: 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 d6 49 c1 e6 0c 41 55 41 54 53 49 39 ce 0f 82 c6 00 00 00 49 89 fc <f6> 87 fc 02 00 00 20 0f 84 af 00 00 00 49 89 f5 48 89 d3 49 89 cf
>>   RSP: 0018:ffffd3d3420eb830 EFLAGS: 00010246
>>   RAX: 0000000000001000 RBX: ffff8b727c7f7400 RCX: 0000000000001000
>>   RDX: 0000000000000001 RSI: ffff8b727c7f74b0 RDI: 0000000000000000
>>   RBP: ffffd3d3420eb858 R08: 0000000000000000 R09: 0000000000000000
>>   R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
>>   R13: 00007262a622a000 R14: 0000000000001000 R15: ffff8b727c7f74b0
>>   FS:  00007262a62a1080(0000) GS:ffff8b762ac3e000(0000) knlGS:0000000000000000
>>   CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>   CR2: 00000000000002fc CR3: 000000010a1f0004 CR4: 0000000000f72ef0
>>   PKRU: 55555554
>>   Call Trace:
>>    <TASK>
>>    ib_init_umem_odp+0xb6/0x110 [ib_uverbs]
>>    ib_umem_odp_get+0xf0/0x150 [ib_uverbs]
>>    rxe_odp_mr_init_user+0x71/0x170 [rdma_rxe]
>>    rxe_reg_user_mr+0x217/0x2e0 [rdma_rxe]
>>    ib_uverbs_reg_mr+0x19e/0x2e0 [ib_uverbs]
>>    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd9/0x150 [ib_uverbs]
>>    ib_uverbs_cmd_verbs+0xd19/0xee0 [ib_uverbs]
>>    ? mmap_region+0x63/0xd0
>>    ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
>>    ib_uverbs_ioctl+0xba/0x130 [ib_uverbs]
>>    __x64_sys_ioctl+0xa4/0xe0
>>    x64_sys_call+0x1178/0x2660
>>    do_syscall_64+0x7e/0x170
>>    ? syscall_exit_to_user_mode+0x4e/0x250
>>    ? do_syscall_64+0x8a/0x170
>>    ? do_syscall_64+0x8a/0x170
>>    ? syscall_exit_to_user_mode+0x4e/0x250
>>    ? do_syscall_64+0x8a/0x170
>>    ? syscall_exit_to_user_mode+0x4e/0x250
>>    ? do_syscall_64+0x8a/0x170
>>    ? do_user_addr_fault+0x1d2/0x8d0
>>    ? irqentry_exit_to_user_mode+0x43/0x250
>>    ? irqentry_exit+0x43/0x50
>>    ? exc_page_fault+0x93/0x1d0
>>    entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>   RIP: 0033:0x7262a6124ded
>>   Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
>>   RSP: 002b:00007fffd08c3960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>>   RAX: ffffffffffffffda RBX: 00007fffd08c39f0 RCX: 00007262a6124ded
>>   RDX: 00007fffd08c3a10 RSI: 00000000c0181b01 RDI: 0000000000000007
>>   RBP: 00007fffd08c39b0 R08: 0000000014107820 R09: 00007fffd08c3b44
>>   R10: 000000000000000c R11: 0000000000000246 R12: 00007fffd08c3b44
>>   R13: 000000000000000c R14: 00007fffd08c3b58 R15: 0000000014107960
>>    </TASK>
>>
>> Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
>> Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
>> Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
> 
> I tried to apply this commit to the following rdma branch.
> But I failed. The error is as below:
> 
> "
> Applying: RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
> error: patch failed: drivers/infiniband/core/umem_odp.c:75
> error: drivers/infiniband/core/umem_odp.c: patch does not apply
> Patch failed at 0001 RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".
> 
> "
> 
> The remote branch is remotes/rdma/rdma-next
> The head commit is 3b6a1e410c7f RDMA/mlx5: Fix CC counters query for MPV
> 
> I am not sure if I use the correct repository and branch or not.

Thank you for trying the patch.

The change is meant to be applied to 'for-next' branch,
which contains patches from the following series:
[PATCH v10 00/24] Provide a new two step DMA mapping API
https://lore.kernel.org/linux-rdma/cover.1745831017.git.leon@kernel.org/

cf. https://kernel.googlesource.com/pub/scm/linux/kernel/git/rdma/rdma/

Thanks,
Daisuke

> 
> Best Regards,
> Zhu Yanjun
> 
>> ---
>>   drivers/infiniband/core/device.c   | 17 +++++++++++++++++
>>   drivers/infiniband/core/umem_odp.c | 11 ++++++++---
>>   include/rdma/ib_verbs.h            |  4 ++++
>>   3 files changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
>> index b4e3e4beb7f4..abb8fed292c0 100644
>> --- a/drivers/infiniband/core/device.c
>> +++ b/drivers/infiniband/core/device.c
>> @@ -2864,6 +2864,23 @@ int ib_dma_virt_map_sg(struct ib_device *dev, struct scatterlist *sg, int nents)
>>       return nents;
>>   }
>>   EXPORT_SYMBOL(ib_dma_virt_map_sg);
>> +int ib_dma_virt_map_alloc(struct hmm_dma_map *map, size_t nr_entries,
>> +              size_t dma_entry_size)
>> +{
>> +    if (!(nr_entries * PAGE_SIZE / dma_entry_size))
>> +        return -EINVAL;
>> +
>> +    map->dma_entry_size = dma_entry_size;
>> +    map->pfn_list = kvcalloc(nr_entries, sizeof(*map->pfn_list),
>> +                 GFP_KERNEL | __GFP_NOWARN);
>> +    if (!map->pfn_list)
>> +        return -ENOMEM;
>> +
>> +    map->dma_list = NULL;
>> +
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(ib_dma_virt_map_alloc);
>>   #endif /* CONFIG_INFINIBAND_VIRT_DMA */
>>   static const struct rdma_nl_cbs ibnl_ls_cb_table[RDMA_NL_LS_NUM_OPS] = {
>> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
>> index 51d518989914..a5b17be0894a 100644
>> --- a/drivers/infiniband/core/umem_odp.c
>> +++ b/drivers/infiniband/core/umem_odp.c
>> @@ -75,9 +75,14 @@ static int ib_init_umem_odp(struct ib_umem_odp *umem_odp,
>>       if (unlikely(end < page_size))
>>           return -EOVERFLOW;
>> -    ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
>> -                (end - start) >> PAGE_SHIFT,
>> -                1 << umem_odp->page_shift);
>> +    if (ib_uses_virt_dma(dev))
>> +        ret = ib_dma_virt_map_alloc(&umem_odp->map,
>> +                        (end - start) >> PAGE_SHIFT,
>> +                        1 << umem_odp->page_shift);
>> +    else
>> +        ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
>> +                    (end - start) >> PAGE_SHIFT,
>> +                    1 << umem_odp->page_shift);
>>       if (ret)
>>           return ret;
>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>> index b06a0ed81bdd..9ea41f288736 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -36,6 +36,7 @@
>>   #include <linux/irqflags.h>
>>   #include <linux/preempt.h>
>>   #include <linux/dim.h>
>> +#include <linux/hmm-dma.h>
>>   #include <uapi/rdma/ib_user_verbs.h>
>>   #include <rdma/rdma_counter.h>
>>   #include <rdma/restrack.h>
>> @@ -4221,6 +4222,9 @@ static inline void ib_dma_unmap_sg_attrs(struct ib_device *dev,
>>                      dma_attrs);
>>   }
>> +int ib_dma_virt_map_alloc(struct hmm_dma_map *map, size_t nr_entries,
>> +              size_t dma_entry_size);
>> +
>>   /**
>>    * ib_dma_map_sgtable_attrs - Map a scatter/gather table to DMA addresses
>>    * @dev: The device for which the DMA addresses are to be created
> 

Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Zhu Yanjun 6 months, 3 weeks ago
On 2025/5/25 8:03, Daisuke Matsuda wrote:
> 
> 
> On 2025/05/25 14:22, Zhu Yanjun wrote:
>> On 2025/5/24 16:43, Daisuke Matsuda wrote:
>>> Drivers such as rxe, which use virtual DMA, must not call into the DMA
>>> mapping core since they lack physical DMA capabilities.
>>>
>>> <...>
>>
>> I tried to apply this commit to the following rdma branch.
>> But I failed.
>>
>> <...>
>>
>> The remote branch is remotes/rdma/rdma-next
>> The head commit is 3b6a1e410c7f RDMA/mlx5: Fix CC counters query for MPV
>>
>> I am not sure if I use the correct repository and branch or not.
> 
> Thank you for trying the patch.
> 
> The change is meant to be applied to 'for-next' branch,
> which contains patches from the following series:
> [PATCH v10 00/24] Provide a new two step DMA mapping API
> https://lore.kernel.org/linux-rdma/cover.1745831017.git.leon@kernel.org/
> 
> cf. https://kernel.googlesource.com/pub/scm/linux/kernel/git/rdma/rdma/

Exactly. With the above link and for-next branch, this commit can be 
applied successfully.

Thanks,

Zhu Yanjun

> 
> Thanks,
> Daisuke
> 
> <...>

Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Greg Sword 6 months, 3 weeks ago
On Sat, May 24, 2025 at 10:43 PM Daisuke Matsuda <dskmtsd@gmail.com> wrote:
>
> Drivers such as rxe, which use virtual DMA, must not call into the DMA
> mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> pointer dereference is observed as shown below. This patch ensures the RDMA
> core handles virtual and physical DMA paths appropriately.
>
> <...>
>
> +EXPORT_SYMBOL(ib_dma_virt_map_alloc);
>  #endif /* CONFIG_INFINIBAND_VIRT_DMA */
>

Your ODP patches have caused significant issues, including system
instability. The latest version of your patches has led to critical
failures in our environment. Due to these ongoing problems, we have
decided that our system will no longer incorporate your patches going
forward.

--G--

> <...>
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Leon Romanovsky 6 months, 3 weeks ago
On Sun, May 25, 2025 at 05:27:23AM +0800, Greg Sword wrote:
> On Sat, May 24, 2025 at 10:43 PM Daisuke Matsuda <dskmtsd@gmail.com> wrote:
> >
> > Drivers such as rxe, which use virtual DMA, must not call into the DMA
> > mapping core since they lack physical DMA capabilities. Otherwise, a NULL
> > pointer dereference is observed as shown below. This patch ensures the RDMA
> > core handles virtual and physical DMA paths appropriately.

<...>

> > +EXPORT_SYMBOL(ib_dma_virt_map_alloc);
> >  #endif /* CONFIG_INFINIBAND_VIRT_DMA */
> >
> 
> Your ODP patches have caused significant issues, including system
> instability. The latest version of your patches has led to critical
> failures in our environment. Due to these ongoing problems, we have
> decided that our system will no longer incorporate your patches going
> forward.

Please be civil and appreciate the work done by other people. Daisuke
invested a lot of effort to implement ODP which is very non-trivial part
of RDMA HW.

If you can do it better, just do it, we will be happy to merge your
patches too.

At the end, we are talking about -next branch, which has potential to be
unstable, so next time provide detailed bug report to support your claims.

In addition, it was me who broke RXE.

Thanks
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Greg Sword 6 months, 3 weeks ago
On Sun, May 25, 2025 at 2:05 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Sun, May 25, 2025 at 05:27:23AM +0800, Greg Sword wrote:
> > <...>
>
> <...>
>
> In addition, it was me who broke RXE.

Is it an Honor?

>
> Thanks
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Leon Romanovsky 6 months, 3 weeks ago
On Tue, May 27, 2025 at 01:52:58AM +0800, Greg Sword wrote:
> On Sun, May 25, 2025 at 2:05 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > <...>
> >
> > In addition, it was me who broke RXE.
> 
> Is it an Honor?

It is a fact.

> 
> >
> > Thanks
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Daisuke Matsuda 6 months, 3 weeks ago
> Your ODP patches have caused significant issues, including system
> instability. The latest version of your patches has led to critical
> failures in our environment. Due to these ongoing problems, we have
> decided that our system will no longer incorporate your patches going
> forward.

I always wonder why this kind of "report" seen around rxe never includes
details of the problem encountered or the steps to reproduce it.
Everybody else in the Linux kernel community provides them.

--D--

> 
> --G--
Re: [PATCH for-next v3] RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
Posted by Greg Sword 6 months, 3 weeks ago
On Sun, May 25, 2025 at 10:54 AM Daisuke Matsuda <dskmtsd@gmail.com> wrote:
>
>
> > Your ODP patches have caused significant issues, including system
> > instability. The latest version of your patches has led to critical
> > failures in our environment. Due to these ongoing problems, we have
> > decided that our system will no longer incorporate your patches going
> > forward.
>
> I always wonder why this kind of "report" seen around RXE never includes
> the details of the problem encountered and the steps to reproduce them.
> Everybody else in the linux kernel community does so.

Test it a few more times on your local machine and you can reproduce this
issue. It's such a simple test, can't you do it?

>
> --D--
>
> >
> > --G--