Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
NULL device pointer, which leads to a NULL pointer dereference. Add NULL
checks to safely bypass device-specific DMA features when no device is
provided.

This fixes the following kernel oops:
BUG: kernel NULL pointer dereference, address: 00000000000002fc
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 1028eb067 P4D 1028eb067 PUD 105da0067 PMD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 3 UID: 1000 PID: 1854 Comm: python3 Tainted: G W 6.15.0-rc1+ #11 PREEMPT(voluntary)
Tainted: [W]=WARN
Hardware name: Trigkey Key N/Key N, BIOS KEYN101 09/02/2024
RIP: 0010:hmm_dma_map_alloc+0x25/0x100
Code: 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 d6 49 c1 e6 0c 41 55 41 54 53 49 39 ce 0f 82 c6 00 00 00 49 89 fc <f6> 87 fc 02 00 00 20 0f 84 af 00 00 00 49 89 f5 48 89 d3 49 89 cf
RSP: 0018:ffffd3d3420eb830 EFLAGS: 00010246
RAX: 0000000000001000 RBX: ffff8b727c7f7400 RCX: 0000000000001000
RDX: 0000000000000001 RSI: ffff8b727c7f74b0 RDI: 0000000000000000
RBP: ffffd3d3420eb858 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 00007262a622a000 R14: 0000000000001000 R15: ffff8b727c7f74b0
FS: 00007262a62a1080(0000) GS:ffff8b762ac3e000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000002fc CR3: 000000010a1f0004 CR4: 0000000000f72ef0
PKRU: 55555554
Call Trace:
<TASK>
ib_init_umem_odp+0xb6/0x110 [ib_uverbs]
ib_umem_odp_get+0xf0/0x150 [ib_uverbs]
rxe_odp_mr_init_user+0x71/0x170 [rdma_rxe]
rxe_reg_user_mr+0x217/0x2e0 [rdma_rxe]
ib_uverbs_reg_mr+0x19e/0x2e0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd9/0x150 [ib_uverbs]
ib_uverbs_cmd_verbs+0xd19/0xee0 [ib_uverbs]
? mmap_region+0x63/0xd0
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
ib_uverbs_ioctl+0xba/0x130 [ib_uverbs]
__x64_sys_ioctl+0xa4/0xe0
x64_sys_call+0x1178/0x2660
do_syscall_64+0x7e/0x170
? syscall_exit_to_user_mode+0x4e/0x250
? do_syscall_64+0x8a/0x170
? do_syscall_64+0x8a/0x170
? syscall_exit_to_user_mode+0x4e/0x250
? do_syscall_64+0x8a/0x170
? syscall_exit_to_user_mode+0x4e/0x250
? do_syscall_64+0x8a/0x170
? do_user_addr_fault+0x1d2/0x8d0
? irqentry_exit_to_user_mode+0x43/0x250
? irqentry_exit+0x43/0x50
? exc_page_fault+0x93/0x1d0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7262a6124ded
Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
RSP: 002b:00007fffd08c3960 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007fffd08c39f0 RCX: 00007262a6124ded
RDX: 00007fffd08c3a10 RSI: 00000000c0181b01 RDI: 0000000000000007
RBP: 00007fffd08c39b0 R08: 0000000014107820 R09: 00007fffd08c3b44
R10: 000000000000000c R11: 0000000000000246 R12: 00007fffd08c3b44
R13: 000000000000000c R14: 00007fffd08c3b58 R15: 0000000014107960
</TASK>
Fixes: 1efe8c0670d6 ("RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage")
Closes: https://lore.kernel.org/all/3e8f343f-7d66-4f7a-9f08-3910623e322f@gmail.com/
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
---
mm/hmm.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index a8bf097677f3..311141124e67 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -638,7 +638,7 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 		      size_t nr_entries, size_t dma_entry_size)
 {
 	bool dma_need_sync = false;
-	bool use_iova;
+	bool use_iova = false;
 
 	if (!(nr_entries * PAGE_SIZE / dma_entry_size))
 		return -EINVAL;
@@ -649,9 +649,9 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 	 * best approximation to ensure no swiotlb buffering happens.
 	 */
 #ifdef CONFIG_DMA_NEED_SYNC
-	dma_need_sync = !dev->dma_skip_sync;
+	dma_need_sync = dev ? !dev->dma_skip_sync : false;
 #endif /* CONFIG_DMA_NEED_SYNC */
-	if (dma_need_sync || dma_addressing_limited(dev))
+	if (dev && (dma_need_sync || dma_addressing_limited(dev)))
 		return -EOPNOTSUPP;
 
 	map->dma_entry_size = dma_entry_size;
@@ -660,9 +660,11 @@ int hmm_dma_map_alloc(struct device *dev, struct hmm_dma_map *map,
 	if (!map->pfn_list)
 		return -ENOMEM;
 
-	use_iova = dma_iova_try_alloc(dev, &map->state, 0,
-			nr_entries * PAGE_SIZE);
-	if (!use_iova && dma_need_unmap(dev)) {
+	if (dev)
+		use_iova = dma_iova_try_alloc(dev, &map->state, 0,
+				nr_entries * PAGE_SIZE);
+
+	if (!dev || (!use_iova && dma_need_unmap(dev))) {
 		map->dma_list = kvcalloc(nr_entries, sizeof(*map->dma_list),
 					 GFP_KERNEL | __GFP_NOWARN);
 		if (!map->dma_list)
--
2.43.0
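
For context, with this patch applied a NULL-device call reduces to the
pure-CPU fallback. A hypothetical caller (the function name and parameters
here are illustrative, not from the patch) would look like:

/*
 * Hypothetical caller with no struct device, as a software-only driver
 * would reach through the RDMA core. With the patch above: the
 * dma_skip_sync and dma_addressing_limited() checks are skipped (both
 * need a device), use_iova stays false, and map->dma_list is allocated
 * unconditionally, so the call succeeds with a pure-CPU mapping.
 */
static int example_alloc(struct hmm_dma_map *map, size_t nr_entries)
{
	return hmm_dma_map_alloc(NULL, map, nr_entries, PAGE_SIZE);
}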
On Fri, May 23, 2025 at 02:35:37PM +0000, Daisuke Matsuda wrote:
> Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
> NULL device pointer,

No, they may not. If something has no device with physical DMA
capabilities, it has no business calling into it.
On 2025/05/23 23:48, Christoph Hellwig wrote:
> On Fri, May 23, 2025 at 02:35:37PM +0000, Daisuke Matsuda wrote:
>> Some drivers (such as rxe) may legitimately call hmm_dma_map_alloc() with a
>> NULL device pointer,
>
> No, they may not. If something has no device with physical DMA
> capabilities, it has no business calling into it.
>
Hi Christoph,
RXE is a software implementation of IBTA RoCEv2, designed to let systems
equipped with standard Ethernet adapters interoperate with other
RoCEv2-capable nodes.

Like the other InfiniBand drivers (under drivers/infiniband/{hw,sw}), RXE
depends on the ib_core and ib_uverbs layers in drivers/infiniband/core.
These common RDMA layers, in turn, rely on the HMM infrastructure for
specific features such as On-Demand Paging (ODP).

As a result, even though RXE lacks physical DMA capabilities, it still
reaches hmm_dma_map_alloc() through the shared RDMA core paths. This patch
ensures that such software-only use cases do not trigger NULL pointer
dereferences.
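
To make the dependency concrete, the failing path looks roughly like this.
It is a simplified sketch: the _sketch suffix, the nr_entries parameter,
and the argument details are illustrative, not the exact code in
drivers/infiniband/core/umem_odp.c.

/*
 * Call chain from the trace, software-device (rxe) case:
 *
 *   rxe_reg_user_mr() -> rxe_odp_mr_init_user() -> ib_umem_odp_get()
 *     -> ib_init_umem_odp() -> hmm_dma_map_alloc()
 */
static int ib_init_umem_odp_sketch(struct ib_umem_odp *umem_odp,
				   size_t nr_entries)
{
	struct ib_device *dev = umem_odp->umem.ibdev;

	/*
	 * For rxe, dev->dma_device is NULL (ib_uses_virt_dma(dev) is
	 * true), so hmm_dma_map_alloc() dereferences a NULL struct
	 * device; note RDI = 0000000000000000 (the first argument)
	 * in the trace above.
	 */
	return hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
				 nr_entries, 1 << umem_odp->page_shift);
}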
Thanks,
Daisuke
Thank you very much, but I know rxe very well. And given your apparent
knowledge of the rdma subsystem you should also know pretty well that it
does not otherwise call into the dma mapping core for virtual devices
because calling into the dma mapping code is not valid for the virtual
devices.

Please fix the rdma core to not call into the hmm dma mapping helpers
for the ib_uses_virt_dma() case.
On 2025/05/24 0:42, Christoph Hellwig wrote:
> Thank you very much, but I know rxe very well. And given your apparent
> knowledge of the rdma subsystem you should also know pretty well that it
> does not otherwise call into the dma mapping core for virtual devices
> because calling into the dma mapping code is not valid for the virtual
> devices.
>
> Please fix the rdma core to not call into the hmm dma mapping helpers
> for the ib_uses_virt_dma() case.
>

Thank you for the clarification and guidance. I'll look into updating the
RDMA core to avoid calling hmm_dma_map_alloc() when ib_uses_virt_dma() is
true. That should help keep the layering and responsibilities properly
separated. A rough sketch of what I have in mind follows below.

Thanks again,
Daisuke
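
The sketch mentioned above, for concreteness. It is untested and
hypothetical: ib_uses_virt_dma() is the existing helper from
include/rdma/ib_verbs.h, while the surrounding names and structure are
simplified rather than the actual fix.

/*
 * Hypothetical guard in the RDMA core (untested sketch): skip the HMM
 * DMA helpers entirely for software-only devices, which only need the
 * pfn list for page tracking.
 */
if (ib_uses_virt_dma(dev)) {
	/* rxe/siw: no physical DMA engine behind this device */
	umem_odp->map.pfn_list = kvcalloc(nr_entries,
					  sizeof(*umem_odp->map.pfn_list),
					  GFP_KERNEL | __GFP_NOWARN);
	if (!umem_odp->map.pfn_list)
		return -ENOMEM;
} else {
	ret = hmm_dma_map_alloc(dev->dma_device, &umem_odp->map,
				nr_entries, 1 << umem_odp->page_shift);
	if (ret)
		return ret;
}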