drivers/nvme/host/rdma.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
nvme_rdma_remove_one() first verifies that an ib_device has an
nvme_rdma_device on device_list, but it drops device_list_mutex before it
walks nvme_rdma_ctrl_list. It then identifies matching controllers by
dereferencing ctrl->device->dev while holding only nvme_rdma_ctrl_mutex.
ctrl->device is a cached copy of queue 0's struct nvme_rdma_device. Queue
teardown owns that object's kref, so controller list membership does not
keep the nvme_rdma_device alive.
The buggy scenario involves two paths, with each column showing the order
within that path:
RDMA remove callback: Controller error recovery:
1. find ndev on device_list 1. run nvme_rdma_error_recovery_work()
2. drop device_list_mutex 2. tear down the admin queue
3. walk nvme_rdma_ctrl_list 3. drop the final queue device ref
4. read ctrl->device->dev 4. free the nvme_rdma_device
Fix this by taking a temporary reference to the matching struct
nvme_rdma_device while still holding device_list_mutex. The controller
walk can then compare ctrl->device directly with the pinned
nvme_rdma_device without dereferencing a queue-owned object that might
have been freed. Release the temporary reference after the delete
workqueue flush.
Validation reproduced this kernel report:
BUG: KASAN: slab-use-after-free in nvme_rdma_remove_one+0x281/0x2c0
[nvme_rdma]
Call Trace:
<TASK>
dump_stack_lvl+0x66/0xa0
print_report+0xce/0x630
? nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
? srso_alias_return_thunk+0x5/0xfbef5
? __virt_addr_valid+0x20d/0x410
? nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
kasan_report+0xe0/0x110
? nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
nvme_rdma_remove_one+0x281/0x2c0 [nvme_rdma]
remove_client_context+0xa9/0xf0 [ib_core]
disable_device+0x12d/0x240 [ib_core]
? __pfx_disable_device+0x10/0x10 [ib_core]
? srso_alias_return_thunk+0x5/0xfbef5
? __mutex_unlock_slowpath+0x147/0x900
__ib_unregister_device+0x26f/0x460 [ib_core]
ib_unregister_device_and_put+0x55/0x70 [ib_core]
nldev_dellink+0x29e/0x3c0 [ib_core]
? unwind_next_frame+0x6e3/0x2190
? __pfx_nldev_dellink+0x10/0x10 [ib_core]
? lock_acquire+0x2b8/0x2f0
? srso_alias_return_thunk+0x5/0xfbef5
? cap_capable+0x196/0x330
? __pfx_down_read+0x10/0x10
rdma_nl_rcv_msg+0x2db/0x5f0 [ib_core]
? __pfx_rdma_nl_rcv_msg+0x10/0x10 [ib_core]
rdma_nl_rcv_skb.constprop.0.isra.0+0x222/0x380 [ib_core]
? __pfx_rdma_nl_rcv_skb.constprop.0.isra.0+0x10/0x10 [ib_core]
? srso_alias_return_thunk+0x5/0xfbef5
? srso_alias_return_thunk+0x5/0xfbef5
? netlink_deliver_tap+0x150/0xac0
netlink_unicast+0x47c/0x790
? __pfx_netlink_unicast+0x10/0x10
netlink_sendmsg+0x767/0xc30
? __pfx_netlink_sendmsg+0x10/0x10
? lock_release+0x1e0/0x280
__sys_sendto+0x339/0x390
? __pfx___sys_sendto+0x10/0x10
? srso_alias_return_thunk+0x5/0xfbef5
__x64_sys_sendto+0xe0/0x1c0
? do_syscall_64+0x81/0x6a0
? srso_alias_return_thunk+0x5/0xfbef5
? trace_hardirqs_on+0x18/0x160
do_syscall_64+0x115/0x6a0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Allocated by task 436:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
__kasan_kmalloc+0xaa/0xb0
nvme_rdma_cm_handler+0xcbc/0x2914 [nvme_rdma]
cma_cm_event_handler+0xb2/0x390 [rdma_cm]
addr_handler+0x199/0x2b0 [rdma_cm]
process_one_req+0x113/0x650 [ib_core]
process_one_work+0x8d0/0x1870
worker_thread+0x575/0xf80
kthread+0x2e7/0x3c0
ret_from_fork+0x576/0x810
ret_from_fork_asm+0x1a/0x30
Freed by task 436:
kasan_save_stack+0x33/0x60
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x5f/0x80
kfree+0x307/0x580
nvme_rdma_free_dev+0x16d/0x260 [nvme_rdma]
nvme_rdma_free_queue+0x6d/0x90 [nvme_rdma]
nvme_rdma_error_recovery_work+0x7f/0x110 [nvme_rdma]
process_one_work+0x8d0/0x1870
worker_thread+0x575/0xf80
kthread+0x2e7/0x3c0
ret_from_fork+0x576/0x810
ret_from_fork_asm+0x1a/0x30
Fixes: e87a911fed07 ("nvme-rdma: use ib_client API to detect device removal")
Assisted-by: Codex:gpt-5.5
Signed-off-by: Cen Zhang <zzzccc427@gmail.com>
---
v4:
Renamed the subject and commit message wording to refer explicitly to struct
nvme_rdma_device instead of the vague "device wrapper" term.
v3:
Removed the temporary iterator and goto-based lookup in
nvme_rdma_remove_one().
Preserved the original found-based device-list flow while pinning the
matched nvme_rdma_device.
v2:
Reworked the fix to take a temporary nvme_rdma_device reference during the
device_list lookup instead of adding a cached ib_device field to struct
nvme_rdma_ctrl.
Changed the controller-list match to compare ctrl->device against the pinned
wrapper while preserving the existing delete workqueue flush behavior.
drivers/nvme/host/rdma.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 6909e3542794..663e38540073 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2378,7 +2378,7 @@ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
mutex_lock(&device_list_mutex);
list_for_each_entry(ndev, &device_list, entry) {
if (ndev->dev == ib_device) {
- found = true;
+ found = nvme_rdma_dev_get(ndev);
break;
}
}
@@ -2390,13 +2390,14 @@ static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
/* Delete all controllers using this device */
mutex_lock(&nvme_rdma_ctrl_mutex);
list_for_each_entry(ctrl, &nvme_rdma_ctrl_list, list) {
- if (ctrl->device->dev != ib_device)
+ if (ctrl->device != ndev)
continue;
nvme_delete_ctrl(&ctrl->ctrl);
}
mutex_unlock(&nvme_rdma_ctrl_mutex);
flush_workqueue(nvme_delete_wq);
+ nvme_rdma_dev_put(ndev);
}
static struct ib_client nvme_rdma_ib_client = {
--
2.43.0
© 2016 - 2026 Red Hat, Inc.