nvme_setup_descriptor_pools() indexes dev->descriptor_pools[] with the
numa_node that its only caller, nvme_init_hctx_common(), forwards from
hctx->numa_node. On a non-NUMA kernel hctx->numa_node is NUMA_NO_NODE
(-1). Because the parameter is declared 'unsigned', -1 is converted to
UINT_MAX and the index lands far past the end of the array (which is
sized to nr_node_ids). The resulting page fault occurs during
nvme_alloc_ns() and leaves the namespace without a /dev node.

Reproduces on any NVMe controller probed by a CONFIG_NUMA=n kernel:

  BUG: unable to handle page fault for address: ffff889101603d38
  RIP: 0010:nvme_init_hctx_common+0x5a/0x190 [nvme]
  Call Trace:
   nvme_init_hctx+0x10/0x20 [nvme]
   nvme_alloc_ns+0x9e/0xa10 [nvme_core]
   nvme_scan_ns+0x301/0x3b0 [nvme_core]
   nvme_scan_ns_async+0x23/0x30 [nvme_core]

Switch the parameter to int and fall back to node 0 when it is
NUMA_NO_NODE; node 0 is always present.

Fixes: d977506f8863 ("nvme-pci: make PRP list DMA pools per-NUMA-node")
Reported-by: Sung-woo Kim <iam@sung-woo.kim>
Link: https://lore.kernel.org/r/20260309062840.2937858-2-iam@sung-woo.kim
Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net>
---
v2:
- drop the (numa_node >= nr_node_ids) check: cpu_to_node() never returns
  such a value in practice, so NUMA_NO_NODE is the only out-of-range
  value worth guarding against (Caleb Sander)
- test against NUMA_NO_NODE explicitly instead of (numa_node < 0)
- add a Fixes: tag, plus Reported-by: and Link: tags crediting Sung-woo
  Kim's earlier report

 drivers/nvme/host/pci.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9fd04cd7c5cb..9815823c974e 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -587,11 +587,16 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db,
 }
 
 static struct nvme_descriptor_pools *
-nvme_setup_descriptor_pools(struct nvme_dev *dev, unsigned numa_node)
+nvme_setup_descriptor_pools(struct nvme_dev *dev, int numa_node)
 {
-	struct nvme_descriptor_pools *pools = &dev->descriptor_pools[numa_node];
+	struct nvme_descriptor_pools *pools;
 	size_t small_align = NVME_SMALL_POOL_SIZE;
 
+	if (numa_node == NUMA_NO_NODE)
+		numa_node = 0;
+
+	pools = &dev->descriptor_pools[numa_node];
+
 	if (pools->small)
 		return pools; /* already initialized */
 
--
2.53.0