[PATCH] nvme-pci: fix swapped arguments in SGL DMA unmap path

Alireza Haghdoost posted 1 patch 2 months, 1 week ago
drivers/nvme/host/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Posted by Alireza Haghdoost 2 months, 1 week ago
The arguments to nvme_free_sgls() in nvme_unmap_data() are swapped for
the multi-entry SGL case. The first argument (sge) should be the
segment descriptor from the NVMe command's data pointer (type
NVME_SGL_FMT_LAST_SEG_DESC), and the second argument (sg_list) should
be the pool-allocated array of data descriptors.

With the arguments swapped, sge points to the first data descriptor
(type NVME_SGL_FMT_DATA_DESC). nvme_free_sgls() sees a data descriptor,
unmaps only that single entry, and returns -- leaking the DMA mappings
for all subsequent segments.

This manifests as unbounded iommu_iova slab growth on ARM64 systems
with 64K pages and IOMMU DMA translation, where IOVA coalescing is
disabled due to the NVMe 4K page / IOMMU 64K page granularity
mismatch. On x86, and on ARM64 with 4K pages, IOVA coalescing handles
the unmap via dma_iova_destroy(), so the buggy path is never reached.

Fixes: 7ce3c1dd78fc ("nvme-pci: convert the data mapping to blk_rq_dma_map")
Signed-off-by: Alireza Haghdoost <haghdoost@uber.com>
---
 drivers/nvme/host/pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 28f638413e122..728999e4247d8 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -761,8 +761,8 @@ static void nvme_unmap_data(struct request *req)

        if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len)) {
                if (nvme_pci_cmd_use_sgl(&iod->cmd))
-                       nvme_free_sgls(req, iod->descriptors[0],
-                                      &iod->cmd.common.dptr.sgl);
+                       nvme_free_sgls(req, &iod->cmd.common.dptr.sgl,
+                                      iod->descriptors[0]);
                else
                        nvme_free_prps(req);
        }
-- 
2.39.5
Re: [PATCH] nvme-pci: fix swapped arguments in SGL DMA unmap path
Posted by Alireza Haghdoost 2 months, 1 week ago
On Fri, Apr 10, 2026 at 3:29 PM Alireza Haghdoost <haghdoost@uber.com> wrote:
>
> The arguments to nvme_free_sgls() in nvme_unmap_data() are swapped for
> the multi-entry SGL case. The first argument (sge) should be the
> segment descriptor from the NVMe command's data pointer (type
> NVME_SGL_FMT_LAST_SEG_DESC), and the second argument (sg_list) should
> be the pool-allocated array of data descriptors.
>
> With the arguments swapped, sge points to the first data descriptor
> (type NVME_SGL_FMT_DATA_DESC). nvme_free_sgls() sees a data descriptor,
> unmaps only that single entry, and returns -- leaking the DMA mappings
> for all subsequent segments.
>
> This manifests as unbounded iommu_iova slab growth on ARM64 systems
> with 64K pages and IOMMU DMA translation, where IOVA coalescing is
> disabled due to the NVMe 4K page / IOMMU 64K page granularity
> mismatch. On x86, and on ARM64 with 4K pages, IOVA coalescing handles
> the unmap via dma_iova_destroy(), so the buggy path is never reached.
>
> Fixes: 7ce3c1dd78fc ("nvme-pci: convert the data mapping to blk_rq_dma_map")
> Signed-off-by: Alireza Haghdoost <haghdoost@uber.com>
> ---
>  drivers/nvme/host/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 28f638413e122..728999e4247d8 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -761,8 +761,8 @@ static void nvme_unmap_data(struct request *req)
>
>         if (!blk_rq_dma_unmap(req, dma_dev, &iod->dma_state, iod->total_len)) {
>                 if (nvme_pci_cmd_use_sgl(&iod->cmd))
> -                       nvme_free_sgls(req, iod->descriptors[0],
> -                                      &iod->cmd.common.dptr.sgl);
> +                       nvme_free_sgls(req, &iod->cmd.common.dptr.sgl,
> +                                      iod->descriptors[0]);
>                 else
>                         nvme_free_prps(req);
>         }
> --
> 2.39.5

Apologies, I wasn't aware that Roger Pau Monne had already submitted
this fix (commit a54afbc8a2138 "nvme-pci: DMA unmap the correct regions
in nvme_free_sgls"), which is already in 6.19.y. Please disregard this
patch.

For the record, we independently confirmed the bug on production ARM64
hosts (64K pages, IOMMU DMA-FQ), where it leaked ~490 GiB of iommu_iova
slab over 42 days. Setting sgl_threshold=0, which disables SGL use so
that requests take the PRP path, stopped the leak immediately.

Alireza