Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
when SWIOTLB bounce buffering becomes active during runtime.
The issue occurs when SWIOTLB activation changes the device's DMA
mapping requirements at runtime, creating a mismatch between
iod->dma_vecs allocation and access logic.
The problem manifests when:
1. Device initially operates with dma_skip_sync=true
(coherent DMA assumed)
2. First SWIOTLB mapping occurs due to DMA address limitations,
memory encryption, or IOMMU bounce buffering requirements
3. SWIOTLB calls dma_reset_need_sync(), permanently setting
dma_skip_sync=false
4. Subsequent I/Os now have dma_need_unmap()=true, requiring
iod->dma_vecs
The issue arises from the timing of allocation versus access:
- nvme_pci_setup_data_prp() allocates iod->dma_vecs only when both
(!dma_use_iova() && dma_need_unmap()) conditions are met
- nvme_pci_prp_iter_next() assumes iod->dma_vecs is valid whenever
the same conditions are true, without NULL checking
- This creates a race where the device's DMA requirements change
dynamically after the initial allocation decision, leading to NULL
pointer access
Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
pc : nvme_pci_prp_iter_next+0xe4/0x128 [nvme]
Call trace:
nvme_pci_prp_iter_next+0xe4/0x128 [nvme]
nvme_prep_rq+0x5f4/0xa6c [nvme]
nvme_queue_rqs+0xa8/0x18c [nvme]
blk_mq_dispatch_queue_requests.constprop.0+0x108/0x120
blk_mq_flush_plug_list+0x8c/0x174
__blk_flush_plug+0xe4/0x140
blk_finish_plug+0x38/0x4c
read_pages+0x184/0x288
page_cache_ra_order+0x1e0/0x3a4
filemap_fault+0x518/0xa90
__do_fault+0x3c/0x22c
__handle_mm_fault+0x10ec/0x19b8
handle_mm_fault+0xb4/0x294
Fix this by:
1. Initializing iod->dma_vecs to NULL in nvme_prep_rq()
2. Adding a NULL pointer check before accessing iod->dma_vecs in
nvme_pci_prp_iter_next()
3. Setting iod->dma_vecs to NULL after freeing, for defensive programming
Fixes: b8b7570a7ec8 ("nvme-pci: fix dma unmapping when using PRPs and not using the IOVA mapping")
Co-developed-by: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
Signed-off-by: Nitin Rawat <nitin.rawat@oss.qualcomm.com>
Signed-off-by: Pradeep P V K <pradeep.pragallapati@oss.qualcomm.com>
---
drivers/nvme/host/pci.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2a52cf46d960..e235654e7ee0 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -720,6 +720,7 @@ static void nvme_free_prps(struct request *req, unsigned int attrs)
dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
+ iod->dma_vecs = NULL;
}
static void nvme_free_sgls(struct request *req, struct nvme_sgl_desc *sge,
@@ -825,7 +826,7 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
return true;
if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
return false;
- if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
+ if (iod->dma_vecs && !dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
iod->nr_dma_vecs++;
@@ -1218,6 +1219,7 @@ static blk_status_t nvme_prep_rq(struct request *req)
iod->nr_descriptors = 0;
iod->total_len = 0;
iod->meta_total_len = 0;
+ iod->dma_vecs = NULL;
ret = nvme_setup_cmd(req->q->queuedata, req);
if (ret)
--
2.34.1
On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
> @@ -720,6 +720,7 @@ static void nvme_free_prps(struct request *req, unsigned int attrs)
> dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
> iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
> mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
> + iod->dma_vecs = NULL;
> }
>
> static void nvme_free_sgls(struct request *req, struct nvme_sgl_desc *sge,
> @@ -825,7 +826,7 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> return true;
> if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
> return false;
> - if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
> + if (iod->dma_vecs && !dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
So the return of dma_need_unmap() may change after any call to
dma_map_*? Does it only go from false -> true, and never back to false?
Since we didn't allocate the dma_vecs here, doesn't that mean the
completion side is leaking the mapping?
On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
> Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
> when SWIOTLB bounce buffering becomes active during runtime.
>
> The issue occurs when SWIOTLB activation changes the device's DMA
> mapping requirements at runtime,
>
> creating a mismatch between
> iod->dma_vecs allocation and access logic.
>
> The problem manifests when:
> 1. Device initially operates with dma_skip_sync=true
> (coherent DMA assumed)
> 2. First SWIOTLB mapping occurs due to DMA address limitations,
> memory encryption, or IOMMU bounce buffering requirements
> 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
> dma_skip_sync=false
> 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
> iod->dma_vecs

I think this patch just papers over the bug.  If dma_need_unmap
can't be trusted before the dma_map_* call, we've not saved
the unmap information and the unmap won't work properly.

So we'll need to extend the core code to tell if a mapping
will set dma_skip_sync=false before doing the mapping.
On Mon, Feb 02, 2026 at 03:35:48PM +0100, Christoph Hellwig wrote:
> On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
> > Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
> > when SWIOTLB bounce buffering becomes active during runtime.
> >
> > The issue occurs when SWIOTLB activation changes the device's DMA
> > mapping requirements at runtime,
> >
> > creating a mismatch between
> > iod->dma_vecs allocation and access logic.
> >
> > The problem manifests when:
> > 1. Device initially operates with dma_skip_sync=true
> > (coherent DMA assumed)
> > 2. First SWIOTLB mapping occurs due to DMA address limitations,
> > memory encryption, or IOMMU bounce buffering requirements
> > 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
> > dma_skip_sync=false
> > 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
> > iod->dma_vecs
>
> I think this patch just papers over the bug.
Agree
> If dma_need_unmap can't be trusted before the dma_map_* call, we've not saved
> the unmap information and the unmap won't work properly.
>
> So we'll need to extend the core code to tell if a mapping
> will set dma_skip_sync=false before doing the mapping.
There are two paths that lead to SWIOTLB in dma_direct_map_phys().
The first is is_swiotlb_force_bounce(dev), which dma_need_unmap() can
easily evaluate. The second is more problematic, as it depends on
dma_addr and size, neither of which is available in dma_need_unmap():
102 if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
103 dma_kmalloc_needs_bounce(dev, size, dir)) {
104 if (is_swiotlb_active(dev))
What about the following change?
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 37163eb49f9f..1510b93a8791 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -461,6 +461,8 @@ bool dma_need_unmap(struct device *dev)
{
if (!dma_map_direct(dev, get_dma_ops(dev)))
return true;
+ if (is_swiotlb_force_bounce(dev) || is_swiotlb_active(dev))
+ return true;
if (!dev->dma_skip_sync)
return true;
return IS_ENABLED(CONFIG_DMA_API_DEBUG);
On 2026-02-02 3:22 pm, Leon Romanovsky wrote:
> On Mon, Feb 02, 2026 at 03:35:48PM +0100, Christoph Hellwig wrote:
>> On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
>>> Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
>>> when SWIOTLB bounce buffering becomes active during runtime.
>>>
>>> The issue occurs when SWIOTLB activation changes the device's DMA
>>> mapping requirements at runtime,
>>>
>>> creating a mismatch between
>>> iod->dma_vecs allocation and access logic.
>>>
>>> The problem manifests when:
>>> 1. Device initially operates with dma_skip_sync=true
>>> (coherent DMA assumed)
>>> 2. First SWIOTLB mapping occurs due to DMA address limitations,
>>> memory encryption, or IOMMU bounce buffering requirements
>>> 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
>>> dma_skip_sync=false
>>> 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
>>> iod->dma_vecs
>>
>> I think this patch just papers over the bug.
>
> Agree
>
>> If dma_need_unmap can't be trusted before the dma_map_* call, we've not saved
>> the unmap information and the unmap won't work properly.
>>
>> So we'll need to extend the core code to tell if a mapping
>> will set dma_skip_sync=false before doing the mapping.
>
> There are two paths that lead to SWIOTLB in dma_direct_map_phys().
> The first is is_swiotlb_force_bounce(dev), which dma_need_unmap() can
> easily evaluate. The second is more problematic, as it depends on
> dma_addr and size, neither of which is available in dma_need_unmap():
>
> 102 if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
> 103 dma_kmalloc_needs_bounce(dev, size, dir)) {
> 104 if (is_swiotlb_active(dev))
>
> What about the following change?
>
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 37163eb49f9f..1510b93a8791 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -461,6 +461,8 @@ bool dma_need_unmap(struct device *dev)
> {
> if (!dma_map_direct(dev, get_dma_ops(dev)))
> return true;
> + if (is_swiotlb_force_bounce(dev) || is_swiotlb_active(dev))
> + return true;
This will always be true if a default SWIOTLB buffer exists at all, and
thus pretty much defeat the point of whatever optimisation the caller is
trying to make.
Thanks,
Robin.
> if (!dev->dma_skip_sync)
> return true;
> return IS_ENABLED(CONFIG_DMA_API_DEBUG);
>
On 2026-02-02 2:35 pm, Christoph Hellwig wrote:
> On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
>> Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
>> when SWIOTLB bounce buffering becomes active during runtime.
>>
>> The issue occurs when SWIOTLB activation changes the device's DMA
>> mapping requirements at runtime,
>>
>> creating a mismatch between
>> iod->dma_vecs allocation and access logic.
>>
>> The problem manifests when:
>> 1. Device initially operates with dma_skip_sync=true
>> (coherent DMA assumed)
>> 2. First SWIOTLB mapping occurs due to DMA address limitations,
>> memory encryption, or IOMMU bounce buffering requirements
>> 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
>> dma_skip_sync=false
>> 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
>> iod->dma_vecs
>
> I think this patch just papers over the bug.  If dma_need_unmap
> can't be trusted before the dma_map_* call, we've not saved
> the unmap information and the unmap won't work properly.

The dma_need_unmap() kerneldoc says:

"This function must be called after all mappings that might
need to be unmapped have been performed."

Trying to infer anything from it beforehand is definitely a bug in the
caller.

> So we'll need to extend the core code to tell if a mapping
> will set dma_skip_sync=false before doing the mapping.

I don't see that being possible - at best we could reasonably infer that
a fully-coherent system with no sync ops, no SWIOTLB and no DMA_DEBUG
shouldn't ever set it to true, but as for the other way round, by the
time you've run through all the SWIOTLB logic to guess whether a
particular mapping would be bounced or not, you've basically performed
the mapping anyway. Thus at best, such an API to potentially do a whole
dry-run mapping before every actual mapping would seem like a pretty
pointless anti-optimisation.

Thanks,
Robin.
On Mon, Feb 02, 2026 at 03:16:50PM +0000, Robin Murphy wrote:
> On 2026-02-02 2:35 pm, Christoph Hellwig wrote:
> > On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
> > > Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
> > > when SWIOTLB bounce buffering becomes active during runtime.
> > >
> > > The issue occurs when SWIOTLB activation changes the device's DMA
> > > mapping requirements at runtime,
> > >
> > > creating a mismatch between
> > > iod->dma_vecs allocation and access logic.
> > >
> > > The problem manifests when:
> > > 1. Device initially operates with dma_skip_sync=true
> > > (coherent DMA assumed)
> > > 2. First SWIOTLB mapping occurs due to DMA address limitations,
> > > memory encryption, or IOMMU bounce buffering requirements
> > > 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
> > > dma_skip_sync=false
> > > 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
> > > iod->dma_vecs
> >
> > I think this patch just papers over the bug.  If dma_need_unmap
> > can't be trusted before the dma_map_* call, we've not saved
> > the unmap information and the unmap won't work properly.
>
> The dma_need_unmap() kerneldoc says:
>
> "This function must be called after all mappings that might
> need to be unmapped have been performed."
>
> Trying to infer anything from it beforehand is definitely a bug in the
> caller.

Well that doesn't really make sense. No matter how many mappings the
driver has done, there will always be more. ?
On 2026-02-02 5:13 pm, Keith Busch wrote:
> On Mon, Feb 02, 2026 at 03:16:50PM +0000, Robin Murphy wrote:
>> On 2026-02-02 2:35 pm, Christoph Hellwig wrote:
>>> On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
>>>> Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
>>>> when SWIOTLB bounce buffering becomes active during runtime.
>>>>
>>>> The issue occurs when SWIOTLB activation changes the device's DMA
>>>> mapping requirements at runtime,
>>>>
>>>> creating a mismatch between
>>>> iod->dma_vecs allocation and access logic.
>>>>
>>>> The problem manifests when:
>>>> 1. Device initially operates with dma_skip_sync=true
>>>> (coherent DMA assumed)
>>>> 2. First SWIOTLB mapping occurs due to DMA address limitations,
>>>> memory encryption, or IOMMU bounce buffering requirements
>>>> 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
>>>> dma_skip_sync=false
>>>> 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
>>>> iod->dma_vecs
>>>
>>> I think this patch just papers over the bug.  If dma_need_unmap
>>> can't be trusted before the dma_map_* call, we've not saved
>>> the unmap information and the unmap won't work properly.
>>
>> The dma_need_unmap() kerneldoc says:
>>
>> "This function must be called after all mappings that might
>> need to be unmapped have been performed."
>>
>> Trying to infer anything from it beforehand is definitely a bug in the
>> caller.
>
> Well that doesn't really make sense. No matter how many mappings the
> driver has done, there will always be more. ?

But equally the fact that none of the mappings made so far happened to
not need bouncing still doesn't mean that future ones won't. This is not
guaranteed to be a static property of the device, but nor is it really a
property of the *device* at all; it's a property of a set of one or more
DMA mappings with the same lifetime, there's just no suitable generic
notion of that temporal context in the DMA API to carry around and pass
as an explicit argument, so it's left implicit in the usage model.

Whatever higher-level thing it's doing, the driver must have some
context, so within "operation A" it makes some DMA mappings, checks
dma_need_unmap() and sees it's false, so can conclude that "operation A"
does not need to preserve DMA unmap state. However it may then start
"operation B", do some more mappings, check dma_need_unmap() and see
it's now returned true, so "operation B" *does* need to keep the DMA
data and explicitly unmap it when it finishes.

This is essentially the point I made at the time about it not
necessarily being as useful a thing as it seems, since if an "operation"
involves multiple mappings, it must still store the full state of those
mappings for at least long enough to finish them all and then call
dma_need_unmap(), to only then see if it might be OK to throw that state
away again.

Thanks,
Robin.
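As a rough, purely hypothetical illustration of the usage model Robin
describes above (no such driver exists; the struct and function names
are made up): all mappings belonging to one operation are made first,
and only then is dma_need_unmap() consulted to decide whether the saved
addresses must be kept around for unmap at completion.

#include <linux/dma-mapping.h>

/* Illustrative only; error unwinding of earlier mappings is omitted. */
struct my_op {
	struct page	*pages[8];
	dma_addr_t	addrs[8];
	int		nents;
	bool		needs_unmap;
};

static int my_op_map(struct device *dev, struct my_op *op)
{
	int i;

	for (i = 0; i < op->nents; i++) {
		op->addrs[i] = dma_map_page(dev, op->pages[i], 0, PAGE_SIZE,
					    DMA_TO_DEVICE);
		if (dma_mapping_error(dev, op->addrs[i]))
			return -ENOMEM;
	}

	/*
	 * Only meaningful now, after all mappings for this operation have
	 * been made; if false, op->addrs[] can be thrown away and the
	 * completion path may skip the dma_unmap_page() calls entirely.
	 */
	op->needs_unmap = dma_need_unmap(dev);
	return 0;
}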
On Mon, Feb 02, 2026 at 10:13:24AM -0700, Keith Busch wrote:
> > "This function must be called after all mappings that might
> > need to be unmapped have been performed."
> >
> > Trying to infer anything from it beforehand is definitely a bug in the
> > caller.
>
> Well that doesn't really make sense. No matter how many mappings the
> driver has done, there will always be more. ?
Yeah.  It's more like: if this returns true, so will all future calls,
plus the previous one (which might have caused this).  For that something
like the patch below should work in nvme. Totally untested as I'm
about to head away from the desk and prepare dinner.
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2a52cf46d960..f944b747e1bd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -816,6 +816,22 @@ static void nvme_unmap_data(struct request *req)
nvme_free_descriptors(req);
}
+static bool nvme_pci_alloc_dma_vecs(struct request *req,
+ struct blk_dma_iter *iter)
+{
+ struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+ struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+
+ iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
+ GFP_ATOMIC);
+ if (!iod->dma_vecs)
+ return false;
+ iod->dma_vecs[0].addr = iter->addr;
+ iod->dma_vecs[0].len = iter->len;
+ iod->nr_dma_vecs = 1;
+ return true;
+}
+
static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter)
{
@@ -826,6 +842,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
return false;
if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
+ if (!iod->nr_dma_vecs && !nvme_pci_alloc_dma_vecs(req, iter))
+ return false;
iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
iod->nr_dma_vecs++;
@@ -844,13 +862,8 @@ static blk_status_t nvme_pci_setup_data_prp(struct request *req,
__le64 *prp_list;
if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev)) {
- iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
- GFP_ATOMIC);
- if (!iod->dma_vecs)
+ if (!nvme_pci_alloc_dma_vecs(req, iter))
return BLK_STS_RESOURCE;
- iod->dma_vecs[0].addr = iter->addr;
- iod->dma_vecs[0].len = iter->len;
- iod->nr_dma_vecs = 1;
}
/*
On Mon, Feb 02, 2026 at 06:36:24PM +0100, Christoph Hellwig wrote:
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 2a52cf46d960..f944b747e1bd 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -816,6 +816,22 @@ static void nvme_unmap_data(struct request *req)
> nvme_free_descriptors(req);
> }
>
> +static bool nvme_pci_alloc_dma_vecs(struct request *req,
> + struct blk_dma_iter *iter)
> +{
> + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> + struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
> +
> + iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
> + GFP_ATOMIC);
> + if (!iod->dma_vecs)
> + return false;
> + iod->dma_vecs[0].addr = iter->addr;
> + iod->dma_vecs[0].len = iter->len;
> + iod->nr_dma_vecs = 1;
> + return true;
> +}
> +
> static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> struct blk_dma_iter *iter)
> {
> @@ -826,6 +842,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
> return false;
> if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
> + if (!iod->nr_dma_vecs && !nvme_pci_alloc_dma_vecs(req, iter))
> + return false;
In the case where this iteration caused dma_need_unmap() to toggle to
true, this is the iteration that allocates the dma_vecs, and it
initializes the first entry to this iter. But the next lines proceed to
save this iter in the next index, so it's doubly accounted for and
will get unmapped twice in the completion.
Also, if the allocation fails, we should set iter->status to
BLK_STS_RESOURCE so the callers know why the iteration can't continue.
Otherwise, the caller will think the request is badly formed if you
return false from here without setting iter->status.
Here's my quick take. Boot tested with swiotlb enabled, but haven't
tried to test the changing dma_need_unmap() scenario.
---
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9fc4a60280a07..233378faab9bd 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -816,6 +816,28 @@ static void nvme_unmap_data(struct request *req)
nvme_free_descriptors(req);
}
+static bool nvme_pci_prp_save_mapping(struct blk_dma_iter *iter,
+ struct request *req)
+{
+ struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
+
+ if (!iod->dma_vecs) {
+ struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+
+ iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
+ GFP_ATOMIC);
+ if (!iod->dma_vecs) {
+ iter->status = BLK_STS_RESOURCE;
+ return false;
+ }
+ }
+
+ iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
+ iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
+ iod->nr_dma_vecs++;
+ return true;
+}
+
static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter)
{
@@ -825,11 +847,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
return true;
if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
return false;
- if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
- iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
- iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
- iod->nr_dma_vecs++;
- }
+ if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev))
+ return nvme_pci_prp_save_mapping(iter, req);
return true;
}
@@ -843,15 +862,9 @@ static blk_status_t nvme_pci_setup_data_prp(struct request *req,
unsigned int prp_len, i;
__le64 *prp_list;
- if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev)) {
- iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
- GFP_ATOMIC);
- if (!iod->dma_vecs)
- return BLK_STS_RESOURCE;
- iod->dma_vecs[0].addr = iter->addr;
- iod->dma_vecs[0].len = iter->len;
- iod->nr_dma_vecs = 1;
- }
+ if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev))
+ if (!nvme_pci_prp_save_mapping(iter, req))
+ return iter->status;
/*
* PRP1 always points to the start of the DMA transfers.
@@ -1218,6 +1231,8 @@ static blk_status_t nvme_prep_rq(struct request *req)
iod->nr_descriptors = 0;
iod->total_len = 0;
iod->meta_total_len = 0;
+ iod->nr_dma_vecs = 0;
+ iod->dma_vecs = NULL;
ret = nvme_setup_cmd(req->q->queuedata, req);
if (ret)
--
On Mon, Feb 02, 2026 at 11:59:04AM -0700, Keith Busch wrote:
> On Mon, Feb 02, 2026 at 06:36:24PM +0100, Christoph Hellwig wrote:
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index 2a52cf46d960..f944b747e1bd 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -816,6 +816,22 @@ static void nvme_unmap_data(struct request *req)
> > nvme_free_descriptors(req);
> > }
> >
> > +static bool nvme_pci_alloc_dma_vecs(struct request *req,
> > + struct blk_dma_iter *iter)
> > +{
> > + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> > + struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
> > +
> > + iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
> > + GFP_ATOMIC);
> > + if (!iod->dma_vecs)
> > + return false;
> > + iod->dma_vecs[0].addr = iter->addr;
> > + iod->dma_vecs[0].len = iter->len;
> > + iod->nr_dma_vecs = 1;
> > + return true;
> > +}
> > +
> > static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> > struct blk_dma_iter *iter)
> > {
> > @@ -826,6 +842,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> > if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
> > return false;
> > if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
> > + if (!iod->nr_dma_vecs && !nvme_pci_alloc_dma_vecs(req, iter))
> > + return false;
>
> In the case where this iteration caused dma_need_unmap() to toggle to
> true, this is the iteration that allocates the dma_vecs, and it
> initializes the first entry to this iter. But the next lines proceed to
> the save this iter in the next index, so it's doubly accounted for and
> will get unmapped twice in the completion.
>
> Also, if the allocation fails, we should set iter->status to
> BLK_STS_RESOURCE so the callers know why the iteration can't continue.
> Otherwise, the caller will think the request is badly formed if you
> return false from here without setting iter->status.
>
> Here's my quick take. Boot tested with swiotlb enabled, but haven't
> tried to test the changing dma_need_unmap() scenario.
> ---
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 9fc4a60280a07..233378faab9bd 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -816,6 +816,28 @@ static void nvme_unmap_data(struct request *req)
> nvme_free_descriptors(req);
> }
>
> +static bool nvme_pci_prp_save_mapping(struct blk_dma_iter *iter,
> + struct request *req)
> +{
> + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
> +
> + if (!iod->dma_vecs) {
> + struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
> +
> + iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
> + GFP_ATOMIC);
> + if (!iod->dma_vecs) {
> + iter->status = BLK_STS_RESOURCE;
> + return false;
> + }
> + }
> +
> + iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
> + iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
> + iod->nr_dma_vecs++;
> + return true;
> +}
> +
> static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> struct blk_dma_iter *iter)
> {
> @@ -825,11 +847,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
> return true;
> if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
> return false;
> - if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
> - iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
> - iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
> - iod->nr_dma_vecs++;
> - }
> + if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev))
Can dev->dma_skip_sync be modified in parallel with this check?
If so, dma_need_unmap() may return different results depending on the
time at which it is invoked.
> + return nvme_pci_prp_save_mapping(iter, req);
Thanks
On 2026-02-03 9:42 am, Leon Romanovsky wrote:
> On Mon, Feb 02, 2026 at 11:59:04AM -0700, Keith Busch wrote:
>> On Mon, Feb 02, 2026 at 06:36:24PM +0100, Christoph Hellwig wrote:
>>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>>> index 2a52cf46d960..f944b747e1bd 100644
>>> --- a/drivers/nvme/host/pci.c
>>> +++ b/drivers/nvme/host/pci.c
>>> @@ -816,6 +816,22 @@ static void nvme_unmap_data(struct request *req)
>>> nvme_free_descriptors(req);
>>> }
>>>
>>> +static bool nvme_pci_alloc_dma_vecs(struct request *req,
>>> + struct blk_dma_iter *iter)
>>> +{
>>> + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>>> + struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
>>> +
>>> + iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
>>> + GFP_ATOMIC);
>>> + if (!iod->dma_vecs)
>>> + return false;
>>> + iod->dma_vecs[0].addr = iter->addr;
>>> + iod->dma_vecs[0].len = iter->len;
>>> + iod->nr_dma_vecs = 1;
>>> + return true;
>>> +}
>>> +
>>> static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
>>> struct blk_dma_iter *iter)
>>> {
>>> @@ -826,6 +842,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
>>> if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
>>> return false;
>>> if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
>>> + if (!iod->nr_dma_vecs && !nvme_pci_alloc_dma_vecs(req, iter))
>>> + return false;
>>
>> In the case where this iteration caused dma_need_unmap() to toggle to
>> true, this is the iteration that allocates the dma_vecs, and it
>> initializes the first entry to this iter. But the next lines proceed to
>> the save this iter in the next index, so it's doubly accounted for and
>> will get unmapped twice in the completion.
>>
>> Also, if the allocation fails, we should set iter->status to
>> BLK_STS_RESOURCE so the callers know why the iteration can't continue.
>> Otherwise, the caller will think the request is badly formed if you
>> return false from here without setting iter->status.
>>
>> Here's my quick take. Boot tested with swiotlb enabled, but haven't
>> tried to test the changing dma_need_unmap() scenario.
>> ---
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 9fc4a60280a07..233378faab9bd 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -816,6 +816,28 @@ static void nvme_unmap_data(struct request *req)
>> nvme_free_descriptors(req);
>> }
>>
>> +static bool nvme_pci_prp_save_mapping(struct blk_dma_iter *iter,
>> + struct request *req)
>> +{
>> + struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
>> +
>> + if (!iod->dma_vecs) {
>> + struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
>> +
>> + iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
>> + GFP_ATOMIC);
>> + if (!iod->dma_vecs) {
>> + iter->status = BLK_STS_RESOURCE;
>> + return false;
>> + }
>> + }
>> +
>> + iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
>> + iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
>> + iod->nr_dma_vecs++;
>> + return true;
>> +}
>> +
>> static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
>> struct blk_dma_iter *iter)
>> {
>> @@ -825,11 +847,8 @@ static bool nvme_pci_prp_iter_next(struct request *req, struct device *dma_dev,
>> return true;
>> if (!blk_rq_dma_map_iter_next(req, dma_dev, iter))
>> return false;
>> - if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev)) {
>> - iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
>> - iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
>> - iod->nr_dma_vecs++;
>> - }
>> + if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev))
>
> Can dev->dma_skip_sync be modified in parallel with this check?
> If so, dma_need_unmap() may return different results depending on the
> time at which it is invoked.
It can if another thread is making mappings in parallel, however as
things currently stand that would only lead to the current thread
thinking it must save the unmap state for the mappings it's already made
even if it technically didn't need to.
In principle it could also change back the other way if another thread
reset the device's DMA mask, but doing that with active mappings would
fundamentally break things in regard to the dma_skip_sync mechanism anyway.
Thanks,
Robin.
>
>> + return nvme_pci_prp_save_mapping(iter, req);
>
> Thanks
On Tue, Feb 03, 2026 at 01:50:12PM +0000, Robin Murphy wrote:
> > Can dev->dma_skip_sync be modified in parallel with this check?
> > If so, dma_need_unmap() may return different results depending on the
> > time at which it is invoked.
>
> It can if another thread is making mappings in parallel, however as things
> currently stand that would only lead to the current thread thinking it must
> save the unmap state for the mappings it's already made even if it
> technically didn't need to.
>
> In principle it could also change back the other way if another thread reset
> the device's DMA mask, but doing that with active mappings would
> fundamentally break things in regard to the dma_skip_sync mechanism anyway.

We can handle a change in dma_need_unmap() from false -> true. Worst
case scenario is we save off a descriptor when we didn't really need to,
but it's just a small memory use for the lifetime of the IO.

We don't correctly handle a transition from true -> false, though. We'd
currently leak memory if that happened. It sounds like that transition
is broken for other reasons too, so I won't bother trying to handle it.
On Mon, Feb 02, 2026 at 11:59:04AM -0700, Keith Busch wrote:
> In the case where this iteration caused dma_need_unmap() to toggle to
> true, this is the iteration that allocates the dma_vecs, and it
> initializes the first entry to this iter. But the next lines proceed to
> the save this iter in the next index, so it's doubly accounted for and
> will get unmapped twice in the completion.

Yeah.

> Also, if the allocation fails, we should set iter->status to
> BLK_STS_RESOURCE so the callers know why the iteration can't continue.
> Otherwise, the caller will think the request is badly formed if you
> return false from here without setting iter->status.
>
> Here's my quick take. Boot tested with swiotlb enabled, but haven't
> tried to test the changing dma_need_unmap() scenario.

Looks much better.  Cosmetic nits below.

Pradeep, can you test this with your setup?

> +	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev))
> +		return nvme_pci_prp_save_mapping(iter, req);

> +	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev))
> +		if (!nvme_pci_prp_save_mapping(iter, req))
> +			return iter->status;

I'd move the dma_use_iova / dma_need_unmap checks into
nvme_pci_prp_save_mapping to simplify this a bit more.

>
>  	/*
>  	 * PRP1 always points to the start of the DMA transfers.
> @@ -1218,6 +1231,8 @@ static blk_status_t nvme_prep_rq(struct request *req)
>  	iod->nr_descriptors = 0;
>  	iod->total_len = 0;
>  	iod->meta_total_len = 0;
> +	iod->nr_dma_vecs = 0;
> +	iod->dma_vecs = NULL;

I don't think we need the dma_vecs initialization here, as everything
is keyed off nr_dma_vecs.
On 2/3/2026 10:57 AM, Christoph Hellwig wrote:
> On Mon, Feb 02, 2026 at 11:59:04AM -0700, Keith Busch wrote:
>> In the case where this iteration caused dma_need_unmap() to toggle to
>> true, this is the iteration that allocates the dma_vecs, and it
>> initializes the first entry to this iter. But the next lines proceed to
>> the save this iter in the next index, so it's doubly accounted for and
>> will get unmapped twice in the completion.
>
> Yeah.
>
>> Also, if the allocation fails, we should set iter->status to
>> BLK_STS_RESOURCE so the callers know why the iteration can't continue.
>> Otherwise, the caller will think the request is badly formed if you
>> return false from here without setting iter->status.
>>
>> Here's my quick take. Boot tested with swiotlb enabled, but haven't
>> tried to test the changing dma_need_unmap() scenario.
>
> Looks much better.  Cosmetic nits below.
>
> Pradeep, can you test this with your setup?

Sure, testing has started, and I will share the findings soon.
Also, I did not pick up the initialization of dma_vecs during testing.

>
>> +	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev))
>> +		return nvme_pci_prp_save_mapping(iter, req);
>
>> +	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev))
>> +		if (!nvme_pci_prp_save_mapping(iter, req))
>> +			return iter->status;
>
> I'd move the dma_use_iova / dma_need_unmap checks into
> nvme_pci_prp_save_mapping to simplify this a bit more.
>
>>
>>  	/*
>>  	 * PRP1 always points to the start of the DMA transfers.
>> @@ -1218,6 +1231,8 @@ static blk_status_t nvme_prep_rq(struct request *req)
>>  	iod->nr_descriptors = 0;
>>  	iod->total_len = 0;
>>  	iod->meta_total_len = 0;
>> +	iod->nr_dma_vecs = 0;
>> +	iod->dma_vecs = NULL;
>
> I don't think we need the dma_vecs initialization here, as everything
> is keyed off nr_dma_vecs.
On 2/3/2026 7:35 PM, Pradeep Pragallapati wrote:
>
>
> On 2/3/2026 10:57 AM, Christoph Hellwig wrote:
>> On Mon, Feb 02, 2026 at 11:59:04AM -0700, Keith Busch wrote:
>>> In the case where this iteration caused dma_need_unmap() to toggle to
>>> true, this is the iteration that allocates the dma_vecs, and it
>>> initializes the first entry to this iter. But the next lines proceed to
>>> the save this iter in the next index, so it's doubly accounted for and
>>> will get unmapped twice in the completion.
>>
>> Yeah.
>>
>>> Also, if the allocation fails, we should set iter->status to
>>> BLK_STS_RESOURCE so the callers know why the iteration can't continue.
>>> Otherwise, the caller will think the request is badly formed if you
>>> return false from here without setting iter->status.
>>>
>>> Here's my quick take. Boot tested with swiotlb enabled, but haven't
>>> tried to test the changing dma_need_unmap() scenario.
>>
>> Looks much better.  Cosmetic nits below.
>>
>> Pradeep, can you test this with your setup?
> Sure, testing has started, and I will share the findings soon.
> Also, I did not pick up the initialization of dma_vecs during testing.

I ran testing for over 20 hours and did not observe the issue on my setup.
It appears to be helping.

>
>>
>>> +	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(dma_dev))
>>> +		return nvme_pci_prp_save_mapping(iter, req);
>>
>>> +	if (!dma_use_iova(&iod->dma_state) && dma_need_unmap(nvmeq->dev->dev))
>>> +		if (!nvme_pci_prp_save_mapping(iter, req))
>>> +			return iter->status;
>>
>> I'd move the dma_use_iova / dma_need_unmap checks into
>> nvme_pci_prp_save_mapping to simplify this a bit more.
>>
>>>  	/*
>>>  	 * PRP1 always points to the start of the DMA transfers.
>>> @@ -1218,6 +1231,8 @@ static blk_status_t nvme_prep_rq(struct request *req)
>>>  	iod->nr_descriptors = 0;
>>>  	iod->total_len = 0;
>>>  	iod->meta_total_len = 0;
>>> +	iod->nr_dma_vecs = 0;
>>> +	iod->dma_vecs = NULL;
>>
>> I don't think we need the dma_vecs initialization here, as everything
>> is keyed off nr_dma_vecs.
>
On Wed, Feb 04, 2026 at 07:34:42PM +0530, Pradeep Pragallapati wrote:
> I ran testing for over 20 hours and did not observe the issue on my setup.
> It appears to be helping.

Thanks for the test! I'll put a real patch together with Christoph's
suggestions folded in.
On Tue, Feb 03, 2026 at 06:27:56AM +0100, Christoph Hellwig wrote:
> >  	iod->nr_descriptors = 0;
> >  	iod->total_len = 0;
> >  	iod->meta_total_len = 0;
> > +	iod->nr_dma_vecs = 0;
> > +	iod->dma_vecs = NULL;
>
> I don't think we need the dma_vecs initialization here, as everything
> is keyed off nr_dma_vecs.

Yes, we should definitely use nr_dma_vecs and skip the NULL setting. I'm
a big fan of removing unnecessary initialisations. Just a caution, my
suggested patch has this check:

	if (!iod->dma_vecs)

So we just need to update it to use 'iod->nr_dma_vecs' instead, which
would have been correct, too.
On Mon, Feb 02, 2026 at 11:14:23PM -0700, Keith Busch wrote:
> On Tue, Feb 03, 2026 at 06:27:56AM +0100, Christoph Hellwig wrote:
> > >  	iod->nr_descriptors = 0;
> > >  	iod->total_len = 0;
> > >  	iod->meta_total_len = 0;
> > > +	iod->nr_dma_vecs = 0;
> > > +	iod->dma_vecs = NULL;
> >
> > I don't think we need the dma_vecs initialization here, as everything
> > is keyed off nr_dma_vecs.
>
> Yes, we should definitely use nr_dma_vecs and skip the NULL setting. I'm
> a big fan removing unnecessary initialisations. Just a caution, my
> suggested patch has this check:
>
> 	if (!iod->dma_vecs)
>
> So we just need to update it to use 'iod->nr_dma_vecs' instead, which
> would have been correct, too.

Ah, right.
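For illustration only, an untested sketch of how the helper might look
with the comments above folded in (the dma_use_iova()/dma_need_unmap()
checks moved inside, and the allocation keyed off nr_dma_vecs rather
than a NULL dma_vecs pointer). This is an assumed shape of the follow-up,
not the committed patch:

static bool nvme_pci_prp_save_mapping(struct blk_dma_iter *iter,
		struct request *req)
{
	struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
	struct nvme_queue *nvmeq = req->mq_hctx->driver_data;

	/* Nothing to save on the IOVA path or when no unmap is needed. */
	if (dma_use_iova(&iod->dma_state) || !dma_need_unmap(nvmeq->dev->dev))
		return true;

	/* First vector for this request: allocate the array lazily. */
	if (!iod->nr_dma_vecs) {
		iod->dma_vecs = mempool_alloc(nvmeq->dev->dmavec_mempool,
					      GFP_ATOMIC);
		if (!iod->dma_vecs) {
			iter->status = BLK_STS_RESOURCE;
			return false;
		}
	}

	iod->dma_vecs[iod->nr_dma_vecs].addr = iter->addr;
	iod->dma_vecs[iod->nr_dma_vecs].len = iter->len;
	iod->nr_dma_vecs++;
	return true;
}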
On Mon, Feb 02, 2026 at 03:16:50PM +0000, Robin Murphy wrote:
> On 2026-02-02 2:35 pm, Christoph Hellwig wrote:
> > On Mon, Feb 02, 2026 at 06:27:38PM +0530, Pradeep P V K wrote:
> > > Fix a NULL pointer dereference that occurs in nvme_pci_prp_iter_next()
> > > when SWIOTLB bounce buffering becomes active during runtime.
> > >
> > > The issue occurs when SWIOTLB activation changes the device's DMA
> > > mapping requirements at runtime,
> > >
> > > creating a mismatch between
> > > iod->dma_vecs allocation and access logic.
> > >
> > > The problem manifests when:
> > > 1. Device initially operates with dma_skip_sync=true
> > > (coherent DMA assumed)
> > > 2. First SWIOTLB mapping occurs due to DMA address limitations,
> > > memory encryption, or IOMMU bounce buffering requirements
> > > 3. SWIOTLB calls dma_reset_need_sync(), permanently setting
> > > dma_skip_sync=false
> > > 4. Subsequent I/Os now have dma_need_unmap()=true, requiring
> > > iod->dma_vecs
> >
> > I think this patch just papers over the bug.  If dma_need_unmap
> > can't be trusted before the dma_map_* call, we've not saved
> > the unmap information and the unmap won't work properly.
>
> The dma_need_unmap() kerneldoc says:
>
> "This function must be called after all mappings that might
> need to be unmapped have been performed."
>
> Trying to infer anything from it beforehand is definitely a bug in the
> caller.

At least for HMM, dma_need_unmap() works as expected. HMM doesn't work
with SWIOTLB.

Thanks