From: Hou Tao <houtao1@huawei.com>

When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
forever when trying to remove the PCIe device.

Fix it by adding the missed percpu_ref_put().

Fixes: 7e9c7ef83d78 ("PCI/P2PDMA: Allow userspace VMA allocations through sysfs")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 drivers/pci/p2pdma.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index 4a2fc7ab42c3..218c1f5252b6 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -152,6 +152,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
 	ret = vm_insert_page(vma, vaddr, page);
 	if (ret) {
 		gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
+		percpu_ref_put(ref);
 		return ret;
 	}
 	percpu_ref_get(ref);
--
2.29.2
On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> From: Hou Tao <houtao1@huawei.com>
>
> When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> forever when trying to remove the PCIe device.
>
> Fix it by adding the missed percpu_ref_put().
This pairs with the percpu_ref_tryget_live_rcu() above, right? Might be worth
mentioning that as a comment, but overall looks good to me so feel free to add:

Reviewed-by: Alistair Popple <apopple@nvidia.com>
>
> Fixes: 7e9c7ef83d78 ("PCI/P2PDMA: Allow userspace VMA allocations through sysfs")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
> drivers/pci/p2pdma.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> index 4a2fc7ab42c3..218c1f5252b6 100644
> --- a/drivers/pci/p2pdma.c
> +++ b/drivers/pci/p2pdma.c
> @@ -152,6 +152,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> ret = vm_insert_page(vma, vaddr, page);
> if (ret) {
> gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> + percpu_ref_put(ref);
> return ret;
> }
> percpu_ref_get(ref);
> --
> 2.29.2
>
On Thu, Jan 08, 2026 at 02:23:16PM +1100, Alistair Popple wrote:
> On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> > From: Hou Tao <houtao1@huawei.com>
> >
> > When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> > doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> > acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> > forever when trying to remove the PCIe device.
> >
> > Fix it by adding the missed percpu_ref_put().
>
> This pairs with the percpu_ref_tryget_live_rcu() above right? Might
> be worth mentioning that as a comment, but overall looks good to me
> so feel free to add:
>
> Reviewed-by: Alistair Popple <apopple@nvidia.com>
Added your Reviewed-by, thanks!

Would the following commit log address your suggestion?

  When the vm_insert_page() in p2pmem_alloc_mmap() failed, we did not
  invoke percpu_ref_put() to free the per-CPU pgmap ref acquired by
  percpu_ref_tryget_live_rcu(), which meant that PCI device removal would
  hang forever in memunmap_pages().

  Fix it by adding the missed percpu_ref_put().

Looking at this again, I'm confused about why in the normal, non-error
case, we do the percpu_ref_tryget_live_rcu(ref), followed by another
percpu_ref_get(ref) for each page, followed by just a single
percpu_ref_put() at the exit.

So we do ref_get() "1 + number of pages" times but we only do a single
ref_put(). Is there a loop of ref_put() for each page elsewhere?
> > Fixes: 7e9c7ef83d78 ("PCI/P2PDMA: Allow userspace VMA allocations through sysfs")
> > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > ---
> > drivers/pci/p2pdma.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> > index 4a2fc7ab42c3..218c1f5252b6 100644
> > --- a/drivers/pci/p2pdma.c
> > +++ b/drivers/pci/p2pdma.c
> > @@ -152,6 +152,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> > ret = vm_insert_page(vma, vaddr, page);
> > if (ret) {
> > gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> > + percpu_ref_put(ref);
> > return ret;
> > }
> > percpu_ref_get(ref);
> > --
> > 2.29.2
> >
On 2026-01-09 at 02:55 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> On Thu, Jan 08, 2026 at 02:23:16PM +1100, Alistair Popple wrote:
> > On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> > > From: Hou Tao <houtao1@huawei.com>
> > >
> > > When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> > > doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> > > acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> > > forever when trying to remove the PCIe device.
> > >
> > > Fix it by adding the missed percpu_ref_put().
> >
> > This pairs with the percpu_ref_tryget_live_rcu() above right? Might
> > be worth mentioning that as a comment, but overall looks good to me
> > so feel free to add:
> >
> > Reviewed-by: Alistair Popple <apopple@nvidia.com>
>
> Added your Reviewed-by, thanks!
>
> Would the following commit log address your suggestion?
>
> When the vm_insert_page() in p2pmem_alloc_mmap() failed, we did not
> invoke percpu_ref_put() to free the per-CPU pgmap ref acquired by
> percpu_ref_tryget_live_rcu(), which meant that PCI device removal would
> hang forever in memunmap_pages().
>
> Fix it by adding the missed percpu_ref_put().
Yes, that looks perfect. Thanks.
> Looking at this again, I'm confused about why in the normal, non-error
> case, we do the percpu_ref_tryget_live_rcu(ref), followed by another
> percpu_ref_get(ref) for each page, followed by just a single
> percpu_ref_put() at the exit.
>
> So we do ref_get() "1 + number of pages" times but we only do a single
> ref_put(). Is there a loop of ref_put() for each page elsewhere?
Right, the per-page ref_put() happens when the page is freed (i.e. when the
struct page refcount drops to zero) - in this case free_zone_device_folio()
will call p2pdma_folio_free(), which has the corresponding percpu_ref_put().

It would be nice to harmonize the pgmap refcounting across all ZONE_DEVICE
users. For example, for MEMORY_DEVICE_PRIVATE/COHERENT pages we could drop the
reference in the generic free_zone_device_folio() rather than in the specific
free callback. Although the whole thing is actually a bit redundant now and I
have debated removing it entirely - it really just serves as an optimised way
to do a sanity check that no pages are in use when memunmap_pages() is called.
The alternative would be just to check the refcount of every page.
> > > Fixes: 7e9c7ef83d78 ("PCI/P2PDMA: Allow userspace VMA allocations through sysfs")
> > > Signed-off-by: Hou Tao <houtao1@huawei.com>
> > > ---
> > > drivers/pci/p2pdma.c | 1 +
> > > 1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
> > > index 4a2fc7ab42c3..218c1f5252b6 100644
> > > --- a/drivers/pci/p2pdma.c
> > > +++ b/drivers/pci/p2pdma.c
> > > @@ -152,6 +152,7 @@ static int p2pmem_alloc_mmap(struct file *filp, struct kobject *kobj,
> > > ret = vm_insert_page(vma, vaddr, page);
> > > if (ret) {
> > > gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> > > + percpu_ref_put(ref);
> > > return ret;
> > > }
> > > percpu_ref_get(ref);
> > > --
> > > 2.29.2
> > >
>
On Fri, Jan 09, 2026 at 11:41:51AM +1100, Alistair Popple wrote:
> On 2026-01-09 at 02:55 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> > On Thu, Jan 08, 2026 at 02:23:16PM +1100, Alistair Popple wrote:
> > > On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> > > > From: Hou Tao <houtao1@huawei.com>
> > > >
> > > > When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> > > > doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> > > > acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> > > > forever when trying to remove the PCIe device.
> > > >
> > > > Fix it by adding the missed percpu_ref_put().
> ...
> > Looking at this again, I'm confused about why in the normal, non-error
> > case, we do the percpu_ref_tryget_live_rcu(ref), followed by another
> > percpu_ref_get(ref) for each page, followed by just a single
> > percpu_ref_put() at the exit.
> >
> > So we do ref_get() "1 + number of pages" times but we only do a single
> > ref_put(). Is there a loop of ref_put() for each page elsewhere?
>
> Right, the per-page ref_put() happens when the page is freed (ie. the struct
> page refcount drops to zero) - in this case free_zone_device_folio() will call
> p2pdma_folio_free() which has the corresponding percpu_ref_put().

I don't see anything that looks like a loop to call ref_put() for each
page in free_zone_device_folio() or in p2pdma_folio_free(), but this
is all completely out of my range, so I'll take your word for it :)

Bjorn
On 2026-01-10 at 02:03 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> On Fri, Jan 09, 2026 at 11:41:51AM +1100, Alistair Popple wrote:
> > On 2026-01-09 at 02:55 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> > > On Thu, Jan 08, 2026 at 02:23:16PM +1100, Alistair Popple wrote:
> > > > On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> > > > > From: Hou Tao <houtao1@huawei.com>
> > > > >
> > > > > When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> > > > > doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> > > > > acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> > > > > forever when trying to remove the PCIe device.
> > > > >
> > > > > Fix it by adding the missed percpu_ref_put().
> > ...
>
> > > Looking at this again, I'm confused about why in the normal, non-error
> > > case, we do the percpu_ref_tryget_live_rcu(ref), followed by another
> > > percpu_ref_get(ref) for each page, followed by just a single
> > > percpu_ref_put() at the exit.
> > >
> > > So we do ref_get() "1 + number of pages" times but we only do a single
> > > ref_put(). Is there a loop of ref_put() for each page elsewhere?
> >
> > Right, the per-page ref_put() happens when the page is freed (ie. the struct
> > page refcount drops to zero) - in this case free_zone_device_folio() will call
> > p2pdma_folio_free() which has the corresponding percpu_ref_put().
>
> I don't see anything that looks like a loop to call ref_put() for each
> page in free_zone_device_folio() or in p2pdma_folio_free(), but this
> is all completely out of my range, so I'll take your word for it :)
That's brave :-)
What happens is the core mm takes over managing the page lifetime once
vm_insert_page() has been (successfully) called to map the page:
	VM_WARN_ON_ONCE_PAGE(!page_ref_count(page), page);
	set_page_count(page, 1);
	ret = vm_insert_page(vma, vaddr, page);
	if (ret) {
		gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
		return ret;
	}
	percpu_ref_get(ref);
	put_page(page);
In the above sequence vm_insert_page() takes a page ref for each page it maps
into the user page tables with folio_get(). This reference is dropped when the
user page table entry is removed, typically by the loop in zap_pte_range().
Normally the user page table mapping is the only thing holding a reference so
it ends up calling folio_put()->free_zone_device_folio->...->ref_put() one page
at a time as the PTEs are removed from the page tables. At least that's what
happens conceptually - the TLB batching code makes it hard to actually see where
the folio_put() is called in this sequence.
Note the extra set_page_count(1) and put_page(page) in the above sequence is
just to make vm_insert_page() happy - it complains if you try to insert a page
with a zero page ref.
And looking at that sequence there is another minor bug - in the failure
path we are exiting the loop with the failed page ref count set to
1 from set_page_count(page, 1). That needs to be reset to zero with
set_page_count(page, 0) to avoid the VM_WARN_ON_ONCE_PAGE() if the page gets
reused. I will send a fix for that.
- Alistair
> Bjorn
On 2026-01-12 at 10:21 +1100, Alistair Popple <apopple@nvidia.com> wrote...
> On 2026-01-10 at 02:03 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> > On Fri, Jan 09, 2026 at 11:41:51AM +1100, Alistair Popple wrote:
> > > On 2026-01-09 at 02:55 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> > > > On Thu, Jan 08, 2026 at 02:23:16PM +1100, Alistair Popple wrote:
> > > > > On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> > > > > > From: Hou Tao <houtao1@huawei.com>
> > > > > >
> > > > > > When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> > > > > > doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> > > > > > acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> > > > > > forever when trying to remove the PCIe device.
> > > > > >
> > > > > > Fix it by adding the missed percpu_ref_put().
> > > ...
> >
> > > > Looking at this again, I'm confused about why in the normal, non-error
> > > > case, we do the percpu_ref_tryget_live_rcu(ref), followed by another
> > > > percpu_ref_get(ref) for each page, followed by just a single
> > > > percpu_ref_put() at the exit.
> > > >
> > > > So we do ref_get() "1 + number of pages" times but we only do a single
> > > > ref_put(). Is there a loop of ref_put() for each page elsewhere?
> > >
> > > Right, the per-page ref_put() happens when the page is freed (ie. the struct
> > > page refcount drops to zero) - in this case free_zone_device_folio() will call
> > > p2pdma_folio_free() which has the corresponding percpu_ref_put().
> >
> > I don't see anything that looks like a loop to call ref_put() for each
> > page in free_zone_device_folio() or in p2pdma_folio_free(), but this
> > is all completely out of my range, so I'll take your word for it :)
>
> That's brave :-)
>
> What happens is the core mm takes over managing the page life time once
> vm_insert_page() has been (successfully) called to map the page:
>
> VM_WARN_ON_ONCE_PAGE(!page_ref_count(page), page);
> set_page_count(page, 1);
> ret = vm_insert_page(vma, vaddr, page);
> if (ret) {
> gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> return ret;
> }
> percpu_ref_get(ref);
> put_page(page);
>
> In the above sequence vm_insert_page() takes a page ref for each page it maps
> into the user page tables with folio_get(). This reference is dropped when the
> user page table entry is removed, typically by the loop in zap_pte_range().
>
> Normally the user page table mapping is the only thing holding a reference so
> it ends up calling folio_put()->free_zone_device_folio->...->ref_put() one page
> at a time as the PTEs are removed from the page tables. At least that's what
> happens conceptually - the TLB batching code makes it hard to actually see where
> the folio_put() is called in this sequence.
>
> Note the extra set_page_count(1) and put_page(page) in the above sequence is
> just to make vm_insert_page() happy - it complains if you try to insert a page
> with a zero page ref.
>
> And looking at that sequence there is another minor bug - in the failure
> path we are exiting the loop with the failed page ref count set to
> 1 from set_page_count(page, 1). That needs to be reset to zero with
> set_page_count(page, 0) to avoid the VM_WARN_ON_ONCE_PAGE() if the page gets
> reused. I will send a fix for that.
Actually the whole failure path above seems wrong to me - we
free the entire allocation with gen_pool_free() even though
vm_insert_page() may have succeeded in mapping some pages. AFAICT the
generic VFS mmap code will call unmap_region() to undo any partial
mapping (see __mmap_new_file_vma) but that should end up calling
folio_put()->zone_free_device_range()->p2pdma_folio_free()->gen_pool_free_owner()
for the mapped pages even though we've already freed the entire pool.
> - Alistair
>
> > Bjorn
>
On 2026-01-12 at 11:12 +1100, Alistair Popple <apopple@nvidia.com> wrote...
> On 2026-01-12 at 10:21 +1100, Alistair Popple <apopple@nvidia.com> wrote...
> > On 2026-01-10 at 02:03 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> > > On Fri, Jan 09, 2026 at 11:41:51AM +1100, Alistair Popple wrote:
> > > > On 2026-01-09 at 02:55 +1100, Bjorn Helgaas <helgaas@kernel.org> wrote...
> > > > > On Thu, Jan 08, 2026 at 02:23:16PM +1100, Alistair Popple wrote:
> > > > > > On 2025-12-20 at 15:04 +1100, Hou Tao <houtao@huaweicloud.com> wrote...
> > > > > > > From: Hou Tao <houtao1@huawei.com>
> > > > > > >
> > > > > > > When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> > > > > > > doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> > > > > > > acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> > > > > > > forever when trying to remove the PCIe device.
> > > > > > >
> > > > > > > Fix it by adding the missed percpu_ref_put().
> > > > ...
> > >
> > > > > Looking at this again, I'm confused about why in the normal, non-error
> > > > > case, we do the percpu_ref_tryget_live_rcu(ref), followed by another
> > > > > percpu_ref_get(ref) for each page, followed by just a single
> > > > > percpu_ref_put() at the exit.
> > > > >
> > > > > So we do ref_get() "1 + number of pages" times but we only do a single
> > > > > ref_put(). Is there a loop of ref_put() for each page elsewhere?
> > > >
> > > > Right, the per-page ref_put() happens when the page is freed (ie. the struct
> > > > page refcount drops to zero) - in this case free_zone_device_folio() will call
> > > > p2pdma_folio_free() which has the corresponding percpu_ref_put().
> > >
> > > I don't see anything that looks like a loop to call ref_put() for each
> > > page in free_zone_device_folio() or in p2pdma_folio_free(), but this
> > > is all completely out of my range, so I'll take your word for it :)
> >
> > That's brave :-)
> >
> > What happens is the core mm takes over managing the page life time once
> > vm_insert_page() has been (successfully) called to map the page:
> >
> > VM_WARN_ON_ONCE_PAGE(!page_ref_count(page), page);
> > set_page_count(page, 1);
> > ret = vm_insert_page(vma, vaddr, page);
> > if (ret) {
> > gen_pool_free(p2pdma->pool, (uintptr_t)kaddr, len);
> > return ret;
> > }
> > percpu_ref_get(ref);
> > put_page(page);
> >
> > In the above sequence vm_insert_page() takes a page ref for each page it maps
> > into the user page tables with folio_get(). This reference is dropped when the
> > user page table entry is removed, typically by the loop in zap_pte_range().
> >
> > Normally the user page table mapping is the only thing holding a reference so
> > it ends up calling folio_put()->free_zone_device_folio->...->ref_put() one page
> > at a time as the PTEs are removed from the page tables. At least that's what
> > happens conceptually - the TLB batching code makes it hard to actually see where
> > the folio_put() is called in this sequence.
> >
> > Note the extra set_page_count(1) and put_page(page) in the above sequence is
> > just to make vm_insert_page() happy - it complains if you try to insert a page
> > with a zero page ref.
> >
> > And looking at that sequence there is another minor bug - in the failure
> > path we are exiting the loop with the failed page ref count set to
> > 1 from set_page_count(page, 1). That needs to be reset to zero with
> > set_page_count(page, 0) to avoid the VM_WARN_ON_ONCE_PAGE() if the page gets
> > reused. I will send a fix for that.
>
> Actually the whole failure path above seems wrong to me - we
> free the entire allocation with gen_pool_free() even though
> vm_insert_page() may have succeeded in mapping some pages. AFAICT the
> generic VFS mmap code will call unmap_region() to undo any partial
> mapping (see __mmap_new_file_vma) but that should end up calling
> folio_put()->zone_free_device_range()->p2pdma_folio_free()->gen_pool_free_owner()
> for the mapped pages even though we've already freed the entire pool.
Oh, never mind - I hit send too soon. Ignore the above paragraph; I hadn't
noticed kaddr/len get updated at the end of the loop to account for the
successful mappings.
> > - Alistair
> >
> > > Bjorn
> >
>
On 2025-12-19 21:04, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
> doesn't invoke percpu_ref_put() to free the per-cpu ref of pgmap
> acquired after gen_pool_alloc_owner(), and memunmap_pages() will hang
> forever when trying to remove the PCIe device.
>
> Fix it by adding the missed percpu_ref_put().
>
> Fixes: 7e9c7ef83d78 ("PCI/P2PDMA: Allow userspace VMA allocations through sysfs")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
Nice catch, thanks:

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>

Logan