[PATCH] iommu: iommufd: Explicitly check for VM_PFNMAP in iommufd_ioas_map

Shuai Xue posted 1 patch 3 months, 1 week ago
drivers/iommu/iommufd/ioas.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
Posted by Shuai Xue 3 months, 1 week ago
The iommufd_ioas_map function currently returns -EFAULT when attempting
to map VM_PFNMAP VMAs because pin_user_pages_fast() cannot handle such
mappings. This error code is misleading and does not accurately reflect
the nature of the failure.

Add an explicit check for the VM_PFNMAP flag before attempting the
pin_user_pages operation. If VM_PFNMAP is set, return -EOPNOTSUPP to
clearly indicate that PFNMAP regions are not supported through the
IOMMU_IOAS_MAP interface.

This change improves error reporting and helps userspace applications
distinguish between different failure modes when working with special
mappings like MMIO regions.

Note that Jason Gunthorpe is working on extending IOMMU_IOAS_MAP_FILE to
support dma-buf file descriptors for MMIO BARs[1], which will provide a
secure and controlled method for sharing device memory. Until that
support is available, PFNMAP mappings through IOMMUFD are not supported.
[1] https://lore.kernel.org/all/0-v1-64bed2430cdb+31b-iommufd_dmabuf_jgg@nvidia.com/

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/iommu/iommufd/ioas.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/iommufd/ioas.c b/drivers/iommu/iommufd/ioas.c
index 0dee38d7252d..0c4f242eba49 100644
--- a/drivers/iommu/iommufd/ioas.c
+++ b/drivers/iommu/iommufd/ioas.c
@@ -241,6 +241,32 @@ int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
 	return rc;
 }
 
+/**
+ * iommufd_check_vm_pfnmap - Check if a user address has the VM_PFNMAP flag set
+ * @vaddr: User virtual address to check
+ *
+ * This function checks if the VMA (Virtual Memory Area) containing the given
+ * virtual address has the VM_PFNMAP flag set. This flag is typically used for
+ * memory regions that directly map hardware resources (e.g., PCI BARs).
+ *
+ * Returns: true if VM_PFNMAP is set, false otherwise.
+ */
+static bool iommufd_check_vm_pfnmap(unsigned long vaddr)
+{
+	struct mm_struct *mm = current->mm;
+	struct vm_area_struct *vma;
+	bool ret = false;
+
+	mmap_read_lock(mm);
+	vaddr = untagged_addr_remote(mm, vaddr);
+	vma = vma_lookup(mm, vaddr);
+	if (vma && vma->vm_flags & VM_PFNMAP)
+		ret = true;
+	mmap_read_unlock(mm);
+
+	return ret;
+}
+
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
 {
 	struct iommu_ioas_map *cmd = ucmd->cmd;
@@ -254,6 +280,8 @@ int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
 	       IOMMU_IOAS_MAP_READABLE)) ||
 	    cmd->__reserved)
 		return -EOPNOTSUPP;
+	if (iommufd_check_vm_pfnmap(cmd->user_va))
+		return -EOPNOTSUPP;
 	if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
 		return -EOVERFLOW;
 
-- 
2.39.3
Re: [PATCH] iommu: iommufd: Explicitly check for VM_PFNMAP in iommufd_ioas_map
Posted by Jason Gunthorpe 3 months, 1 week ago
On Wed, Oct 29, 2025 at 08:52:26PM +0800, Shuai Xue wrote:
> The iommufd_ioas_map function currently returns -EFAULT when attempting
> to map VM_PFNMAP VMAs because pin_user_pages_fast() cannot handle such
> mappings. This error code is misleading and does not accurately reflect
> the nature of the failure.

Sure, but why do you care? Userspace should know not to do this based
on how it created the mmaps, not rely on errnos to figure it out after
the fact.

> +static bool iommufd_check_vm_pfnmap(unsigned long vaddr)
> +{
> +	struct mm_struct *mm = current->mm;
> +	struct vm_area_struct *vma;
> +	bool ret = false;
> +
> +	mmap_read_lock(mm);
> +	vaddr = untagged_addr_remote(mm, vaddr);
> +	vma = vma_lookup(mm, vaddr);
> +	if (vma && vma->vm_flags & VM_PFNMAP)
> +		ret = true;
> +	mmap_read_unlock(mm);

This isn't really sufficient, the range can span multiple VMAs and you
can hit special PTEs in PFNMAPs, or you can hit P2P struct pages in
fully normal VMAs.
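
To illustrate the multi-VMA part of this objection, here is a minimal
userspace sketch. It is a model only, not kernel code: struct model_vma and
both helper functions are hypothetical names invented for this example, and
the pfnmap field merely stands in for VM_PFNMAP. It compares a check of the
VMA containing the first address (what the patch does) with a check of every
VMA overlapping the requested range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a VMA; not the kernel's vm_area_struct. */
struct model_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool pfnmap;		/* stands in for VM_PFNMAP */
};

/* Mirrors the patch: inspect only the VMA that contains the first address. */
static bool first_vma_is_pfnmap(const struct model_vma *vmas, size_t n,
				unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= vmas[i].start && addr < vmas[i].end)
			return vmas[i].pfnmap;
	return false;
}

/* Inspects every VMA that overlaps [addr, addr + len). */
static bool range_has_pfnmap(const struct model_vma *vmas, size_t n,
			     unsigned long addr, unsigned long len)
{
	unsigned long last;

	/* Reject empty ranges and address wraparound. */
	assert(len > 0 && addr + len > addr);
	last = addr + len;

	for (size_t i = 0; i < n; i++)
		if (vmas[i].start < last && vmas[i].end > addr &&
		    vmas[i].pfnmap)
			return true;
	return false;
}
```

With a normal VMA at [0x1000, 0x3000) followed by a PFNMAP VMA at
[0x3000, 0x5000), a request starting at 0x1000 with length 0x3000 passes the
first-address check but still crosses into the PFNMAP VMA. Even the range
check in this sketch would not catch the other two cases mentioned above
(special PTEs and P2P struct pages in normal VMAs), since those are not
visible from VMA flags at all.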

I think if you really want this errno distinction it should come from
pin_user_pages() directly as only it knows the reason it didn't work.

Jason
Re: [PATCH] iommu: iommufd: Explicitly check for VM_PFNMAP in iommufd_ioas_map
Posted by Shuai Xue 3 months, 1 week ago

On 2025/10/29 21:34, Jason Gunthorpe wrote:
> On Wed, Oct 29, 2025 at 08:52:26PM +0800, Shuai Xue wrote:
>> The iommufd_ioas_map function currently returns -EFAULT when attempting
>> to map VM_PFNMAP VMAs because pin_user_pages_fast() cannot handle such
>> mappings. This error code is misleading and does not accurately reflect
>> the nature of the failure.

Hi, Jason,

> 
> Sure, but why do you care? Userspace should know not to do this based
> on how it created the mmaps, not rely on errnos to figure it out after
> the fact.

We run different VMMs (QEMU, Kata Containers) to meet diverse business
requirements, while our production environment deploys various evolving
kernel versions. Additionally, we are migrating from VFIO Type 1 to
IOMMUFD. Although IOMMUFD claims to provide compatible
iommufd_vfio_ioctl APIs, these APIs are not fully compatible in
practice. For example, with VFIO_IOMMU_MAP_DMA, iommufd_vfio_map_dma
doesn't support MMIO mapping, and we can only rely on the implicit
EFAULT error from pin_user_pages_fast(). (I initially considered adding
explicit checks in iommufd_vfio_map_dma, but I noticed you plan to add
dma_buf support there.)

While we certainly aim for a seamless migration from VFIO Type 1 to
IOMMUFD, as you know, this isn't always feasible.

For GPU-related issues encountered in production, the debugging path is
quite long - from business teams to virtualization teams, and finally to
our kernel team.

Therefore, having explicit checks with deterministic error codes
returned to userspace would be greatly appreciated.

> 
>> +static bool iommufd_check_vm_pfnmap(unsigned long vaddr)
>> +{
>> +	struct mm_struct *mm = current->mm;
>> +	struct vm_area_struct *vma;
>> +	bool ret = false;
>> +
>> +	mmap_read_lock(mm);
>> +	vaddr = untagged_addr_remote(mm, vaddr);
>> +	vma = vma_lookup(mm, vaddr);
>> +	if (vma && vma->vm_flags & VM_PFNMAP)
>> +		ret = true;
>> +	mmap_read_unlock(mm);
> 
> This isn't really sufficient, the range can span multiple VMAs and you
> can hit special PTEs in PFNMAPs, or you can hit P2P struct pages in
> fully normal VMAs.
> 
> I think if you really want this errno distinction it should come from
> pin_user_pages() directly as only it knows the reason it didn't work.
> 

Aha, I see. Thank you for pointing out this issue. The check indeed
needs to be more comprehensive. Would you mind if I used pin_user_pages()
as a precheck?

Thanks for the quick reply.

Best Regards,
Shuai
Re: [PATCH] iommu: iommufd: Explicitly check for VM_PFNMAP in iommufd_ioas_map
Posted by Jason Gunthorpe 3 months, 1 week ago
On Wed, Oct 29, 2025 at 10:44:31PM +0800, Shuai Xue wrote:

> We run different VMMs (QEMU, Kata Containers) to meet diverse business
> requirements, while our production environment deploys various evolving
> kernel versions. Additionally, we are migrating from VFIO Type 1 to
> IOMMUFD. Although IOMMUFD claims to provide compatible
> iommufd_vfio_ioctl APIs, these APIs are not fully compatible in
> practice. 

Well, it aims to, but we are not there yet. Hopefully in the coming
months MMIO mapping will be supported in the VFIO type 1 emulation as well.

But broadly the EFAULT return here always means the underlying VMA is
incompatible with IOMMUFD, I'm not sure there is that much value in
further determining why exactly it is incompatible.
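
For VMM-side logging, the errno values discussed in this thread could be
turned into hints along the following lines. This is only a sketch:
ioas_map_errno_hint() is a hypothetical helper, and the message wording is
an assumption. The EFAULT reading follows the statement above; the
EOPNOTSUPP and EOVERFLOW cases follow the argument checks visible in the
diff context of iommufd_ioas_map().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map the errno from a failed IOMMU_IOAS_MAP ioctl
 * to a human-readable hint. Takes the positive errno value as seen in
 * userspace.
 */
static const char *ioas_map_errno_hint(int err)
{
	assert(err >= 0);

	switch (err) {
	case 0:
		return "ok";
	case EFAULT:
		/* The underlying VMA is incompatible with IOMMUFD. */
		return "user VA range cannot be pinned (for example an MMIO/PFNMAP mapping)";
	case EOPNOTSUPP:
		return "unsupported flags or reserved field set";
	case EOVERFLOW:
		return "iova or length out of range";
	default:
		return "unclassified error";
	}
}
```

A caller would pass errno right after the ioctl fails, for example
ioas_map_errno_hint(errno), and log the returned string next to the user
address and length of the failed request.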

> Aha, I see. Thank you for pointing out this issue. The check indeed
> needs to be more comprehensive. Would you mind if I used pin_user_pages()
> as a precheck?

I mean we already call pin_user_pages deep inside the mapping code and
propagate whatever error code it gives back up to userspace. If it
gives a more specific code then it will be returned naturally, no need
to change iommufd at all.
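
The propagation described here can be modelled in userspace as below. This
is a sketch under stated assumptions, not the iommufd call chain: all three
model_* functions are hypothetical stand-ins, and the only check taken from
the patch context is the -EOVERFLOW test on the length. It shows that when
each layer forwards the lower layer's return value unchanged, a more
specific code from the pinning step reaches the caller without any change
in the layers above it.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the pinning step: fails with the errno the caller chooses. */
static int model_pin(bool pinnable, int fail_errno)
{
	assert(fail_errno > 0);
	return pinnable ? 0 : -fail_errno;
}

/* Stand-in for the internal mapping code: forwards the pin error as-is. */
static int model_fill_pages(bool pinnable, int fail_errno)
{
	int rc = model_pin(pinnable, fail_errno);

	if (rc)
		return rc;
	/* The real code would install the IOMMU mappings here. */
	return 0;
}

/* Stand-in for the ioctl handler: argument check, then forward. */
static int model_ioas_map(unsigned long length, bool pinnable, int fail_errno)
{
	if (length >= ULONG_MAX)
		return -EOVERFLOW;
	return model_fill_pages(pinnable, fail_errno);
}
```

If model_pin() switches from EFAULT to a more specific code, the value
returned by model_ioas_map() changes with it, which is the behaviour the
paragraph above relies on.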

Jason