[PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag

Zhenzhong Duan posted 14 patches 1 week ago
Maintainers: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>, Peter Maydell <peter.maydell@linaro.org>, "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@bull.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Alex Williamson <alex@shazbot.org>, "Cédric Le Goater" <clg@redhat.com>
[PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Zhenzhong Duan 1 week ago
When both device and vIOMMU have PASID enabled, then guest may setup
pasid attached translation.

We need to create the nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID
flag because according to uAPI, any domain attached to the non-PASID
part of the device must also be flagged, otherwise attaching a PASID
will blocked.

Introduce a vfio_device_get_viommu_flags_pasid_supported() helper to
facilitate this implementation.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
 include/hw/vfio/vfio-device.h |  1 +
 hw/vfio/device.c              | 11 +++++++++++
 hw/vfio/iommufd.c             |  8 +++++++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 828a31c006..dd0355eb3d 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -268,6 +268,7 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainer *bcontainer,
 void vfio_device_unprepare(VFIODevice *vbasedev);
 
 bool vfio_device_get_viommu_flags_want_nesting(VFIODevice *vbasedev);
+bool vfio_device_get_viommu_flags_pasid_supported(VFIODevice *vbasedev);
 bool vfio_device_get_host_iommu_quirk_bypass_ro(VFIODevice *vbasedev,
                                                 uint32_t type, void *caps,
                                                 uint32_t size);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 973fc35b59..b15ca6ef0a 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -533,6 +533,17 @@ bool vfio_device_get_viommu_flags_want_nesting(VFIODevice *vbasedev)
     return false;
 }
 
+bool vfio_device_get_viommu_flags_pasid_supported(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
+
+    if (vdev) {
+        return !!(pci_device_get_viommu_flags(PCI_DEVICE(vdev)) &
+                  VIOMMU_FLAG_PASID_SUPPORTED);
+    }
+    return false;
+}
+
 bool vfio_device_get_host_iommu_quirk_bypass_ro(VFIODevice *vbasedev,
                                                 uint32_t type, void *caps,
                                                 uint32_t size)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e822039858..dce4e4ce72 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -363,6 +363,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
     VendorCaps caps;
     VFIOIOASHwpt *hwpt;
     uint32_t hwpt_id;
+    uint8_t max_pasid_log2 = 0;
     int ret;
 
     /* Try to find a domain */
@@ -408,7 +409,7 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
      */
     if (!iommufd_backend_get_device_info(vbasedev->iommufd, vbasedev->devid,
                                          &type, &caps, sizeof(caps), &hw_caps,
-                                         NULL, errp)) {
+                                         &max_pasid_log2, errp)) {
         return false;
     }
 
@@ -430,6 +431,11 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
         }
     }
 
+    if (max_pasid_log2 &&
+        vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
+        flags |= IOMMU_HWPT_ALLOC_PASID;
+    }
+
     if (cpr_is_incoming()) {
         hwpt_id = vbasedev->cpr.hwpt_id;
         goto skip_alloc;
-- 
2.47.3
Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Yi Liu 6 days, 17 hours ago
On 3/26/26 17:11, Zhenzhong Duan wrote:
> When both device and vIOMMU have PASID enabled, then guest may setup
> pasid attached translation.
> 
> We need to create the nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID
> flag because according to uAPI, any domain attached to the non-PASID
> part of the device must also be flagged, otherwise attaching a PASID
> will blocked.

echo the comment on the commit message.

https://lore.kernel.org/qemu-devel/a33c785a-ab94-4dc2-85eb-10b7d288f661@intel.com/
RE: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Duan, Zhenzhong 6 days, 14 hours ago

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>IOMMU_HWPT_ALLOC_PASID flag
>
>On 3/26/26 17:11, Zhenzhong Duan wrote:
>> When both device and vIOMMU have PASID enabled, then guest may setup
>> pasid attached translation.
>>
>> We need to create the nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID
>> flag because according to uAPI, any domain attached to the non-PASID
>> part of the device must also be flagged, otherwise attaching a PASID
>> will blocked.
>
>echo the comment on the commit message.
>
>https://lore.kernel.org/qemu-devel/a33c785a-ab94-4dc2-85eb-
>10b7d288f661@intel.com/

Sorry, seems I missed it. Will use below to replace above paragraph.

"VFIO needs to be aware of potential pasid
usage and should attach the non-pasid part of pasid-capable device to
hwpt flagged with IOMMU_HWPT_ALLOC_PASID."

Thanks
Zhenzhong
Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Nicolin Chen 6 days, 23 hours ago
On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
> @@ -430,6 +431,11 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>          }
>      }
>  
> +    if (max_pasid_log2 &&
> +        vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
> +        flags |= IOMMU_HWPT_ALLOC_PASID;
> +    }

This would set it to:
	IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
which isn't supported on ARM :-/

The VIOMMU_FLAG_PASID_SUPPORTED flag is to allow PCI core to expose
PASID cap. It feels like we need a different VIOMMU flag for VT-d,
e.g. VIOMMU_FLAG_WANT_PASID_ATTACH?

Thanks
Nicolin
RE: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Duan, Zhenzhong 6 days, 19 hours ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>IOMMU_HWPT_ALLOC_PASID flag
>
>On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
>> @@ -430,6 +431,11 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>          }
>>      }
>>
>> +    if (max_pasid_log2 &&
>> +        vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
>> +        flags |= IOMMU_HWPT_ALLOC_PASID;
>> +    }
>
>This would set it to:
>	IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
>which isn't supported on ARM :-/

I am a bit confused, if smmu supports dirty tracking, flags would be
set to IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT,
in arm_smmu_domain_alloc_paging_flags(), I see it will return -EOPNOTSUPP.
So how did smmu work in this case?

>
>The VIOMMU_FLAG_PASID_SUPPORTED flag is to allow PCI core to expose
>PASID cap. It feels like we need a different VIOMMU flag for VT-d,
>e.g. VIOMMU_FLAG_WANT_PASID_ATTACH?
>
>Thanks
>Nicolin
Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Nicolin Chen 6 days, 18 hours ago
On Fri, Mar 27, 2026 at 02:29:20AM +0000, Duan, Zhenzhong wrote:
> >-----Original Message-----
> >From: Nicolin Chen <nicolinc@nvidia.com>
> >Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
> >IOMMU_HWPT_ALLOC_PASID flag
> >
> >On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
> >> @@ -430,6 +431,11 @@ static bool
> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> >>          }
> >>      }
> >>
> >> +    if (max_pasid_log2 &&
> >> +        vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
> >> +        flags |= IOMMU_HWPT_ALLOC_PASID;
> >> +    }
> >
> >This would set it to:
> >	IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
> >which isn't supported on ARM :-/
> 
> I am a bit confused, if smmu supports dirty tracking, flags would be
> set to IOMMU_HWPT_ALLOC_DIRTY_TRACKING | IOMMU_HWPT_ALLOC_NEST_PARENT,
> in arm_smmu_domain_alloc_paging_flags(), I see it will return -EOPNOTSUPP.
> So how did smmu work in this case?

You hit a point. I almost forgot we need to do something with that
dirty tracking flag. This is currently broken..

For NVIDIA, the current generation Grace CPU doesn't support dirty
tracking. So, our QEMU VMs don't set that flag. This is just lucky
for us. Yet, it would trigger -EOPNOTSUPP on ARM CPU that supports,
as you mentioned.

For pasid attachment however, ARM doesn't need it: regular pasid=0
attach already has the pointer to a stage-1 PASID table.

Nicolin
RE: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with IOMMU_HWPT_ALLOC_PASID flag
Posted by Duan, Zhenzhong 6 days, 15 hours ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>IOMMU_HWPT_ALLOC_PASID flag
>
>On Fri, Mar 27, 2026 at 02:29:20AM +0000, Duan, Zhenzhong wrote:
>> >-----Original Message-----
>> >From: Nicolin Chen <nicolinc@nvidia.com>
>> >Subject: Re: [PATCH v2 03/14] vfio/iommufd: Create nesting parent hwpt with
>> >IOMMU_HWPT_ALLOC_PASID flag
>> >
>> >On Thu, Mar 26, 2026 at 05:11:17AM -0400, Zhenzhong Duan wrote:
>> >> @@ -430,6 +431,11 @@ static bool
>> >iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> >>          }
>> >>      }
>> >>
>> >> +    if (max_pasid_log2 &&
>> >> +        vfio_device_get_viommu_flags_pasid_supported(vbasedev)) {
>> >> +        flags |= IOMMU_HWPT_ALLOC_PASID;
>> >> +    }
>> >
>> >This would set it to:
>> >	IOMMU_HWPT_ALLOC_PASID | IOMMU_HWPT_ALLOC_NEST_PARENT
>> >which isn't supported on ARM :-/
>>
>> I am a bit confused, if smmu supports dirty tracking, flags would be
>> set to IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
>IOMMU_HWPT_ALLOC_NEST_PARENT,
>> in arm_smmu_domain_alloc_paging_flags(), I see it will return -EOPNOTSUPP.
>> So how did smmu work in this case?
>
>You hit a point. I almost forgot we need to do something with that
>dirty tracking flag. This is currently broken..
>
>For NVIDIA, the current generation Grace CPU doesn't support dirty
>tracking. So, our QEMU VMs don't set that flag. This is just lucky
>for us. Yet, it would trigger -EOPNOTSUPP on ARM CPU that supports,
>as you mentioned.
>
>For pasid attachment however, ARM doesn't need it: regular pasid=0
>attach already has the pointer to a stage-1 PASID table.

Clear, will add VIOMMU_FLAG_WANT_PASID_ATTACH and check it instead.

Thanks
Zhenzhong