[PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain

Zhenzhong Duan posted 20 patches 3 months, 2 weeks ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Yi Liu <yi.l.liu@intel.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>
There is a newer version of this series
[PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
Posted by Zhenzhong Duan 3 months, 2 weeks ago
Call pci_device_get_viommu_cap() to check whether the vIOMMU supports
VIOMMU_CAP_HW_NESTED. If it does, create a nested parent domain that the
vIOMMU can reuse to create nested domains.

This is safe for now: VIOMMU_CAP_HW_NESTED cannot be set in hw_caps yet,
because s->flts is forbidden until passthrough devices are supported with
x-flts=on.

Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 hw/vfio/iommufd.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 48c590b6a9..61a548f13f 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -20,6 +20,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "system/iommufd.h"
+#include "hw/iommu.h"
 #include "hw/qdev-core.h"
 #include "hw/vfio/vfio-cpr.h"
 #include "system/reset.h"
@@ -379,6 +380,19 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
         flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
     }
 
+    /*
+     * If vIOMMU supports stage-1 translation, force to create nested parent
+     * domain which could be reused by vIOMMU to create nested domain.
+     */
+    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
+        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
+        hw_caps = pci_device_get_viommu_cap(&vdev->pdev);
+        if (hw_caps & VIOMMU_CAP_HW_NESTED) {
+            flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
+        }
+    }
+
     if (cpr_is_incoming()) {
         hwpt_id = vbasedev->cpr.hwpt_id;
         goto skip_alloc;
-- 
2.47.1
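
The flag handling in the hunk above can be sketched in isolation as follows. This is a minimal standalone illustration, not QEMU code: the macro values are stand-ins for the real header definitions, and hwpt_alloc_flags() with its two parameters is a hypothetical name introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values (assumptions); the real ones come from QEMU/Linux headers. */
#define VIOMMU_CAP_HW_NESTED            (1ULL << 0)
#define IOMMU_HWPT_ALLOC_NEST_PARENT    (1U << 0)
#define IOMMU_HWPT_ALLOC_DIRTY_TRACKING (1U << 1)

/*
 * Hypothetical helper: derive the HWPT allocation flags from whether dirty
 * tracking is wanted and from the capabilities reported by the vIOMMU.
 */
static uint32_t hwpt_alloc_flags(bool dirty_tracking, uint64_t viommu_caps)
{
    uint32_t flags = 0;

    if (dirty_tracking) {
        flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
    }

    /* Nested-capable vIOMMU: request a nested parent domain as well. */
    if (viommu_caps & VIOMMU_CAP_HW_NESTED) {
        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
    }

    return flags;
}
```

The nested-parent bit is OR-ed in so that it combines with, rather than replaces, a dirty-tracking request.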
Re: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
Posted by Cédric Le Goater 3 months, 2 weeks ago
On 7/29/25 11:20, Zhenzhong Duan wrote:
> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
> if yes, create nested parent domain which could be reused by vIOMMU to create
> nested domain.
> 
> It is safe because hw_caps & VIOMMU_CAP_HW_NESTED cannot be set yet because
> s->flts is forbidden until we support passthrough device with x-flts=on.
> 
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Suggested-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> ---
>   hw/vfio/iommufd.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 48c590b6a9..61a548f13f 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -20,6 +20,7 @@
>   #include "trace.h"
>   #include "qapi/error.h"
>   #include "system/iommufd.h"
> +#include "hw/iommu.h"
>   #include "hw/qdev-core.h"
>   #include "hw/vfio/vfio-cpr.h"
>   #include "system/reset.h"
> @@ -379,6 +380,19 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>           flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>       }
>   
> +    /*
> +     * If vIOMMU supports stage-1 translation, force to create nested parent
> +     * domain which could be reused by vIOMMU to create nested domain.
> +     */
> +    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
> +        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> +
> +        hw_caps = pci_device_get_viommu_cap(&vdev->pdev);
> +        if (hw_caps & VIOMMU_CAP_HW_NESTED) {
> +            flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
> +        }
> +    }
>

Could you please add a wrapper for the above? Something like:

static bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
{
    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

        return !!(pci_device_get_viommu_cap(&vdev->pdev) & VIOMMU_CAP_HW_NESTED);
    }
    return false;
}
	
Maybe this routine belongs in hw/vfio/device.c.
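
To try the wrapper outside of QEMU, a standalone sketch could look like the one below. The struct layouts, the viommu_caps field, the capability bit value, the device-type enum values, and the stub pci_device_get_viommu_cap() are all assumptions made only so the example compiles; only the wrapper body follows the suggestion above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in definitions (assumptions); QEMU's real types are far richer. */
#define VIOMMU_CAP_HW_NESTED (1ULL << 0)
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum { VFIO_DEVICE_TYPE_PCI = 0, VFIO_DEVICE_TYPE_PLATFORM = 1 };

typedef struct PCIDevice { uint64_t viommu_caps; } PCIDevice;
typedef struct VFIODevice { int type; } VFIODevice;
typedef struct VFIOPCIDevice {
    PCIDevice pdev;
    VFIODevice vbasedev;
} VFIOPCIDevice;

/* Stub: the real function asks the vIOMMU behind the PCI device. */
static uint64_t pci_device_get_viommu_cap(PCIDevice *dev)
{
    return dev->viommu_caps;
}

/* True only for a PCI device whose vIOMMU reports nested support. */
static bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
{
    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);

        return !!(pci_device_get_viommu_cap(&vdev->pdev) & VIOMMU_CAP_HW_NESTED);
    }
    return false;
}
```

With such a wrapper, the call site reduces to a single conditional that sets IOMMU_HWPT_ALLOC_NEST_PARENT in flags.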


Thanks,

C.




>       if (cpr_is_incoming()) {
>           hwpt_id = vbasedev->cpr.hwpt_id;
>           goto skip_alloc;
RE: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
Posted by Duan, Zhenzhong 3 months, 2 weeks ago

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
>
>On 7/29/25 11:20, Zhenzhong Duan wrote:
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
>> if yes, create nested parent domain which could be reused by vIOMMU to create
>> nested domain.
>>
>> It is safe because hw_caps & VIOMMU_CAP_HW_NESTED cannot be set yet because
>> s->flts is forbidden until we support passthrough device with x-flts=on.
>>
>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>   hw/vfio/iommufd.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 48c590b6a9..61a548f13f 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -20,6 +20,7 @@
>>   #include "trace.h"
>>   #include "qapi/error.h"
>>   #include "system/iommufd.h"
>> +#include "hw/iommu.h"
>>   #include "hw/qdev-core.h"
>>   #include "hw/vfio/vfio-cpr.h"
>>   #include "system/reset.h"
>> @@ -379,6 +380,19 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>           flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>       }
>>
>> +    /*
>> +     * If vIOMMU supports stage-1 translation, force to create nested parent
>> +     * domain which could be reused by vIOMMU to create nested domain.
>> +     */
>> +    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>> +        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> +
>> +        hw_caps = pci_device_get_viommu_cap(&vdev->pdev);
>> +        if (hw_caps & VIOMMU_CAP_HW_NESTED) {
>> +            flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>> +        }
>> +    }
>>
>
>Could you please add a wrapper for the above ? Something like :
>
>static bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>{
>     if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>         VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>
>         return !!(pci_device_get_viommu_cap(&vdev->pdev) & VIOMMU_CAP_HW_NESTED);
>     }
>     return false;
>}
>
>May be this routine belongs to hw/vfio/device.c.

Done, see https://github.com/yiliu1765/qemu/commit/7ce1af90d5c1f418f23c9e397a16a3914b30f09f

I also introduced another wrapper, vfio_device_to_vfio_pci(). Let me know if you think it's unnecessary and I'll fold it.
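
For illustration only, a conversion helper of this kind might be shaped like the sketch below. This is a guess at the general idea, not the code in the linked commit: the function body, the stand-in struct layouts, the devfn field, and the enum values are all assumptions.

```c
#include <stddef.h>

/* Stand-in definitions (assumptions); QEMU's real types are far richer. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

enum { VFIO_DEVICE_TYPE_PCI = 0, VFIO_DEVICE_TYPE_PLATFORM = 1 };

typedef struct PCIDevice { int devfn; } PCIDevice;
typedef struct VFIODevice { int type; } VFIODevice;
typedef struct VFIOPCIDevice {
    PCIDevice pdev;
    VFIODevice vbasedev;
} VFIOPCIDevice;

/*
 * Hypothetical shape: return the enclosing VFIOPCIDevice for a PCI-type
 * VFIODevice, or NULL for any other device type.
 */
static VFIOPCIDevice *vfio_device_to_vfio_pci(VFIODevice *vbasedev)
{
    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
        return container_of(vbasedev, VFIOPCIDevice, vbasedev);
    }
    return NULL;
}
```

Returning NULL for non-PCI devices lets callers fold the type check and the container_of() cast into one call.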

Thanks
Zhenzhong
Re: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
Posted by Cédric Le Goater 3 months, 2 weeks ago
On 7/30/25 12:55, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Subject: Re: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
>>
>> On 7/29/25 11:20, Zhenzhong Duan wrote:
>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
>>> if yes, create nested parent domain which could be reused by vIOMMU to create
>>> nested domain.
>>>
>>> It is safe because hw_caps & VIOMMU_CAP_HW_NESTED cannot be set yet because
>>> s->flts is forbidden until we support passthrough device with x-flts=on.
>>>
>>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>>> ---
>>>    hw/vfio/iommufd.c | 14 ++++++++++++++
>>>    1 file changed, 14 insertions(+)
>>>
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 48c590b6a9..61a548f13f 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -20,6 +20,7 @@
>>>    #include "trace.h"
>>>    #include "qapi/error.h"
>>>    #include "system/iommufd.h"
>>> +#include "hw/iommu.h"
>>>    #include "hw/qdev-core.h"
>>>    #include "hw/vfio/vfio-cpr.h"
>>>    #include "system/reset.h"
>>> @@ -379,6 +380,19 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>            flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>        }
>>>
>>> +    /*
>>> +     * If vIOMMU supports stage-1 translation, force to create nested parent
>>> +     * domain which could be reused by vIOMMU to create nested domain.
>>> +     */
>>> +    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>>> +        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>> +
>>> +        hw_caps = pci_device_get_viommu_cap(&vdev->pdev);
>>> +        if (hw_caps & VIOMMU_CAP_HW_NESTED) {
>>> +            flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>>> +        }
>>> +    }
>>>
>>
>> Could you please add a wrapper for the above ? Something like :
>>
>> static bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>> {
>>      if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>>          VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>
>>          return !!(pci_device_get_viommu_cap(&vdev->pdev) & VIOMMU_CAP_HW_NESTED);
>>      }
>>      return false;
>> }
>>
>> May be this routine belongs to hw/vfio/device.c.
> 
> Done, see https://github.com/yiliu1765/qemu/commit/7ce1af90d5c1f418f23c9e397a16a3914b30f09f
> 
> I also introduced another wrapper vfio_device_to_vfio_pci(), let me know if you think it's unnecessary, I'll fold it.

It's good to have it if you use it everywhere:

hw/vfio/device.c:    if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
hw/vfio/container.c:                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
hw/vfio/container.c:                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
hw/vfio/listener.c:    if (vbasedev && vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
hw/vfio/listener.c:        if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
hw/vfio/iommufd.c:    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
hw/vfio/iommufd.c:        vbasedev_tmp->type != VFIO_DEVICE_TYPE_PCI) {

So, I would address this cleanup separately.

FYI, we also have this helper:

VFIODevice *vfio_get_vfio_device(Object *obj)
{
    if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
        return &VFIO_PCI_BASE(obj)->vbasedev;
    } else {
        return NULL;
    }
}


Thanks,

C.




RE: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
Posted by Duan, Zhenzhong 3 months, 2 weeks ago

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
>
>On 7/30/25 12:55, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain
>>>
>>> On 7/29/25 11:20, Zhenzhong Duan wrote:
>>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
>>>> if yes, create nested parent domain which could be reused by vIOMMU to create
>>>> nested domain.
>>>>
>>>> It is safe because hw_caps & VIOMMU_CAP_HW_NESTED cannot be set yet because
>>>> s->flts is forbidden until we support passthrough device with x-flts=on.
>>>>
>>>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>>>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>>>> ---
>>>>    hw/vfio/iommufd.c | 14 ++++++++++++++
>>>>    1 file changed, 14 insertions(+)
>>>>
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 48c590b6a9..61a548f13f 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -20,6 +20,7 @@
>>>>    #include "trace.h"
>>>>    #include "qapi/error.h"
>>>>    #include "system/iommufd.h"
>>>> +#include "hw/iommu.h"
>>>>    #include "hw/qdev-core.h"
>>>>    #include "hw/vfio/vfio-cpr.h"
>>>>    #include "system/reset.h"
>>>> @@ -379,6 +380,19 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>            flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>        }
>>>>
>>>> +    /*
>>>> +     * If vIOMMU supports stage-1 translation, force to create nested parent
>>>> +     * domain which could be reused by vIOMMU to create nested domain.
>>>> +     */
>>>> +    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>>>> +        VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>>> +
>>>> +        hw_caps = pci_device_get_viommu_cap(&vdev->pdev);
>>>> +        if (hw_caps & VIOMMU_CAP_HW_NESTED) {
>>>> +            flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>>>> +        }
>>>> +    }
>>>>
>>>
>>> Could you please add a wrapper for the above ? Something like :
>>>
>>> static bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>> {
>>>      if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>>>          VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>>>
>>>          return !!(pci_device_get_viommu_cap(&vdev->pdev) & VIOMMU_CAP_HW_NESTED);
>>>      }
>>>      return false;
>>> }
>>>
>>> May be this routine belongs to hw/vfio/device.c.
>>
>> Done, see https://github.com/yiliu1765/qemu/commit/7ce1af90d5c1f418f23c9e397a16a3914b30f09f
>>
>> I also introduced another wrapper vfio_device_to_vfio_pci(), let me know if you think it's unnecessary, I'll fold it.
>
>It's good to have it if you use it everywhere:
>
>hw/vfio/device.c:    if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
>hw/vfio/container.c:                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>hw/vfio/container.c:                vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>hw/vfio/listener.c:    if (vbasedev && vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>hw/vfio/listener.c:        if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
>hw/vfio/iommufd.c:    if (vbasedev->type == VFIO_DEVICE_TYPE_PCI) {
>hw/vfio/iommufd.c:        vbasedev_tmp->type != VFIO_DEVICE_TYPE_PCI) {
>
>So, I would address this cleanup separately.

OK, will send a separate one.

>
>FYI, We also have this helper :
>
>VFIODevice *vfio_get_vfio_device(Object *obj)
>{
>     if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
>         return &VFIO_PCI_BASE(obj)->vbasedev;
>     } else {
>         return NULL;
>     }
>}

I'll put the new helper right under it.

Thanks
Zhenzhong