[PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain

Zhenzhong Duan posted 21 patches 2 months, 3 weeks ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Yi Liu <yi.l.liu@intel.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>
[PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Zhenzhong Duan 2 months, 3 weeks ago
Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
if yes, create nested parent domain which could be reused by vIOMMU to create
nested domain.

Introduce helper vfio_device_viommu_get_nested to facilitate this
implementation.

It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
forbidden and VFIO device fails in set_iommu_device() call, until we support
passthrough device with x-flts=on.

Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-device.h |  2 ++
 hw/vfio/device.c              | 12 ++++++++++++
 hw/vfio/iommufd.c             |  8 ++++++++
 3 files changed, 22 insertions(+)

diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 6e4d5ccdac..ecd82c16c7 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
 
 void vfio_device_unprepare(VFIODevice *vbasedev);
 
+bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
+
 int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
                                 struct vfio_region_info **info);
 int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 08f12ac31f..3eeb71bd51 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -23,6 +23,7 @@
 
 #include "hw/vfio/vfio-device.h"
 #include "hw/vfio/pci.h"
+#include "hw/iommu.h"
 #include "hw/hw.h"
 #include "trace.h"
 #include "qapi/error.h"
@@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
     vbasedev->bcontainer = NULL;
 }
 
+bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
+{
+    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
+
+    if (vdev) {
+        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
+                  VIOMMU_CAP_HW_NESTED);
+    }
+    return false;
+}
+
 /*
  * Traditional ioctl() based io
  */
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 8c27222f75..e503c232e1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -379,6 +379,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
         flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
     }
 
+    /*
+     * If vIOMMU supports stage-1 translation, force to create nested parent
+     * domain which could be reused by vIOMMU to create nested domain.
+     */
+    if (vfio_device_viommu_get_nested(vbasedev)) {
+        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
+    }
+
     if (cpr_is_incoming()) {
         hwpt_id = vbasedev->cpr.hwpt_id;
         goto skip_alloc;
-- 
2.47.1
Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Eric Auger 2 months, 2 weeks ago
Hi Zhenzhong,

On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
> if yes, create nested parent domain which could be reused by vIOMMU to create
> nested domain.
>
> Introduce helper vfio_device_viommu_get_nested to facilitate this
> implementation.
>
> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
> forbidden and VFIO device fails in set_iommu_device() call, until we support
> passthrough device with x-flts=on.
>
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Suggested-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>  include/hw/vfio/vfio-device.h |  2 ++
>  hw/vfio/device.c              | 12 ++++++++++++
>  hw/vfio/iommufd.c             |  8 ++++++++
>  3 files changed, 22 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 6e4d5ccdac..ecd82c16c7 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
>  
>  void vfio_device_unprepare(VFIODevice *vbasedev);
>  
> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
I would suggest vfio_device_viommu_has_feature_hw_nested or something alike
get usually means tou take a ref count associated with a put
> +
>  int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                  struct vfio_region_info **info);
>  int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 08f12ac31f..3eeb71bd51 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -23,6 +23,7 @@
>  
>  #include "hw/vfio/vfio-device.h"
>  #include "hw/vfio/pci.h"
> +#include "hw/iommu.h"
>  #include "hw/hw.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
>      vbasedev->bcontainer = NULL;
>  }
>  
> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
> +{
> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
> +
> +    if (vdev) {
> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
> +                  VIOMMU_CAP_HW_NESTED);
> +    }
> +    return false;
> +}
> +
>  /*
>   * Traditional ioctl() based io
>   */
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 8c27222f75..e503c232e1 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -379,6 +379,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>      }
>  
> +    /*
> +     * If vIOMMU supports stage-1 translation, force to create nested parent
I would rather not use another terminology here. You previously used
hw_nested, I think that's better. Also bear in mind that smmu supports
S1, S2 and S1+S2 in emulated code.

Thanks

Eric
> +     * domain which could be reused by vIOMMU to create nested domain.
> +     */
> +    if (vfio_device_viommu_get_nested(vbasedev)) {
> +        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
> +    }
> +
>      if (cpr_is_incoming()) {
>          hwpt_id = vbasedev->cpr.hwpt_id;
>          goto skip_alloc;
RE: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Duan, Zhenzhong 2 months, 2 weeks ago
Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>domain
>
>Hi Zhenzhong,
>
>On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>VIOMMU_CAP_HW_NESTED,
>> if yes, create nested parent domain which could be reused by vIOMMU to
>create
>> nested domain.
>>
>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>> implementation.
>>
>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>> passthrough device with x-flts=on.
>>
>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>  include/hw/vfio/vfio-device.h |  2 ++
>>  hw/vfio/device.c              | 12 ++++++++++++
>>  hw/vfio/iommufd.c             |  8 ++++++++
>>  3 files changed, 22 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>> index 6e4d5ccdac..ecd82c16c7 100644
>> --- a/include/hw/vfio/vfio-device.h
>> +++ b/include/hw/vfio/vfio-device.h
>> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev,
>VFIOContainerBase *bcontainer,
>>
>>  void vfio_device_unprepare(VFIODevice *vbasedev);
>>
>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
>I would suggest vfio_device_viommu_has_feature_hw_nested or something
>alike
>get usually means tou take a ref count associated with a put

Sure.

>> +
>>  int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>>                                  struct vfio_region_info **info);
>>  int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t
>type,
>> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
>> index 08f12ac31f..3eeb71bd51 100644
>> --- a/hw/vfio/device.c
>> +++ b/hw/vfio/device.c
>> @@ -23,6 +23,7 @@
>>
>>  #include "hw/vfio/vfio-device.h"
>>  #include "hw/vfio/pci.h"
>> +#include "hw/iommu.h"
>>  #include "hw/hw.h"
>>  #include "trace.h"
>>  #include "qapi/error.h"
>> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice
>*vbasedev)
>>      vbasedev->bcontainer = NULL;
>>  }
>>
>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>> +{
>> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>> +
>> +    if (vdev) {
>> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>> +                  VIOMMU_CAP_HW_NESTED);
>> +    }
>> +    return false;
>> +}
>> +
>>  /*
>>   * Traditional ioctl() based io
>>   */
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 8c27222f75..e503c232e1 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -379,6 +379,14 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>      }
>>
>> +    /*
>> +     * If vIOMMU supports stage-1 translation, force to create nested
>parent
>I would rather not use another terminology here. You previously used
>hw_nested, I think that's better. Also bear in mind that smmu supports
>S1, S2 and S1+S2 in emulated code.

What about 'nesting parent' to match kernel side terminology, per Nicolin's suggestion:

In kernel kdoc/uAPI, we use:
 - "nesting parent" for stage-2 object
 - "nested hwpt", "nested domain" for stage-1 object

Thanks
Zhenzhong
>
>Thanks
>
>Eric
>> +     * domain which could be reused by vIOMMU to create nested
>domain.
>> +     */
>> +    if (vfio_device_viommu_get_nested(vbasedev)) {
>> +        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>> +    }
>> +
>>      if (cpr_is_incoming()) {
>>          hwpt_id = vbasedev->cpr.hwpt_id;
>>          goto skip_alloc;

Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Eric Auger 2 months, 2 weeks ago

On 8/28/25 11:53 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>> domain
>>
>> Hi Zhenzhong,
>>
>> On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED,
>>> if yes, create nested parent domain which could be reused by vIOMMU to
>> create
>>> nested domain.
>>>
>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>> implementation.
>>>
>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>>> passthrough device with x-flts=on.
>>>
>>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>  include/hw/vfio/vfio-device.h |  2 ++
>>>  hw/vfio/device.c              | 12 ++++++++++++
>>>  hw/vfio/iommufd.c             |  8 ++++++++
>>>  3 files changed, 22 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>> index 6e4d5ccdac..ecd82c16c7 100644
>>> --- a/include/hw/vfio/vfio-device.h
>>> +++ b/include/hw/vfio/vfio-device.h
>>> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev,
>> VFIOContainerBase *bcontainer,
>>>  void vfio_device_unprepare(VFIODevice *vbasedev);
>>>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
>> I would suggest vfio_device_viommu_has_feature_hw_nested or something
>> alike
>> get usually means tou take a ref count associated with a put
> Sure.
>
>>> +
>>>  int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>>>                                  struct vfio_region_info **info);
>>>  int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t
>> type,
>>> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
>>> index 08f12ac31f..3eeb71bd51 100644
>>> --- a/hw/vfio/device.c
>>> +++ b/hw/vfio/device.c
>>> @@ -23,6 +23,7 @@
>>>
>>>  #include "hw/vfio/vfio-device.h"
>>>  #include "hw/vfio/pci.h"
>>> +#include "hw/iommu.h"
>>>  #include "hw/hw.h"
>>>  #include "trace.h"
>>>  #include "qapi/error.h"
>>> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice
>> *vbasedev)
>>>      vbasedev->bcontainer = NULL;
>>>  }
>>>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>> +
>>> +    if (vdev) {
>>> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>> +                  VIOMMU_CAP_HW_NESTED);
>>> +    }
>>> +    return false;
>>> +}
>>> +
>>>  /*
>>>   * Traditional ioctl() based io
>>>   */
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 8c27222f75..e503c232e1 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -379,6 +379,14 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>      }
>>>
>>> +    /*
>>> +     * If vIOMMU supports stage-1 translation, force to create nested
>> parent
>> I would rather not use another terminology here. You previously used
>> hw_nested, I think that's better. Also bear in mind that smmu supports
>> S1, S2 and S1+S2 in emulated code.
> What about 'nesting parent' to match kernel side terminology, per Nicolin's suggestion:
>
> In kernel kdoc/uAPI, we use:
>  - "nesting parent" for stage-2 object
>  - "nested hwpt", "nested domain" for stage-1 object
I still think that since you queried the HW_NESTED cap it makes sense to
continue using it. This can come along with the kernel terminology though.

Eric
>
> Thanks
> Zhenzhong
>> Thanks
>>
>> Eric
>>> +     * domain which could be reused by vIOMMU to create nested
>> domain.
>>> +     */
>>> +    if (vfio_device_viommu_get_nested(vbasedev)) {
>>> +        flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>>> +    }
>>> +
>>>      if (cpr_is_incoming()) {
>>>          hwpt_id = vbasedev->cpr.hwpt_id;
>>>          goto skip_alloc;
RE: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Duan, Zhenzhong 2 months, 2 weeks ago

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>domain
>
>
>
>On 8/28/25 11:53 AM, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>>> domain
>>>
>>> Hi Zhenzhong,
>>>
>>> On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
>>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>>> VIOMMU_CAP_HW_NESTED,
>>>> if yes, create nested parent domain which could be reused by vIOMMU to
>>> create
>>>> nested domain.
>>>>
>>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>>> implementation.
>>>>
>>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts
>is
>>>> forbidden and VFIO device fails in set_iommu_device() call, until we
>support
>>>> passthrough device with x-flts=on.
>>>>
>>>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>>>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>>  include/hw/vfio/vfio-device.h |  2 ++
>>>>  hw/vfio/device.c              | 12 ++++++++++++
>>>>  hw/vfio/iommufd.c             |  8 ++++++++
>>>>  3 files changed, 22 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>>> index 6e4d5ccdac..ecd82c16c7 100644
>>>> --- a/include/hw/vfio/vfio-device.h
>>>> +++ b/include/hw/vfio/vfio-device.h
>>>> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev,
>>> VFIOContainerBase *bcontainer,
>>>>  void vfio_device_unprepare(VFIODevice *vbasedev);
>>>>
>>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
>>> I would suggest vfio_device_viommu_has_feature_hw_nested or
>something
>>> alike
>>> get usually means tou take a ref count associated with a put
>> Sure.
>>
>>>> +
>>>>  int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>>>>                                  struct vfio_region_info **info);
>>>>  int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t
>>> type,
>>>> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
>>>> index 08f12ac31f..3eeb71bd51 100644
>>>> --- a/hw/vfio/device.c
>>>> +++ b/hw/vfio/device.c
>>>> @@ -23,6 +23,7 @@
>>>>
>>>>  #include "hw/vfio/vfio-device.h"
>>>>  #include "hw/vfio/pci.h"
>>>> +#include "hw/iommu.h"
>>>>  #include "hw/hw.h"
>>>>  #include "trace.h"
>>>>  #include "qapi/error.h"
>>>> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice
>>> *vbasedev)
>>>>      vbasedev->bcontainer = NULL;
>>>>  }
>>>>
>>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>>> +{
>>>> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>>> +
>>>> +    if (vdev) {
>>>> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>>> +                  VIOMMU_CAP_HW_NESTED);
>>>> +    }
>>>> +    return false;
>>>> +}
>>>> +
>>>>  /*
>>>>   * Traditional ioctl() based io
>>>>   */
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 8c27222f75..e503c232e1 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -379,6 +379,14 @@ static bool
>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>>      }
>>>>
>>>> +    /*
>>>> +     * If vIOMMU supports stage-1 translation, force to create nested
>>> parent
>>> I would rather not use another terminology here. You previously used
>>> hw_nested, I think that's better. Also bear in mind that smmu supports
>>> S1, S2 and S1+S2 in emulated code.
>> What about 'nesting parent' to match kernel side terminology, per Nicolin's
>suggestion:
>>
>> In kernel kdoc/uAPI, we use:
>>  - "nesting parent" for stage-2 object
>>  - "nested hwpt", "nested domain" for stage-1 object
>I still think that since you queried the HW_NESTED cap it makes sense to
>continue using it. This can come along with the kernel terminology though.

OK, like below, do I understand right?

+    /*
+     * If vIOMMU supports stage-1 translation, force to create hw_nested
+     * (aka. nesting parent in kernel) domain which could be reused by
+     * vIOMMU to create nested domain.
+     */

Thanks
Zhenzhong
Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Nicolin Chen 2 months, 2 weeks ago
On Fri, Aug 29, 2025 at 01:40:01AM +0000, Duan, Zhenzhong wrote:
> >-----Original Message-----
> >On 8/28/25 11:53 AM, Duan, Zhenzhong wrote:
> >>>> +    /*
> >>>> +     * If vIOMMU supports stage-1 translation, force to create nested
> >>> parent
> >>> I would rather not use another terminology here. You previously used
> >>> hw_nested, I think that's better. Also bear in mind that smmu supports
> >>> S1, S2 and S1+S2 in emulated code.
> >> What about 'nesting parent' to match kernel side terminology, per Nicolin's
> >suggestion:
> >>
> >> In kernel kdoc/uAPI, we use:
> >>  - "nesting parent" for stage-2 object
> >>  - "nested hwpt", "nested domain" for stage-1 object
> >I still think that since you queried the HW_NESTED cap it makes sense to
> >continue using it. This can come along with the kernel terminology though.
> 
> OK, like below, do I understand right?
> 
> +    /*
> +     * If vIOMMU supports stage-1 translation, force to create hw_nested
> +     * (aka. nesting parent in kernel) domain which could be reused by
> +     * vIOMMU to create nested domain.
> +     */

FWIW, while I was targeting the word "nested parent", I think Eric
was commenting on the word "stage-1 translation".

The vSMMU code supports "stage-1", "stage-2", and even "nested" as
its full emulation modes (no HW acceleration). So, any word like
"stage-1 translation" or "nested S1" can be confusing to the vSMMU
folks, as neither of them necessarily means "HW_NESTED" that stands
for "HW-accelerated nested stage-1".

Also, "HW_NESTED" != "nesting parent". They're two different things.
Thus, "force to create hw_nested" isn't accurate. Here, we want to
create a "nesting parent" HWPT. There is no other alternative name,
IMHO, given this is essentially a kernel-defined object.

Anyway, if we all agree on the VIOMMU_FLAG_WANT_NESTING_PARENT, it
is not necessary to have this comment (at least the first part) --
we could still note that the nesting parent HWPT will be reused by
vIOMMU to create nested HWPTs, if you'd like to.

Thanks
Nicolin
Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Nicolin Chen 2 months, 3 weeks ago
On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
> if yes, create nested parent domain which could be reused by vIOMMU to create
> nested domain.
>
> Introduce helper vfio_device_viommu_get_nested to facilitate this
> implementation.

It'd be nicer to slightly mention the benefit of having it. Assuming
that QEMU commit message can be as long as 80 characters:

-------------------------
Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED.

If yes, create a nesting parent domain and add it to the container's hwpt_list,
letting this parent domain cover the entire stage-2 mappings (gPA=>PA).

This allows a VFIO passthrough device to directly attach to this default domain
and then to use the system address space and its listener.

Introduce a vfio_device_viommu_get_nested() helper to facilitate this
implementation.
-------------------------
 
> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
> forbidden and VFIO device fails in set_iommu_device() call, until we support
> passthrough device with x-flts=on.

I think this is too vendor specific to be mentioned here. Likely
the previous VTD patch is the place to have this.

Or you could say:

--------------------------
It is safe to do so because a vIOMMU will be able to fail in set_iommu_device()
call, if something else related to the VFIO device or vIOMMU isn't compatible.
--------------------------

> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
> +{
> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
> +
> +    if (vdev) {
> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
> +                  VIOMMU_CAP_HW_NESTED);

"get_nested" feels too general. Here it particularly means the cap:

bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)

> @@ -379,6 +379,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>      }
>  
> +    /*
> +     * If vIOMMU supports stage-1 translation, force to create nested parent

"nested parent" is a contradictory phrase. Parent is a container
holding some nested items. A nested parent sounds like a "parent"
item that lives inside another parent container.

In kernel kdoc/uAPI, we use:
 - "nesting parent" for stage-2 object
 - "nested hwpt", "nested domain" for stage-1 object

> +     * domain which could be reused by vIOMMU to create nested domain.
> +     */
> +    if (vfio_device_viommu_get_nested(vbasedev)) {

With these addressed,

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
RE: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Duan, Zhenzhong 2 months, 3 weeks ago

>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>domain
>
>On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>VIOMMU_CAP_HW_NESTED,
>> if yes, create nested parent domain which could be reused by vIOMMU to
>create
>> nested domain.
>>
>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>> implementation.
>
>It'd be nicer to slightly mention the benefit of having it. Assuming
>that QEMU commit message can be as long as 80 characters:
>
>-------------------------
>Call pci_device_get_viommu_cap() to get if vIOMMU supports
>VIOMMU_CAP_HW_NESTED.
>
>If yes, create a nesting parent domain and add it to the container's hwpt_list,
>letting this parent domain cover the entire stage-2 mappings (gPA=>PA).
>
>This allows a VFIO passthrough device to directly attach to this default
>domain
>and then to use the system address space and its listener.
>
>Introduce a vfio_device_viommu_get_nested() helper to facilitate this
>implementation.
>-------------------------

Thanks, will do.

>
>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>> passthrough device with x-flts=on.
>
>I think this is too vendor specific to be mentioned here. Likely
>the previous VTD patch is the place to have this.
>
>Or you could say:
>
>--------------------------
>It is safe to do so because a vIOMMU will be able to fail in
>set_iommu_device()
>call, if something else related to the VFIO device or vIOMMU isn't compatible.
>--------------------------

Will do.

>
>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>> +{
>> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>> +
>> +    if (vdev) {
>> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>> +                  VIOMMU_CAP_HW_NESTED);
>
>"get_nested" feels too general. Here it particularly means the cap:
>
>bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)

Will use vfio_device_get_viommu_cap_hw_nested()

>
>> @@ -379,6 +379,14 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>      }
>>
>> +    /*
>> +     * If vIOMMU supports stage-1 translation, force to create nested
>parent
>
>"nested parent" is a contradictory phrase. Parent is a container
>holding some nested items. A nested parent sounds like a "parent"
>item that lives inside another parent container.
>
>In kernel kdoc/uAPI, we use:
> - "nesting parent" for stage-2 object
> - "nested hwpt", "nested domain" for stage-1 object

Thanks for sharing this info, I didn't notice that. I will fix the whole series to use 'nesting parent'.

BRs,
Zhenzhong
Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent domain
Posted by Eric Auger 2 months, 2 weeks ago

On 8/25/25 10:28 AM, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>> domain
>>
>> On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED,
>>> if yes, create nested parent domain which could be reused by vIOMMU to
>> create
>>> nested domain.
>>>
>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>> implementation.
>> It'd be nicer to slightly mention the benefit of having it. Assuming
>> that QEMU commit message can be as long as 80 characters:
>>
>> -------------------------
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED.
>>
>> If yes, create a nesting parent domain and add it to the container's hwpt_list,
>> letting this parent domain cover the entire stage-2 mappings (gPA=>PA).
>>
>> This allows a VFIO passthrough device to directly attach to this default
>> domain
>> and then to use the system address space and its listener.
>>
>> Introduce a vfio_device_viommu_get_nested() helper to facilitate this
>> implementation.
>> -------------------------
> Thanks, will do.
>
>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>>> passthrough device with x-flts=on.
>> I think this is too vendor specific to be mentioned here. Likely
>> the previous VTD patch is the place to have this.
>>
>> Or you could say:
>>
>> --------------------------
>> It is safe to do so because a vIOMMU will be able to fail in
>> set_iommu_device()
>> call, if something else related to the VFIO device or vIOMMU isn't compatible.
>> --------------------------
> Will do.
I would say: it is safe to add the flags |=
IOMMU_HWPT_ALLOC_NEST_PARENT; at this stage, despite the whole
functionality is not in place because HW_NESTED is currently forced off in

set_iommu_device()

Eric

>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>> +{
>>> +    VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>> +
>>> +    if (vdev) {
>>> +        return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>> +                  VIOMMU_CAP_HW_NESTED);
>> "get_nested" feels too general. Here it particularly means the cap:
>>
>> bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)
> Will use vfio_device_get_viommu_cap_hw_nested()
>
>>> @@ -379,6 +379,14 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>          flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>      }
>>>
>>> +    /*
>>> +     * If vIOMMU supports stage-1 translation, force to create nested
>> parent
>>
>> "nested parent" is a contradictory phrase. Parent is a container
>> holding some nested items. A nested parent sounds like a "parent"
>> item that lives inside another parent container.
>>
>> In kernel kdoc/uAPI, we use:
>> - "nesting parent" for stage-2 object
>> - "nested hwpt", "nested domain" for stage-1 object
> Thanks for sharing this info, I didn't notice that. I will fix the whole series to use 'nesting parent'.
>
> BRs,
> Zhenzhong
>