Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
if yes, create nested parent domain which could be reused by vIOMMU to create
nested domain.
Introduce helper vfio_device_viommu_get_nested to facilitate this
implementation.
It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
forbidden and VFIO device fails in set_iommu_device() call, until we support
passthrough device with x-flts=on.
Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-device.h | 2 ++
hw/vfio/device.c | 12 ++++++++++++
hw/vfio/iommufd.c | 8 ++++++++
3 files changed, 22 insertions(+)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 6e4d5ccdac..ecd82c16c7 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
void vfio_device_unprepare(VFIODevice *vbasedev);
+bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
+
int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
struct vfio_region_info **info);
int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 08f12ac31f..3eeb71bd51 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -23,6 +23,7 @@
#include "hw/vfio/vfio-device.h"
#include "hw/vfio/pci.h"
+#include "hw/iommu.h"
#include "hw/hw.h"
#include "trace.h"
#include "qapi/error.h"
@@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
vbasedev->bcontainer = NULL;
}
+bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
+{
+ VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
+
+ if (vdev) {
+ return !!(pci_device_get_viommu_cap(&vdev->pdev) &
+ VIOMMU_CAP_HW_NESTED);
+ }
+ return false;
+}
+
/*
* Traditional ioctl() based io
*/
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 8c27222f75..e503c232e1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -379,6 +379,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
}
+ /*
+ * If vIOMMU supports stage-1 translation, force to create nested parent
+ * domain which could be reused by vIOMMU to create nested domain.
+ */
+ if (vfio_device_viommu_get_nested(vbasedev)) {
+ flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
+ }
+
if (cpr_is_incoming()) {
hwpt_id = vbasedev->cpr.hwpt_id;
goto skip_alloc;
--
2.47.1
Hi Zhenzhong,
On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
> if yes, create nested parent domain which could be reused by vIOMMU to create
> nested domain.
>
> Introduce helper vfio_device_viommu_get_nested to facilitate this
> implementation.
>
> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
> forbidden and VFIO device fails in set_iommu_device() call, until we support
> passthrough device with x-flts=on.
>
> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
> Suggested-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> include/hw/vfio/vfio-device.h | 2 ++
> hw/vfio/device.c | 12 ++++++++++++
> hw/vfio/iommufd.c | 8 ++++++++
> 3 files changed, 22 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 6e4d5ccdac..ecd82c16c7 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
>
> void vfio_device_unprepare(VFIODevice *vbasedev);
>
> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
I would suggest vfio_device_viommu_has_feature_hw_nested or something alike
get usually means tou take a ref count associated with a put
> +
> int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
> struct vfio_region_info **info);
> int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 08f12ac31f..3eeb71bd51 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -23,6 +23,7 @@
>
> #include "hw/vfio/vfio-device.h"
> #include "hw/vfio/pci.h"
> +#include "hw/iommu.h"
> #include "hw/hw.h"
> #include "trace.h"
> #include "qapi/error.h"
> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
> vbasedev->bcontainer = NULL;
> }
>
> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
> +{
> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
> +
> + if (vdev) {
> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
> + VIOMMU_CAP_HW_NESTED);
> + }
> + return false;
> +}
> +
> /*
> * Traditional ioctl() based io
> */
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 8c27222f75..e503c232e1 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -379,6 +379,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> }
>
> + /*
> + * If vIOMMU supports stage-1 translation, force to create nested parent
I would rather not use another terminology here. You previously used
hw_nested, I think that's better. Also bear in mind that smmu supports
S1, S2 and S1+S2 in emulated code.
Thanks
Eric
> + * domain which could be reused by vIOMMU to create nested domain.
> + */
> + if (vfio_device_viommu_get_nested(vbasedev)) {
> + flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
> + }
> +
> if (cpr_is_incoming()) {
> hwpt_id = vbasedev->cpr.hwpt_id;
> goto skip_alloc;
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>domain
>
>Hi Zhenzhong,
>
>On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>VIOMMU_CAP_HW_NESTED,
>> if yes, create nested parent domain which could be reused by vIOMMU to
>create
>> nested domain.
>>
>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>> implementation.
>>
>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>> passthrough device with x-flts=on.
>>
>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> include/hw/vfio/vfio-device.h | 2 ++
>> hw/vfio/device.c | 12 ++++++++++++
>> hw/vfio/iommufd.c | 8 ++++++++
>> 3 files changed, 22 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>> index 6e4d5ccdac..ecd82c16c7 100644
>> --- a/include/hw/vfio/vfio-device.h
>> +++ b/include/hw/vfio/vfio-device.h
>> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev,
>VFIOContainerBase *bcontainer,
>>
>> void vfio_device_unprepare(VFIODevice *vbasedev);
>>
>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
>I would suggest vfio_device_viommu_has_feature_hw_nested or something
>alike
>get usually means tou take a ref count associated with a put
Sure.
>> +
>> int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>> struct vfio_region_info **info);
>> int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t
>type,
>> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
>> index 08f12ac31f..3eeb71bd51 100644
>> --- a/hw/vfio/device.c
>> +++ b/hw/vfio/device.c
>> @@ -23,6 +23,7 @@
>>
>> #include "hw/vfio/vfio-device.h"
>> #include "hw/vfio/pci.h"
>> +#include "hw/iommu.h"
>> #include "hw/hw.h"
>> #include "trace.h"
>> #include "qapi/error.h"
>> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice
>*vbasedev)
>> vbasedev->bcontainer = NULL;
>> }
>>
>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>> +{
>> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>> +
>> + if (vdev) {
>> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>> + VIOMMU_CAP_HW_NESTED);
>> + }
>> + return false;
>> +}
>> +
>> /*
>> * Traditional ioctl() based io
>> */
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 8c27222f75..e503c232e1 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -379,6 +379,14 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> }
>>
>> + /*
>> + * If vIOMMU supports stage-1 translation, force to create nested
>parent
>I would rather not use another terminology here. You previously used
>hw_nested, I think that's better. Also bear in mind that smmu supports
>S1, S2 and S1+S2 in emulated code.
What about 'nesting parent' to match kernel side terminology, per Nicolin's suggestion:
In kernel kdoc/uAPI, we use:
- "nesting parent" for stage-2 object
- "nested hwpt", "nested domain" for stage-1 object
Thanks
Zhenzhong
>
>Thanks
>
>Eric
>> + * domain which could be reused by vIOMMU to create nested
>domain.
>> + */
>> + if (vfio_device_viommu_get_nested(vbasedev)) {
>> + flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>> + }
>> +
>> if (cpr_is_incoming()) {
>> hwpt_id = vbasedev->cpr.hwpt_id;
>> goto skip_alloc;
On 8/28/25 11:53 AM, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>> domain
>>
>> Hi Zhenzhong,
>>
>> On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED,
>>> if yes, create nested parent domain which could be reused by vIOMMU to
>> create
>>> nested domain.
>>>
>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>> implementation.
>>>
>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>>> passthrough device with x-flts=on.
>>>
>>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>> include/hw/vfio/vfio-device.h | 2 ++
>>> hw/vfio/device.c | 12 ++++++++++++
>>> hw/vfio/iommufd.c | 8 ++++++++
>>> 3 files changed, 22 insertions(+)
>>>
>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>> index 6e4d5ccdac..ecd82c16c7 100644
>>> --- a/include/hw/vfio/vfio-device.h
>>> +++ b/include/hw/vfio/vfio-device.h
>>> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev,
>> VFIOContainerBase *bcontainer,
>>> void vfio_device_unprepare(VFIODevice *vbasedev);
>>>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
>> I would suggest vfio_device_viommu_has_feature_hw_nested or something
>> alike
>> get usually means tou take a ref count associated with a put
> Sure.
>
>>> +
>>> int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>>> struct vfio_region_info **info);
>>> int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t
>> type,
>>> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
>>> index 08f12ac31f..3eeb71bd51 100644
>>> --- a/hw/vfio/device.c
>>> +++ b/hw/vfio/device.c
>>> @@ -23,6 +23,7 @@
>>>
>>> #include "hw/vfio/vfio-device.h"
>>> #include "hw/vfio/pci.h"
>>> +#include "hw/iommu.h"
>>> #include "hw/hw.h"
>>> #include "trace.h"
>>> #include "qapi/error.h"
>>> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice
>> *vbasedev)
>>> vbasedev->bcontainer = NULL;
>>> }
>>>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>> +{
>>> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>> +
>>> + if (vdev) {
>>> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>> + VIOMMU_CAP_HW_NESTED);
>>> + }
>>> + return false;
>>> +}
>>> +
>>> /*
>>> * Traditional ioctl() based io
>>> */
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 8c27222f75..e503c232e1 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -379,6 +379,14 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> }
>>>
>>> + /*
>>> + * If vIOMMU supports stage-1 translation, force to create nested
>> parent
>> I would rather not use another terminology here. You previously used
>> hw_nested, I think that's better. Also bear in mind that smmu supports
>> S1, S2 and S1+S2 in emulated code.
> What about 'nesting parent' to match kernel side terminology, per Nicolin's suggestion:
>
> In kernel kdoc/uAPI, we use:
> - "nesting parent" for stage-2 object
> - "nested hwpt", "nested domain" for stage-1 object
I still think that since you queried the HW_NESTED cap it makes sense to
continue using it. This can come along with the kernel terminology though.
Eric
>
> Thanks
> Zhenzhong
>> Thanks
>>
>> Eric
>>> + * domain which could be reused by vIOMMU to create nested
>> domain.
>>> + */
>>> + if (vfio_device_viommu_get_nested(vbasedev)) {
>>> + flags |= IOMMU_HWPT_ALLOC_NEST_PARENT;
>>> + }
>>> +
>>> if (cpr_is_incoming()) {
>>> hwpt_id = vbasedev->cpr.hwpt_id;
>>> goto skip_alloc;
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>domain
>
>
>
>On 8/28/25 11:53 AM, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>>> domain
>>>
>>> Hi Zhenzhong,
>>>
>>> On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
>>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>>> VIOMMU_CAP_HW_NESTED,
>>>> if yes, create nested parent domain which could be reused by vIOMMU to
>>> create
>>>> nested domain.
>>>>
>>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>>> implementation.
>>>>
>>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts
>is
>>>> forbidden and VFIO device fails in set_iommu_device() call, until we
>support
>>>> passthrough device with x-flts=on.
>>>>
>>>> Suggested-by: Nicolin Chen <nicolinc@nvidia.com>
>>>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>> include/hw/vfio/vfio-device.h | 2 ++
>>>> hw/vfio/device.c | 12 ++++++++++++
>>>> hw/vfio/iommufd.c | 8 ++++++++
>>>> 3 files changed, 22 insertions(+)
>>>>
>>>> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
>>>> index 6e4d5ccdac..ecd82c16c7 100644
>>>> --- a/include/hw/vfio/vfio-device.h
>>>> +++ b/include/hw/vfio/vfio-device.h
>>>> @@ -257,6 +257,8 @@ void vfio_device_prepare(VFIODevice *vbasedev,
>>> VFIOContainerBase *bcontainer,
>>>> void vfio_device_unprepare(VFIODevice *vbasedev);
>>>>
>>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev);
>>> I would suggest vfio_device_viommu_has_feature_hw_nested or
>something
>>> alike
>>> get usually means tou take a ref count associated with a put
>> Sure.
>>
>>>> +
>>>> int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>>>> struct vfio_region_info **info);
>>>> int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t
>>> type,
>>>> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
>>>> index 08f12ac31f..3eeb71bd51 100644
>>>> --- a/hw/vfio/device.c
>>>> +++ b/hw/vfio/device.c
>>>> @@ -23,6 +23,7 @@
>>>>
>>>> #include "hw/vfio/vfio-device.h"
>>>> #include "hw/vfio/pci.h"
>>>> +#include "hw/iommu.h"
>>>> #include "hw/hw.h"
>>>> #include "trace.h"
>>>> #include "qapi/error.h"
>>>> @@ -504,6 +505,17 @@ void vfio_device_unprepare(VFIODevice
>>> *vbasedev)
>>>> vbasedev->bcontainer = NULL;
>>>> }
>>>>
>>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>>> +{
>>>> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>>> +
>>>> + if (vdev) {
>>>> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>>> + VIOMMU_CAP_HW_NESTED);
>>>> + }
>>>> + return false;
>>>> +}
>>>> +
>>>> /*
>>>> * Traditional ioctl() based io
>>>> */
>>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>>> index 8c27222f75..e503c232e1 100644
>>>> --- a/hw/vfio/iommufd.c
>>>> +++ b/hw/vfio/iommufd.c
>>>> @@ -379,6 +379,14 @@ static bool
>>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>>> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>>> }
>>>>
>>>> + /*
>>>> + * If vIOMMU supports stage-1 translation, force to create nested
>>> parent
>>> I would rather not use another terminology here. You previously used
>>> hw_nested, I think that's better. Also bear in mind that smmu supports
>>> S1, S2 and S1+S2 in emulated code.
>> What about 'nesting parent' to match kernel side terminology, per Nicolin's
>suggestion:
>>
>> In kernel kdoc/uAPI, we use:
>> - "nesting parent" for stage-2 object
>> - "nested hwpt", "nested domain" for stage-1 object
>I still think that since you queried the HW_NESTED cap it makes sense to
>continue using it. This can come along with the kernel terminology though.
OK, like below, do I understand right?
+ /*
+ * If vIOMMU supports stage-1 translation, force to create hw_nested
+ * (aka. nesting parent in kernel) domain which could be reused by
+ * vIOMMU to create nested domain.
+ */
Thanks
Zhenzhong
On Fri, Aug 29, 2025 at 01:40:01AM +0000, Duan, Zhenzhong wrote: > >-----Original Message----- > >On 8/28/25 11:53 AM, Duan, Zhenzhong wrote: > >>>> + /* > >>>> + * If vIOMMU supports stage-1 translation, force to create nested > >>> parent > >>> I would rather not use another terminology here. You previously used > >>> hw_nested, I think that's better. Also bear in mind that smmu supports > >>> S1, S2 and S1+S2 in emulated code. > >> What about 'nesting parent' to match kernel side terminology, per Nicolin's > >suggestion: > >> > >> In kernel kdoc/uAPI, we use: > >> - "nesting parent" for stage-2 object > >> - "nested hwpt", "nested domain" for stage-1 object > >I still think that since you queried the HW_NESTED cap it makes sense to > >continue using it. This can come along with the kernel terminology though. > > OK, like below, do I understand right? > > + /* > + * If vIOMMU supports stage-1 translation, force to create hw_nested > + * (aka. nesting parent in kernel) domain which could be reused by > + * vIOMMU to create nested domain. > + */ FWIW, while I was targeting the word "nested parent", I think Eric was commenting on the word "stage-1 translation". The vSMMU code supports "stage-1", "stage-2", and even "nested" as its full emulation modes (no HW acceleration). So, any word like "stage-1 translation" or "nested S1" can be confusing to the vSMMU folks, as neither of them necessarily means "HW_NESTED" that stands for "HW-accelerated nested stage-1". Also, "HW_NESTED" != "nesting parent". They're two different things. Thus, "force to create hw_nested" isn't accurate. Here, we want to create a "nesting parent" HWPT. There is no other alternative name, IMHO, given this is essentially a kernel-defined object. Anyway, if we all agree on the VIOMMU_FLAG_WANT_NESTING_PARENT, it is not necessary to have this comment (at least the first part) -- we could still note that the nesting parent HWPT will be reused by vIOMMU to create nested HWPTs, if you'd like to. Thanks Nicolin
On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
> Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED,
> if yes, create nested parent domain which could be reused by vIOMMU to create
> nested domain.
>
> Introduce helper vfio_device_viommu_get_nested to facilitate this
> implementation.
It'd be nicer to slightly mention the benefit of having it. Assuming
that QEMU commit message can be as long as 80 characters:
-------------------------
Call pci_device_get_viommu_cap() to get if vIOMMU supports VIOMMU_CAP_HW_NESTED.
If yes, create a nesting parent domain and add it to the container's hwpt_list,
letting this parent domain cover the entire stage-2 mappings (gPA=>PA).
This allows a VFIO passthrough device to directly attach to this default domain
and then to use the system address space and its listener.
Introduce a vfio_device_viommu_get_nested() helper to facilitate this
implementation.
-------------------------
> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
> forbidden and VFIO device fails in set_iommu_device() call, until we support
> passthrough device with x-flts=on.
I think this is too vendor specific to be mentioned here. Likely
the previous VTD patch is the place to have this.
Or you could say:
--------------------------
It is safe to do so because a vIOMMU will be able to fail in set_iommu_device()
call, if something else related to the VFIO device or vIOMMU isn't compatible.
--------------------------
> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
> +{
> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
> +
> + if (vdev) {
> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
> + VIOMMU_CAP_HW_NESTED);
"get_nested" feels too general. Here it particularly means the cap:
bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)
> @@ -379,6 +379,14 @@ static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
> }
>
> + /*
> + * If vIOMMU supports stage-1 translation, force to create nested parent
"nested parent" is a contradictory phrase. Parent is a container
holding some nested items. A nested parent sounds like a "parent"
item that lives inside another parent container.
In kernel kdoc/uAPI, we use:
- "nesting parent" for stage-2 object
- "nested hwpt", "nested domain" for stage-1 object
> + * domain which could be reused by vIOMMU to create nested domain.
> + */
> + if (vfio_device_viommu_get_nested(vbasedev)) {
With these addressed,
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>domain
>
>On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>VIOMMU_CAP_HW_NESTED,
>> if yes, create nested parent domain which could be reused by vIOMMU to
>create
>> nested domain.
>>
>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>> implementation.
>
>It'd be nicer to slightly mention the benefit of having it. Assuming
>that QEMU commit message can be as long as 80 characters:
>
>-------------------------
>Call pci_device_get_viommu_cap() to get if vIOMMU supports
>VIOMMU_CAP_HW_NESTED.
>
>If yes, create a nesting parent domain and add it to the container's hwpt_list,
>letting this parent domain cover the entire stage-2 mappings (gPA=>PA).
>
>This allows a VFIO passthrough device to directly attach to this default
>domain
>and then to use the system address space and its listener.
>
>Introduce a vfio_device_viommu_get_nested() helper to facilitate this
>implementation.
>-------------------------
Thanks, will do.
>
>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>> passthrough device with x-flts=on.
>
>I think this is too vendor specific to be mentioned here. Likely
>the previous VTD patch is the place to have this.
>
>Or you could say:
>
>--------------------------
>It is safe to do so because a vIOMMU will be able to fail in
>set_iommu_device()
>call, if something else related to the VFIO device or vIOMMU isn't compatible.
>--------------------------
Will do.
>
>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>> +{
>> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>> +
>> + if (vdev) {
>> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>> + VIOMMU_CAP_HW_NESTED);
>
>"get_nested" feels too general. Here it particularly means the cap:
>
>bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)
Will use vfio_device_get_viommu_cap_hw_nested()
>
>> @@ -379,6 +379,14 @@ static bool
>iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>> }
>>
>> + /*
>> + * If vIOMMU supports stage-1 translation, force to create nested
>parent
>
>"nested parent" is a contradictory phrase. Parent is a container
>holding some nested items. A nested parent sounds like a "parent"
>item that lives inside another parent container.
>
>In kernel kdoc/uAPI, we use:
> - "nesting parent" for stage-2 object
> - "nested hwpt", "nested domain" for stage-1 object
Thanks for sharing this info, I didn't notice that. I will fix the whole series to use 'nesting parent'.
BRs,
Zhenzhong
On 8/25/25 10:28 AM, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Nicolin Chen <nicolinc@nvidia.com>
>> Subject: Re: [PATCH v5 05/21] vfio/iommufd: Force creating nested parent
>> domain
>>
>> On Fri, Aug 22, 2025 at 02:40:43AM -0400, Zhenzhong Duan wrote:
>>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED,
>>> if yes, create nested parent domain which could be reused by vIOMMU to
>> create
>>> nested domain.
>>>
>>> Introduce helper vfio_device_viommu_get_nested to facilitate this
>>> implementation.
>> It'd be nicer to slightly mention the benefit of having it. Assuming
>> that QEMU commit message can be as long as 80 characters:
>>
>> -------------------------
>> Call pci_device_get_viommu_cap() to get if vIOMMU supports
>> VIOMMU_CAP_HW_NESTED.
>>
>> If yes, create a nesting parent domain and add it to the container's hwpt_list,
>> letting this parent domain cover the entire stage-2 mappings (gPA=>PA).
>>
>> This allows a VFIO passthrough device to directly attach to this default
>> domain
>> and then to use the system address space and its listener.
>>
>> Introduce a vfio_device_viommu_get_nested() helper to facilitate this
>> implementation.
>> -------------------------
> Thanks, will do.
>
>>> It is safe because even if VIOMMU_CAP_HW_NESTED is returned, s->flts is
>>> forbidden and VFIO device fails in set_iommu_device() call, until we support
>>> passthrough device with x-flts=on.
>> I think this is too vendor specific to be mentioned here. Likely
>> the previous VTD patch is the place to have this.
>>
>> Or you could say:
>>
>> --------------------------
>> It is safe to do so because a vIOMMU will be able to fail in
>> set_iommu_device()
>> call, if something else related to the VFIO device or vIOMMU isn't compatible.
>> --------------------------
> Will do.
I would say: it is safe to add the flags |=
IOMMU_HWPT_ALLOC_NEST_PARENT; at this stage, despite the whole
functionality is not in place because HW_NESTED is currently forced off in
set_iommu_device()
Eric
>
>>> +bool vfio_device_viommu_get_nested(VFIODevice *vbasedev)
>>> +{
>>> + VFIOPCIDevice *vdev = vfio_pci_from_vfio_device(vbasedev);
>>> +
>>> + if (vdev) {
>>> + return !!(pci_device_get_viommu_cap(&vdev->pdev) &
>>> + VIOMMU_CAP_HW_NESTED);
>> "get_nested" feels too general. Here it particularly means the cap:
>>
>> bool vfio_device_get_viommu_cap_hw_nested(VFIODevice *vbasedev)
> Will use vfio_device_get_viommu_cap_hw_nested()
>
>>> @@ -379,6 +379,14 @@ static bool
>> iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
>>> flags = IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
>>> }
>>>
>>> + /*
>>> + * If vIOMMU supports stage-1 translation, force to create nested
>> parent
>>
>> "nested parent" is a contradictory phrase. Parent is a container
>> holding some nested items. A nested parent sounds like a "parent"
>> item that lives inside another parent container.
>>
>> In kernel kdoc/uAPI, we use:
>> - "nesting parent" for stage-2 object
>> - "nested hwpt", "nested domain" for stage-1 object
> Thanks for sharing this info, I didn't notice that. I will fix the whole series to use 'nesting parent'.
>
> BRs,
> Zhenzhong
>
© 2016 - 2025 Red Hat, Inc.