When the vIOMMU is configured with x-flts=on in scalable mode, the guest
stage-1 page table is passed to the host to construct a nested page table.
We need to check that some critical IOMMU capabilities are compatible
between the vIOMMU and the host IOMMU, so that the host can use the guest
stage-1 page table.

For instance, if the vIOMMU supports stage-1 1GB huge page mapping but the
host does not, attaching this IOMMUFD-backed device should fail.

Even if the checks pass, for now we willingly reject the association
because not all the pieces are in place yet.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 30 +++++++++++++++++++++++++++++-
hw/i386/intel_iommu_internal.h | 1 +
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index e90fd2f28f..c57ca02cdd 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -40,6 +40,7 @@
 #include "kvm/kvm_i386.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "system/iommufd.h"
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
         return true;
     }
 
-    error_setg(errp, "host device is uncompatible with stage-1 translation");
+#ifdef CONFIG_IOMMUFD
+    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
+    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
+
+    /* Remaining checks are all stage-1 translation specific */
+    if (!object_dynamic_cast(OBJECT(hiod), TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
+        error_setg(errp, "Need IOMMUFD backend when x-flts=on");
+        return false;
+    }
+
+    if (caps->type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
+        error_setg(errp, "Incompatible host platform IOMMU type %d",
+                   caps->type);
+        return false;
+    }
+
+    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
+        error_setg(errp, "Host IOMMU doesn't support nested translation");
+        return false;
+    }
+
+    if (s->fs1gp && !(vtd->cap_reg & VTD_CAP_FS1GP)) {
+        error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
+        return false;
+    }
+#endif
+
+    error_setg(errp, "host IOMMU is incompatible with stage-1 translation");
     return false;
 }
 
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 7aba259ef8..18bc22fc72 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -192,6 +192,7 @@
#define VTD_ECAP_PT (1ULL << 6)
#define VTD_ECAP_SC (1ULL << 7)
#define VTD_ECAP_MHMV (15ULL << 20)
+#define VTD_ECAP_NEST (1ULL << 26)
#define VTD_ECAP_SRS (1ULL << 31)
#define VTD_ECAP_PASID (1ULL << 40)
#define VTD_ECAP_SMTS (1ULL << 43)
--
2.47.1
Hi Zhenzhong,
On 7/8/25 1:05 PM, Zhenzhong Duan wrote:
> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table
> is passed to host to construct nested page table. We need to check
> compatibility of some critical IOMMU capabilities between vIOMMU and
> host IOMMU to ensure guest stage-1 page table could be used by host.
>
> For instance, vIOMMU supports stage-1 1GB huge page mapping, but host
> does not, then this IOMMUFD backed device should fail.
>
> Even of the checks pass, for now we willingly reject the association
> because all the bits are not there yet.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu.c | 30 +++++++++++++++++++++++++++++-
> hw/i386/intel_iommu_internal.h | 1 +
> 2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index e90fd2f28f..c57ca02cdd 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -40,6 +40,7 @@
> #include "kvm/kvm_i386.h"
> #include "migration/vmstate.h"
> #include "trace.h"
> +#include "system/iommufd.h"
>
> /* context entry operations */
> #define VTD_CE_GET_RID2PASID(ce) \
> @@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
> return true;
> }
>
> - error_setg(errp, "host device is uncompatible with stage-1 translation");
> +#ifdef CONFIG_IOMMUFD
> + struct HostIOMMUDeviceCaps *caps = &hiod->caps;
> + struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
I am now confused about how this relates to vtd_get_viommu_cap().
PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls
vtd_check_hiod().
The vIOMMU might return HW_NESTED_CAP through PCIIOMMUOps.get_viommu_cap
without making sure the underlying HW IOMMU actually supports it. Is that a
correct understanding? Maybe we should clarify the calling order between
set_iommu_device and get_viommu_cap. Could we check the HW IOMMU
prerequisites in vtd_get_viommu_cap() by enforcing that it is called after
set_iommu_device? I think we should clarify the exact semantics of
get_viommu_cap().

Thanks

Eric
> +
> + /* Remaining checks are all stage-1 translation specific */
> + if (!object_dynamic_cast(OBJECT(hiod), TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
> + error_setg(errp, "Need IOMMUFD backend when x-flts=on");
> + return false;
> + }
> +
> + if (caps->type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
> + error_setg(errp, "Incompatible host platform IOMMU type %d",
> + caps->type);
> + return false;
> + }
> +
> + if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
> + error_setg(errp, "Host IOMMU doesn't support nested translation");
> + return false;
> + }
> +
> + if (s->fs1gp && !(vtd->cap_reg & VTD_CAP_FS1GP)) {
> + error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
> + return false;
> + }
> +#endif
> +
> + error_setg(errp, "host IOMMU is incompatible with stage-1 translation");
> return false;
> }
>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index 7aba259ef8..18bc22fc72 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -192,6 +192,7 @@
> #define VTD_ECAP_PT (1ULL << 6)
> #define VTD_ECAP_SC (1ULL << 7)
> #define VTD_ECAP_MHMV (15ULL << 20)
> +#define VTD_ECAP_NEST (1ULL << 26)
> #define VTD_ECAP_SRS (1ULL << 31)
> #define VTD_ECAP_PASID (1ULL << 40)
> #define VTD_ECAP_SMTS (1ULL << 43)
Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>IOMMUFD backed device when x-flts=on
>
>Hi Zhenzhong,
>
>On 7/8/25 1:05 PM, Zhenzhong Duan wrote:
[...]
>> +#ifdef CONFIG_IOMMUFD
>> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
>> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
>
>I am now confused about how this relates to vtd_get_viommu_cap().
>PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls
>vtd_check_hiod()
>viommu might return HW_NESTED_CAP through
>PCIIOMMUOps.get_viommu_cap
>without making sure the underlying HW IOMMU does support it. Is that a
>correct understanding? Maybe we should clarify the calling order between
>set_iommu_device vs get_viommu_cap? Could we check HW IOMMU
>prerequisites in vtd_get_viommu_cap() by enforcing this is called after
>set_iommu_device. I think we should clarify the exact semantic of
>get_viommu_cap().
>
>Thanks
>
>Eric

My understanding is that get_viommu_cap() returns the pure vIOMMU
capabilities, with no host IOMMU capabilities involved.

For example, a returned HW_NESTED_CAP means this vIOMMU has the code to
support creating and attaching a nested hwpt, regardless of whether the
host IOMMU supports nesting.

The compatibility check between the host IOMMU and the vIOMMU is done in
set_iommu_device(), see vtd_check_hiod().

It's too late for VFIO to call get_viommu_cap() after set_iommu_device(),
because we need get_viommu_cap() at the attach stage to decide whether to
create a nested parent hwpt.

If the host doesn't support nesting, creating the nested parent hwpt fails
early in vfio realize, before vtd_check_hiod() is called, and we fail the
hotplug.

Thanks
Zhenzhong
Hi Zhenzhong,

On 7/16/25 12:31 PM, Duan, Zhenzhong wrote:
[...]
> My understanding get_viommu_cap() returns pure vIOMMU's capabilities
> with no host IOMMU's capabilities involved.
>
> For example, returned HW_NESTED_CAP means this vIOMMU has code
> to support creating nested hwpt and attaching, no matter if host IOMMU
> supports nesting or not.

Then I think you need to refine the description in 2/20 to make this
clear, stating explicitly that get_viommu_cap returns theoretical
capabilities which are independent of the actual host capabilities they
may depend on.

> The compatibility check between host IOMMU vs vIOMMU is done in
> set_iommu_device(), see vtd_check_hiod().
>
> It's too late for VFIO to call get_viommu_cap() after set_iommu_device()
> because we need get_viommu_cap() to determine if creating nested parent
> hwpt or not at attaching stage.

Isn't it possible to rework the call sequence?

Eric

> If host doesn't support nesting, then creating nested parent hwpt will fail
> early in vfio realize before vtd_check_hiod() is called and we fail the hotplug.
>
> Thanks
> Zhenzhong
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Wednesday, July 16, 2025 8:09 PM
>To: Duan, Zhenzhong <zhenzhong.duan@intel.com>;
>qemu-devel@nongnu.org
>Cc: alex.williamson@redhat.com; clg@redhat.com; mst@redhat.com;
>jasowang@redhat.com; peterx@redhat.com; ddutile@redhat.com;
>jgg@nvidia.com; nicolinc@nvidia.com;
>shameerali.kolothum.thodi@huawei.com; joao.m.martins@oracle.com;
>clement.mathieu--drif@eviden.com; Tian, Kevin <kevin.tian@intel.com>; Liu,
>Yi L <yi.l.liu@intel.com>; Peng, Chao P <chao.p.peng@intel.com>
>Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>IOMMUFD backed device when x-flts=on
>
>Hi Zhenzhong,
>
>On 7/16/25 12:31 PM, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>>> IOMMUFD backed device when x-flts=on
>>>
>>> Hi Zhenzhong,
>>>
>>> On 7/8/25 1:05 PM, Zhenzhong Duan wrote:
>>>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page
>table
>>>> is passed to host to construct nested page table. We need to check
>>>> compatibility of some critical IOMMU capabilities between vIOMMU and
>>>> host IOMMU to ensure guest stage-1 page table could be used by host.
>>>>
>>>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but
>host
>>>> does not, then this IOMMUFD backed device should fail.
>>>>
>>>> Even of the checks pass, for now we willingly reject the association
>>>> because all the bits are not there yet.
>>>>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>> hw/i386/intel_iommu.c | 30
>>> +++++++++++++++++++++++++++++-
>>>> hw/i386/intel_iommu_internal.h | 1 +
>>>> 2 files changed, 30 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>> index e90fd2f28f..c57ca02cdd 100644
>>>> --- a/hw/i386/intel_iommu.c
>>>> +++ b/hw/i386/intel_iommu.c
>>>> @@ -40,6 +40,7 @@
>>>> #include "kvm/kvm_i386.h"
>>>> #include "migration/vmstate.h"
>>>> #include "trace.h"
>>>> +#include "system/iommufd.h"
>>>>
>>>> /* context entry operations */
>>>> #define VTD_CE_GET_RID2PASID(ce) \
>>>> @@ -4355,7 +4356,34 @@ static bool vtd_check_hiod(IntelIOMMUState
>*s,
>>> HostIOMMUDevice *hiod,
>>>> return true;
>>>> }
>>>>
>>>> - error_setg(errp, "host device is uncompatible with stage-1
>>> translation");
>>>> +#ifdef CONFIG_IOMMUFD
>>>> + struct HostIOMMUDeviceCaps *caps = &hiod->caps;
>>>> + struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
>>> I am now confused about how this relates to vtd_get_viommu_cap().
>>> PCIIOMMUOps.set_iommu_device = vtd_dev_set_iommu_device calls
>>> vtd_check_hiod()
>>> viommu might return HW_NESTED_CAP through
>>> PCIIOMMUOps.get_viommu_cap
>>> without making sure the underlying HW IOMMU does support it. Is that a
>>> correct understanding? Maybe we should clarify the calling order between
>>> set_iommu_device vs get_viommu_cap? Could we check HW IOMMU
>>> prerequisites in vtd_get_viommu_cap() by enforcing this is called after
>>> set_iommu_device. I think we should clarify the exact semantic of
>>> get_viommu_cap().Thanks Eric
>> My understanding get_viommu_cap() returns pure vIOMMU's capabilities
>> with no host IOMMU's capabilities involved.
>>
>> For example, returned HW_NESTED_CAP means this vIOMMU has code
>> to support creating nested hwpt and attaching, no matter if host IOMMU
>> supports nesting or not.
>
>Then I think you need to refine the description in 2/20 to make this clear.
>stating explicitly get_viommu_cap returns theoretical capabilities which
>are independent on the actual host capabilities they may depend on.
Will do.

For the virtual VT-d, we are unable to return capabilities that depend on
host capabilities. Because different host IOMMUs may have different
capabilities, we want to return a consistent result that depends only on
the user's command-line configuration.
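
To illustrate the split, here is a minimal sketch, not the actual QEMU code. All sk_* names, the two structs, and the SK_VIOMMU_CAP_HW_NESTED flag are hypothetical and exist only for this example. Bit 26 for nesting matches VTD_ECAP_NEST in this patch; bit 56 for FS1GP is my assumption about the CAP register layout. Unlike vtd_check_hiod() in this patch, the sketch reports success when the checks pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 26 mirrors VTD_ECAP_NEST from this patch; bit 56 is an assumption. */
#define SK_ECAP_NEST            (1ULL << 26)
#define SK_CAP_FS1GP            (1ULL << 56)
/* Hypothetical flag standing in for HW_NESTED_CAP. */
#define SK_VIOMMU_CAP_HW_NESTED (1ULL << 0)

/* vIOMMU settings that come only from the user's command line. */
struct sk_viommu_cfg {
    bool flts;   /* x-flts=on */
    bool fs1gp;  /* stage-1 1GB pages exposed to the guest */
};

/* Capability registers reported for one host IOMMU. */
struct sk_host_caps {
    uint64_t cap_reg;
    uint64_t ecap_reg;
};

/* Depends on the vIOMMU config only, so every device sees the same answer. */
static uint64_t sk_get_viommu_cap(const struct sk_viommu_cfg *cfg)
{
    assert(cfg != NULL);
    return cfg->flts ? SK_VIOMMU_CAP_HW_NESTED : 0;
}

/*
 * Per-device check against the host IOMMU behind that device.
 * Returns NULL when compatible, otherwise a reason string.
 */
static const char *sk_check_host(const struct sk_viommu_cfg *cfg,
                                 const struct sk_host_caps *host)
{
    assert(cfg != NULL && host != NULL);
    if (!cfg->flts) {
        return NULL; /* no stage-1 specific requirement */
    }
    if (!(host->ecap_reg & SK_ECAP_NEST)) {
        return "host IOMMU doesn't support nested translation";
    }
    if (cfg->fs1gp && !(host->cap_reg & SK_CAP_FS1GP)) {
        return "stage-1 1GB huge page is unsupported by host IOMMU";
    }
    return NULL;
}
```

With one config and two hosts, sk_get_viommu_cap() returns the same value for both, while sk_check_host() can accept one host and reject the other.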
>>
>> The compatibility check between host IOMMU vs vIOMMU is done in
>> set_iommu_device(), see vtd_check_hiod().
>>
>> It's too late for VFIO to call get_viommu_cap() after set_iommu_device()
>> because we need get_viommu_cap() to determine if creating nested parent
>> hwpt or not at attaching stage.
>isn't it possible to rework the call sequence?
I think not. Current sequence:

attach_device()
    get_viommu_cap()
    create hwpt
...
create hiod
set_iommu_device(hiod)

Hiod realize needs the iommufd, devid and hwpt_id, which are only ready
after attach_device().
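
The ordering constraint can be sketched roughly as below. This is only an illustration under my own assumptions, not the VFIO code: struct sk_device, the sk_* functions and the placeholder id values are all hypothetical. It shows that the hwpt type must be chosen from the vIOMMU capability during attach, and that the hiod can only be realized and handed to the vIOMMU afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state tracking which steps have run. */
struct sk_device {
    bool viommu_hw_nested;   /* what get_viommu_cap() would report */
    bool nested_parent;      /* kind of hwpt chosen at attach time */
    int iommufd;
    uint32_t devid;
    uint32_t hwpt_id;
    bool attached;
    bool hiod_realized;
    bool hiod_set;
};

/* Stands in for get_viommu_cap(): depends only on vIOMMU config. */
static bool sk_get_viommu_cap_nested(const struct sk_device *d)
{
    return d->viommu_hw_nested;
}

/* attach_device(): query the vIOMMU cap, then create the hwpt. */
static void sk_attach_device(struct sk_device *d)
{
    d->nested_parent = sk_get_viommu_cap_nested(d);
    d->iommufd = 3;   /* placeholder values, not real ids */
    d->devid = 1;
    d->hwpt_id = 2;
    d->attached = true;
}

/* Hiod realize needs iommufd, devid and hwpt_id from the attach step. */
static bool sk_hiod_realize(struct sk_device *d)
{
    if (!d->attached) {
        return false;
    }
    d->hiod_realized = true;
    return true;
}

/* set_iommu_device(hiod): the host compatibility check would run here. */
static bool sk_set_iommu_device(struct sk_device *d)
{
    if (!d->hiod_realized) {
        return false;
    }
    d->hiod_set = true;
    return true;
}
```

Calling sk_hiod_realize() or sk_set_iommu_device() before sk_attach_device() fails in this sketch, which is why get_viommu_cap() cannot be deferred until after set_iommu_device().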
Thanks
Zhenzhong
Hi Zhenzhong,

On 7/17/25 5:47 AM, Duan, Zhenzhong wrote:
[...]
>> Then I think you need to refine the description in 2/20 to make this clear.
>> stating explicitly get_viommu_cap returns theoretical capabilities which
>> are independent on the actual host capabilities they may depend on.
> Will do.
>
> For virtual vtd, we are unable to return capabilities depending on host capacities,
> Because different host IOMMU may have different capabilities, we want to return
> a consistent result only depending on user's cmdline config.

OK.

>> isn't it possible to rework the call sequence?
> I think not. Current sequence:
>
> attach_device()
>     get_viommu_cap()
>     create hwpt
> ...
> create hiod
> set_iommu_device(hiod)
>
> Hiod realize needs iommufd,devid and hwpt_id which are ready after attach_device().

OK. I would add this explanation to the commit message too.

> Thanks
> Zhenzhong

Thanks

Eric
Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v3 07/20] intel_iommu: Check for compatibility with
>IOMMUFD backed device when x-flts=on
>
>Hi Zhenzhong,
>
>On 7/17/25 5:47 AM, Duan, Zhenzhong wrote:
[...]
>> Hiod realize needs iommufd,devid and hwpt_id which are ready after
>> attach_device().
>OK. I would add this explanation in the commit msg too.

Got it, will do.

Thanks
Zhenzhong