[PATCH v5 08/21] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on

Zhenzhong Duan posted 21 patches 2 months, 3 weeks ago
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Jason Wang <jasowang@redhat.com>, Yi Liu <yi.l.liu@intel.com>, "Clément Mathieu--Drif" <clement.mathieu--drif@eviden.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Eduardo Habkost <eduardo@habkost.net>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Eric Auger <eric.auger@redhat.com>, Zhenzhong Duan <zhenzhong.duan@intel.com>
[PATCH v5 08/21] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on
Posted by Zhenzhong Duan 2 months, 3 weeks ago
When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table
is passed to host to construct nested page table. We need to check
compatibility of some critical IOMMU capabilities between vIOMMU and
host IOMMU to ensure guest stage-1 page table could be used by host.

For instance, vIOMMU supports stage-1 1GB huge page mapping, but host
does not, then this IOMMUFD backed device should fail.

Even of the checks pass, for now we willingly reject the association
because all the bits are not there yet.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/i386/intel_iommu_internal.h |  1 +
 hw/i386/intel_iommu.c          | 30 +++++++++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index c7046eb4e2..f7510861d1 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -192,6 +192,7 @@
 #define VTD_ECAP_PT                 (1ULL << 6)
 #define VTD_ECAP_SC                 (1ULL << 7)
 #define VTD_ECAP_MHMV               (15ULL << 20)
+#define VTD_ECAP_NEST               (1ULL << 26)
 #define VTD_ECAP_SRS                (1ULL << 31)
 #define VTD_ECAP_PSS                (7ULL << 35) /* limit: MemTxAttrs::pid */
 #define VTD_ECAP_PASID              (1ULL << 40)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 512ca4fdc5..da355bda79 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -40,6 +40,7 @@
 #include "kvm/kvm_i386.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "system/iommufd.h"
 
 /* context entry operations */
 #define VTD_CE_GET_RID2PASID(ce) \
@@ -4366,7 +4367,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
         return true;
     }
 
-    error_setg(errp, "host device is uncompatible with stage-1 translation");
+#ifdef CONFIG_IOMMUFD
+    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
+    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
+
+    /* Remaining checks are all stage-1 translation specific */
+    if (!object_dynamic_cast(OBJECT(hiod), TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
+        error_setg(errp, "Need IOMMUFD backend when x-flts=on");
+        return false;
+    }
+
+    if (caps->type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
+        error_setg(errp, "Incompatible host platform IOMMU type %d",
+                   caps->type);
+        return false;
+    }
+
+    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
+        error_setg(errp, "Host IOMMU doesn't support nested translation");
+        return false;
+    }
+
+    if (s->fs1gp && !(vtd->cap_reg & VTD_CAP_FS1GP)) {
+        error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
+        return false;
+    }
+#endif
+
+    error_setg(errp, "host IOMMU is incompatible with stage-1 translation");
     return false;
 }
 
-- 
2.47.1
Re: [PATCH v5 08/21] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on
Posted by Eric Auger 2 months, 2 weeks ago

On 8/22/25 8:40 AM, Zhenzhong Duan wrote:
> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table
> is passed to host to construct nested page table. We need to check
> compatibility of some critical IOMMU capabilities between vIOMMU and
> host IOMMU to ensure guest stage-1 page table could be used by host.
>
> For instance, vIOMMU supports stage-1 1GB huge page mapping, but host
> does not, then this IOMMUFD backed device should fail.
>
> Even of the checks pass, for now we willingly reject the association
> because all the bits are not there yet.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

Reviewed-by: Eric Auger <eric.auger@redhat.com>

Eric
> ---
>  hw/i386/intel_iommu_internal.h |  1 +
>  hw/i386/intel_iommu.c          | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index c7046eb4e2..f7510861d1 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -192,6 +192,7 @@
>  #define VTD_ECAP_PT                 (1ULL << 6)
>  #define VTD_ECAP_SC                 (1ULL << 7)
>  #define VTD_ECAP_MHMV               (15ULL << 20)
> +#define VTD_ECAP_NEST               (1ULL << 26)
>  #define VTD_ECAP_SRS                (1ULL << 31)
>  #define VTD_ECAP_PSS                (7ULL << 35) /* limit: MemTxAttrs::pid */
>  #define VTD_ECAP_PASID              (1ULL << 40)
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 512ca4fdc5..da355bda79 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -40,6 +40,7 @@
>  #include "kvm/kvm_i386.h"
>  #include "migration/vmstate.h"
>  #include "trace.h"
> +#include "system/iommufd.h"
>  
>  /* context entry operations */
>  #define VTD_CE_GET_RID2PASID(ce) \
> @@ -4366,7 +4367,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
>          return true;
>      }
>  
> -    error_setg(errp, "host device is uncompatible with stage-1 translation");
> +#ifdef CONFIG_IOMMUFD
> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
> +
> +    /* Remaining checks are all stage-1 translation specific */
> +    if (!object_dynamic_cast(OBJECT(hiod), TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
> +        error_setg(errp, "Need IOMMUFD backend when x-flts=on");
> +        return false;
> +    }
> +
> +    if (caps->type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
> +        error_setg(errp, "Incompatible host platform IOMMU type %d",
> +                   caps->type);
> +        return false;
> +    }
> +
> +    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
> +        error_setg(errp, "Host IOMMU doesn't support nested translation");
> +        return false;
> +    }
> +
> +    if (s->fs1gp && !(vtd->cap_reg & VTD_CAP_FS1GP)) {
> +        error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");
> +        return false;
> +    }
> +#endif
> +
> +    error_setg(errp, "host IOMMU is incompatible with stage-1 translation");
>      return false;
>  }
>
Re: [PATCH v5 08/21] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on
Posted by Yi Liu 2 months, 2 weeks ago
On 2025/8/22 14:40, Zhenzhong Duan wrote:
> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table
> is passed to host to construct nested page table.

for passthrough devices :)

> We need to check
> compatibility of some critical IOMMU capabilities between vIOMMU and
> host IOMMU to ensure guest stage-1 page table could be used by host.
> 
> For instance, vIOMMU supports stage-1 1GB huge page mapping, but host
> does not, then this IOMMUFD backed device should fail.

do you have a list of what caps should be checked to ensure guest
stage-1 page table work on hw? I can see EAFS. But it is not yet exposed
to guest, so no need to check it for now.

> 
> Even of the checks pass, for now we willingly reject the association
> because all the bits are not there yet.

better call out it would be relaxed in the end of this series. Otherwise
it's a little confused. :)

> 
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/i386/intel_iommu_internal.h |  1 +
>   hw/i386/intel_iommu.c          | 30 +++++++++++++++++++++++++++++-
>   2 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index c7046eb4e2..f7510861d1 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -192,6 +192,7 @@
>   #define VTD_ECAP_PT                 (1ULL << 6)
>   #define VTD_ECAP_SC                 (1ULL << 7)
>   #define VTD_ECAP_MHMV               (15ULL << 20)
> +#define VTD_ECAP_NEST               (1ULL << 26)
>   #define VTD_ECAP_SRS                (1ULL << 31)
>   #define VTD_ECAP_PSS                (7ULL << 35) /* limit: MemTxAttrs::pid */
>   #define VTD_ECAP_PASID              (1ULL << 40)
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 512ca4fdc5..da355bda79 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -40,6 +40,7 @@
>   #include "kvm/kvm_i386.h"
>   #include "migration/vmstate.h"
>   #include "trace.h"
> +#include "system/iommufd.h"
>   
>   /* context entry operations */
>   #define VTD_CE_GET_RID2PASID(ce) \
> @@ -4366,7 +4367,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s, HostIOMMUDevice *hiod,
>           return true;
>       }
>   
> -    error_setg(errp, "host device is uncompatible with stage-1 translation");
> +#ifdef CONFIG_IOMMUFD
> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
> +
> +    /* Remaining checks are all stage-1 translation specific */
> +    if (!object_dynamic_cast(OBJECT(hiod), TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
> +        error_setg(errp, "Need IOMMUFD backend when x-flts=on");
> +        return false;
> +    }
> +
> +    if (caps->type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
> +        error_setg(errp, "Incompatible host platform IOMMU type %d",
> +                   caps->type);
> +        return false;
> +    }
> +
> +    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
> +        error_setg(errp, "Host IOMMU doesn't support nested translation");
> +        return false;
> +    }

this check may be already been covered by the sync in patch 05 as
the set_iommu_device op is called after attach_device. If no NESTED cap,
allocating nested hwpt would be failed.

> +
> +    if (s->fs1gp && !(vtd->cap_reg & VTD_CAP_FS1GP)) {
> +        error_setg(errp, "Stage-1 1GB huge page is unsupported by host IOMMU");

s/huge page/large page/ as VT-d spec use large page.

> +        return false;
> +    }
> +#endif > +
> +    error_setg(errp, "host IOMMU is incompatible with stage-1 translation");

s/stage-1 translation/guest stage-1 translation/

>       return false;
>   }
>   

with above minor nits done, the patch looks good to me. Hence,

Reviewed-by: Yi Liu <yi.l.liu@intel.com>
RE: [PATCH v5 08/21] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on
Posted by Duan, Zhenzhong 2 months, 2 weeks ago

>-----Original Message-----
>From: Liu, Yi L <yi.l.liu@intel.com>
>Subject: Re: [PATCH v5 08/21] intel_iommu: Check for compatibility with
>IOMMUFD backed device when x-flts=on
>
>On 2025/8/22 14:40, Zhenzhong Duan wrote:
>> When vIOMMU is configured x-flts=on in scalable mode, stage-1 page table
>> is passed to host to construct nested page table.
>
>for passthrough devices :)

OK.

>
>> We need to check
>> compatibility of some critical IOMMU capabilities between vIOMMU and
>> host IOMMU to ensure guest stage-1 page table could be used by host.
>>
>> For instance, vIOMMU supports stage-1 1GB huge page mapping, but host
>> does not, then this IOMMUFD backed device should fail.
>
>do you have a list of what caps should be checked to ensure guest
>stage-1 page table work on hw? I can see EAFS. But it is not yet exposed
>to guest, so no need to check it for now.

Currently I only see FS1GP, ATS, PRQ and PASID isn't supported yet. vIOMMU only enables a small set of capabilities when x-flts=on.

>
>>
>> Even of the checks pass, for now we willingly reject the association
>> because all the bits are not there yet.
>
>better call out it would be relaxed in the end of this series. Otherwise
>it's a little confused. :)

This comment is per Eric's suggestion, I'll combine yours with his, like:

"Even of the checks pass, for now we willingly reject the association
because all the bits are not there yet, it will be relaxed in the end of this series."

>
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/i386/intel_iommu_internal.h |  1 +
>>   hw/i386/intel_iommu.c          | 30
>+++++++++++++++++++++++++++++-
>>   2 files changed, 30 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/intel_iommu_internal.h
>b/hw/i386/intel_iommu_internal.h
>> index c7046eb4e2..f7510861d1 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -192,6 +192,7 @@
>>   #define VTD_ECAP_PT                 (1ULL << 6)
>>   #define VTD_ECAP_SC                 (1ULL << 7)
>>   #define VTD_ECAP_MHMV               (15ULL << 20)
>> +#define VTD_ECAP_NEST               (1ULL << 26)
>>   #define VTD_ECAP_SRS                (1ULL << 31)
>>   #define VTD_ECAP_PSS                (7ULL << 35) /* limit:
>MemTxAttrs::pid */
>>   #define VTD_ECAP_PASID              (1ULL << 40)
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 512ca4fdc5..da355bda79 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -40,6 +40,7 @@
>>   #include "kvm/kvm_i386.h"
>>   #include "migration/vmstate.h"
>>   #include "trace.h"
>> +#include "system/iommufd.h"
>>
>>   /* context entry operations */
>>   #define VTD_CE_GET_RID2PASID(ce) \
>> @@ -4366,7 +4367,34 @@ static bool vtd_check_hiod(IntelIOMMUState *s,
>HostIOMMUDevice *hiod,
>>           return true;
>>       }
>>
>> -    error_setg(errp, "host device is uncompatible with stage-1
>translation");
>> +#ifdef CONFIG_IOMMUFD
>> +    struct HostIOMMUDeviceCaps *caps = &hiod->caps;
>> +    struct iommu_hw_info_vtd *vtd = &caps->vendor_caps.vtd;
>> +
>> +    /* Remaining checks are all stage-1 translation specific */
>> +    if (!object_dynamic_cast(OBJECT(hiod),
>TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
>> +        error_setg(errp, "Need IOMMUFD backend when x-flts=on");
>> +        return false;
>> +    }
>> +
>> +    if (caps->type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
>> +        error_setg(errp, "Incompatible host platform IOMMU type %d",
>> +                   caps->type);
>> +        return false;
>> +    }
>> +
>> +    if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
>> +        error_setg(errp, "Host IOMMU doesn't support nested
>translation");
>> +        return false;
>> +    }
>
>this check may be already been covered by the sync in patch 05 as
>the set_iommu_device op is called after attach_device. If no NESTED cap,
>allocating nested hwpt would be failed.

Indeed, will drop the check.

>
>> +
>> +    if (s->fs1gp && !(vtd->cap_reg & VTD_CAP_FS1GP)) {
>> +        error_setg(errp, "Stage-1 1GB huge page is unsupported by host
>IOMMU");
>
>s/huge page/large page/ as VT-d spec use large page.

Will do.

>
>> +        return false;
>> +    }
>> +#endif > +
>> +    error_setg(errp, "host IOMMU is incompatible with stage-1
>translation");
>
>s/stage-1 translation/guest stage-1 translation/

Will do

Thanks
Zhenzhong

>
>>       return false;
>>   }
>>
>
>with above minor nits done, the patch looks good to me. Hence,
>
>Reviewed-by: Yi Liu <yi.l.liu@intel.com>