[PATCH v7 09/23] intel_iommu: Stick to system MR for IOMMUFD backed host device when x-flts=on

Zhenzhong Duan posted 23 patches 3 months, 2 weeks ago
There is a newer version of this series
[PATCH v7 09/23] intel_iommu: Stick to system MR for IOMMUFD backed host device when x-flts=on
Posted by Zhenzhong Duan 3 months, 2 weeks ago
When guest enables scalable mode and setup first stage page table, we don't
want to use IOMMU MR but rather continue using the system MR for IOMMUFD
backed host device.

Then default HWPT in VFIO contains GPA->HPA mappings which could be reused
as nesting parent HWPT to construct nested HWPT in vIOMMU.

Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
---
 hw/i386/intel_iommu.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 4c83578c54..ce4c54165e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -41,6 +41,7 @@
 #include "migration/misc.h"
 #include "migration/vmstate.h"
 #include "trace.h"
+#include "system/iommufd.h"
 
 /* context entry operations */
 #define PASID_0    0
@@ -1713,6 +1714,24 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce,
 
 }
 
+static VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as)
+{
+    IntelIOMMUState *s = as->iommu_state;
+    struct vtd_as_key key = {
+        .bus = as->bus,
+        .devfn = as->devfn,
+    };
+    VTDHostIOMMUDevice *vtd_hiod = g_hash_table_lookup(s->vtd_host_iommu_dev,
+                                                       &key);
+
+    if (vtd_hiod && vtd_hiod->hiod &&
+        object_dynamic_cast(OBJECT(vtd_hiod->hiod),
+                            TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
+        return vtd_hiod;
+    }
+    return NULL;
+}
+
 static bool vtd_as_pt_enabled(VTDAddressSpace *as)
 {
     IntelIOMMUState *s;
@@ -1738,12 +1757,25 @@ static bool vtd_as_pt_enabled(VTDAddressSpace *as)
 /* Return whether the device is using IOMMU translation. */
 static bool vtd_switch_address_space(VTDAddressSpace *as)
 {
+    IntelIOMMUState *s;
     bool use_iommu, pt;
 
     assert(as);
 
-    use_iommu = as->iommu_state->dmar_enabled && !vtd_as_pt_enabled(as);
-    pt = as->iommu_state->dmar_enabled && vtd_as_pt_enabled(as);
+    s = as->iommu_state;
+    use_iommu = s->dmar_enabled && !vtd_as_pt_enabled(as);
+    pt = s->dmar_enabled && vtd_as_pt_enabled(as);
+
+    /*
+     * When guest enables scalable mode and sets up first stage page table,
+     * we stick to system MR for IOMMUFD backed host device. Then its
+     * default hwpt contains GPA->HPA mappings which is used directly if
+     * PGTT=PT and used as nesting parent if PGTT=FST. Otherwise fall back
+     * to original processing.
+     */
+    if (s->root_scalable && s->fsts && vtd_find_hiod_iommufd(as)) {
+        use_iommu = false;
+    }
 
     trace_vtd_switch_address_space(pci_bus_num(as->bus),
                                    VTD_PCI_SLOT(as->devfn),
-- 
2.47.1
Re: [PATCH v7 09/23] intel_iommu: Stick to system MR for IOMMUFD backed host device when x-flts=on
Posted by Eric Auger 3 months, 1 week ago
Hi Zhenzhong,

On 10/24/25 10:43 AM, Zhenzhong Duan wrote:
> When guest enables scalable mode and setup first stage page table, we don't
> want to use IOMMU MR but rather continue using the system MR for IOMMUFD
> backed host device.
>
> Then default HWPT in VFIO contains GPA->HPA mappings which could be reused
> as nesting parent HWPT to construct nested HWPT in vIOMMU.

we had a discussion thread with Nicolin and Shameer about usage of AS
for nested SMMU
(https://lore.kernel.org/all/add07edd-3652-430d-b52c-cb2bdbc7f587@redhat.com/)
If I understand correctly you also rely on system MR for nested. I am
not sure this is a good usage of the API/AS objects as in practice you
have an actual translation in place (althout implemented by HW) while by
using the system MR you do not reflect that. I encouraged Shameer to try
using a dummy dedicated AS that can be shared. I think it would be
better if we could align the strategies.

>
> Suggested-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Eric Auger <eric.auger@redhat.com>
> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
> ---
>  hw/i386/intel_iommu.c | 36 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 4c83578c54..ce4c54165e 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -41,6 +41,7 @@
>  #include "migration/misc.h"
>  #include "migration/vmstate.h"
>  #include "trace.h"
> +#include "system/iommufd.h"
>  
>  /* context entry operations */
>  #define PASID_0    0
> @@ -1713,6 +1714,24 @@ static bool vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce,
>  
>  }
>  
> +static VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace *as)
> +{
> +    IntelIOMMUState *s = as->iommu_state;
> +    struct vtd_as_key key = {
> +        .bus = as->bus,
> +        .devfn = as->devfn,
> +    };
> +    VTDHostIOMMUDevice *vtd_hiod = g_hash_table_lookup(s->vtd_host_iommu_dev,
> +                                                       &key);
> +
> +    if (vtd_hiod && vtd_hiod->hiod &&
> +        object_dynamic_cast(OBJECT(vtd_hiod->hiod),
> +                            TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
> +        return vtd_hiod;
> +    }
> +    return NULL;
> +}
> +
>  static bool vtd_as_pt_enabled(VTDAddressSpace *as)
>  {
>      IntelIOMMUState *s;
> @@ -1738,12 +1757,25 @@ static bool vtd_as_pt_enabled(VTDAddressSpace *as)
>  /* Return whether the device is using IOMMU translation. */
>  static bool vtd_switch_address_space(VTDAddressSpace *as)
>  {
> +    IntelIOMMUState *s;
>      bool use_iommu, pt;
>  
>      assert(as);
>  
> -    use_iommu = as->iommu_state->dmar_enabled && !vtd_as_pt_enabled(as);
> -    pt = as->iommu_state->dmar_enabled && vtd_as_pt_enabled(as);
> +    s = as->iommu_state;
> +    use_iommu = s->dmar_enabled && !vtd_as_pt_enabled(as);
> +    pt = s->dmar_enabled && vtd_as_pt_enabled(as);
> +
> +    /*
> +     * When guest enables scalable mode and sets up first stage page table,
> +     * we stick to system MR for IOMMUFD backed host device. Then its
> +     * default hwpt contains GPA->HPA mappings which is used directly if
> +     * PGTT=PT and used as nesting parent if PGTT=FST. Otherwise fall back
> +     * to original processing.
According to the above comment you have a S1 translation in place but
you set use_iommu = false and use system MR?

Revoking my R-bs for now because I am not convinced we shall use system
MR when S1+S2 is setup. I may be wrong but at least I need more
explanations ;-)

Eric
> +     */
> +    if (s->root_scalable && s->fsts && vtd_find_hiod_iommufd(as)) {
> +        use_iommu = false;
> +    }
>  
>      trace_vtd_switch_address_space(pci_bus_num(as->bus),
>                                     VTD_PCI_SLOT(as->devfn),
RE: [PATCH v7 09/23] intel_iommu: Stick to system MR for IOMMUFD backed host device when x-flts=on
Posted by Duan, Zhenzhong 3 months, 1 week ago
Hi Eric,

>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v7 09/23] intel_iommu: Stick to system MR for
>IOMMUFD backed host device when x-flts=on
>
>Hi Zhenzhong,
>
>On 10/24/25 10:43 AM, Zhenzhong Duan wrote:
>> When guest enables scalable mode and setup first stage page table, we
>don't
>> want to use IOMMU MR but rather continue using the system MR for
>IOMMUFD
>> backed host device.
>>
>> Then default HWPT in VFIO contains GPA->HPA mappings which could be
>reused
>> as nesting parent HWPT to construct nested HWPT in vIOMMU.
>
>we had a discussion thread with Nicolin and Shameer about usage of AS
>for nested SMMU
>(https://lore.kernel.org/all/add07edd-3652-430d-b52c-cb2bdbc7f587@redha
>t.com/)
>If I understand correctly you also rely on system MR for nested. I am
>not sure this is a good usage of the API/AS objects as in practice you
>have an actual translation in place (althout implemented by HW) while by
>using the system MR you do not reflect that. I encouraged Shameer to try
>using a dummy dedicated AS that can be shared. I think it would be
>better if we could align the strategies.

Hmm, I think it's hard for VTD to use dedicated AS just like smmu, because
VTD supports legacy mode even with 'x-scalable-mode=on,x-flts=on', we
don't know guest's choice at runtime. So we always return IOMMU AS to
VFIO, we should never return address_space_memory or a dedicated AS
in vtd_find_add_as().

There was a discussion with Nicolin and Liuyi on this.

https://lore.kernel.org/qemu-devel/SJ0PR11MB6744340B889FF65D3BD5B8459267A@SJ0PR11MB6744.namprd11.prod.outlook.com/

>
>>
>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>>  hw/i386/intel_iommu.c | 36 ++++++++++++++++++++++++++++++++++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 4c83578c54..ce4c54165e 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -41,6 +41,7 @@
>>  #include "migration/misc.h"
>>  #include "migration/vmstate.h"
>>  #include "trace.h"
>> +#include "system/iommufd.h"
>>
>>  /* context entry operations */
>>  #define PASID_0    0
>> @@ -1713,6 +1714,24 @@ static bool
>vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce,
>>
>>  }
>>
>> +static VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace
>*as)
>> +{
>> +    IntelIOMMUState *s = as->iommu_state;
>> +    struct vtd_as_key key = {
>> +        .bus = as->bus,
>> +        .devfn = as->devfn,
>> +    };
>> +    VTDHostIOMMUDevice *vtd_hiod =
>g_hash_table_lookup(s->vtd_host_iommu_dev,
>> +                                                       &key);
>> +
>> +    if (vtd_hiod && vtd_hiod->hiod &&
>> +        object_dynamic_cast(OBJECT(vtd_hiod->hiod),
>> +
>TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
>> +        return vtd_hiod;
>> +    }
>> +    return NULL;
>> +}
>> +
>>  static bool vtd_as_pt_enabled(VTDAddressSpace *as)
>>  {
>>      IntelIOMMUState *s;
>> @@ -1738,12 +1757,25 @@ static bool
>vtd_as_pt_enabled(VTDAddressSpace *as)
>>  /* Return whether the device is using IOMMU translation. */
>>  static bool vtd_switch_address_space(VTDAddressSpace *as)
>>  {
>> +    IntelIOMMUState *s;
>>      bool use_iommu, pt;
>>
>>      assert(as);
>>
>> -    use_iommu = as->iommu_state->dmar_enabled
>&& !vtd_as_pt_enabled(as);
>> -    pt = as->iommu_state->dmar_enabled && vtd_as_pt_enabled(as);
>> +    s = as->iommu_state;
>> +    use_iommu = s->dmar_enabled && !vtd_as_pt_enabled(as);
>> +    pt = s->dmar_enabled && vtd_as_pt_enabled(as);
>> +
>> +    /*
>> +     * When guest enables scalable mode and sets up first stage page
>table,
>> +     * we stick to system MR for IOMMUFD backed host device. Then its
>> +     * default hwpt contains GPA->HPA mappings which is used directly if
>> +     * PGTT=PT and used as nesting parent if PGTT=FST. Otherwise fall
>back
>> +     * to original processing.
>According to the above comment you have a S1 translation in place but
>you set use_iommu = false and use system MR?

Yes, we have extended the usages of MRs under IOMMU AS with nesting.
In nesting mode, system MR on/off isn't aligned with S1 translation anymore.

>
>Revoking my R-bs for now because I am not convinced we shall use system
>MR when S1+S2 is setup. I may be wrong but at least I need more
>explanations ;-)

Okay, let's discuss further.

Thanks
Zhenzhong
Re: [PATCH v7 09/23] intel_iommu: Stick to system MR for IOMMUFD backed host device when x-flts=on
Posted by Nicolin Chen 3 months, 1 week ago
On Fri, Oct 31, 2025 at 09:52:16AM +0000, Duan, Zhenzhong wrote:
> >> +    /*
> >> +     * When guest enables scalable mode and sets up first stage page
> >table,
> >> +     * we stick to system MR for IOMMUFD backed host device. Then its
> >> +     * default hwpt contains GPA->HPA mappings which is used directly if
> >> +     * PGTT=PT and used as nesting parent if PGTT=FST. Otherwise fall
> >back
> >> +     * to original processing.
> >According to the above comment you have a S1 translation in place but
> >you set use_iommu = false and use system MR?
> 
> Yes, we have extended the usages of MRs under IOMMU AS with nesting.
> In nesting mode, system MR on/off isn't aligned with S1 translation anymore.

Does Intel vIOMMU need S1 translation when the working in the HW
accelerated mode? I thought everything is handled by the HW..no?

Nicolin