Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH v7 09/23] intel_iommu: Stick to system MR for
>IOMMUFD backed host device when x-flts=on
>
>Hi Zhenzhong,
>
>On 10/24/25 10:43 AM, Zhenzhong Duan wrote:
>> When guest enables scalable mode and setup first stage page table, we
>don't
>> want to use IOMMU MR but rather continue using the system MR for
>IOMMUFD
>> backed host device.
>>
>> Then default HWPT in VFIO contains GPA->HPA mappings which could be
>reused
>> as nesting parent HWPT to construct nested HWPT in vIOMMU.
>
>we had a discussion thread with Nicolin and Shameer about usage of AS
>for nested SMMU
>(https://lore.kernel.org/all/add07edd-3652-430d-b52c-cb2bdbc7f587@redha
>t.com/)
>If I understand correctly you also rely on system MR for nested. I am
>not sure this is a good usage of the API/AS objects as in practice you
>have an actual translation in place (althout implemented by HW) while by
>using the system MR you do not reflect that. I encouraged Shameer to try
>using a dummy dedicated AS that can be shared. I think it would be
>better if we could align the strategies.
Hmm, I think it's hard for VTD to use dedicated AS just like smmu, because
VTD supports legacy mode even with 'x-scalable-mode=on,x-flts=on', we
don't know guest's choice at runtime. So we always return IOMMU AS to
VFIO, we should never return address_space_memory or a dedicated AS
in vtd_find_add_as().
There was a discussion with Nicolin and Liuyi on this.
https://lore.kernel.org/qemu-devel/SJ0PR11MB6744340B889FF65D3BD5B8459267A@SJ0PR11MB6744.namprd11.prod.outlook.com/
>
>>
>> Suggested-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> Reviewed-by: Eric Auger <eric.auger@redhat.com>
>> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
>> ---
>> hw/i386/intel_iommu.c | 36 ++++++++++++++++++++++++++++++++++--
>> 1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 4c83578c54..ce4c54165e 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -41,6 +41,7 @@
>> #include "migration/misc.h"
>> #include "migration/vmstate.h"
>> #include "trace.h"
>> +#include "system/iommufd.h"
>>
>> /* context entry operations */
>> #define PASID_0 0
>> @@ -1713,6 +1714,24 @@ static bool
>vtd_dev_pt_enabled(IntelIOMMUState *s, VTDContextEntry *ce,
>>
>> }
>>
>> +static VTDHostIOMMUDevice *vtd_find_hiod_iommufd(VTDAddressSpace
>*as)
>> +{
>> + IntelIOMMUState *s = as->iommu_state;
>> + struct vtd_as_key key = {
>> + .bus = as->bus,
>> + .devfn = as->devfn,
>> + };
>> + VTDHostIOMMUDevice *vtd_hiod =
>g_hash_table_lookup(s->vtd_host_iommu_dev,
>> + &key);
>> +
>> + if (vtd_hiod && vtd_hiod->hiod &&
>> + object_dynamic_cast(OBJECT(vtd_hiod->hiod),
>> +
>TYPE_HOST_IOMMU_DEVICE_IOMMUFD)) {
>> + return vtd_hiod;
>> + }
>> + return NULL;
>> +}
>> +
>> static bool vtd_as_pt_enabled(VTDAddressSpace *as)
>> {
>> IntelIOMMUState *s;
>> @@ -1738,12 +1757,25 @@ static bool
>vtd_as_pt_enabled(VTDAddressSpace *as)
>> /* Return whether the device is using IOMMU translation. */
>> static bool vtd_switch_address_space(VTDAddressSpace *as)
>> {
>> + IntelIOMMUState *s;
>> bool use_iommu, pt;
>>
>> assert(as);
>>
>> - use_iommu = as->iommu_state->dmar_enabled
>&& !vtd_as_pt_enabled(as);
>> - pt = as->iommu_state->dmar_enabled && vtd_as_pt_enabled(as);
>> + s = as->iommu_state;
>> + use_iommu = s->dmar_enabled && !vtd_as_pt_enabled(as);
>> + pt = s->dmar_enabled && vtd_as_pt_enabled(as);
>> +
>> + /*
>> + * When guest enables scalable mode and sets up first stage page
>table,
>> + * we stick to system MR for IOMMUFD backed host device. Then its
>> + * default hwpt contains GPA->HPA mappings which is used directly if
>> + * PGTT=PT and used as nesting parent if PGTT=FST. Otherwise fall
>back
>> + * to original processing.
>According to the above comment you have a S1 translation in place but
>you set use_iommu = false and use system MR?
Yes, we have extended the usages of MRs under IOMMU AS with nesting.
In nesting mode, system MR on/off isn't aligned with S1 translation anymore.
>
>Revoking my R-bs for now because I am not convinced we shall use system
>MR when S1+S2 is setup. I may be wrong but at least I need more
>explanations ;-)
Okay, let's discuss further.
Thanks
Zhenzhong