[RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for an accel SMMUv3

Posted by Shameer Kolothum, 20 patches, 3 weeks, 4 days ago
From: Nicolin Chen <nicolinc@nvidia.com>

If a vSMMU is configured as an accelerated one, the HW IOTLB is used and
all cache invalidations must be applied to the HW IOTLB rather than the
emulated IOTLB. In this case, no IOMMU notifier is registered, as the
devices behind an SMMUv3-accel stay in the system address space for
stage-2 mappings.

However, the KVM code still requests an IOMMU address space to translate
an MSI doorbell gIOVA via get_msi_address_space() and translate().

Since an SMMUv3-accel doesn't register an IOMMU notifier to flush the
emulated IOTLB, bypass the emulated IOTLB entirely and always walk the
guest-level I/O page table.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---
 hw/arm/smmu-common.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 9fd455baa0..fd10df8866 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -77,6 +77,17 @@ static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
     uint8_t level = 4 - (inputsize - 4) / stride;
     SMMUTLBEntry *entry = NULL;
 
+    /*
+     * Stage-1 translation with an accel SMMU normally uses the HW IOTLB.
+     * However, KVM still requests an IOMMU address space for the MSI fixup,
+     * which walks the stage-1 page table. Make sure we don't go through the
+     * emulated path, so the emulated IOTLB never needs any invalidation.
+     */
+
+    if (bs->accel) {
+        return NULL;
+    }
+
     while (level <= 3) {
         uint64_t subpage_size = 1ULL << level_shift(level, tt->granule_sz);
         uint64_t mask = subpage_size - 1;
@@ -142,6 +153,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
     SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
     uint8_t tg = (new->granule - 10) / 2;
 
+    /*
+     * Stage-1 translation with an accel SMMU normally uses the HW IOTLB.
+     * However, KVM still requests an IOMMU address space for the MSI fixup,
+     * which walks the stage-1 page table. Make sure we don't go through the
+     * emulated path, so the emulated IOTLB never needs any invalidation.
+     */
+    if (bs->accel) {
+        return;
+    }
+
     if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
         smmu_iotlb_inv_all(bs);
     }
-- 
2.34.1
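The bypass pattern in the two hunks above can be sketched in isolation. This is a toy model, not QEMU code: the type names, the single-entry "cache", and the field names are illustrative stand-ins for `SMMUState`, its `iotlb` hash table, and `bs->accel`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model: a one-entry stand-in for the emulated
 * IOTLB hash table, plus the accel flag that selects the HW IOTLB path. */
typedef struct {
    uint64_t iova;
    uint64_t pa;
    bool valid;
} TlbEntry;

typedef struct {
    bool accel;      /* HW-accelerated SMMU: the HW owns the IOTLB */
    TlbEntry cached; /* stand-in for the emulated IOTLB */
} SmmuState;

/* Returns a cached translation, or NULL to force a guest page-table walk. */
static TlbEntry *tlb_lookup(SmmuState *s, uint64_t iova)
{
    if (s->accel) {
        /* Never hit the emulated cache, so it never needs invalidation. */
        return NULL;
    }
    if (s->cached.valid && s->cached.iova == iova) {
        return &s->cached;
    }
    return NULL;
}

/* Mirror of the lookup bypass: with accel, never populate the cache. */
static void tlb_insert(SmmuState *s, uint64_t iova, uint64_t pa)
{
    if (s->accel) {
        return;
    }
    s->cached = (TlbEntry){ .iova = iova, .pa = pa, .valid = true };
}
```

Because both insert and lookup bail out when `accel` is set, the emulated cache stays empty and every translation request falls through to the page-table walk, which is exactly the property the patch relies on.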
Re: [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for an accel SMMUv3
Posted by Eric Auger 1 week, 3 days ago

On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
>
> If a vSMMU is configured as a accelerated one, HW IOTLB will be used
> and all cache invalidation should be done to the HW IOTLB too, v.s.
> the emulated iotlb. In this case, an iommu notifier isn't registered,
> as the devices behind a SMMUv3-accel would stay in the system address
> space for stage-2 mappings.
>
> However, the KVM code still requests an iommu address space to translate
> an MSI doorbell gIOVA via get_msi_address_space() and translate().
In case we use flat MSI mapping, can't we get rid of that problem?

Sorry, but I don't really understand the problem here. Can you please
elaborate?

Thanks

Eric
> [...]
Re: [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for an accel SMMUv3
Posted by Nicolin Chen 1 week, 3 days ago
On Wed, Mar 26, 2025 at 06:40:10PM +0100, Eric Auger wrote:
> On 3/11/25 3:10 PM, Shameer Kolothum wrote:
> > [...]
> In case we use flat MSI mapping, can't we get rid of that problem?
> 
> Sorry, but I don't really understand the problem here. Can you please
> elaborate?

With RMR, the HW is doing flat mapping for stage-1, but the guest
isn't doing a 1:1 mapping.

The guest maps a gIOVA to the IPA of the vITS page (IIRC, 0x8090000),
while the PCI HW is programmed with the RMR IOVA (0x8000000).

The translation part works well with the flat mapping alone, while
the vIRQ injection part (done by KVM) has to update the vITS page.

The details are in kvm_arch_fixup_msi_route(), which uses the IOMMU
address space to translate the gIOVA (programmed into the guest-level
PCI device) to the IPA of the vITS page.
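
The fixup described above can be sketched as a toy model. This is not QEMU code: the real path goes through kvm_arch_fixup_msi_route() and the IOMMU address space's translate() callback; here a single-entry map stands in for the guest's stage-1 page table, and the gIOVA value is arbitrary (only the vITS IPA and RMR IOVA come from the thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy single-entry stage-1 mapping: gIOVA -> IPA. Illustrative only. */
typedef struct {
    uint64_t giova; /* doorbell address the guest programs into the device */
    uint64_t ipa;   /* IPA of the vITS doorbell page */
} Stage1Map;

/* Stand-in for walking the guest stage-1 table via the IOMMU AS:
 * on a hit, rewrite the MSI address to the vITS IPA for vIRQ injection. */
static int fixup_msi_route(const Stage1Map *map, uint64_t addr, uint64_t *out)
{
    if (addr == map->giova) {
        *out = map->ipa;
        return 0;
    }
    return -1; /* no translation: leave the MSI address unmodified */
}
```

The point of the patch is that this walk must always reach the guest page table: with the emulated IOTLB bypassed, a stale cached entry can never be returned here, even though no notifier-driven invalidation runs.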

Thanks
Nicolin
Re: [RFC PATCH v2 18/20] hw/arm/smmu-common: Bypass emulated IOTLB for an accel SMMUv3
Posted by Donald Dutile 2 weeks, 4 days ago

On 3/11/25 10:10 AM, Shameer Kolothum wrote:
> From: Nicolin Chen <nicolinc@nvidia.com>
> 
> If a vSMMU is configured as a accelerated one, HW IOTLB will be used
> and all cache invalidation should be done to the HW IOTLB too, v.s.
> the emulated iotlb. In this case, an iommu notifier isn't registered,
> as the devices behind a SMMUv3-accel would stay in the system address
> space for stage-2 mappings.
> 
> However, the KVM code still requests an iommu address space to translate
> an MSI doorbell gIOVA via get_msi_address_space() and translate().
> 
> Since a SMMUv3-accel doesn't register an iommu notifier to flush emulated
> iotlb, bypass the emulated IOTLB and always walk through the guest-level
> IO page table.
> 
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
>   hw/arm/smmu-common.c | 21 +++++++++++++++++++++
>   1 file changed, 21 insertions(+)
> 
> diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
> index 9fd455baa0..fd10df8866 100644
> --- a/hw/arm/smmu-common.c
> +++ b/hw/arm/smmu-common.c
> @@ -77,6 +77,17 @@ static SMMUTLBEntry *smmu_iotlb_lookup_all_levels(SMMUState *bs,
>       uint8_t level = 4 - (inputsize - 4) / stride;
>       SMMUTLBEntry *entry = NULL;
>   
> +    /*
> +     * Stage-1 translation with a accel SMMU in general uses HW IOTLB. However,
> +     * KVM still requests for an iommu address space for an MSI fixup by looking
> +     * up stage-1 page table. Make sure we don't go through the emulated pathway
> +     * so that the emulated iotlb will not need any invalidation.
> +     */
> +
> +    if (bs->accel) {
> +        return NULL;
> +    }
> +
>       while (level <= 3) {
>           uint64_t subpage_size = 1ULL << level_shift(level, tt->granule_sz);
>           uint64_t mask = subpage_size - 1;
> @@ -142,6 +153,16 @@ void smmu_iotlb_insert(SMMUState *bs, SMMUTransCfg *cfg, SMMUTLBEntry *new)
>       SMMUIOTLBKey *key = g_new0(SMMUIOTLBKey, 1);
>       uint8_t tg = (new->granule - 10) / 2;
>   
> +    /*
> +     * Stage-1 translation with a accel SMMU in general uses HW IOTLB. However,
> +     * KVM still requests for an iommu address space for an MSI fixup by looking
> +     * up stage-1 page table. Make sure we don't go through the emulated pathway
> +     * so that the emulated iotlb will not need any invalidation.
> +     */
> +    if (bs->accel) {
> +        return;
> +    }
> +
>       if (g_hash_table_size(bs->iotlb) >= SMMU_IOTLB_MAX_SIZE) {
>           smmu_iotlb_inv_all(bs);
>       }

Ah! ... if 'accel', skip emulated code since hw handling it... in common smmu code... I like it! :)
- Don