[PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Mostafa Saleh 1 month, 2 weeks ago
Create a shadow page table for the IOMMU that mirrors the host
CPU stage-2 into the IOMMUs to establish DMA isolation.

An initial snapshot is created after driver init; after that, a
callback is invoked on every permission change so that the IOMMU
driver can update its page table.

In some cases, an SMMUv3 may be able to share the host CPU stage-2
page table directly. However, that is more restrictive: it requires
changes to the core hypervisor page-table code, and it would require
the hypervisor to handle IOMMU page faults. This can be added later
as an optimization for SMMUv3.
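
For illustration, an IOMMU driver plugs into this by implementing the
new callback (hypothetical sketch only; the concrete SMMUv3
implementation comes later in this series):

	/* Hypothetical example, not part of this patch. */
	static void my_iommu_host_stage2_idmap(phys_addr_t start,
					       phys_addr_t end, int prot)
	{
		/*
		 * Mirror the host stage-2 change into the IOMMU page
		 * table, identity-mapped (IPA == PA). prot == 0 means
		 * the range must be unmapped.
		 */
	}

	static struct kvm_iommu_ops my_iommu_ops = {
		.init			= my_iommu_init,
		.host_stage2_idmap	= my_iommu_host_stage2_idmap,
	};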

Signed-off-by: Mostafa Saleh <smostafa@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/iommu.h |  4 ++
 arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 83 ++++++++++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  5 ++
 3 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
index 1ac70cc28a9e..219363045b1c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
@@ -3,11 +3,15 @@
 #define __ARM64_KVM_NVHE_IOMMU_H__
 
 #include <asm/kvm_host.h>
+#include <asm/kvm_pgtable.h>
 
 struct kvm_iommu_ops {
 	int (*init)(void);
+	void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
 };
 
 int kvm_iommu_init(void);
 
+void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
+				 enum kvm_pgtable_prot prot);
 #endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
index a01c036c55be..f7d1c8feb358 100644
--- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
+++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
@@ -4,15 +4,94 @@
  *
  * Copyright (C) 2022 Linaro Ltd.
  */
+#include <linux/iommu.h>
+
 #include <nvhe/iommu.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/spinlock.h>
 
 /* Only one set of ops supported */
 struct kvm_iommu_ops *kvm_iommu_ops;
 
+/* Protected by host_mmu.lock */
+static bool kvm_idmap_initialized;
+
+static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
+{
+	int iommu_prot = 0;
+
+	if (prot & KVM_PGTABLE_PROT_R)
+		iommu_prot |= IOMMU_READ;
+	if (prot & KVM_PGTABLE_PROT_W)
+		iommu_prot |= IOMMU_WRITE;
+	if (prot == PKVM_HOST_MMIO_PROT)
+		iommu_prot |= IOMMU_MMIO;
+
+	/* We don't understand that, might be dangerous. */
+	WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
+	return iommu_prot;
+}
+
+static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
+				  enum kvm_pgtable_walk_flags visit)
+{
+	u64 start = ctx->addr;
+	kvm_pte_t pte = *ctx->ptep;
+	u32 level = ctx->level;
+	u64 end = start + kvm_granule_size(level);
+	int prot = IOMMU_READ | IOMMU_WRITE;
+
+	/* Keep unmapped. */
+	if (pte && !kvm_pte_valid(pte))
+		return 0;
+
+	if (kvm_pte_valid(pte))
+		prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
+	else if (!addr_is_memory(start))
+		prot |= IOMMU_MMIO;
+
+	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
+	return 0;
+}
+
+static int kvm_iommu_snapshot_host_stage2(void)
+{
+	int ret;
+	struct kvm_pgtable_walker walker = {
+		.cb	= __snapshot_host_stage2,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+	};
+	struct kvm_pgtable *pgt = &host_mmu.pgt;
+
+	hyp_spin_lock(&host_mmu.lock);
+	ret = kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker);
+	/* Start receiving calls to host_stage2_idmap. */
+	kvm_idmap_initialized = !ret;
+	hyp_spin_unlock(&host_mmu.lock);
+
+	return ret;
+}
+
 int kvm_iommu_init(void)
 {
-	if (!kvm_iommu_ops || !kvm_iommu_ops->init)
+	int ret;
+
+	if (!kvm_iommu_ops || !kvm_iommu_ops->init ||
+	    !kvm_iommu_ops->host_stage2_idmap)
 		return -ENODEV;
 
-	return kvm_iommu_ops->init();
+	ret = kvm_iommu_ops->init();
+	if (ret)
+		return ret;
+	return kvm_iommu_snapshot_host_stage2();
+}
+
+void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
+				 enum kvm_pgtable_prot prot)
+{
+	hyp_assert_lock_held(&host_mmu.lock);
+
+	if (!kvm_idmap_initialized)
+		return;
+	kvm_iommu_ops->host_stage2_idmap(start, end, pkvm_to_iommu_prot(prot));
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c9a15ef6b18d..bce6690f29c0 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -15,6 +15,7 @@
 #include <hyp/fault.h>
 
 #include <nvhe/gfp.h>
+#include <nvhe/iommu.h>
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
@@ -529,6 +530,7 @@ static void __host_update_page_state(phys_addr_t addr, u64 size, enum pkvm_page_
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
 	int ret;
+	enum kvm_pgtable_prot prot;
 
 	if (!range_is_memory(addr, addr + size))
 		return -EPERM;
@@ -538,6 +540,9 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 	if (ret)
 		return ret;
 
+	prot = owner_id == PKVM_ID_HOST ? PKVM_HOST_MEM_PROT : 0;
+	kvm_iommu_host_stage2_idmap(addr, addr + size, prot);
+
 	/* Don't forget to update the vmemmap tracking for the host */
 	if (owner_id == PKVM_ID_HOST)
 		__host_update_page_state(addr, size, PKVM_PAGE_OWNED);
-- 
2.51.0.rc1.167.g924127e9c0-goog
Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Will Deacon 3 weeks, 3 days ago
On Tue, Aug 19, 2025 at 09:51:38PM +0000, Mostafa Saleh wrote:
> Create a shadow page table for the IOMMU that mirrors the host
> CPU stage-2 into the IOMMUs to establish DMA isolation.
> 
> An initial snapshot is created after driver init; after that, a
> callback is invoked on every permission change so that the IOMMU
> driver can update its page table.
> 
> In some cases, an SMMUv3 may be able to share the host CPU stage-2
> page table directly. However, that is more restrictive: it requires
> changes to the core hypervisor page-table code, and it would require
> the hypervisor to handle IOMMU page faults. This can be added later
> as an optimization for SMMUv3.
> 
> Signed-off-by: Mostafa Saleh <smostafa@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/iommu.h |  4 ++
>  arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 83 ++++++++++++++++++++++++-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  5 ++
>  3 files changed, 90 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> index 1ac70cc28a9e..219363045b1c 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> @@ -3,11 +3,15 @@
>  #define __ARM64_KVM_NVHE_IOMMU_H__
>  
>  #include <asm/kvm_host.h>
> +#include <asm/kvm_pgtable.h>
>  
>  struct kvm_iommu_ops {
>  	int (*init)(void);
> +	void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
>  };
>  
>  int kvm_iommu_init(void);
>  
> +void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
> +				 enum kvm_pgtable_prot prot);
>  #endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
> diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> index a01c036c55be..f7d1c8feb358 100644
> --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> @@ -4,15 +4,94 @@
>   *
>   * Copyright (C) 2022 Linaro Ltd.
>   */
> +#include <linux/iommu.h>
> +
>  #include <nvhe/iommu.h>
> +#include <nvhe/mem_protect.h>
> +#include <nvhe/spinlock.h>
>  
>  /* Only one set of ops supported */
>  struct kvm_iommu_ops *kvm_iommu_ops;
>  
> +/* Protected by host_mmu.lock */
> +static bool kvm_idmap_initialized;
> +
> +static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
> +{
> +	int iommu_prot = 0;
> +
> +	if (prot & KVM_PGTABLE_PROT_R)
> +		iommu_prot |= IOMMU_READ;
> +	if (prot & KVM_PGTABLE_PROT_W)
> +		iommu_prot |= IOMMU_WRITE;
> +	if (prot == PKVM_HOST_MMIO_PROT)
> +		iommu_prot |= IOMMU_MMIO;

This looks a little odd to me.

On the CPU side, the only difference between PKVM_HOST_MEM_PROT and
PKVM_HOST_MMIO_PROT is that the former has execute permission. Both are
mapped as cacheable at stage-2 because it's the job of the host to set
the more restrictive memory type at stage-1.

Carrying that over to the SMMU would suggest that we don't care about
IOMMU_MMIO at stage-2 at all, so why do we need to set it here?

> +	/* We don't understand that, might be dangerous. */
> +	WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
> +	return iommu_prot;
> +}
> +
> +static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
> +				  enum kvm_pgtable_walk_flags visit)
> +{
> +	u64 start = ctx->addr;
> +	kvm_pte_t pte = *ctx->ptep;
> +	u32 level = ctx->level;
> +	u64 end = start + kvm_granule_size(level);
> +	int prot =  IOMMU_READ | IOMMU_WRITE;
> +
> +	/* Keep unmapped. */
> +	if (pte && !kvm_pte_valid(pte))
> +		return 0;
> +
> +	if (kvm_pte_valid(pte))
> +		prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> +	else if (!addr_is_memory(start))
> +		prot |= IOMMU_MMIO;

Why do we need to map MMIO regions pro-actively here? I'd have thought
we could just do:

	if (!kvm_pte_valid(pte))
		return 0;

	prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
	return 0;

but I think that IOMMU_MMIO is throwing me again...

Will
Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Mostafa Saleh 2 weeks, 3 days ago
On Tue, Sep 09, 2025 at 03:42:07PM +0100, Will Deacon wrote:
> On Tue, Aug 19, 2025 at 09:51:38PM +0000, Mostafa Saleh wrote:
> > Create a shadow page table for the IOMMU that mirrors the host
> > CPU stage-2 into the IOMMUs to establish DMA isolation.
> > 
> > An initial snapshot is created after driver init; after that, a
> > callback is invoked on every permission change so that the IOMMU
> > driver can update its page table.
> > 
> > In some cases, an SMMUv3 may be able to share the host CPU stage-2
> > page table directly. However, that is more restrictive: it requires
> > changes to the core hypervisor page-table code, and it would require
> > the hypervisor to handle IOMMU page faults. This can be added later
> > as an optimization for SMMUv3.
> > 
> > Signed-off-by: Mostafa Saleh <smostafa@google.com>
> > ---
> >  arch/arm64/kvm/hyp/include/nvhe/iommu.h |  4 ++
> >  arch/arm64/kvm/hyp/nvhe/iommu/iommu.c   | 83 ++++++++++++++++++++++++-
> >  arch/arm64/kvm/hyp/nvhe/mem_protect.c   |  5 ++
> >  3 files changed, 90 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/include/nvhe/iommu.h b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> > index 1ac70cc28a9e..219363045b1c 100644
> > --- a/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> > +++ b/arch/arm64/kvm/hyp/include/nvhe/iommu.h
> > @@ -3,11 +3,15 @@
> >  #define __ARM64_KVM_NVHE_IOMMU_H__
> >  
> >  #include <asm/kvm_host.h>
> > +#include <asm/kvm_pgtable.h>
> >  
> >  struct kvm_iommu_ops {
> >  	int (*init)(void);
> > +	void (*host_stage2_idmap)(phys_addr_t start, phys_addr_t end, int prot);
> >  };
> >  
> >  int kvm_iommu_init(void);
> >  
> > +void kvm_iommu_host_stage2_idmap(phys_addr_t start, phys_addr_t end,
> > +				 enum kvm_pgtable_prot prot);
> >  #endif /* __ARM64_KVM_NVHE_IOMMU_H__ */
> > diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > index a01c036c55be..f7d1c8feb358 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > @@ -4,15 +4,94 @@
> >   *
> >   * Copyright (C) 2022 Linaro Ltd.
> >   */
> > +#include <linux/iommu.h>
> > +
> >  #include <nvhe/iommu.h>
> > +#include <nvhe/mem_protect.h>
> > +#include <nvhe/spinlock.h>
> >  
> >  /* Only one set of ops supported */
> >  struct kvm_iommu_ops *kvm_iommu_ops;
> >  
> > +/* Protected by host_mmu.lock */
> > +static bool kvm_idmap_initialized;
> > +
> > +static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
> > +{
> > +	int iommu_prot = 0;
> > +
> > +	if (prot & KVM_PGTABLE_PROT_R)
> > +		iommu_prot |= IOMMU_READ;
> > +	if (prot & KVM_PGTABLE_PROT_W)
> > +		iommu_prot |= IOMMU_WRITE;
> > +	if (prot == PKVM_HOST_MMIO_PROT)
> > +		iommu_prot |= IOMMU_MMIO;
> 
> This looks a little odd to me.
> 
> On the CPU side, the only difference between PKVM_HOST_MEM_PROT and
> PKVM_HOST_MMIO_PROT is that the former has execute permission. Both are
> mapped as cacheable at stage-2 because it's the job of the host to set
> the more restrictive memory type at stage-1.
> 
> Carrying that over to the SMMU would suggest that we don't care about
> IOMMU_MMIO at stage-2 at all, so why do we need to set it here?

Unlike the CPU, the host can set the SMMU to bypass, in that case the
hypervisor will attach its stage-2 with no stage-1 configured. So,
stage-2 must have the correct attrs for MMIO.
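
For reference, the stage-2 attribute selection in io-pgtable-arm picks
the memory type from the IOMMU prot bits roughly like this (simplified
sketch of drivers/iommu/io-pgtable-arm.c):

	if (prot & IOMMU_MMIO)
		pte |= ARM_LPAE_PTE_MEMATTR_DEV;	/* Device memory */
	else if (prot & IOMMU_CACHE)
		pte |= ARM_LPAE_PTE_MEMATTR_OIWB;	/* Normal, write-back */
	else
		pte |= ARM_LPAE_PTE_MEMATTR_NC;		/* Normal, non-cacheable */

so without IOMMU_MMIO here, an MMIO range would end up mapped as
Normal non-cacheable at stage-2.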

> 
> > +	/* We don't understand that, might be dangerous. */
> > +	WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
> > +	return iommu_prot;
> > +}
> > +
> > +static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
> > +				  enum kvm_pgtable_walk_flags visit)
> > +{
> > +	u64 start = ctx->addr;
> > +	kvm_pte_t pte = *ctx->ptep;
> > +	u32 level = ctx->level;
> > +	u64 end = start + kvm_granule_size(level);
> > +	int prot =  IOMMU_READ | IOMMU_WRITE;
> > +
> > +	/* Keep unmapped. */
> > +	if (pte && !kvm_pte_valid(pte))
> > +		return 0;
> > +
> > +	if (kvm_pte_valid(pte))
> > +		prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> > +	else if (!addr_is_memory(start))
> > +		prot |= IOMMU_MMIO;
> 
> Why do we need to map MMIO regions pro-actively here? I'd have thought
> we could just do:
> 
> 	if (!kvm_pte_valid(pte))
> 		return 0;
> 
> 	prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> 	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
> 	return 0;
> 
> but I think that IOMMU_MMIO is throwing me again...

We have to map everything pro-actively as we don’t handle page faults
in the SMMUv3 driver.
This would be future work, where the CPU stage-2 page table is shared
with the SMMUv3.

Thanks,
Mostafa

> 
> Will
Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Will Deacon 1 week ago
On Tue, Sep 16, 2025 at 02:24:46PM +0000, Mostafa Saleh wrote:
> On Tue, Sep 09, 2025 at 03:42:07PM +0100, Will Deacon wrote:
> > On Tue, Aug 19, 2025 at 09:51:38PM +0000, Mostafa Saleh wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > > index a01c036c55be..f7d1c8feb358 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > > @@ -4,15 +4,94 @@
> > >   *
> > >   * Copyright (C) 2022 Linaro Ltd.
> > >   */
> > > +#include <linux/iommu.h>
> > > +
> > >  #include <nvhe/iommu.h>
> > > +#include <nvhe/mem_protect.h>
> > > +#include <nvhe/spinlock.h>
> > >  
> > >  /* Only one set of ops supported */
> > >  struct kvm_iommu_ops *kvm_iommu_ops;
> > >  
> > > +/* Protected by host_mmu.lock */
> > > +static bool kvm_idmap_initialized;
> > > +
> > > +static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
> > > +{
> > > +	int iommu_prot = 0;
> > > +
> > > +	if (prot & KVM_PGTABLE_PROT_R)
> > > +		iommu_prot |= IOMMU_READ;
> > > +	if (prot & KVM_PGTABLE_PROT_W)
> > > +		iommu_prot |= IOMMU_WRITE;
> > > +	if (prot == PKVM_HOST_MMIO_PROT)
> > > +		iommu_prot |= IOMMU_MMIO;
> > 
> > This looks a little odd to me.
> > 
> > On the CPU side, the only difference between PKVM_HOST_MEM_PROT and
> > PKVM_HOST_MMIO_PROT is that the former has execute permission. Both are
> > mapped as cacheable at stage-2 because it's the job of the host to set
> > the more restrictive memory type at stage-1.
> > 
> > Carrying that over to the SMMU would suggest that we don't care about
> > IOMMU_MMIO at stage-2 at all, so why do we need to set it here?
> 
> Unlike the CPU, the host can set the SMMU to bypass, in that case the
> hypervisor will attach its stage-2 with no stage-1 configured. So,
> stage-2 must have the correct attrs for MMIO.

I'm not sure about that.

If the SMMU is in stage-1 bypass, we still have the incoming memory
attributes from the transaction (modulo MTCFG which we shouldn't be
setting) and they should combine with the stage-2 attributes in roughly
the same way as the CPU, no?

> > > +static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
> > > +				  enum kvm_pgtable_walk_flags visit)
> > > +{
> > > +	u64 start = ctx->addr;
> > > +	kvm_pte_t pte = *ctx->ptep;
> > > +	u32 level = ctx->level;
> > > +	u64 end = start + kvm_granule_size(level);
> > > +	int prot =  IOMMU_READ | IOMMU_WRITE;
> > > +
> > > +	/* Keep unmapped. */
> > > +	if (pte && !kvm_pte_valid(pte))
> > > +		return 0;
> > > +
> > > +	if (kvm_pte_valid(pte))
> > > +		prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> > > +	else if (!addr_is_memory(start))
> > > +		prot |= IOMMU_MMIO;
> > 
> > Why do we need to map MMIO regions pro-actively here? I'd have thought
> > we could just do:
> > 
> > 	if (!kvm_pte_valid(pte))
> > 		return 0;
> > 
> > 	prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> > 	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
> > 	return 0;
> > 
> > but I think that IOMMU_MMIO is throwing me again...
> 
> We have to map everything pro-actively as we don’t handle page faults
> in the SMMUv3 driver.
> This would be future work, where the CPU stage-2 page table is shared
> with the SMMUv3.

Ah yes, I'd forgotten about that.

Thanks,

Will
Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Mostafa Saleh 4 days, 19 hours ago
On Fri, Sep 26, 2025 at 03:42:38PM +0100, Will Deacon wrote:
> On Tue, Sep 16, 2025 at 02:24:46PM +0000, Mostafa Saleh wrote:
> > On Tue, Sep 09, 2025 at 03:42:07PM +0100, Will Deacon wrote:
> > > On Tue, Aug 19, 2025 at 09:51:38PM +0000, Mostafa Saleh wrote:
> > > > diff --git a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > > > index a01c036c55be..f7d1c8feb358 100644
> > > > --- a/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > > > +++ b/arch/arm64/kvm/hyp/nvhe/iommu/iommu.c
> > > > @@ -4,15 +4,94 @@
> > > >   *
> > > >   * Copyright (C) 2022 Linaro Ltd.
> > > >   */
> > > > +#include <linux/iommu.h>
> > > > +
> > > >  #include <nvhe/iommu.h>
> > > > +#include <nvhe/mem_protect.h>
> > > > +#include <nvhe/spinlock.h>
> > > >  
> > > >  /* Only one set of ops supported */
> > > >  struct kvm_iommu_ops *kvm_iommu_ops;
> > > >  
> > > > +/* Protected by host_mmu.lock */
> > > > +static bool kvm_idmap_initialized;
> > > > +
> > > > +static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
> > > > +{
> > > > +	int iommu_prot = 0;
> > > > +
> > > > +	if (prot & KVM_PGTABLE_PROT_R)
> > > > +		iommu_prot |= IOMMU_READ;
> > > > +	if (prot & KVM_PGTABLE_PROT_W)
> > > > +		iommu_prot |= IOMMU_WRITE;
> > > > +	if (prot == PKVM_HOST_MMIO_PROT)
> > > > +		iommu_prot |= IOMMU_MMIO;
> > > 
> > > This looks a little odd to me.
> > > 
> > > On the CPU side, the only difference between PKVM_HOST_MEM_PROT and
> > > PKVM_HOST_MMIO_PROT is that the former has execute permission. Both are
> > > mapped as cacheable at stage-2 because it's the job of the host to set
> > > the more restrictive memory type at stage-1.
> > > 
> > > Carrying that over to the SMMU would suggest that we don't care about
> > > IOMMU_MMIO at stage-2 at all, so why do we need to set it here?
> > 
> > Unlike the CPU, the host can set the SMMU to bypass, in that case the
> > hypervisor will attach its stage-2 with no stage-1 configured. So,
> > stage-2 must have the correct attrs for MMIO.
> 
> I'm not sure about that.
> 
> If the SMMU is in stage-1 bypass, we still have the incoming memory
> attributes from the transaction (modulo MTCFG which we shouldn't be
> setting) and they should combine with the stage-2 attributes in roughly
> the same way as the CPU, no?

Makes sense, we can remove that for now and map all stage-2 with
IOMMU_CACHE. However, that might not be true for other IOMMUs, as
they might not combine attributes the way SMMUv3 stage-2 does, but
we can ignore that for now. I will update the logic in v5.
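
i.e. roughly (untested sketch of what I have in mind; the attributes
would then combine with the incoming ones as you describe):

	static inline int pkvm_to_iommu_prot(enum kvm_pgtable_prot prot)
	{
		int iommu_prot = 0;

		if (prot & KVM_PGTABLE_PROT_R)
			iommu_prot |= IOMMU_READ;
		if (prot & KVM_PGTABLE_PROT_W)
			iommu_prot |= IOMMU_WRITE;
		/* Map everything cacheable; no IOMMU_MMIO at stage-2. */
		if (iommu_prot)
			iommu_prot |= IOMMU_CACHE;

		WARN_ON(prot & ~PKVM_HOST_MEM_PROT);
		return iommu_prot;
	}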

Thanks,
Mostafa

> 
> > > > +static int __snapshot_host_stage2(const struct kvm_pgtable_visit_ctx *ctx,
> > > > +				  enum kvm_pgtable_walk_flags visit)
> > > > +{
> > > > +	u64 start = ctx->addr;
> > > > +	kvm_pte_t pte = *ctx->ptep;
> > > > +	u32 level = ctx->level;
> > > > +	u64 end = start + kvm_granule_size(level);
> > > > +	int prot =  IOMMU_READ | IOMMU_WRITE;
> > > > +
> > > > +	/* Keep unmapped. */
> > > > +	if (pte && !kvm_pte_valid(pte))
> > > > +		return 0;
> > > > +
> > > > +	if (kvm_pte_valid(pte))
> > > > +		prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> > > > +	else if (!addr_is_memory(start))
> > > > +		prot |= IOMMU_MMIO;
> > > 
> > > Why do we need to map MMIO regions pro-actively here? I'd have thought
> > > we could just do:
> > > 
> > > 	if (!kvm_pte_valid(pte))
> > > 		return 0;
> > > 
> > > 	prot = pkvm_to_iommu_prot(kvm_pgtable_stage2_pte_prot(pte));
> > > 	kvm_iommu_ops->host_stage2_idmap(start, end, prot);
> > > 	return 0;
> > > 
> > > but I think that IOMMU_MMIO is throwing me again...
> > 
> > We have to map everything pro-actively as we don’t handle page faults
> > in the SMMUv3 driver.
> > This would be future work, where the CPU stage-2 page table is shared
> > with the SMMUv3.
> 
> Ah yes, I'd forgotten about that.
> 
> Thanks,
> 
> Will
Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Jason Gunthorpe 3 days, 18 hours ago
On Mon, Sep 29, 2025 at 11:01:10AM +0000, Mostafa Saleh wrote:

> > If the SMMU is in stage-1 bypass, we still have the incoming memory
> > attributes from the transaction (modulo MTCFG which we shouldn't be
> > setting) and they should combine with the stage-2 attributes in roughly
> > the same way as the CPU, no?
> 
> Makes sense, we can remove that for now and map all stage-2 with
> IOMMU_CACHE. 

Robin was saying in another thread that the DMA API has to use
IOMMU_MMIO properly or it won't work. I think what happens depends on
the SoC design.

Yes, the incoming attribute combines, but unlike the CPU which will
have per-page memory attributes in the S1, the DMA initiator will
almost always use the same memory attributes.

In other words, we cannot rely on the DMA initiator to indicate if the
underlying memory should be MMIO or CACHE like the CPU can.

I think you have to set CACHE/MMIO correctly here.

Jason
Re: [PATCH v4 10/28] KVM: arm64: iommu: Shadow host stage-2 page table
Posted by Mostafa Saleh 3 days, 17 hours ago
On Tue, Sep 30, 2025 at 09:38:39AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 29, 2025 at 11:01:10AM +0000, Mostafa Saleh wrote:
> 
> > > If the SMMU is in stage-1 bypass, we still have the incoming memory
> > > attributes from the transaction (modulo MTCFG which we shouldn't be
> > > setting) and they should combine with the stage-2 attributes in roughly
> > > the same way as the CPU, no?
> > 
> > Makes sense, we can remove that for now and map all stage-2 with
> > IOMMU_CACHE. 
> 
> Robin was saying in another thread that the DMA API has to use
> IOMMU_MMIO properly or it won't work. I think what happens depends on
> the SoC design.
> 
> Yes, the incoming attribute combines, but unlike the CPU which will
> have per-page memory attributes in the S1, the DMA initiator will
> almost always use the same memory attributes.
> 
> In other words, we cannot rely on the DMA initiator to indicate if the
> underlying memory should be MMIO or CACHE like the CPU can.
> 
> I think you have to set CACHE/MMIO correctly here.

I see, I think you mean [1]; thanks for pointing it out. I think we
have to keep things as-is.

Thanks,
Mostafa

[1] https://lore.kernel.org/all/8f912671-f1d9-4f73-9c1d-e39938bfc09f@arm.com/

> 
> Jason