[RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory

Aneesh Kumar K.V (Arm) posted 38 patches 2 months, 1 week ago
[RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Aneesh Kumar K.V (Arm) 2 months, 1 week ago
Currently, we enforce the use of bounce buffers to ensure that memory
accessed by non-secure devices is explicitly shared with the host [1].
However, for secure devices, this approach must be avoided.

To achieve this, we introduce a device flag that controls whether a
bounce buffer allocation is required for the device. Additionally, this flag is
used to manage the top IPA bit assignment for setting up
protected/unprotected IPA aliases.

[1] commit fbf979a01375 ("arm64: Enforce bounce buffers for realm DMA")

based on changes from Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/include/asm/mem_encrypt.h |  6 +-----
 arch/arm64/mm/mem_encrypt.c          | 10 ++++++++++
 drivers/pci/tsm.c                    |  6 ++++++
 include/linux/device.h               |  1 +
 include/linux/swiotlb.h              |  4 ++++
 5 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/mem_encrypt.h b/arch/arm64/include/asm/mem_encrypt.h
index 314b2b52025f..d77c10cd5b79 100644
--- a/arch/arm64/include/asm/mem_encrypt.h
+++ b/arch/arm64/include/asm/mem_encrypt.h
@@ -15,14 +15,10 @@ int arm64_mem_crypt_ops_register(const struct arm64_mem_crypt_ops *ops);
 
 int set_memory_encrypted(unsigned long addr, int numpages);
 int set_memory_decrypted(unsigned long addr, int numpages);
+bool force_dma_unencrypted(struct device *dev);
 
 int realm_register_memory_enc_ops(void);
 
-static inline bool force_dma_unencrypted(struct device *dev)
-{
-	return is_realm_world();
-}
-
 /*
  * For Arm CCA guests, canonical addresses are "encrypted", so no changes
  * required for dma_addr_encrypted().
diff --git a/arch/arm64/mm/mem_encrypt.c b/arch/arm64/mm/mem_encrypt.c
index ee3c0ab04384..279696a8af3f 100644
--- a/arch/arm64/mm/mem_encrypt.c
+++ b/arch/arm64/mm/mem_encrypt.c
@@ -17,6 +17,7 @@
 #include <linux/compiler.h>
 #include <linux/err.h>
 #include <linux/mm.h>
+#include <linux/device.h>
 
 #include <asm/mem_encrypt.h>
 
@@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
 	return crypt_ops->decrypt(addr, numpages);
 }
 EXPORT_SYMBOL_GPL(set_memory_decrypted);
+
+bool force_dma_unencrypted(struct device *dev)
+{
+	if (dev->tdi_enabled)
+		return false;
+
+	return is_realm_world();
+}
+EXPORT_SYMBOL_GPL(force_dma_unencrypted);
diff --git a/drivers/pci/tsm.c b/drivers/pci/tsm.c
index e4a3b5b37939..60f50d57a725 100644
--- a/drivers/pci/tsm.c
+++ b/drivers/pci/tsm.c
@@ -120,6 +120,7 @@ static int pci_tsm_disconnect(struct pci_dev *pdev)
 
 	tsm_ops->disconnect(pdev);
 	tsm->state = PCI_TSM_INIT;
+	pdev->dev.tdi_enabled = false;
 
 	return 0;
 }
@@ -199,6 +200,8 @@ static int pci_tsm_accept(struct pci_dev *pdev)
 	if (rc)
 		return rc;
 	tsm->state = PCI_TSM_ACCEPT;
+	pdev->dev.tdi_enabled = true;
+
 	return 0;
 }
 
@@ -557,6 +560,9 @@ static void __pci_tsm_init(struct pci_dev *pdev)
 	default:
 		break;
 	}
+
+	/* FIXME!! should this be default true and switch to false for TEE capable device */
+	pdev->dev.tdi_enabled = false;
 }
 
 void pci_tsm_init(struct pci_dev *pdev)
diff --git a/include/linux/device.h b/include/linux/device.h
index 4940db137fff..d62e0dd9d8ee 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -688,6 +688,7 @@ struct device {
 #ifdef CONFIG_IOMMU_DMA
 	bool			dma_iommu:1;
 #endif
+	bool			tdi_enabled:1;
 };
 
 /**
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 3dae0f592063..61e7cff7768b 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -173,6 +173,10 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
+	if (dev->tdi_enabled) {
+		dev_warn_once(dev, "(TIO) Disable SWIOTLB");
+		return false;
+	}
 	return mem && mem->force_bounce;
 }
 
-- 
2.43.0
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Mostafa Saleh 2 weeks, 6 days ago
Hi Aneesh,

On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Currently, we enforce the use of bounce buffers to ensure that memory
> accessed by non-secure devices is explicitly shared with the host [1].
> However, for secure devices, this approach must be avoided.


Sorry, this might be a basic question; I just started looking into this.
I see that “force_dma_unencrypted” and “is_swiotlb_force_bounce” are only
used from DMA-direct, but it seems in your case it involves an IOMMU.
How does it influence bouncing in that case?

Thanks,
Mostafa

Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Aneesh Kumar K.V 2 weeks, 5 days ago
Mostafa Saleh <smostafa@google.com> writes:

> Hi Aneesh,
>
> On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> Currently, we enforce the use of bounce buffers to ensure that memory
>> accessed by non-secure devices is explicitly shared with the host [1].
>> However, for secure devices, this approach must be avoided.
>
>
> Sorry this might be a basic question, I just started looking into this.
> I see that “force_dma_unencrypted” and “is_swiotlb_force_bounce” are only
> used from DMA-direct, but it seems in your case it involves an IOMMU.
> How does it influence bouncing in that case?
>

With the current patchset, the guest does not have an assigned IOMMU (no
Stage1 SMMU), so guest DMA operations use DMA-direct.

For non-secure devices:
 - Streaming DMA uses swiotlb, which is a pool shared with the hypervisor.
 - Non-streaming DMA uses DMA-direct, and the attributes of the allocated
   memory are updated with dma_set_decrypted().

For secure devices, neither of these mechanisms is needed.
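
A condensed sketch of how those two paths look in dma-direct once the
flag from this patch is wired in (simplified pseudo-kernel code, not the
exact upstream functions; error handling trimmed):

static void *realm_dma_alloc_sketch(struct device *dev, size_t size,
				    dma_addr_t *dma_handle, gfp_t gfp)
{
	struct page *page = alloc_pages(gfp, get_order(size));

	if (!page)
		return NULL;

	if (force_dma_unencrypted(dev)) {
		/* non-secure (T=0) device: share the pages with the host */
		if (set_memory_decrypted((unsigned long)page_address(page),
					 1 << get_order(size)))
			return NULL;	/* leaked on purpose, page state unknown */
		*dma_handle = phys_to_dma_unencrypted(dev, page_to_phys(page));
	} else {
		/* tdi_enabled (T=1) device: memory stays private */
		*dma_handle = phys_to_dma(dev, page_to_phys(page));
	}
	return page_address(page);
}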

-aneesh
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Mostafa Saleh 2 weeks, 5 days ago
On Tue, Sep 16, 2025 at 09:45:18AM +0530, Aneesh Kumar K.V wrote:
> Mostafa Saleh <smostafa@google.com> writes:
> 
> > Hi Aneesh,
> >
> > On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> Currently, we enforce the use of bounce buffers to ensure that memory
> >> accessed by non-secure devices is explicitly shared with the host [1].
> >> However, for secure devices, this approach must be avoided.
> >
> >
> > Sorry this might be a basic question, I just started looking into this.
> > I see that “force_dma_unencrypted” and “is_swiotlb_force_bounce” are only
> > used from DMA-direct, but it seems in your case it involves an IOMMU.
> > How does it influence bouncing in that case?
> >
> 
> With the current patchset, the guest does not have an assigned IOMMU (no
> Stage1 SMMU), so guest DMA operations use DMA-direct.
> 
> For non-secure devices:
>  - Streaming DMA uses swiotlb, which is a shared pool with the hypervisor.
>  - Non-streaming DMA uses DMA-direct, and the attributes of the allocated
>    memory are updated with dma_set_decrypted().
> 
> For secure devices, neither of these mechanisms is needed.

I see, thanks for the explanation!

Thanks,
Mostafa

> 
> -aneesh
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Bjorn Helgaas 2 months ago
On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
> Currently, we enforce the use of bounce buffers to ensure that memory
> accessed by non-secure devices is explicitly shared with the host [1].
> However, for secure devices, this approach must be avoided.

> @@ -557,6 +560,9 @@ static void __pci_tsm_init(struct pci_dev *pdev)
>  	default:
>  		break;
>  	}
> +
> +	/* FIXME!! should this be default true and switch to false for TEE capable device */

Fix whatever this is, or make it a real sentence and wrap to fit in 80
columns.

> +	pdev->dev.tdi_enabled = false;
>  }
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months, 1 week ago
On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>  	return crypt_ops->decrypt(addr, numpages);
>  }
>  EXPORT_SYMBOL_GPL(set_memory_decrypted);
> +
> +bool force_dma_unencrypted(struct device *dev)
> +{
> +	if (dev->tdi_enabled)
> +		return false;

Is this OK? I see code like this:

static inline dma_addr_t phys_to_dma_direct(struct device *dev,
		phys_addr_t phys)
{
	if (force_dma_unencrypted(dev))
		return phys_to_dma_unencrypted(dev, phys);
	return phys_to_dma(dev, phys);

What are the ARM rules for generating dma addresses?

1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
   and do "top bit IBA set"

2) Device is T=1, memory is encrypted, use the phys_to_dma() normally

3) Device is T=1, memory is unencrypted, use the phys_to_dma()
   normally??? Seems odd, I would have guessed the DMA address should
   be the same as case #1?

Can you document this in a comment?

> diff --git a/include/linux/device.h b/include/linux/device.h
> index 4940db137fff..d62e0dd9d8ee 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -688,6 +688,7 @@ struct device {
>  #ifdef CONFIG_IOMMU_DMA
>  	bool			dma_iommu:1;
>  #endif
> +	bool			tdi_enabled:1;
>  };

I would give dev->tdi_enabled a clearer name, maybe
dev->encrypted_dma_supported?

Also need to think carefully about whether a bitfield is OK here; we
can't locklessly change a bitfield, so we need to audit that all members
are set under, probably, the device lock or some other single-threaded
hand waving. It seems believable that it is like that, but it should be
checked out, and a lockdep assertion added if it relies on the device
lock.
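
If it is the device lock, a purely illustrative (hypothetical) setter
like this would document and enforce the rule:

static void pci_tsm_set_tdi_enabled(struct pci_dev *pdev, bool enabled)
{
	/* assumption: the flag is only flipped under device_lock() */
	lockdep_assert_held(&pdev->dev.mutex);
	pdev->dev.tdi_enabled = enabled;
}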

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Alexey Kardashevskiy 2 months ago

On 28/7/25 20:03, Jason Gunthorpe wrote:
> On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>>   	return crypt_ops->decrypt(addr, numpages);
>>   }
>>   EXPORT_SYMBOL_GPL(set_memory_decrypted);
>> +
>> +bool force_dma_unencrypted(struct device *dev)
>> +{
>> +	if (dev->tdi_enabled)
>> +		return false;
> 
> Is this OK? I see code like this:
> 
> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> 		phys_addr_t phys)
> {
> 	if (force_dma_unencrypted(dev))
> 		return phys_to_dma_unencrypted(dev, phys);
> 	return phys_to_dma(dev, phys);

I did write this in the first place so I'll comment :)

> 
> What are the ARM rules for generating dma addresses?
> 
> 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
>     and do "top IPA bit set"
> 
> 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
> 
> 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
>     normally??? Seems odd, I would have guessed the DMA address should
>     be the same as case #1?
> 
> Can you document this in a comment?


On AMD, T=1 only encrypts the PCIe traffic. When a DMA request hits the IOMMU, the IOMMU decrypts it and then decides whether to encrypt it with a memory key: if there is a secure vIOMMU, it will do what the C-bit says in the guest IOMMU table (this is in the works), or it will just always set the C-bit without a guest vIOMMU (which is a big per-device knob and is what my patches do now).

And with vIOMMU, I'd expect phys_to_dma_direct() not to be called as this one is in a direct map path.
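
To illustrate the decision described above as pseudo-code (hypothetical,
not the actual AMD IOMMU implementation):

/* effective C-bit for a T=1 DMA after the IOMMU has translated it */
static bool amd_effective_cbit(bool secure_viommu, bool guest_pte_cbit)
{
	if (secure_viommu)
		return guest_pte_cbit;	/* honour the guest IOMMU table */
	return true;			/* the "big knob": always set the C-bit */
}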

> 
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 4940db137fff..d62e0dd9d8ee 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -688,6 +688,7 @@ struct device {
>>   #ifdef CONFIG_IOMMU_DMA
>>   	bool			dma_iommu:1;
>>   #endif
>> +	bool			tdi_enabled:1;
>>   };
> 
> I would give the dev->tdi_enabled a clearer name, maybe
> dev->encrypted_dma_supported ?


Maybe, but "_enabled", not "_supported". And, ideally, with a vIOMMU, at least AMD won't be needing it.

> 
> Also need to think carefully about whether a bitfield is OK here; we
> can't locklessly change a bitfield, so we need to audit that all members
> are set under, probably, the device lock or some other single-threaded
> hand waving. It seems believable that it is like that, but it should be
> checked out, and a lockdep assertion added if it relies on the device lock.

True.

> 
> Jason
> 

-- 
Alexey
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months ago
On Tue, Aug 05, 2025 at 08:22:10PM +1000, Alexey Kardashevskiy wrote:

>> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>>               phys_addr_t phys)
>> {
>>       if (force_dma_unencrypted(dev))
>>               return phys_to_dma_unencrypted(dev, phys);
>>       return phys_to_dma(dev, phys);

On AMD what is the force_dma_unencrypted() for?

I thought AMD had only one IOMMU and effectively one S2 mapping. Why
does it need to change the phys depending on it being shared or private?

> On AMD, T=1 only encrypts the PCIe traffic. When a DMA request hits
> the IOMMU, the IOMMU decrypts it and then decides whether to encrypt
> it with a memory key: if there is a secure vIOMMU, it will do what the
> C-bit says in the guest IOMMU table (this is in the works), or it will
> just always set the C-bit without a guest vIOMMU (which is a big
> per-device knob and is what my patches do now).

AMD doesn't have the split IOMMU design that something like ARM has,
so it is a bit different.

On ARM the T=1 IOMMU should map the entire CPU address space, so any
IOVA with any address should just work. So I'd expect AMD and ARM to
be the same here.

For the T=0 IOMMU, ARM (I think) will only map the shared pages to the
shared IPA alias, so the guest VM has to ensure the shared physical
alias is used. Then it sounds like the CPU will sometimes accept the
private physical alias, and Linux will sometimes prefer that alias for
the shared memory too, so Linux gets things muddled.

IMHO ARM probably should fix this much higher up the stack, when it has
more information to tell if the phys_addr is actually the private alias
of a shared page.

> > > +	bool			tdi_enabled:1;
> > >   };
> > 
> > I would give the dev->tdi_enabled a clearer name, maybe
> > dev->encrypted_dma_supported ?
> 
> 
> Maybe, but "_enabled", not "_supported". And, ideally, with a vIOMMU, at least AMD won't be needing it.

Yes

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Aneesh Kumar K.V 2 months, 1 week ago
Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>>  	return crypt_ops->decrypt(addr, numpages);
>>  }
>>  EXPORT_SYMBOL_GPL(set_memory_decrypted);
>> +
>> +bool force_dma_unencrypted(struct device *dev)
>> +{
>> +	if (dev->tdi_enabled)
>> +		return false;
>
> Is this OK? I see code like this:
>
> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> 		phys_addr_t phys)
> {
> 	if (force_dma_unencrypted(dev))
> 		return phys_to_dma_unencrypted(dev, phys);
> 	return phys_to_dma(dev, phys);
>
> What are the ARM rules for generating dma addresses?
>
> 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
>    and do "top IPA bit set"
>
> 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
>
> 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
>    normally??? Seems odd, I would have guessed the DMA address should
>    be the same as case #1?
>
> Can you document this in a comment?
>

If a device is operating in secure mode (T=1), it is currently assumed
that only access to private (encrypted) memory is supported. It is
unclear whether devices would need to perform DMA to shared
(unencrypted) memory while operating in this mode, as TLPs with T=1 are
generally expected to target private memory.

Based on this assumption, T=1 devices will always access
private/encrypted memory, while T=0 devices will be restricted to
shared/unencrypted memory.
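
Expressed as a rough DMA-address rule (illustrative sketch only;
PROT_NS_SHARED is the arm64 macro for the unprotected/shared top IPA
bit):

static dma_addr_t cca_dma_addr_sketch(struct device *dev, phys_addr_t phys)
{
	if (dev->tdi_enabled)		/* T=1: protected (private) IPA */
		return phys_to_dma(dev, phys);
	/* T=0: shared IPA alias, i.e. top IPA bit set */
	return phys_to_dma(dev, phys) | PROT_NS_SHARED;
}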

>
>> diff --git a/include/linux/device.h b/include/linux/device.h
>> index 4940db137fff..d62e0dd9d8ee 100644
>> --- a/include/linux/device.h
>> +++ b/include/linux/device.h
>> @@ -688,6 +688,7 @@ struct device {
>>  #ifdef CONFIG_IOMMU_DMA
>>  	bool			dma_iommu:1;
>>  #endif
>> +	bool			tdi_enabled:1;
>>  };
>
> I would give the dev->tdi_enabled a clearer name, maybe
> dev->encrypted_dma_supported ?
>
> Also need to think carefully about whether a bitfield is OK here; we
> can't locklessly change a bitfield, so we need to audit that all members
> are set under, probably, the device lock or some other single-threaded
> hand waving. It seems believable that it is like that, but it should be
> checked out, and a lockdep assertion added if it relies on the device lock.
>

Will check and update the patch.

-aneesh
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months, 1 week ago
On Tue, Jul 29, 2025 at 01:53:10PM +0530, Aneesh Kumar K.V wrote:
> Jason Gunthorpe <jgg@ziepe.ca> writes:
> 
> > On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
> >>  	return crypt_ops->decrypt(addr, numpages);
> >>  }
> >>  EXPORT_SYMBOL_GPL(set_memory_decrypted);
> >> +
> >> +bool force_dma_unencrypted(struct device *dev)
> >> +{
> >> +	if (dev->tdi_enabled)
> >> +		return false;
> >
> > Is this OK? I see code like this:
> >
> > static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> > 		phys_addr_t phys)
> > {
> > 	if (force_dma_unencrypted(dev))
> > 		return phys_to_dma_unencrypted(dev, phys);
> > 	return phys_to_dma(dev, phys);
> >
> > What are the ARM rules for generating dma addresses?
> >
> > 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
> >    and do "top IPA bit set"
> >
> > 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
> >
> > 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
> >    normally??? Seems odd, I would have guessed the DMA address should
> >    be the same as case #1?
> >
> > Can you document this in a comment?
> >
> 
> If a device is operating in secure mode (T=1), it is currently assumed
> that only access to private (encrypted) memory is supported.

No, this is not how the PCI specs were written, as far as I
understand. The XT bit thing is supposed to add more fine-grained
device-side control over what memory the DMA can target. T alone does
not do that.

> It is unclear whether devices would need to perform DMA to shared
> (unencrypted) memory while operating in this mode, as TLPs with T=1
> are generally expected to target private memory.

PCI SIG supports it, kernel should support it.

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Aneesh Kumar K.V 2 months ago
Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Tue, Jul 29, 2025 at 01:53:10PM +0530, Aneesh Kumar K.V wrote:
>> Jason Gunthorpe <jgg@ziepe.ca> writes:
>> 
>> > On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> >> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>> >>  	return crypt_ops->decrypt(addr, numpages);
>> >>  }
>> >>  EXPORT_SYMBOL_GPL(set_memory_decrypted);
>> >> +
>> >> +bool force_dma_unencrypted(struct device *dev)
>> >> +{
>> >> +	if (dev->tdi_enabled)
>> >> +		return false;
>> >
>> > Is this OK? I see code like this:
>> >
>> > static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>> > 		phys_addr_t phys)
>> > {
>> > 	if (force_dma_unencrypted(dev))
>> > 		return phys_to_dma_unencrypted(dev, phys);
>> > 	return phys_to_dma(dev, phys);
>> >
>> > What are the ARM rules for generating dma addresses?
>> >
>> > 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
>> >    and do "top IPA bit set"
>> >
>> > 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
>> >
>> > 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
>> >    normally??? Seems odd, I would have guessed the DMA address should
>> >    be the same as case #1?
>> >
>> > Can you document this in a comment?
>> >
>> 
>> If a device is operating in secure mode (T=1), it is currently assumed
>> that only access to private (encrypted) memory is supported.
>
> No, this is not how the PCI specs were written, as far as I
> understand. The XT bit thing is supposed to add more fine-grained
> device-side control over what memory the DMA can target. T alone does
> not do that.
>
>> It is unclear whether devices would need to perform DMA to shared
>> (unencrypted) memory while operating in this mode, as TLPs with T=1
>> are generally expected to target private memory.
>
> PCI SIG supports it, kernel should support it.
>

Would we also need a separate DMA allocation API for allocating
addresses intended to be shared with the non-secure hypervisor?

Are there any existing drivers in the kernel that already require such
shared allocations, which I could use as a reference?

-aneesh
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months ago
On Sat, Aug 02, 2025 at 02:14:20PM +0530, Aneesh Kumar K.V wrote:
> Jason Gunthorpe <jgg@ziepe.ca> writes:
> 
> > On Tue, Jul 29, 2025 at 01:53:10PM +0530, Aneesh Kumar K.V wrote:
> >> Jason Gunthorpe <jgg@ziepe.ca> writes:
> >> 
> >> > On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
> >> >> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
> >> >>  	return crypt_ops->decrypt(addr, numpages);
> >> >>  }
> >> >>  EXPORT_SYMBOL_GPL(set_memory_decrypted);
> >> >> +
> >> >> +bool force_dma_unencrypted(struct device *dev)
> >> >> +{
> >> >> +	if (dev->tdi_enabled)
> >> >> +		return false;
> >> >
> >> > Is this OK? I see code like this:
> >> >
> >> > static inline dma_addr_t phys_to_dma_direct(struct device *dev,
> >> > 		phys_addr_t phys)
> >> > {
> >> > 	if (force_dma_unencrypted(dev))
> >> > 		return phys_to_dma_unencrypted(dev, phys);
> >> > 	return phys_to_dma(dev, phys);
> >> >
> >> > What are the ARM rules for generating dma addresses?
> >> >
> >> > 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
> >> >    and do "top IPA bit set"
> >> >
> >> > 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
> >> >
> >> > 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
> >> >    normally??? Seems odd, I would have guessed the DMA address should
> >> >    be the same as case #1?
> >> >
> >> > Can you document this in a comment?
> >> >
> >> 
> >> If a device is operating in secure mode (T=1), it is currently assumed
> >> that only access to private (encrypted) memory is supported.
> >
> > No, this is not how the PCI specs were written, as far as I
> > understand. The XT bit thing is supposed to add more fine-grained
> > device-side control over what memory the DMA can target. T alone does
> > not do that.
> >
> >> It is unclear whether devices would need to perform DMA to shared
> >> (unencrypted) memory while operating in this mode, as TLPs with T=1
> >> are generally expected to target private memory.
> >
> > PCI SIG supports it, kernel should support it.
> >
> 
> Would we also need a separate DMA allocation API for allocating
> addresses intended to be shared with the non-secure hypervisor?
> 
> Are there any existing drivers in the kernel that already require such
> shared allocations, which I could use as a reference?

The most likely case in the near term is PCI P2P to shared MMIO.

I don't know any way to allocate shared memory in a driver??

At the bare minimum this patch should be documenting the correct
architecture and outlining any gaps in the current implementation.

I also don't really understand what the above code is even
doing.. Isn't the design on ARM that the IPA always encodes the
shared/private in the top bit?

How do we get a shared page that does not already have a phys_addr_t
in the shared IPA? Shouldn't the kernel have switched to the shared
IPA alias when it returned the swiotlb buffer? eg why do we need to do:

#define dma_addr_unencrypted(x)		((x) | PROT_NS_SHARED)

At all?

Suzuki ?

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Aneesh Kumar K.V 2 months ago
Jason Gunthorpe <jgg@ziepe.ca> writes:

> On Sat, Aug 02, 2025 at 02:14:20PM +0530, Aneesh Kumar K.V wrote:
>> Jason Gunthorpe <jgg@ziepe.ca> writes:
>> 
>> > On Tue, Jul 29, 2025 at 01:53:10PM +0530, Aneesh Kumar K.V wrote:
>> >> Jason Gunthorpe <jgg@ziepe.ca> writes:
>> >> 
>> >> > On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
>> >> >> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>> >> >>  	return crypt_ops->decrypt(addr, numpages);
>> >> >>  }
>> >> >>  EXPORT_SYMBOL_GPL(set_memory_decrypted);
>> >> >> +
>> >> >> +bool force_dma_unencrypted(struct device *dev)
>> >> >> +{
>> >> >> +	if (dev->tdi_enabled)
>> >> >> +		return false;
>> >> >
>> >> > Is this OK? I see code like this:
>> >> >
>> >> > static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>> >> > 		phys_addr_t phys)
>> >> > {
>> >> > 	if (force_dma_unencrypted(dev))
>> >> > 		return phys_to_dma_unencrypted(dev, phys);
>> >> > 	return phys_to_dma(dev, phys);
>> >> >
>> >> > What are the ARM rules for generating dma addresses?
>> >> >
>> >> > 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
>> >> >    and do "top IPA bit set"
>> >> >
>> >> > 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
>> >> >
>> >> > 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
>> >> >    normally??? Seems odd, I would have guessed the DMA address should
>> >> >    be the same as case #1?
>> >> >
>> >> > Can you document this in a comment?
>> >> >
>> >> 
>> >> If a device is operating in secure mode (T=1), it is currently assumed
>> >> that only access to private (encrypted) memory is supported.
>> >
>> > No, this is not how the PCI specs were written, as far as I
>> > understand. The XT bit thing is supposed to add more fine-grained
>> > device-side control over what memory the DMA can target. T alone does
>> > not do that.
>> >
>> >> It is unclear whether devices would need to perform DMA to shared
>> >> (unencrypted) memory while operating in this mode, as TLPs with T=1
>> >> are generally expected to target private memory.
>> >
>> > PCI SIG supports it, kernel should support it.
>> >
>> 
>> Would we also need a separate DMA allocation API for allocating
>> addresses intended to be shared with the non-secure hypervisor?
>> 
>> Are there any existing drivers in the kernel that already require such
>> shared allocations, which I could use as a reference?
>
> The most likely case in the near term is PCI P2P to shared MMIO.
>
> I don't know any way to allocate shared memory in a driver??
>
> At the bare minimum this patch should be documenting the correct
> architecture and outlining any gaps in the current implementation.
>
> I also don't really understand what the above code is even
> doing.. Isn't the design on ARM that the IPA always encodes the
> shared/private in the top bit?
>
> How do we get a shared page that does not already have a phys_addr_t
> in the shared IPA? Shouldn't the kernel have switched to the shared
> IPA alias when it returned the swiotlb buffer? eg why do we need to do:
>
> #define dma_addr_unencrypted(x)		((x) | PROT_NS_SHARED)
>

The swiotlb virtual address is updated in the direct-map page table so
that it has the correct attributes. For example,
swiotlb_update_mem_attributes() uses set_memory_decrypted() to mark the
memory as shared:

	set_memory_decrypted((unsigned long)mem->vaddr, bytes >> PAGE_SHIFT);

However, when mapping swiotlb regions to obtain a `dma_addr_t`, we still
need to explicitly convert the physical address:

swiotlb_map()
	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, 0, dir, attrs);
        ...

	/* Ensure that the address returned is DMA'ble */
	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);

Note that we don’t update the phys_addr_t to set the top
bit. For reference:

	tlb_addr = slot_addr(pool->start, index) + offset;

-aneesh
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months ago
On Mon, Aug 04, 2025 at 12:28:33PM +0530, Aneesh Kumar K.V wrote:
> Note that we don’t update the phys_addr_t to set the top
> bit. For reference:
> 
> 	tlb_addr = slot_addr(pool->start, index) + offset;

This seems unfortunate.

So you end up with the private IPA space having shared pages in it,
so *sometimes* you have to force the unencrypted bit?

Seems to me we should insist the phys_addr is canonicalised before
reaching the DMA API, i.e. that the swiotlb/etc code will set the right
IPA bit.
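
For illustration, the kind of thing I mean (hypothetical, arm64-only
sketch, not a real patch): have the swiotlb side hand out the shared
alias directly,

static phys_addr_t swiotlb_shared_alias(phys_addr_t tlb_addr)
{
	if (is_realm_world())
		return tlb_addr | PROT_NS_SHARED;	/* shared IPA alias */
	return tlb_addr;
}

so that the DMA API already sees a canonical physical address.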

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Suzuki K Poulose 2 months, 1 week ago
On 29/07/2025 15:33, Jason Gunthorpe wrote:
> On Tue, Jul 29, 2025 at 01:53:10PM +0530, Aneesh Kumar K.V wrote:
>> Jason Gunthorpe <jgg@ziepe.ca> writes:
>>
>>> On Mon, Jul 28, 2025 at 07:21:41PM +0530, Aneesh Kumar K.V (Arm) wrote:
>>>> @@ -48,3 +49,12 @@ int set_memory_decrypted(unsigned long addr, int numpages)
>>>>   	return crypt_ops->decrypt(addr, numpages);
>>>>   }
>>>>   EXPORT_SYMBOL_GPL(set_memory_decrypted);
>>>> +
>>>> +bool force_dma_unencrypted(struct device *dev)
>>>> +{
>>>> +	if (dev->tdi_enabled)
>>>> +		return false;
>>>
>>> Is this OK? I see code like this:
>>>
>>> static inline dma_addr_t phys_to_dma_direct(struct device *dev,
>>> 		phys_addr_t phys)
>>> {
>>> 	if (force_dma_unencrypted(dev))
>>> 		return phys_to_dma_unencrypted(dev, phys);
>>> 	return phys_to_dma(dev, phys);
>>>
>>> What are the ARM rules for generating dma addresses?
>>>
>>> 1) Device is T=0, memory is unencrypted, call dma_addr_unencrypted()
>>>     and do "top IPA bit set"
>>>
>>> 2) Device is T=1, memory is encrypted, use the phys_to_dma() normally
>>>
>>> 3) Device is T=1, memory is unencrypted, use the phys_to_dma()
>>>     normally??? Seems odd, I would have guessed the DMA address should
>>>     be the same as case #1?
>>>
>>> Can you document this in a comment?
>>>
>>
>> If a device is operating in secure mode (T=1), it is currently assumed
>> that only access to private (encrypted) memory is supported.
> 
> No, this is not how the PCI specs were written, as far as I
> understand. The XT bit thing is supposed to add more fine-grained
> device-side control over what memory the DMA can target. T alone does
> not do that.
> 
>> It is unclear whether devices would need to perform DMA to shared
>> (unencrypted) memory while operating in this mode, as TLPs with T=1
>> are generally expected to target private memory.
> 
> PCI SIG supports it, kernel should support it.

ACK. On Arm CCA, the device can access shared IPA, with T=1 transaction
as long as the mapping is active in the Stage2 managed by RMM.

Rather than mapping the entire memory from the host, it would be ideal
if the CoCo VMs had some sort of callback to "make sure the DMA
wouldn't fault for a device". E.g., it could be as simple as touching
the page in Arm CCA (GFP_ZERO could do the trick; one byte per
granule is enough), or an ACCEPT of a given page.
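
Roughly something like this hypothetical helper, just to illustrate the
idea (one read per 4K granule to fault the pages in; a write, as
GFP_ZERO does, works as well):

static void realm_prefault_for_dma(void *buf, size_t size)
{
	for (size_t off = 0; off < size; off += SZ_4K)
		(void)READ_ONCE(((char *)buf)[off]);	/* touch each granule */
}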

Is this a problem for AMD SNP / Intel TDX?

Suzuki




> 
> Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months ago
On Wed, Jul 30, 2025 at 11:09:35AM +0100, Suzuki K Poulose wrote:
> > > It is unclear whether devices would need to perform DMA to shared
> > > (unencrypted) memory while operating in this mode, as TLPs with T=1
> > > are generally expected to target private memory.
> > 
> > PCI SIG supports it, kernel should support it.
> 
> ACK. On Arm CCA, the device can access shared IPA, with T=1 transaction
> as long as the mapping is active in the Stage2 managed by RMM.

Right, I expect that the T=0 SMMU S2 translation is a perfect subset of
the T=1 S2 RMM translation. At most, pages that are not available to
T=0 should be removed when making the subset.

I'm not sure what the plan is here on ARM though, do you expect to
pre-load the entire T=0 SMMU S2 with the shared IPA aliases and rely
on the GPT for protection or will the hypervisor dynamically change
the T=0 SMMU S2 after each shared/private change? Same question for
the RMM S2?

The first option sounds fairly appealing, IMHO

> Rather than mapping the entire memory from the host, it would be ideal
> if the Coco vms have some sort of a callback to "make sure the DMA
> wouldn't fault for a device". 

Isn't that a different topic? For right now we expect that all pages
are pinned and loaded into both S2s. Upon any private/shared
conversion the pages should be reloaded into the appropriate S2s if
required. The VM never needs to tell the hypervisor that it wants to
do DMA.

There are all sorts of options here to relax this, but exploring them
is an entirely different project than CCA, IMHO.

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Suzuki K Poulose 2 months ago
On 31/07/2025 13:17, Jason Gunthorpe wrote:
> On Wed, Jul 30, 2025 at 11:09:35AM +0100, Suzuki K Poulose wrote:
>>>> It is unclear whether devices would need to perform DMA to shared
>>>> (unencrypted) memory while operating in this mode, as TLPs with T=1
>>>> are generally expected to target private memory.
>>>
>>> PCI SIG supports it, kernel should support it.
>>
>> ACK. On Arm CCA, the device can access shared IPA, with T=1 transaction
>> as long as the mapping is active in the Stage2 managed by RMM.
> 
> Right, I expect that the T=0 SMMU S2 translation is a perfect subset of
> the T=1 S2 rmm translation. At most pages that are not available to
> T=0 should be removed when making the subset.

Yes, this is what the VMM is supposed to do today, see [0] & [1].

> 
> I'm not sure what the plan is here on ARM though, do you expect to
> pre-load the entire T=0 SMMU S2 with the shared IPA aliases and rely
> on the GPT for protection or will the hypervisor dynamically change
> the T=0 SMMU S2 after each shared/private change? Same question for

Yes, share/private transitions do go all the way back to VMM and it
is supposed to make the necessary changes to the SMMU S2 (as in [1]).

> the RMM S2?
> 

As for the RMM S2, the current plan is to re-use the CPU S2 managed
by RMM.

> The first option sounds fairly appealing, IMHO
> 
>> Rather than mapping the entire memory from the host, it would be ideal
>> if the Coco vms have some sort of a callback to "make sure the DMA
>> wouldn't fault for a device".
> 
> Isn't that a different topic? For right now we expect that all pages
> are pinned and loaded into both S2s. Upon any private/shared

Actually it is. But might solve the problem for confidential VMs,
where the S2 mapping is kind of pinned.

Population of the S2 is a bit tricky for CVMs, as there are restrictions
due to:
   1) Pre-boot measurements
   2) Restrictions on modifying the S2 (at least on CCA).

Thus, the "preload S2 and pin" must be done after the "initial images
are loaded" step. And that becomes tricky from the hypervisor (thinking
of this, the VMM may be able to do it properly, as long as it remembers
which areas were loaded).

Filling in the remainder of an already partially populated S2 is
complicated for CCA (costly, but not impossible). But the easier way is
for the Realm to fault in the pages before they are used for DMA (and S2
mappings can be pinned by the hyp as default). Hence that suggestion.

Suzuki

[0] https://gitlab.arm.com/linux-arm/kvmtool-cca/-/commit/7c34972ddc
[1] https://gitlab.arm.com/linux-arm/kvmtool-cca/-/commit/ab4e654c4

> conversion the pages should be reloaded into the appropriate S2s if
> required. The VM never needs to tell the hypervisor that it wants to
> do DMA.

> 
> There are all sorts of options here to relax this, but exploring them
> is an entirely different project than CCA, IMHO.
> 
> Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months ago
On Thu, Jul 31, 2025 at 02:48:23PM +0100, Suzuki K Poulose wrote:
> On 31/07/2025 13:17, Jason Gunthorpe wrote:
> > On Wed, Jul 30, 2025 at 11:09:35AM +0100, Suzuki K Poulose wrote:
> > > > > It is unclear whether devices would need to perform DMA to shared
> > > > > (unencrypted) memory while operating in this mode, as TLPs with T=1
> > > > > are generally expected to target private memory.
> > > > 
> > > > PCI SIG supports it, kernel should support it.
> > > 
> > > ACK. On Arm CCA, the device can access shared IPA, with T=1 transaction
> > > as long as the mapping is active in the Stage2 managed by RMM.
> > 
> > Right, I expect that the T=0 SMMU S2 translation is a perfect subset of
> > the T=1 S2 rmm translation. At most pages that are not available to
> > T=0 should be removed when making the subset.
> 
> Yes, this is what the VMM is supposed to do today, see [0] & [1].

Okay great!

> > I'm not sure what the plan is here on ARM though, do you expect to
> > pre-load the entire T=0 SMMU S2 with the shared IPA aliases and rely
> > on the GPT for protection or will the hypervisor dynamically change
> > the T=0 SMMU S2 after each shared/private change? Same question for
> 
> Yes, share/private transitions do go all the way back to VMM and it
> is supposed to make the necessary changes to the SMMU S2 (as in [1]).

Okay, it works, but also why?

From a hypervisor perspective when using VFIO I'd like the guestmemfd
to fix all the physical memory immediately, so the entire physical map
is fixed and known. Backed by 1G huge pages most likely.

Is there a reason not to just dump that into the T=0 SMMU using 1G
huge pages and never touch it again? The GPT provides protection?

Sure sounds appealing..

> As for the RMM S2, the current plan is to re-use the CPU S2 managed
> by RMM.

Yes, but my question is whether the CPU S2 will be prepopulated
 
> Actually it is. But might solve the problem for confidential VMs,
> where the S2 mapping is kind of pinned.

Not kind of pinned, it is pinned in the hypervisor..
 
> Population of S2 is a bit tricky for CVMs, as there are restrictions
> due to :
>   1) Pre-boot measurements
>   2) Restrictions on modifying the S2 (at least on CCA).

I haven't dug into any of this, but I'd challenge you to try to make
it run fast if the guestmemfd has a full fixed address map in 1G pages
and could just dump them into the RMM efficiently once during boot.

Perhaps there are ways to optimize the measurements for huge amounts
of zero'd memory.

> Filling in the S2, with already populated S2 is complicated for CCA
> (costly, but not impossible). But the easier way is for the Realm to
> fault in the pages before they are used for DMA (and S2 mappings can be
> pinned by the hyp as default). Hence that suggestion.

I guess, but it's weird, kinda slow, and the RMM can never unfault them..

How will you reconstruct the 1G huge pages in the S2 if you are only
populating on faults? Can you really fault the entire 1G page? If so
why can't it be prepopulated?

Jason
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Suzuki K Poulose 2 months ago
On 31/07/2025 17:44, Jason Gunthorpe wrote:
> On Thu, Jul 31, 2025 at 02:48:23PM +0100, Suzuki K Poulose wrote:
>> On 31/07/2025 13:17, Jason Gunthorpe wrote:
>>> On Wed, Jul 30, 2025 at 11:09:35AM +0100, Suzuki K Poulose wrote:
>>>>>> It is unclear whether devices would need to perform DMA to shared
>>>>>> (unencrypted) memory while operating in this mode, as TLPs with T=1
>>>>>> are generally expected to target private memory.
>>>>>
>>>>> PCI SIG supports it, kernel should support it.
>>>>
>>>> ACK. On Arm CCA, the device can access shared IPA, with T=1 transaction
>>>> as long as the mapping is active in the Stage2 managed by RMM.
>>>
>>> Right, I expect that the T=0 SMMU S2 translation is a perfect subset of
>>> the T=1 S2 rmm translation. At most pages that are not available to
>>> T=0 should be removed when making the subset.
>>
>> Yes, this is what the VMM is supposed to do today, see [0] & [1].
> 
> Okay great!
> 
>>> I'm not sure what the plan is here on ARM though, do you expect to
>>> pre-load the entire T=0 SMMU S2 with the shared IPA aliases and rely
>>> on the GPT for protection or will the hypervisor dynamically change
>>> the T=0 SMMU S2 after each shared/private change? Same question for
>>
>> Yes, share/private transitions do go all the way back to VMM and it
>> is supposed to make the necessary changes to the SMMU S2 (as in [1]).
> 
> Okay, it works, but also why?
> 
>  From a hypervisor perspective when using VFIO I'd like the guestmemfd
> to fix all the physical memory immediately, so the entire physical map
> is fixed and known. Backed by 1G huge pages most likely.
> 
> Is there a reason not to just dump that into the T=0 SMMU using 1G
> huge pages and never touch it again? The GPT provides protection?

That is possible, once we get guest_memfd mmap support merged upstream.
GPT does provide protection. The only caveat is, does the guest_memfd
support this at all? I.e., shared->private transitions with a shared
mapping in place (though this is in the SMMU only, not the host CPU
page tables).


> 
> Sure sounds appealing..
> 
>> As for the RMM S2, the current plan is to re-use the CPU S2 managed
>> by RMM.
> 
> Yes, but my question is if the CPU will be prepopulated
>   
>> Actually it is. But might solve the problem for confidential VMs,
>> where the S2 mapping is kind of pinned.
> 
> Not kind of pinned, it is pinned in the hypervisor..
>   
>> Population of S2 is a bit tricky for CVMs, as there are restrictions
>> due to :
>>    1) Pre-boot measurements
>>    2) Restrictions on modifying the S2 (at least on CCA).
> 
> I haven't dug into any of this, but I'd challenge you to try to make
> it run fast if the guestmemfd has a full fixed address map in 1G pages
> and could just dump them into the RMM efficiently once during boot.
> 
> Perhaps there are ways to optimize the measurements for huge amounts
> of zero'd memory.

There is. We (VMM) can choose not to "measure" the zero'd pages.


>> Filling in the S2, with already populated S2 is complicated for CCA
>> (costly, but not impossible). But the easier way is for the Realm to
>> fault in the pages before they are used for DMA (and S2 mappings can be
>> pinned by the hyp as default). Hence that suggestion.
> 
> I guess, but it's weird, kinda slow, and the RMM can never unfault them..
> 
> How will you reconstruct the 1G huge pages in the S2 if you are only
> populating on faults? Can you really fault the entire 1G page? If so
> why can't it be prepopulated?

It is tricky to prepopulate the 1G page, as parts of the pages may be
"populated" with contents. We can recreate the 1G block mapping by
"FOLD"ing the leaf-level tables, all the way up to 1G, after the
mappings are created. We have to do that anyway for CCA.

I think we can go ahead with the VMM pre-populating the entire DRAM
and keeping it pinned for DA. Rather than doing this from the
vfio kernel, it could be done by the VMM, as it has better knowledge
of the populated contents and can map the rest as "unmeasured" 0s.

Suzuki
Re: [RFC PATCH v1 04/38] tsm: Support DMA Allocation from private memory
Posted by Jason Gunthorpe 2 months ago
On Fri, Aug 01, 2025 at 10:30:35AM +0100, Suzuki K Poulose wrote:
> > Is there a reason not to just dump that into the T=0 SMMU using 1G
> > huge pages and never touch it again? The GPT provides protection?
> 
> That is possible, once we get guest_memfd mmap support merged upstream.
> GPT does provide protection. The only caveat is, does the guest_memfd
> support this at all ? i.e., shared->private transitions with a shared
> mapping in place (Though this is in SMMU only, not the Host CPU
> pagetables)

I don't know, we haven't got to the guestmemfd/IOMMU integration yet,
which is why I ask the questions.

I think AMD and ARM would both be interested in guestmemfd <-> iommu
working this way, at least.

> I think we can go ahead with VMM pre-populating the entire DRAM
> and keeping it pinned for DA. Rather than doing this from the
> vfio kernel, it could be done by the VMM as it has better knowledge
> of the populated contents and map the rest as "unmeasured" 0s.

Yes, if done it should be done by the VMM and run through
guestmemfd/kvm however that is agreed to.

Jason