Modify the SWIOTLB framework to always allocate DMA memory from SWIOTLB.

CVMs use SWIOTLB buffers for bouncing memory when using dma_map_* APIs
to set up memory for IO operations. SWIOTLB buffers are marked as shared
once during early boot.

Buffers allocated using dma_alloc_* APIs are allocated from kernel memory
and then converted to shared during each API invocation. This patch ensures
that such buffers are also allocated from already shared SWIOTLB
regions. This allows enforcing alignment requirements on regions marked
as shared.

Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
 include/linux/swiotlb.h | 17 +----------------
 kernel/dma/direct.c     |  4 ++--
 kernel/dma/swiotlb.c    |  5 +++--
 3 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ecde0312dd52..058901313405 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -17,6 +17,7 @@ struct scatterlist;
#define SWIOTLB_VERBOSE (1 << 0) /* verbose initialization */
#define SWIOTLB_FORCE (1 << 1) /* force bounce buffering */
#define SWIOTLB_ANY (1 << 2) /* allow any memory for the buffer */
+#define SWIOTLB_ALLOC (1 << 4) /* force dma allocation through swiotlb */
/*
* Maximum allowable number of contiguous slabs to map,
@@ -259,7 +260,6 @@ static inline phys_addr_t default_swiotlb_limit(void)
extern void swiotlb_print_info(void);
-#ifdef CONFIG_DMA_RESTRICTED_POOL
struct page *swiotlb_alloc(struct device *dev, size_t size);
bool swiotlb_free(struct device *dev, struct page *page, size_t size);
@@ -267,20 +267,5 @@ static inline bool is_swiotlb_for_alloc(struct device *dev)
{
return dev->dma_io_tlb_mem->for_alloc;
}
-#else
-static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
-{
- return NULL;
-}
-static inline bool swiotlb_free(struct device *dev, struct page *page,
- size_t size)
-{
- return false;
-}
-static inline bool is_swiotlb_for_alloc(struct device *dev)
-{
- return false;
-}
-#endif /* CONFIG_DMA_RESTRICTED_POOL */
#endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 73c95815789a..a7d3266d3d83 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -78,7 +78,7 @@ bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
{
- if (!force_dma_unencrypted(dev))
+ if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
return 0;
return set_memory_decrypted((unsigned long)vaddr, PFN_UP(size));
}
@@ -87,7 +87,7 @@ static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
{
int ret;
- if (!force_dma_unencrypted(dev))
+ if (!force_dma_unencrypted(dev) || is_swiotlb_for_alloc(dev))
return 0;
ret = set_memory_encrypted((unsigned long)vaddr, PFN_UP(size));
if (ret)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 33d942615be5..a056d2f8b9ee 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -363,6 +363,7 @@ void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags,
io_tlb_default_mem.force_bounce =
swiotlb_force_bounce || (flags & SWIOTLB_FORCE);
+ io_tlb_default_mem.for_alloc = (flags & SWIOTLB_ALLOC);
#ifdef CONFIG_SWIOTLB_DYNAMIC
if (!remap)
@@ -1601,8 +1602,6 @@ static inline void swiotlb_create_debugfs_files(struct io_tlb_mem *mem,
#endif /* CONFIG_DEBUG_FS */
-#ifdef CONFIG_DMA_RESTRICTED_POOL
-
struct page *swiotlb_alloc(struct device *dev, size_t size)
{
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
@@ -1634,6 +1633,8 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
return true;
}
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+
static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
{
--
2.43.0.275.g3460e3d667-goog
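[Editorial note: the following is a minimal, hypothetical sketch, not part of the patch. It illustrates the allocation flow the commit message describes: once the default io_tlb_mem is created with SWIOTLB_ALLOC, is_swiotlb_for_alloc() returns true and coherent allocations can be served from the already-shared SWIOTLB pool via swiotlb_alloc(), so no per-allocation set_memory_decrypted() round-trip is needed. The wrapper names below are invented; the real dma-direct code differs.]

/*
 * Hypothetical, simplified sketch. Only swiotlb_alloc(), swiotlb_free()
 * and is_swiotlb_for_alloc() come from the patch above.
 */
#include <linux/dma-direct.h>
#include <linux/mm.h>
#include <linux/swiotlb.h>

static void *cvm_dma_alloc_from_swiotlb(struct device *dev, size_t size,
					 dma_addr_t *dma_handle)
{
	struct page *page;

	/* Only taken when io_tlb_default_mem.for_alloc was set at init. */
	if (!is_swiotlb_for_alloc(dev))
		return NULL;

	/* Pages come from the SWIOTLB pool that is already shared ... */
	page = swiotlb_alloc(dev, size);
	if (!page)
		return NULL;

	/* ... so no set_memory_decrypted() call is needed per allocation. */
	*dma_handle = phys_to_dma(dev, page_to_phys(page));
	return page_address(page);
}

static void cvm_dma_free_to_swiotlb(struct device *dev, size_t size,
				    void *vaddr)
{
	/* Return the slots to the pool; the memory stays shared. */
	swiotlb_free(dev, virt_to_page(vaddr), size);
}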
On Fri, Jan 12, 2024 at 05:52:47AM +0000, Vishal Annapurve wrote:
> Modify SWIOTLB framework to allocate DMA memory always from SWIOTLB.
>
> CVMs use SWIOTLB buffers for bouncing memory when using dma_map_* APIs
> to setup memory for IO operations. SWIOTLB buffers are marked as shared
> once during early boot.
>
> Buffers allocated using dma_alloc_* APIs are allocated from kernel memory
> and then converted to shared during each API invocation. This patch ensures
> that such buffers are also allocated from already shared SWIOTLB
> regions. This allows enforcing alignment requirements on regions marked
> as shared.

But does it work in practice?

Some devices (like GPUs) require a lot of DMA memory. So with this approach
we would need to have huge SWIOTLB buffer that is unused in most VMs.

--
Kiryl Shutsemau / Kirill A. Shutemov
On Wed, Feb 14, 2024 at 8:20 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Fri, Jan 12, 2024 at 05:52:47AM +0000, Vishal Annapurve wrote:
> > Modify SWIOTLB framework to allocate DMA memory always from SWIOTLB.
> >
> > CVMs use SWIOTLB buffers for bouncing memory when using dma_map_* APIs
> > to setup memory for IO operations. SWIOTLB buffers are marked as shared
> > once during early boot.
> >
> > Buffers allocated using dma_alloc_* APIs are allocated from kernel memory
> > and then converted to shared during each API invocation. This patch ensures
> > that such buffers are also allocated from already shared SWIOTLB
> > regions. This allows enforcing alignment requirements on regions marked
> > as shared.
>
> But does it work in practice?
>
> Some devices (like GPUs) require a lot of DMA memory. So with this approach
> we would need to have huge SWIOTLB buffer that is unused in most VMs.
>

Current implementation limits the size of statically allocated SWIOTLB
memory pool to 1G. I was proposing to enable dynamic SWIOTLB for CVMs
in addition to aligning the memory allocations to hugepage sizes, so
that the SWIOTLB pool can be scaled up on demand.

The issue with aligning the pool areas to hugepages is that hugepage
allocation at runtime is not guaranteed. Guaranteeing the hugepage
allocation might need calculating the upper bound in advance, which
defeats the purpose of enabling dynamic SWIOTLB. I am open to
suggestions here.

> --
> Kiryl Shutsemau / Kirill A. Shutemov
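[Editorial note: to make the hugepage concern above concrete, a dynamically added, hugepage-aligned pool area ultimately comes from a high-order page allocation, which is exactly the kind of request that can fail once guest memory is fragmented. A minimal sketch, with a hypothetical helper name:]

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/sizes.h>

/* Hypothetical: try to grab one naturally 2 MiB-aligned block for a new area. */
static struct page *swiotlb_try_alloc_hugepage_area(void)
{
	/*
	 * Order 9 with 4 KiB pages; buddy blocks of this order are naturally
	 * 2 MiB aligned. A NULL return is the "not guaranteed at runtime"
	 * case mentioned above: no contiguous 2 MiB block is left.
	 */
	return alloc_pages(GFP_KERNEL | __GFP_NOWARN, get_order(SZ_2M));
}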
On 15.02.24 04:33, Vishal Annapurve wrote:
> On Wed, Feb 14, 2024 at 8:20 PM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>> On Fri, Jan 12, 2024 at 05:52:47AM +0000, Vishal Annapurve wrote:
>>> Modify SWIOTLB framework to allocate DMA memory always from SWIOTLB.
>>>
>>> CVMs use SWIOTLB buffers for bouncing memory when using dma_map_* APIs
>>> to setup memory for IO operations. SWIOTLB buffers are marked as shared
>>> once during early boot.
>>>
>>> Buffers allocated using dma_alloc_* APIs are allocated from kernel memory
>>> and then converted to shared during each API invocation. This patch ensures
>>> that such buffers are also allocated from already shared SWIOTLB
>>> regions. This allows enforcing alignment requirements on regions marked
>>> as shared.
>> But does it work in practice?
>>
>> Some devices (like GPUs) require a lot of DMA memory. So with this approach
>> we would need to have huge SWIOTLB buffer that is unused in most VMs.
>>
> Current implementation limits the size of statically allocated SWIOTLB
> memory pool to 1G. I was proposing to enable dynamic SWIOTLB for CVMs
> in addition to aligning the memory allocations to hugepage sizes, so
> that the SWIOTLB pool can be scaled up on demand.
>
> The issue with aligning the pool areas to hugepages is that hugepage
> allocation at runtime is not guaranteed. Guaranteeing the hugepage
> allocation might need calculating the upper bound in advance, which
> defeats the purpose of enabling dynamic SWIOTLB. I am open to
> suggestions here.

You could allocate a max bound at boot using CMA and then only fill into
the CMA area when SWIOTLB size requirements increase? The CMA region
will allow movable allocations as long as you don't require the CMA space.

Alex


Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879
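[Editorial note: a rough sketch of the approach Alex describes, with hypothetical names and a hypothetical size for the upper bound. The idea is to reserve the maximum as a CMA area from early architecture setup code, so the pages stay usable for movable allocations until the SWIOTLB actually claims them.]

#include <linux/cma.h>
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/sizes.h>

static struct cma *swiotlb_cma;	/* hypothetical backing CMA area */

/* Must run from early arch setup, before the page allocator takes over. */
static int __init swiotlb_cma_reserve(phys_addr_t max_bound)
{
	/* base/limit 0: let the allocator place it; align the area to 2 MiB. */
	return cma_declare_contiguous(0, max_bound, 0, SZ_2M, 0, false,
				      "swiotlb_cma", &swiotlb_cma);
}

/* Later, when the SWIOTLB needs to grow, carve out a hugepage-aligned chunk. */
static struct page *swiotlb_cma_grow(size_t bytes)
{
	return cma_alloc(swiotlb_cma, PAGE_ALIGN(bytes) >> PAGE_SHIFT,
			 get_order(SZ_2M), false);
}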
On Thu, Feb 15, 2024 at 3:14 PM Alexander Graf <graf@amazon.com> wrote:
>
> ...
> > The issue with aligning the pool areas to hugepages is that hugepage
> > allocation at runtime is not guaranteed. Guaranteeing the hugepage
> > allocation might need calculating the upper bound in advance, which
> > defeats the purpose of enabling dynamic SWIOTLB. I am open to
> > suggestions here.
>
> You could allocate a max bound at boot using CMA and then only fill into
> the CMA area when SWIOTLB size requirements increase? The CMA region
> will allow movable allocations as long as you don't require the CMA space.

Thanks Alex for the inputs; I wanted to understand CMA better before
responding here. I am trying to get the following requirements satisfied:
1) SWIOTLB pools are aligned to hugepage sizes.
2) SWIOTLB pool areas can be scaled up dynamically on demand, to avoid
   pre-allocating large memory ranges.

Using CMA to back SWIOTLB pools for CoCo VMs as per this suggestion
would need:
1) Pre-configuring CMA areas with a certain amount of memory at boot,
   either with
   - a command line argument to the kernel (tedious to specify with
     different memory shapes), or
   - a kernel init time hook called by architecture specific code to set
     up CMA areas according to the amount of memory available (possibly
     a percentage of memory, as done for SWIOTLB configuration).
2) SWIOTLB pool dynamic allocation logic that first scans CMA areas to
   find hugepage aligned ranges and, if none are found, falls back to
   allocating the ranges from the buddy allocator (which should ideally
   happen only after depleting the CMA area).
3) SWIOTLB pool regions would need to be allocatable from >4G ranges as
   well.

Setting up a suitable percentage of memory for the CMA area in case of
CoCo VMs will allow larger SWIOTLB pool area additions; this should help
alleviate Michael Kelley's concern about spin lock contention due to
smaller pool areas. This will need some analysis of shared memory usage
with current devices in use with CoCo VMs, especially GPUs, which might
need large amounts of shared memory.

> Alex
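[Editorial note: a minimal sketch of the allocation order described in point 2) of the message above. The helper is hypothetical, not the current kernel/dma/swiotlb.c code: prefer the pre-reserved CMA area for new, hugepage-aligned pool areas and fall back to the buddy allocator only once CMA is depleted.]

#include <linux/cma.h>
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/sizes.h>

/*
 * Hypothetical: allocate backing pages for a new SWIOTLB pool area.
 * 'bytes' is assumed to be a multiple of 2 MiB so that the buddy fallback
 * is hugepage aligned as well.
 */
static struct page *swiotlb_pool_area_alloc(struct cma *cma, size_t bytes)
{
	struct page *page = NULL;

	/* First choice: a hugepage-aligned range from the CMA area. */
	if (cma)
		page = cma_alloc(cma, bytes >> PAGE_SHIFT,
				 get_order(SZ_2M), true);

	/*
	 * Fallback: buddy allocator. No GFP_DMA32 restriction, so the range
	 * may land above 4G (point 3); this path can fail under
	 * fragmentation, as discussed earlier in the thread.
	 */
	if (!page)
		page = alloc_pages(GFP_KERNEL | __GFP_NOWARN,
				   get_order(bytes));

	/*
	 * The caller must remember which path was taken so the area is
	 * freed correctly later (cma_release() vs. __free_pages()).
	 */
	return page;
}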