[PATCH] device-tree: Improve hwdom memory allocation for DMA

Michal Orzel posted 1 patch 2 weeks ago
Posted by Michal Orzel 2 weeks ago
When LLC coloring is enabled, the hardware domain gets memory from
host free regions rather than the fixed guest RAM banks.  The previous
code sorted these regions by descending size, which usually causes
high-address memory to be allocated first.

All allocated memory could therefore reside above 4 GB, leaving DMA
non-functional for devices with limited addressing capabilities.

Improve the handling as follows:
- Sort free regions by ascending address instead of descending size,
  so low-memory banks are allocated first,
- Skip banks smaller than 128 MB (or the total remaining allocation,
  whichever is less) until the first bank is placed, ensuring
  place_modules() has enough contiguous space,
- Extract the hardware domain allocation path into its own function
  (allocate_hwdom_memory) for clarity.
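
The selection policy above can be sketched standalone (simplified,
hypothetical code -- `struct bank`, `allocate` and the helpers are
illustrative names, not the actual Xen interfaces):

```c
#include <stdint.h>

typedef uint64_t paddr_t;

struct bank { paddr_t start, size; };

#define MB(x)     ((paddr_t)(x) << 20)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Banks must already be sorted by ascending start address. */
static paddr_t allocate(const struct bank *banks, unsigned int nr,
                        paddr_t unassigned, unsigned int *first)
{
    paddr_t min_bank_size = MIN(unassigned, MB(128));
    unsigned int placed = 0;

    for ( unsigned int i = 0; unassigned > 0 && i < nr; i++ )
    {
        /* Skip small banks until the first one is placed. */
        if ( !placed && banks[i].size < min_bank_size )
            continue;
        if ( !placed )
            *first = i;
        placed++;
        unassigned -= MIN(banks[i].size, unassigned);
    }

    return unassigned; /* memory that could not be allocated */
}
```

E.g. with banks of 64 MB @ 1 GB and 256 MB @ 2 GB and 128 MB requested,
the 64 MB bank is skipped and the allocation is satisfied entirely from
the 256 MB bank.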

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
---
 xen/common/device-tree/domain-build.c | 152 ++++++++++++++++----------
 1 file changed, 96 insertions(+), 56 deletions(-)

diff --git a/xen/common/device-tree/domain-build.c b/xen/common/device-tree/domain-build.c
index 540627b74e96..c51520ebadf9 100644
--- a/xen/common/device-tree/domain-build.c
+++ b/xen/common/device-tree/domain-build.c
@@ -133,9 +133,9 @@ static int __init add_hwdom_free_regions(unsigned long s_gfn,
     e += 1;
     size = (e - start) & ~(SZ_2M - 1);
 
-    /* Find the insert position (descending order). */
-    for ( i = 0; i < free_regions->nr_banks ; i++ )
-        if ( size > free_regions->bank[i].size )
+    /* Find the insert position (ascending address order). */
+    for ( i = 0; i < free_regions->nr_banks; i++ )
+        if ( start < free_regions->bank[i].start )
             break;
 
     /* Move the other banks to make space. */
@@ -234,82 +234,123 @@ out:
     return res;
 }
 
-void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
+/*
+ * Allocate memory for the hardware domain using the host memory layout, when
+ * the domain is not direct mapped. The only case for this is when LLC coloring
+ * is enabled.
+ *
+ * Banks are sorted by ascending address from add_hwdom_free_regions(), so
+ * low memory banks are naturally allocated first (if any). This allows the
+ * hardware domain to have memory reachable by devices with limited DMA address
+ * capabilities (e.g. 32-bit DMA).
+ *
+ * The first bank allocated must be large enough for place_modules() to fit
+ * the kernel, DTB and initrd.
+ */
+static bool __init allocate_hwdom_memory(struct kernel_info *kinfo)
 {
+    const paddr_t min_bank_size =
+        min_t(paddr_t, kinfo->unassigned_mem, MB(128));
     struct membanks *mem = kernel_info_get_mem(kinfo);
-    unsigned int i, nr_banks = GUEST_RAM_BANKS;
-    struct membanks *hwdom_free_mem = NULL;
-
-    printk(XENLOG_INFO "Allocating mappings totalling %ldMB for %pd:\n",
-           /* Don't want format this as PRIpaddr (16 digit hex) */
-           (unsigned long)(kinfo->unassigned_mem >> 20), d);
+    unsigned int i, nr_banks;
+    struct membanks *hwdom_free_mem;
+    struct membanks *gnttab =
+        IS_ENABLED(CONFIG_GRANT_TABLE)
+        ? membanks_xzalloc(1, MEMORY)
+        : NULL;
 
-    mem->nr_banks = 0;
     /*
-     * Use host memory layout for hwdom. Only case for this is when LLC coloring
-     * is enabled.
+     * Exclude the following regions:
+     * 1) Remove reserved memory
+     * 2) Grant table assigned to hwdom
      */
-    if ( is_hardware_domain(d) )
-    {
-        struct membanks *gnttab =
-            IS_ENABLED(CONFIG_GRANT_TABLE)
-            ? membanks_xzalloc(1, MEMORY)
-            : NULL;
-        /*
-         * Exclude the following regions:
-         * 1) Remove reserved memory
-         * 2) Grant table assigned to hwdom
-         */
-        const struct membanks *mem_banks[] = {
-            bootinfo_get_reserved_mem(),
-            gnttab,
-        };
+    const struct membanks *mem_banks[] = {
+        bootinfo_get_reserved_mem(),
+        gnttab,
+    };
 
 #ifdef CONFIG_GRANT_TABLE
-        if ( !gnttab )
-            goto fail;
+    if ( !gnttab )
+        return false;
 
-        gnttab->nr_banks = 1;
-        gnttab->bank[0].start = kinfo->gnttab_start;
-        gnttab->bank[0].size = kinfo->gnttab_size;
+    gnttab->nr_banks = 1;
+    gnttab->bank[0].start = kinfo->gnttab_start;
+    gnttab->bank[0].size = kinfo->gnttab_size;
 #endif
 
-        hwdom_free_mem = membanks_xzalloc(NR_MEM_BANKS, MEMORY);
-        if ( !hwdom_free_mem )
-            goto fail;
-
-        if ( find_unallocated_memory(kinfo, mem_banks, ARRAY_SIZE(mem_banks),
-                                     hwdom_free_mem, add_hwdom_free_regions) )
-            goto fail;
+    hwdom_free_mem = membanks_xzalloc(NR_MEM_BANKS, MEMORY);
+    if ( !hwdom_free_mem )
+    {
+        xfree(gnttab);
+        return false;
+    }
 
-        nr_banks = hwdom_free_mem->nr_banks;
+    if ( find_unallocated_memory(kinfo, mem_banks, ARRAY_SIZE(mem_banks),
+                                 hwdom_free_mem, add_hwdom_free_regions) )
+    {
         xfree(gnttab);
+        xfree(hwdom_free_mem);
+        return false;
     }
 
-    for ( i = 0; kinfo->unassigned_mem > 0 && nr_banks > 0; i++, nr_banks-- )
+    xfree(gnttab);
+    nr_banks = hwdom_free_mem->nr_banks;
+
+    for ( i = 0; (kinfo->unassigned_mem > 0) && (i < nr_banks); i++ )
     {
-        paddr_t bank_start, bank_size;
+        paddr_t bank_size;
+
+        /*
+         * The first bank must be large enough for place_modules() to
+         * fit the kernel, DTB and initrd.  Skip small regions to avoid
+         * ending up with a tiny first bank.
+         */
+        if ( !mem->nr_banks && (hwdom_free_mem->bank[i].size < min_bank_size) )
+            continue;
 
-        if ( is_hardware_domain(d) )
+        bank_size = MIN(hwdom_free_mem->bank[i].size, kinfo->unassigned_mem);
+        if ( !allocate_bank_memory(kinfo,
+                                   gaddr_to_gfn(hwdom_free_mem->bank[i].start),
+                                   bank_size) )
         {
-            bank_start = hwdom_free_mem->bank[i].start;
-            bank_size = hwdom_free_mem->bank[i].size;
+            xfree(hwdom_free_mem);
+            return false;
         }
-        else
+    }
+
+    xfree(hwdom_free_mem);
+    return true;
+}
+
+void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
+{
+    struct membanks *mem = kernel_info_get_mem(kinfo);
+    unsigned int i;
+
+    printk(XENLOG_INFO "Allocating mappings totalling %ldMB for %pd:\n",
+           /* Don't want format this as PRIpaddr (16 digit hex) */
+           (unsigned long)(kinfo->unassigned_mem >> 20), d);
+
+    mem->nr_banks = 0;
+
+    if ( is_hardware_domain(d) )
+    {
+        if ( !allocate_hwdom_memory(kinfo) )
+            goto fail;
+    }
+    else
+    {
+        for ( i = 0; kinfo->unassigned_mem > 0 && i < GUEST_RAM_BANKS; i++ )
         {
             const uint64_t bankbase[] = GUEST_RAM_BANK_BASES;
             const uint64_t banksize[] = GUEST_RAM_BANK_SIZES;
+            paddr_t bank_size;
 
-            if ( i >= GUEST_RAM_BANKS )
+            bank_size = MIN(banksize[i], kinfo->unassigned_mem);
+            if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(bankbase[i]),
+                                       bank_size) )
                 goto fail;
-
-            bank_start = bankbase[i];
-            bank_size = banksize[i];
         }
-
-        bank_size = MIN(bank_size, kinfo->unassigned_mem);
-        if ( !allocate_bank_memory(kinfo, gaddr_to_gfn(bank_start), bank_size) )
-            goto fail;
     }
 
     if ( kinfo->unassigned_mem )
@@ -326,7 +367,6 @@ void __init allocate_memory(struct domain *d, struct kernel_info *kinfo)
                (unsigned long)(mem->bank[i].size >> 20));
     }
 
-    xfree(hwdom_free_mem);
     return;
 
   fail:
-- 
2.43.0
Re: [PATCH] device-tree: Improve hwdom memory allocation for DMA
Posted by Luca Fancellu 1 week, 4 days ago
Hi Michal,

> On 17 Apr 2026, at 10:11, Michal Orzel <michal.orzel@amd.com> wrote:
> 
> When LLC coloring is enabled, the hardware domain gets memory from
> host free regions rather than the fixed guest RAM banks.  The previous
> code sorted these regions by descending size, which usually causes
> high-address memory to be allocated first.
> 
> All allocated memory could reside above 4 GB leaving DMA non-functional
> for devices with limited addressing capabilities.
> 
> Improve the handling as follows:
> - Sort free regions by ascending address instead of descending size,
>  so low-memory banks are allocated first,
> - Skip banks smaller than 128 MB (or the total remaining allocation,
>  whichever is less) until the first bank is placed, ensuring
>  place_modules() has enough contiguous space,
> - Extract the hardware domain allocation path into its own function
>  (allocate_hwdom_memory) for clarity.
> 
> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
> ---
> xen/common/device-tree/domain-build.c | 152 ++++++++++++++++----------
> 1 file changed, 96 insertions(+), 56 deletions(-)

It looks ok to me, the only thing is that if we have many smaller banks < 128 MB before reaching
one that is at least 128 MB, we won’t allocate them and lose them forever.

It feels only a corner case so for me it’s ok

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com <mailto:luca.fancellu@arm.com>>

Cheers,
Luca


Re: [PATCH] device-tree: Improve hwdom memory allocation for DMA
Posted by Orzel, Michal 1 week, 3 days ago

On 20/04/2026 16:54, Luca Fancellu wrote:
> Hi Michal,
> 
>> On 17 Apr 2026, at 10:11, Michal Orzel <michal.orzel@amd.com> wrote:
>>
>> When LLC coloring is enabled, the hardware domain gets memory from
>> host free regions rather than the fixed guest RAM banks.  The previous
>> code sorted these regions by descending size, which usually causes
>> high-address memory to be allocated first.
>>
>> All allocated memory could reside above 4 GB leaving DMA non-functional
>> for devices with limited addressing capabilities.
>>
>> Improve the handling as follows:
>> - Sort free regions by ascending address instead of descending size,
>>  so low-memory banks are allocated first,
>> - Skip banks smaller than 128 MB (or the total remaining allocation,
>>  whichever is less) until the first bank is placed, ensuring
>>  place_modules() has enough contiguous space,
>> - Extract the hardware domain allocation path into its own function
>>  (allocate_hwdom_memory) for clarity.
>>
>> Signed-off-by: Michal Orzel <michal.orzel@amd.com>
>> ---
>> xen/common/device-tree/domain-build.c | 152 ++++++++++++++++----------
>> 1 file changed, 96 insertions(+), 56 deletions(-)
> 
> It looks ok to me, the only thing is that if we have many smaller banks < 128 Mb before reaching
> one that is at least 128 Mb, we won’t allocate them and loose them forever.
> 
> It feels only a corner case so for me it’s ok
Yes, that trade-off is documented in the commit msg, so if we ever run into an
issue because of it, we can revisit the implementation. For now, we haven't
observed any issues.
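
To make the corner case concrete (hypothetical numbers, not taken from the
patch): with free banks of 32 MB, 64 MB and 512 MB in ascending address
order and 600 MB requested, the single ascending pass skips both small
banks before the first placement and never revisits them, so 88 MB stays
unallocated even though 608 MB was free. A minimal sketch:

```c
#include <stdint.h>

typedef uint64_t paddr_t;

#define MB(x)     ((paddr_t)(x) << 20)
#define MIN(a, b) ((a) < (b) ? (a) : (b))

static paddr_t remaining_after_alloc(const paddr_t *sizes, unsigned int nr,
                                     paddr_t unassigned)
{
    paddr_t min_bank_size = MIN(unassigned, MB(128));
    unsigned int placed = 0;

    for ( unsigned int i = 0; unassigned > 0 && i < nr; i++ )
    {
        if ( !placed && sizes[i] < min_bank_size )
            continue; /* skipped for good: the pass never comes back */
        placed++;
        unassigned -= MIN(sizes[i], unassigned);
    }

    return unassigned;
}
```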

> 
> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com <mailto:luca.fancellu@arm.com>>
Thanks.

~Michal

> 
> Cheers,
> Luca
> 
> 


Re: [PATCH] device-tree: Improve hwdom memory allocation for DMA
Posted by Luca Fancellu 1 week, 3 days ago
>> 
>> It looks ok to me, the only thing is that if we have many smaller banks < 128 Mb before reaching
>> one that is at least 128 Mb, we won’t allocate them and loose them forever.
>> 
>> It feels only a corner case so for me it’s ok
> Yes, that's something documented in the commit msg, so if we ever face upon the
> issue due to that we could revisit the implementation. For now, we haven't
> observed any issues.
> 
>> 
>> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com <mailto:luca.fancellu@arm.com>>
> Thanks.

I realised my mail client messed up the tag:

Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

Cheers,
Luca

Re: [PATCH] device-tree: Improve hwdom memory allocation for DMA
Posted by Stefano Stabellini 1 week, 3 days ago
On Tue, 21 Apr 2026, Luca Fancellu wrote:
> >> It looks ok to me, the only thing is that if we have many smaller banks < 128 Mb before reaching
> >> one that is at least 128 Mb, we won’t allocate them and loose them forever.
> >> 
> >> It feels only a corner case so for me it’s ok
> > Yes, that's something documented in the commit msg, so if we ever face upon the
> > issue due to that we could revisit the implementation. For now, we haven't
> > observed any issues.
> > 
> >> 
> >> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com <mailto:luca.fancellu@arm.com>>
> > Thanks.
> 
> I realised my mail client messed up the tag:
> 
> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>