[PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Oleksandr Tyshchenko 1 week, 1 day ago
Creating a guest with a high vCPU count (e.g., >32) fails because
the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
causing a guest creation failure.

Increase the buffer size to 16KiB to support guests up to
the MAX_VIRT_CPUS limit (128).

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Noticed when testing the boundary conditions for dom0less guest
creation on Arm64.

Domain configuration:
fdt mknod /chosen domU0
fdt set /chosen/domU0 compatible "xen,domain"
fdt set /chosen/domU0 \#address-cells <0x2>
fdt set /chosen/domU0 \#size-cells <0x2>
fdt set /chosen/domU0 memory <0x0 0x10000 >
fdt set /chosen/domU0 cpus <33>
fdt set /chosen/domU0 vpl011
fdt mknod /chosen/domU0 module@40400000
fdt set /chosen/domU0/module@40400000 compatible  "multiboot,kernel" "multiboot,module"
fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"

Failure log:
(XEN) Xen dom0less mode detected
(XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
(XEN) Loading d1 kernel from boot module @ 0000000040400000
(XEN) Allocating mappings totalling 64MB for d1:
(XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
(XEN) Device tree generation failed (-22).
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not set up domain domU0 (rc = -22)
(XEN) ****************************************
---
---
 xen/common/device-tree/dom0less-build.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/device-tree/dom0less-build.c
index 3f5b987ed8..d7d0a47b97 100644
--- a/xen/common/device-tree/dom0less-build.c
+++ b/xen/common/device-tree/dom0less-build.c
@@ -461,10 +461,12 @@ static int __init domain_handle_dtb_boot_module(struct domain *d,
 
 /*
  * The max size for DT is 2MB. However, the generated DT is small (not including
- * domU passthrough DT nodes whose size we account separately), 4KB are enough
- * for now, but we might have to increase it in the future.
+ * domU passthrough DT nodes whose size we account separately). The size is
+ * primarily driven by the number of vCPU nodes. The previous 4KiB buffer was
+ * insufficient for guests with high vCPU counts, so it has been increased
+ * to support up to the MAX_VIRT_CPUS limit (128).
  */
-#define DOMU_DTB_SIZE 4096
+#define DOMU_DTB_SIZE (4096 * 4)
 static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
 {
     int addrcells, sizecells;
-- 
2.34.1
Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Grygorii Strashko 1 week, 1 day ago

On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
> Creating a guest with a high vCPU count (e.g., >32) fails because
> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
> causing a guest creation failure.
> 
> [...]
> 
>   /*
>    * The max size for DT is 2MB. However, the generated DT is small (not including
> - * domU passthrough DT nodes whose size we account separately), 4KB are enough
> - * for now, but we might have to increase it in the future.
> + * domU passthrough DT nodes whose size we account separately). The size is
> + * primarily driven by the number of vCPU nodes. The previous 4KiB buffer was
> + * insufficient for guests with high vCPU counts, so it has been increased
> + * to support up to the MAX_VIRT_CPUS limit (128).
>    */
> -#define DOMU_DTB_SIZE 4096
> +#define DOMU_DTB_SIZE (4096 * 4)

Maybe it wants Kconfig?
Or some formula which accounts for MAX_VIRT_CPUS?

>   static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
>   {
>       int addrcells, sizecells;

-- 
Best regards,
-grygorii
Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Oleksandr Tyshchenko 1 week ago

On 02.12.25 23:33, Grygorii Strashko wrote:


Hello Grygorii

> 
> 
> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>> Creating a guest with a high vCPU count (e.g., >32) fails because
>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
>>
>> [...]
>>
>> -#define DOMU_DTB_SIZE 4096
>> +#define DOMU_DTB_SIZE (4096 * 4)
> 
> May be It wants Kconfig?
> Or some formula which accounts MAX_VIRT_CPUS?


I agree that using a formula that accounts for MAX_VIRT_CPUS is the most 
robust approach.

Here is the empirical data (by testing with the maximum number of device 
tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all 
optional CPU properties (e.g., clock-frequency)):

cpus=1
(XEN) Final compacted FDT size is: 1586 bytes

cpus=2
(XEN) Final compacted FDT size is: 1698 bytes

cpus=32
(XEN) Final compacted FDT size is: 5058 bytes

cpus=128
(XEN) Final compacted FDT size is: 15810 bytes


static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
  {
      int addrcells, sizecells;
@@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
      if ( ret < 0 )
          goto err;

+    printk("Final compacted FDT size is: %d bytes\n", fdt_totalsize(kinfo->fdt));
+
      return 0;

    err:

This data shows (assuming my testing/calculations are correct):

- A marginal cost of 112 bytes per vCPU in the final, compacted device tree.
- A fixed base size of 1474 bytes for all non-vCPU content.

Based on that, I would propose the following formula with this justification:

/*
 * The size is calculated from a fixed baseline plus a scalable
 * portion for each potential vCPU node up to the system limit
 * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
 * of space.
 *
 * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
 * content. The 128 bytes per vCPU is derived from a worst-case
 * analysis of the FDT construction-time size for a single
 * vCPU node.
 */
#define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))

**********************************************

Please tell me, would you be happy with that?


Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Grygorii Strashko 1 week ago
Hi Oleksandr,

On 03.12.25 13:03, Oleksandr Tyshchenko wrote:
> 
> 
> On 02.12.25 23:33, Grygorii Strashko wrote:
> 
> 
> Hello Grygorii
> 
>>
>>
>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>> Creating a guest with a high vCPU count (e.g., >32) fails because
>>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
>>>
>>> [...]
>>>
>>> -#define DOMU_DTB_SIZE 4096
>>> +#define DOMU_DTB_SIZE (4096 * 4)
>> May be It wants Kconfig?
>> Or some formula which accounts MAX_VIRT_CPUS?
> 
> 
> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
> robust approach.
> 
> [...]
> 
> This data shows (assuming my testing/calculations are correct):
> 
> - A marginal cost of 112 bytes per vCPU in the final, compacted device tree.
> - A fixed base size of 1474 bytes for all non-vCPU content.

Thanks for the detailed analysis and info.

> 
> Based on that I would propose the following formula with the justification:
> 
> /*
>    * The size is calculated from a fixed baseline plus a scalable
>    * portion for each potential vCPU node up to the system limit
>    * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
>    * of space.
>    *
>    * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
>    * content. The 128 bytes per vCPU is derived from a worst-case
>    * analysis of the FDT construction-time size for a single
>    * vCPU node.
>    */
> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
> 
> **********************************************
> 
> Please tell me would you be happy with that?

It looks ok. One thing I worry about: should it be Xen page-aligned?

-- 
Best regards,
-grygorii


Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Oleksandr Tyshchenko 1 week ago

On 03.12.25 16:32, Grygorii Strashko wrote:
> Hi Oleksandr,

Hello Grygorii

> 
> On 03.12.25 13:03, Oleksandr Tyshchenko wrote:
>> On 02.12.25 23:33, Grygorii Strashko wrote:
>>> [...]
>>>
>>> May be It wants Kconfig?
>>> Or some formula which accounts MAX_VIRT_CPUS?
>>
>>
>> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
>> robust approach.
>>
>> [...]
>>
>> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>>
>> **********************************************
>>
>> Please tell me would you be happy with that?
> 
> It looks ok.

Thanks.


> One thing I worry about - should it be Xen page aligned?

Good question. I could not find any information that the device tree 
blob (DTB) size itself is required to be page-aligned (at least on Arm64).

1. The Linux kernel boot protocol documentation for Arm64 says this about
setting up the device tree:
"The device tree blob (dtb) must be placed on an 8-byte boundary and 
must not exceed 2 megabytes in size."

It does not say "the size must be a multiple of..." or "the size must be 
page-aligned."

2. The official Devicetree Specification says this about the "totalsize"
field in the header:
" - totalsize
This field shall contain the total size in bytes of the devicetree data 
structure."

It also does not say "the size must be a multiple of..." or "the size 
must be page-aligned."

My understanding is: no, the size of DTB does not need to be page-aligned.

> 
Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Oleksii Kurochko 1 week ago
Hello Oleksandr,

On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote:
> On 02.12.25 23:33, Grygorii Strashko wrote:
>>
>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>> [...]
>>>
>>> -#define DOMU_DTB_SIZE 4096
>>> +#define DOMU_DTB_SIZE (4096 * 4)
>> May be It wants Kconfig?
>> Or some formula which accounts MAX_VIRT_CPUS?
>
> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
> robust approach.

One option could be to detect the size at runtime, essentially, try to allocate
it, and if an error occurs, increase the fdtsize and try again. I don’t really
like this approach, but I wanted to mention it in case someone finds it useful.
The benefit of this approach is that if, in the future, something else such
as a CPU node contributes to the final FDT size, we won’t need to update the
formula again.

>
> [...]
>
> Based on that I would propose the following formula with the justification:
>
> /*
>    * The size is calculated from a fixed baseline plus a scalable
>    * portion for each potential vCPU node up to the system limit
>    * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
>    * of space.
>    *
>    * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
>    * content. The 128 bytes per vCPU is derived from a worst-case
>    * analysis of the FDT construction-time size for a single
>    * vCPU node.
>    */
> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>
> **********************************************
>
> Please tell me would you be happy with that?

I would also like to note that we probably want to add a BUILD_BUG_ON() check
to ensure that DOMU_DTB_SIZE is not larger than SZ_2M. Otherwise, we would get
a runtime error instead of a build-time failure, since there is code that limits
fdtsize to SZ_2M:

     /* Cap to max DT size if needed */
     fdt_size = min(fdt_size, SZ_2M);

Thanks.

~ Oleksii


Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Oleksandr Tyshchenko 1 week ago

On 03.12.25 15:36, Oleksii Kurochko wrote:


> 
> Hello Oleksandr,

Hello Oleksii

> 
> On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote:
>> On 02.12.25 23:33, Grygorii Strashko wrote:
>>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>>> [...]
>>>>
>>>> -#define DOMU_DTB_SIZE 4096
>>>> +#define DOMU_DTB_SIZE (4096 * 4)
>>> May be It wants Kconfig?
>>> Or some formula which accounts MAX_VIRT_CPUS?
>>
>> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
>> robust approach.
> 
> One option could be to detect the size at runtime, essentially, try to
> allocate it, and if an error occurs, increase the fdtsize and try again.
> I don’t really like this approach, but I wanted to mention it in case
> someone finds it useful.
> The benefit of this approach is that if, in the future, something else such
> as a CPU node contributes to the final FDT size, we won’t need to update
> the formula again.

I got your point and understand the goal, but I see the following 
concerns with that:

1. Xen has to do all the work to build the device tree, fail, throw all 
that work away, and then start over again. This wastes time during the 
system's boot-up process.

2. Boot-time code should be as deterministic and predictable as 
possible. A static, worst-case calculation is highly predictable, 
whereas a retry loop is not.

3. It adds logical complexity (error handling, looping, size increments) 
to what should be a straightforward setup step.

> 
>>
>> Here is the empirical data (obtained by testing with the maximum number
>> of device tree nodes (e.g., hypervisor and reserved-memory nodes) and
>> with all optional CPU properties (e.g., clock-frequency) enabled):
>>
>> cpus=1
>> (XEN) Final compacted FDT size is: 1586 bytes
>>
>> cpus=2
>> (XEN) Final compacted FDT size is: 1698 bytes
>>
>> cpus=32
>> (XEN) Final compacted FDT size is: 5058 bytes
>>
>> cpus=128
>> (XEN) Final compacted FDT size is: 15810 bytes
>>
>>
>> static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
>> {
>>     int addrcells, sizecells;
>>
>> @@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
>>     if ( ret < 0 )
>>         goto err;
>>
>> +    printk("Final compacted FDT size is: %d bytes\n", fdt_totalsize(kinfo->fdt));
>> +
>>     return 0;
>>
>>  err:
>>
>> This data shows (assuming my testing/calculations are correct):
>>
>> - A marginal cost of 112 bytes per vCPU in the final, compacted device 
>> tree.
>> - A fixed base size of 1474 bytes for all non-vCPU content.
>>
>> Based on that I would propose the following formula with the 
>> justification:
>>
>> /*
>>  * The size is calculated from a fixed baseline plus a scalable
>>  * portion for each potential vCPU node up to the system limit
>>  * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
>>  * of space.
>>  *
>>  * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
>>  * content. The 128 bytes per vCPU is derived from a worst-case
>>  * analysis of the FDT construction-time size for a single
>>  * vCPU node.
>>  */
>> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>>
>> **********************************************
>>
>> Please tell me, would you be happy with that?
> 
> I would also like to note that we probably want to add a BUILD_BUG_ON()
> check to ensure that DOMU_DTB_SIZE is not larger than SZ_2M. Otherwise,
> we would get a runtime error instead of a build-time failure, since
> there is code that limits fdtsize to SZ_2M:
> 
>      /* Cap to max DT size if needed */
>      fdt_size = min(fdt_size, SZ_2M);


ok, sounds reasonable, will add:

BUILD_BUG_ON(DOMU_DTB_SIZE > SZ_2M);

> 
> Thanks.
> 
> ~ Oleksii
> 
Re: [PATCH] xen/dom0less: Increase guest DTB size for high-vCPU guests
Posted by Oleksii Kurochko 6 days, 21 hours ago
Hello Oleksandr,

On 12/3/25 3:05 PM, Oleksandr Tyshchenko wrote:
>
> On 03.12.25 15:36, Oleksii Kurochko wrote:
>
>
>> On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote:
>>> On 02.12.25 23:33, Grygorii Strashko wrote:
>>>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>>>> [...]
>>>> Maybe it wants a Kconfig option?
>>>> Or some formula which accounts for MAX_VIRT_CPUS?
>>> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
>>> robust approach.
>> One option could be to detect the size at runtime: essentially, try to
>> allocate it, and if an error occurs, increase the fdtsize and try again.
>> I don't really like this approach, but I wanted to mention it in case
>> someone finds it useful. The benefit of this approach is that if, in the
>> future, something else, such as a CPU node, contributes to the final FDT
>> size, we won't need to update the formula again.
> I got your point and understand the goal, but I see the following
> concerns with that:
>
> 1. Xen has to do all the work to build the device tree, fail, throw all
> that work away, and then start over again. This wastes time during the
> system's boot-up process.
>
> 2. Boot-time code should be as deterministic and predictable as
> possible. A static, worst-case calculation is highly predictable,
> whereas a retry loop is not.
>
> 3. It adds logical complexity (error handling, looping, size increments)
> to what should be a straightforward setup step.

Yes, I totally agree with all your concerns, so let's just go with the
formula approach.

Thanks.

~ Oleksii