xen/common/device-tree/dom0less-build.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
Creating a guest with a high vCPU count (e.g., >32) fails because
the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
causing a guest creation failure.
Increase the buffer size to 16KiB to support guests up to
the MAX_VIRT_CPUS limit (128).
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
---
Noticed when testing the boundary conditions for dom0less guest
creation on Arm64.
Domain configuration:
fdt mknod /chosen domU0
fdt set /chosen/domU0 compatible "xen,domain"
fdt set /chosen/domU0 \#address-cells <0x2>
fdt set /chosen/domU0 \#size-cells <0x2>
fdt set /chosen/domU0 memory <0x0 0x10000 >
fdt set /chosen/domU0 cpus <33>
fdt set /chosen/domU0 vpl011
fdt mknod /chosen/domU0 module@40400000
fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel" "multiboot,module"
fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
Failure log:
(XEN) Xen dom0less mode detected
(XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
(XEN) Loading d1 kernel from boot module @ 0000000040400000
(XEN) Allocating mappings totalling 64MB for d1:
(XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
(XEN) Device tree generation failed (-22).
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not set up domain domU0 (rc = -22)
(XEN) ****************************************
---
---
xen/common/device-tree/dom0less-build.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/device-tree/dom0less-build.c
index 3f5b987ed8..d7d0a47b97 100644
--- a/xen/common/device-tree/dom0less-build.c
+++ b/xen/common/device-tree/dom0less-build.c
@@ -461,10 +461,12 @@ static int __init domain_handle_dtb_boot_module(struct domain *d,
/*
* The max size for DT is 2MB. However, the generated DT is small (not including
- * domU passthrough DT nodes whose size we account separately), 4KB are enough
- * for now, but we might have to increase it in the future.
+ * domU passthrough DT nodes whose size we account separately). The size is
+ * primarily driven by the number of vCPU nodes. The previous 4KiB buffer was
+ * insufficient for guests with high vCPU counts, so it has been increased
+ * to support up to the MAX_VIRT_CPUS limit (128).
*/
-#define DOMU_DTB_SIZE 4096
+#define DOMU_DTB_SIZE (4096 * 4)
static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
{
int addrcells, sizecells;
--
2.34.1
On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
> Creating a guest with a high vCPU count (e.g., >32) fails because
> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
> causing a guest creation failure.
>
> Increase the buffer size to 16KiB to support guests up to
> the MAX_VIRT_CPUS limit (128).
>
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> ---
> Noticed when testing the boundary conditions for dom0less guest
> creation on Arm64.
>
> Domain configuration:
> fdt mknod /chosen domU0
> fdt set /chosen/domU0 compatible "xen,domain"
> fdt set /chosen/domU0 \#address-cells <0x2>
> fdt set /chosen/domU0 \#size-cells <0x2>
> fdt set /chosen/domU0 memory <0x0 0x10000 >
> fdt set /chosen/domU0 cpus <33>
> fdt set /chosen/domU0 vpl011
> fdt mknod /chosen/domU0 module@40400000
> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel" "multiboot,module"
> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
>
> Failure log:
> (XEN) Xen dom0less mode detected
> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
> (XEN) Loading d1 kernel from boot module @ 0000000040400000
> (XEN) Allocating mappings totalling 64MB for d1:
> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
> (XEN) Device tree generation failed (-22).
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Could not set up domain domU0 (rc = -22)
> (XEN) ****************************************
> ---
> ---
> xen/common/device-tree/dom0less-build.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/device-tree/dom0less-build.c
> index 3f5b987ed8..d7d0a47b97 100644
> --- a/xen/common/device-tree/dom0less-build.c
> +++ b/xen/common/device-tree/dom0less-build.c
> @@ -461,10 +461,12 @@ static int __init domain_handle_dtb_boot_module(struct domain *d,
>
> /*
> * The max size for DT is 2MB. However, the generated DT is small (not including
> - * domU passthrough DT nodes whose size we account separately), 4KB are enough
> - * for now, but we might have to increase it in the future.
> + * domU passthrough DT nodes whose size we account separately). The size is
> + * primarily driven by the number of vCPU nodes. The previous 4KiB buffer was
> + * insufficient for guests with high vCPU counts, so it has been increased
> + * to support up to the MAX_VIRT_CPUS limit (128).
> */
> -#define DOMU_DTB_SIZE 4096
> +#define DOMU_DTB_SIZE (4096 * 4)
May be It wants Kconfig?
Or some formula which accounts MAX_VIRT_CPUS?
> static int __init prepare_dtb_domU(struct domain *d, struct kernel_info *kinfo)
> {
> int addrcells, sizecells;
--
Best regards,
-grygorii
On 02.12.25 23:33, Grygorii Strashko wrote:
Hello Grygorii
>
>
> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>> Creating a guest with a high vCPU count (e.g., >32) fails because
>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
>> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
>> causing a guest creation failure.
>>
>> Increase the buffer size to 16KiB to support guests up to
>> the MAX_VIRT_CPUS limit (128).
>>
>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>> ---
>> Noticed when testing the boundary conditions for dom0less guest
>> creation on Arm64.
>>
>> Domain configuration:
>> fdt mknod /chosen domU0
>> fdt set /chosen/domU0 compatible "xen,domain"
>> fdt set /chosen/domU0 \#address-cells <0x2>
>> fdt set /chosen/domU0 \#size-cells <0x2>
>> fdt set /chosen/domU0 memory <0x0 0x10000 >
>> fdt set /chosen/domU0 cpus <33>
>> fdt set /chosen/domU0 vpl011
>> fdt mknod /chosen/domU0 module@40400000
>> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel"
>> "multiboot,module"
>> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
>> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
>>
>> Failure log:
>> (XEN) Xen dom0less mode detected
>> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
>> (XEN) Loading d1 kernel from boot module @ 0000000040400000
>> (XEN) Allocating mappings totalling 64MB for d1:
>> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
>> (XEN) Device tree generation failed (-22).
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Could not set up domain domU0 (rc = -22)
>> (XEN) ****************************************
>> ---
>> ---
>> xen/common/device-tree/dom0less-build.c | 8 +++++---
>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/
>> device-tree/dom0less-build.c
>> index 3f5b987ed8..d7d0a47b97 100644
>> --- a/xen/common/device-tree/dom0less-build.c
>> +++ b/xen/common/device-tree/dom0less-build.c
>> @@ -461,10 +461,12 @@ static int __init
>> domain_handle_dtb_boot_module(struct domain *d,
>> /*
>> * The max size for DT is 2MB. However, the generated DT is small
>> (not including
>> - * domU passthrough DT nodes whose size we account separately), 4KB
>> are enough
>> - * for now, but we might have to increase it in the future.
>> + * domU passthrough DT nodes whose size we account separately). The
>> size is
>> + * primarily driven by the number of vCPU nodes. The previous 4KiB
>> buffer was
>> + * insufficient for guests with high vCPU counts, so it has been
>> increased
>> + * to support up to the MAX_VIRT_CPUS limit (128).
>> */
>> -#define DOMU_DTB_SIZE 4096
>> +#define DOMU_DTB_SIZE (4096 * 4)
>
> May be It wants Kconfig?
> Or some formula which accounts MAX_VIRT_CPUS?
I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
robust approach.
Here is the empirical data (by testing with the maximum number of device
tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all
optional CPU properties (e.g., clock-frequency)):
cpus=1
(XEN) Final compacted FDT size is: 1586 bytes
cpus=2
(XEN) Final compacted FDT size is: 1698 bytes
cpus=32
(XEN) Final compacted FDT size is: 5058 bytes
cpus=128
(XEN) Final compacted FDT size is: 15810 bytes
static int __init prepare_dtb_domU(struct domain *d, struct kernel_info
*kinfo)
{
int addrcells, sizecells;
@@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d,
struct kernel_info *kinfo)
if ( ret < 0 )
goto err;
+ printk("Final compacted FDT size is: %d bytes\n",
fdt_totalsize(kinfo->fdt));
+
return 0;
err:
This data shows (assuming my testing/calculations are correct):
- A marginal cost of 112 bytes per vCPU in the final, compacted device tree.
- A fixed base size of 1474 bytes for all non-vCPU content.
Based on that I would propose the following formula with the justification:
/*
* The size is calculated from a fixed baseline plus a scalable
* portion for each potential vCPU node up to the system limit
* (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
* of space.
*
* The baseline of 2KiB is a safe buffer for all non-vCPU FDT
* content. The 128 bytes per vCPU is derived from a worst-case
* analysis of the FDT construction-time size for a single
* vCPU node.
*/
#define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
**********************************************
Please tell me would you be happy with that?
>
>> static int __init prepare_dtb_domU(struct domain *d, struct
>> kernel_info *kinfo)
>> {
>> int addrcells, sizecells;
>
Hi Oleksandr,
On 03.12.25 13:03, Oleksandr Tyshchenko wrote:
>
>
> On 02.12.25 23:33, Grygorii Strashko wrote:
>
>
> Hello Grygorii
>
>>
>>
>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>> Creating a guest with a high vCPU count (e.g., >32) fails because
>>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
>>> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
>>> causing a guest creation failure.
>>>
>>> Increase the buffer size to 16KiB to support guests up to
>>> the MAX_VIRT_CPUS limit (128).
>>>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> ---
>>> Noticed when testing the boundary conditions for dom0less guest
>>> creation on Arm64.
>>>
>>> Domain configuration:
>>> fdt mknod /chosen domU0
>>> fdt set /chosen/domU0 compatible "xen,domain"
>>> fdt set /chosen/domU0 \#address-cells <0x2>
>>> fdt set /chosen/domU0 \#size-cells <0x2>
>>> fdt set /chosen/domU0 memory <0x0 0x10000 >
>>> fdt set /chosen/domU0 cpus <33>
>>> fdt set /chosen/domU0 vpl011
>>> fdt mknod /chosen/domU0 module@40400000
>>> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel"
>>> "multiboot,module"
>>> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
>>> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
>>>
>>> Failure log:
>>> (XEN) Xen dom0less mode detected
>>> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
>>> (XEN) Loading d1 kernel from boot module @ 0000000040400000
>>> (XEN) Allocating mappings totalling 64MB for d1:
>>> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
>>> (XEN) Device tree generation failed (-22).
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Could not set up domain domU0 (rc = -22)
>>> (XEN) ****************************************
>>> ---
>>> ---
>>> xen/common/device-tree/dom0less-build.c | 8 +++++---
>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/
>>> device-tree/dom0less-build.c
>>> index 3f5b987ed8..d7d0a47b97 100644
>>> --- a/xen/common/device-tree/dom0less-build.c
>>> +++ b/xen/common/device-tree/dom0less-build.c
>>> @@ -461,10 +461,12 @@ static int __init
>>> domain_handle_dtb_boot_module(struct domain *d,
>>> /*
>>> * The max size for DT is 2MB. However, the generated DT is small
>>> (not including
>>> - * domU passthrough DT nodes whose size we account separately), 4KB
>>> are enough
>>> - * for now, but we might have to increase it in the future.
>>> + * domU passthrough DT nodes whose size we account separately). The
>>> size is
>>> + * primarily driven by the number of vCPU nodes. The previous 4KiB
>>> buffer was
>>> + * insufficient for guests with high vCPU counts, so it has been
>>> increased
>>> + * to support up to the MAX_VIRT_CPUS limit (128).
>>> */
>>> -#define DOMU_DTB_SIZE 4096
>>> +#define DOMU_DTB_SIZE (4096 * 4)
>>
>> May be It wants Kconfig?
>> Or some formula which accounts MAX_VIRT_CPUS?
>
>
> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
> robust approach.
>
> Here is the empirical data (by testing with the maximum number of device
> tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all
> optional CPU properties (e.g., clock-frequency)):
>
> cpus=1
> (XEN) Final compacted FDT size is: 1586 bytes
>
> cpus=2
> (XEN) Final compacted FDT size is: 1698 bytes
>
> cpus=32
> (XEN) Final compacted FDT size is: 5058 bytes
>
> cpus=128
> (XEN) Final compacted FDT size is: 15810 bytes
>
>
> static int __init prepare_dtb_domU(struct domain *d, struct kernel_info
> *kinfo)
> {
> int addrcells, sizecells;
> @@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d,
> struct kernel_info *kinfo)
> if ( ret < 0 )
> goto err;
>
> + printk("Final compacted FDT size is: %d bytes\n",
> fdt_totalsize(kinfo->fdt));
> +
> return 0;
>
> err:
>
> This data shows (assuming my testing/calculations are correct):
>
> - A marginal cost of 112 bytes per vCPU in the final, compacted device tree.
> - A fixed base size of 1474 bytes for all non-vCPU content.
Thank for detailed analyses and info.
>
> Based on that I would propose the following formula with the justification:
>
> /*
> * The size is calculated from a fixed baseline plus a scalable
> * portion for each potential vCPU node up to the system limit
> * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
> * of space.
> *
> * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
> * content. The 128 bytes per vCPU is derived from a worst-case
> * analysis of the FDT construction-time size for a single
> * vCPU node.
> */
> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>
> **********************************************
>
> Please tell me would you be happy with that?
It looks ok. One thing I worry about - should it be Xen page aligned?
--
Best regards,
-grygorii
On 03.12.25 16:32, Grygorii Strashko wrote:
> Hi Oleksandr,
Hello Grygorii
>
> On 03.12.25 13:03, Oleksandr Tyshchenko wrote:
>>
>>
>> On 02.12.25 23:33, Grygorii Strashko wrote:
>>
>>
>> Hello Grygorii
>>
>>>
>>>
>>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>>> Creating a guest with a high vCPU count (e.g., >32) fails because
>>>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during
>>>> creation.
>>>> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
>>>> causing a guest creation failure.
>>>>
>>>> Increase the buffer size to 16KiB to support guests up to
>>>> the MAX_VIRT_CPUS limit (128).
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> Noticed when testing the boundary conditions for dom0less guest
>>>> creation on Arm64.
>>>>
>>>> Domain configuration:
>>>> fdt mknod /chosen domU0
>>>> fdt set /chosen/domU0 compatible "xen,domain"
>>>> fdt set /chosen/domU0 \#address-cells <0x2>
>>>> fdt set /chosen/domU0 \#size-cells <0x2>
>>>> fdt set /chosen/domU0 memory <0x0 0x10000 >
>>>> fdt set /chosen/domU0 cpus <33>
>>>> fdt set /chosen/domU0 vpl011
>>>> fdt mknod /chosen/domU0 module@40400000
>>>> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel"
>>>> "multiboot,module"
>>>> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
>>>> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
>>>>
>>>> Failure log:
>>>> (XEN) Xen dom0less mode detected
>>>> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
>>>> (XEN) Loading d1 kernel from boot module @ 0000000040400000
>>>> (XEN) Allocating mappings totalling 64MB for d1:
>>>> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
>>>> (XEN) Device tree generation failed (-22).
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Could not set up domain domU0 (rc = -22)
>>>> (XEN) ****************************************
>>>> ---
>>>> ---
>>>> xen/common/device-tree/dom0less-build.c | 8 +++++---
>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/
>>>> device-tree/dom0less-build.c
>>>> index 3f5b987ed8..d7d0a47b97 100644
>>>> --- a/xen/common/device-tree/dom0less-build.c
>>>> +++ b/xen/common/device-tree/dom0less-build.c
>>>> @@ -461,10 +461,12 @@ static int __init
>>>> domain_handle_dtb_boot_module(struct domain *d,
>>>> /*
>>>> * The max size for DT is 2MB. However, the generated DT is small
>>>> (not including
>>>> - * domU passthrough DT nodes whose size we account separately), 4KB
>>>> are enough
>>>> - * for now, but we might have to increase it in the future.
>>>> + * domU passthrough DT nodes whose size we account separately). The
>>>> size is
>>>> + * primarily driven by the number of vCPU nodes. The previous 4KiB
>>>> buffer was
>>>> + * insufficient for guests with high vCPU counts, so it has been
>>>> increased
>>>> + * to support up to the MAX_VIRT_CPUS limit (128).
>>>> */
>>>> -#define DOMU_DTB_SIZE 4096
>>>> +#define DOMU_DTB_SIZE (4096 * 4)
>>>
>>> May be It wants Kconfig?
>>> Or some formula which accounts MAX_VIRT_CPUS?
>>
>>
>> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
>> robust approach.
>>
>> Here is the empirical data (by testing with the maximum number of device
>> tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all
>> optional CPU properties (e.g., clock-frequency)):
>>
>> cpus=1
>> (XEN) Final compacted FDT size is: 1586 bytes
>>
>> cpus=2
>> (XEN) Final compacted FDT size is: 1698 bytes
>>
>> cpus=32
>> (XEN) Final compacted FDT size is: 5058 bytes
>>
>> cpus=128
>> (XEN) Final compacted FDT size is: 15810 bytes
>>
>>
>> static int __init prepare_dtb_domU(struct domain *d, struct kernel_info
>> *kinfo)
>> {
>> int addrcells, sizecells;
>> @@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d,
>> struct kernel_info *kinfo)
>> if ( ret < 0 )
>> goto err;
>>
>> + printk("Final compacted FDT size is: %d bytes\n",
>> fdt_totalsize(kinfo->fdt));
>> +
>> return 0;
>>
>> err:
>>
>> This data shows (assuming my testing/calculations are correct):
>>
>> - A marginal cost of 112 bytes per vCPU in the final, compacted device
>> tree.
>> - A fixed base size of 1474 bytes for all non-vCPU content.
>
> Thank for detailed analyses and info.
>
>>
>> Based on that I would propose the following formula with the
>> justification:
>>
>> /*
>> * The size is calculated from a fixed baseline plus a scalable
>> * portion for each potential vCPU node up to the system limit
>> * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
>> * of space.
>> *
>> * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
>> * content. The 128 bytes per vCPU is derived from a worst-case
>> * analysis of the FDT construction-time size for a single
>> * vCPU node.
>> */
>> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>>
>> **********************************************
>>
>> Please tell me would you be happy with that?
>
> It looks ok.
Thanks.
> One thing I worry about - should it be Xen page aligned?
Good question. I could not find any information that the device tree
blob (DTB) size itself is required to be page-aligned (at least on Arm64).
1. The Linux Kernel Boot Protocol Documentation on Arm64 says about
setting up the device tree":
"The device tree blob (dtb) must be placed on an 8-byte boundary and
must not exceed 2 megabytes in size."
It does not say "the size must be a multiple of..." or "the size must be
page-aligned."
2. The official Device Tree Specification says about the "totalsize"
field in the header:
" - totalsize
This field shall contain the total size in bytes of the devicetree data
structure."
It also does not say "the size must be a multiple of..." or "the size
must be page-aligned."
My understanding is: no, the size of DTB does not need to be page-aligned.
>
Hello Oleksandr,
On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote:
> On 02.12.25 23:33, Grygorii Strashko wrote:
>>
>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>> Creating a guest with a high vCPU count (e.g., >32) fails because
>>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during creation.
>>> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
>>> causing a guest creation failure.
>>>
>>> Increase the buffer size to 16KiB to support guests up to
>>> the MAX_VIRT_CPUS limit (128).
>>>
>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>> ---
>>> Noticed when testing the boundary conditions for dom0less guest
>>> creation on Arm64.
>>>
>>> Domain configuration:
>>> fdt mknod /chosen domU0
>>> fdt set /chosen/domU0 compatible "xen,domain"
>>> fdt set /chosen/domU0 \#address-cells <0x2>
>>> fdt set /chosen/domU0 \#size-cells <0x2>
>>> fdt set /chosen/domU0 memory <0x0 0x10000 >
>>> fdt set /chosen/domU0 cpus <33>
>>> fdt set /chosen/domU0 vpl011
>>> fdt mknod /chosen/domU0 module@40400000
>>> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel"
>>> "multiboot,module"
>>> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
>>> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
>>>
>>> Failure log:
>>> (XEN) Xen dom0less mode detected
>>> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
>>> (XEN) Loading d1 kernel from boot module @ 0000000040400000
>>> (XEN) Allocating mappings totalling 64MB for d1:
>>> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
>>> (XEN) Device tree generation failed (-22).
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Could not set up domain domU0 (rc = -22)
>>> (XEN) ****************************************
>>> ---
>>> ---
>>> xen/common/device-tree/dom0less-build.c | 8 +++++---
>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/
>>> device-tree/dom0less-build.c
>>> index 3f5b987ed8..d7d0a47b97 100644
>>> --- a/xen/common/device-tree/dom0less-build.c
>>> +++ b/xen/common/device-tree/dom0less-build.c
>>> @@ -461,10 +461,12 @@ static int __init
>>> domain_handle_dtb_boot_module(struct domain *d,
>>> /*
>>> * The max size for DT is 2MB. However, the generated DT is small
>>> (not including
>>> - * domU passthrough DT nodes whose size we account separately), 4KB
>>> are enough
>>> - * for now, but we might have to increase it in the future.
>>> + * domU passthrough DT nodes whose size we account separately). The
>>> size is
>>> + * primarily driven by the number of vCPU nodes. The previous 4KiB
>>> buffer was
>>> + * insufficient for guests with high vCPU counts, so it has been
>>> increased
>>> + * to support up to the MAX_VIRT_CPUS limit (128).
>>> */
>>> -#define DOMU_DTB_SIZE 4096
>>> +#define DOMU_DTB_SIZE (4096 * 4)
>> May be It wants Kconfig?
>> Or some formula which accounts MAX_VIRT_CPUS?
>
> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
> robust approach.
One option could be to detect the size at runtime, essentially, try to allocate
it, and if an error occurs, increase the fdtsize and try again. I don’t really
like this approach, but I wanted to mention it in case someone finds it useful.
The benefit of this approach is that if, in the future, something else such
as a CPU node contributes to the final FDT size, we won’t need to update the
formula again.
>
> Here is the empirical data (by testing with the maximum number of device
> tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all
> optional CPU properties (e.g., clock-frequency)):
>
> cpus=1
> (XEN) Final compacted FDT size is: 1586 bytes
>
> cpus=2
> (XEN) Final compacted FDT size is: 1698 bytes
>
> cpus=32
> (XEN) Final compacted FDT size is: 5058 bytes
>
> cpus=128
> (XEN) Final compacted FDT size is: 15810 bytes
>
>
> static int __init prepare_dtb_domU(struct domain *d, struct kernel_info
> *kinfo)
> {
> int addrcells, sizecells;
> @@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d,
> struct kernel_info *kinfo)
> if ( ret < 0 )
> goto err;
>
> + printk("Final compacted FDT size is: %d bytes\n",
> fdt_totalsize(kinfo->fdt));
> +
> return 0;
>
> err:
>
> This data shows (assuming my testing/calculations are correct):
>
> - A marginal cost of 112 bytes per vCPU in the final, compacted device tree.
> - A fixed base size of 1474 bytes for all non-vCPU content.
>
> Based on that I would propose the following formula with the justification:
>
> /*
> * The size is calculated from a fixed baseline plus a scalable
> * portion for each potential vCPU node up to the system limit
> * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
> * of space.
> *
> * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
> * content. The 128 bytes per vCPU is derived from a worst-case
> * analysis of the FDT construction-time size for a single
> * vCPU node.
> */
> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>
> **********************************************
>
> Please tell me would you be happy with that?
I would also like to note that we probably want to add a BUILD_BUG_ON() check
to ensure that DOMU_DTB_SIZE is not larger than SZ_2M. Otherwise, we would get
a runtime error instead of a build-time failure, since there is code that limits
fdtsize to SZ_2M:
/* Cap to max DT size if needed */
fdt_size = min(fdt_size, SZ_2M);
Thanks.
~ Oleksii
On 03.12.25 15:36, Oleksii Kurochko wrote:
> [You don't often get email from oleksii.kurochko@gmail.com. Learn why
> this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
> Hello Oleksandr,
Hello Oleksii
>
> On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote:
>> On 02.12.25 23:33, Grygorii Strashko wrote:
>>>
>>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote:
>>>> Creating a guest with a high vCPU count (e.g., >32) fails because
>>>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during
>>>> creation.
>>>> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer,
>>>> causing a guest creation failure.
>>>>
>>>> Increase the buffer size to 16KiB to support guests up to
>>>> the MAX_VIRT_CPUS limit (128).
>>>>
>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
>>>> ---
>>>> Noticed when testing the boundary conditions for dom0less guest
>>>> creation on Arm64.
>>>>
>>>> Domain configuration:
>>>> fdt mknod /chosen domU0
>>>> fdt set /chosen/domU0 compatible "xen,domain"
>>>> fdt set /chosen/domU0 \#address-cells <0x2>
>>>> fdt set /chosen/domU0 \#size-cells <0x2>
>>>> fdt set /chosen/domU0 memory <0x0 0x10000 >
>>>> fdt set /chosen/domU0 cpus <33>
>>>> fdt set /chosen/domU0 vpl011
>>>> fdt mknod /chosen/domU0 module@40400000
>>>> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel"
>>>> "multiboot,module"
>>>> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 >
>>>> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0"
>>>>
>>>> Failure log:
>>>> (XEN) Xen dom0less mode detected
>>>> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB ***
>>>> (XEN) Loading d1 kernel from boot module @ 0000000040400000
>>>> (XEN) Allocating mappings totalling 64MB for d1:
>>>> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB)
>>>> (XEN) Device tree generation failed (-22).
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Could not set up domain domU0 (rc = -22)
>>>> (XEN) ****************************************
>>>> ---
>>>> ---
>>>> xen/common/device-tree/dom0less-build.c | 8 +++++---
>>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/
>>>> device-tree/dom0less-build.c
>>>> index 3f5b987ed8..d7d0a47b97 100644
>>>> --- a/xen/common/device-tree/dom0less-build.c
>>>> +++ b/xen/common/device-tree/dom0less-build.c
>>>> @@ -461,10 +461,12 @@ static int __init
>>>> domain_handle_dtb_boot_module(struct domain *d,
>>>> /*
>>>> * The max size for DT is 2MB. However, the generated DT is small
>>>> (not including
>>>> - * domU passthrough DT nodes whose size we account separately), 4KB
>>>> are enough
>>>> - * for now, but we might have to increase it in the future.
>>>> + * domU passthrough DT nodes whose size we account separately). The
>>>> size is
>>>> + * primarily driven by the number of vCPU nodes. The previous 4KiB
>>>> buffer was
>>>> + * insufficient for guests with high vCPU counts, so it has been
>>>> increased
>>>> + * to support up to the MAX_VIRT_CPUS limit (128).
>>>> */
>>>> -#define DOMU_DTB_SIZE 4096
>>>> +#define DOMU_DTB_SIZE (4096 * 4)
>>> May be It wants Kconfig?
>>> Or some formula which accounts MAX_VIRT_CPUS?
>>
>> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most
>> robust approach.
>
> One option could be to detect the size at runtime, essentially, try to
> allocate
> it, and if an error occurs, increase the fdtsize and try again. I don’t
> really
> like this approach, but I wanted to mention it in case someone finds it
> useful.
> The benefit of this approach is that if, in the future, something else such
> as a CPU node contributes to the final FDT size, we won’t need to update
> the
> formula again.
I got your point and understand the goal, but I see the following
concerns with that:
1. Xen has to do all the work to build the device tree, fail, throw all
that work away, and then start over again. This wastes time during the
system's boot-up process.
2. Boot-time code should be as deterministic and predictable as
possible. A static, worst-case calculation is highly predictable,
whereas a retry loop is not.
3. It adds logical complexity (error handling, looping, size increments)
to what should be a straightforward setup step.
>
>>
>> Here is the empirical data (by testing with the maximum number of device
>> tree nodes (e.g., hypervisor and reserved-memory nodes) and enabling all
>> optional CPU properties (e.g., clock-frequency)):
>>
>> cpus=1
>> (XEN) Final compacted FDT size is: 1586 bytes
>>
>> cpus=2
>> (XEN) Final compacted FDT size is: 1698 bytes
>>
>> cpus=32
>> (XEN) Final compacted FDT size is: 5058 bytes
>>
>> cpus=128
>> (XEN) Final compacted FDT size is: 15810 bytes
>>
>>
>> static int __init prepare_dtb_domU(struct domain *d, struct kernel_info
>> *kinfo)
>> {
>> int addrcells, sizecells;
>> @@ -569,6 +569,8 @@ static int __init prepare_dtb_domU(struct domain *d,
>> struct kernel_info *kinfo)
>> if ( ret < 0 )
>> goto err;
>>
>> + printk("Final compacted FDT size is: %d bytes\n",
>> fdt_totalsize(kinfo->fdt));
>> +
>> return 0;
>>
>> err:
>>
>> This data shows (assuming my testing/calculations are correct):
>>
>> - A marginal cost of 112 bytes per vCPU in the final, compacted device
>> tree.
>> - A fixed base size of 1474 bytes for all non-vCPU content.
>>
>> Based on that I would propose the following formula with the
>> justification:
>>
>> /*
>> * The size is calculated from a fixed baseline plus a scalable
>> * portion for each potential vCPU node up to the system limit
>> * (MAX_VIRT_CPUS), as the vCPU nodes are the primary consumer
>> * of space.
>> *
>> * The baseline of 2KiB is a safe buffer for all non-vCPU FDT
>> * content. The 128 bytes per vCPU is derived from a worst-case
>> * analysis of the FDT construction-time size for a single
>> * vCPU node.
>> */
>> #define DOMU_DTB_SIZE (2048 + (MAX_VIRT_CPUS * 128))
>>
>> **********************************************
>>
>> Please tell me would you be happy with that?
>
> I would also like to note that we probably want to add a BUILD_BUG_ON()
> check
> to ensure that DOMU_DTB_SIZE is not larger than SZ_2M. Otherwise, we
> would get
> a runtime error instead of a build-time failure, since there is code
> that limits
> fdtsize to SZ_2M:
>
> /* Cap to max DT size if needed */
> fdt_size = min(fdt_size, SZ_2M);
ok, sounds reasonable, will add:
BUILD_BUG_ON(DOMU_DTB_SIZE > SZ_2M);
>
> Thanks.
>
> ~ Oleksii
>
Hello Oleksandr, On 12/3/25 3:05 PM, Oleksandr Tyshchenko wrote: > > On 03.12.25 15:36, Oleksii Kurochko wrote: > > >> On 12/3/25 12:03 PM, Oleksandr Tyshchenko wrote: >>> On 02.12.25 23:33, Grygorii Strashko wrote: >>>> On 02.12.25 21:32, Oleksandr Tyshchenko wrote: >>>>> Creating a guest with a high vCPU count (e.g., >32) fails because >>>>> the guest's device tree buffer (DOMU_DTB_SIZE) overflows during >>>>> creation. >>>>> The FDT nodes for each vCPU quickly exhaust the 4KiB buffer, >>>>> causing a guest creation failure. >>>>> >>>>> Increase the buffer size to 16KiB to support guests up to >>>>> the MAX_VIRT_CPUS limit (128). >>>>> >>>>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >>>>> --- >>>>> Noticed when testing the boundary conditions for dom0less guest >>>>> creation on Arm64. >>>>> >>>>> Domain configuration: >>>>> fdt mknod /chosen domU0 >>>>> fdt set /chosen/domU0 compatible "xen,domain" >>>>> fdt set /chosen/domU0 \#address-cells <0x2> >>>>> fdt set /chosen/domU0 \#size-cells <0x2> >>>>> fdt set /chosen/domU0 memory <0x0 0x10000 > >>>>> fdt set /chosen/domU0 cpus <33> >>>>> fdt set /chosen/domU0 vpl011 >>>>> fdt mknod /chosen/domU0 module@40400000 >>>>> fdt set /chosen/domU0/module@40400000 compatible "multiboot,kernel" >>>>> "multiboot,module" >>>>> fdt set /chosen/domU0/module@40400000 reg <0x0 0x40400000 0x0 0x16000 > >>>>> fdt set /chosen/domU0/module@40400000 bootargs "console=ttyAMA0" >>>>> >>>>> Failure log: >>>>> (XEN) Xen dom0less mode detected >>>>> (XEN) *** LOADING DOMU cpus=33 memory=0x10000KB *** >>>>> (XEN) Loading d1 kernel from boot module @ 0000000040400000 >>>>> (XEN) Allocating mappings totalling 64MB for d1: >>>>> (XEN) d1 BANK[0] 0x00000040000000-0x00000044000000 (64MB) >>>>> (XEN) Device tree generation failed (-22). >>>>> (XEN) >>>>> (XEN) **************************************** >>>>> (XEN) Panic on CPU 0: >>>>> (XEN) Could not set up domain domU0 (rc = -22) >>>>> (XEN) **************************************** >>>>> --- >>>>> --- >>>>> xen/common/device-tree/dom0less-build.c | 8 +++++--- >>>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>>> >>>>> diff --git a/xen/common/device-tree/dom0less-build.c b/xen/common/ >>>>> device-tree/dom0less-build.c >>>>> index 3f5b987ed8..d7d0a47b97 100644 >>>>> --- a/xen/common/device-tree/dom0less-build.c >>>>> +++ b/xen/common/device-tree/dom0less-build.c >>>>> @@ -461,10 +461,12 @@ static int __init >>>>> domain_handle_dtb_boot_module(struct domain *d, >>>>> /* >>>>> * The max size for DT is 2MB. However, the generated DT is small >>>>> (not including >>>>> - * domU passthrough DT nodes whose size we account separately), 4KB >>>>> are enough >>>>> - * for now, but we might have to increase it in the future. >>>>> + * domU passthrough DT nodes whose size we account separately). The >>>>> size is >>>>> + * primarily driven by the number of vCPU nodes. The previous 4KiB >>>>> buffer was >>>>> + * insufficient for guests with high vCPU counts, so it has been >>>>> increased >>>>> + * to support up to the MAX_VIRT_CPUS limit (128). >>>>> */ >>>>> -#define DOMU_DTB_SIZE 4096 >>>>> +#define DOMU_DTB_SIZE (4096 * 4) >>>> May be It wants Kconfig? >>>> Or some formula which accounts MAX_VIRT_CPUS? >>> I agree that using a formula that accounts for MAX_VIRT_CPUS is the most >>> robust approach. >> One option could be to detect the size at runtime, essentially, try to >> allocate >> it, and if an error occurs, increase the fdtsize and try again. I don’t >> really >> like this approach, but I wanted to mention it in case someone finds it >> useful. >> The benefit of this approach is that if, in the future, something else such >> as a CPU node contributes to the final FDT size, we won’t need to update >> the >> formula again. > I got your point and understand the goal, but I see the following > concerns with that: > > 1. Xen has to do all the work to build the device tree, fail, throw all > that work away, and then start over again. This wastes time during the > system's boot-up process. > > 2. Boot-time code should be as deterministic and predictable as > possible. A static, worst-case calculation is highly predictable, > whereas a retry loop is not. > > 3. It adds logical complexity (error handling, looping, size increments) > to what should be a straightforward setup step. Yes, I totally agree with all your concerns, so lets just go with a formula approach. Thanks. ~ Oleksii
© 2016 - 2025 Red Hat, Inc.