Fix the code which tries to pad the load segment to 2 MiB but only pads
it to a 1 MiB boundary.

This manifested itself as a page fault while scrubbing RAM during boot.
Xen failed to mark its location as reserved in the E820 because the last
2 MiB superpage overlapped a reserved region which meant the memory was
given to the allocator despite being RO.

Fixes: 4fb075201f54 ("x86/mkelf32: pad load segment to 2Mb boundary")
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 xen/arch/x86/boot/mkelf32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/arch/x86/boot/mkelf32.c b/xen/arch/x86/boot/mkelf32.c
index 373ba4ddd593..469d1ba0af41 100644
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -345,7 +345,7 @@ int main(int argc, char **argv)
      * the Xen image using 2M pages. To avoid running into adjacent non-RAM
      * regions, pad the segment to the next 2M boundary.
      */
-    mem_siz = ((uint32_t)in64_phdr.p_memsz + (1U << 20) - 1) & (-1U << 20);
+    mem_siz = ((uint32_t)in64_phdr.p_memsz + (1U << 21) - 1) & (-1U << 21);
 
     note_sz = note_base = offset = 0;
     if ( num_phdrs > 1 )
--
2.53.0
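
The one-character change in the shift is easier to see with concrete numbers. The following is a minimal sketch of the rounding expression from the patch, pulled out into a standalone helper. The helper name pad_to_shift() and the unpadded size 0x8f4000 are assumptions chosen for illustration; they do not come from mkelf32.c or from the thread.

```c
#include <stdint.h>

/*
 * Round 'size' up to the next multiple of (1 << shift).
 * Same arithmetic as the mem_siz expression in the patch; the helper
 * name is hypothetical.
 */
uint32_t pad_to_shift(uint32_t size, unsigned int shift)
{
    return (size + (1U << shift) - 1) & (-1U << shift);
}

/*
 * With an assumed unpadded size of 0x8f4000:
 *   pad_to_shift(0x8f4000, 20) yields 0x900000 (1 MiB granularity)
 *   pad_to_shift(0x8f4000, 21) yields 0xa00000 (2 MiB granularity)
 */
```

With a shift of 20 the result is a multiple of 1 MiB that need not be a multiple of 2 MiB, which is why the image could end partway through its final 2 MiB superpage.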

On 17/04/2026 11:54 am, Ross Lagerwall wrote:
> Fix the code which tries to pad the load segment to 2 MiB but only pads
> it to a 1 MiB boundary.
>
> This manifested itself as a page fault while scrubbing RAM during boot.
> Xen failed to mark its location as reserved in the E820 because the last
> 2 MiB superpage overlapped a reserved region which meant the memory was
> given to the allocator despite being RO.

Do you have the relevant snippet of the E820?

AIUI, you're saying that Xen was placed immediately below an E820
reserved region (a valid layout at 1M alignment), where said region was
inside the 2M-aligned boundary that Xen was expecting.

But I don't quite follow what happened next. Where does read-only-ness
come into it?

>
> Fixes: 4fb075201f54 ("x86/mkelf32: pad load segment to 2Mb boundary")
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

For the patch itself,

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

but likely to want a tweak to the commit message.

~Andrew

On 4/17/26 12:03 PM, Andrew Cooper wrote:
> On 17/04/2026 11:54 am, Ross Lagerwall wrote:
>> Fix the code which tries to pad the load segment to 2 MiB but only pads
>> it to a 1 MiB boundary.
>>
>> This manifested itself as a page fault while scrubbing RAM during boot.
>> Xen failed to mark its location as reserved in the E820 because the last
>> 2 MiB superpage overlapped a reserved region which meant the memory was
>> given to the allocator despite being RO.
>
> Do you have the relevant snippet of the E820?
>
> AIUI, you're saying that Xen was placed immediately below an E820
> reserved region (a valid layout at 1M alignment), where said region was
> inside the 2M-aligned boundary that Xen was expecting.
>
> But I don't quite follow what happened next. Where does read-only-ness
> come into it?
>

Relevant E820:

(XEN) [00000063469ff02c]  [000000003f2df000, 000000003f31efff] (ACPI NVS)
(XEN) [00000063519dc9f2]  [000000003f31f000, 000000004cfebfff] (usable)
(XEN) [000000635c504aff]  [000000004cfec000, 000000004d07bfff] (ACPI data)
(XEN) [00000063677372dc]  [000000004d07c000, 000000004d09bfff] (ACPI NVS)

With a load size of 0x900000 (padded to a 1 MiB boundary), Xen was placed at
4c600000-4cefffff.

In __start_xen(), there is a call...

    reserve_e820_ram(&boot_e820, __pa(_stext), __pa(__2M_rwdata_end));

... which tries to reserve the region 4c600000-4cffffff (size 0xa00000),
padded to a 2 MiB boundary since it is using superpages.

reserve_e820_ram() doesn't reserve anything because the request doesn't fall
within a single RAM region. Therefore, the pages get treated as normal RAM and
will get scrubbed later. However, __start_xen() also calls
modify_xen_mappings() to mark all of .text and .rodata as RO in the direct map
so when it actually tries to scrub it it gets a page fault instead (which is I
suppose slightly better than just zeroing Xen's .text).

Ross
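
The failure described above comes down to a containment check against the memory map. The sketch below models that check with the four E820 entries quoted in the mail. It is a simplified illustration, not Xen's reserve_e820_ram(): the struct, the function name range_in_single_ram_region() and the inclusive-end convention are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified E820 entry; 'end' is inclusive. */
struct region {
    uint64_t start;
    uint64_t end;
    bool usable;
};

/* The four entries quoted in the thread. */
const struct region example_map[] = {
    { 0x3f2df000, 0x3f31efff, false }, /* ACPI NVS  */
    { 0x3f31f000, 0x4cfebfff, true  }, /* usable    */
    { 0x4cfec000, 0x4d07bfff, false }, /* ACPI data */
    { 0x4d07c000, 0x4d09bfff, false }, /* ACPI NVS  */
};

/*
 * Return true only if [start, end] lies wholly inside one usable
 * region. This stands in for the condition under which a reservation
 * request can be satisfied; it is not the real Xen implementation.
 */
bool range_in_single_ram_region(const struct region *map, size_t n,
                                uint64_t start, uint64_t end)
{
    for ( size_t i = 0; i < n; i++ )
        if ( map[i].usable && start >= map[i].start && end <= map[i].end )
            return true;

    return false;
}
```

Under this model the loaded image, 4c600000-4cefffff, fits inside the usable region, while the 2 MiB-padded request, 4c600000-4cffffff, runs past 4cfebfff into the ACPI data region and is rejected.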

On 17/04/2026 2:25 pm, Ross Lagerwall wrote:
> On 4/17/26 12:03 PM, Andrew Cooper wrote:
>> On 17/04/2026 11:54 am, Ross Lagerwall wrote:
>>> Fix the code which tries to pad the load segment to 2 MiB but only pads
>>> it to a 1 MiB boundary.
>>>
>>> This manifested itself as a page fault while scrubbing RAM during boot.
>>> Xen failed to mark its location as reserved in the E820 because the last
>>> 2 MiB superpage overlapped a reserved region which meant the memory was
>>> given to the allocator despite being RO.
>>
>> Do you have the relevant snippet of the E820?
>>
>> AIUI, you're saying that Xen was placed immediately below an E820
>> reserved region (a valid layout at 1M alignment), where said region was
>> inside the 2M-aligned boundary that Xen was expecting.
>>
>> But I don't quite follow what happened next. Where does read-only-ness
>> come into it?
>>
>
> Relevant E820:
>
> (XEN) [00000063469ff02c]  [000000003f2df000, 000000003f31efff] (ACPI NVS)
> (XEN) [00000063519dc9f2]  [000000003f31f000, 000000004cfebfff] (usable)
> (XEN) [000000635c504aff]  [000000004cfec000, 000000004d07bfff] (ACPI data)
> (XEN) [00000063677372dc]  [000000004d07c000, 000000004d09bfff] (ACPI NVS)
>
> With a load size of 0x900000 (padded to a 1 MiB boundary), Xen was placed at
> 4c600000-4cefffff.
>
> In __start_xen(), there is a call...
>
>     reserve_e820_ram(&boot_e820, __pa(_stext), __pa(__2M_rwdata_end));
>
> ... which tries to reserve the region 4c600000-4cffffff (size 0xa00000),
> padded to a 2 MiB boundary since it is using superpages.
>
> reserve_e820_ram() doesn't reserve anything because the request doesn't fall
> within a single RAM region. Therefore, the pages get treated as normal RAM and
> will get scrubbed later. However, __start_xen() also calls
> modify_xen_mappings() to mark all of .text and .rodata as RO in the direct map
> so when it actually tries to scrub it it gets a page fault instead (which is I
> suppose slightly better than just zeroing Xen's .text).

Oh, well I'm glad that I fought to adjust the directmap perms. This is
exactly the kind of thing I was looking to catch.

reserve_e820_ram() failing here is also catastrophic; the bootscrub can
be bypassed with a cmdline parameter.

Either way, can I suggest the following adjustment to the commit message:

    This manifested itself as a page fault while scrubbing RAM during boot.
    Xen failed to mark itself as reserved in the E820 (due to spanning
    multiple regions), but did restrict the permissions in the directmap.
    All of Xen is then handed to the physical memory manager as available
    for use, and scrubbing hit the directmap protections.

?

I can fix on commit if you're happy.

~Andrew

On 4/17/26 3:35 PM, Andrew Cooper wrote:
> On 17/04/2026 2:25 pm, Ross Lagerwall wrote:
>> On 4/17/26 12:03 PM, Andrew Cooper wrote:
>>> On 17/04/2026 11:54 am, Ross Lagerwall wrote:
>>>> Fix the code which tries to pad the load segment to 2 MiB but only pads
>>>> it to a 1 MiB boundary.
>>>>
>>>> This manifested itself as a page fault while scrubbing RAM during boot.
>>>> Xen failed to mark its location as reserved in the E820 because the last
>>>> 2 MiB superpage overlapped a reserved region which meant the memory was
>>>> given to the allocator despite being RO.
>>>
>>> Do you have the relevant snippet of the E820?
>>>
>>> AIUI, you're saying that Xen was placed immediately below an E820
>>> reserved region (a valid layout at 1M alignment), where said region was
>>> inside the 2M-aligned boundary that Xen was expecting.
>>>
>>> But I don't quite follow what happened next. Where does read-only-ness
>>> come into it?
>>>
>>
>> Relevant E820:
>>
>> (XEN) [00000063469ff02c]  [000000003f2df000, 000000003f31efff] (ACPI NVS)
>> (XEN) [00000063519dc9f2]  [000000003f31f000, 000000004cfebfff] (usable)
>> (XEN) [000000635c504aff]  [000000004cfec000, 000000004d07bfff] (ACPI data)
>> (XEN) [00000063677372dc]  [000000004d07c000, 000000004d09bfff] (ACPI NVS)
>>
>> With a load size of 0x900000 (padded to a 1 MiB boundary), Xen was placed at
>> 4c600000-4cefffff.
>>
>> In __start_xen(), there is a call...
>>
>>     reserve_e820_ram(&boot_e820, __pa(_stext), __pa(__2M_rwdata_end));
>>
>> ... which tries to reserve the region 4c600000-4cffffff (size 0xa00000),
>> padded to a 2 MiB boundary since it is using superpages.
>>
>> reserve_e820_ram() doesn't reserve anything because the request doesn't fall
>> within a single RAM region. Therefore, the pages get treated as normal RAM and
>> will get scrubbed later. However, __start_xen() also calls
>> modify_xen_mappings() to mark all of .text and .rodata as RO in the direct map
>> so when it actually tries to scrub it it gets a page fault instead (which is I
>> suppose slightly better than just zeroing Xen's .text).
>
> Oh, well I'm glad that I fought to adjust the directmap perms. This is
> exactly the kind of thing I was looking to catch.
>
> reserve_e820_ram() failing here is also catastrophic; the bootscrub can
> be bypassed with a cmdline parameter.
>
> Either way, can I suggest the following adjustment to the commit message:
>
>     This manifested itself as a page fault while scrubbing RAM during boot.
>     Xen failed to mark itself as reserved in the E820 (due to spanning
>     multiple regions), but did restrict the permissions in the directmap.
>     All of Xen is then handed to the physical memory manager as available
>     for use, and scrubbing hit the directmap protections.
>
> ?

Sure, fine with me.

Thanks,
Ross