In order to legitimately set up initial mappings past _end[], we need
to make sure that the entire mapped range is inside a RAM region.
Therefore we need to inform the bootloader (or alike) that our allocated
size is larger than just the next SECTION_ALIGN-ed boundary past _end[].

This allows dropping a command line option from the tool, which was
introduced to work around a supposed linker bug, when the problem was
really Xen's.

While adjusting adjacent code, correct the argc check to also cover the
case where --notes was passed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
There's no good Fixes: tag, I don't think, as in theory the issue could
even have happened when we still required to be loaded at a fixed
physical address (1Mb originally, later 2Mb), and when we statically
mapped the low 16Mb. If we assumed such can't happen below 16Mb, these
two should be added:
Fixes: e4dd91ea85a3 ("x86: Ensure RAM holes really are not mapped in Xen's ongoing 1:1 physmap")
Fixes: 7cd7f2f5e116 ("x86/boot: Remove the preconstructed low 16M superpage mappings")
---
v2: New.
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -130,8 +130,7 @@ orphan-handling-$(call ld-option,--orpha
$(TARGET): TMP = $(dot-target).elf32
$(TARGET): $(TARGET)-syms $(efi-y) $(obj)/boot/mkelf32
- $(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET) \
- `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
+ $(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET)
od -t x4 -N 8192 $(TMP) | grep 1badb002 > /dev/null || \
{ echo "No Multiboot1 header found" >&2; false; }
od -t x4 -N 32768 $(TMP) | grep e85250d6 > /dev/null || \
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -248,7 +248,6 @@ static void do_read(int fd, void *data,
int main(int argc, char **argv)
{
- uint64_t final_exec_addr;
uint32_t loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
char *inimage, *outimage;
int infd, outfd;
@@ -261,22 +260,24 @@ int main(int argc, char **argv)
Elf64_Ehdr in64_ehdr;
Elf64_Phdr in64_phdr;
- if ( argc < 5 )
+ if ( argc < 4 )
{
+ help:
fprintf(stderr, "Usage: mkelf32 [--notes] <in-image> <out-image> "
- "<load-base> <final-exec-addr>\n");
+ "<load-base>\n");
return 1;
}
if ( !strcmp(argv[1], "--notes") )
{
+ if ( argc < 5 )
+ goto help;
i = 2;
num_phdrs = 2;
}
inimage = argv[i++];
outimage = argv[i++];
loadbase = strtoul(argv[i++], NULL, 16);
- final_exec_addr = strtoull(argv[i++], NULL, 16);
infd = open(inimage, O_RDONLY);
if ( infd == -1 )
@@ -339,9 +340,12 @@ int main(int argc, char **argv)
(void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
dat_siz = (uint32_t)in64_phdr.p_filesz;
- /* Do not use p_memsz: it does not include BSS alignment padding. */
- /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
- mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
+ /*
+ * We don't pad .bss in the linker script, but during early boot we map
+ * the Xen image using 2M pages. To avoid running into adjacent non-RAM
+ * regions, pad the segment to the next 2M boundary.
+ */
+ mem_siz = ((uint32_t)in64_phdr.p_memsz + (1U << 21) - 1) & (-1U << 21);
note_sz = note_base = offset = 0;
if ( num_phdrs > 1 )
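
The argument handling changed in the hunk above can be read in isolation as follows. This is only a sketch: `struct mkelf32_args` and `parse_args()` are hypothetical names introduced for illustration and do not exist in mkelf32.c, which performs these checks inline in main().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed arguments (not part of mkelf32). */
struct mkelf32_args {
    int notes;               /* non-zero if --notes was given */
    const char *inimage;
    const char *outimage;
    unsigned long loadbase;  /* parsed as hexadecimal */
};

/*
 * Hypothetical helper mirroring the patched checks: three positional
 * arguments are always required, and --notes shifts them by one, so
 * one more argv[] entry is needed in that case.
 * Returns 0 on success, -1 if the usage message should be printed.
 */
static int parse_args(int argc, char **argv, struct mkelf32_args *out)
{
    int i = 1;

    assert(out != NULL);

    if ( argc < 4 )
        return -1;

    out->notes = !strcmp(argv[1], "--notes");
    if ( out->notes )
    {
        /* Without this check argv[4] would be read even when absent. */
        if ( argc < 5 )
            return -1;
        i = 2;
    }

    out->inimage = argv[i++];
    out->outimage = argv[i++];
    out->loadbase = strtoul(argv[i++], NULL, 16);

    return 0;
}
```

With only the single `argc < 4` test, an invocation such as `mkelf32 --notes in out` would pass the check and then consume a missing load-base argument, which is the case the additional `argc < 5` test rejects.
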
On Mon, Aug 11, 2025 at 12:49:57PM +0200, Jan Beulich wrote:
> In order to legitimately set up initial mappings past _end[], we need
> to make sure that the entire mapped range is inside a RAM region.
> Therefore we need to inform the bootloader (or alike) that our allocated
> size is larger than just the next SECTION_ALIGN-ed boundary past _end[].
>
> This allows dropping a command line option from the tool, which was
> introduced to work around a supposed linker bug, when the problem was
> really Xen's.
>
> While adjusting adjacent code, correct the argc check to also cover the
> case where --notes was passed.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> There's no good Fixes: tag, I don't think, as in theory the issue could
> even have happened when we still required to be loaded at a fixed
> physical address (1Mb originally, later 2Mb), and when we statically
> mapped the low 16Mb. If we assumed such can't happen below 16Mb, these
> two should be added:
> Fixes: e4dd91ea85a3 ("x86: Ensure RAM holes really are not mapped in Xen's ongoing 1:1 physmap")
> Fixes: 7cd7f2f5e116 ("x86/boot: Remove the preconstructed low 16M superpage mappings")
> ---
> v2: New.
>
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -130,8 +130,7 @@ orphan-handling-$(call ld-option,--orpha
>
> $(TARGET): TMP = $(dot-target).elf32
> $(TARGET): $(TARGET)-syms $(efi-y) $(obj)/boot/mkelf32
> - $(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET) \
> - `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
> + $(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET)
> od -t x4 -N 8192 $(TMP) | grep 1badb002 > /dev/null || \
> { echo "No Multiboot1 header found" >&2; false; }
> od -t x4 -N 32768 $(TMP) | grep e85250d6 > /dev/null || \
> --- a/xen/arch/x86/boot/mkelf32.c
> +++ b/xen/arch/x86/boot/mkelf32.c
> @@ -248,7 +248,6 @@ static void do_read(int fd, void *data,
>
> int main(int argc, char **argv)
> {
> - uint64_t final_exec_addr;
> uint32_t loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
> char *inimage, *outimage;
> int infd, outfd;
> @@ -261,22 +260,24 @@ int main(int argc, char **argv)
> Elf64_Ehdr in64_ehdr;
> Elf64_Phdr in64_phdr;
>
> - if ( argc < 5 )
> + if ( argc < 4 )
> {
> + help:
> fprintf(stderr, "Usage: mkelf32 [--notes] <in-image> <out-image> "
> - "<load-base> <final-exec-addr>\n");
> + "<load-base>\n");
> return 1;
> }
>
> if ( !strcmp(argv[1], "--notes") )
> {
> + if ( argc < 5 )
> + goto help;
> i = 2;
> num_phdrs = 2;
> }
> inimage = argv[i++];
> outimage = argv[i++];
> loadbase = strtoul(argv[i++], NULL, 16);
> - final_exec_addr = strtoull(argv[i++], NULL, 16);
>
> infd = open(inimage, O_RDONLY);
> if ( infd == -1 )
> @@ -339,9 +340,12 @@ int main(int argc, char **argv)
> (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
> dat_siz = (uint32_t)in64_phdr.p_filesz;
>
> - /* Do not use p_memsz: it does not include BSS alignment padding. */
> - /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
> - mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
> + /*
> + * We don't pad .bss in the linker script, but during early boot we map
> + * the Xen image using 2M pages. To avoid running into adjacent non-RAM
> + * regions, pad the segment to the next 2M boundary.

Won't it be easier to pad in the linker script? We could still have
__bss_end before the padding, so that initialization isn't done to the
extra padding area. Otherwise it would be helpful to mention why the
padding must be done here (opposed to being done in the linker
script).

Thanks, Roger.

On 12.08.2025 18:18, Roger Pau Monné wrote:
> On Mon, Aug 11, 2025 at 12:49:57PM +0200, Jan Beulich wrote:
>> @@ -339,9 +340,12 @@ int main(int argc, char **argv)
>> (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
>> dat_siz = (uint32_t)in64_phdr.p_filesz;
>>
>> - /* Do not use p_memsz: it does not include BSS alignment padding. */
>> - /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
>> - mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
>> + /*
>> + * We don't pad .bss in the linker script, but during early boot we map
>> + * the Xen image using 2M pages. To avoid running into adjacent non-RAM
>> + * regions, pad the segment to the next 2M boundary.
>
> Won't it be easier to pad in the linker script? We could still have
> __bss_end before the padding, so that initialization isn't done to the
> extra padding area. Otherwise it would be helpful to mention why the
> padding must be done here (opposed to being done in the linker
> script).

The way the linker script currently is written doesn't lend itself to do
the padding there: It would either mean to introduce an artificial
padding section (which I'd dislike), or it would result in _end[] and
__2M_rwdata_end[] also moving, which pretty clearly we don't want. Maybe
there are other options that I simply don't see.

A further complication would be xen.efi's .reloc, which we don't want to
needlessly move either. That may be coverable by pre-processor
conditionals, but I wanted to mention the aspect nevertheless.

Jan

On Thu, Aug 14, 2025 at 09:02:35AM +0200, Jan Beulich wrote:
> On 12.08.2025 18:18, Roger Pau Monné wrote:
> > On Mon, Aug 11, 2025 at 12:49:57PM +0200, Jan Beulich wrote:
> >> @@ -339,9 +340,12 @@ int main(int argc, char **argv)
> >> (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
> >> dat_siz = (uint32_t)in64_phdr.p_filesz;
> >>
> >> - /* Do not use p_memsz: it does not include BSS alignment padding. */
> >> - /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
> >> - mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
> >> + /*
> >> + * We don't pad .bss in the linker script, but during early boot we map
> >> + * the Xen image using 2M pages. To avoid running into adjacent non-RAM
> >> + * regions, pad the segment to the next 2M boundary.
> >
> > Won't it be easier to pad in the linker script? We could still have
> > __bss_end before the padding, so that initialization isn't done to the
> > extra padding area. Otherwise it would be helpful to mention why the
> > padding must be done here (opposed to being done in the linker
> > script).
>
> The way the linker script currently is written doesn't lend itself to do
> the padding there: It would either mean to introduce an artificial
> padding section (which I'd dislike), or it would result in _end[] and
> __2M_rwdata_end[] also moving, which pretty clearly we don't want. Maybe
> there are other options that I simply don't see.

We could move both _end and __2M_rwdata_end inside the .bss section,
but that's also ugly IMO. I would probably prefer the extra padding
section.

> A further complication would be xen.efi's .reloc, which we don't want to
> needlessly move either. That may be coverable by pre-processor
> conditionals, but I wanted to mention the aspect nevertheless.

Yeah, we could make the extra padding section depend on pre-processor
checks.

I think I would prefer the usage of such extra section rather than
mangling the elf program headers afterwards, but since we are already
doing it:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
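
As an illustration of the padding arithmetic discussed in this thread, the following is a minimal sketch and not code from the patch. `SUPERPAGE_SIZE`, `round_up_pow2()` and `covers_early_mapping()` are hypothetical names, and the sketch assumes 2M early-boot mappings and 32-bit physical addresses that do not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed granularity of the early-boot mappings: 2M superpages. */
#define SUPERPAGE_SIZE (UINT32_C(1) << 21)

/* Round value up to the next multiple of align, which must be a power of two. */
static uint32_t round_up_pow2(uint32_t value, uint32_t align)
{
    assert(align && !(align & (align - 1)));
    return (value + align - 1) & ~(align - 1);
}

/*
 * Hypothetical check: a segment loaded at paddr occupies memsz bytes, and
 * early boot maps it with 2M pages, i.e. up to the next 2M boundary past
 * its end. Returns non-zero if advertising mem_siz bytes to the bootloader
 * covers that whole mapped range. Assumes no 32-bit wrap-around.
 */
static int covers_early_mapping(uint32_t paddr, uint32_t memsz,
                                uint32_t mem_siz)
{
    uint32_t mapped_end = round_up_pow2(paddr + memsz, SUPERPAGE_SIZE);

    return paddr + mem_siz >= mapped_end;
}
```

Rounding the size, as the tool does, gives the same result as rounding the end address only when the segment start is itself 2M-aligned, which this sketch assumes for the load base.
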