arch/x86/kernel/reboot.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
native_machine_emergency_restart() writes unconditionally
to the physical address of 0x472 to pass the warm reboot
flags to BIOS. The BIOS reads this on booting to bypass memory
test and do the warm boot. On the non-BIOS systems, other
means have to be employed, and this write is a memory corruption.
Fix that by moving the offending write into the case where
the machine is rebooted via BIOS.
Signed-off-by: Roman Kisel <romank@linux.microsoft.com>
---
arch/x86/kernel/reboot.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 615922838c51..6eec8653493f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void)
tboot_shutdown(TB_SHUTDOWN_REBOOT);
- /* Tell the BIOS if we want cold or warm reboot */
+ /* Tell the firmware if we want cold or warm reboot */
mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0;
- *((unsigned short *)__va(0x472)) = mode;
/*
* If an EFI capsule has been registered with the firmware then
@@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void)
break;
case BOOT_BIOS:
+ *((unsigned short *)__va(0x472)) = mode;
machine_real_restart(MRR_BIOS);
/* We're probably dead after this, but... */
base-commit: eea6e4b4dfb8859446177c32961c96726d0117be
--
2.34.1
* Roman Kisel <romank@linux.microsoft.com> wrote: > native_machine_emergency_restart() writes unconditionally > to the physical address of 0x472 to pass the warm reboot > flags to BIOS. The BIOS reads this on booting to bypass memory > test and do the warm boot. On the non-BIOS systems, other > means have to be employed, and this write is a memory corruption. > > Fix that by moving the offending write into the case where > the machine is rebooted via BIOS. > > Signed-off-by: Roman Kisel <romank@linux.microsoft.com> > --- > arch/x86/kernel/reboot.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > index 615922838c51..6eec8653493f 100644 > --- a/arch/x86/kernel/reboot.c > +++ b/arch/x86/kernel/reboot.c > @@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void) > > tboot_shutdown(TB_SHUTDOWN_REBOOT); > > - /* Tell the BIOS if we want cold or warm reboot */ > + /* Tell the firmware if we want cold or warm reboot */ > mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0; > - *((unsigned short *)__va(0x472)) = mode; > > /* > * If an EFI capsule has been registered with the firmware then > @@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void) > break; > > case BOOT_BIOS: > + *((unsigned short *)__va(0x472)) = mode; > machine_real_restart(MRR_BIOS); If the value of 0x472 is only meaningful in the legacy 'BOOT_BIOS' case, then at minimum the whole block should be moved to that case, not just the setting. Thanks, Ingo
On February 25, 2025 12:25:12 PM PST, Ingo Molnar <mingo@kernel.org> wrote: > >* Roman Kisel <romank@linux.microsoft.com> wrote: > >> native_machine_emergency_restart() writes unconditionally >> to the physical address of 0x472 to pass the warm reboot >> flags to BIOS. The BIOS reads this on booting to bypass memory >> test and do the warm boot. On the non-BIOS systems, other >> means have to be employed, and this write is a memory corruption. >> >> Fix that by moving the offending write into the case where >> the machine is rebooted via BIOS. >> >> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> >> --- >> arch/x86/kernel/reboot.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c >> index 615922838c51..6eec8653493f 100644 >> --- a/arch/x86/kernel/reboot.c >> +++ b/arch/x86/kernel/reboot.c >> @@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void) >> >> tboot_shutdown(TB_SHUTDOWN_REBOOT); >> >> - /* Tell the BIOS if we want cold or warm reboot */ >> + /* Tell the firmware if we want cold or warm reboot */ >> mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0; >> - *((unsigned short *)__va(0x472)) = mode; >> >> /* >> * If an EFI capsule has been registered with the firmware then >> @@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void) >> break; >> >> case BOOT_BIOS: >> + *((unsigned short *)__va(0x472)) = mode; >> machine_real_restart(MRR_BIOS); > >If the value of 0x472 is only meaningful in the legacy 'BOOT_BIOS' >case, then at minimum the whole block should be moved to that case, not >just the setting. > >Thanks, > > Ingo Does the memory corruption actually matter, though?
* H. Peter Anvin <hpa@zytor.com> wrote: > On February 25, 2025 12:25:12 PM PST, Ingo Molnar <mingo@kernel.org> wrote: > > > >* Roman Kisel <romank@linux.microsoft.com> wrote: > > > >> native_machine_emergency_restart() writes unconditionally > >> to the physical address of 0x472 to pass the warm reboot > >> flags to BIOS. The BIOS reads this on booting to bypass memory > >> test and do the warm boot. On the non-BIOS systems, other > >> means have to be employed, and this write is a memory corruption. > >> > >> Fix that by moving the offending write into the case where > >> the machine is rebooted via BIOS. > >> > >> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> > >> --- > >> arch/x86/kernel/reboot.c | 4 ++-- > >> 1 file changed, 2 insertions(+), 2 deletions(-) > >> > >> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c > >> index 615922838c51..6eec8653493f 100644 > >> --- a/arch/x86/kernel/reboot.c > >> +++ b/arch/x86/kernel/reboot.c > >> @@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void) > >> > >> tboot_shutdown(TB_SHUTDOWN_REBOOT); > >> > >> - /* Tell the BIOS if we want cold or warm reboot */ > >> + /* Tell the firmware if we want cold or warm reboot */ > >> mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0; > >> - *((unsigned short *)__va(0x472)) = mode; > >> > >> /* > >> * If an EFI capsule has been registered with the firmware then > >> @@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void) > >> break; > >> > >> case BOOT_BIOS: > >> + *((unsigned short *)__va(0x472)) = mode; > >> machine_real_restart(MRR_BIOS); > > > >If the value of 0x472 is only meaningful in the legacy 'BOOT_BIOS' > >case, then at minimum the whole block should be moved to that case, not > >just the setting. > > > >Thanks, > > > > Ingo > > Does the memory corruption actually matter, though? I presume the issue came up by auditing & reviewing host writes to a barebones non-legacy VM? I'd argue that we shouldn't be writing random values into random legacy addresses, especially if that special address doesnt seem to be covered by any modern spec? Basic defensive coding practices and such. Thanks, Ingo
On February 25, 2025 12:39:59 PM PST, Ingo Molnar <mingo@kernel.org> wrote: > >* H. Peter Anvin <hpa@zytor.com> wrote: > >> On February 25, 2025 12:25:12 PM PST, Ingo Molnar <mingo@kernel.org> wrote: >> > >> >* Roman Kisel <romank@linux.microsoft.com> wrote: >> > >> >> native_machine_emergency_restart() writes unconditionally >> >> to the physical address of 0x472 to pass the warm reboot >> >> flags to BIOS. The BIOS reads this on booting to bypass memory >> >> test and do the warm boot. On the non-BIOS systems, other >> >> means have to be employed, and this write is a memory corruption. >> >> >> >> Fix that by moving the offending write into the case where >> >> the machine is rebooted via BIOS. >> >> >> >> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> >> >> --- >> >> arch/x86/kernel/reboot.c | 4 ++-- >> >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> >> >> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c >> >> index 615922838c51..6eec8653493f 100644 >> >> --- a/arch/x86/kernel/reboot.c >> >> +++ b/arch/x86/kernel/reboot.c >> >> @@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void) >> >> >> >> tboot_shutdown(TB_SHUTDOWN_REBOOT); >> >> >> >> - /* Tell the BIOS if we want cold or warm reboot */ >> >> + /* Tell the firmware if we want cold or warm reboot */ >> >> mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0; >> >> - *((unsigned short *)__va(0x472)) = mode; >> >> >> >> /* >> >> * If an EFI capsule has been registered with the firmware then >> >> @@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void) >> >> break; >> >> >> >> case BOOT_BIOS: >> >> + *((unsigned short *)__va(0x472)) = mode; >> >> machine_real_restart(MRR_BIOS); >> > >> >If the value of 0x472 is only meaningful in the legacy 'BOOT_BIOS' >> >case, then at minimum the whole block should be moved to that case, not >> >just the setting. >> > >> >Thanks, >> > >> > Ingo >> >> Does the memory corruption actually matter, though? > >I presume the issue came up by auditing & reviewing host writes to a >barebones non-legacy VM? > >I'd argue that we shouldn't be writing random values into random legacy >addresses, especially if that special address doesnt seem to be covered >by any modern spec? Basic defensive coding practices and such. > >Thanks, > > Ingo The ultimate answer to the question is probably WWWD.
On January 9, 2025 12:43:52 PM PST, Roman Kisel <romank@linux.microsoft.com> wrote: >native_machine_emergency_restart() writes unconditionally >to the physical address of 0x472 to pass the warm reboot >flags to BIOS. The BIOS reads this on booting to bypass memory >test and do the warm boot. On the non-BIOS systems, other >means have to be employed, and this write is a memory corruption. > >Fix that by moving the offending write into the case where >the machine is rebooted via BIOS. > >Signed-off-by: Roman Kisel <romank@linux.microsoft.com> >--- > arch/x86/kernel/reboot.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c >index 615922838c51..6eec8653493f 100644 >--- a/arch/x86/kernel/reboot.c >+++ b/arch/x86/kernel/reboot.c >@@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void) > > tboot_shutdown(TB_SHUTDOWN_REBOOT); > >- /* Tell the BIOS if we want cold or warm reboot */ >+ /* Tell the firmware if we want cold or warm reboot */ > mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0; >- *((unsigned short *)__va(0x472)) = mode; > > /* > * If an EFI capsule has been registered with the firmware then >@@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void) > break; > > case BOOT_BIOS: >+ *((unsigned short *)__va(0x472)) = mode; > machine_real_restart(MRR_BIOS); > > /* We're probably dead after this, but... */ > >base-commit: eea6e4b4dfb8859446177c32961c96726d0117be I should say: this patch is unambiguously *wrong*. It conflates the invocation mechanism with the desired post state, and they are not coupled. Calling the BIOS reboot entry point is not the normal way to reboot even on BIOS systems.
On 1/9/2025 7:25 PM, H. Peter Anvin wrote: > On January 9, 2025 12:43:52 PM PST, Roman Kisel <romank@linux.microsoft.com> wrote: [...] > > I should say: this patch is unambiguously *wrong*. It conflates the invocation mechanism with the desired post state, and they are not coupled. Calling the BIOS reboot entry point is not the normal way to reboot even on BIOS systems. Thank you very much for taking time to review and for the chance to learn from you! Would you like me to propose another patch where the line of *((unsigned short *)__va(0x472)) = mode; receives a comment for posterity why that it is okay to write at that address? Perhaps, /* * The common practice for the firmware is to report 0x0..0x1000 * as reserved in the RAM map. The value written to the address of * 0x472 may be used by the firmware to perform the cold or the warm * boot. */ might be a good addition to the code. I've looked at the UEFI+ACPI spec and couldn't find any mentions of that magical address, and the code writes at that address even in the UEFI case. If you have time to recommend normative documents where that might be explained, I'd greatly appreciate that! -- Thank you, Roman
On January 9, 2025 12:43:52 PM PST, Roman Kisel <romank@linux.microsoft.com> wrote: >native_machine_emergency_restart() writes unconditionally >to the physical address of 0x472 to pass the warm reboot >flags to BIOS. The BIOS reads this on booting to bypass memory >test and do the warm boot. On the non-BIOS systems, other >means have to be employed, and this write is a memory corruption. > >Fix that by moving the offending write into the case where >the machine is rebooted via BIOS. > >Signed-off-by: Roman Kisel <romank@linux.microsoft.com> >--- > arch/x86/kernel/reboot.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c >index 615922838c51..6eec8653493f 100644 >--- a/arch/x86/kernel/reboot.c >+++ b/arch/x86/kernel/reboot.c >@@ -637,9 +637,8 @@ static void native_machine_emergency_restart(void) > > tboot_shutdown(TB_SHUTDOWN_REBOOT); > >- /* Tell the BIOS if we want cold or warm reboot */ >+ /* Tell the firmware if we want cold or warm reboot */ > mode = reboot_mode == REBOOT_WARM ? 0x1234 : 0; >- *((unsigned short *)__va(0x472)) = mode; > > /* > * If an EFI capsule has been registered with the firmware then >@@ -681,6 +680,7 @@ static void native_machine_emergency_restart(void) > break; > > case BOOT_BIOS: >+ *((unsigned short *)__va(0x472)) = mode; > machine_real_restart(MRR_BIOS); > > /* We're probably dead after this, but... */ > >base-commit: eea6e4b4dfb8859446177c32961c96726d0117be Which system have you seen where this "corrupts" memory?
On Thu, 09 Jan 2025 19:23:31 -0800, H. Peter Anvin <hpa@zytor.com> wrote: [...] >Which system have you seen where this "corrupts" memory? These are X86_64 systems that boot off of confguration in DeviceTree, and neither ACPI, BIOS, nor UEFI are present. The processors start running in the long mode right off the bat, and the physical memory below 4GiB is non-RAM or reserved. The systems are virtual machines. All that makes `*((unsigned short *)__va(0x472)) = mode;` constitue a stray write and effectively corrupt contents at the address of 0x472 for these virtual systems. On the physical systems I have had access to, 0x0..0x1000 seems to be reported as reserved in E820 or UEFI memory map, and that's what the CPU might need in the 16-bit mode for the brief times it might be used. Perhaps, `native_machine_emergency_restart` being tailored to the quirks of the physical hardware cannot absorb some condition around `*((unsigned short *)__va(0x472)) = mode;` to make sure it is fine to write at that address? In the light of that, the callback `machine_ops.emergency_restart` looks as a safer option.
© 2016 - 2026 Red Hat, Inc.