[PATCH] x86/kexec: Push kjump return address even for non-kjump kexec

David Woodhouse posted 1 patch 2 months, 1 week ago
There is a newer version of this series
arch/x86/kernel/relocate_kernel_64.S | 2 ++
1 file changed, 2 insertions(+)
Posted by David Woodhouse 2 months, 1 week ago
From: David Woodhouse <dwmw@amazon.co.uk>

The version of purgatory code shipped by kexec-tools attempts to look
above the top of its stack to find a return address for a kjump, even
in a non-kjump kexec. Since commit 2cacf7f23a02 ("x86/kexec: Fix stack
and handling of re-entry point for ::preserve_context") the page above
the stack might not be there, leading to a fault (which is at least
now caught by the exception-handling code in kexec).

That commit fixed things for the actual kjump path, but no longer
"gratuitously" pushes the unused return address to the stack in the
non-kjump path. Put that *back* in the non-kjump path, to prevent
purgatory from crashing when trying to access it.

Reported-by: Rohan Kakulawaram <rohanka@google.com>
Tested-by: Rohan Kakulawaram <rohanka@google.com>
Fixes: 2cacf7f23a02 ("x86/kexec: Fix stack and handling of re-entry point for ::preserve_context")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 arch/x86/kernel/relocate_kernel_64.S | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 4ffba68dc57b..301c586282aa 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -136,6 +136,8 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * %r13 original CR4 when relocate_kernel() was invoked
 	 */
 
+	/* set return address to 0 if not preserving context */
+	pushq	$0
 	/* store the start address on the stack */
 	pushq   %rdx
 
-- 
2.43.0
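
The effect of the two-line change can be illustrated with a small C model of the hand-off stack. This is only a sketch under stated assumptions, not kernel or kexec-tools code: `model_stack`, `peek` and `enter_purgatory` are hypothetical names, the four-slot array stands in for the stack page, and "running off the top of the array" stands in for the fault described in the commit message. It models the non-kjump path only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a tiny stack that grows downwards, as on x86-64. */
#define STACK_SLOTS 4

struct model_stack {
	uint64_t slot[STACK_SLOTS];
	size_t sp;	/* index of the current top; STACK_SLOTS means empty */
};

static void push(struct model_stack *s, uint64_t v)
{
	assert(s->sp > 0);	/* the model does not handle overflow */
	s->slot[--s->sp] = v;
}

static uint64_t pop(struct model_stack *s)
{
	assert(s->sp < STACK_SLOTS);
	return s->slot[s->sp++];
}

/*
 * Read the slot at the stack pointer without popping it.
 * Returns 1 and stores the value if the slot lies inside the stack,
 * or 0 if the read would land above the top (the "fault" case).
 */
static int peek(const struct model_stack *s, uint64_t *out)
{
	if (s->sp >= STACK_SLOTS)
		return 0;
	*out = s->slot[s->sp];
	return 1;
}

/*
 * Model of the non-kjump hand-off. with_fix selects whether the
 * dummy return address is pushed before the start address.
 */
static int enter_purgatory(int with_fix, uint64_t start, uint64_t *seen)
{
	struct model_stack s = { .sp = STACK_SLOTS };

	if (with_fix)
		push(&s, 0);	/* corresponds to: pushq $0 */
	push(&s, start);	/* corresponds to: pushq %rdx */
	(void)pop(&s);		/* the later ret consumes the start address */

	return peek(&s, seen);	/* purgatory looks at the next slot up */
}
```

In this model, `enter_purgatory(0, start, &seen)` returns 0 because the slot purgatory wants is above the top of the stack, while `enter_purgatory(1, start, &seen)` returns 1 and leaves `seen` equal to 0.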


Re: [PATCH] x86/kexec: Push kjump return address even for non-kjump kexec
Posted by Dave Hansen 1 month, 2 weeks ago
On 4/2/26 03:34, David Woodhouse wrote:
> The version of purgatory code shipped by kexec-tools attempts to look
> above the top of its stack to find a return address for a kjump

This is a bug in kexec-tools, right? Has kexec-tools been fixed?

The purgatory code is injected by userspace, so are you kinda asserting
here that this change in the kernel stack "breaks userspace"?

I guess one little push isn't the end of the world. But, can we please
comment it to this effect:

	/*
	 * Work around a kexec-tools' <version here> purgatory bug that
	 * accesses the stack one long out of bounds. Push a dummy value
	 * to make the access harmless and avoid a fault.
	 */

Without that, we'll be scratching our heads for the next decade about
what this 0 on the stack does. The comment you suggested tells us what
it is doing, but not why.

It all feels kinda icky though. Our stack is an ABI?!?!?!

Re: [PATCH] x86/kexec: Push kjump return address even for non-kjump kexec
Posted by David Woodhouse 1 month, 2 weeks ago
On Tue, 2026-04-28 at 07:22 -0700, Dave Hansen wrote:
> On 4/2/26 03:34, David Woodhouse wrote:
> > The version of purgatory code shipped by kexec-tools attempts to look
> > above the top of its stack to find a return address for a kjump
> 
> This is a bug in kexec-tools, right? Has kexec-tools been fixed?

There isn't any other way to tell the difference between kjump and a
normal kexec, is there?

> The purgatory code is injected by userspace, so are you kinda asserting
> here that this change in the kernel stack "breaks userspace"?

Essentially, yes. Rohan found that kexec broke on a kernel update,
bisected it to my commit, and worked out why.

This isn't just the "kernel stack". This is the stack that the kernel
sets up specifically for the kexec target.

> I guess one little push isn't the end of the world. But, can we please
> comment it to this effect:
> 
> 	/*
> 	 * Work around a kexec-tools' <version here> purgatory bug that
> 	 * accesses the stack one long out of bounds. Push a dummy value
> 	 * to make the access harmless and avoid a fault.
> 	 */
> 

Sure. Will phrase it thus in v2:

+       /*
+        * Set return address to 0 if not preserving context. The purgatory
+        * shipped in kexec-tools will unconditionally look for the return
+        * address on the stack and set a kexec_jump_back_entry= command
+        * line option if it's non-zero. There's no other way that it can
+        * tell a preserve-context (kjump) kexec from a normal one.
+        */


> Without that, we'll be scratching our heads for the next decade about
> what this 0 on the stack does. The comment you suggested tells us what
> it is doing, but not why.
> 
> It all feels kinda icky though. Our stack is an ABI?!?!?!

Well, it's kexec. Where else are we going to put things? Wanna declare
that we're going to use %rbx instead? Or the BIOS Data Area in segment
0x40? :)
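
The purgatory behaviour described in the proposed v2 comment (append a kexec_jump_back_entry= option only when the value found on the stack is non-zero) might be sketched in C as follows. This is an illustration, not the kexec-tools source: the helper name `append_jump_back_entry`, its return convention, and the exact hexadecimal formatting of the option value are all assumptions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not the kexec-tools implementation.
 *
 * ret_addr is the value read from the stack slot above the start
 * address. Zero is treated as "normal kexec"; anything else is
 * treated as a kjump re-entry point and appended to the command line.
 *
 * Returns 1 if the option was appended, 0 if nothing was done,
 * and -1 if the buffer was too small (cmdline is left unchanged).
 */
static int append_jump_back_entry(char *cmdline, size_t size,
				  uint64_t ret_addr)
{
	size_t len = strlen(cmdline);
	int n;

	assert(len < size);	/* cmdline must be NUL-terminated in size bytes */

	if (ret_addr == 0)
		return 0;	/* not preserving context: leave it alone */

	/* The 0x%016 format is an assumption made for this sketch. */
	n = snprintf(cmdline + len, size - len,
		     " kexec_jump_back_entry=0x%016" PRIx64, ret_addr);
	if (n < 0 || (size_t)n >= size - len) {
		cmdline[len] = '\0';	/* undo a truncated append */
		return -1;
	}
	return 1;
}
```

With the dummy `pushq $0` in place, a helper of this shape would see a zero and leave the command line untouched, which is how purgatory tells the two cases apart.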