arch/x86/include/asm/kexec.h | 7 ++ arch/x86/kernel/early_printk.c | 9 ++ arch/x86/kernel/machine_kexec_64.c | 50 ++++++-- arch/x86/kernel/relocate_kernel_64.S | 254 +++++++++++++++++++++++++++++++++++++++- 4 files changed, 308 insertions(+), 12 deletions(-)
Debugging kexec failures is painful, as anything going wrong in execution
of the critical relocate_kernel() function tends to just lead to a triple
fault. Thus leading to *weeks* of my life that I won't get back. Having
hacked something up for my own use, I figured I should share it...
Add a trivial exception handler in the relocate_kernel environment which
outputs to the early_printk serial console if configured. Currently only
8250-compatible serial ports are supported, but that could be extended.
I had to hack up QEMU support for a PCI serial port which matches what
the existing early_printk code can drive, and the *real* 8250_pci driver
doesn't seem to cope with that setup at all, but whatever... the kexec
code now drives the same 32-bit stride which is all that earlyprintk
supports. We can always add more later, if anyone cares.
Someone who cares might want to bring the i386 version into line with
this, although the lack of rip-based addressing makes all the PIC code a
bit harder.
David Woodhouse (8):
x86/kexec: Debugging support: load a GDT
x86/kexec: Debugging support: Load an IDT and basic exception entry points
x86/kexec: Debugging support: Dump registers on exception
x86/kexec: Add 8250 serial port output
x86/kexec: Add 8250 MMIO serial port output
x86/kexec: Invalidate GDT/IDT from relocate_kernel() instead of earlier
[DO NOT MERGE] x86/kexec: Add int3 in kexec path for testing
[DO NOT MERGE] x86/kexec: Add CFI type information to relocate_kernel()
arch/x86/include/asm/kexec.h | 7 ++
arch/x86/kernel/early_printk.c | 9 ++
arch/x86/kernel/machine_kexec_64.c | 50 ++++++--
arch/x86/kernel/relocate_kernel_64.S | 254 +++++++++++++++++++++++++++++++++++++++-
4 files changed, 308 insertions(+), 12 deletions(-)
v7:
• Drop CONFIG_KEXEC_DEBUG and make it all unconditional in order to
"throw regressions back into the face of whoever manages to introduce
them" (Ingo, https://lore.kernel.org/kexec/Z7rwA-qVauX7lY8G@gmail.com/)
• Move IDT invalidation into relocate_kernel() itself.
v6: https://lore.kernel.org/kexec/20250115191423.587774-1-dwmw2@infradead.org/
• Rebase onto already-merged fixes in tip/x86/boot.
• Move CONFIG_KEXEC_DEBUG to generic kernel/Kconfig.kexec as Bartosz is
working on an Arm64 version.
v5: https://lore.kernel.org/kexec/20241205153343.3275139-1-dwmw2@infradead.org/T/
• Drop [RFC].
• Drop _PAGE_NOPTISHADOW fix, which Ingo already took into tip/x86/urgent.
• Add memory-mapped serial port support (32-bit MMIO spacing only).
v4 (RFC): https://lore.kernel.org/kexec/20241127190343.44916-1-dwmw2@infradead.org/T/
• Add _PAGE_NOPTISHADOW fix for the identmap code.
• Drop explicit map of control page, which was masking the identmap bug.
v3 (RFC): https://lore.kernel.org/kexec/20241125100815.2512-1-dwmw2@infradead.org/T/
• Add CONFIG_KEXEC_DEBUG option and use earlyprintk config.
• Allocate PGD separately from control page.
• Explicitly map control page into identmap.
V2 (RFC): https://lore.kernel.org/kexec/20241122224715.171751-1-dwmw2@infradead.org/T/
• Introduce linker script, start to clean up data access.
V1 (RFC): https://lore.kernel.org/kexec/20241103054019.3795299-1-dwmw2@infradead.org/T/
• Initial proof-of-concept hacks.
* David Woodhouse <dwmw2@infradead.org> wrote: > Debugging kexec failures is painful, as anything going wrong in execution > of the critical relocate_kernel() function tends to just lead to a triple > fault. Thus leading to *weeks* of my life that I won't get back. Having > hacked something up for my own use, I figured I should share it... > > Add a trivial exception handler in the relocate_kernel environment which > outputs to the early_printk serial console if configured. Currently only > 8250-compatible serial ports are supported, but that could be extended. > > I had to hack up QEMU support for a PCI serial port which matches what > the existing early_printk code can drive, and the *real* 8250_pci driver > doesn't seem to cope with that setup at all, but whatever... the kexec > code now drives the same 32-bit stride which is all that earlyprintk > supports. We can always add more later, if anyone cares. > > Someone who cares might want to bring the i386 version into line with > this, although the lack of rip-based addressing makes all the PIC code a > bit harder. > > David Woodhouse (8): > x86/kexec: Debugging support: load a GDT > x86/kexec: Debugging support: Load an IDT and basic exception entry points > x86/kexec: Debugging support: Dump registers on exception > x86/kexec: Add 8250 serial port output > x86/kexec: Add 8250 MMIO serial port output > x86/kexec: Invalidate GDT/IDT from relocate_kernel() instead of earlier > [DO NOT MERGE] x86/kexec: Add int3 in kexec path for testing > [DO NOT MERGE] x86/kexec: Add CFI type information to relocate_kernel() > > arch/x86/include/asm/kexec.h | 7 ++ > arch/x86/kernel/early_printk.c | 9 ++ > arch/x86/kernel/machine_kexec_64.c | 50 ++++++-- > arch/x86/kernel/relocate_kernel_64.S | 254 +++++++++++++++++++++++++++++++++++++++- > 4 files changed, 308 insertions(+), 12 deletions(-) I applied the first 3 patches to tip:x86/boot for phased-risk-reduction reasons, and because I had some questions and suggestions starting at patch #4. Thanks, Ingo
* Ingo Molnar <mingo@kernel.org> wrote: > I applied the first 3 patches to tip:x86/boot for > phased-risk-reduction reasons, and because I had some questions and > suggestions starting at patch #4. So there's a new objtool build warning from the new exc_handler code: vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call That's with a x86-64 defconfig + KVM enablement. Thanks, Ingo
On Thu, 2025-03-13 at 11:54 +0100, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@kernel.org> wrote:
>
> > I applied the first 3 patches to tip:x86/boot for
> > phased-risk-reduction reasons, and because I had some questions and
> > suggestions starting at patch #4.
>
> So there's a new objtool build warning from the new exc_handler code:
>
> vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call
>
> That's with a x86-64 defconfig + KVM enablement.
Reproduced that by going back to x86-64 defconfig.
vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call
SYM_CODE_START_LOCAL_NOALIGN(exc_handler)
pushq %rax
2b6: 50 push %rax
pushq %rbx
2b7: 53 push %rbx
pushq %rcx
2b8: 51 push %rcx
pushq %rdx
2b9: 52 push %rdx
pushq %rsi
2ba: 56 push %rsi
/* Set up %rdx/%rsi for debug output */
pr_setup
2bb: 48 8d 35 6e ff ff ff lea -0x92(%rip),%rsi # 230 <pr_char>
/* rip and exception info */
print_reg 'E', 'x', 'c', ':', 0x28(%rsp)
2c2: b0 45 mov $0x45,%al
2c4: ff d6 call *%rsi
So it's the 'call *$rsi' instruction at 0x2c4, but that's annotated
with ANNOTATE_RETPOLINE_SAFE:
.macro print_reg a, b, c, d, r
movb $\a, %al
ANNOTATE_RETPOLINE_SAFE
call *%rsi
So what's wrong with that? What *more* do I have to tell objtool to
make it shut up and go away?
Am I missing some documentation which would tell me what it's actually
unhappy about? Because "early indirect call" doesn't enlighten me very
much, and reading tools/objtool/check.c doesn't either.
On Thu, 2025-03-13 at 19:58 +0000, David Woodhouse wrote:
>
> Reproduced that by going back to x86-64 defconfig.
Turns out the unret check doesn't even run unless CONFIG_DEBUG_ENTRY is
enabled (which enables CONFIG_NOINSTR_VALIDATION and thus runs objtool
on vmlinux.o). Which is why I didn't see it.
> vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call
With Peter's help (thanks), this is fixed by adding VALIDATE_UNRET_END.
I'll squash this into the next posting:
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -481,6 +481,9 @@ SYM_CODE_START_NOALIGN(kexec_debug_exc_vectors)
SYM_CODE_END(kexec_debug_exc_vectors)
SYM_CODE_START_LOCAL_NOALIGN(exc_handler)
+ /* No need for ret mitigations during kexec */
+ VALIDATE_UNRET_END
+
pushq %rax
pushq %rbx
pushq %rcx
* David Woodhouse <dwmw2@infradead.org> wrote: > On Thu, 2025-03-13 at 19:58 +0000, David Woodhouse wrote: > > > > Reproduced that by going back to x86-64 defconfig. > > Turns out the unret check doesn't even run unless CONFIG_DEBUG_ENTRY is > enabled (which enables CONFIG_NOINSTR_VALIDATION and thus runs objtool > on vmlinux.o). Which is why I didn't see it. > > > vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call > > With Peter's help (thanks), this is fixed by adding VALIDATE_UNRET_END. > I'll squash this into the next posting: > > --- a/arch/x86/kernel/relocate_kernel_64.S > +++ b/arch/x86/kernel/relocate_kernel_64.S > @@ -481,6 +481,9 @@ SYM_CODE_START_NOALIGN(kexec_debug_exc_vectors) > SYM_CODE_END(kexec_debug_exc_vectors) > > SYM_CODE_START_LOCAL_NOALIGN(exc_handler) > + /* No need for ret mitigations during kexec */ > + VALIDATE_UNRET_END > + > pushq %rax > pushq %rbx > pushq %rcx Great! I've applied patch #1 back to tip:x86/boot. I've skipped the -v7 versions of patch #2 and #3 because AFAICS you've changed exc_handler already, so a backmerge of this annotation fix wouldn't be enough. Thanks, Ingo
On Fri, 2025-03-14 at 11:21 +0100, Ingo Molnar wrote: > > I've applied patch #1 back to tip:x86/boot. > > I've skipped the -v7 versions of patch #2 and #3 because AFAICS you've > changed exc_handler already, so a backmerge of this annotation fix > wouldn't be enough. I haven't (yet) changed exc_handler, but I did post that annotation fix as a patch against patch 3 in the series, when actually it should be applied as as fixup to patch 2. I *am* cleaning up exc_handler in patch 3 though, for a more 'warm and fuzzy' experience on int3 rather than dumping the full register set. So I'll repost it from patch 2 against the new tip/x86/boot, including the annotation fix in the right place. Thanks. I haven't yet decided what to do about the unconditional int3. Slightly tempted to suggest we put it in #ifdef CONFIG_DEBUG_ENTRY now I've been reminded that option exists? But we should *also* be doing better testing of kexec-jump, with something like that test case I posted, and adding an int3 into that would be trivial too.
© 2016 - 2025 Red Hat, Inc.