[PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel

David Woodhouse posted 8 patches 9 months, 1 week ago
There is a newer version of this series
arch/x86/include/asm/kexec.h         |   7 ++
arch/x86/kernel/early_printk.c       |   9 ++
arch/x86/kernel/machine_kexec_64.c   |  50 ++++++--
arch/x86/kernel/relocate_kernel_64.S | 254 +++++++++++++++++++++++++++++++++++++++-
4 files changed, 308 insertions(+), 12 deletions(-)
[PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by David Woodhouse 9 months, 1 week ago
Debugging kexec failures is painful, as anything going wrong in execution
of the critical relocate_kernel() function tends to just lead to a triple
fault. Thus leading to *weeks* of my life that I won't get back. Having
hacked something up for my own use, I figured I should share it...

Add a trivial exception handler in the relocate_kernel environment which 
outputs to the early_printk serial console if configured. Currently only 
8250-compatible serial ports are supported, but that could be extended.

I had to hack up QEMU support for a PCI serial port which matches what
the existing early_printk code can drive, and the *real* 8250_pci driver
doesn't seem to cope with that setup at all, but whatever... the kexec
code now drives the same 32-bit stride which is all that earlyprintk
supports. We can always add more later, if anyone cares.

Someone who cares might want to bring the i386 version into line with
this, although the lack of rip-based addressing makes all the PIC code a
bit harder.

David Woodhouse (8):
      x86/kexec: Debugging support: load a GDT
      x86/kexec: Debugging support: Load an IDT and basic exception entry points
      x86/kexec: Debugging support: Dump registers on exception
      x86/kexec: Add 8250 serial port output
      x86/kexec: Add 8250 MMIO serial port output
      x86/kexec: Invalidate GDT/IDT from relocate_kernel() instead of earlier
      [DO NOT MERGE] x86/kexec: Add int3 in kexec path for testing
      [DO NOT MERGE] x86/kexec: Add CFI type information to relocate_kernel()

 arch/x86/include/asm/kexec.h         |   7 ++
 arch/x86/kernel/early_printk.c       |   9 ++
 arch/x86/kernel/machine_kexec_64.c   |  50 ++++++--
 arch/x86/kernel/relocate_kernel_64.S | 254 +++++++++++++++++++++++++++++++++++++++-
 4 files changed, 308 insertions(+), 12 deletions(-)


v7:
 • Drop CONFIG_KEXEC_DEBUG and make it all unconditional in order to
   "throw regressions back into the face of whoever manages to introduce
   them" (Ingo, https://lore.kernel.org/kexec/Z7rwA-qVauX7lY8G@gmail.com/)
 • Move IDT invalidation into relocate_kernel() itself.

v6: https://lore.kernel.org/kexec/20250115191423.587774-1-dwmw2@infradead.org/
 • Rebase onto already-merged fixes in tip/x86/boot.
 • Move CONFIG_KEXEC_DEBUG to generic kernel/Kconfig.kexec as Bartosz is
   working on an Arm64 version.

v5: https://lore.kernel.org/kexec/20241205153343.3275139-1-dwmw2@infradead.org/T/
 • Drop [RFC].
 • Drop _PAGE_NOPTISHADOW fix, which Ingo already took into tip/x86/urgent.
 • Add memory-mapped serial port support (32-bit MMIO spacing only).

v4 (RFC): https://lore.kernel.org/kexec/20241127190343.44916-1-dwmw2@infradead.org/T/
 • Add _PAGE_NOPTISHADOW fix for the identmap code.
 • Drop explicit map of control page, which was masking the identmap bug.

v3 (RFC): https://lore.kernel.org/kexec/20241125100815.2512-1-dwmw2@infradead.org/T/
 • Add CONFIG_KEXEC_DEBUG option and use earlyprintk config.
 • Allocate PGD separately from control page.
 • Explicitly map control page into identmap.

V2 (RFC): https://lore.kernel.org/kexec/20241122224715.171751-1-dwmw2@infradead.org/T/
 • Introduce linker script, start to clean up data access.

V1 (RFC): https://lore.kernel.org/kexec/20241103054019.3795299-1-dwmw2@infradead.org/T/
 • Initial proof-of-concept hacks.

Re: [PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by Ingo Molnar 9 months, 1 week ago
* David Woodhouse <dwmw2@infradead.org> wrote:

> Debugging kexec failures is painful, as anything going wrong in execution
> of the critical relocate_kernel() function tends to just lead to a triple
> fault. Thus leading to *weeks* of my life that I won't get back. Having
> hacked something up for my own use, I figured I should share it...
> 
> Add a trivial exception handler in the relocate_kernel environment which 
> outputs to the early_printk serial console if configured. Currently only 
> 8250-compatible serial ports are supported, but that could be extended.
> 
> I had to hack up QEMU support for a PCI serial port which matches what
> the existing early_printk code can drive, and the *real* 8250_pci driver
> doesn't seem to cope with that setup at all, but whatever... the kexec
> code now drives the same 32-bit stride which is all that earlyprintk
> supports. We can always add more later, if anyone cares.
> 
> Someone who cares might want to bring the i386 version into line with
> this, although the lack of rip-based addressing makes all the PIC code a
> bit harder.
> 
> David Woodhouse (8):
>       x86/kexec: Debugging support: load a GDT
>       x86/kexec: Debugging support: Load an IDT and basic exception entry points
>       x86/kexec: Debugging support: Dump registers on exception
>       x86/kexec: Add 8250 serial port output
>       x86/kexec: Add 8250 MMIO serial port output
>       x86/kexec: Invalidate GDT/IDT from relocate_kernel() instead of earlier
>       [DO NOT MERGE] x86/kexec: Add int3 in kexec path for testing
>       [DO NOT MERGE] x86/kexec: Add CFI type information to relocate_kernel()
> 
>  arch/x86/include/asm/kexec.h         |   7 ++
>  arch/x86/kernel/early_printk.c       |   9 ++
>  arch/x86/kernel/machine_kexec_64.c   |  50 ++++++--
>  arch/x86/kernel/relocate_kernel_64.S | 254 +++++++++++++++++++++++++++++++++++++++-
>  4 files changed, 308 insertions(+), 12 deletions(-)

I applied the first 3 patches to tip:x86/boot for phased-risk-reduction 
reasons, and because I had some questions and suggestions starting at 
patch #4.

Thanks,

	Ingo
Re: [PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by Ingo Molnar 9 months, 1 week ago
* Ingo Molnar <mingo@kernel.org> wrote:

> I applied the first 3 patches to tip:x86/boot for 
> phased-risk-reduction reasons, and because I had some questions and 
> suggestions starting at patch #4.

So there's a new objtool build warning from the new exc_handler code:

 vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call

That's with a x86-64 defconfig + KVM enablement.

Thanks,

	Ingo
Re: [PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by David Woodhouse 9 months, 1 week ago
On Thu, 2025-03-13 at 11:54 +0100, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > I applied the first 3 patches to tip:x86/boot for 
> > phased-risk-reduction reasons, and because I had some questions and
> > suggestions starting at patch #4.
> 
> So there's a new objtool build warning from the new exc_handler code:
> 
>  vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call
> 
> That's with a x86-64 defconfig + KVM enablement.

Reproduced that by going back to x86-64 defconfig. 

vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call

SYM_CODE_START_LOCAL_NOALIGN(exc_handler)
        pushq   %rax
 2b6:   50                      push   %rax
        pushq   %rbx
 2b7:   53                      push   %rbx
        pushq   %rcx
 2b8:   51                      push   %rcx
        pushq   %rdx
 2b9:   52                      push   %rdx
        pushq   %rsi
 2ba:   56                      push   %rsi

        /* Set up %rdx/%rsi for debug output */
        pr_setup
 2bb:   48 8d 35 6e ff ff ff    lea    -0x92(%rip),%rsi        # 230 <pr_char>

        /* rip and exception info */
        print_reg 'E', 'x', 'c', ':', 0x28(%rsp)
 2c2:   b0 45                   mov    $0x45,%al
 2c4:   ff d6                   call   *%rsi



So it's the 'call *$rsi' instruction at 0x2c4, but that's annotated
with ANNOTATE_RETPOLINE_SAFE:

.macro print_reg a, b, c, d, r
	movb	$\a, %al
	ANNOTATE_RETPOLINE_SAFE
	call	*%rsi

So what's wrong with that? What *more* do I have to tell objtool to
make it shut up and go away?

Am I missing some documentation which would tell me what it's actually
unhappy about? Because "early indirect call" doesn't enlighten me very
much, and reading tools/objtool/check.c doesn't either.
Re: [PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by David Woodhouse 9 months, 1 week ago
On Thu, 2025-03-13 at 19:58 +0000, David Woodhouse wrote:
> 
> Reproduced that by going back to x86-64 defconfig. 

Turns out the unret check doesn't even run unless CONFIG_DEBUG_ENTRY is
enabled (which enables CONFIG_NOINSTR_VALIDATION and thus runs objtool
on vmlinux.o). Which is why I didn't see it.

> vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call

With Peter's help (thanks), this is fixed by adding VALIDATE_UNRET_END.
I'll squash this into the next posting:

--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -481,6 +481,9 @@ SYM_CODE_START_NOALIGN(kexec_debug_exc_vectors)
 SYM_CODE_END(kexec_debug_exc_vectors)
 
 SYM_CODE_START_LOCAL_NOALIGN(exc_handler)
+       /* No need for ret mitigations during kexec */
+       VALIDATE_UNRET_END
+
        pushq   %rax
        pushq   %rbx
        pushq   %rcx



Re: [PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by Ingo Molnar 9 months, 1 week ago
* David Woodhouse <dwmw2@infradead.org> wrote:

> On Thu, 2025-03-13 at 19:58 +0000, David Woodhouse wrote:
> > 
> > Reproduced that by going back to x86-64 defconfig. 
> 
> Turns out the unret check doesn't even run unless CONFIG_DEBUG_ENTRY is
> enabled (which enables CONFIG_NOINSTR_VALIDATION and thus runs objtool
> on vmlinux.o). Which is why I didn't see it.
> 
> > vmlinux.o: warning: objtool: exc_handler+0xe: early indirect call
> 
> With Peter's help (thanks), this is fixed by adding VALIDATE_UNRET_END.
> I'll squash this into the next posting:
> 
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -481,6 +481,9 @@ SYM_CODE_START_NOALIGN(kexec_debug_exc_vectors)
>  SYM_CODE_END(kexec_debug_exc_vectors)
>  
>  SYM_CODE_START_LOCAL_NOALIGN(exc_handler)
> +       /* No need for ret mitigations during kexec */
> +       VALIDATE_UNRET_END
> +
>         pushq   %rax
>         pushq   %rbx
>         pushq   %rcx

Great!

I've applied patch #1 back to tip:x86/boot.

I've skipped the -v7 versions of patch #2 and #3 because AFAICS you've 
changed exc_handler already, so a backmerge of this annotation fix 
wouldn't be enough.

Thanks,

	Ingo
Re: [PATCH v7 0/8] x86/kexec: Add exception handling for relocate_kernel
Posted by David Woodhouse 9 months, 1 week ago
On Fri, 2025-03-14 at 11:21 +0100, Ingo Molnar wrote:
> 
> I've applied patch #1 back to tip:x86/boot.
> 
> I've skipped the -v7 versions of patch #2 and #3 because AFAICS you've 
> changed exc_handler already, so a backmerge of this annotation fix 
> wouldn't be enough.

I haven't (yet) changed exc_handler, but I did post that annotation fix
as a patch against patch 3 in the series, when actually it should be
applied as as fixup to patch 2.

I *am* cleaning up exc_handler in patch 3 though, for a more 'warm and
fuzzy' experience on int3 rather than dumping the full register set. So
I'll repost it from patch 2 against the new tip/x86/boot, including the
annotation fix in the right place.

Thanks.

I haven't yet decided what to do about the unconditional int3. Slightly
tempted to suggest we put it in #ifdef CONFIG_DEBUG_ENTRY now I've been
reminded that option exists? But we should *also* be doing better
testing of kexec-jump, with something like that test case I posted, and
adding an int3 into that would be trivial too.