From: David Woodhouse <dwmw@amazon.co.uk>
A previous commit added __nocfi to machine_kexec() because it makes an
indirect call to relocate_kernel() which lacked CFI type information,
and caused the system to crash.
Use SYM_TYPED_FUNC_START() to ensure that the type information is
present, and remove the __nocfi tag.
I still can't make objtool happy with this in both GCC and Clang builds
at the same time, so not yet for merging; only included in this series
to nerd-snipe the objtool maintainers.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
arch/x86/kernel/machine_kexec_64.c | 2 +-
arch/x86/kernel/relocate_kernel_64.S | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 7abc7aa0261b..84f59f18dcb6 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -380,7 +380,7 @@ void machine_kexec_cleanup(struct kimage *image)
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
*/
-void __nocfi machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
{
unsigned long reloc_start = (unsigned long)__relocate_kernel_start;
relocate_kernel_fn *relocate_kernel_ptr;
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 2b7fc59af373..999aca909803 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -5,6 +5,7 @@
*/
#include <linux/linkage.h>
+#include <linux/cfi_types.h>
#include <linux/stringify.h>
#include <asm/alternative.h>
#include <asm/page_types.h>
@@ -59,8 +60,9 @@ SYM_DATA_END(kexec_debug_idt)
.section .text..relocate_kernel,"ax";
.code64
-SYM_CODE_START_NOALIGN(relocate_kernel)
+SYM_TYPED_FUNC_START(relocate_kernel)
UNWIND_HINT_END_OF_STACK
+ UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
/*
* %rdi indirection_page
--
2.48.1
On Wed, Mar 12, 2025 at 02:34:20PM +0000, David Woodhouse wrote: > From: David Woodhouse <dwmw@amazon.co.uk> > > A previous commit added __nocfi to machine_kexec() because it makes an > indirect call to relocate_kernel() which lacked CFI type information, > and caused the system to crash. > > Use SYM_TYPED_FUNC_START() to ensure that the type information is > present, and remove the __nocfi tag. > > I still can't make objtool happy with this in both GCC and Clang builds > at the same time, so not yet for merging; only included in this series > to nerd-snipe the objtool maintainers. Again, I'd recommend just compiling the code in .data..relocate_kernel. Then objtool doesn't need to care. Otherwise it would help to know what warnings you're getting with this patch, and the corresponding .config. -- Josh
On Fri, 2025-03-14 at 09:07 -0700, Josh Poimboeuf wrote:
> On Wed, Mar 12, 2025 at 02:34:20PM +0000, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> >
> > A previous commit added __nocfi to machine_kexec() because it makes an
> > indirect call to relocate_kernel() which lacked CFI type information,
> > and caused the system to crash.
> >
> > Use SYM_TYPED_FUNC_START() to ensure that the type information is
> > present, and remove the __nocfi tag.
> >
> > I still can't make objtool happy with this in both GCC and Clang builds
> > at the same time, so not yet for merging; only included in this series
> > to nerd-snipe the objtool maintainers.
>
> Again, I'd recommend just compiling the code in .data..relocate_kernel.
> Then objtool doesn't need to care.
>
> Otherwise it would help to know what warnings you're getting with this
> patch, and the corresponding .config.
ISTR this version is OK with Clang and CONFIG_CFI_CLANG but with GCC I
get this:
vmlinux.o: warning: objtool: relocate_kernel+0x69: unsupported stack register modification
/* setup a new stack at the end of the physical control page */
lea PAGE_SIZE(%rsi), %rsp
79: 48 8d a6 00 10 00 00 lea 0x1000(%rsi),%rsp
Maybe the answer is to put the UNWIND_HINT_FUNC into #ifdef
CONFIG_CFI_CLANG but that seems wrong.
I'll have another look at putting it in the data section, and see if I
can remember why I didn't want to do that before (and if that's still
relevant now).
On Fri, Mar 14, 2025 at 05:23:15PM +0000, David Woodhouse wrote: > ISTR this version is OK with Clang and CONFIG_CFI_CLANG but with GCC I > get this: > > vmlinux.o: warning: objtool: relocate_kernel+0x69: unsupported stack register modification > > /* setup a new stack at the end of the physical control page */ > lea PAGE_SIZE(%rsi), %rsp > 79: 48 8d a6 00 10 00 00 lea 0x1000(%rsi),%rsp > > > Maybe the answer is to put the UNWIND_HINT_FUNC into #ifdef > CONFIG_CFI_CLANG but that seems wrong. The UNWIND_HINT_FUNC definitely looks wrong, why would Clang need it? > I'll have another look at putting it in the data section, and see if I > can remember why I didn't want to do that before (and if that's still > relevant now). IIRC, the reasons were the patched alternative, and also you wanted to disassemble (but note that's still possible with gdb). Here was a patch to make it work: https://lore.kernel.org/20241218212326.44qff3i5n6cxuu5d@jpoimboe -- Josh
On Fri, 2025-03-14 at 10:52 -0700, Josh Poimboeuf wrote:
>
> IIRC, the reasons were the patched alternative, and also you wanted to
> disassemble (but note that's still possible with gdb).
It's meaningful output from 'objdump -S' that I miss. But OK.
> Here was a patch to make it work:
>
> https://lore.kernel.org/20241218212326.44qff3i5n6cxuu5d@jpoimboe
I've reworked that and the CR4 filtering, and now it doesn't like me
using SYM_TYPED_FUNC_START() to add the CFI information for the
relocate_kernel() function.
LD vmlinux.o
vmlinux.o: warning: objtool: bad call to elf_init_reloc_text_sym() for
data symbol .data..relocate_kernel
...
SORTTAB vmlinux
incomplete ORC unwind tables in file: vmlinux
Failed to sort kernel tables
This happens when I build with (clang and) CONFIG_CFI_CLANG, with and
updated version of your patch above, and this on top of it. This is at
https://git.infradead.org/users/dwmw2/linux.git/shortlog/refs/heads/kexec-debug
From 0f08a44613764e9b38a1c3332812685b61c99c2e Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Mon, 16 Dec 2024 10:26:24 +0000
Subject: [PATCH] x86/kexec: Add CFI type information to relocate_kernel()
A previous commit added __nocfi to machine_kexec() because it makes an
indirect call to relocate_kernel() which lacked CFI type information,
and caused the system to crash.
Use SYM_TYPED_FUNC_START() to ensure that the type information is
present, and remove the __nocfi tag.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
arch/x86/kernel/machine_kexec_64.c | 2 +-
arch/x86/kernel/relocate_kernel_64.S | 3 ++-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 016862d2b544..e1f5fc858aee 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -396,7 +396,7 @@ void machine_kexec_cleanup(struct kimage *image)
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
*/
-void __nocfi machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
{
unsigned long reloc_start = (unsigned long)__relocate_kernel_start;
relocate_kernel_fn *relocate_kernel_ptr;
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 814af7fa6318..c859fbe507e8 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -5,6 +5,7 @@
*/
#include <linux/linkage.h>
+#include <linux/cfi_types.h>
#include <linux/stringify.h>
#include <asm/alternative.h>
#include <asm/page_types.h>
@@ -68,7 +69,7 @@ SYM_DATA_END(kexec_debug_idt)
* opinions about it.
*/
.code64
-SYM_CODE_START_NOALIGN(relocate_kernel)
+SYM_TYPED_FUNC_START(relocate_kernel)
/*
* %rdi indirection_page
* %rsi pa_control_page
--
2.48.1
On Mon, Mar 17, 2025 at 12:40:14PM +0000, David Woodhouse wrote: > On Fri, 2025-03-14 at 10:52 -0700, Josh Poimboeuf wrote: > > > > IIRC, the reasons were the patched alternative, and also you wanted to > > disassemble (but note that's still possible with gdb). > > It's meaningful output from 'objdump -S' that I miss. But OK. FYI, this works: objdump -Srw -j .data..relocate_kernel vmlinux.o -- Josh
On Mon, Mar 17, 2025 at 05:17:24PM -0700, Josh Poimboeuf wrote: > On Mon, Mar 17, 2025 at 12:40:14PM +0000, David Woodhouse wrote: > > On Fri, 2025-03-14 at 10:52 -0700, Josh Poimboeuf wrote: > > > > > > IIRC, the reasons were the patched alternative, and also you wanted to > > > disassemble (but note that's still possible with gdb). > > > > It's meaningful output from 'objdump -S' that I miss. But OK. > > FYI, this works: > > objdump -Srw -j .data..relocate_kernel vmlinux.o ... but I see now that it doesn't intersperse the source. Never mind... -- Josh
On Mon, 2025-03-17 at 17:24 -0700, Josh Poimboeuf wrote:
> On Mon, Mar 17, 2025 at 05:17:24PM -0700, Josh Poimboeuf wrote:
> > On Mon, Mar 17, 2025 at 12:40:14PM +0000, David Woodhouse wrote:
> > > On Fri, 2025-03-14 at 10:52 -0700, Josh Poimboeuf wrote:
> > > >
> > > > IIRC, the reasons were the patched alternative, and also you
> > > > wanted to
> > > > disassemble (but note that's still possible with gdb).
> > >
> > > It's meaningful output from 'objdump -S' that I miss. But OK.
> >
> > FYI, this works:
> >
> > objdump -Srw -j .data..relocate_kernel vmlinux.o
>
> ... but I see now that it doesn't intersperse the source. Never
> mind...
To be fair, the source is assembler. So it isn't *so* hard to work it
out.
But on the whole, I'm not sure the CFI check is worth it.
CFI checks that the caller and callee agree about the prototype of the
function being called. There are two main benefits of this:
• to protect against attacks where function pointers are substituted
for gadgets.
• to protect against genuine bugs, where the caller and the callee
disagree about the function arguments.
For the relocate_kernel() case I don't think we care much about the
first. Without a CFI prologue, no *other* code can be tricked into
calling relocate_kernel() — and besides, it's in the kernel's data
section and isn't executable anyway until the kexec code copies it to a
page that *is*. And the kexec code itself isn't just following an
arbitrary function pointer; it copies the code into the control page
and invokes it there, so it's unlikely to follow a bogus pointer
either.
It's the *second* benefit which is more relevant to us, ensuring that
the caller and the callee genuinely do agree about the prototype.
But using the SYM_TYPED_FUNC_START() macro doesn't give us that; the
CFI prologue of the asm function is just using the signature emitted by
the *caller* in the weak __kcfi_typeid_relocate_kernel symbol anyway,
so they're never going to disagree. And how *could* the assembler side
build a typeid signature from the asm anyway?
I suspect that just leaving it marked __nocfi is probably the easier
option, unless I can *hardcode* the function signature 0x19835999 in
the CFI prologue in relocate_kernel_64.S to protect against someone
accidentally changing the C-visible 'relocate_kernel_fn' typedef
without changing the corresponding assembler? But honestly, is that
likely?
Looks like this (with your objtool relaxation on top, as well as
removing load_segments() in machine_kexec_64.c if I want the check to
actually emit the warning correctly.).
I just don't think it's worth it...
From 1bbed9c611fd286b68e2c459320910c4fefd4a22 Mon Sep 17 00:00:00 2001
From: David Woodhouse <dwmw@amazon.co.uk>
Date: Mon, 16 Dec 2024 10:26:24 +0000
Subject: [PATCH] x86/kexec: Add CFI type information to relocate_kernel()
A previous commit added __nocfi to machine_kexec() because it makes an
indirect call to relocate_kernel() which lacked CFI type information,
and caused the system to crash.
Use SYM_TYPED_FUNC_START() to ensure that the type information is
present, and remove the __nocfi tag. And emit the *actual* type
signature (0x19835999) because by default, asm code uses the type
signature emitted by the *caller*, which is never going to differ!
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
arch/x86/kernel/machine_kexec_64.c | 2 +-
arch/x86/kernel/relocate_kernel_64.S | 19 ++++++++++++++++++-
2 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 016862d2b544..e1f5fc858aee 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -396,7 +396,7 @@ void machine_kexec_cleanup(struct kimage *image)
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
*/
-void __nocfi machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
{
unsigned long reloc_start = (unsigned long)__relocate_kernel_start;
relocate_kernel_fn *relocate_kernel_ptr;
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 814af7fa6318..428401e55d29 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -5,6 +5,7 @@
*/
#include <linux/linkage.h>
+#include <linux/cfi_types.h>
#include <linux/stringify.h>
#include <asm/alternative.h>
#include <asm/page_types.h>
@@ -67,8 +68,24 @@ SYM_DATA_END(kexec_debug_idt)
* a data section even of the object file, to prevent objtool from having
* opinions about it.
*/
+
+#ifdef CONFIG_CFI_CLANG
+/*
+ * The SYM_TYPED_FUNC_START macro emits a CFI prologue for the function,
+ * referencing the __kcfi_typeid_##name symbol which is the signature of
+ * the function prototype. The value of that symbol ends up coming from
+ * a weak symbol emitted by the *caller* (in this case machine_kexec()),
+ * which means it's *entirely* useless for checking that the caller and
+ * callee agree about the prototype of the (asm) function being called.
+ * So, we define the signature *here* for ourselves, and if the C code
+ * ever calls relocate_kernel() in the belief that it has a different
+ * prototype, then the CFI check will trigger as $DEITY intended.
+ */
+ .weak __kcfi_typeid_relocate_kernel
+ .set __kcfi_typeid_relocate_kernel, 0x19835999
+#endif
.code64
-SYM_CODE_START_NOALIGN(relocate_kernel)
+SYM_TYPED_FUNC_START(relocate_kernel)
/*
* %rdi indirection_page
* %rsi pa_control_page
--
2.48.1
On Tue, Mar 18, 2025 at 03:56:36PM +0000, David Woodhouse wrote: > But on the whole, I'm not sure the CFI check is worth it. > > CFI checks that the caller and callee agree about the prototype of the > function being called. There are two main benefits of this: > > • to protect against attacks where function pointers are substituted > for gadgets. > > • to protect against genuine bugs, where the caller and the callee > disagree about the function arguments. AFAIK the first one is the main point of CFI. > For the relocate_kernel() case I don't think we care much about the > first. Without a CFI prologue, no *other* code can be tricked into > calling relocate_kernel() But for FineIBT the hash is checked on the callee side. So it loses FineIBT protection. > — and besides, it's in the kernel's data > section and isn't executable anyway until the kexec code copies it to a > page that *is*. Does the code get copied immediately before getting called, or can it be initialized earlier during boot when kdump does its initial setup? -- Josh
On Tue, 2025-03-18 at 10:14 -0700, Josh Poimboeuf wrote: > On Tue, Mar 18, 2025 at 03:56:36PM +0000, David Woodhouse wrote: > > But on the whole, I'm not sure the CFI check is worth it. > > > > CFI checks that the caller and callee agree about the prototype of the > > function being called. There are two main benefits of this: > > > > • to protect against attacks where function pointers are substituted > > for gadgets. > > > > • to protect against genuine bugs, where the caller and the callee > > disagree about the function arguments. > > AFAIK the first one is the main point of CFI. In the general case yes. I just don't think it matters much for relocate_kernel(). > > For the relocate_kernel() case I don't think we care much about the > > first. Without a CFI prologue, no *other* code can be tricked into > > calling relocate_kernel() > > But for FineIBT the hash is checked on the callee side. So it loses > FineIBT protection. Right now the relocate_kernel() code doesn't even have an endbr, does it? So it isn't a useful gadget? > > — and besides, it's in the kernel's data > > section and isn't executable anyway until the kexec code copies it to a > > page that *is*. > > Does the code get copied immediately before getting called, or can it be > initialized earlier during boot when kdump does its initial setup? It's initialized earlier, in machine_kexec_prepare(), and then the page is set ROX.
On Tue, Mar 18, 2025 at 09:06:58PM +0000, David Woodhouse wrote: > On Tue, 2025-03-18 at 10:14 -0700, Josh Poimboeuf wrote: > > On Tue, Mar 18, 2025 at 03:56:36PM +0000, David Woodhouse wrote: > > > For the relocate_kernel() case I don't think we care much about the > > > first. Without a CFI prologue, no *other* code can be tricked into > > > calling relocate_kernel() > > > > But for FineIBT the hash is checked on the callee side. So it loses > > FineIBT protection. > > Right now the relocate_kernel() code doesn't even have an endbr, does > it? So it isn't a useful gadget? In that case wouldn't IBT explode when you indirect call it? Or is IBT getting disabled beforehand? > > > — and besides, it's in the kernel's data > > > section and isn't executable anyway until the kexec code copies it to a > > > page that *is*. > > > > Does the code get copied immediately before getting called, or can it be > > initialized earlier during boot when kdump does its initial setup? > > It's initialized earlier, in machine_kexec_prepare(), and then the page > is set ROX. If that happens during boot (like for kdump init) then it'll be in text the whole time after boot, right? -- Josh
On 18 March 2025 22:41:43 GMT, Josh Poimboeuf <jpoimboe@kernel.org> wrote: >On Tue, Mar 18, 2025 at 09:06:58PM +0000, David Woodhouse wrote: >> On Tue, 2025-03-18 at 10:14 -0700, Josh Poimboeuf wrote: >> > On Tue, Mar 18, 2025 at 03:56:36PM +0000, David Woodhouse wrote: >> > > For the relocate_kernel() case I don't think we care much about the >> > > first. Without a CFI prologue, no *other* code can be tricked into >> > > calling relocate_kernel() >> > >> > But for FineIBT the hash is checked on the callee side. So it loses >> > FineIBT protection. >> >> Right now the relocate_kernel() code doesn't even have an endbr, does >> it? So it isn't a useful gadget? > >In that case wouldn't IBT explode when you indirect call it? Or is IBT >getting disabled beforehand? Not sure of the details. The machine_kexec() function which is the *caller* is currently marked with the __nocfi tag which stops any software checks. I guess any hardware feature which requires an endbr to be the target of an indirect branch has to already disabled on the way down? What specifically am I looking for, to check that? Or the hardware support has just never worked with kexec, perhaps? >> > > — and besides, it's in the kernel's data >> > > section and isn't executable anyway until the kexec code copies it to a >> > > page that *is*. >> > >> > Does the code get copied immediately before getting called, or can it be >> > initialized earlier during boot when kdump does its initial setup? >> >> It's initialized earlier, in machine_kexec_prepare(), and then the page >> is set ROX. > >If that happens during boot (like for kdump init) then it'll be in text >the whole time after boot, right? In an executable page, yes.
On Wed, Mar 19, 2025 at 01:04:20PM +0000, David Woodhouse wrote: > On 18 March 2025 22:41:43 GMT, Josh Poimboeuf <jpoimboe@kernel.org> wrote: > >On Tue, Mar 18, 2025 at 09:06:58PM +0000, David Woodhouse wrote: > >> On Tue, 2025-03-18 at 10:14 -0700, Josh Poimboeuf wrote: > >> > On Tue, Mar 18, 2025 at 03:56:36PM +0000, David Woodhouse wrote: > >> > > For the relocate_kernel() case I don't think we care much about the > >> > > first. Without a CFI prologue, no *other* code can be tricked into > >> > > calling relocate_kernel() > >> > > >> > But for FineIBT the hash is checked on the callee side. So it loses > >> > FineIBT protection. > >> > >> Right now the relocate_kernel() code doesn't even have an endbr, does > >> it? So it isn't a useful gadget? > > > >In that case wouldn't IBT explode when you indirect call it? Or is IBT > >getting disabled beforehand? > > Not sure of the details. The machine_kexec() function which is the > *caller* is currently marked with the __nocfi tag which stops any > software checks. I guess any hardware feature which requires an endbr > to be the target of an indirect branch has to already disabled on the > way down? What specifically am I looking for, to check that? Or the > hardware support has just never worked with kexec, perhaps? Looking at machine_kexec(), it calls cet_disable() before the indirect call. So yeah, it seems fine for relocate_kernel() to not have a CFI prologue or ENDBR. -- Josh
On Wed, 2025-03-19 at 08:47 -0700, Josh Poimboeuf wrote: > > Looking at machine_kexec(), it calls cet_disable() before the indirect > call. So yeah, it seems fine for relocate_kernel() to not have a CFI > prologue or ENDBR. Yeah. I'm just going to throw that into a branch and forget it for now. It's just causing noise around the *actual* stuff we want to get in, to do the exception handling. For that, I think the only thing outstanding is to add a userspace test based on http://david.woodhou.se/loadret.c which invokes the int3 from the 'payload' to test it. I think this is better than doing it inside the kernel itself, as it gives us a selftest for the kexec jump mechanism too.
On Fri, 2025-03-14 at 10:52 -0700, Josh Poimboeuf wrote: > On Fri, Mar 14, 2025 at 05:23:15PM +0000, David Woodhouse wrote: > > ISTR this version is OK with Clang and CONFIG_CFI_CLANG but with GCC I > > get this: > > > > vmlinux.o: warning: objtool: relocate_kernel+0x69: unsupported stack register modification > > > > /* setup a new stack at the end of the physical control page */ > > lea PAGE_SIZE(%rsi), %rsp > > 79: 48 8d a6 00 10 00 00 lea 0x1000(%rsi),%rsp > > > > > > Maybe the answer is to put the UNWIND_HINT_FUNC into #ifdef > > CONFIG_CFI_CLANG but that seems wrong. > > The UNWIND_HINT_FUNC definitely looks wrong, why would Clang need it? I think it's when CONFIG_CFI_CLANG makes the SYM_TYPED_FUNC_START() macro actually emit the CFI prologue? > > I'll have another look at putting it in the data section, and see if I > > can remember why I didn't want to do that before (and if that's still > > relevant now). > > IIRC, the reasons were the patched alternative, and also you wanted to > disassemble (but note that's still possible with gdb). > > Here was a patch to make it work: > > https://lore.kernel.org/20241218212326.44qff3i5n6cxuu5d@jpoimboe Yeah, that does seem reasonable. Sorry, I think I missed that before Christmas. I'll look at rolling it in. This part is kind of orthogonal to the actual debug support so it's fine to keep it separate. Thanks.
© 2016 - 2025 Red Hat, Inc.