[RFC PATCH 0/2] x86: kprobes: Fix CFI_CLANG related issues

Masami Hiramatsu (Google) posted 2 patches 2 years, 7 months ago
There is a newer version of this series
Posted by Masami Hiramatsu (Google) 2 years, 7 months ago
Hi Peter,

Here I try to fix the two issues discussed in the previous thread:

https://lore.kernel.org/all/20230706113403.GI2833176@hirez.programming.kicks-ass.net/

- Prohibit probing on __cfi_* preamble symbols, which hold the typeid.
- Prohibit probing on the compiler-generated movl/addl instructions that
  are used for checking the typeid on x86.
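
To illustrate the two checks listed above, here is a minimal userspace
sketch. It is not the code from the patches. Both helper names are
hypothetical, and the byte pattern assumes the x86-64 kCFI caller-side
sequence is "movl $-typeid, %r10d" (6 bytes) followed by
"addl -4(%reg), %r10d" (4 bytes); a real implementation would more likely
use the kernel's instruction decoder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if a symbol name starts with the
 * "__cfi_" prefix used for kCFI preamble symbols.
 */
static bool is_cfi_preamble_symbol(const char *name)
{
	static const char prefix[] = "__cfi_";

	return name && strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}

/*
 * Hypothetical helper: returns true if the bytes at p look like the
 * assumed kCFI check sequence:
 *
 *   41 ba <imm32>        movl $-typeid, %r10d     (6 bytes)
 *   44|45 03 <modrm> fc  addl -4(%reg), %r10d     (4 bytes)
 */
static bool looks_like_kcfi_check(const uint8_t *p, size_t len)
{
	if (len < 10)
		return false;

	/* REX.B prefix + opcode 0xb8+2: movl $imm32, %r10d */
	if (p[0] != 0x41 || p[1] != 0xba)
		return false;

	/* REX.R set (r10d is the destination), REX.B optional; opcode 0x03 */
	if ((p[6] & 0xfe) != 0x44 || p[7] != 0x03)
		return false;

	/* ModRM: mod=01 (8-bit displacement), reg=010 (low bits of r10) */
	if ((p[8] & 0xf8) != 0x50)
		return false;

	/* rm=100 would require a SIB byte, which this sketch does not handle */
	if ((p[8] & 0x07) == 0x04)
		return false;

	/* displacement of -4 */
	return p[9] == 0xfc;
}
```

A kprobe registration path could reject an address when either helper
returns true. That placement is an assumption for illustration and is
not a description of where the patches add their checks.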

I'm not sure how arm64 implements this, but it seems that
cfi_handler() in arch/arm64/kernel/traps.c just reads the typeid from
the registers instead of decoding the instructions.

I just build tested, since I could not boot the kernel with CFI_CLANG=y.
Would anyone know something about this error?

[    0.141030] MMIO Stale Data: Unknown: No mitigations
[    0.153511] SMP alternatives: Using kCFI
[    0.164593] Freeing SMP alternatives memory: 36K
[    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
[    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
[    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    0.166028] Call Trace:
[    0.166028]  <TASK>
[    0.166028]  dump_stack_lvl+0x6e/0xb0
[    0.166028]  panic+0x146/0x2f0
[    0.166028]  ? start_kernel+0x472/0x48b
[    0.166028]  __stack_chk_fail+0x14/0x20
[    0.166028]  start_kernel+0x472/0x48b
[    0.166028]  x86_64_start_reservations+0x24/0x30
[    0.166028]  x86_64_start_kernel+0xa6/0xbb
[    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
[    0.166028]  </TASK>
[    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---


Thank you,

---

Masami Hiramatsu (Google) (2):
      kprobes: Prohibit probing on CFI preamble symbol
      x86/kprobes: Prohibit probing on compiler generated CFI checking code


 arch/x86/kernel/kprobes/core.c |   34 ++++++++++++++++++++++++++++++++++
 kernel/kprobes.c               |   17 ++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Re: [RFC PATCH 0/2] x86: kprobes: Fix CFI_CLANG related issues
Posted by Peter Zijlstra 2 years, 7 months ago
On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:

> I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> Would anyone know something about this error?
> 
> [    0.141030] MMIO Stale Data: Unknown: No mitigations
> [    0.153511] SMP alternatives: Using kCFI
> [    0.164593] Freeing SMP alternatives memory: 36K
> [    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
> [    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
> [    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [    0.166028] Call Trace:
> [    0.166028]  <TASK>
> [    0.166028]  dump_stack_lvl+0x6e/0xb0
> [    0.166028]  panic+0x146/0x2f0
> [    0.166028]  ? start_kernel+0x472/0x48b
> [    0.166028]  __stack_chk_fail+0x14/0x20
> [    0.166028]  start_kernel+0x472/0x48b
> [    0.166028]  x86_64_start_reservations+0x24/0x30
> [    0.166028]  x86_64_start_kernel+0xa6/0xbb
> [    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
> [    0.166028]  </TASK>
> [    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---
> 
> 

Hmm, I just built v6.4 using defconfig+kvm_guest.config+CFI_CLANG with
clang-16 and that boots using kvm... (on my IVB, and the thing also
boots natively on my ADL).

I'll go have a look at your patches shortly.
Re: [RFC PATCH 0/2] x86: kprobes: Fix CFI_CLANG related issues
Posted by Nathan Chancellor 2 years, 7 months ago
On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> Would anyone know something about this error?
> 
> [    0.141030] MMIO Stale Data: Unknown: No mitigations
> [    0.153511] SMP alternatives: Using kCFI
> [    0.164593] Freeing SMP alternatives memory: 36K
> [    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
> [    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
> [    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [    0.166028] Call Trace:
> [    0.166028]  <TASK>
> [    0.166028]  dump_stack_lvl+0x6e/0xb0
> [    0.166028]  panic+0x146/0x2f0
> [    0.166028]  ? start_kernel+0x472/0x48b
> [    0.166028]  __stack_chk_fail+0x14/0x20
> [    0.166028]  start_kernel+0x472/0x48b
> [    0.166028]  x86_64_start_reservations+0x24/0x30
> [    0.166028]  x86_64_start_kernel+0xa6/0xbb
> [    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
> [    0.166028]  </TASK>
> [    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---

This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
me. What version of LLVM are you using? This was fixed in 16.0.4. Commit
514ca14ed544 ("start_kernel: Add __no_stack_protector function
attribute") should resolve it on the Linux side, it looks like that is
in 6.5-rc1. Not sure if we should backport it or just let people upgrade
their toolchains on older releases.
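
For readers unfamiliar with the attribute named in that commit title,
here is a minimal userspace sketch of opting one function out of
stack-protector instrumentation. The macro DEMO_NO_STACK_PROTECTOR and
the function demo_start() are hypothetical names for illustration; this
is not the kernel's definition of __no_stack_protector.

```c
#include <string.h>

/*
 * Use the compiler's no_stack_protector function attribute when it is
 * available, and fall back to an empty macro otherwise.
 */
#if defined(__has_attribute)
# if __has_attribute(no_stack_protector)
#  define DEMO_NO_STACK_PROTECTOR __attribute__((no_stack_protector))
# endif
#endif
#ifndef DEMO_NO_STACK_PROTECTOR
# define DEMO_NO_STACK_PROTECTOR
#endif

/*
 * With the attribute applied, the compiler should not emit a stack
 * canary check for this function, even though it has a local array
 * and the file may be built with -fstack-protector-strong.
 */
DEMO_NO_STACK_PROTECTOR
static int demo_start(void)
{
	char buf[64] = "boot";

	return (int)strlen(buf);
}
```

The behaviour of the function is unchanged; only the canary setup and
check around it are omitted.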

Cheers,
Nathan
Re: [RFC PATCH 0/2] x86: kprobes: Fix CFI_CLANG related issues
Posted by Masami Hiramatsu (Google) 2 years, 7 months ago
On Mon, 10 Jul 2023 08:57:03 -0700
Nathan Chancellor <nathan@kernel.org> wrote:

> On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> > I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> > Would anyone know something about this error?
> > 
> > [    0.141030] MMIO Stale Data: Unknown: No mitigations
> > [    0.153511] SMP alternatives: Using kCFI
> > [    0.164593] Freeing SMP alternatives memory: 36K
> > [    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
> > [    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
> > [    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > [    0.166028] Call Trace:
> > [    0.166028]  <TASK>
> > [    0.166028]  dump_stack_lvl+0x6e/0xb0
> > [    0.166028]  panic+0x146/0x2f0
> > [    0.166028]  ? start_kernel+0x472/0x48b
> > [    0.166028]  __stack_chk_fail+0x14/0x20
> > [    0.166028]  start_kernel+0x472/0x48b
> > [    0.166028]  x86_64_start_reservations+0x24/0x30
> > [    0.166028]  x86_64_start_kernel+0xa6/0xbb
> > [    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
> > [    0.166028]  </TASK>
> > [    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---
> 
> This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
> me. What version of LLVM are you using? This was fixed in 16.0.4. Commit
> 514ca14ed544 ("start_kernel: Add __no_stack_protector function
> attribute") should resolve it on the Linux side, it looks like that is
> in 6.5-rc1. Not sure if we should backport it or just let people upgrade
> their toolchains on older releases.

Thanks for the info. I confirmed that the commit fixed the boot issue.
So I think it should be backported to the stable tree.

Thanks!

> 
> Cheers,
> Nathan


-- 
Masami Hiramatsu (Google) <mhiramat@kernel.org>
Re: [RFC PATCH 0/2] x86: kprobes: Fix CFI_CLANG related issues
Posted by Nathan Chancellor 2 years, 7 months ago
Masami, thanks for verifying!

Hi Greg and Sasha,

On Tue, Jul 11, 2023 at 10:33:03AM +0900, Masami Hiramatsu wrote:
> On Mon, 10 Jul 2023 08:57:03 -0700
> Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> > > I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> > > Would anyone know something about this error?
> > > 
> > > [    0.141030] MMIO Stale Data: Unknown: No mitigations
> > > [    0.153511] SMP alternatives: Using kCFI
> > > [    0.164593] Freeing SMP alternatives memory: 36K
> > > [    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
> > > [    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
> > > [    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > > [    0.166028] Call Trace:
> > > [    0.166028]  <TASK>
> > > [    0.166028]  dump_stack_lvl+0x6e/0xb0
> > > [    0.166028]  panic+0x146/0x2f0
> > > [    0.166028]  ? start_kernel+0x472/0x48b
> > > [    0.166028]  __stack_chk_fail+0x14/0x20
> > > [    0.166028]  start_kernel+0x472/0x48b
> > > [    0.166028]  x86_64_start_reservations+0x24/0x30
> > > [    0.166028]  x86_64_start_kernel+0xa6/0xbb
> > > [    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
> > > [    0.166028]  </TASK>
> > > [    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---
> > 
> > This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
> > me. What version of LLVM are you using? This was fixed in 16.0.4. Commit
> > 514ca14ed544 ("start_kernel: Add __no_stack_protector function
> > attribute") should resolve it on the Linux side, it looks like that is
> > in 6.5-rc1. Not sure if we should backport it or just let people upgrade
> > their toolchains on older releases.
> 
> Thanks for the info. I confirmed that the commit fixed the boot issue.
> So I think it should be backported to the stable tree.

Would you please apply commit 514ca14ed544 ("start_kernel: Add
__no_stack_protector function attribute") to linux-6.4.y? The series
ending with commit 611d4c716db0 ("x86/hyperv: Mark hv_ghcb_terminate()
as noreturn") that shipped in 6.4 exposes an LLVM issue that affected
16.0.0 and 16.0.1, which was resolved in 16.0.2. When using those
affected LLVM releases, the following crash at boot occurs:

  [    0.181667] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0
  [    0.182621] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.3 #1
  [    0.182621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
  [    0.182621] Call Trace:
  [    0.182621]  <TASK>
  [    0.182621]  dump_stack_lvl+0x6a/0xa0
  [    0.182621]  panic+0x124/0x2f0
  [    0.182621]  ? start_kernel+0x3cf/0x3d0
  [    0.182621]  ? acpi_enable+0x64/0xc0
  [    0.182621]  __stack_chk_fail+0x14/0x20
  [    0.182621]  start_kernel+0x3cf/0x3d0
  [    0.182621]  x86_64_start_reservations+0x24/0x30
  [    0.182621]  x86_64_start_kernel+0xab/0xb0
  [    0.182621]  secondary_startup_64_no_verify+0x107/0x10b
  [    0.182621]  </TASK>
  [    0.182621] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0 ]---

514ca14ed544 aims to avoid this on the Linux side. I have verified that
it applies to 6.4.3 cleanly and resolves the issue there, as has Masami.
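
As a small sketch of the version range described in this message
(16.0.0 and 16.0.1 affected, resolved in 16.0.2), the following
hypothetical helper checks whether a given LLVM release falls inside
it. The function name is an assumption, and the range is taken from
this message only.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: true for the LLVM releases this message names as
 * affected (16.0.0 and 16.0.1), false for 16.0.2 and everything else.
 */
static bool llvm_release_affected(int major, int minor, int patch)
{
	return major == 16 && minor == 0 && patch >= 0 && patch < 2;
}
```

When building with clang, the predefined macros __clang_major__,
__clang_minor__ and __clang_patchlevel__ could be passed as the three
arguments.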

If there are any issues or questions, please let me know.

Cheers,
Nathan
Re: [RFC PATCH 0/2] x86: kprobes: Fix CFI_CLANG related issues
Posted by Greg Kroah-Hartman 2 years, 7 months ago
On Tue, Jul 11, 2023 at 11:37:04AM -0700, Nathan Chancellor wrote:
> Masami, thanks for verifying!
> 
> Hi Greg and Sasha,
> 
> On Tue, Jul 11, 2023 at 10:33:03AM +0900, Masami Hiramatsu wrote:
> > On Mon, 10 Jul 2023 08:57:03 -0700
> > Nathan Chancellor <nathan@kernel.org> wrote:
> > 
> > > On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> > > > I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> > > > Would anyone know something about this error?
> > > > 
> > > > [    0.141030] MMIO Stale Data: Unknown: No mitigations
> > > > [    0.153511] SMP alternatives: Using kCFI
> > > > [    0.164593] Freeing SMP alternatives memory: 36K
> > > > [    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
> > > > [    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
> > > > [    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> > > > [    0.166028] Call Trace:
> > > > [    0.166028]  <TASK>
> > > > [    0.166028]  dump_stack_lvl+0x6e/0xb0
> > > > [    0.166028]  panic+0x146/0x2f0
> > > > [    0.166028]  ? start_kernel+0x472/0x48b
> > > > [    0.166028]  __stack_chk_fail+0x14/0x20
> > > > [    0.166028]  start_kernel+0x472/0x48b
> > > > [    0.166028]  x86_64_start_reservations+0x24/0x30
> > > > [    0.166028]  x86_64_start_kernel+0xa6/0xbb
> > > > [    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
> > > > [    0.166028]  </TASK>
> > > > [    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---
> > > 
> > > This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
> > > me. What version of LLVM are you using? This was fixed in 16.0.4. Commit
> > > 514ca14ed544 ("start_kernel: Add __no_stack_protector function
> > > attribute") should resolve it on the Linux side, it looks like that is
> > > in 6.5-rc1. Not sure if we should backport it or just let people upgrade
> > > their toolchains on older releases.
> > 
> > Thanks for the info. I confirmed that the commit fixed the boot issue.
> > So I think it should be backported to the stable tree.
> 
> Would you please apply commit 514ca14ed544 ("start_kernel: Add
> __no_stack_protector function attribute") to linux-6.4.y? The series
> ending with commit 611d4c716db0 ("x86/hyperv: Mark hv_ghcb_terminate()
> as noreturn") that shipped in 6.4 exposes an LLVM issue that affected
> 16.0.0 and 16.0.1, which was resolved in 16.0.2. When using those
> affected LLVM releases, the following crash at boot occurs:
> 
>   [    0.181667] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0
>   [    0.182621] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.3 #1
>   [    0.182621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
>   [    0.182621] Call Trace:
>   [    0.182621]  <TASK>
>   [    0.182621]  dump_stack_lvl+0x6a/0xa0
>   [    0.182621]  panic+0x124/0x2f0
>   [    0.182621]  ? start_kernel+0x3cf/0x3d0
>   [    0.182621]  ? acpi_enable+0x64/0xc0
>   [    0.182621]  __stack_chk_fail+0x14/0x20
>   [    0.182621]  start_kernel+0x3cf/0x3d0
>   [    0.182621]  x86_64_start_reservations+0x24/0x30
>   [    0.182621]  x86_64_start_kernel+0xab/0xb0
>   [    0.182621]  secondary_startup_64_no_verify+0x107/0x10b
>   [    0.182621]  </TASK>
>   [    0.182621] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0 ]---
> 
> 514ca14ed544 aims to avoid this on the Linux side. I have verified that
> it applies to 6.4.3 cleanly and resolves the issue there, as has Masami.
> 
> If there are any issues or questions, please let me know.

Now queued up, thanks.

greg k-h