x86/purgatory: Avoid kexec runtime warning with LLVM 18

[PATCH 0/2] x86/purgatory: Avoid kexec runtime warning with LLVM 18

Posted by Nathan Chancellor 1 year, 9 months ago

Hi all,

After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
functions into .ltext with large section flag (#73037)") [1], which
landed in the 18.x cycle, there is a runtime warning when loading a
kernel via kexec due to the presence of two text sections (.text and
.ltext).

  $ kexec -l /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img --reuse-cmdline
  $ dmesg -l warn+
  ...
  [    1.264240] ------------[ cut here ]------------
  [    1.264647] WARNING: CPU: 0 PID: 96 at kernel/kexec_file.c:945 kexec_load_purgatory+0x2c8/0x3c0
  [    1.265322] Modules linked in:
  [    1.265565] CPU: 0 PID: 96 Comm: kexec Not tainted 6.9.0-rc4-00031-g96fca68c4fbf #1 eae91b3fe699ecba2dd0a886471788e49eb36ac0
  [    1.266403] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  [    1.267268] RIP: 0010:kexec_load_purgatory+0x2c8/0x3c0
  [    1.267661] Code: 54 24 0c 48 89 c8 48 29 d0 0f 82 5d ff ff ff 49 03 54 24 1c 48 39 d1 0f 83 4f ff ff ff 49 8b 17 48 39 4a 18 0f 84 30 ff ff ff <0f> 0b e9 3b ff ff ff 66 85 c9 74 18 48 8b 5a 28 48 01 d3 45 31 e4
  [    1.269052] RSP: 0018:ffffbe28007cfb50 EFLAGS: 00010206
  [    1.269447] RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 0000000000000000
  [    1.269982] RDX: ffff988c8174d000 RSI: 0000000000000010 RDI: ffffbe2801d940c0
  [    1.270527] RBP: 0000000000000002 R08: 0000003d8b4c0000 R09: cc0000000025ff00
  [    1.271063] R10: 0000003d8b4c0000 R11: cc0000000025ff00 R12: ffffbe28000d5084
  [    1.271603] R13: 000000013ffff000 R14: ffff988c8174d000 R15: ffffbe28007cfbe0
  [    1.272140] FS:  00007fec73535740(0000) GS:ffff988cbbc00000(0000) knlGS:0000000000000000
  [    1.272744] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [    1.273178] CR2: 00007fec736b1390 CR3: 0000000101a24000 CR4: 0000000000350ef0
  [    1.273732] Call Trace:
  [    1.273929]  <TASK>
  [    1.274100]  ? __warn+0xc9/0x1c0
  [    1.274356]  ? kexec_load_purgatory+0x2c8/0x3c0
  [    1.274704]  ? report_bug+0x139/0x1e0
  [    1.274998]  ? handle_bug+0x42/0x70
  [    1.275269]  ? exc_invalid_op+0x1a/0x50
  [    1.275574]  ? asm_exc_invalid_op+0x1a/0x20
  [    1.275900]  ? kexec_load_purgatory+0x2c8/0x3c0
  [    1.276251]  bzImage64_load+0x1c1/0x6a0
  [    1.276556]  kexec_image_load_default+0x49/0x60
  [    1.276907]  __se_sys_kexec_file_load+0x606/0x790
  [    1.277280]  ? arch_exit_to_user_mode_prepare+0x6e/0x70
  [    1.277675]  do_syscall_64+0x90/0x170
  [    1.277955]  ? srso_return_thunk+0x5/0x5f
  [    1.278265]  ? __count_memcg_events+0x50/0xc0
  [    1.278597]  ? srso_return_thunk+0x5/0x5f
  [    1.278901]  ? handle_mm_fault+0xb18/0x11c0
  [    1.279218]  ? vfs_read+0x2c8/0x2f0
  [    1.279498]  ? srso_return_thunk+0x5/0x5f
  [    1.279802]  ? do_user_addr_fault+0x4d2/0x690
  [    1.280138]  ? srso_return_thunk+0x5/0x5f
  [    1.280449]  ? srso_return_thunk+0x5/0x5f
  [    1.280755]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [    1.281136] RIP: 0033:0x7fec7363e88d
  [    1.281411] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 14 0d 00 f7 d8 64 89 01 48
  [    1.282789] RSP: 002b:00007ffd136f4808 EFLAGS: 00000246 ORIG_RAX: 0000000000000140
  [    1.283354] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fec7363e88d
  [    1.283893] RDX: 00000000000000c5 RSI: 0000000000000005 RDI: 0000000000000003
  [    1.284427] RBP: 0000000000000003 R08: 0000000000000000 R09: 00005628517eef10
  [    1.284966] R10: 00005628580a75f0 R11: 0000000000000246 R12: 0000000000000003
  [    1.285500] R13: 00005628517f89a8 R14: 00007ffd136f4b98 R15: 0000000000000004
  [    1.286036]  </TASK>
  [    1.286210] ---[ end trace 0000000000000000 ]---
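
As a rough illustration of why a second text section can trip this check,
the following is a simplified C model, not the kernel's code. It assumes
the loader looks for executable sections whose address range contains the
ELF entry point, and that in a relocatable object such as the purgatory
every section starts at address 0, so '.text' and '.ltext' can both appear
to contain the entry offset. All sketch_* names and the section sizes are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as SHF_ALLOC and SHF_EXECINSTR in <elf.h>. */
#define SKETCH_SHF_ALLOC     0x2u
#define SKETCH_SHF_EXECINSTR 0x4u

/* Hypothetical, trimmed-down stand-in for an ELF section header. */
struct sketch_section {
    const char *name;
    uint64_t flags;
    uint64_t addr;  /* assumed to be 0 for all sections of a relocatable object */
    uint64_t size;
};

/*
 * Count the executable sections whose [addr, addr + size) range contains
 * the entry point. In this model, a count above 1 is the situation that
 * corresponds to the warning: the start address was already derived from
 * one section when a second candidate shows up.
 */
static size_t sketch_count_entry_matches(const struct sketch_section *secs,
                                         size_t n, uint64_t entry)
{
    size_t matches = 0;

    assert(secs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int exec = (secs[i].flags & SKETCH_SHF_EXECINSTR) != 0;
        int covers = entry >= secs[i].addr &&
                     entry < secs[i].addr + secs[i].size;

        if (exec && covers)
            matches++;
    }
    return matches;
}

/* Self-check with made-up section layouts. */
static void sketch_selfcheck(void)
{
    const uint64_t text = SKETCH_SHF_ALLOC | SKETCH_SHF_EXECINSTR;
    const struct sketch_section single_text[] = {
        { ".text",   text,             0, 0x1000 },
        { ".rodata", SKETCH_SHF_ALLOC, 0, 0x200  },
    };
    const struct sketch_section split_text[] = {
        { ".text",  text, 0, 0x1000 },
        { ".ltext", text, 0, 0x2000 },
    };

    /* One executable section: a single, unambiguous match. */
    assert(sketch_count_entry_matches(single_text, 2, 0x40) == 1);
    /* Two executable sections: both appear to contain the entry point. */
    assert(sketch_count_entry_matches(split_text, 2, 0x40) == 2);
}
```

Under these assumptions, coalescing all text into one output section brings
the count back to 1 regardless of how the compiler names its input sections.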

Unlike LTO and PGO, which were disabled for the purgatory in commit
97b6b9cbba40 ("x86/purgatory: remove PGO flags") and commit 75b2f7e4c9e0
("x86/purgatory: Remove LTO flags"), this optimization has no flag to
opt out of it. One way to resolve this would be to use '.ltext' and
'.lrodata' as the text and read-only data sections in the out of line
assembly in arch/x86/purgatory but there is nothing that stops future
changes from splitting the text section further.

Properly avoid the warning by using a linker script to coalesce all
separate text sections into one, which was alluded to by both the change
that introduced the warning and 75b2f7e4c9e0... I think this really
should have been done then but I wasn't looking too far ahead :) To
avoid backsliding now that all sections are properly described by the
linker script, turn on orphan section warnings as well.

[1]: https://github.com/llvm/llvm-project/commit/d8a04398f9492f043ffd8fbaf2458778f7d0fcd5

---
Nathan Chancellor (2):
      x86/purgatory: Add a linker script
      x86/purgatory: Enable orphan section warnings

 arch/x86/purgatory/.gitignore      |  1 +
 arch/x86/purgatory/Makefile        | 19 +++---------
 arch/x86/purgatory/purgatory.lds.S | 63 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 14 deletions(-)
---
base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
change-id: 20240416-x86-fix-kexec-with-llvm-18-c986b21845c5

Best regards,
-- 
Nathan Chancellor <nathan@kernel.org>

Re: [PATCH 0/2] x86/purgatory: Avoid kexec runtime warning with LLVM 18

Posted by Borislav Petkov 1 year, 9 months ago

On April 17, 2024 11:53:44 PM GMT+02:00, Nathan Chancellor <nathan@kernel.org> wrote:
>Hi all,
>
>After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
>functions into .ltext with large section flag (#73037)") [1], which
>landed in the 18.x cycle, there is a runtime warning when loading a
>kernel via kexec due to the presence of two text sections (.text and
>.ltext).

How much of this silliness should we expect now for other parts of the kernel?

Can we turn this off?

Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?

Thx.

-- 
Sent from a small device: formatting sucks and brevity is inevitable.

Re: [PATCH 0/2] x86/purgatory: Avoid kexec runtime warning with LLVM 18

Posted by Nick Desaulniers 1 year, 9 months ago

On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <bp@alien8.de> wrote:
> How much of this silliness should we expect now for other parts of the kernel?

Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
does for the whole kernel. So that LLVM change may have implications
for those 2 other architectures. Not sure we've had any bug reports
or breakage in CI yet, like we have for x86+kexec.

> Can we turn this off?

Maybe we need to revisit
commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
-mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05

at least the -mcmodel=kernel addition (since that patch added a few
additional compiler flags that still LGTM).

> Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?

Google is now at the point where a few binaries running in data
centers are measured in the gigabytes, and attempting to link them may
result in relocation overflows. From that commit message, it sounds
like they link together object files built with the default code model
and some objects from the larger code model. Putting large code model
data+code in distinct sections is helpful for then being able to place
those further away in an object. For other architectures, the linker
may insert a veneer/trampoline. Not sure why that's not used here.

https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mlarge-data-threshold
makes it sound like GCC may place data larger than a certain threshold
in a new section. Dunno about code (.text) though.

Arthur, you probably happen to know more about code models at this
point than anyone particularly cares to. The raison d'etre for
e16c2983fba0 was avoiding R_X86_64_32/R_X86_64_32S relocations. Do
you know if there's another code model that can force R_X86_64_64? Or
is the large code model the way to go here, with updates to linker
scripts for this new section?

+ Fangrui, Ard, who might know of alternative solutions to
-mcmodel=large for e16c2983fba0.

Otherwise, I think the dedicated linker script is the way to go. We
really want tight control over what is or is not in the purgatory
image.
--
Thanks,
~Nick Desaulniers

Re: [PATCH 0/2] x86/purgatory: Avoid kexec runtime warning with LLVM 18

Posted by Ard Biesheuvel 1 year, 9 months ago

On Thu, 18 Apr 2024 at 17:44, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <bp@alien8.de> wrote:
> > How much of this silliness should we expect now for other parts of the kernel?
>
> Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
> does for the whole kernel. So that LLVM change may have implications
> for those 2 other architectures.  Not sure we've had any bug reports
> or breakage in CI yet, like we have for x86+kexec.
>
> > Can we turn this off?
>
> Maybe we need to revisit
> commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
> -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
>
> at least the -mcmodel=kernel addition (since that patch added a few
> additional compiler flags that still LGTM).
>
...

> + Fangrui, Ard, who might know of alternative solutions to
> -mcmodel=large for e16c2983fba0.
>

I think it would be better to use -mcmodel=small -fpic. As Nick
explains, the large code model is really more suitable for executables
that span a large memory range. The issue with the purgatory seems to
be that it can be placed anywhere in memory, not that it is very big.

-mcmodel=small -fpic is what user space typically uses, so it is much
less likely to create problems.

Note that I have been looking into whether we can build the entire
kernel with -fpic (for various reasons). There are some issues to
resolve there, mostly related to per-CPU variables and the per-CPU
stack protector, but beyond that, things work happily and the number
of boot time relocations drops dramatically, due to the use of
RIP-relative references. So for the purgatory, I wouldn't expect too
many surprises.
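
To make the relocation argument concrete, here is a small C sketch of the
range rules for the relocation types involved, as defined by the x86-64
psABI: R_X86_64_32 needs a value that fits in 32 bits zero-extended,
R_X86_64_32S needs one that fits sign-extended, and the PC-relative
R_X86_64_PC32 only needs the distance between reference and target to fit
in a signed 32 bits. The sketch_* names and the example addresses are
hypothetical and are not taken from the purgatory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the value must fit in 32 bits when zero-extended. */
static bool sketch_fits_abs32(uint64_t value)
{
    return value <= UINT32_MAX;
}

/* R_X86_64_32S: the value must fit in 32 bits when sign-extended. */
static bool sketch_fits_abs32s(uint64_t value)
{
    /* Relies on two's complement conversion, as on x86-64 compilers. */
    int64_t v = (int64_t)value;

    return v >= INT32_MIN && v <= INT32_MAX;
}

/* R_X86_64_PC32: only the displacement target - place must fit. */
static bool sketch_fits_pc32(uint64_t target, uint64_t place)
{
    int64_t disp = (int64_t)(target - place);

    return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Self-check with made-up addresses. */
static void sketch_selfcheck(void)
{
    /* Hypothetical load address above 4 GiB. */
    const uint64_t high = 0x140000000ULL;
    /* Hypothetical address in the top 2 GiB, where -mcmodel=kernel expects code. */
    const uint64_t top = 0xffffffff81000000ULL;

    /* Absolute 32-bit relocations cannot express the high address... */
    assert(!sketch_fits_abs32(high));
    assert(!sketch_fits_abs32s(high));
    /* ...while a sign-extended one works for the top 2 GiB. */
    assert(sketch_fits_abs32s(top));
    /* A nearby PC-relative reference works wherever the code is placed. */
    assert(sketch_fits_pc32(high + 0x100, high));
    assert(sketch_fits_pc32(top + 0x100, top));
}
```

This matches the point that the purgatory's problem is where it is placed,
not how large it is: position-independent references stay in range as long
as the image itself is small.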

> Otherwise, I think the dedicated linker script is the way to go. We
> really want tight control over what is or is not in the purgatory
> image.

Linker scripts are a bit tedious when it comes to maintenance,
especially with weird executables such as this one and needing to
support different linkers. So I'd prefer to avoid this.

Re: [PATCH 0/2] x86/purgatory: Avoid kexec runtime warning with LLVM 18

Posted by Ard Biesheuvel 1 year, 9 months ago

On Thu, 18 Apr 2024 at 17:59, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 18 Apr 2024 at 17:44, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <bp@alien8.de> wrote:
> > > How much of this silliness should we expect now for other parts of the kernel?
> >
> > Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
> > does for the whole kernel. So that LLVM change may have implications
> > for those 2 other architectures.  Not sure we've had any bug reports
> > or breakage in CI yet, like we have for x86+kexec.
> >
> > > Can we turn this off?
> >
> > Maybe we need to revisit
> > commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
> > -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
> >
> > at least the -mcmodel=kernel addition (since that patch added a few
> > additional compiler flags that still LGTM).
> >
> ...
>
> > + Fangrui, Ard, who might know of alternative solutions to
> > -mcmodel=large for e16c2983fba0.
> >
>
> I think it would be better to use -mcmodel=small -fpic. As Nick
> explains, the large code model is really more suitable for executables
> that span a large memory range. The issue with the purgatory seems to
> be that it can be placed anywhere in memory, not that it is very big.
>
> -mcmodel=small -fpic is what user space typically uses, so it is much
> less likely to create problems.
>
> Note that I have been looking into whether we can build the entire
> kernel with -fpic (for various reasons). There are some issues to
> resolve there, mostly related to per-CPU variables and the per-CPU
> stack protector, but beyond that, things work happily and the number
> of boot time relocations drops dramatically, due to the use of
> RIP-relative references. So for the purgatory, I wouldn't expect too
> many surprises.
>

Replacing -mcmodel=large in PURGATORY_CFLAGS with

-mcmodel=small -fpic -fvisibility=hidden

seems to do the trick for me.

Re: [PATCH 0/2] x86/purgatory: Avoid kexec runtime warning with LLVM 18

Posted by Nathan Chancellor 1 year, 9 months ago

On Thu, Apr 18, 2024 at 01:14:35PM +0200, Borislav Petkov wrote:
> On April 17, 2024 11:53:44 PM GMT+02:00, Nathan Chancellor <nathan@kernel.org> wrote:
> >Hi all,
> >
> >After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
> >functions into .ltext with large section flag (#73037)") [1], which
> >landed in the 18.x cycle, there is a runtime warning when loading a
> >kernel via kexec due to the presence of two text sections (.text and
> >.ltext).
> 
> How much of this silliness should we expect now for other parts of the kernel?

Not sure. If I could predict the future, I wouldn't be doing kernel
development :) The only reason the purgatory got bit by that LLVM change
is because it uses '-mcmodel=large', which is not very common within the
kernel (I only see it in arch/um and arch/powerpc other than here).

> Can we turn this off?

No, not as far as I am aware. I suspect it is because this is not an
issue for the majority of programs, so there was no reason to make it
toggleable, but I am not the author of the LLVM change so I cannot say.
However, if a linker script had been the solution when the issue of
multiple text sections was first brought up in 97b6b9cbba40, I would
just be adding '.ltext' and '.lrodata' to the '.text' and '.rodata'
sections in that linker script. So it would be nice to do this now, so
that any future changes are either taken care of automatically by
'.text.*', like '.text.hot' or '.text.<func>' would have been, or
caught by the orphan warnings and addressed in a separate change.
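
The difference between the two cases can be sketched in C with POSIX
fnmatch(), used here only as an approximation of linker script wildcard
matching. The pattern list ('.text' plus '.text.*') is an assumption about
how the text output section would be described, and the sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Returns true if an input section name would be collected by an assumed
 * pattern list of the form ".text .text.*". Anything else would be an
 * orphan unless the script names it explicitly.
 */
static bool sketch_lands_in_text(const char *section)
{
    return fnmatch(".text", section, 0) == 0 ||
           fnmatch(".text.*", section, 0) == 0;
}

/* Self-check using section names mentioned in this thread. */
static void sketch_selfcheck(void)
{
    /* Suffixed variants of .text are picked up by the wildcard. */
    assert(sketch_lands_in_text(".text"));
    assert(sketch_lands_in_text(".text.hot"));
    /* .ltext has a different prefix, so it has to be listed by name. */
    assert(!sketch_lands_in_text(".ltext"));
    assert(!sketch_lands_in_text(".lrodata"));
}
```

Under this model, '.ltext' is exactly the kind of section that orphan
warnings would flag until it is added to the script by name.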

> Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?

Not sure, I can only go off of what is in the commit message of the LLVM
change that introduced this optimization and the surrounding PR
discussion, which just seems to indicate a desire to keep small/medium
and large text separate *shrug*

https://github.com/llvm/llvm-project/commit/d8a04398f9492f043ffd8fbaf2458778f7d0fcd5

https://github.com/llvm/llvm-project/pull/73037

Cheers,
Nathan