Hi all,

After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
functions into .ltext with large section flag (#73037)") [1], which
landed in the 18.x cycle, there is a runtime warning when loading a
kernel via kexec due to the presence of two text sections (.text and
.ltext).

  $ kexec -l /boot/vmlinuz-linux --initrd=/boot/initramfs-linux.img --reuse-cmdline

  $ dmesg -l warn+
  ...
  [ 1.264240] ------------[ cut here ]------------
  [ 1.264647] WARNING: CPU: 0 PID: 96 at kernel/kexec_file.c:945 kexec_load_purgatory+0x2c8/0x3c0
  [ 1.265322] Modules linked in:
  [ 1.265565] CPU: 0 PID: 96 Comm: kexec Not tainted 6.9.0-rc4-00031-g96fca68c4fbf #1 eae91b3fe699ecba2dd0a886471788e49eb36ac0
  [ 1.266403] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  [ 1.267268] RIP: 0010:kexec_load_purgatory+0x2c8/0x3c0
  [ 1.267661] Code: 54 24 0c 48 89 c8 48 29 d0 0f 82 5d ff ff ff 49 03 54 24 1c 48 39 d1 0f 83 4f ff ff ff 49 8b 17 48 39 4a 18 0f 84 30 ff ff ff <0f> 0b e9 3b ff ff ff 66 85 c9 74 18 48 8b 5a 28 48 01 d3 45 31 e4
  [ 1.269052] RSP: 0018:ffffbe28007cfb50 EFLAGS: 00010206
  [ 1.269447] RAX: 0000000000000000 RBX: 00000000000000d0 RCX: 0000000000000000
  [ 1.269982] RDX: ffff988c8174d000 RSI: 0000000000000010 RDI: ffffbe2801d940c0
  [ 1.270527] RBP: 0000000000000002 R08: 0000003d8b4c0000 R09: cc0000000025ff00
  [ 1.271063] R10: 0000003d8b4c0000 R11: cc0000000025ff00 R12: ffffbe28000d5084
  [ 1.271603] R13: 000000013ffff000 R14: ffff988c8174d000 R15: ffffbe28007cfbe0
  [ 1.272140] FS: 00007fec73535740(0000) GS:ffff988cbbc00000(0000) knlGS:0000000000000000
  [ 1.272744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1.273178] CR2: 00007fec736b1390 CR3: 0000000101a24000 CR4: 0000000000350ef0
  [ 1.273732] Call Trace:
  [ 1.273929] <TASK>
  [ 1.274100] ? __warn+0xc9/0x1c0
  [ 1.274356] ? kexec_load_purgatory+0x2c8/0x3c0
  [ 1.274704] ? report_bug+0x139/0x1e0
  [ 1.274998] ? handle_bug+0x42/0x70
  [ 1.275269] ? exc_invalid_op+0x1a/0x50
  [ 1.275574] ? asm_exc_invalid_op+0x1a/0x20
  [ 1.275900] ? kexec_load_purgatory+0x2c8/0x3c0
  [ 1.276251] bzImage64_load+0x1c1/0x6a0
  [ 1.276556] kexec_image_load_default+0x49/0x60
  [ 1.276907] __se_sys_kexec_file_load+0x606/0x790
  [ 1.277280] ? arch_exit_to_user_mode_prepare+0x6e/0x70
  [ 1.277675] do_syscall_64+0x90/0x170
  [ 1.277955] ? srso_return_thunk+0x5/0x5f
  [ 1.278265] ? __count_memcg_events+0x50/0xc0
  [ 1.278597] ? srso_return_thunk+0x5/0x5f
  [ 1.278901] ? handle_mm_fault+0xb18/0x11c0
  [ 1.279218] ? vfs_read+0x2c8/0x2f0
  [ 1.279498] ? srso_return_thunk+0x5/0x5f
  [ 1.279802] ? do_user_addr_fault+0x4d2/0x690
  [ 1.280138] ? srso_return_thunk+0x5/0x5f
  [ 1.280449] ? srso_return_thunk+0x5/0x5f
  [ 1.280755] entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [ 1.281136] RIP: 0033:0x7fec7363e88d
  [ 1.281411] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 14 0d 00 f7 d8 64 89 01 48
  [ 1.282789] RSP: 002b:00007ffd136f4808 EFLAGS: 00000246 ORIG_RAX: 0000000000000140
  [ 1.283354] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fec7363e88d
  [ 1.283893] RDX: 00000000000000c5 RSI: 0000000000000005 RDI: 0000000000000003
  [ 1.284427] RBP: 0000000000000003 R08: 0000000000000000 R09: 00005628517eef10
  [ 1.284966] R10: 00005628580a75f0 R11: 0000000000000246 R12: 0000000000000003
  [ 1.285500] R13: 00005628517f89a8 R14: 00007ffd136f4b98 R15: 0000000000000004
  [ 1.286036] </TASK>
  [ 1.286210] ---[ end trace 0000000000000000 ]---

Unlike LTO and PGO, which were disabled for the purgatory in commit
97b6b9cbba40 ("x86/purgatory: remove PGO flags") and commit
75b2f7e4c9e0 ("x86/purgatory: Remove LTO flags"), this optimization has
no flag to opt out of it.
One way to resolve this would be to use '.ltext' and '.lrodata' as the
text and read-only data sections in the out of line assembly in
arch/x86/purgatory but there is nothing that stops future changes from
splitting the text section further. Properly avoid the warning by using
a linker script to coalesce all separate text sections into one, which
was alluded to by both the change that introduced the warning and
75b2f7e4c9e0... I think this really should have been done then but I
wasn't looking too far ahead :)

To avoid backsliding now that all sections are properly described by the
linker script, turn on orphan section warnings as well.

[1]: https://github.com/llvm/llvm-project/commit/d8a04398f9492f043ffd8fbaf2458778f7d0fcd5

---
Nathan Chancellor (2):
      x86/purgatory: Add a linker script
      x86/purgatory: Enable orphan section warnings

 arch/x86/purgatory/.gitignore      |  1 +
 arch/x86/purgatory/Makefile        | 19 +++---------
 arch/x86/purgatory/purgatory.lds.S | 63 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 69 insertions(+), 14 deletions(-)
---
base-commit: 0bbac3facb5d6cc0171c45c9873a2dc96bea9680
change-id: 20240416-x86-fix-kexec-with-llvm-18-c986b21845c5

Best regards,
-- 
Nathan Chancellor <nathan@kernel.org>
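For readers who want to see why a second text section is enough to trip the warning, here is a rough model in Python. It is not the kernel's code. It assumes that the purgatory is a relocatable object, so every section header carries address 0, and that the loader adjusts the entry point once for each executable section whose range contains it. The `Section` tuple, the function name, and the section sizes are made up for illustration.

```python
from collections import namedtuple

SHF_EXECINSTR = 0x4  # ELF section flag: section holds executable code

# Hypothetical, simplified stand-in for an ELF section header.
Section = namedtuple("Section", "name flags addr size")


def sections_claiming_entry(sections, entry):
    """Return the names of executable sections whose range contains entry."""
    return [
        s.name
        for s in sections
        if s.flags & SHF_EXECINSTR and s.addr <= entry < s.addr + s.size
    ]


# Assumption: in a relocatable object all sections start at address 0,
# so an entry point of 0 lies inside every non-empty executable section.
single_text = [Section(".text", SHF_EXECINSTR, 0, 0x400)]
split_text = [
    Section(".text", SHF_EXECINSTR, 0, 0x40),
    Section(".ltext", SHF_EXECINSTR, 0, 0x3C0),
]

print(sections_claiming_entry(single_text, 0))  # one candidate: fine
print(sections_claiming_entry(split_text, 0))   # two candidates: ambiguous
```

With a single coalesced text section only one section can claim the entry point, which is the property the linker script is meant to restore.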
On April 17, 2024 11:53:44 PM GMT+02:00, Nathan Chancellor <nathan@kernel.org> wrote:
>Hi all,
>
>After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
>functions into .ltext with large section flag (#73037)") [1], which
>landed in the 18.x cycle, there is a runtime warning when loading a
>kernel via kexec due to the presence of two text sections (.text and
>.ltext).

How much of this silliness should we expect now for other parts of the kernel?

Can we turn this off?

Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?

Thx.

-- 
Sent from a small device: formatting sucks and brevity is inevitable.
On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <bp@alien8.de> wrote:
> How much of this silliness should we expect now for other parts of the kernel?

Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
does for the whole kernel. So that LLVM change may have implications
for those 2 other architectures. Not sure we've had any bug reports
or breakage in CI yet, like we have for x86+kexec.

> Can we turn this off?

Maybe we need to revisit
commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
-mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05

at least the -mcmodel=kernel addition (since that patch added a few
additional compiler flags that still LGTM).

> Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?

Google is now at the point where a few binaries running in data centers
are measured in the gigabytes, and attempting to link them may result in
relocation overflows. From that commit message, it sounds like they
link together object files built with the default code model and some
objects from the larger code model. Putting large code model data+code
in distinct sections is helpful for then being able to place those
further away in an object.

For other architectures, the linker may insert a veneer/trampoline. Not
sure why that's not used here.

https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-mlarge-data-threshold
makes it sound like GCC may place data larger than a certain threshold
in a new section. Dunno about code (.text) though.

Arthur, you probably happen to know more about code models at this
point than anyone particularly cares to. The raison d'etre for
e16c2983fba0 was avoiding R_X86_64_32/R_X86_64_32S relocations. Do you
know if there's another code model that can force R_X86_64_64? Or is
the large code model the way to go here, with updates to linker scripts
for this new section?

+ Fangrui, Ard, who might know of alternative solutions to
-mcmodel=large for e16c2983fba0.

Otherwise, I think the dedicated linker script is the way to go. We
really want tight control over what is or is not in the purgatory
image.

-- 
Thanks,
~Nick Desaulniers
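As background on the relocation types mentioned above, the following Python sketch shows the range rules that make R_X86_64_32 and R_X86_64_32S unsuitable for code that may be loaded anywhere in memory. The range checks follow the x86-64 psABI definitions (the value must zero-extend, respectively sign-extend, from 32 bits); the helper names and the three example addresses are made up for illustration.

```python
def to_signed64(value):
    """Reinterpret an unsigned 64-bit address as a signed 64-bit integer."""
    return value - 2**64 if value >= 2**63 else value


def fits_r_x86_64_32(value):
    """R_X86_64_32: the value must zero-extend from 32 bits."""
    return 0 <= value < 2**32


def fits_r_x86_64_32s(value):
    """R_X86_64_32S: the value must sign-extend from 32 bits."""
    return -2**31 <= to_signed64(value) < 2**31


# Made-up symbol addresses for illustration only.
examples = {
    "low memory": 0x0000000000100000,
    "top 2 GiB (-mcmodel=kernel range)": 0xFFFFFFFF81000000,
    "loaded above 4 GiB": 0x000000013FFFF100,
}

for label, addr in examples.items():
    print(f"{label}: 32={fits_r_x86_64_32(addr)} 32S={fits_r_x86_64_32s(addr)}")
```

An address above 4 GiB but outside the top 2 GiB fits neither form, which is why a 64-bit absolute relocation (R_X86_64_64) or a position-independent scheme is needed for the purgatory.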
On Thu, 18 Apr 2024 at 17:44, Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <bp@alien8.de> wrote:
> > How much of this silliness should we expect now for other parts of the kernel?
>
> Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
> does for the whole kernel. So that LLVM change may have implications
> for those 2 other architectures. Not sure we've had any bug reports
> or breakage in CI yet, like we have for x86+kexec.
>
> > Can we turn this off?
>
> Maybe we need to revisit
> commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
> -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
>
> at least the -mcmodel=kernel addition (since that patch added a few
> additional compiler flags that still LGTM).
>
...

> + Fangrui, Ard, who might know of alternative solutions to
> -mcmodel=large for e16c2983fba0.
>

I think it would be better to use -mcmodel=small -fpic. As Nick
explains, the large code model is really more suitable for executables
that span a large memory range. The issue with the purgatory seems to
be that it can be placed anywhere in memory, not that it is very big.

-mcmodel=small -fpic is what user space typically uses, so it is much
less likely to create problems.

Note that I have been looking into whether we can build the entire
kernel with -fpic (for various reasons). There are some issues to
resolve there, mostly related to per-CPU variables and the per-CPU
stack protector, but beyond that, things work happily and the number
of boot time relocations drops dramatically, due to the use of
RIP-relative references. So for the purgatory, I wouldn't expect too
many surprises.

> Otherwise, I think the dedicated linker script is the way to go. We
> really want tight control over what is or is not in the purgatory
> image.

Linker scripts are a bit tedious when it comes to maintenance,
especially with weird executables such as this one and needing to
support different linkers. So I'd prefer to avoid this.
On Thu, 18 Apr 2024 at 17:59, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Thu, 18 Apr 2024 at 17:44, Nick Desaulniers <ndesaulniers@google.com> wrote:
> >
> > On Thu, Apr 18, 2024 at 4:15 AM Borislav Petkov <bp@alien8.de> wrote:
> > > How much of this silliness should we expect now for other parts of the kernel?
> >
> > Looks like ARCH=powerpc sets -mcmodel=large for modules and ARCH=um
> > does for the whole kernel. So that LLVM change may have implications
> > for those 2 other architectures. Not sure we've had any bug reports
> > or breakage in CI yet, like we have for x86+kexec.
> >
> > > Can we turn this off?
> >
> > Maybe we need to revisit
> > commit e16c2983fba0 ("x86/purgatory: Change compiler flags from
> > -mcmodel=kernel to -mcmodel=large to fix kexec relocation errors")
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e16c2983fba0fa6763e43ad10916be35e3d8dc05
> >
> > at least the -mcmodel=kernel addition (since that patch added a few
> > additional compiler flags that still LGTM).
> >
> ...
>
> > + Fangrui, Ard, who might know of alternative solutions to
> > -mcmodel=large for e16c2983fba0.
> >
>
> I think it would be better to use -mcmodel=small -fpic. As Nick
> explains, the large code model is really more suitable for executables
> that span a large memory range. The issue with the purgatory seems to
> be that it can be placed anywhere in memory, not that it is very big.
>
> -mcmodel=small -fpic is what user space typically uses, so it is much
> less likely to create problems.
>
> Note that I have been looking into whether we can build the entire
> kernel with -fpic (for various reasons). There are some issues to
> resolve there, mostly related to per-CPU variables and the per-CPU
> stack protector, but beyond that, things work happily and the number
> of boot time relocations drops dramatically, due to the use of
> RIP-relative references. So for the purgatory, I wouldn't expect too
> many surprises.
>

Replacing -mcmodel=large in PURGATORY_CFLAGS with

  -mcmodel=small -fpic -fvisibility=hidden

seems to do the trick for me.
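To illustrate why RIP-relative references suit an image that "can be placed anywhere in memory", here is a small Python sketch of the PC-relative displacement used by R_X86_64_PC32 (symbol + addend - place, checked against the signed 32-bit range). The function names, offsets, and load bases are hypothetical; the point is only that the displacement does not change when the whole image moves.

```python
def pc32_displacement(symbol, addend, place):
    """Compute an R_X86_64_PC32-style value: S + A - P, as signed 32-bit."""
    disp = symbol + addend - place
    if not -2**31 <= disp < 2**31:
        raise OverflowError("displacement does not fit in 32 bits")
    return disp


def displacement_at(base):
    """Displacement for a made-up reference when the image loads at base."""
    symbol = base + 0x2000   # hypothetical data symbol inside the image
    place = base + 0x0010    # hypothetical location of the 32-bit field
    return pc32_displacement(symbol, -4, place)


low = displacement_at(0x0000000000100000)
high = displacement_at(0x000000013FFFF000)  # above 4 GiB
print(hex(low), hex(high), low == high)
```

Because both the symbol and the referencing instruction move together, the small code model with -fpic needs no load-time fixup for such references, regardless of where the image lands.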
On Thu, Apr 18, 2024 at 01:14:35PM +0200, Borislav Petkov wrote:
> On April 17, 2024 11:53:44 PM GMT+02:00, Nathan Chancellor <nathan@kernel.org> wrote:
> >Hi all,
> >
> >After LLVM commit d8a04398f949 ("Reland [X86] With large code model, put
> >functions into .ltext with large section flag (#73037)") [1], which
> >landed in the 18.x cycle, there is a runtime warning when loading a
> >kernel via kexec due to the presence of two text sections (.text and
> >.ltext).
>
> How much of this silliness should we expect now for other parts of the kernel?

Not sure. If I could predict the future, I wouldn't be doing kernel
development :) The only reason the purgatory got bit by that LLVM change
is because it uses '-mcmodel=large', which is not very common within the
kernel (I only see it in arch/um and arch/powerpc other than here).

> Can we turn this off?

No, not as far as I am aware. I suspect it is because for the majority
of programs, this is not an issue, so it does not justify adding a way
to toggle it, but I am not the author of the LLVM change so I cannot
say.

However, if this had been the solution when the issue of multiple text
sections was first brought up in 97b6b9cbba40, I would just be adding
'.ltext' and '.lrodata' to the '.text' and '.rodata' sections in this
linker script. So it would be nice to do this now, so that any future
changes are either taken care of automatically by '.text.*', like
'.text.hot' or '.text.<func>' would have been, or caught by the orphan
warnings and addressed in a separate change.

> Why does llvm enforce .ltext for large code models and why gcc doesn't do that? Why does llvm need to do that, what requirement dictates that?

Not sure, I can only go off of what is in the commit message of the LLVM
change that introduced this optimization and the surrounding PR
discussion, which just seems to indicate a desire to keep small/medium
and large text separate *shrug*

https://github.com/llvm/llvm-project/commit/d8a04398f9492f043ffd8fbaf2458778f7d0fcd5
https://github.com/llvm/llvm-project/pull/73037

Cheers,
Nathan
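The point about '.text.*' versus orphan warnings can be sketched in Python. This approximates linker-script input section wildcards with `fnmatch`, which is close to, but not the same implementation as, what a linker uses. The pattern list is an assumption, since the thread does not show the contents of purgatory.lds.S, and `classify` is a made-up helper.

```python
from fnmatch import fnmatchcase

# Assumed input-section patterns for the output '.text' section; the real
# linker script is not quoted in this thread.
TEXT_PATTERNS = [".text", ".text.*"]


def classify(section_names, patterns):
    """Split names into those matched by a pattern and the orphans."""
    placed, orphans = [], []
    for name in section_names:
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            placed.append(name)
        else:
            orphans.append(name)
    return placed, orphans


names = [".text", ".text.hot", ".text.startup", ".ltext"]

# With only '.text' and '.text.*', '.ltext' is left over as an orphan.
print(classify(names, TEXT_PATTERNS))

# Listing '.ltext' explicitly places everything.
print(classify(names, TEXT_PATTERNS + [".ltext"]))
```

Sections that follow the '.text.<suffix>' convention are picked up without any script change, while a new name such as '.ltext' shows up as an orphan and has to be added deliberately, which is the behaviour the orphan section warnings are meant to surface.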