From: Ard Biesheuvel <ardb@kernel.org>
Build the kernel as a Position Independent Executable (PIE). This
results in more efficient relocation processing for the virtual
displacement of the kernel (for KASLR). More importantly, it instructs
the linker to generate what is actually needed (a program that can be
moved around in memory before execution), which is better than having to
rely on the linker to create a position dependent binary that happens to
tolerate being moved around after poking it in exactly the right manner.
Note that this means that all codegen should be compatible with PIE,
including Rust objects, so this needs to switch to the small code model
with the PIE relocation model as well.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/Kconfig | 2 +-
arch/x86/Makefile | 11 +++++++----
arch/x86/boot/compressed/misc.c | 2 ++
arch/x86/kernel/vmlinux.lds.S | 5 +++++
drivers/firmware/efi/libstub/x86-stub.c | 2 ++
5 files changed, 17 insertions(+), 5 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 54cb1f14218b..dbb4d284b0e1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE
# Relocation on x86 needs some additional build support
config X86_NEED_RELOCS
def_bool y
- depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE)
+ depends on X86_32 && RELOCATABLE
config PHYSICAL_ALIGN
hex "Alignment value to which kernel should be aligned"
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 83d20f402535..c1dcff444bc8 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -206,9 +206,8 @@ else
PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs
endif
- # Don't emit relaxable GOTPCREL relocations
- KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no
- KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y)
+ KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y)
+ KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie
endif
#
@@ -264,12 +263,16 @@ else
LDFLAGS_vmlinux :=
endif
+ifdef CONFIG_X86_64
+ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs
+ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop
+LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y)
+
#
# The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to
# the linker to force 2MB page size regardless of the default page size used
# by the linker.
#
-ifdef CONFIG_X86_64
LDFLAGS_vmlinux += -z max-page-size=0x200000
endif
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 89f01375cdb7..79e3ffe16f61 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output)
error("Destination virtual address changed when not relocatable");
#endif
+ boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
+
debug_putstr("\nDecompressing Linux... ");
if (init_unaccepted_memory()) {
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index f7e832c2ac61..d172e6e8eaaf 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset =
DISCARDS
+ /DISCARD/ : {
+ *(.dynsym .gnu.hash .hash .dynamic .dynstr)
+ *(.interp .dynbss .eh_frame .sframe)
+ }
+
/*
* Make sure that the .got.plt is either completely empty or it
* contains only the lazy dispatch entries.
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index f8e465da344d..5c03954924fe 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
if (status != EFI_SUCCESS)
return status;
+ boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR;
+
entry = decompress_kernel((void *)addr, virt_addr, error);
if (entry == ULONG_MAX) {
efi_free(alloc_size, addr);
--
2.46.0.792.g87dc391469-goog
On 25/09/2024 17:01, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Build the kernel as a Position Independent Executable (PIE). This > results in more efficient relocation processing for the virtual > displacement of the kernel (for KASLR). More importantly, it instructs > the linker to generate what is actually needed (a program that can be > moved around in memory before execution), which is better than having to > rely on the linker to create a position dependent binary that happens to > tolerate being moved around after poking it in exactly the right manner. > > Note that this means that all codegen should be compatible with PIE, > including Rust objects, so this needs to switch to the small code model > with the PIE relocation model as well. > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/x86/Kconfig | 2 +- > arch/x86/Makefile | 11 +++++++---- > arch/x86/boot/compressed/misc.c | 2 ++ > arch/x86/kernel/vmlinux.lds.S | 5 +++++ > drivers/firmware/efi/libstub/x86-stub.c | 2 ++ > 5 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 54cb1f14218b..dbb4d284b0e1 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE > # Relocation on x86 needs some additional build support > config X86_NEED_RELOCS > def_bool y > - depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) > + depends on X86_32 && RELOCATABLE > > config PHYSICAL_ALIGN > hex "Alignment value to which kernel should be aligned" > diff --git a/arch/x86/Makefile b/arch/x86/Makefile > index 83d20f402535..c1dcff444bc8 100644 > --- a/arch/x86/Makefile > +++ b/arch/x86/Makefile > @@ -206,9 +206,8 @@ else > PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs > endif > > - # Don't emit relaxable GOTPCREL relocations > - KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no > - KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y) > + KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y) > + KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie > endif > > # > @@ -264,12 +263,16 @@ else > LDFLAGS_vmlinux := > endif > > +ifdef CONFIG_X86_64 > +ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs > +ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop > +LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y) > + > # > # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to > # the linker to force 2MB page size regardless of the default page size used > # by the linker. > # > -ifdef CONFIG_X86_64 > LDFLAGS_vmlinux += -z max-page-size=0x200000 > endif > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c > index 89f01375cdb7..79e3ffe16f61 100644 > --- a/arch/x86/boot/compressed/misc.c > +++ b/arch/x86/boot/compressed/misc.c > @@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output) > error("Destination virtual address changed when not relocatable"); > #endif > > + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; > + > debug_putstr("\nDecompressing Linux... "); > > if (init_unaccepted_memory()) { > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index f7e832c2ac61..d172e6e8eaaf 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset = > > DISCARDS > > + /DISCARD/ : { > + *(.dynsym .gnu.hash .hash .dynamic .dynstr) > + *(.interp .dynbss .eh_frame .sframe) > + } > + > /* > * Make sure that the .got.plt is either completely empty or it > * contains only the lazy dispatch entries. > diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c > index f8e465da344d..5c03954924fe 100644 > --- a/drivers/firmware/efi/libstub/x86-stub.c > +++ b/drivers/firmware/efi/libstub/x86-stub.c > @@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry) > if (status != EFI_SUCCESS) > return status; > > + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; > + > entry = decompress_kernel((void *)addr, virt_addr, error); > if (entry == ULONG_MAX) { > efi_free(alloc_size, addr); This patch causes a build failure here (on 64-bit): LD .tmp_vmlinux2 NM .tmp_vmlinux2.syms KSYMS .tmp_vmlinux2.kallsyms.S AS .tmp_vmlinux2.kallsyms.o LD vmlinux BTFIDS vmlinux WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free FAILED elf_update(WRITE): invalid section entry size make[5]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 255 make[5]: *** Deleting file 'vmlinux' make[4]: *** [Makefile:1153: vmlinux] Error 2 make[3]: *** [debian/rules:74: build-arch] Error 2 dpkg-buildpackage: error: make -f debian/rules binary subprocess returned exit status 2 make[2]: *** [scripts/Makefile.package:121: bindeb-pkg] Error 2 make[1]: *** [/home/opc/linux-mainline-worktree2/Makefile:1544: bindeb-pkg] Error 2 make: *** [Makefile:224: __sub-make] Error 2 The parent commit builds fine. With V=1: + ldflags='-m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map' + ld -m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map -o vmlinux --whole-archive vmlinux.a .vmlinux.export.o init/version-timestamp.o --no-whole-archive --start-group --end-group .tmp_vmlinux2.kallsyms.o .tmp_vmlinux1.btf.o + is_enabled CONFIG_DEBUG_INFO_BTF + grep -q '^CONFIG_DEBUG_INFO_BTF=y' include/config/auto.conf + info BTFIDS vmlinux + printf ' %-7s %s\n' BTFIDS vmlinux BTFIDS vmlinux + ./tools/bpf/resolve_btfids/resolve_btfids vmlinux WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free FAILED elf_update(WRITE): invalid section entry size I can send the full config off-list if necessary, but looks like it might be enough to set CONFIG_DEBUG_INFO_BTF=y. Vegard
On Wed, 25 Sept 2024 at 22:25, Vegard Nossum <vegard.nossum@oracle.com> wrote: > > > On 25/09/2024 17:01, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Build the kernel as a Position Independent Executable (PIE). This > > results in more efficient relocation processing for the virtual > > displacement of the kernel (for KASLR). More importantly, it instructs > > the linker to generate what is actually needed (a program that can be > > moved around in memory before execution), which is better than having to > > rely on the linker to create a position dependent binary that happens to > > tolerate being moved around after poking it in exactly the right manner. > > > > Note that this means that all codegen should be compatible with PIE, > > including Rust objects, so this needs to switch to the small code model > > with the PIE relocation model as well. > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > --- > > arch/x86/Kconfig | 2 +- > > arch/x86/Makefile | 11 +++++++---- > > arch/x86/boot/compressed/misc.c | 2 ++ > > arch/x86/kernel/vmlinux.lds.S | 5 +++++ > > drivers/firmware/efi/libstub/x86-stub.c | 2 ++ > > 5 files changed, 17 insertions(+), 5 deletions(-) > > ... > > This patch causes a build failure here (on 64-bit): > > LD .tmp_vmlinux2 > NM .tmp_vmlinux2.syms > KSYMS .tmp_vmlinux2.kallsyms.S > AS .tmp_vmlinux2.kallsyms.o > LD vmlinux > BTFIDS vmlinux > WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free > FAILED elf_update(WRITE): invalid section entry size > make[5]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 255 > make[5]: *** Deleting file 'vmlinux' > make[4]: *** [Makefile:1153: vmlinux] Error 2 > make[3]: *** [debian/rules:74: build-arch] Error 2 > dpkg-buildpackage: error: make -f debian/rules binary subprocess > returned exit status 2 > make[2]: *** [scripts/Makefile.package:121: bindeb-pkg] Error 2 > make[1]: *** [/home/opc/linux-mainline-worktree2/Makefile:1544: > bindeb-pkg] Error 2 > make: *** [Makefile:224: __sub-make] Error 2 > > The parent commit builds fine. With V=1: > > + ldflags='-m elf_x86_64 -z noexecstack --pie -z text -z > call-nop=suffix-nop -z max-page-size=0x200000 --build-id=sha1 > --orphan-handling=warn --script=./arch/x86/kernel/vmlinux.lds > -Map=vmlinux.map' > + ld -m elf_x86_64 -z noexecstack --pie -z text -z call-nop=suffix-nop > -z max-page-size=0x200000 --build-id=sha1 --orphan-handling=warn > --script=./arch/x86/kernel/vmlinux.lds -Map=vmlinux.map -o vmlinux > --whole-archive vmlinux.a .vmlinux.export.o init/version-timestamp.o > --no-whole-archive --start-group --end-group .tmp_vmlinux2.kallsyms.o > .tmp_vmlinux1.btf.o > + is_enabled CONFIG_DEBUG_INFO_BTF > + grep -q '^CONFIG_DEBUG_INFO_BTF=y' include/config/auto.conf > + info BTFIDS vmlinux > + printf ' %-7s %s\n' BTFIDS vmlinux > BTFIDS vmlinux > + ./tools/bpf/resolve_btfids/resolve_btfids vmlinux > WARN: resolve_btfids: unresolved symbol bpf_lsm_key_free > FAILED elf_update(WRITE): invalid section entry size > > I can send the full config off-list if necessary, but looks like it > might be enough to set CONFIG_DEBUG_INFO_BTF=y. > Thanks for the report. Turns out that adding the GOT to .rodata bumps the section's sh_entsize to 8, and libelf complains if the section size is not a multiple of the entry size. I'll include a fix in the next revision.
On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > Build the kernel as a Position Independent Executable (PIE). This > results in more efficient relocation processing for the virtual > displacement of the kernel (for KASLR). More importantly, it instructs > the linker to generate what is actually needed (a program that can be > moved around in memory before execution), which is better than having to > rely on the linker to create a position dependent binary that happens to > tolerate being moved around after poking it in exactly the right manner. > > Note that this means that all codegen should be compatible with PIE, > including Rust objects, so this needs to switch to the small code model > with the PIE relocation model as well. I think that related to this work is the patch series [1] that introduces the changes necessary to build the kernel as Position Independent Executable (PIE) on x86_64 [1]. There are some more places that need to be adapted for PIE. The patch series also introduces objtool functionality to add validation for x86 PIE. [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ Uros. > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > --- > arch/x86/Kconfig | 2 +- > arch/x86/Makefile | 11 +++++++---- > arch/x86/boot/compressed/misc.c | 2 ++ > arch/x86/kernel/vmlinux.lds.S | 5 +++++ > drivers/firmware/efi/libstub/x86-stub.c | 2 ++ > 5 files changed, 17 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 54cb1f14218b..dbb4d284b0e1 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -2187,7 +2187,7 @@ config RANDOMIZE_BASE > # Relocation on x86 needs some additional build support > config X86_NEED_RELOCS > def_bool y > - depends on RANDOMIZE_BASE || (X86_32 && RELOCATABLE) > + depends on X86_32 && RELOCATABLE > > config PHYSICAL_ALIGN > hex "Alignment value to which kernel should be aligned" > diff --git a/arch/x86/Makefile b/arch/x86/Makefile > index 83d20f402535..c1dcff444bc8 100644 > --- a/arch/x86/Makefile > +++ b/arch/x86/Makefile > @@ -206,9 +206,8 @@ else > PIE_CFLAGS-$(CONFIG_SMP) += -mstack-protector-guard-reg=gs > endif > > - # Don't emit relaxable GOTPCREL relocations > - KBUILD_AFLAGS_KERNEL += -Wa,-mrelax-relocations=no > - KBUILD_CFLAGS_KERNEL += -Wa,-mrelax-relocations=no $(PIE_CFLAGS-y) > + KBUILD_CFLAGS_KERNEL += $(PIE_CFLAGS-y) > + KBUILD_RUSTFLAGS_KERNEL += -Ccode-model=small -Crelocation-model=pie > endif > > # > @@ -264,12 +263,16 @@ else > LDFLAGS_vmlinux := > endif > > +ifdef CONFIG_X86_64 > +ldflags-pie-$(CONFIG_LD_IS_LLD) := --apply-dynamic-relocs > +ldflags-pie-$(CONFIG_LD_IS_BFD) := -z call-nop=suffix-nop > +LDFLAGS_vmlinux += --pie -z text $(ldflags-pie-y) > + > # > # The 64-bit kernel must be aligned to 2MB. Pass -z max-page-size=0x200000 to > # the linker to force 2MB page size regardless of the default page size used > # by the linker. > # > -ifdef CONFIG_X86_64 > LDFLAGS_vmlinux += -z max-page-size=0x200000 > endif > > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c > index 89f01375cdb7..79e3ffe16f61 100644 > --- a/arch/x86/boot/compressed/misc.c > +++ b/arch/x86/boot/compressed/misc.c > @@ -495,6 +495,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, unsigned char *output) > error("Destination virtual address changed when not relocatable"); > #endif > > + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; > + > debug_putstr("\nDecompressing Linux... "); > > if (init_unaccepted_memory()) { > diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S > index f7e832c2ac61..d172e6e8eaaf 100644 > --- a/arch/x86/kernel/vmlinux.lds.S > +++ b/arch/x86/kernel/vmlinux.lds.S > @@ -459,6 +459,11 @@ xen_elfnote_phys32_entry_offset = > > DISCARDS > > + /DISCARD/ : { > + *(.dynsym .gnu.hash .hash .dynamic .dynstr) > + *(.interp .dynbss .eh_frame .sframe) > + } > + > /* > * Make sure that the .got.plt is either completely empty or it > * contains only the lazy dispatch entries. > diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c > index f8e465da344d..5c03954924fe 100644 > --- a/drivers/firmware/efi/libstub/x86-stub.c > +++ b/drivers/firmware/efi/libstub/x86-stub.c > @@ -912,6 +912,8 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry) > if (status != EFI_SUCCESS) > return status; > > + boot_params_ptr->kaslr_va_shift = virt_addr - LOAD_PHYSICAL_ADDR; > + > entry = decompress_kernel((void *)addr, virt_addr, error); > if (entry == ULONG_MAX) { > efi_free(alloc_size, addr); > -- > 2.46.0.792.g87dc391469-goog >
On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Build the kernel as a Position Independent Executable (PIE). This > > results in more efficient relocation processing for the virtual > > displacement of the kernel (for KASLR). More importantly, it instructs > > the linker to generate what is actually needed (a program that can be > > moved around in memory before execution), which is better than having to > > rely on the linker to create a position dependent binary that happens to > > tolerate being moved around after poking it in exactly the right manner. > > > > Note that this means that all codegen should be compatible with PIE, > > including Rust objects, so this needs to switch to the small code model > > with the PIE relocation model as well. > > I think that related to this work is the patch series [1] that > introduces the changes necessary to build the kernel as Position > Independent Executable (PIE) on x86_64 [1]. There are some more places > that need to be adapted for PIE. The patch series also introduces > objtool functionality to add validation for x86 PIE. > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ > Hi Uros, I am aware of that discussion, as I took part in it as well. I don't think any of those changes are actually needed now - did you notice anything in particular that is missing?
On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > > > Build the kernel as a Position Independent Executable (PIE). This > > > results in more efficient relocation processing for the virtual > > > displacement of the kernel (for KASLR). More importantly, it instructs > > > the linker to generate what is actually needed (a program that can be > > > moved around in memory before execution), which is better than having to > > > rely on the linker to create a position dependent binary that happens to > > > tolerate being moved around after poking it in exactly the right manner. > > > > > > Note that this means that all codegen should be compatible with PIE, > > > including Rust objects, so this needs to switch to the small code model > > > with the PIE relocation model as well. > > > > I think that related to this work is the patch series [1] that > > introduces the changes necessary to build the kernel as Position > > Independent Executable (PIE) on x86_64 [1]. There are some more places > > that need to be adapted for PIE. The patch series also introduces > > objtool functionality to add validation for x86 PIE. > > > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" > > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ > > > > Hi Uros, > > I am aware of that discussion, as I took part in it as well. > > I don't think any of those changes are actually needed now - did you > notice anything in particular that is missing? Some time ago I went through the kernel sources and proposed several patches that changed all trivial occurrences of non-RIP addresses to RIP ones. The work was partially based on the mentioned patch series, and I remember, I left some of them out [e.g. 1], because they required a temporary variable. Also, there was discussion about ftrace [2], where no solution was found. Looking through your series, I didn't find some of the non-RIP -> RIP changes proposed by the original series (especially the ftrace part), and noticed that there is no objtool validator proposed to ensure that all generated code is indeed PIE compatible. Speaking of non-RIP -> RIP changes that require a temporary - would it be beneficial to make a macro that would use the RIP form only when #ifdef CONFIG_X86_PIE? That would avoid code size increase when PIE is not needed. [1] https://lore.kernel.org/lkml/a0b69f3fac1834c05f960b916cc6eb0004cdffbf.1682673543.git.houwenlong.hwl@antgroup.com/ [2] https://lore.kernel.org/lkml/20230428094454.0f2f5049@gandalf.local.home/ [3] https://lore.kernel.org/lkml/226af8c63c5bfa361763dd041a997ee84fe926cf.1682673543.git.houwenlong.hwl@antgroup.com/ Thanks and best regards, Uros.
On Wed, 25 Sept 2024 at 21:39, Uros Bizjak <ubizjak@gmail.com> wrote: > > On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > > > > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > > > > > Build the kernel as a Position Independent Executable (PIE). This > > > > results in more efficient relocation processing for the virtual > > > > displacement of the kernel (for KASLR). More importantly, it instructs > > > > the linker to generate what is actually needed (a program that can be > > > > moved around in memory before execution), which is better than having to > > > > rely on the linker to create a position dependent binary that happens to > > > > tolerate being moved around after poking it in exactly the right manner. > > > > > > > > Note that this means that all codegen should be compatible with PIE, > > > > including Rust objects, so this needs to switch to the small code model > > > > with the PIE relocation model as well. > > > > > > I think that related to this work is the patch series [1] that > > > introduces the changes necessary to build the kernel as Position > > > Independent Executable (PIE) on x86_64 [1]. There are some more places > > > that need to be adapted for PIE. The patch series also introduces > > > objtool functionality to add validation for x86 PIE. > > > > > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" > > > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ > > > > > > > Hi Uros, > > > > I am aware of that discussion, as I took part in it as well. > > > > I don't think any of those changes are actually needed now - did you > > notice anything in particular that is missing? > > Some time ago I went through the kernel sources and proposed several > patches that changed all trivial occurrences of non-RIP addresses to > RIP ones. The work was partially based on the mentioned patch series, > and I remember, I left some of them out [e.g. 1], because they > required a temporary variable. I have a similar patch in my series, but the DEBUG_ENTRY code just uses pushf 1f@GOTPCREL(%rip) so no temporaries are needed. > Also, there was discussion about ftrace > [2], where no solution was found. > When linking with -z call-nop=suffix-nop, the __fentry__ call via the GOT will be relaxed by the linker into a 5 byte call followed by a 1 byte NOP, so I don't think we need to do anything special here. It might mean we currently lose -mnop-mcount until we find a solution for that in the compiler. In case you remember, I contributed and you merged a GCC patch that makes the __fentry__ emission logic honour -fdirect-access-external-data which should help here. This landed in GCC 14. > Looking through your series, I didn't find some of the non-RIP -> RIP > changes proposed by the original series (especially the ftrace part), > and noticed that there is no objtool validator proposed to ensure that > all generated code is indeed PIE compatible. > What would be the point of that? The linker will complain and throw an error if the code cannot be converted into a PIE executable, so I don't think we need objtool's help for that. > Speaking of non-RIP -> RIP changes that require a temporary - would it > be beneficial to make a macro that would use the RIP form only when > #ifdef CONFIG_X86_PIE? That would avoid code size increase when PIE is > not needed. > This series does not make the PIE support configurable. Do you think the code size increase is a concern if all GOT based symbol references are elided, e.g, via -fdirect-access-external-data?
On Wed, Sep 25, 2024 at 10:01 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Wed, 25 Sept 2024 at 21:39, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > On Wed, Sep 25, 2024 at 9:14 PM Ard Biesheuvel <ardb@kernel.org> wrote: > > > > > > On Wed, 25 Sept 2024 at 20:54, Uros Bizjak <ubizjak@gmail.com> wrote: > > > > > > > > On Wed, Sep 25, 2024 at 5:02 PM Ard Biesheuvel <ardb+git@google.com> wrote: > > > > > > > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > > > > > > > Build the kernel as a Position Independent Executable (PIE). This > > > > > results in more efficient relocation processing for the virtual > > > > > displacement of the kernel (for KASLR). More importantly, it instructs > > > > > the linker to generate what is actually needed (a program that can be > > > > > moved around in memory before execution), which is better than having to > > > > > rely on the linker to create a position dependent binary that happens to > > > > > tolerate being moved around after poking it in exactly the right manner. > > > > > > > > > > Note that this means that all codegen should be compatible with PIE, > > > > > including Rust objects, so this needs to switch to the small code model > > > > > with the PIE relocation model as well. > > > > > > > > I think that related to this work is the patch series [1] that > > > > introduces the changes necessary to build the kernel as Position > > > > Independent Executable (PIE) on x86_64 [1]. There are some more places > > > > that need to be adapted for PIE. The patch series also introduces > > > > objtool functionality to add validation for x86 PIE. > > > > > > > > [1] "[PATCH RFC 00/43] x86/pie: Make kernel image's virtual address flexible" > > > > https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/ > > > > > > > > > > Hi Uros, > > > > > > I am aware of that discussion, as I took part in it as well. > > > > > > I don't think any of those changes are actually needed now - did you > > > notice anything in particular that is missing? > > > > Some time ago I went through the kernel sources and proposed several > > patches that changed all trivial occurrences of non-RIP addresses to > > RIP ones. The work was partially based on the mentioned patch series, > > and I remember, I left some of them out [e.g. 1], because they > > required a temporary variable. > > I have a similar patch in my series, but the DEBUG_ENTRY code just uses > > pushf 1f@GOTPCREL(%rip) > > so no temporaries are needed. > > > Also, there was discussion about ftrace > > [2], where no solution was found. > > > > When linking with -z call-nop=suffix-nop, the __fentry__ call via the > GOT will be relaxed by the linker into a 5 byte call followed by a 1 > byte NOP, so I don't think we need to do anything special here. It > might mean we currently lose -mnop-mcount until we find a solution for > that in the compiler. In case you remember, I contributed and you > merged a GCC patch that makes the __fentry__ emission logic honour > -fdirect-access-external-data which should help here. This landed in > GCC 14. > > > Looking through your series, I didn't find some of the non-RIP -> RIP > > changes proposed by the original series (especially the ftrace part), > > and noticed that there is no objtool validator proposed to ensure that > > all generated code is indeed PIE compatible. > > > > What would be the point of that? The linker will complain and throw an > error if the code cannot be converted into a PIE executable, so I > don't think we need objtool's help for that. Indeed. > > Speaking of non-RIP -> RIP changes that require a temporary - would it > > be beneficial to make a macro that would use the RIP form only when > > #ifdef CONFIG_X86_PIE? That would avoid code size increase when PIE is > > not needed. > > > > This series does not make the PIE support configurable. Do you think > the code size increase is a concern if all GOT based symbol references > are elided, e.g, via -fdirect-access-external-data? I was looking at the code size measurement of the original patch series (perhaps these are not relevant with your series) and I think 2.2% - 2.4% code size increase can be problematic. Can you perhaps provide new code size increase measurements with your patches applied? Thanks and BR, Uros.
© 2016 - 2024 Red Hat, Inc.