From nobody Mon Feb  9 16:12:32 2026
From: "Hou Wenlong" <houwenlong.hwl@antgroup.com>
To: linux-kernel@vger.kernel.org
Cc: "Lai Jiangshan", "Hou Wenlong", "Thomas Gleixner", "Ingo Molnar",
  "Borislav Petkov", "Dave Hansen",
  "maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT", "H. Peter Anvin",
  "Nathan Chancellor", "Nick Desaulniers", "Tom Rix", "Josh Poimboeuf",
  "Peter Zijlstra (Intel)", "Brian Gerst", "Eric W.
  Biederman", "Masami Hiramatsu (Google)", "Masahiro Yamada",
  "Sami Tolvanen", "Alexander Potapenko", "Mike Rapoport",
  "Pasha Tatashin", "David Woodhouse", "Usama Arif", "Tom Lendacky",
  "open list:CLANG/LLVM BUILD SUPPORT"
Subject: [PATCH RFC 5/7] x86/head/64: Build the head code as PIE
Date: Wed, 12 Jul 2023 11:30:09 +0800
Message-Id: <12bb41e38979f68760a98be3b25cc66cb955e559.1689130310.git.houwenlong.hwl@antgroup.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To:
References:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

During the early boot stage, the head code runs at a low identity-mapped
address, so any absolute reference there would be incorrect. However, the
compiler is not obliged to generate PC-relative references when accessing
globals, so every global variable access has to be adjusted with
fixup_pointer(). Worse, whether an absolute reference is emitted for a
given global access differs between GCC and Clang. For example, GCC
generates a PC-relative reference for 'next_early_pgt', while Clang
generates an absolute one. Moreover, the rule is not always clear:
'pgdir_shift' is a non-static global variable just like 'next_early_pgt',
yet the compiler happens to generate the correct PC-relative reference, so
fixup_pointer() is not applied when the PGDIR_SHIFT macro is used.

To avoid such cases, build the head code as PIE to force the generation of
PC-relative references and eliminate the need for fixup_pointer(). Where
the head code still needs the absolute virtual address of a symbol, use
'movabsq'. Additionally, the 'mcmodel=kernel' option is not compatible
with '-fPIE' and needs to be removed.
This will result in using the '%fs' register for stack canary access if
the stack protector is enabled. The 'mstack-protector-guard-reg' compiler
option could be used to fix this, but it is currently only wired up for
32-bit, while 64-bit still uses the fixed-location version. Since the head
code runs in the early boot stage, it is safe to simply disable the stack
protector for it.

Suggested-by: Lai Jiangshan
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
---
 arch/x86/include/asm/setup.h      |   2 +-
 arch/x86/kernel/Makefile          |  11 +++
 arch/x86/kernel/head64_identity.c | 112 +++++++++++------------------
 arch/x86/kernel/head_64.S         |   2 -
 4 files changed, 52 insertions(+), 75 deletions(-)

diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index f3495623ac99..b893b0cdddac 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -50,7 +50,7 @@ extern unsigned long saved_video_mode;
 extern void reserve_standard_io_resources(void);
 extern void i386_reserve_resources(void);
 extern unsigned long __startup_64(unsigned long physaddr, struct boot_params *bp);
-extern void startup_64_setup_env(unsigned long physbase);
+extern void startup_64_setup_env(void);
 extern void early_setup_idt(void);
 extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
 
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 2fd9a4fe27b1..6564113f5298 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -44,6 +44,17 @@ KCOV_INSTRUMENT := n
 
 CFLAGS_irq.o := -I $(srctree)/$(src)/../include/asm/trace
 
+# The 'mcmodel=kernel' option is not compatible with the 'fPIE' option and
+# needs to be removed. This will result in using the '%fs' register for stack
+# canary access if the stack protector is enabled. The
+# 'mstack-protector-guard-reg' compiler option could be used to fix this, but
+# it is only used in 32-bit now and 64-bit is still using the fixed location
+# version.
+# Since the head code is in the early booting stage, it is safe to disable
+# the stack protector for the head code.
+CFLAGS_REMOVE_head64_identity.o += -mcmodel=kernel
+CFLAGS_head64_identity.o += -fPIE -include $(srctree)/include/linux/hidden.h
+CFLAGS_head64_identity.o += -fno-stack-protector
+
 obj-y			+= head_$(BITS).o
 obj-y			+= head$(BITS).o
 obj-y			+= ebda.o
diff --git a/arch/x86/kernel/head64_identity.c b/arch/x86/kernel/head64_identity.c
index a10acbe00fe9..93f5831917bc 100644
--- a/arch/x86/kernel/head64_identity.c
+++ b/arch/x86/kernel/head64_identity.c
@@ -45,23 +45,8 @@ static struct desc_ptr startup_gdt_descr __initdata = {
 
 #define __head	__section(".head.text")
 
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
-	return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
 #ifdef CONFIG_X86_5LEVEL
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
-
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __head check_la57_support(void)
 {
 	/*
 	 * 5-level paging is detected and enabled at kernel decompression
@@ -70,22 +55,27 @@ static bool __head check_la57_support(unsigned long physaddr)
 	if (!(native_read_cr4() & X86_CR4_LA57))
 		return false;
 
-	*fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
-	*fixup_int(&pgdir_shift, physaddr) = 48;
-	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
-	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
-	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
-	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
+	__pgtable_l5_enabled = 1;
+	pgdir_shift = 48;
+	ptrs_per_p4d = 512;
+	page_offset_base = __PAGE_OFFSET_BASE_L5;
+	vmalloc_base = __VMALLOC_BASE_L5;
+	vmemmap_base = __VMEMMAP_BASE_L5;
 
 	return true;
 }
 #else
-static bool __head check_la57_support(unsigned long physaddr)
+static bool __head check_la57_support(void)
 {
 	return false;
 }
 #endif
 
+#define SYM_ABS_VA(sym) ({					\
+	unsigned long __v;					\
+	asm("movabsq $" __stringify(sym) ", %0" : "=r"(__v));	\
+	__v; })
+
 static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
 {
 	unsigned long vaddr, vaddr_end;
@@ -101,8 +91,8 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	 * attribute.
 	 */
 	if (sme_get_me_mask()) {
-		vaddr = (unsigned long)__start_bss_decrypted;
-		vaddr_end = (unsigned long)__end_bss_decrypted;
+		vaddr = SYM_ABS_VA(__start_bss_decrypted);
+		vaddr_end = SYM_ABS_VA(__end_bss_decrypted);
 
 		for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
 			/*
@@ -129,12 +119,6 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
 	return sme_get_me_mask();
 }
 
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
- */
 unsigned long __head __startup_64(unsigned long physaddr,
 				  struct boot_params *bp)
 {
@@ -144,12 +128,10 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	p4dval_t *p4d;
 	pudval_t *pud;
 	pmdval_t *pmd, pmd_entry;
-	pteval_t *mask_ptr;
 	bool la57;
 	int i;
-	unsigned int *next_pgt_ptr;
 
-	la57 = check_la57_support(physaddr);
+	la57 = check_la57_support();
 
 	/* Is the address too large? */
 	if (physaddr >> MAX_PHYSMEM_BITS)
@@ -159,7 +141,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Compute the delta between the address I am compiled to run at
 	 * and the address I am actually running at.
 	 */
-	load_delta = physaddr - (unsigned long)(_text - __START_KERNEL_map);
+	load_delta = physaddr - (SYM_ABS_VA(_text) - __START_KERNEL_map);
 
 	/* Is the address not 2M aligned? */
 	if (load_delta & ~PMD_MASK)
@@ -170,26 +152,24 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	/* Fixup the physical addresses in the page table */
 
-	pgd = fixup_pointer(&early_top_pgt, physaddr);
+	pgd = (pgdval_t *)early_top_pgt;
 	p = pgd + pgd_index(__START_KERNEL_map);
 	if (la57)
 		*p = (unsigned long)level4_kernel_pgt;
 	else
 		*p = (unsigned long)level3_kernel_pgt;
-	*p += _PAGE_TABLE_NOENC - __START_KERNEL_map + load_delta;
+	*p += _PAGE_TABLE_NOENC + sme_get_me_mask();
 
 	if (la57) {
-		p4d = fixup_pointer(&level4_kernel_pgt, physaddr);
+		p4d = (p4dval_t *)level4_kernel_pgt;
 		p4d[511] += load_delta;
 	}
 
-	pud = fixup_pointer(&level3_kernel_pgt, physaddr);
-	pud[510] += load_delta;
-	pud[511] += load_delta;
+	level3_kernel_pgt[510].pud += load_delta;
+	level3_kernel_pgt[511].pud += load_delta;
 
-	pmd = fixup_pointer(level2_fixmap_pgt, physaddr);
 	for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-		pmd[i] += load_delta;
+		level2_fixmap_pgt[i].pmd += load_delta;
 
 	/*
 	 * Set up the identity mapping for the switchover.  These
@@ -198,15 +178,13 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * it avoids problems around wraparound.
 	 */
 
-	next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
-	pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
-	pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
+	pud = (pudval_t *)early_dynamic_pgts[next_early_pgt++];
+	pmd = (pmdval_t *)early_dynamic_pgts[next_early_pgt++];
 
 	pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
 
 	if (la57) {
-		p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
-				    physaddr);
+		p4d = (p4dval_t *)early_dynamic_pgts[next_early_pgt++];
 
 		i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
 		pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
@@ -227,8 +205,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
 	/* Filter out unsupported __PAGE_KERNEL_* bits: */
-	mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
-	pmd_entry &= *mask_ptr;
+	pmd_entry &= __supported_pte_mask;
 	pmd_entry += sme_get_me_mask();
 	pmd_entry += physaddr;
 
@@ -253,15 +230,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * speculative access to some reserved areas is caught as an
 	 * error, causing the BIOS to halt the system.
 	 */
-
-	pmd = fixup_pointer(level2_kernel_pgt, physaddr);
+	pmd = (pmdval_t *)level2_kernel_pgt;
 
 	/* invalidate pages before the kernel image */
-	for (i = 0; i < pmd_index((unsigned long)_text); i++)
+	for (i = 0; i < pmd_index(SYM_ABS_VA(_text)); i++)
 		pmd[i] &= ~_PAGE_PRESENT;
 
 	/* fixup pages that are part of the kernel image */
-	for (; i <= pmd_index((unsigned long)_end); i++)
+	for (; i <= pmd_index(SYM_ABS_VA(_end)); i++)
 		if (pmd[i] & _PAGE_PRESENT)
 			pmd[i] += load_delta;
 
@@ -273,37 +249,29 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	 * Fixup phys_base - remove the memory encryption mask to obtain
 	 * the true physical address.
 	 */
-	*fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+	phys_base += load_delta - sme_get_me_mask();
 
 	return sme_postprocess_startup(bp, pmd);
 }
 
 /* This runs while still in the direct mapping */
-static void __head startup_64_load_idt(unsigned long physbase)
+static void __head startup_64_load_idt(void)
 {
-	struct desc_ptr *desc = fixup_pointer(&bringup_idt_descr, physbase);
-	gate_desc *idt = fixup_pointer(bringup_idt_table, physbase);
-
-	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT)) {
-		void *handler;
-
-		/* VMM Communication Exception */
-		handler = fixup_pointer(vc_no_ghcb, physbase);
-		set_bringup_idt_handler(idt, X86_TRAP_VC, handler);
-	}
+	/* VMM Communication Exception */
+	if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT))
+		set_bringup_idt_handler(bringup_idt_table, X86_TRAP_VC, vc_no_ghcb);
 
-	desc->address = (unsigned long)idt;
-	native_load_idt(desc);
+	bringup_idt_descr.address = (unsigned long)bringup_idt_table;
+	native_load_idt(&bringup_idt_descr);
 }
 
 /*
  * Setup boot CPU state needed before kernel switches to virtual addresses.
  */
-void __head startup_64_setup_env(unsigned long physbase)
+void __head startup_64_setup_env(void)
 {
 	/* Load GDT */
-	startup_gdt_descr.address = (unsigned long)fixup_pointer(startup_gdt, physbase);
+	startup_gdt_descr.address = (unsigned long)startup_gdt;
 	native_load_gdt(&startup_gdt_descr);
 
 	/* New GDT is live - reload data segment registers */
@@ -311,5 +279,5 @@ void __head startup_64_setup_env(unsigned long physbase)
 	     "movl %%eax, %%ss\n"
 	     "movl %%eax, %%es\n" : : "a"(__KERNEL_DS) : "memory");
 
-	startup_64_load_idt(physbase);
+	startup_64_load_idt();
 }
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c5b9289837dc..5b46da66d6c8 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -66,8 +66,6 @@ SYM_CODE_START_NOALIGN(startup_64)
 	/* Set up the stack for verify_cpu() */
 	leaq	(__end_init_task - PTREGS_SIZE)(%rip), %rsp
 
-	leaq	_text(%rip), %rdi
-
 	/* Setup GSBASE to allow stack canary access for C code */
 	movl	$MSR_GS_BASE, %ecx
 	leaq	INIT_PER_CPU_VAR(fixed_percpu_data)(%rip), %rdx
-- 
2.31.1