From nobody Fri Dec 19 02:50:42 2025
Date: Mon, 7 Apr 2025 08:11:34 +0200
In-Reply-To: <20250407061132.69315-4-ardb+git@google.com>
References: <20250407061132.69315-4-ardb+git@google.com>
Message-ID: <20250407061132.69315-5-ardb+git@google.com>
Subject: [PATCH v2 1/2] x86/boot: Move early kernel mapping code into startup/
From: Ard Biesheuvel
To: mingo@kernel.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, Ard Biesheuvel

From: Ard Biesheuvel

The startup code that constructs the kernel virtual mapping runs from
the 1:1 mapping of memory itself, and therefore cannot use absolute
symbol references. Move this code into a separate source file under
arch/x86/boot/startup/, where all such code will be kept from now on.

Since all code here is constructed in a manner that ensures that it
tolerates running from the 1:1 mapping of memory, any uses of the
RIP_REL_REF() macro can be dropped, along with the __head annotations
that place this code in a dedicated startup section.
Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/startup/Makefile     |   2 +-
 arch/x86/boot/startup/map_kernel.c | 215 ++++++++++++++++++++
 arch/x86/kernel/head64.c           | 211 +------------------
 3 files changed, 217 insertions(+), 211 deletions(-)

diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index 34b324cbd5a4..01423063fec2 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -15,7 +15,7 @@ KMSAN_SANITIZE := n
 UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
-obj-$(CONFIG_X86_64) += gdt_idt.o
+obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o
 
 lib-$(CONFIG_X86_64) += la57toggle.o
 lib-$(CONFIG_EFI_MIXED) += efi-mixed.o
diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
new file mode 100644
index 000000000000..1cf57c03c319
--- /dev/null
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -0,0 +1,215 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
+extern unsigned int next_early_pgt;
+
+static inline bool check_la57_support(void)
+{
+        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+                return false;
+
+        /*
+         * 5-level paging is detected and enabled at kernel decompression
+         * stage. Only check if it has been enabled there.
+         */
+        if (!(native_read_cr4() & X86_CR4_LA57))
+                return false;
+
+        __pgtable_l5_enabled = 1;
+        pgdir_shift = 48;
+        ptrs_per_p4d = 512;
+        page_offset_base = __PAGE_OFFSET_BASE_L5;
+        vmalloc_base = __VMALLOC_BASE_L5;
+        vmemmap_base = __VMEMMAP_BASE_L5;
+
+        return true;
+}
+
+static unsigned long sme_postprocess_startup(struct boot_params *bp,
+                                             pmdval_t *pmd,
+                                             unsigned long p2v_offset)
+{
+        unsigned long paddr, paddr_end;
+        int i;
+
+        /* Encrypt the kernel and related (if SME is active) */
+        sme_encrypt_kernel(bp);
+
+        /*
+         * Clear the memory encryption mask from the .bss..decrypted section.
+         * The bss section will be memset to zero later in the initialization so
+         * there is no need to zero it after changing the memory encryption
+         * attribute.
+         */
+        if (sme_get_me_mask()) {
+                paddr = (unsigned long)__start_bss_decrypted;
+                paddr_end = (unsigned long)__end_bss_decrypted;
+
+                for (; paddr < paddr_end; paddr += PMD_SIZE) {
+                        /*
+                         * On SNP, transition the page to shared in the RMP table so that
+                         * it is consistent with the page table attribute change.
+                         *
+                         * __start_bss_decrypted has a virtual address in the high range
+                         * mapping (kernel .text). PVALIDATE, by way of
+                         * early_snp_set_memory_shared(), requires a valid virtual
+                         * address but the kernel is currently running off of the identity
+                         * mapping so use the PA to get a *currently* valid virtual address.
+                         */
+                        early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD);
+
+                        i = pmd_index(paddr - p2v_offset);
+                        pmd[i] -= sme_get_me_mask();
+                }
+        }
+
+        /*
+         * Return the SME encryption mask (if SME is active) to be used as a
+         * modifier for the initial pgdir entry programmed into CR3.
+         */
+        return sme_get_me_mask();
+}
+
+unsigned long __init __startup_64(unsigned long p2v_offset,
+                                  struct boot_params *bp)
+{
+        pmd_t (*early_pgts)[PTRS_PER_PMD] = early_dynamic_pgts;
+        unsigned long physaddr = (unsigned long)_text;
+        unsigned long va_text, va_end;
+        unsigned long pgtable_flags;
+        unsigned long load_delta;
+        pgdval_t *pgd;
+        p4dval_t *p4d;
+        pudval_t *pud;
+        pmdval_t *pmd, pmd_entry;
+        bool la57;
+        int i;
+
+        la57 = check_la57_support();
+
+        /* Is the address too large? */
+        if (physaddr >> MAX_PHYSMEM_BITS)
+                for (;;);
+
+        /*
+         * Compute the delta between the address I am compiled to run at
+         * and the address I am actually running at.
+         */
+        phys_base = load_delta = __START_KERNEL_map + p2v_offset;
+
+        /* Is the address not 2M aligned? */
+        if (load_delta & ~PMD_MASK)
+                for (;;);
+
+        va_text = physaddr - p2v_offset;
+        va_end = (unsigned long)_end - p2v_offset;
+
+        /* Include the SME encryption mask in the fixup value */
+        load_delta += sme_get_me_mask();
+
+        /* Fixup the physical addresses in the page table */
+
+        pgd = &early_top_pgt[0].pgd;
+        pgd[pgd_index(__START_KERNEL_map)] += load_delta;
+
+        if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
+                p4d = (p4dval_t *)level4_kernel_pgt;
+                p4d[MAX_PTRS_PER_P4D - 1] += load_delta;
+
+                pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;
+        }
+
+        level3_kernel_pgt[PTRS_PER_PUD - 2].pud += load_delta;
+        level3_kernel_pgt[PTRS_PER_PUD - 1].pud += load_delta;
+
+        for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
+                level2_fixmap_pgt[i].pmd += load_delta;
+
+        /*
+         * Set up the identity mapping for the switchover. These
+         * entries should *NOT* have the global bit set! This also
+         * creates a bunch of nonsense entries but that is fine --
+         * it avoids problems around wraparound.
+         */
+
+        pud = &early_pgts[0]->pmd;
+        pmd = &early_pgts[1]->pmd;
+        next_early_pgt = 2;
+
+        pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
+
+        if (la57) {
+                p4d = &early_pgts[next_early_pgt++]->pmd;
+
+                i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
+                pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
+                pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
+
+                i = physaddr >> P4D_SHIFT;
+                p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
+                p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
+        } else {
+                i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
+                pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
+                pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
+        }
+
+        i = physaddr >> PUD_SHIFT;
+        pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
+        pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
+
+        pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
+        /* Filter out unsupported __PAGE_KERNEL_* bits: */
+        pmd_entry &= __supported_pte_mask;
+        pmd_entry += sme_get_me_mask();
+        pmd_entry += physaddr;
+
+        for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) {
+                int idx = i + (physaddr >> PMD_SHIFT);
+
+                pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
+        }
+
+        /*
+         * Fixup the kernel text+data virtual addresses. Note that
+         * we might write invalid pmds, when the kernel is relocated
+         * cleanup_highmap() fixes this up along with the mappings
+         * beyond _end.
+         *
+         * Only the region occupied by the kernel image has so far
+         * been checked against the table of usable memory regions
+         * provided by the firmware, so invalidate pages outside that
+         * region. A page table entry that maps to a reserved area of
+         * memory would allow processor speculation into that area,
+         * and on some hardware (particularly the UV platform) even
+         * speculative access to some reserved areas is caught as an
+         * error, causing the BIOS to halt the system.
+         */
+
+        pmd = &level2_kernel_pgt[0].pmd;
+
+        /* invalidate pages before the kernel image */
+        for (i = 0; i < pmd_index(va_text); i++)
+                pmd[i] &= ~_PAGE_PRESENT;
+
+        /* fixup pages that are part of the kernel image */
+        for (; i <= pmd_index(va_end); i++)
+                if (pmd[i] & _PAGE_PRESENT)
+                        pmd[i] += load_delta;
+
+        /* invalidate pages after the kernel image */
+        for (; i < PTRS_PER_PMD; i++)
+                pmd[i] &= ~_PAGE_PRESENT;
+
+        return sme_postprocess_startup(bp, pmd, p2v_offset);
+}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 5b993b545c7e..6b68a206fa7f 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -47,7 +47,7 @@
  * Manage page tables very early on.
  */
 extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD];
-static unsigned int __initdata next_early_pgt;
+unsigned int __initdata next_early_pgt;
 pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
 
 #ifdef CONFIG_X86_5LEVEL
@@ -67,215 +67,6 @@ unsigned long vmemmap_base __ro_after_init = __VMEMMAP_BASE_L4;
 EXPORT_SYMBOL(vmemmap_base);
 #endif
 
-static inline bool check_la57_support(void)
-{
-        if (!IS_ENABLED(CONFIG_X86_5LEVEL))
-                return false;
-
-        /*
-         * 5-level paging is detected and enabled at kernel decompression
-         * stage. Only check if it has been enabled there.
-         */
-        if (!(native_read_cr4() & X86_CR4_LA57))
-                return false;
-
-        RIP_REL_REF(__pgtable_l5_enabled) = 1;
-        RIP_REL_REF(pgdir_shift) = 48;
-        RIP_REL_REF(ptrs_per_p4d) = 512;
-        RIP_REL_REF(page_offset_base) = __PAGE_OFFSET_BASE_L5;
-        RIP_REL_REF(vmalloc_base) = __VMALLOC_BASE_L5;
-        RIP_REL_REF(vmemmap_base) = __VMEMMAP_BASE_L5;
-
-        return true;
-}
-
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
-                                                    pmdval_t *pmd,
-                                                    unsigned long p2v_offset)
-{
-        unsigned long paddr, paddr_end;
-        int i;
-
-        /* Encrypt the kernel and related (if SME is active) */
-        sme_encrypt_kernel(bp);
-
-        /*
-         * Clear the memory encryption mask from the .bss..decrypted section.
-         * The bss section will be memset to zero later in the initialization so
-         * there is no need to zero it after changing the memory encryption
-         * attribute.
-         */
-        if (sme_get_me_mask()) {
-                paddr = (unsigned long)&RIP_REL_REF(__start_bss_decrypted);
-                paddr_end = (unsigned long)&RIP_REL_REF(__end_bss_decrypted);
-
-                for (; paddr < paddr_end; paddr += PMD_SIZE) {
-                        /*
-                         * On SNP, transition the page to shared in the RMP table so that
-                         * it is consistent with the page table attribute change.
-                         *
-                         * __start_bss_decrypted has a virtual address in the high range
-                         * mapping (kernel .text). PVALIDATE, by way of
-                         * early_snp_set_memory_shared(), requires a valid virtual
-                         * address but the kernel is currently running off of the identity
-                         * mapping so use the PA to get a *currently* valid virtual address.
-                         */
-                        early_snp_set_memory_shared(paddr, paddr, PTRS_PER_PMD);
-
-                        i = pmd_index(paddr - p2v_offset);
-                        pmd[i] -= sme_get_me_mask();
-                }
-        }
-
-        /*
-         * Return the SME encryption mask (if SME is active) to be used as a
-         * modifier for the initial pgdir entry programmed into CR3.
-         */
-        return sme_get_me_mask();
-}
-
-/* Code in __startup_64() can be relocated during execution, but the compiler
- * doesn't have to generate PC-relative relocations when accessing globals from
- * that function. Clang actually does not generate them, which leads to
- * boot-time crashes. To work around this problem, every global pointer must
- * be accessed using RIP_REL_REF(). Kernel virtual addresses can be determined
- * by subtracting p2v_offset from the RIP-relative address.
- */
-unsigned long __head __startup_64(unsigned long p2v_offset,
-                                  struct boot_params *bp)
-{
-        pmd_t (*early_pgts)[PTRS_PER_PMD] = RIP_REL_REF(early_dynamic_pgts);
-        unsigned long physaddr = (unsigned long)&RIP_REL_REF(_text);
-        unsigned long va_text, va_end;
-        unsigned long pgtable_flags;
-        unsigned long load_delta;
-        pgdval_t *pgd;
-        p4dval_t *p4d;
-        pudval_t *pud;
-        pmdval_t *pmd, pmd_entry;
-        bool la57;
-        int i;
-
-        la57 = check_la57_support();
-
-        /* Is the address too large? */
-        if (physaddr >> MAX_PHYSMEM_BITS)
-                for (;;);
-
-        /*
-         * Compute the delta between the address I am compiled to run at
-         * and the address I am actually running at.
-         */
-        load_delta = __START_KERNEL_map + p2v_offset;
-        RIP_REL_REF(phys_base) = load_delta;
-
-        /* Is the address not 2M aligned? */
-        if (load_delta & ~PMD_MASK)
-                for (;;);
-
-        va_text = physaddr - p2v_offset;
-        va_end = (unsigned long)&RIP_REL_REF(_end) - p2v_offset;
-
-        /* Include the SME encryption mask in the fixup value */
-        load_delta += sme_get_me_mask();
-
-        /* Fixup the physical addresses in the page table */
-
-        pgd = &RIP_REL_REF(early_top_pgt)->pgd;
-        pgd[pgd_index(__START_KERNEL_map)] += load_delta;
-
-        if (IS_ENABLED(CONFIG_X86_5LEVEL) && la57) {
-                p4d = (p4dval_t *)&RIP_REL_REF(level4_kernel_pgt);
-                p4d[MAX_PTRS_PER_P4D - 1] += load_delta;
-
-                pgd[pgd_index(__START_KERNEL_map)] = (pgdval_t)p4d | _PAGE_TABLE;
-        }
-
-        RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 2].pud += load_delta;
-        RIP_REL_REF(level3_kernel_pgt)[PTRS_PER_PUD - 1].pud += load_delta;
-
-        for (i = FIXMAP_PMD_TOP; i > FIXMAP_PMD_TOP - FIXMAP_PMD_NUM; i--)
-                RIP_REL_REF(level2_fixmap_pgt)[i].pmd += load_delta;
-
-        /*
-         * Set up the identity mapping for the switchover. These
-         * entries should *NOT* have the global bit set! This also
-         * creates a bunch of nonsense entries but that is fine --
-         * it avoids problems around wraparound.
-         */
-
-        pud = &early_pgts[0]->pmd;
-        pmd = &early_pgts[1]->pmd;
-        RIP_REL_REF(next_early_pgt) = 2;
-
-        pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
-
-        if (la57) {
-                p4d = &early_pgts[RIP_REL_REF(next_early_pgt)++]->pmd;
-
-                i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-                pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
-                pgd[i + 1] = (pgdval_t)p4d + pgtable_flags;
-
-                i = physaddr >> P4D_SHIFT;
-                p4d[(i + 0) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
-                p4d[(i + 1) % PTRS_PER_P4D] = (pgdval_t)pud + pgtable_flags;
-        } else {
-                i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
-                pgd[i + 0] = (pgdval_t)pud + pgtable_flags;
-                pgd[i + 1] = (pgdval_t)pud + pgtable_flags;
-        }
-
-        i = physaddr >> PUD_SHIFT;
-        pud[(i + 0) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
-        pud[(i + 1) % PTRS_PER_PUD] = (pudval_t)pmd + pgtable_flags;
-
-        pmd_entry = __PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL;
-        /* Filter out unsupported __PAGE_KERNEL_* bits: */
-        pmd_entry &= RIP_REL_REF(__supported_pte_mask);
-        pmd_entry += sme_get_me_mask();
-        pmd_entry += physaddr;
-
-        for (i = 0; i < DIV_ROUND_UP(va_end - va_text, PMD_SIZE); i++) {
-                int idx = i + (physaddr >> PMD_SHIFT);
-
-                pmd[idx % PTRS_PER_PMD] = pmd_entry + i * PMD_SIZE;
-        }
-
-        /*
-         * Fixup the kernel text+data virtual addresses. Note that
-         * we might write invalid pmds, when the kernel is relocated
-         * cleanup_highmap() fixes this up along with the mappings
-         * beyond _end.
-         *
-         * Only the region occupied by the kernel image has so far
-         * been checked against the table of usable memory regions
-         * provided by the firmware, so invalidate pages outside that
-         * region. A page table entry that maps to a reserved area of
-         * memory would allow processor speculation into that area,
-         * and on some hardware (particularly the UV platform) even
-         * speculative access to some reserved areas is caught as an
-         * error, causing the BIOS to halt the system.
-         */
-
-        pmd = &RIP_REL_REF(level2_kernel_pgt)->pmd;
-
-        /* invalidate pages before the kernel image */
-        for (i = 0; i < pmd_index(va_text); i++)
-                pmd[i] &= ~_PAGE_PRESENT;
-
-        /* fixup pages that are part of the kernel image */
-        for (; i <= pmd_index(va_end); i++)
-                if (pmd[i] & _PAGE_PRESENT)
-                        pmd[i] += load_delta;
-
-        /* invalidate pages after the kernel image */
-        for (; i < PTRS_PER_PMD; i++)
-                pmd[i] &= ~_PAGE_PRESENT;
-
-        return sme_postprocess_startup(bp, pmd, p2v_offset);
-}
-
 /* Wipe all early page tables except for the kernel symbol map */
 static void __init reset_early_page_tables(void)
 {
-- 
2.49.0.504.g3bcea36a83-goog

From nobody Fri Dec 19 02:50:43 2025
Date: Mon, 7 Apr 2025 08:11:35 +0200
In-Reply-To: <20250407061132.69315-4-ardb+git@google.com>
References: <20250407061132.69315-4-ardb+git@google.com>
Message-ID: <20250407061132.69315-6-ardb+git@google.com>
Subject: [PATCH v2 2/2] x86/boot: Move early SME init code into startup/
From: Ard Biesheuvel
To: mingo@kernel.org
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-efi@vger.kernel.org, Ard Biesheuvel

From: Ard Biesheuvel

Move the SME initialization code, which runs from the 1:1 mapping of
memory as it operates on the kernel virtual mapping, into the new
sub-directory arch/x86/boot/startup/, where all startup code that
needs to tolerate executing from the 1:1 mapping will reside.
This allows RIP_REL_REF() macro invocations and __head annotations to
be dropped.

Signed-off-by: Ard Biesheuvel
---
 arch/x86/boot/startup/Makefile                             |  1 +
 arch/x86/{mm/mem_encrypt_identity.c => boot/startup/sme.c} | 45 +++++++++-----------
 arch/x86/include/asm/mem_encrypt.h                         |  2 +-
 arch/x86/mm/Makefile                                       |  6 ---
 4 files changed, 23 insertions(+), 31 deletions(-)

diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index 01423063fec2..480c2d2063a0 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -16,6 +16,7 @@ UBSAN_SANITIZE := n
 KCOV_INSTRUMENT := n
 
 obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o
+obj-$(CONFIG_AMD_MEM_ENCRYPT) += sme.o
 
 lib-$(CONFIG_X86_64) += la57toggle.o
 lib-$(CONFIG_EFI_MIXED) += efi-mixed.o
diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/boot/startup/sme.c
similarity index 92%
rename from arch/x86/mm/mem_encrypt_identity.c
rename to arch/x86/boot/startup/sme.c
index 5eecdd92da10..85bd39652535 100644
--- a/arch/x86/mm/mem_encrypt_identity.c
+++ b/arch/x86/boot/startup/sme.c
@@ -45,8 +45,6 @@
 #include
 #include
 
-#include "mm_internal.h"
-
 #define PGD_FLAGS	_KERNPG_TABLE_NOENC
 #define P4D_FLAGS	_KERNPG_TABLE_NOENC
 #define PUD_FLAGS	_KERNPG_TABLE_NOENC
@@ -93,7 +91,7 @@ struct sme_populate_pgd_data {
  */
 static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch");
 
-static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
+static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
 {
         unsigned long pgd_start, pgd_end, pgd_size;
         pgd_t *pgd_p;
@@ -108,7 +106,7 @@ static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
         memset(pgd_p, 0, pgd_size);
 }
 
-static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
+static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
 {
         pgd_t *pgd;
         p4d_t *p4d;
@@ -145,7 +143,7 @@ static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
         return pud;
 }
 
-static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
 {
         pud_t *pud;
         pmd_t *pmd;
@@ -161,7 +159,7 @@ static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
         set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
 }
 
-static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
+static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
 {
         pud_t *pud;
         pmd_t *pmd;
@@ -187,7 +185,7 @@ static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
         set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
 }
 
-static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
 {
         while (ppd->vaddr < ppd->vaddr_end) {
                 sme_populate_pgd_large(ppd);
@@ -197,7 +195,7 @@ static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
         }
 }
 
-static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
 {
         while (ppd->vaddr < ppd->vaddr_end) {
                 sme_populate_pgd(ppd);
@@ -207,7 +205,7 @@ static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
         }
 }
 
-static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
+static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
                                    pmdval_t pmd_flags, pteval_t pte_flags)
 {
         unsigned long vaddr_end;
@@ -231,22 +229,22 @@ static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
         __sme_map_range_pte(ppd);
 }
 
-static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
 {
         __sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
 }
 
-static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
 {
         __sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
 }
 
-static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
 {
         __sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
 }
 
-static unsigned long __head sme_pgtable_calc(unsigned long len)
+static unsigned long __init sme_pgtable_calc(unsigned long len)
 {
         unsigned long entries = 0, tables = 0;
 
@@ -283,7 +281,7 @@ static unsigned long __head sme_pgtable_calc(unsigned long len)
         return entries + tables;
 }
 
-void __head sme_encrypt_kernel(struct boot_params *bp)
+void __init sme_encrypt_kernel(struct boot_params *bp)
 {
         unsigned long workarea_start, workarea_end, workarea_len;
         unsigned long execute_start, execute_end, execute_len;
@@ -299,8 +297,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
          * instrumentation or checking boot_cpu_data in the cc_platform_has()
          * function.
          */
-        if (!sme_get_me_mask() ||
-            RIP_REL_REF(sev_status) & MSR_AMD64_SEV_ENABLED)
+        if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
                 return;
 
         /*
@@ -318,8 +315,8 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
          * memory from being cached.
          */
 
-        kernel_start = (unsigned long)RIP_REL_REF(_text);
-        kernel_end = ALIGN((unsigned long)RIP_REL_REF(_end), PMD_SIZE);
+        kernel_start = (unsigned long)_text;
+        kernel_end = ALIGN((unsigned long)_end, PMD_SIZE);
         kernel_len = kernel_end - kernel_start;
 
         initrd_start = 0;
@@ -345,7 +342,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
          *   pagetable structures for the encryption of the kernel
          *   pagetable structures for workarea (in case not currently mapped)
          */
-        execute_start = workarea_start = (unsigned long)RIP_REL_REF(sme_workarea);
+        execute_start = workarea_start = (unsigned long)sme_workarea;
         execute_end = execute_start + (PAGE_SIZE * 2) + PMD_SIZE;
         execute_len = execute_end - execute_start;
 
@@ -488,7 +485,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
         native_write_cr3(__native_read_cr3());
 }
 
-void __head sme_enable(struct boot_params *bp)
+void __init sme_enable(struct boot_params *bp)
 {
         unsigned int eax, ebx, ecx, edx;
         unsigned long feature_mask;
@@ -526,7 +523,7 @@ void __head sme_enable(struct boot_params *bp)
         me_mask = 1UL << (ebx & 0x3f);
 
         /* Check the SEV MSR whether SEV or SME is enabled */
-        RIP_REL_REF(sev_status) = msr = __rdmsr(MSR_AMD64_SEV);
+        sev_status = msr = __rdmsr(MSR_AMD64_SEV);
         feature_mask = (msr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
 
         /*
@@ -562,8 +559,8 @@ void __head sme_enable(struct boot_params *bp)
                 return;
         }
 
-        RIP_REL_REF(sme_me_mask) = me_mask;
-        RIP_REL_REF(physical_mask) &= ~me_mask;
-        RIP_REL_REF(cc_vendor) = CC_VENDOR_AMD;
+        sme_me_mask = me_mask;
+        physical_mask &= ~me_mask;
+        cc_vendor = CC_VENDOR_AMD;
         cc_set_mask(me_mask);
 }
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 1530ee301dfe..ea6494628cb0 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -61,7 +61,7 @@ void __init sev_es_init_vc_handling(void);
 
 static inline u64 sme_get_me_mask(void)
 {
-        return RIP_REL_REF(sme_me_mask);
+        return sme_me_mask;
 }
 
 #define __bss_decrypted __section(".bss..decrypted")
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 32035d5be5a0..3faa60f13a61 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -3,12 +3,10 @@
 KCOV_INSTRUMENT_tlb.o			:= n
 KCOV_INSTRUMENT_mem_encrypt.o		:= n
 KCOV_INSTRUMENT_mem_encrypt_amd.o	:= n
-KCOV_INSTRUMENT_mem_encrypt_identity.o	:= n
 KCOV_INSTRUMENT_pgprot.o		:= n
 
 KASAN_SANITIZE_mem_encrypt.o		:= n
 KASAN_SANITIZE_mem_encrypt_amd.o	:= n
-KASAN_SANITIZE_mem_encrypt_identity.o	:= n
 KASAN_SANITIZE_pgprot.o			:= n
 
 # Disable KCSAN entirely, because otherwise we get warnings that some functions
@@ -16,12 +14,10 @@ KASAN_SANITIZE_pgprot.o := n
 KCSAN_SANITIZE := n
 # Avoid recursion by not calling KMSAN hooks for CEA code.
 KMSAN_SANITIZE_cpu_entry_area.o := n
-KMSAN_SANITIZE_mem_encrypt_identity.o := n
 
 ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_mem_encrypt.o		= -pg
 CFLAGS_REMOVE_mem_encrypt_amd.o		= -pg
-CFLAGS_REMOVE_mem_encrypt_identity.o	= -pg
 CFLAGS_REMOVE_pgprot.o			= -pg
 endif
 
@@ -32,7 +28,6 @@ obj-y += pat/
 
 # Make sure __phys_addr has no stackprotector
 CFLAGS_physaddr.o		:= -fno-stack-protector
-CFLAGS_mem_encrypt_identity.o	:= -fno-stack-protector
 
 CFLAGS_fault.o := -I $(src)/../include/asm/trace
 
@@ -63,5 +58,4 @@ obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION) += pti.o
 obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
 
-obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_identity.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_boot.o
-- 
2.49.0.504.g3bcea36a83-goog