When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3
and applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on
x86_64 includes not just the physical address but also flags. Bits
above the physical address width of the system (i.e. above
__PHYSICAL_MASK_SHIFT) are also not masked.
- The PGD entry is masked by PAGE_MASK, which doesn't take into account
the higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- read_cr3_pa(): Uses CR3_ADDR_MASK to clear the SME encryption bit
and extract only the physical address portion.
- Mask the PGD value with PTE_PFN_MASK instead of PAGE_MASK, accounting
for flags above the physical address (_PAGE_BIT_NOPTISHADOW in particular).
Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
arch/x86/boot/compressed/pgtable_64.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index bdd26050dff77..a56449938b7ec 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -170,7 +170,8 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
*/
*trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
} else {
- unsigned long src;
+ u64 *new_cr3;
+ pgd_t *pgdp;
/*
* For 5- to 4-level paging transition, copy page table pointed
@@ -180,8 +181,9 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
* We cannot just point to the page table from trampoline as it
* may be above 4G.
*/
- src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
- memcpy(trampoline_32bit, (void *)src, PAGE_SIZE);
+ pgdp = (pgd_t *)read_cr3_pa();
+ new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
+ memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
}
toggle_la57(trampoline_32bit);
--
2.47.3
Hi Usama,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master efi/next linus/master v6.18-rc2 next-20251024]
[cannot apply to tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/x86-boot-Fix-page-table-access-in-5-level-to-4-level-paging-transition/20251023-061048
base: tip/x86/core
patch link: https://lore.kernel.org/r/20251022220755.1026144-2-usamaarif642%40gmail.com
patch subject: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
config: x86_64-buildonly-randconfig-004-20251024 (https://download.01.org/0day-ci/archive/20251024/202510241522.uU9W0Xbv-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251024/202510241522.uU9W0Xbv-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510241522.uU9W0Xbv-lkp@intel.com/
All errors (new ones prefixed by >>):
arch/x86/boot/compressed/pgtable_64.c: In function 'configure_5level_paging':
>> arch/x86/boot/compressed/pgtable_64.c:185:35: error: implicit declaration of function 'pgd_val' [-Wimplicit-function-declaration]
185 | new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
| ^~~~~~~
vim +/pgd_val +185 arch/x86/boot/compressed/pgtable_64.c
101
102 asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
103 {
104 void (*toggle_la57)(void *cr3);
105 bool l5_required = false;
106
107 /* Initialize boot_params. Required for cmdline_find_option_bool(). */
108 sanitize_boot_params(bp);
109 boot_params_ptr = bp;
110
111 /*
112 * Check if LA57 is desired and supported.
113 *
114 * There are several parts to the check:
115 * - if user asked to disable 5-level paging: no5lvl in cmdline
116 * - if the machine supports 5-level paging:
117 * + CPUID leaf 7 is supported
118 * + the leaf has the feature bit set
119 */
120 if (!cmdline_find_option_bool("no5lvl") &&
121 native_cpuid_eax(0) >= 7 && (native_cpuid_ecx(7) & BIT(16))) {
122 l5_required = true;
123
124 /* Initialize variables for 5-level paging */
125 __pgtable_l5_enabled = 1;
126 pgdir_shift = 48;
127 ptrs_per_p4d = 512;
128 }
129
130 /*
131 * The trampoline will not be used if the paging mode is already set to
132 * the desired one.
133 */
134 if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
135 return;
136
137 trampoline_32bit = (unsigned long *)find_trampoline_placement();
138
139 /* Preserve trampoline memory */
140 memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
141
142 /* Clear trampoline memory first */
143 memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
144
145 /* Copy trampoline code in place */
146 toggle_la57 = memcpy(trampoline_32bit +
147 TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
148 &trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
149
150 /*
151 * Avoid the need for a stack in the 32-bit trampoline code, by using
152 * LJMP rather than LRET to return back to long mode. LJMP takes an
153 * immediate absolute address, which needs to be adjusted based on the
154 * placement of the trampoline.
155 */
156 *(u32 *)((u8 *)toggle_la57 + trampoline_ljmp_imm_offset) +=
157 (unsigned long)toggle_la57;
158
159 /*
160 * The code below prepares page table in trampoline memory.
161 *
162 * The new page table will be used by trampoline code for switching
163 * from 4- to 5-level paging or vice versa.
164 */
165
166 if (l5_required) {
167 /*
168 * For 4- to 5-level paging transition, set up current CR3 as
169 * the first and the only entry in a new top-level page table.
170 */
171 *trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
172 } else {
173 u64 *new_cr3;
174 pgd_t *pgdp;
175
176 /*
177 * For 5- to 4-level paging transition, copy page table pointed
178 * by first entry in the current top-level page table as our
179 * new top-level page table.
180 *
181 * We cannot just point to the page table from trampoline as it
182 * may be above 4G.
183 */
184 pgdp = (pgd_t *)read_cr3_pa();
> 185 new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Usama,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master efi/next linus/master v6.18-rc2 next-20251023]
[cannot apply to tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/x86-boot-Fix-page-table-access-in-5-level-to-4-level-paging-transition/20251023-061048
base: tip/x86/core
patch link: https://lore.kernel.org/r/20251022220755.1026144-2-usamaarif642%40gmail.com
patch subject: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20251024/202510240106.1aff6SIM-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251024/202510240106.1aff6SIM-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510240106.1aff6SIM-lkp@intel.com/
All errors (new ones prefixed by >>):
>> arch/x86/boot/compressed/pgtable_64.c:185:21: error: call to undeclared function 'pgd_val'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
185 | new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
| ^
1 error generated.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 10/22/25 15:06, Usama Arif wrote:
> + pgdp = (pgd_t *)read_cr3_pa();
> + new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
> + memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
Heh, somebody likes casting, I see!
But seriously, read_cr3_pa() should be returning a physical address. No?
Today it does:
static inline unsigned long read_cr3_pa(void)
{
return __read_cr3() & CR3_ADDR_MASK;
}
So shouldn't CR3_ADDR_MASK be masking out any naughty non-address bits?
Shouldn't we fix read_cr3_pa() and not do this in its caller?
On October 22, 2025 4:16:34 PM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 10/22/25 15:06, Usama Arif wrote:
>> + pgdp = (pgd_t *)read_cr3_pa();
>> + new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
>> + memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
>
>Heh, somebody likes casting, I see!
>
>But seriously, read_cr3_pa() should be returning a physical address. No?
>Today it does:
>
>static inline unsigned long read_cr3_pa(void)
>{
> return __read_cr3() & CR3_ADDR_MASK;
>}
>
>So shouldn't CR3_ADDR_MASK be masking out any naughty non-address bits?
>Shouldn't we fix read_cr3_pa() and not do this in its caller?
Ah, the times when one can wish for C++.
Too bad they still haven't figured out tagged initializers.
On 23/10/2025 00:16, Dave Hansen wrote:
> On 10/22/25 15:06, Usama Arif wrote:
>> + pgdp = (pgd_t *)read_cr3_pa();
>> + new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
>> + memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
>
> Heh, somebody likes casting, I see!
haha yeah, it's a lot here.
>
> But seriously, read_cr3_pa() should be returning a physical address. No?
> Today it does:
>
> static inline unsigned long read_cr3_pa(void)
> {
> return __read_cr3() & CR3_ADDR_MASK;
> }
>
> So shouldn't CR3_ADDR_MASK be masking out any naughty non-address bits?
> Shouldn't we fix read_cr3_pa() and not do this in its caller?
So we need to mask 2 things here:
- cr3, which is done by read_cr3_pa() using CR3_ADDR_MASK (__sme_clr(PHYSICAL_PAGE_MASK)),
as you pointed out.
- pgdp[0] (the dereferenced value), i.e. the p4d table pointer (this was previously
*(unsigned long *)__native_read_cr3()). This needs to be masked by PTE_PFN_MASK and
not PAGE_MASK, which was done previously, in order to take care of _PAGE_BIT_NOPTISHADOW.