[PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition

Usama Arif posted 3 patches 3 months, 2 weeks ago
There is a newer version of this series
Posted by Usama Arif 3 months, 2 weeks ago
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by dereferencing the raw CR3
value and masking the resulting entry with PAGE_MASK. This approach has
several issues:

- __native_read_cr3() returns the raw CR3 register value, which on
  x86_64 includes not just the physical address but also flag bits and,
  with SME, the encryption bit. Bits above the physical address width of
  the system (i.e. above __PHYSICAL_MASK_SHIFT) are not masked either.
- The PGD entry is masked with PAGE_MASK, which does not clear the
  higher bits above the physical address, such as _PAGE_BIT_NOPTISHADOW.

Replace this with proper accessors and masks:
- read_cr3_pa(): uses CR3_ADDR_MASK to clear the SME encryption bit
  and extract only the physical address portion.
- Mask the PGD value with PTE_PFN_MASK instead of PAGE_MASK, so that
  flags above the physical address (_PAGE_BIT_NOPTISHADOW in particular)
  are cleared as well.

Fixes: e9d0e6330eb8 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
 arch/x86/boot/compressed/pgtable_64.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c
index bdd26050dff77..a56449938b7ec 100644
--- a/arch/x86/boot/compressed/pgtable_64.c
+++ b/arch/x86/boot/compressed/pgtable_64.c
@@ -170,7 +170,8 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
 		 */
 		*trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
 	} else {
-		unsigned long src;
+		u64 *new_cr3;
+		pgd_t *pgdp;
 
 		/*
 		 * For 5- to 4-level paging transition, copy page table pointed
@@ -180,8 +181,9 @@ asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
 		 * We cannot just point to the page table from trampoline as it
 		 * may be above 4G.
 		 */
-		src = *(unsigned long *)__native_read_cr3() & PAGE_MASK;
-		memcpy(trampoline_32bit, (void *)src, PAGE_SIZE);
+		pgdp = (pgd_t *)read_cr3_pa();
+		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
+		memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
 	}
 
 	toggle_la57(trampoline_32bit);
-- 
2.47.3
Re: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
Posted by kernel test robot 3 months, 2 weeks ago
Hi Usama,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master efi/next linus/master v6.18-rc2 next-20251024]
[cannot apply to tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/x86-boot-Fix-page-table-access-in-5-level-to-4-level-paging-transition/20251023-061048
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20251022220755.1026144-2-usamaarif642%40gmail.com
patch subject: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
config: x86_64-buildonly-randconfig-004-20251024 (https://download.01.org/0day-ci/archive/20251024/202510241522.uU9W0Xbv-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251024/202510241522.uU9W0Xbv-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510241522.uU9W0Xbv-lkp@intel.com/

All errors (new ones prefixed by >>):

   arch/x86/boot/compressed/pgtable_64.c: In function 'configure_5level_paging':
>> arch/x86/boot/compressed/pgtable_64.c:185:35: error: implicit declaration of function 'pgd_val' [-Wimplicit-function-declaration]
     185 |                 new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
         |                                   ^~~~~~~


vim +/pgd_val +185 arch/x86/boot/compressed/pgtable_64.c

   101	
   102	asmlinkage void configure_5level_paging(struct boot_params *bp, void *pgtable)
   103	{
   104		void (*toggle_la57)(void *cr3);
   105		bool l5_required = false;
   106	
   107		/* Initialize boot_params. Required for cmdline_find_option_bool(). */
   108		sanitize_boot_params(bp);
   109		boot_params_ptr = bp;
   110	
   111		/*
   112		 * Check if LA57 is desired and supported.
   113		 *
   114		 * There are several parts to the check:
   115		 *   - if user asked to disable 5-level paging: no5lvl in cmdline
   116		 *   - if the machine supports 5-level paging:
   117		 *     + CPUID leaf 7 is supported
   118		 *     + the leaf has the feature bit set
   119		 */
   120		if (!cmdline_find_option_bool("no5lvl") &&
   121		    native_cpuid_eax(0) >= 7 && (native_cpuid_ecx(7) & BIT(16))) {
   122			l5_required = true;
   123	
   124			/* Initialize variables for 5-level paging */
   125			__pgtable_l5_enabled = 1;
   126			pgdir_shift = 48;
   127			ptrs_per_p4d = 512;
   128		}
   129	
   130		/*
   131		 * The trampoline will not be used if the paging mode is already set to
   132		 * the desired one.
   133		 */
   134		if (l5_required == !!(native_read_cr4() & X86_CR4_LA57))
   135			return;
   136	
   137		trampoline_32bit = (unsigned long *)find_trampoline_placement();
   138	
   139		/* Preserve trampoline memory */
   140		memcpy(trampoline_save, trampoline_32bit, TRAMPOLINE_32BIT_SIZE);
   141	
   142		/* Clear trampoline memory first */
   143		memset(trampoline_32bit, 0, TRAMPOLINE_32BIT_SIZE);
   144	
   145		/* Copy trampoline code in place */
   146		toggle_la57 = memcpy(trampoline_32bit +
   147				TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
   148				&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
   149	
   150		/*
   151		 * Avoid the need for a stack in the 32-bit trampoline code, by using
   152		 * LJMP rather than LRET to return back to long mode. LJMP takes an
   153		 * immediate absolute address, which needs to be adjusted based on the
   154		 * placement of the trampoline.
   155		 */
   156		*(u32 *)((u8 *)toggle_la57 + trampoline_ljmp_imm_offset) +=
   157							(unsigned long)toggle_la57;
   158	
   159		/*
   160		 * The code below prepares page table in trampoline memory.
   161		 *
   162		 * The new page table will be used by trampoline code for switching
   163		 * from 4- to 5-level paging or vice versa.
   164		 */
   165	
   166		if (l5_required) {
   167			/*
   168			 * For 4- to 5-level paging transition, set up current CR3 as
   169			 * the first and the only entry in a new top-level page table.
   170			 */
   171			*trampoline_32bit = __native_read_cr3() | _PAGE_TABLE_NOENC;
   172		} else {
   173			u64 *new_cr3;
   174			pgd_t *pgdp;
   175	
   176			/*
   177			 * For 5- to 4-level paging transition, copy page table pointed
   178			 * by first entry in the current top-level page table as our
   179			 * new top-level page table.
   180			 *
   181			 * We cannot just point to the page table from trampoline as it
   182			 * may be above 4G.
   183			 */
   184			pgdp = (pgd_t *)read_cr3_pa();
 > 185			new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Re: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
Posted by kernel test robot 3 months, 2 weeks ago
Hi Usama,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master efi/next linus/master v6.18-rc2 next-20251023]
[cannot apply to tip/auto-latest]

url:    https://github.com/intel-lab-lkp/linux/commits/Usama-Arif/x86-boot-Fix-page-table-access-in-5-level-to-4-level-paging-transition/20251023-061048
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20251022220755.1026144-2-usamaarif642%40gmail.com
patch subject: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20251024/202510240106.1aff6SIM-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251024/202510240106.1aff6SIM-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510240106.1aff6SIM-lkp@intel.com/

All errors (new ones prefixed by >>):

>> arch/x86/boot/compressed/pgtable_64.c:185:21: error: call to undeclared function 'pgd_val'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
     185 |                 new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
         |                                   ^
   1 error generated.


Re: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
Posted by Dave Hansen 3 months, 2 weeks ago
On 10/22/25 15:06, Usama Arif wrote:
> +		pgdp = (pgd_t *)read_cr3_pa();
> +		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
> +		memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);

Heh, somebody likes casting, I see!

But seriously, read_cr3_pa() should be returning a physical address. No?
Today it does:

static inline unsigned long read_cr3_pa(void)
{
        return __read_cr3() & CR3_ADDR_MASK;
}

So shouldn't CR3_ADDR_MASK be masking out any naughty non-address bits?
Shouldn't we fix read_cr3_pa() and not do this in its caller?
Re: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
Posted by H. Peter Anvin 3 months, 2 weeks ago
On October 22, 2025 4:16:34 PM PDT, Dave Hansen <dave.hansen@intel.com> wrote:
>On 10/22/25 15:06, Usama Arif wrote:
>> +		pgdp = (pgd_t *)read_cr3_pa();
>> +		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
>> +		memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
>
>Heh, somebody likes casting, I see!
>
>But seriously, read_cr3_pa() should be returning a physical address. No?
>Today it does:
>
>static inline unsigned long read_cr3_pa(void)
>{
>        return __read_cr3() & CR3_ADDR_MASK;
>}
>
>So shouldn't CR3_ADDR_MASK be masking out any naughty non-address bits?
>Shouldn't we fix read_cr3_pa() and not do this in its caller?

Ah, the times when one can wish for C++.

Too bad they still haven't figured out tagged initializers.
Re: [PATCH 1/3] x86/boot: Fix page table access in 5-level to 4-level paging transition
Posted by Usama Arif 3 months, 2 weeks ago

On 23/10/2025 00:16, Dave Hansen wrote:
> On 10/22/25 15:06, Usama Arif wrote:
>> +		pgdp = (pgd_t *)read_cr3_pa();
>> +		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
>> +		memcpy(trampoline_32bit, new_cr3, PAGE_SIZE);
> 
> Heh, somebody likes casting, I see!

haha yeah, it's a lot here.
> 
> But seriously, read_cr3_pa() should be returning a physical address. No?
> Today it does:
> 
> static inline unsigned long read_cr3_pa(void)
> {
>         return __read_cr3() & CR3_ADDR_MASK;
> }
> 
> So shouldn't CR3_ADDR_MASK be masking out any naughty non-address bits?
> Shouldn't we fix read_cr3_pa() and not do this in its caller?

So we need to mask two things here:
- CR3, which read_cr3_pa() does using CR3_ADDR_MASK
  (i.e. __sme_clr(PHYSICAL_PAGE_MASK)), as you pointed out.
- pgdp[0] (the dereferenced value), i.e. the P4D table pointer (this was
  previously *(unsigned long *)__native_read_cr3()). This needs to be
  masked with PTE_PFN_MASK rather than with PAGE_MASK, as the old code did,
  in order to take care of _PAGE_BIT_NOPTISHADOW.