When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3
and applying PAGE_MASK. This approach has several issues:
- __native_read_cr3() returns the raw CR3 register value, which on
x86_64 includes not just the physical address but also flags Bits
above the physical address width of the system (i.e. above
__PHYSICAL_MASK_SHIFT) are also not masked.
- The pgd value is masked by PAGE_SIZE which doesn't take into account
the higher bits such as _PAGE_BIT_NOPTISHADOW.
Replace this with proper accessor functions:
- native_read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption
bit and extracting only the physical address portion.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
drivers/firmware/efi/libstub/x86-5lvl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
index f1c5fb45d5f7c..36b4a611f6adf 100644
--- a/drivers/firmware/efi/libstub/x86-5lvl.c
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -66,7 +66,7 @@ void efi_5level_switch(void)
bool have_la57 = native_read_cr4() & X86_CR4_LA57;
bool need_toggle = want_la57 ^ have_la57;
u64 *pgt = (void *)la57_toggle + PAGE_SIZE;
- u64 *cr3 = (u64 *)__native_read_cr3();
+ pgd_t *cr3 = (pgd_t *)native_read_cr3_pa();
u64 *new_cr3;
if (!la57_toggle || !need_toggle)
@@ -82,7 +82,7 @@ void efi_5level_switch(void)
new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
} else {
/* take the new root table pointer from the current entry #0 */
- new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+ new_cr3 = (u64 *)(pgd_val(cr3[0]) & PTE_PFN_MASK);
/* copy the new root table if it is not 32-bit addressable */
if ((u64)new_cr3 > U32_MAX)
--
2.47.3
On Tue, Oct 28, 2025 at 10:55:57AM +0000, Usama Arif wrote:
> When transitioning from 5-level to 4-level paging, the existing code
> incorrectly accesses page table entries by directly dereferencing CR3
> and applying PAGE_MASK. This approach has several issues:
>
> - __native_read_cr3() returns the raw CR3 register value, which on
> x86_64 includes not just the physical address but also flags Bits
> above the physical address width of the system (i.e. above
> __PHYSICAL_MASK_SHIFT) are also not masked.
> - The pgd value is masked by PAGE_SIZE which doesn't take into account
> the higher bits such as _PAGE_BIT_NOPTISHADOW.
>
> Replace this with proper accessor functions:
> - native_read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption
> bit and extracting only the physical address portion.
> - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
> flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
>
> Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
> Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Reported-by: Michael van der Westhuizen <rmikey@meta.com>
> Reported-by: Tobias Fleig <tfleig@meta.com>
> ---
> drivers/firmware/efi/libstub/x86-5lvl.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
allmodconfig build:
ld: error: unplaced orphan section `__bug_table' from `drivers/firmware/efi/libstub/x86-5lvl.stub.o'
ld: error: unplaced orphan section `.altinstructions' from `drivers/firmware/efi/libstub/x86-5lvl.stub.o'
ld: error: unplaced orphan section `.altinstr_replacement' from `drivers/firmware/efi/libstub/x86-5lvl.stub.o'
ld: Unexpected GOT/PLT entries detected!
ld: Unexpected run-time procedure linkages detected!
ld: Unexpected run-time relocations (.rela) detected!
ld: drivers/firmware/efi/libstub/x86-5lvl.stub.o: in function `efi_5level_switch':
x86-5lvl.c:(.text+0x13e): undefined reference to `pv_ops'
ld: x86-5lvl.c:(.text+0x14d): undefined reference to `pv_ops'
ld: drivers/firmware/efi/libstub/x86-5lvl.stub.o:(.altinstr_replacement+0x1): undefined reference to `BUG_func'
ld: arch/x86/boot/compressed/vmlinux: hidden symbol `pv_ops' isn't defined
ld: final link failed: bad value
make[3]: *** [arch/x86/boot/compressed/Makefile:116: arch/x86/boot/compressed/vmlinux] Error 1
make[2]: *** [arch/x86/boot/Makefile:96: arch/x86/boot/compressed/vmlinux] Error 2
make[1]: *** [arch/x86/Makefile:308: bzImage] Error 2
make: *** [Makefile:248: __sub-make] Error 2
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Fri, 31 Oct 2025 at 15:40, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Oct 28, 2025 at 10:55:57AM +0000, Usama Arif wrote:
> > When transitioning from 5-level to 4-level paging, the existing code
> > incorrectly accesses page table entries by directly dereferencing CR3
> > and applying PAGE_MASK. This approach has several issues:
> >
> > - __native_read_cr3() returns the raw CR3 register value, which on
> > x86_64 includes not just the physical address but also flags Bits
> > above the physical address width of the system (i.e. above
> > __PHYSICAL_MASK_SHIFT) are also not masked.
> > - The pgd value is masked by PAGE_SIZE which doesn't take into account
> > the higher bits such as _PAGE_BIT_NOPTISHADOW.
> >
> > Replace this with proper accessor functions:
> > - native_read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption
> > bit and extracting only the physical address portion.
> > - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
> > flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
> >
> > Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
> > Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > Reported-by: Michael van der Westhuizen <rmikey@meta.com>
> > Reported-by: Tobias Fleig <tfleig@meta.com>
> > ---
> > drivers/firmware/efi/libstub/x86-5lvl.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
>
> allmodconfig build:
>
> ld: error: unplaced orphan section `__bug_table' from `drivers/firmware/efi/libstub/x86-5lvl.stub.o'
> ld: error: unplaced orphan section `.altinstructions' from `drivers/firmware/efi/libstub/x86-5lvl.stub.o'
> ld: error: unplaced orphan section `.altinstr_replacement' from `drivers/firmware/efi/libstub/x86-5lvl.stub.o'
> ld: Unexpected GOT/PLT entries detected!
> ld: Unexpected run-time procedure linkages detected!
> ld: Unexpected run-time relocations (.rela) detected!
> ld: drivers/firmware/efi/libstub/x86-5lvl.stub.o: in function `efi_5level_switch':
> x86-5lvl.c:(.text+0x13e): undefined reference to `pv_ops'
> ld: x86-5lvl.c:(.text+0x14d): undefined reference to `pv_ops'
> ld: drivers/firmware/efi/libstub/x86-5lvl.stub.o:(.altinstr_replacement+0x1): undefined reference to `BUG_func'
> ld: arch/x86/boot/compressed/vmlinux: hidden symbol `pv_ops' isn't defined
> ld: final link failed: bad value
> make[3]: *** [arch/x86/boot/compressed/Makefile:116: arch/x86/boot/compressed/vmlinux] Error 1
> make[2]: *** [arch/x86/boot/Makefile:96: arch/x86/boot/compressed/vmlinux] Error 2
> make[1]: *** [arch/x86/Makefile:308: bzImage] Error 2
> make: *** [Makefile:248: __sub-make] Error 2
>
This code should be using native_pgd_val() not pgd_val().
On Fri, Oct 31, 2025 at 03:43:25PM +0100, Ard Biesheuvel wrote:
> This code should be using native_pgd_val() not pgd_val().
Seems to fix it, thanks.
I'll let Usama do more testing along with the usual build smoke tests - all
permutations of the below:
ARCHES=('x86_64' 'i386')
SMOKE_CONFIGS=("allnoconfig" "defconfig" "allmodconfig" "allyesconfig")
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
© 2016 - 2026 Red Hat, Inc.