When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3
and applying PAGE_MASK. This approach has several issues:

- __native_read_cr3() returns the raw CR3 register value, which on
  x86_64 includes not just the physical address but also flags. Bits
  above the physical address width of the system (i.e. above
  __PHYSICAL_MASK_SHIFT) are also not masked.
- The pgd value is masked by PAGE_MASK, which doesn't take into account
  the higher bits such as _PAGE_BIT_NOPTISHADOW.

Replace this with proper accessor functions:
- read_cr3_pa(): uses CR3_ADDR_MASK to clear the SME encryption bit
  and extract only the physical address portion.
- Mask the pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
  flags above the physical address (_PAGE_BIT_NOPTISHADOW in particular).

Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
drivers/firmware/efi/libstub/x86-5lvl.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
index f1c5fb45d5f7c..34b72da457487 100644
--- a/drivers/firmware/efi/libstub/x86-5lvl.c
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -81,8 +81,11 @@ void efi_5level_switch(void)
 		new_cr3 = memset(pgt, 0, PAGE_SIZE);
 		new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
 	} else {
+		pgd_t *pgdp;
+
+		pgdp = (pgd_t *)read_cr3_pa();
 		/* take the new root table pointer from the current entry #0 */
-		new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
 
 		/* copy the new root table if it is not 32-bit addressable */
 		if ((u64)new_cr3 > U32_MAX)
--
2.47.3
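
The difference between the two masks described in the commit message can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: every DEMO_* constant and demo_* helper is a hypothetical stand-in, and the values (a 52-bit physical mask, bit 58 for the NOPTISHADOW software bit, 0x67 for the low table flags) are assumptions chosen for illustration rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel masks; values are assumptions. */
#define DEMO_PAGE_MASK     (~UINT64_C(0xfff))           /* clears bits 0..11 only */
#define DEMO_PHYS_MASK     ((UINT64_C(1) << 52) - 1)    /* assumed 52-bit physical width */
#define DEMO_PTE_PFN_MASK  (DEMO_PAGE_MASK & DEMO_PHYS_MASK)
#define DEMO_NOPTISHADOW   (UINT64_C(1) << 58)          /* assumed software bit position */
#define DEMO_LOW_FLAGS     UINT64_C(0x67)               /* illustrative low flag bits */

/* Old behaviour: only the low 12 bits are stripped, high flag bits survive. */
static uint64_t demo_strip_page_mask(uint64_t pgd)
{
	return pgd & DEMO_PAGE_MASK;
}

/* New behaviour: low flags and everything above the physical width are stripped. */
static uint64_t demo_strip_pfn_mask(uint64_t pgd)
{
	return pgd & DEMO_PTE_PFN_MASK;
}

/* Builds an entry that points at table_pa and carries low and high flag bits. */
static uint64_t demo_make_entry(uint64_t table_pa)
{
	return table_pa | DEMO_NOPTISHADOW | DEMO_LOW_FLAGS;
}
```

With an entry built by demo_make_entry(0x12345000), demo_strip_pfn_mask() yields 0x12345000 again, while demo_strip_page_mask() leaves bit 58 set, so the result would not be a usable table address.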
On Thu, 23 Oct 2025 at 00:08, Usama Arif <usamaarif642@gmail.com> wrote:
>
> When transitioning from 5-level to 4-level paging, the existing code
> incorrectly accesses page table entries by directly dereferencing CR3
> and applying PAGE_MASK. This approach has several issues:
>
> - __native_read_cr3() returns the raw CR3 register value, which on
> x86_64 includes not just the physical address but also flags. Bits
> above the physical address width of the system (i.e. above
> __PHYSICAL_MASK_SHIFT) are also not masked.
> - The pgd value is masked by PAGE_MASK, which doesn't take into account
> the higher bits such as _PAGE_BIT_NOPTISHADOW.
>
> Replace this with proper accessor functions:
> - read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption bit
> and extracting only the physical address portion.
> - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
> flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
>
> Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
> Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Reported-by: Michael van der Westhuizen <rmikey@meta.com>
> Reported-by: Tobias Fleig <tfleig@meta.com>
> ---
> drivers/firmware/efi/libstub/x86-5lvl.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> index f1c5fb45d5f7c..34b72da457487 100644
> --- a/drivers/firmware/efi/libstub/x86-5lvl.c
> +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> @@ -81,8 +81,11 @@ void efi_5level_switch(void)
>  		new_cr3 = memset(pgt, 0, PAGE_SIZE);
>  		new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
>  	} else {
> +		pgd_t *pgdp;
> +
> +		pgdp = (pgd_t *)read_cr3_pa();

Shouldn't this be using native_read_cr3_pa()? And is there any reason
to re-read CR3 here, rather than update the code that populates the
cr3 variable? The preceding other branch of the if() should probably
use the same sanitised value of CR3, no?

>  		/* take the new root table pointer from the current entry #0 */
> -		new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
> +		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
>
>  		/* copy the new root table if it is not 32-bit addressable */
>  		if ((u64)new_cr3 > U32_MAX)
> --
> 2.47.3
>
On Thu, Oct 23, 2025 at 04:13:26PM +0200, Ard Biesheuvel wrote:
> On Thu, 23 Oct 2025 at 00:08, Usama Arif <usamaarif642@gmail.com> wrote:
> >
> > When transitioning from 5-level to 4-level paging, the existing code
> > incorrectly accesses page table entries by directly dereferencing CR3
> > and applying PAGE_MASK. This approach has several issues:
> >
> > - __native_read_cr3() returns the raw CR3 register value, which on
> > x86_64 includes not just the physical address but also flags. Bits
> > above the physical address width of the system (i.e. above
> > __PHYSICAL_MASK_SHIFT) are also not masked.
> > - The pgd value is masked by PAGE_MASK, which doesn't take into account
> > the higher bits such as _PAGE_BIT_NOPTISHADOW.
> >
> > Replace this with proper accessor functions:
> > - read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption bit
> > and extracting only the physical address portion.
> > - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
> > flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
> >
> > Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
> > Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > Reported-by: Michael van der Westhuizen <rmikey@meta.com>
> > Reported-by: Tobias Fleig <tfleig@meta.com>
> > ---
> > drivers/firmware/efi/libstub/x86-5lvl.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> > index f1c5fb45d5f7c..34b72da457487 100644
> > --- a/drivers/firmware/efi/libstub/x86-5lvl.c
> > +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> > @@ -81,8 +81,11 @@ void efi_5level_switch(void)
> >  		new_cr3 = memset(pgt, 0, PAGE_SIZE);
> >  		new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
> >  	} else {
> > +		pgd_t *pgdp;
> > +
> > +		pgdp = (pgd_t *)read_cr3_pa();
>
> Shouldn't this be using native_read_cr3_pa()?

Perhaps. But I don't think it makes a difference.
We don't have paravirt in stub/decompressor, do we?

> And is there any reason
> to re-read CR3 here, rather than update the code that populates the
> cr3 variable? The preceding other branch of the if() should probably
> use the same sanitised value of CR3, no?

Good point.

--
Kiryl Shutsemau / Kirill A. Shutemov
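
One possible reading of the review suggestion above (sanitise CR3 once and let both branches use that value) can be modelled in userspace as follows. This is only a sketch under stated assumptions, not the follow-up patch: the SKETCH_* constants and sketch_* helpers are hypothetical, the SME C-bit position (bit 47) is assumed for illustration, and 0x67 stands in for the table flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the kernel's definitions. */
#define SKETCH_PAGE_MASK      (~UINT64_C(0xfff))
#define SKETCH_PHYS_MASK      ((UINT64_C(1) << 52) - 1)   /* assumed physical width */
#define SKETCH_SME_MASK       (UINT64_C(1) << 47)         /* assumed C-bit position */
#define SKETCH_CR3_ADDR_MASK  (SKETCH_PAGE_MASK & SKETCH_PHYS_MASK & ~SKETCH_SME_MASK)
#define SKETCH_TABLE_FLAGS    UINT64_C(0x67)              /* illustrative table flags */

/* Reduce a raw CR3 value to the physical address of the root table. */
static uint64_t sketch_cr3_pa(uint64_t raw_cr3)
{
	return raw_cr3 & SKETCH_CR3_ADDR_MASK;
}

/*
 * 4-level to 5-level direction: the new top-level entry #0 points at the
 * current root table, so it is built from the sanitised address rather
 * than from the raw register value.
 */
static uint64_t sketch_new_root_entry(uint64_t raw_cr3)
{
	return sketch_cr3_pa(raw_cr3) | SKETCH_TABLE_FLAGS;
}
```

Here a raw value such as 0x7654000 with low control bits and the assumed encryption bit set reduces to 0x7654000, and the entry derived from it is 0x7654067. The 5-level to 4-level direction would start from the same sketch_cr3_pa() result when locating entry #0.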