[PATCH 2/3] efi/libstub: Fix page table access in 5-level to 4-level paging transition

Usama Arif posted 3 patches 3 months, 2 weeks ago
There is a newer version of this series
Posted by Usama Arif 3 months, 2 weeks ago
When transitioning from 5-level to 4-level paging, the existing code
incorrectly accesses page table entries by directly dereferencing CR3
and applying PAGE_MASK. This approach has several issues:

- __native_read_cr3() returns the raw CR3 register value, which on
  x86_64 includes not just the physical address but also flags. Bits
  above the physical address width of the system (i.e. above
  __PHYSICAL_MASK_SHIFT) are also not masked.
- The pgd value is masked with PAGE_MASK, which doesn't take into
  account the higher bits such as _PAGE_BIT_NOPTISHADOW.

Replace this with proper accessor functions:
- read_cr3_pa(): Uses CR3_ADDR_MASK to properly clear the SME encryption
  bit and extract only the physical address portion.
- mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
  flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
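
For illustration only (not part of the patch), the difference between the
two masks can be modelled in user space. The constants below are assumed
stand-ins for typical x86_64 values (a 52-bit physical mask, bit 58 for
_PAGE_BIT_NOPTISHADOW, 0x67 for the table flags); the kernel derives the
real PTE_PFN_MASK from its own configuration, and the helper names are
hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative user-space model, not kernel code; all values are assumptions. */
#define PAGE_SHIFT	12
#define PAGE_SIZE	(1ULL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))	/* clears only the low 12 bits */
#define PHYS_MASK_SHIFT	52			/* assumed __PHYSICAL_MASK_SHIFT */
#define PHYS_MASK	((1ULL << PHYS_MASK_SHIFT) - 1)
#define PFN_MASK	(PAGE_MASK & PHYS_MASK)	/* stands in for PTE_PFN_MASK */
#define BIT_NOPTISHADOW	58			/* assumed _PAGE_BIT_NOPTISHADOW */
#define TABLE_FLAGS	0x67ULL			/* assumed low flag bits of a table entry */

/* Build a pgd entry that points at table_pa and carries a high software flag. */
static inline uint64_t make_pgd_entry(uint64_t table_pa)
{
	/* The table address must be page aligned and within the physical mask. */
	assert((table_pa & ~PFN_MASK) == 0);
	return table_pa | TABLE_FLAGS | (1ULL << BIT_NOPTISHADOW);
}

/* Old behaviour: low flag bits are cleared, but bit 58 leaks into the result. */
static inline uint64_t next_table_page_mask(uint64_t pgd)
{
	return pgd & PAGE_MASK;
}

/* New behaviour: only the physical address of the next-level table remains. */
static inline uint64_t next_table_pfn_mask(uint64_t pgd)
{
	return pgd & PFN_MASK;
}
```

With an entry built by make_pgd_entry(0x7f5a3000), the PFN-mask variant
returns 0x7f5a3000, while the PAGE_MASK variant still has bit 58 set. In
this model that value is no longer a usable table address and also
compares greater than U32_MAX.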

Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Reported-by: Michael van der Westhuizen <rmikey@meta.com>
Reported-by: Tobias Fleig <tfleig@meta.com>
---
 drivers/firmware/efi/libstub/x86-5lvl.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
index f1c5fb45d5f7c..34b72da457487 100644
--- a/drivers/firmware/efi/libstub/x86-5lvl.c
+++ b/drivers/firmware/efi/libstub/x86-5lvl.c
@@ -81,8 +81,11 @@ void efi_5level_switch(void)
 		new_cr3 = memset(pgt, 0, PAGE_SIZE);
 		new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
 	} else {
+		pgd_t *pgdp;
+
+		pgdp = (pgd_t *)read_cr3_pa();
 		/* take the new root table pointer from the current entry #0 */
-		new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
+		new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
 
 		/* copy the new root table if it is not 32-bit addressable */
 		if ((u64)new_cr3 > U32_MAX)
-- 
2.47.3
Re: [PATCH 2/3] efi/libstub: Fix page table access in 5-level to 4-level paging transition
Posted by Ard Biesheuvel 3 months, 2 weeks ago
On Thu, 23 Oct 2025 at 00:08, Usama Arif <usamaarif642@gmail.com> wrote:
>
> When transitioning from 5-level to 4-level paging, the existing code
> incorrectly accesses page table entries by directly dereferencing CR3
> and applying PAGE_MASK. This approach has several issues:
>
> - __native_read_cr3() returns the raw CR3 register value, which on
>   x86_64 includes not just the physical address but also flags. Bits
>   above the physical address width of the system (i.e. above
>   __PHYSICAL_MASK_SHIFT) are also not masked.
> - The pgd value is masked with PAGE_MASK, which doesn't take into
>   account the higher bits such as _PAGE_BIT_NOPTISHADOW.
>
> Replace this with proper accessor functions:
> - read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption bit
>   and extracting only the physical address portion.
> - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
>   flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
>
> Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
> Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
> Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> Reported-by: Michael van der Westhuizen <rmikey@meta.com>
> Reported-by: Tobias Fleig <tfleig@meta.com>
> ---
>  drivers/firmware/efi/libstub/x86-5lvl.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> index f1c5fb45d5f7c..34b72da457487 100644
> --- a/drivers/firmware/efi/libstub/x86-5lvl.c
> +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> @@ -81,8 +81,11 @@ void efi_5level_switch(void)
>                 new_cr3 = memset(pgt, 0, PAGE_SIZE);
>                 new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
>         } else {
> +               pgd_t *pgdp;
> +
> +               pgdp = (pgd_t *)read_cr3_pa();

Shouldn't this be using native_read_cr3_pa()? And is there any reason
to re-read CR3 here, rather than update the code that populates the
cr3 variable? The preceding other branch of the if() should probably
use the same sanitised value of CR3, no?
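
One possible reading of this suggestion, as a user-space sketch rather than
a proposed patch: strip the flag bits once, where CR3 is first read, and let
both branches consume the sanitised value. The mask values (52-bit physical
mask, no SME bit, 0x67 standing in for _PAGE_TABLE_NOENC) and the helper
names are assumptions made for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; values are assumptions, not kernel definitions. */
#define PAGE_MASK	(~0xfffULL)
#define PHYS_MASK	((1ULL << 52) - 1)		/* assumed physical address width */
#define CR3_ADDR	(PAGE_MASK & PHYS_MASK)		/* stands in for CR3_ADDR_MASK */
#define TABLE_NOENC	0x67ULL				/* stands in for _PAGE_TABLE_NOENC */

/* Sanitise the raw register value once, at the point where it is read. */
static inline uint64_t cr3_to_pa(uint64_t raw_cr3)
{
	return raw_cr3 & CR3_ADDR;
}

/*
 * 4-level to 5-level branch: entry #0 of the new root table points at the
 * current root, so it must be built from the sanitised address.
 */
static inline uint64_t root_entry_for_5level(uint64_t cr3_pa)
{
	/* The caller is expected to pass a value that was already sanitised. */
	assert((cr3_pa & ~CR3_ADDR) == 0);
	return cr3_pa | TABLE_NOENC;
}
```

If the raw value carries low control bits (for example 0x18) or a
hypothetical upper control bit, OR-ing it directly with the table flags
yields an entry with stray bits set, whereas the sanitised path does not.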


>                 /* take the new root table pointer from the current entry #0 */
> -               new_cr3 = (u64 *)(cr3[0] & PAGE_MASK);
> +               new_cr3 = (u64 *)(pgd_val(pgdp[0]) & PTE_PFN_MASK);
>
>                 /* copy the new root table if it is not 32-bit addressable */
>                 if ((u64)new_cr3 > U32_MAX)
> --
> 2.47.3
>
Re: [PATCH 2/3] efi/libstub: Fix page table access in 5-level to 4-level paging transition
Posted by Kiryl Shutsemau 3 months, 2 weeks ago
On Thu, Oct 23, 2025 at 04:13:26PM +0200, Ard Biesheuvel wrote:
> On Thu, 23 Oct 2025 at 00:08, Usama Arif <usamaarif642@gmail.com> wrote:
> >
> > When transitioning from 5-level to 4-level paging, the existing code
> > incorrectly accesses page table entries by directly dereferencing CR3
> > and applying PAGE_MASK. This approach has several issues:
> >
> > - __native_read_cr3() returns the raw CR3 register value, which on
> >   x86_64 includes not just the physical address but also flags. Bits
> >   above the physical address width of the system (i.e. above
> >   __PHYSICAL_MASK_SHIFT) are also not masked.
> > - The pgd value is masked with PAGE_MASK, which doesn't take into
> >   account the higher bits such as _PAGE_BIT_NOPTISHADOW.
> >
> > Replace this with proper accessor functions:
> > - read_cr3_pa(): Uses CR3_ADDR_MASK properly clearing SME encryption bit
> >   and extracting only the physical address portion.
> > - mask pgd value with PTE_PFN_MASK instead of PAGE_MASK, accounting for
> >   flags above physical address (_PAGE_BIT_NOPTISHADOW in particular).
> >
> > Fixes: cb1c9e02b0c1 ("x86/efistub: Perform 4/5 level paging switch from the stub")
> > Co-developed-by: Kiryl Shutsemau <kas@kernel.org>
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> > Signed-off-by: Usama Arif <usamaarif642@gmail.com>
> > Reported-by: Michael van der Westhuizen <rmikey@meta.com>
> > Reported-by: Tobias Fleig <tfleig@meta.com>
> > ---
> >  drivers/firmware/efi/libstub/x86-5lvl.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c
> > index f1c5fb45d5f7c..34b72da457487 100644
> > --- a/drivers/firmware/efi/libstub/x86-5lvl.c
> > +++ b/drivers/firmware/efi/libstub/x86-5lvl.c
> > @@ -81,8 +81,11 @@ void efi_5level_switch(void)
> >                 new_cr3 = memset(pgt, 0, PAGE_SIZE);
> >                 new_cr3[0] = (u64)cr3 | _PAGE_TABLE_NOENC;
> >         } else {
> > +               pgd_t *pgdp;
> > +
> > +               pgdp = (pgd_t *)read_cr3_pa();
> 
> Shouldn't this be using native_read_cr3_pa()?

Perhaps. But I don't think it makes a difference.

We don't have paravirt in stub/decompressor, do we?

> And is there any reason
> to re-read CR3 here, rather than update the code that populates the
> cr3 variable? The preceding other branch of the if() should probably
> use the same sanitised value of CR3, no?

Good point.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov