Remove a redundant check on kernel code's PMD _PAGE_PRESENT attribute
before fix up.
Current process looks like this:

    pmd in [0, _text)
        unset _PAGE_PRESENT
    pmd in [_text, _end]
        if (_PAGE_PRESENT)
            fix up delta
    pmd in (_end, 512)
        unset _PAGE_PRESENT

level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
redundant
Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Steve Wahl <steve.wahl@hpe.com>
---
v3: refine the change log per kirill's comment
v2: adjust the change log to emphasize the redundant check
---
arch/x86/kernel/head64.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index a817ed0724d1..bac33ec19aa2 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 
 	/* fixup pages that are part of the kernel image */
 	for (; i <= pmd_index((unsigned long)_end); i++)
-		if (pmd[i] & _PAGE_PRESENT)
-			pmd[i] += load_delta;
+		pmd[i] += load_delta;
 
 	/* invalidate pages after the kernel image */
 	for (; i < PTRS_PER_PMD; i++)
--
2.34.1
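
For readers following along, the three phases described in the change log can be modeled in user space. This is only a minimal sketch under stated assumptions, not the kernel code: `PAGE_PRESENT`, `LARGE_FLAGS`, `fill_table()`, `fixup()` and `variants_agree()` are hypothetical stand-ins, the flag bits are illustrative, and the table is assumed to hold `nr_populated` present entries followed by zeros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_PMD 512
#define PMD_SHIFT    21                        /* 2 MiB per PMD entry */
#define PAGE_PRESENT 0x1ULL                    /* stand-in for _PAGE_PRESENT */
#define LARGE_FLAGS  (PAGE_PRESENT | 0x80ULL)  /* illustrative flag bits only */

typedef uint64_t pmdval_t;

/* Hypothetical model of a statically built table: the first
 * nr_populated entries are present, the remainder is zero. */
static void fill_table(pmdval_t *pmd, size_t nr_populated)
{
	assert(nr_populated <= PTRS_PER_PMD);
	for (size_t i = 0; i < PTRS_PER_PMD; i++)
		pmd[i] = i < nr_populated ?
			 (((pmdval_t)i << PMD_SHIFT) | LARGE_FLAGS) : 0;
}

/* The three phases from the change log. With check_present != 0 the
 * middle phase keeps the old test; with 0 it relocates unconditionally. */
static void fixup(pmdval_t *pmd, size_t text_idx, size_t end_idx,
		  uint64_t load_delta, int check_present)
{
	size_t i;

	assert(text_idx <= end_idx && end_idx < PTRS_PER_PMD);

	/* invalidate entries before the image */
	for (i = 0; i < text_idx; i++)
		pmd[i] &= ~PAGE_PRESENT;

	/* relocate entries that cover the image */
	for (; i <= end_idx; i++)
		if (!check_present || (pmd[i] & PAGE_PRESENT))
			pmd[i] += load_delta;

	/* invalidate entries after the image */
	for (; i < PTRS_PER_PMD; i++)
		pmd[i] &= ~PAGE_PRESENT;
}

/* Returns 1 when the checked and unchecked variants produce identical tables. */
static int variants_agree(size_t text_idx, size_t end_idx,
			  size_t nr_populated, uint64_t load_delta)
{
	pmdval_t with_check[PTRS_PER_PMD], without_check[PTRS_PER_PMD];

	fill_table(with_check, nr_populated);
	fill_table(without_check, nr_populated);
	fixup(with_check, text_idx, end_idx, load_delta, 1);
	fixup(without_check, text_idx, end_idx, load_delta, 0);
	return memcmp(with_check, without_check, sizeof(with_check)) == 0;
}
```

In this model the two variants agree whenever every slot in [text_idx, end_idx] is populated, which is the property the patch relies on. They diverge only if the image range reaches a zero entry.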
May I ask what else I should do?

On Thu, May 23, 2024 at 12:35:39PM +0000, Wei Yang wrote:
>Remove a redundant check on kernel code's PMD _PAGE_PRESENT attribute
>before fix up.
>
>Current process looks like this:
>
>    pmd in [0, _text)
>        unset _PAGE_PRESENT
>    pmd in [_text, _end]
>        if (_PAGE_PRESENT)
>            fix up delta
>    pmd in (_end, 512)
>        unset _PAGE_PRESENT
>
>level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
>redundant
>
>Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>CC: Thomas Gleixner <tglx@linutronix.de>
>CC: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>CC: Ingo Molnar <mingo@kernel.org>
>CC: Steve Wahl <steve.wahl@hpe.com>
>
>---
>v3: refine the change log per kirill's comment
>v2: adjust the change log to emphasize the redundant check
>---
> arch/x86/kernel/head64.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
>diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>index a817ed0724d1..bac33ec19aa2 100644
>--- a/arch/x86/kernel/head64.c
>+++ b/arch/x86/kernel/head64.c
>@@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>
> 	/* fixup pages that are part of the kernel image */
> 	for (; i <= pmd_index((unsigned long)_end); i++)
>-		if (pmd[i] & _PAGE_PRESENT)
>-			pmd[i] += load_delta;
>+		pmd[i] += load_delta;
>
> 	/* invalidate pages after the kernel image */
> 	for (; i < PTRS_PER_PMD; i++)
>--
>2.34.1

--
Wei Yang
Help you, Help me
On 5/23/24 05:35, Wei Yang wrote:
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>
> 	/* fixup pages that are part of the kernel image */
> 	for (; i <= pmd_index((unsigned long)_end); i++)
> -		if (pmd[i] & _PAGE_PRESENT)
> -			pmd[i] += load_delta;
> +		pmd[i] += load_delta;
So, I think this is correct. But, man, I wish folks would go through
the git history and make it clear that they understand _how_ the code
got the way it is.
I suspect that the original _PAGE_PRESENT check wasn't even necessary if
cleanup_highmap() really did fix things up. But this commit:
2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
outside kernel area")
tweaked things to actively clear out PMDs that weren't populated in
Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
it certainly did imply that the PMD doesn't have any holes in it and
there's nothing in the middle that needs _PAGE_PRESENT cleared.
> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
> redundant
This isn't super reassuring. It also depends on nothing having munged
the page tables up to this point. The code is also a bit cruel in that
it manipulates two different sets of PMDs with the same 'pmd' variable.
Also, is this comment still accurate after '2aa85f246c18'?
> * Fixup the kernel text+data virtual addresses. Note that
> * we might write invalid pmds, when the kernel is relocated
> * cleanup_highmap() fixes this up along with the mappings
> * beyond _end.
On Mon, Jun 03, 2024 at 11:50:06AM -0700, Dave Hansen wrote:
>On 5/23/24 05:35, Wei Yang wrote:
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>
>> 	/* fixup pages that are part of the kernel image */
>> 	for (; i <= pmd_index((unsigned long)_end); i++)
>> -		if (pmd[i] & _PAGE_PRESENT)
>> -			pmd[i] += load_delta;
>> +		pmd[i] += load_delta;
>
>So, I think this is correct. But, man, I wish folks would go through
>the git history and make it clear that they understand _how_ the code
>got the way it is.
>
>I suspect that the original _PAGE_PRESENT check wasn't even necessary if
>cleanup_highmap() really did fix things up. But this commit:
>
> 2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
> outside kernel area")
>
>tweaked things to actively clear out PMDs that weren't populated in
>Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
>it certainly did imply that the PMD doesn't have any holes in it and
>there's nothing in the middle that needs _PAGE_PRESENT cleared.
>
>> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
>> redundant
>
>This isn't super reassuring. It also depends on nothing having munged
>the page tables up to this point. The code is also a bit cruel in that
>it manipulates two different sets of PMDs with the same 'pmd' variable.
>
>Also, is this comment still accurate after '2aa85f246c18'?
>
>> * Fixup the kernel text+data virtual addresses. Note that
>> * we might write invalid pmds, when the kernel is relocated
>> * cleanup_highmap() fixes this up along with the mappings
>> * beyond _end.
Hi, Dave
Do you have any other suggestions? What should I do next?
--
Wei Yang
Help you, Help me
On Mon, Jun 03, 2024 at 11:50:06AM -0700, Dave Hansen wrote:
>On 5/23/24 05:35, Wei Yang wrote:
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -260,8 +260,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
>>
>> 	/* fixup pages that are part of the kernel image */
>> 	for (; i <= pmd_index((unsigned long)_end); i++)
>> -		if (pmd[i] & _PAGE_PRESENT)
>> -			pmd[i] += load_delta;
>> +		pmd[i] += load_delta;
>
>So, I think this is correct. But, man, I wish folks would go through
>the git history and make it clear that they understand _how_ the code
>got the way it is.
>
Dave
Thanks for your comment.
My first version listed the historical changes, but Thomas thought they
were not relevant, so I removed those descriptions.
https://lkml.org/lkml/2024/3/23/350
>I suspect that the original _PAGE_PRESENT check wasn't even necessary if
>cleanup_highmap() really did fix things up. But this commit:
>
> 2aa85f246c18 ("x86/boot/64: Make level2_kernel_pgt pages invalid
> outside kernel area")
>
>tweaked things to actively clear out PMDs that weren't populated in
>Kirill's original loop. It didn't touch the _PAGE_PRESENT check. But
>it certainly did imply that the PMD doesn't have any holes in it and
>there's nothing in the middle that needs _PAGE_PRESENT cleared.
>
As I mentioned in my first version, the original code was introduced by
commit 1ab60e0f72f7 ("[PATCH] x86-64: Relocatable Kernel Support").
The reason for the _PAGE_PRESENT check is that, at that time,
level2_kernel_pgt was defined as:

NEXT_PAGE(level2_kernel_pgt)
	/* 40MB kernel mapping. The kernel code cannot be bigger than that.
	   When you change this change KERNEL_TEXT_SIZE in page.h too. */
	/* (2^48-(2*1024*1024*1024)-((2^39)*511)-((2^30)*510)) = 0 */
	PMDS(0x0000000000000000, __PAGE_KERNEL_LARGE_EXEC|_PAGE_GLOBAL,
		KERNEL_TEXT_SIZE/PMD_SIZE)
	/* Module mapping starts here */
	.fill	(PTRS_PER_PMD - (KERNEL_TEXT_SIZE/PMD_SIZE)),8,0

While now, it looks like this:

SYM_DATA_START_PAGE_ALIGNED(level2_kernel_pgt)
	/*
	 * Kernel high mapping.
	 *
	 * The kernel code+data+bss must be located below KERNEL_IMAGE_SIZE in
	 * virtual address space, which is 1 GiB if RANDOMIZE_BASE is enabled,
	 * 512 MiB otherwise.
	 *
	 * (NOTE: after that starts the module area, see MODULES_VADDR.)
	 *
	 * This table is eventually used by the kernel during normal runtime.
	 * Care must be taken to clear out undesired bits later, like _PAGE_RW
	 * or _PAGE_GLOBAL in some cases.
	 */
	PMDS(0, __PAGE_KERNEL_LARGE_EXEC, KERNEL_IMAGE_SIZE/PMD_SIZE)
SYM_DATA_END(level2_kernel_pgt)

The difference is that in the original version, not every entry of
level2_kernel_pgt was defined with _PAGE_PRESENT set. I didn't dig into which
commit expanded level2_kernel_pgt to cover the full table, but I think the
check has been redundant since that point.
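
To illustrate why a zero-filled tail makes the check matter, here is a small user-space sketch. It is an assumption-laden model, not the historical code: `fill_partial()` and `count_stale()` are hypothetical helpers, `PAGE_PRESENT` stands in for `_PAGE_PRESENT`, and the relocation loop is simply assumed to walk entries 0 through `last`. The 20 populated entries used in the note below correspond to the 40MB mapping at 2 MiB per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PMD 512
#define PMD_SHIFT    21          /* 2 MiB per PMD entry */
#define PAGE_PRESENT 0x1ULL      /* stand-in for _PAGE_PRESENT */

typedef uint64_t pmdval_t;

/* Model PMDS(...) followed by .fill: nr_mapped present entries, then zeros. */
static void fill_partial(pmdval_t *pmd, size_t nr_mapped)
{
	assert(nr_mapped <= PTRS_PER_PMD);
	for (size_t i = 0; i < PTRS_PER_PMD; i++)
		pmd[i] = i < nr_mapped ?
			 (((pmdval_t)i << PMD_SHIFT) | PAGE_PRESENT) : 0;
}

/* Relocate entries [0, last] and count those left non-zero but not
 * present, i.e. entries that carry a bogus address and no valid mapping. */
static size_t count_stale(size_t nr_mapped, size_t last,
			  uint64_t load_delta, int check_present)
{
	pmdval_t pmd[PTRS_PER_PMD];
	size_t i, stale = 0;

	assert(last < PTRS_PER_PMD);
	fill_partial(pmd, nr_mapped);

	for (i = 0; i <= last; i++)
		if (!check_present || (pmd[i] & PAGE_PRESENT))
			pmd[i] += load_delta;

	for (i = 0; i < PTRS_PER_PMD; i++)
		if (pmd[i] != 0 && !(pmd[i] & PAGE_PRESENT))
			stale++;

	return stale;
}
```

With 20 populated entries and a 2 MiB-aligned, non-zero delta, the unchecked walk over the whole table leaves 492 stale entries, while the checked walk leaves none. With all 512 entries populated, both variants leave none, which is the situation the patch argues for.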
>> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
>> redundant
>
>This isn't super reassuring. It also depends on nothing having munged
>the page tables up to this point. The code is also a bit cruel in that
>it manipulates two different sets of PMDs with the same 'pmd' variable.
>
>Also, is this comment still accurate after '2aa85f246c18'?
>
>> * Fixup the kernel text+data virtual addresses. Note that
>> * we might write invalid pmds, when the kernel is relocated
>> * cleanup_highmap() fixes this up along with the mappings
>> * beyond _end.
It sounds like this comment is no longer necessary. Would you prefer that I
remove it in the next version of this patch?
--
Wei Yang
Help you, Help me
On Thu, May 23, 2024 at 12:35:39PM +0000, Wei Yang wrote:
> Remove a redundant check on kernel code's PMD _PAGE_PRESENT attribute
> before fix up.
>
> Current process looks like this:
>
>     pmd in [0, _text)
>         unset _PAGE_PRESENT
>     pmd in [_text, _end]
>         if (_PAGE_PRESENT)
>             fix up delta
>     pmd in (_end, 512)
>         unset _PAGE_PRESENT
>
> level2_kernel_pgt compiled with _PAGE_PRESENT set. The check is
> redundant
>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> CC: Ingo Molnar <mingo@kernel.org>
> CC: Steve Wahl <steve.wahl@hpe.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

--
  Kiryl Shutsemau / Kirill A. Shutemov