Currently, idt_table is allocated as page-aligned .bss, and remapped
read-only after init. This breaks a 2 MiB large page into 4k page
mappings, which defeats some of the effort done at boot to map the
kernel image using large pages, for improved TLB efficiency.

Mark this allocation as __ro_after_init instead, so it will be made
read-only automatically after boot, without breaking up large page
mappings.

This also fixes a latent bug on i386, where the size of idt_table is
less than a page, and so remapping it read-only could potentially affect
other read-write variables too, if those are not page-aligned as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/idt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index f445bec516a0..d6da25d7964f 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
 };
 
 /* Must be page-aligned because the real IDT is used in the cpu entry area */
-static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
+static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
 
 static struct desc_ptr idt_descr __ro_after_init = {
 	.size = IDT_TABLE_SIZE - 1,
@@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
 	idt_map_in_cea();
 	load_idt(&idt_descr);
 
-	/* Make the IDT table read only */
-	set_memory_ro((unsigned long)&idt_table, 1);
-
 	idt_setup_done = true;
 }
--
2.47.3
On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> [full patch quoted; see above]
Good idea, except my guest shows me something else:

before:

[    0.186281] IDT table: 0xffffffff89c7f000

0xffffffff89c00000-0xffffffff89c7f000 508K RW GLB NX pte
0xffffffff89c7f000-0xffffffff89c80000 4K ro GLB NX pte
0xffffffff89c80000-0xffffffff89e00000 1536K RW GLB NX pte
0xffffffff89e00000-0xffffffff8be00000 32M RW PSE GLB NX pmd

This is clearly a single, 4K RO pageframe right in the middle of a splintered
2M page.

after:

[    0.180635] IDT table: 0xffffffff822cf000

0xffffffff81e00000-0xffffffff82200000 4M ro PSE GLB NX pmd
0xffffffff82200000-0xffffffff8236f000 1468K ro GLB NX pte
0xffffffff8236f000-0xffffffff82400000 580K RW GLB NX pte
0xffffffff82400000-0xffffffff89800000 116M RW PSE GLB NX pmd

but after applying your patch, it looks like it still broke the 2M mapping, as
the remaining piece is RW.

If I do this:

static gate_desc idt_table[IDT_ENTRIES] __aligned(PMD_SIZE) __ro_after_init;

it still doesn't help:

[    0.197808] IDT table: 0xffffffff82800000

0xffffffff81e00000-0xffffffff82800000 10M ro PSE GLB NX pmd
0xffffffff82800000-0xffffffff828a0000 640K ro GLB NX pte
0xffffffff828a0000-0xffffffff82a00000 1408K RW GLB NX pte
0xffffffff82a00000-0xffffffff89e00000 116M RW PSE GLB NX pmd

because that trailing piece of the 2M page is still RW.

And who knows what else I am breaking by doing this:

[    2.368601] ------------[ cut here ]------------
[    2.389816] [CRTC:35:crtc-0] vblank wait timed out
[    2.396676] WARNING: drivers/gpu/drm/drm_atomic_helper.c:1920 at drm_atomic_helper_wait_for_vblanks.part.0+0x1ba/0x1e0, CPU#1: kworker/1:0/57
[    2.406715] Modules linked in:
[    2.408462] CPU: 1 UID: 0 PID: 57 Comm: kworker/1:0 Not tainted 6.19.0-rc6+ #4 PREEMPT(full)
...

I don't know whether sacrificing a 2M page just for the idt_table, so that it
doesn't get splintered, is really worth it.

Hmmm.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, 22 Jan 2026 at 14:09, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> > [full patch quoted; see above]
>
> [page table dumps quoted; see above]
>
> after:
>
> [ 0.180635] IDT table: 0xffffffff822cf000
>
> 0xffffffff81e00000-0xffffffff82200000 4M ro PSE GLB NX pmd
> 0xffffffff82200000-0xffffffff8236f000 1468K ro GLB NX pte
> 0xffffffff8236f000-0xffffffff82400000 580K RW GLB NX pte
> 0xffffffff82400000-0xffffffff89800000 116M RW PSE GLB NX pmd
>
> but after applying your patch it looks like it still broke the 2M mapping as
> the remaining piece is RW.
>
That is because the init region is between .data and .bss, which are
not 2M aligned, and so freeing/remapping the init region breaks the
hugepage mapping of the region between them. This is addressed in
patch #3.
On Thu, Jan 22, 2026 at 02:48:14PM +0100, Ard Biesheuvel wrote:
> That is because the init region is between .data and .bss, which are
> not 2M aligned, and so freeing/remapping the init region breaks the
> hugepage mapping of the region between them. This is addressed in
> patch #3.
Ok, I'll continue looking, but this commit message should not mislead by
stating that it prevents the "breaking up large page mappings." I'll fix it up
if I end up picking this up, or please fix it up in your next revision.
Thx.
On Thu, 22 Jan 2026 at 14:58, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 22, 2026 at 02:48:14PM +0100, Ard Biesheuvel wrote:
> > That is because the init region is between .data and .bss, which are
> > not 2M aligned, and so freeing/remapping the init region breaks the
> > hugepage mapping of the region between them. This is addressed in
> > patch #3.
>
> Ok, I'll continue looking, but this commit message should not mislead by
> stating that it prevents the "breaking up large page mappings."

It does. It just doesn't prevent it from happening for other reasons.

> I'll fix it up
> if I end up picking this up, or please fix it up in your next revision.

Ack.
On Thu, Jan 22, 2026 at 03:09:19PM +0100, Ard Biesheuvel wrote:
> It does. It just doesn't prevent it from happening for other reasons.
So this reads to me like "it does but it doesn't". Huh?!?
C'mon Ard, let's be more precise pls.
On Thu, 22 Jan 2026 at 15:16, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 22, 2026 at 03:09:19PM +0100, Ard Biesheuvel wrote:
> > It does. It just doesn't prevent it from happening for other reasons.
>
> So this reads to me like "it does but it doesn't". Huh?!?
>
> C'mon Ard, let's be more precise pls.

Fair enough. What I meant to say is that if your BSS is placed on a 2M
boundary, or is sufficiently larger than that, some of it will be mapped
using PMDs, and this change will prevent those from being broken up into
page mappings. If your BSS is too small to be covered by huge page
mappings in the first place, this change obviously does nothing (except
fixing a bug on i386).
On Thu, Jan 22, 2026 at 03:20:39PM +0100, Ard Biesheuvel wrote:
> Fair enough. What I meant to say is that if your BSS placed on a 2M
> boundary or is sufficiently larger than that, some of it will be
> mapped using PMDs, and this change will prevent those from being
> broken up into page mappings. If your BSS is too small to be covered
> by huge page mappings in the first place, this change obviously does
> nothing (except fixing a bug on i386)
That makes more sense. :)
Thx.