[RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section

Ard Biesheuvel posted 19 patches 1 month ago
[RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Ard Biesheuvel 1 month ago
Currently, idt_table is allocated as page-aligned .bss, and remapped
read-only after init. This breaks a 2 MiB large page into 4k page
mappings, which defeats some of the effort done at boot to map the
kernel image using large pages, for improved TLB efficiency.

Mark this allocation as __ro_after_init instead, so it will be made
read-only automatically after boot, without breaking up large page
mappings.

This also fixes a latent bug on i386, where the size of idt_table is
less than a page, and so remapping it read-only could potentially affect
other read-write variables too, if those are not page-aligned as well.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/kernel/idt.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
index f445bec516a0..d6da25d7964f 100644
--- a/arch/x86/kernel/idt.c
+++ b/arch/x86/kernel/idt.c
@@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
 };
 
 /* Must be page-aligned because the real IDT is used in the cpu entry area */
-static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
+static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
 
 static struct desc_ptr idt_descr __ro_after_init = {
 	.size		= IDT_TABLE_SIZE - 1,
@@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
 	idt_map_in_cea();
 	load_idt(&idt_descr);
 
-	/* Make the IDT table read only */
-	set_memory_ro((unsigned long)&idt_table, 1);
-
 	idt_setup_done = true;
 }
 
-- 
2.47.3
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Borislav Petkov 2 weeks, 5 days ago
On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> Currently, idt_table is allocated as page-aligned .bss, and remapped
> read-only after init. This breaks a 2 MiB large page into 4k page
> mappings, which defeats some of the effort done at boot to map the
> kernel image using large pages, for improved TLB efficiency.
> 
> Mark this allocation as __ro_after_init instead, so it will be made
> read-only automatically after boot, without breaking up large page
> mappings.
> 
> This also fixes a latent bug on i386, where the size of idt_table is
> less than a page, and so remapping it read-only could potentially affect
> other read-write variables too, if those are not page-aligned as well.
> 
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/kernel/idt.c | 5 +----
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> index f445bec516a0..d6da25d7964f 100644
> --- a/arch/x86/kernel/idt.c
> +++ b/arch/x86/kernel/idt.c
> @@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
>  };
>  
>  /* Must be page-aligned because the real IDT is used in the cpu entry area */
> -static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
> +static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
>  
>  static struct desc_ptr idt_descr __ro_after_init = {
>  	.size		= IDT_TABLE_SIZE - 1,
> @@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
>  	idt_map_in_cea();
>  	load_idt(&idt_descr);
>  
> -	/* Make the IDT table read only */
> -	set_memory_ro((unsigned long)&idt_table, 1);
> -
>  	idt_setup_done = true;
>  }

Good idea, except my guest shows me something else:

before:

[    0.186281] IDT table: 0xffffffff89c7f000

0xffffffff89c00000-0xffffffff89c7f000         508K     RW                 GLB NX pte
0xffffffff89c7f000-0xffffffff89c80000           4K     ro                 GLB NX pte
0xffffffff89c80000-0xffffffff89e00000        1536K     RW                 GLB NX pte
0xffffffff89e00000-0xffffffff8be00000          32M     RW         PSE     GLB NX pmd

This is clearly a single, 4K RO pageframe right in the middle of a splintered
2M page.

after:

[    0.180635] IDT table: 0xffffffff822cf000

0xffffffff81e00000-0xffffffff82200000           4M     ro         PSE     GLB NX pmd
0xffffffff82200000-0xffffffff8236f000        1468K     ro                 GLB NX pte
0xffffffff8236f000-0xffffffff82400000         580K     RW                 GLB NX pte
0xffffffff82400000-0xffffffff89800000         116M     RW         PSE     GLB NX pmd

but after applying your patch it looks like it still broke the 2M mapping as
the remaining piece is RW.

If I do this:

static gate_desc idt_table[IDT_ENTRIES] __aligned(PMD_SIZE) __ro_after_init;

it still doesn't help:

[    0.197808] IDT table: 0xffffffff82800000

0xffffffff81e00000-0xffffffff82800000          10M     ro         PSE     GLB NX pmd
0xffffffff82800000-0xffffffff828a0000         640K     ro                 GLB NX pte
0xffffffff828a0000-0xffffffff82a00000        1408K     RW                 GLB NX pte
0xffffffff82a00000-0xffffffff89e00000         116M     RW         PSE     GLB NX pmd

because that trailing piece of the 2M page is still RW.

And who knows what else am I breaking when doing this:

[    2.368601] ------------[ cut here ]------------
[    2.389816] [CRTC:35:crtc-0] vblank wait timed out
[    2.396676] WARNING: drivers/gpu/drm/drm_atomic_helper.c:1920 at drm_atomic_helper_wait_for_vblanks.part.0+0x1ba/0x1e0, CPU#1: kworker/1:0/57
[    2.406715] Modules linked in:
[    2.408462] CPU: 1 UID: 0 PID: 57 Comm: kworker/1:0 Not tainted 6.19.0-rc6+ #4 PREEMPT(full)
...

I don't know, sacrificing a 2M page just for the idt_table and so that it
doesn't get splintered, not sure it is worth it.

Hmmm.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Ard Biesheuvel 2 weeks, 5 days ago
On Thu, 22 Jan 2026 at 14:09, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 08, 2026 at 09:25:28AM +0000, Ard Biesheuvel wrote:
> > Currently, idt_table is allocated as page-aligned .bss, and remapped
> > read-only after init. This breaks a 2 MiB large page into 4k page
> > mappings, which defeats some of the effort done at boot to map the
> > kernel image using large pages, for improved TLB efficiency.
> >
> > Mark this allocation as __ro_after_init instead, so it will be made
> > read-only automatically after boot, without breaking up large page
> > mappings.
> >
> > This also fixes a latent bug on i386, where the size of idt_table is
> > less than a page, and so remapping it read-only could potentially affect
> > other read-write variables too, if those are not page-aligned as well.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/kernel/idt.c | 5 +----
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
> > index f445bec516a0..d6da25d7964f 100644
> > --- a/arch/x86/kernel/idt.c
> > +++ b/arch/x86/kernel/idt.c
> > @@ -170,7 +170,7 @@ static const __initconst struct idt_data apic_idts[] = {
> >  };
> >
> >  /* Must be page-aligned because the real IDT is used in the cpu entry area */
> > -static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
> > +static gate_desc idt_table[IDT_ENTRIES] __aligned(PAGE_SIZE) __ro_after_init;
> >
> >  static struct desc_ptr idt_descr __ro_after_init = {
> >       .size           = IDT_TABLE_SIZE - 1,
> > @@ -308,9 +308,6 @@ void __init idt_setup_apic_and_irq_gates(void)
> >       idt_map_in_cea();
> >       load_idt(&idt_descr);
> >
> > -     /* Make the IDT table read only */
> > -     set_memory_ro((unsigned long)&idt_table, 1);
> > -
> >       idt_setup_done = true;
> >  }
>
> Good idea, except my guest shows me something else:
>
> before:
>
> [    0.186281] IDT table: 0xffffffff89c7f000
>
> 0xffffffff89c00000-0xffffffff89c7f000         508K     RW                 GLB NX pte
> 0xffffffff89c7f000-0xffffffff89c80000           4K     ro                 GLB NX pte
> 0xffffffff89c80000-0xffffffff89e00000        1536K     RW                 GLB NX pte
> 0xffffffff89e00000-0xffffffff8be00000          32M     RW         PSE     GLB NX pmd
>
> This is clearly a single, 4K RO pageframe right in the middle of a splintered
> 2M page.
>
> after:
>
> [    0.180635] IDT table: 0xffffffff822cf000
>
> 0xffffffff81e00000-0xffffffff82200000           4M     ro         PSE     GLB NX pmd
> 0xffffffff82200000-0xffffffff8236f000        1468K     ro                 GLB NX pte
> 0xffffffff8236f000-0xffffffff82400000         580K     RW                 GLB NX pte
> 0xffffffff82400000-0xffffffff89800000         116M     RW         PSE     GLB NX pmd
>
> but after applying your patch it looks like it still broke the 2M mapping as
> the remaining piece is RW.
>

That is because the init region is between .data and .bss, which are
not 2M aligned, and so freeing/remapping the init region breaks the
hugepage mapping of the region between them. This is addressed in
patch #3.
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Borislav Petkov 2 weeks, 5 days ago
On Thu, Jan 22, 2026 at 02:48:14PM +0100, Ard Biesheuvel wrote:
> That is because the init region is between .data and .bss, which are
> not 2M aligned, and so freeing/remapping the init region breaks the
> hugepage mapping of the region between them. This is addressed in
> patch #3.

Ok, I'll continue looking but this commit message should not mislead in
stating that it prevents the "breaking up large page mappings." I'll fix it up
if I end up picking up this or you fix it up in your next revision please.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Ard Biesheuvel 2 weeks, 5 days ago
On Thu, 22 Jan 2026 at 14:58, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 22, 2026 at 02:48:14PM +0100, Ard Biesheuvel wrote:
> > That is because the init region is between .data and .bss, which are
> > not 2M aligned, and so freeing/remapping the init region breaks the
> > hugepage mapping of the region between them. This is addressed in
> > patch #3.
>
> Ok, I'll continue looking but this commit message should not mislead in
> stating that it prevents the "breaking up large page mappings."

It does. It just doesn't prevent if from happening for other reasons.

> I'll fix it up
> if I end up picking up this or you fix it up in your next revision please.
>

Ack.
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Borislav Petkov 2 weeks, 5 days ago
On Thu, Jan 22, 2026 at 03:09:19PM +0100, Ard Biesheuvel wrote:
> It does. It just doesn't prevent if from happening for other reasons.

So this reads to me like "it does but it doesn't". Huh?!?

C'mon Ard, let's be more precise pls.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Ard Biesheuvel 2 weeks, 5 days ago
On Thu, 22 Jan 2026 at 15:16, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 22, 2026 at 03:09:19PM +0100, Ard Biesheuvel wrote:
> > It does. It just doesn't prevent if from happening for other reasons.
>
> So this reads to me like "it does but it doesn't". Huh?!?
>
> C'mon Ard, let's be more precise pls.
>

Fair enough. What I meant to say is that if your BSS placed on a 2M
boundary or is sufficiently larger than that, some of it will be
mapped using PMDs, and this change will prevent those from being
broken up into page mappings. If your BSS is too small to be covered
by huge page mappings in the first place, this change obviously does
nothing (except fixing a bug on i386)
Re: [RFC/RFT PATCH 01/19] x86/idt: Move idt_table to __ro_after_init section
Posted by Borislav Petkov 2 weeks, 5 days ago
On Thu, Jan 22, 2026 at 03:20:39PM +0100, Ard Biesheuvel wrote:
> Fair enough. What I meant to say is that if your BSS placed on a 2M
> boundary or is sufficiently larger than that, some of it will be
> mapped using PMDs, and this change will prevent those from being
> broken up into page mappings. If your BSS is too small to be covered
> by huge page mappings in the first place, this change obviously does
> nothing (except fixing a bug on i386)

That makes more sense. :)

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette