[PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure

Christophe Leroy posted 1 patch 4 weeks ago
There is a newer version of this series
arch/powerpc/include/asm/pgtable.h | 12 ------------
arch/powerpc/mm/book3s32/mmu.c     |  4 ++--
arch/powerpc/mm/pgtable_32.c       |  2 +-
3 files changed, 3 insertions(+), 15 deletions(-)
[PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Christophe Leroy 4 weeks ago
PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
kernel text has to be mapped read-only or read-write based on build
time options.

But nowadays, with functionnalities like jump_labels, static links,
etc ... more only less all kernels need to be read-write at some
point, and some combinations of configs failed to work due to
innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
access to kernel modifications.

Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
possible options that may imply kernel text modification, always
set kernel text read-write at startup and rely on
CONFIG_STRICT_KERNEL_RWX to provide accurate protection.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/pgtable.h | 12 ------------
 arch/powerpc/mm/book3s32/mmu.c     |  4 ++--
 arch/powerpc/mm/pgtable_32.c       |  2 +-
 3 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 93d77ad5a92f..d8f944a5a037 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -20,18 +20,6 @@ struct mm_struct;
 #include <asm/nohash/pgtable.h>
 #endif /* !CONFIG_PPC_BOOK3S */
 
-/*
- * Protection used for kernel text. We want the debuggers to be able to
- * set breakpoints anywhere, so don't write protect the kernel text
- * on platforms where such control is possible.
- */
-#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \
-	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
-#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
-#else
-#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
-#endif
-
 /* Make modules code happy. We don't set RO yet */
 #define PAGE_KERNEL_EXEC	PAGE_KERNEL_X
 
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index be9c4106e22f..c42ecdf94e48 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -204,7 +204,7 @@ int mmu_mark_initmem_nx(void)
 
 	for (i = 0; i < nb - 1 && base < top;) {
 		size = bat_block_size(base, top);
-		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
+		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
 		base += size;
 	}
 	if (base < top) {
@@ -215,7 +215,7 @@ int mmu_mark_initmem_nx(void)
 				pr_warn("Some RW data is getting mapped X. "
 					"Adjust CONFIG_DATA_SHIFT to avoid that.\n");
 		}
-		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
+		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
 		base += size;
 	}
 	for (; i < nb; i++)
diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 15276068f657..0c9ef705803e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -104,7 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
 	p = memstart_addr + s;
 	for (; s < top; s += PAGE_SIZE) {
 		ktext = core_kernel_text(v);
-		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
+		map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
 		v += PAGE_SIZE;
 		p += PAGE_SIZE;
 	}
-- 
2.49.0
Re: [PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Erhard Furtner 3 weeks, 6 days ago
On 9/4/25 18:33, Christophe Leroy wrote:
> PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
> kernel text has to be mapped read-only or read-write based on build
> time options.
> 
> But nowadays, with functionnalities like jump_labels, static links,
> etc ... more only less all kernels need to be read-write at some
> point, and some combinations of configs failed to work due to
> innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
> we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
> access to kernel modifications.
> 
> Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
> possible options that may imply kernel text modification, always
> set kernel text read-write at startup and rely on
> CONFIG_STRICT_KERNEL_RWX to provide accurate protection.

I can confirm your patch fixes the startup failure for my G4 .config. 
Thanks!

Regards,
Erhard

> Reported-by: Erhard Furtner <erhard_f@mailbox.org>
> Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>   arch/powerpc/include/asm/pgtable.h | 12 ------------
>   arch/powerpc/mm/book3s32/mmu.c     |  4 ++--
>   arch/powerpc/mm/pgtable_32.c       |  2 +-
>   3 files changed, 3 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 93d77ad5a92f..d8f944a5a037 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -20,18 +20,6 @@ struct mm_struct;
>   #include <asm/nohash/pgtable.h>
>   #endif /* !CONFIG_PPC_BOOK3S */
>   
> -/*
> - * Protection used for kernel text. We want the debuggers to be able to
> - * set breakpoints anywhere, so don't write protect the kernel text
> - * on platforms where such control is possible.
> - */
> -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \
> -	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
> -#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
> -#else
> -#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
> -#endif
> -
>   /* Make modules code happy. We don't set RO yet */
>   #define PAGE_KERNEL_EXEC	PAGE_KERNEL_X
>   
> diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> index be9c4106e22f..c42ecdf94e48 100644
> --- a/arch/powerpc/mm/book3s32/mmu.c
> +++ b/arch/powerpc/mm/book3s32/mmu.c
> @@ -204,7 +204,7 @@ int mmu_mark_initmem_nx(void)
>   
>   	for (i = 0; i < nb - 1 && base < top;) {
>   		size = bat_block_size(base, top);
> -		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
> +		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
>   		base += size;
>   	}
>   	if (base < top) {
> @@ -215,7 +215,7 @@ int mmu_mark_initmem_nx(void)
>   				pr_warn("Some RW data is getting mapped X. "
>   					"Adjust CONFIG_DATA_SHIFT to avoid that.\n");
>   		}
> -		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
> +		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
>   		base += size;
>   	}
>   	for (; i < nb; i++)
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 15276068f657..0c9ef705803e 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -104,7 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
>   	p = memstart_addr + s;
>   	for (; s < top; s += PAGE_SIZE) {
>   		ktext = core_kernel_text(v);
> -		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
> +		map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
>   		v += PAGE_SIZE;
>   		p += PAGE_SIZE;
>   	}
Re: [PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Andrew Donnellan 3 weeks, 6 days ago
On Thu, 2025-09-04 at 18:33 +0200, Christophe Leroy wrote:
> PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
> kernel text has to be mapped read-only or read-write based on build
> time options.
> 
> But nowadays, with functionnalities like jump_labels, static links,
> etc ... more only less all kernels need to be read-write at some
> point, and some combinations of configs failed to work due to
> innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
> we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
> access to kernel modifications.
> 
> Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
> possible options that may imply kernel text modification, always
> set kernel text read-write at startup and rely on
> CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
> 
> Reported-by: Erhard Furtner <erhard_f@mailbox.org>
> Closes:
> https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

The original issue that Erhard and I were investigating was why the latest
version of the PowerPC page table check series[0] was failing on his G4, when
built as part of a config with many other debugging options enabled.

With further instrumentation, it turns out that this was due to a failed
instruction patch while setting up a jump label for the
page_table_check_disabled static key, which was being checked in
page_table_check_pte_clear(), which was in turn inlined ultimately into
debug_vm_pgtable().

This patch seems to fix the problem, so:

Tested-by: Andrew Donnellan <ajd@linux.ibm.com>

But I'm still curious about why I only see the issue when:

  (a) CONFIG_KFENCE=y (even when disabled using kfence.sample_interval=0) -
noting that changing CONFIG_KFENCE doesn't change the definition of
PAGE_KERNEL_TEXT; and

  (b) when the jump label ends up in a __init function (removing __init from
debug_vm_pgtable() and its associated functions, or changing the code in such a
way that the static key check doesn't get inlined, resolves the issue, and
similarly for test_static_call_init() when CONFIG_STATIC_CALL_SELFTEST=y).

I don't understand the mm code well enough to make sense of this.

[0] https://lore.kernel.org/all/20250813062614.51759-1-ajd@linux.ibm.com/

-- 
Andrew Donnellan    OzLabs, ADL Canberra
ajd@linux.ibm.com   IBM Australia Limited
Re: [PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Christophe Leroy 3 weeks, 6 days ago

Le 05/09/2025 à 08:57, Andrew Donnellan a écrit :
> On Thu, 2025-09-04 at 18:33 +0200, Christophe Leroy wrote:
>> PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
>> kernel text has to be mapped read-only or read-write based on build
>> time options.
>>
>> But nowadays, with functionnalities like jump_labels, static links,
>> etc ... more only less all kernels need to be read-write at some
>> point, and some combinations of configs failed to work due to
>> innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
>> we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
>> access to kernel modifications.
>>
>> Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
>> possible options that may imply kernel text modification, always
>> set kernel text read-write at startup and rely on
>> CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
>>
>> Reported-by: Erhard Furtner <erhard_f@mailbox.org>
>> Closes:
>> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F342b4120-911c-4723-82ec-d8c9b03a8aef%40mailbox.org%2F&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7Ce1df868f94284b06db0508ddec497516%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638926522413828188%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=cqhzYIshhwKJluL2U2ULuNYoQ1CR1ZP0nsl5pb3wHd4%3D&reserved=0
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> 
> The original issue that Erhard and I were investigating was why the latest
> version of the PowerPC page table check series[0] was failing on his G4, when
> built as part of a config with many other debugging options enabled.
> 
> With further instrumentation, it turns out that this was due to a failed
> instruction patch while setting up a jump label for the
> page_table_check_disabled static key, which was being checked in
> page_table_check_pte_clear(), which was in turn inlined ultimately into
> debug_vm_pgtable().
> 
> This patch seems to fix the problem, so:
> 
> Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
> 
> But I'm still curious about why I only see the issue when:
> 
>    (a) CONFIG_KFENCE=y (even when disabled using kfence.sample_interval=0) -
> noting that changing CONFIG_KFENCE doesn't change the definition of
> PAGE_KERNEL_TEXT; and
> 
>    (b) when the jump label ends up in a __init function (removing __init from
> debug_vm_pgtable() and its associated functions, or changing the code in such a
> way that the static key check doesn't get inlined, resolves the issue, and
> similarly for test_static_call_init() when CONFIG_STATIC_CALL_SELFTEST=y).
> 
> I don't understand the mm code well enough to make sense of this.

That makes sense. When CONFIG_KFENCE is selected, only text and rodata 
are mapped with BATs. Everything else including inittext is mapped with 
pages. When CONFIG_KFENCE and CONFIG_DEBUG_PAGEALLOC are not selected, 
we map as much as possible with BATs.

And as you can see below, BATs are mapped with PAGE_KERNEL_X not with 
PAGE_KERNEL_TEXT.

Everything happen here below:

static unsigned long __init __mmu_mapin_ram(unsigned long base, unsigned 
long top)
{
	int idx;

	while ((idx = find_free_bat()) != -1 && base != top) {
		unsigned int size = bat_block_size(base, top);

		if (size < 128 << 10)
			break;
		setbat(idx, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
		base += size;
	}

	return base;
}

unsigned long __init mmu_mapin_ram(unsigned long base, unsigned long top)
{
	unsigned long done;
	unsigned long border = (unsigned long)__srwx_boundary - PAGE_OFFSET;
	unsigned long size;

	size = roundup_pow_of_two((unsigned long)_einittext - PAGE_OFFSET);
	setibat(0, PAGE_OFFSET, 0, size, PAGE_KERNEL_X);

	if (debug_pagealloc_enabled_or_kfence()) {
		pr_debug_once("Read-Write memory mapped without BATs\n");
		if (base >= border)
			return base;
		if (top >= border)
			top = border;
	}

	if (!strict_kernel_rwx_enabled() || base >= border || top <= border)
		return __mmu_mapin_ram(base, top);

	done = __mmu_mapin_ram(base, border);
	if (done != border)
		return done;

	return __mmu_mapin_ram(border, top);
}


> 
> [0] https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kernel.org%2Fall%2F20250813062614.51759-1-ajd%40linux.ibm.com%2F&data=05%7C02%7Cchristophe.leroy%40csgroup.eu%7Ce1df868f94284b06db0508ddec497516%7C8b87af7d86474dc78df45f69a2011bb5%7C0%7C0%7C638926522413849910%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=1slIkZ4krf2sWUaKJ%2FayEX8t9dKpfsrDiAxZRohKfRQ%3D&reserved=0
> 

Re: [PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Ritesh Harjani (IBM) 4 weeks ago
Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
> kernel text has to be mapped read-only or read-write based on build
> time options.
>
> But nowadays, with functionnalities like jump_labels, static links,
> etc ... more only less all kernels need to be read-write at some
> point, and some combinations of configs failed to work due to
> innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
> we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
> access to kernel modifications.
>
> Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
> possible options that may imply kernel text modification, always
> set kernel text read-write at startup and rely on
> CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
>
> Reported-by: Erhard Furtner <erhard_f@mailbox.org>
> Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/include/asm/pgtable.h | 12 ------------
>  arch/powerpc/mm/book3s32/mmu.c     |  4 ++--
>  arch/powerpc/mm/pgtable_32.c       |  2 +-
>  3 files changed, 3 insertions(+), 15 deletions(-)
>

AFAIU - mmu_mark_initmem_nx gets called during kernel_init() which is
way after static call initialization correct? i.e.

start_kernel
  ...
  jump_label_init()
  static_call_init()
  ...
  ...
  rest_init()      /* Do the rest non-__init'ed, we're now alive */
    kernel_init()
      free_initmem() -> mark_initmem_nx() -> __mark_initmem_nx -> mmu_mark_initmem_nx() 
      mark_readonly()
        if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled) {
           jump_label_init_ro()
           mark_rodata_ro() -> ....
           ...
        ...

Then I guess we mainly only need __mapin_ram_chunk() to be PAGE_KERNEL_X (RWX)
instead of PAGE_KERNEL_TEXT (ROX), isn't it?

Let me quickly validate it... 
...Ok, so I was able to get just this diff to be working. 

Thoughts?

diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
index 15276068f657..0c9ef705803e 100644
--- a/arch/powerpc/mm/pgtable_32.c
+++ b/arch/powerpc/mm/pgtable_32.c
@@ -104,7 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
        p = memstart_addr + s;
        for (; s < top; s += PAGE_SIZE) {
                ktext = core_kernel_text(v);
-               map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
+               map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
                v += PAGE_SIZE;
                p += PAGE_SIZE;
        }

-ritesh



> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 93d77ad5a92f..d8f944a5a037 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -20,18 +20,6 @@ struct mm_struct;
>  #include <asm/nohash/pgtable.h>
>  #endif /* !CONFIG_PPC_BOOK3S */
>  
> -/*
> - * Protection used for kernel text. We want the debuggers to be able to
> - * set breakpoints anywhere, so don't write protect the kernel text
> - * on platforms where such control is possible.
> - */
> -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \
> -	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
> -#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
> -#else
> -#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
> -#endif
> -
>  /* Make modules code happy. We don't set RO yet */
>  #define PAGE_KERNEL_EXEC	PAGE_KERNEL_X
>  
> diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
> index be9c4106e22f..c42ecdf94e48 100644
> --- a/arch/powerpc/mm/book3s32/mmu.c
> +++ b/arch/powerpc/mm/book3s32/mmu.c
> @@ -204,7 +204,7 @@ int mmu_mark_initmem_nx(void)
>  
>  	for (i = 0; i < nb - 1 && base < top;) {
>  		size = bat_block_size(base, top);
> -		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
> +		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
>  		base += size;
>  	}
>  	if (base < top) {
> @@ -215,7 +215,7 @@ int mmu_mark_initmem_nx(void)
>  				pr_warn("Some RW data is getting mapped X. "
>  					"Adjust CONFIG_DATA_SHIFT to avoid that.\n");
>  		}
> -		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
> +		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
>  		base += size;
>  	}
>  	for (; i < nb; i++)
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 15276068f657..0c9ef705803e 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -104,7 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
>  	p = memstart_addr + s;
>  	for (; s < top; s += PAGE_SIZE) {
>  		ktext = core_kernel_text(v);
> -		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
> +		map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
>  		v += PAGE_SIZE;
>  		p += PAGE_SIZE;
>  	}
> -- 
> 2.49.0
Re: [PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Christophe Leroy 3 weeks, 6 days ago

Le 05/09/2025 à 05:55, Ritesh Harjani a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> 
>> PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
>> kernel text has to be mapped read-only or read-write based on build
>> time options.
>>
>> But nowadays, with functionnalities like jump_labels, static links,
>> etc ... more only less all kernels need to be read-write at some
>> point, and some combinations of configs failed to work due to
>> innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
>> we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
>> access to kernel modifications.
>>
>> Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
>> possible options that may imply kernel text modification, always
>> set kernel text read-write at startup and rely on
>> CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
>>
>> Reported-by: Erhard Furtner <erhard_f@mailbox.org>
>> Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>>   arch/powerpc/include/asm/pgtable.h | 12 ------------
>>   arch/powerpc/mm/book3s32/mmu.c     |  4 ++--
>>   arch/powerpc/mm/pgtable_32.c       |  2 +-
>>   3 files changed, 3 insertions(+), 15 deletions(-)
>>
> 
> AFAIU - mmu_mark_initmem_nx gets called during kernel_init() which is
> way after static call initialization correct? i.e.
> 
> start_kernel
>    ...
>    jump_label_init()
>    static_call_init()
>    ...
>    ...
>    rest_init()      /* Do the rest non-__init'ed, we're now alive */
>      kernel_init()
>        free_initmem() -> mark_initmem_nx() -> __mark_initmem_nx -> mmu_mark_initmem_nx()
>        mark_readonly()
>          if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled) {
>             jump_label_init_ro()
>             mark_rodata_ro() -> ....
>             ...
>          ...
> 
> Then I guess we mainly only need __mapin_ram_chunk() to be PAGE_KERNEL_X (RWX)
> instead of PAGE_KERNEL_TEXT (ROX), isn't it?
> 
> Let me quickly validate it...
> ...Ok, so I was able to get just this diff to be working.
> 
> Thoughts?

setibat() doesn't take into account whether it is RO or RW. Only X or NX 
is taken into account, so it doesn't matter whether it is X or ROX.

Then allthough you are right in principle, once the PAGE_KERNEL_TEXT is 
removed from __mapin_ram_chunk() it becomes completely useless, so 
better get rid of PAGE_KERNEL_TEXT completely.

> 
> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
> index 15276068f657..0c9ef705803e 100644
> --- a/arch/powerpc/mm/pgtable_32.c
> +++ b/arch/powerpc/mm/pgtable_32.c
> @@ -104,7 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
>          p = memstart_addr + s;
>          for (; s < top; s += PAGE_SIZE) {
>                  ktext = core_kernel_text(v);
> -               map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
> +               map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
>                  v += PAGE_SIZE;
>                  p += PAGE_SIZE;
>          }
> 
> -ritesh
> 
> 
> 
>> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
>> index 93d77ad5a92f..d8f944a5a037 100644
>> --- a/arch/powerpc/include/asm/pgtable.h
>> +++ b/arch/powerpc/include/asm/pgtable.h
>> @@ -20,18 +20,6 @@ struct mm_struct;
>>   #include <asm/nohash/pgtable.h>
>>   #endif /* !CONFIG_PPC_BOOK3S */
>>   
>> -/*
>> - * Protection used for kernel text. We want the debuggers to be able to
>> - * set breakpoints anywhere, so don't write protect the kernel text
>> - * on platforms where such control is possible.
>> - */
>> -#if defined(CONFIG_KGDB) || defined(CONFIG_XMON) || defined(CONFIG_BDI_SWITCH) || \
>> -	defined(CONFIG_KPROBES) || defined(CONFIG_DYNAMIC_FTRACE)
>> -#define PAGE_KERNEL_TEXT	PAGE_KERNEL_X
>> -#else
>> -#define PAGE_KERNEL_TEXT	PAGE_KERNEL_ROX
>> -#endif
>> -
>>   /* Make modules code happy. We don't set RO yet */
>>   #define PAGE_KERNEL_EXEC	PAGE_KERNEL_X
>>   
>> diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
>> index be9c4106e22f..c42ecdf94e48 100644
>> --- a/arch/powerpc/mm/book3s32/mmu.c
>> +++ b/arch/powerpc/mm/book3s32/mmu.c
>> @@ -204,7 +204,7 @@ int mmu_mark_initmem_nx(void)
>>   
>>   	for (i = 0; i < nb - 1 && base < top;) {
>>   		size = bat_block_size(base, top);
>> -		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
>> +		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
>>   		base += size;
>>   	}
>>   	if (base < top) {
>> @@ -215,7 +215,7 @@ int mmu_mark_initmem_nx(void)
>>   				pr_warn("Some RW data is getting mapped X. "
>>   					"Adjust CONFIG_DATA_SHIFT to avoid that.\n");
>>   		}
>> -		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_TEXT);
>> +		setibat(i++, PAGE_OFFSET + base, base, size, PAGE_KERNEL_X);
>>   		base += size;
>>   	}
>>   	for (; i < nb; i++)
>> diff --git a/arch/powerpc/mm/pgtable_32.c b/arch/powerpc/mm/pgtable_32.c
>> index 15276068f657..0c9ef705803e 100644
>> --- a/arch/powerpc/mm/pgtable_32.c
>> +++ b/arch/powerpc/mm/pgtable_32.c
>> @@ -104,7 +104,7 @@ static void __init __mapin_ram_chunk(unsigned long offset, unsigned long top)
>>   	p = memstart_addr + s;
>>   	for (; s < top; s += PAGE_SIZE) {
>>   		ktext = core_kernel_text(v);
>> -		map_kernel_page(v, p, ktext ? PAGE_KERNEL_TEXT : PAGE_KERNEL);
>> +		map_kernel_page(v, p, ktext ? PAGE_KERNEL_X : PAGE_KERNEL);
>>   		v += PAGE_SIZE;
>>   		p += PAGE_SIZE;
>>   	}
>> -- 
>> 2.49.0

Re: [PATCH] powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
Posted by Ritesh Harjani (IBM) 3 weeks, 6 days ago
Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> Le 05/09/2025 à 05:55, Ritesh Harjani a écrit :
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> 
>>> PAGE_KERNEL_TEXT is an old macro that is used to tell kernel whether
>>> kernel text has to be mapped read-only or read-write based on build
>>> time options.
>>>
>>> But nowadays, with functionnalities like jump_labels, static links,
>>> etc ... more only less all kernels need to be read-write at some
>>> point, and some combinations of configs failed to work due to
>>> innacurate setting of PAGE_KERNEL_TEXT. On the other hand, today
>>> we have CONFIG_STRICT_KERNEL_RWX which implements a more controlled
>>> access to kernel modifications.
>>>
>>> Instead of trying to keep PAGE_KERNEL_TEXT accurate with all
>>> possible options that may imply kernel text modification, always
>>> set kernel text read-write at startup and rely on
>>> CONFIG_STRICT_KERNEL_RWX to provide accurate protection.
>>>
>>> Reported-by: Erhard Furtner <erhard_f@mailbox.org>
>>> Closes: https://lore.kernel.org/all/342b4120-911c-4723-82ec-d8c9b03a8aef@mailbox.org/
>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>> ---
>>>   arch/powerpc/include/asm/pgtable.h | 12 ------------
>>>   arch/powerpc/mm/book3s32/mmu.c     |  4 ++--
>>>   arch/powerpc/mm/pgtable_32.c       |  2 +-
>>>   3 files changed, 3 insertions(+), 15 deletions(-)
>>>
>> 
>> AFAIU - mmu_mark_initmem_nx gets called during kernel_init() which is
>> way after static call initialization correct? i.e.
>> 
>> start_kernel
>>    ...
>>    jump_label_init()
>>    static_call_init()
>>    ...
>>    ...
>>    rest_init()      /* Do the rest non-__init'ed, we're now alive */
>>      kernel_init()
>>        free_initmem() -> mark_initmem_nx() -> __mark_initmem_nx -> mmu_mark_initmem_nx()
>>        mark_readonly()
>>          if (IS_ENABLED(CONFIG_STRICT_KERNEL_RWX) && rodata_enabled) {
>>             jump_label_init_ro()
>>             mark_rodata_ro() -> ....
>>             ...
>>          ...
>> 
>> Then I guess we mainly only need __mapin_ram_chunk() to be PAGE_KERNEL_X (RWX)
>> instead of PAGE_KERNEL_TEXT (ROX), isn't it?
>> 
>> Let me quickly validate it...
>> ...Ok, so I was able to get just this diff to be working.
>> 
>> Thoughts?
>
> setibat() doesn't take into account whether it is RO or RW. Only X or NX 
> is taken into account, so it doesn't matter whether it is X or ROX.
>
> Then allthough you are right in principle, once the PAGE_KERNEL_TEXT is 
> removed from __mapin_ram_chunk() it becomes completely useless, so 
> better get rid of PAGE_KERNEL_TEXT completely.
>

Aah yes, I checked the function setibat() and as you mentioned, it
doesn't honour RW permission anyways. Can we please update the same in
the commit message too? That makes it more clear then. 

-ritesh