[PATCH v2] x86/alternative: delay freeing of smp_locks section

Posted by Mike Rapoport 1 day, 13 hours ago
From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>

On SMP systems alternative_instructions() frees memory occupied by
smp_locks section immediately after patching the lock instructions.

The memory is freed using free_init_pages() that calls free_reserved_area()
that essentially does __free_page() for every page in the range.

Up until recently it didn't update memblock state so in cases when
CONFIG_ARCH_KEEP_MEMBLOCK is enabled (on x86 it is selected by
INTEL_TDX_HOST), the state of memblock and the memory map would be
inconsistent.

Additionally, with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled freeing of
smp_locks happens before the memory map is fully initialized, and freeing
reserved memory may cause an access to a not-yet-initialized struct page
when __free_page() searches for a buddy page.

Following the discussion in [1], the implementations of memblock_free_late()
and free_reserved_area() were unified to ensure that reserved memory freed
after memblock transfers its pages to the buddy allocator is actually freed,
and that memblock and the memory map stay consistent. As part of these
changes, free_reserved_area() now WARN()s when it is called before the
initialization of the memory map is complete.

The memory map is fully initialized in page_alloc_init_late() that
completes before initcalls are executed, so it is safe to free reserved
memory in any initcall except early_initcall().

Move freeing of the smp_locks section to an initcall to ensure it happens
after the memory map is fully initialized. Since it does not matter exactly
which initcall is used and the code lives in arch/, pick arch_initcall.

[1] https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org

Reported-by: Bert Karwatzki <spasswolf@web.de>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202603302154.b50adaf1-lkp@intel.com
Tested-by: Bert Karwatzki <spasswolf@web.de>
Link: https://lore.kernel.org/r/20260327140109.7561-1-spasswolf@web.de
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
---
 arch/x86/kernel/alternative.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index e87da25d1236..62936a3bde19 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2448,19 +2448,31 @@ void __init alternative_instructions(void)
 					    __smp_locks, __smp_locks_end,
 					    _text, _etext);
 	}
+#endif
 
+	restart_nmi();
+	alternatives_patched = 1;
+
+	alt_reloc_selftest();
+}
+
+#ifdef CONFIG_SMP
+/*
+ * With CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled we can free_init_pages() only
+ * after the deferred initialization of the memory map is complete.
+ */
+static int __init free_smp_locks(void)
+{
 	if (!uniproc_patched || num_possible_cpus() == 1) {
 		free_init_pages("SMP alternatives",
 				(unsigned long)__smp_locks,
 				(unsigned long)__smp_locks_end);
 	}
-#endif
 
-	restart_nmi();
-	alternatives_patched = 1;
-
-	alt_reloc_selftest();
+	return 0;
 }
+arch_initcall(free_smp_locks);
+#endif
 
 /**
  * text_poke_early - Update instructions on a live kernel at boot time

base-commit: e77a5a5cfe43b4c25bd44a3818e487033287517f
-- 
2.53.0
Re: [PATCH v2] x86/alternative: delay freeing of smp_locks section
Posted by Peter Zijlstra 1 day, 12 hours ago
On Mon, Mar 30, 2026 at 10:10:00PM +0300, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> 
> On SMP systems alternative_instructions() frees memory occupied by
> smp_locks section immediately after patching the lock instructions.
> 
> The memory is freed using free_init_pages() that calls free_reserved_area()
> that essentially does __free_page() for every page in the range.
> 
> Up until recently it didn't update memblock state so in cases when
> CONFIG_ARCH_KEEP_MEMBLOCK is enabled (on x86 it is selected by
> INTEL_TDX_HOST), the state of memblock and the memory map would be
> inconsistent.
> 
> Additionally, with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled freeing of
> smp_locks happens before the memory map is fully initialized, and freeing
> reserved memory may cause an access to a not-yet-initialized struct page
> when __free_page() searches for a buddy page.
> 
> Following the discussion in [1], the implementations of memblock_free_late()
> and free_reserved_area() were unified to ensure that reserved memory freed
> after memblock transfers its pages to the buddy allocator is actually freed,
> and that memblock and the memory map stay consistent. As part of these
> changes, free_reserved_area() now WARN()s when it is called before the
> initialization of the memory map is complete.
> 
> The memory map is fully initialized in page_alloc_init_late() that
> completes before initcalls are executed, so it is safe to free reserved
> memory in any initcall except early_initcall().
> 
> Move freeing of the smp_locks section to an initcall to ensure it happens
> after the memory map is fully initialized. Since it does not matter exactly
> which initcall is used and the code lives in arch/, pick arch_initcall.

Silly question, why not put the .smp_locks in
__init_begin[],__init_end[] right next to .altinstr such that it gets
freed by free_initmem() ?
Re: [PATCH v2] x86/alternative: delay freeing of smp_locks section
Posted by Mike Rapoport 21 hours ago
On Mon, Mar 30, 2026 at 09:27:37PM +0200, Peter Zijlstra wrote:
> On Mon, Mar 30, 2026 at 10:10:00PM +0300, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > 
> > On SMP systems alternative_instructions() frees memory occupied by
> > smp_locks section immediately after patching the lock instructions.
> > 
> > The memory is freed using free_init_pages() that calls free_reserved_area()
> > that essentially does __free_page() for every page in the range.
> > 
> > Up until recently it didn't update memblock state so in cases when
> > CONFIG_ARCH_KEEP_MEMBLOCK is enabled (on x86 it is selected by
> > INTEL_TDX_HOST), the state of memblock and the memory map would be
> > inconsistent.
> > 
> > Additionally, with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled freeing of
> > smp_locks happens before the memory map is fully initialized, and freeing
> > reserved memory may cause an access to a not-yet-initialized struct page
> > when __free_page() searches for a buddy page.
> > 
> > Following the discussion in [1], the implementations of memblock_free_late()
> > and free_reserved_area() were unified to ensure that reserved memory freed
> > after memblock transfers its pages to the buddy allocator is actually freed,
> > and that memblock and the memory map stay consistent. As part of these
> > changes, free_reserved_area() now WARN()s when it is called before the
> > initialization of the memory map is complete.
> > 
> > The memory map is fully initialized in page_alloc_init_late() that
> > completes before initcalls are executed, so it is safe to free reserved
> > memory in any initcall except early_initcall().
> > 
> > Move freeing of the smp_locks section to an initcall to ensure it happens
> > after the memory map is fully initialized. Since it does not matter exactly
> > which initcall is used and the code lives in arch/, pick arch_initcall.
> 
> Silly question, why not put the .smp_locks in
> __init_begin[],__init_end[] right next to .altinstr such that it gets
> freed by free_initmem() ?

Because it's not always freed? :)

-- 
Sincerely yours,
Mike.