[PATCH 11/49] x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction

Ingo Molnar posted 49 patches 8 months, 3 weeks ago
There is a newer version of this series
[PATCH 11/49] x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction
Posted by Ingo Molnar 8 months, 3 weeks ago
So the temp_mm_state_t abstraction used by use_temporary_mm() and
unuse_temporary_mm() is super confusing:

 - The whole machinery is about temporarily switching to the
   text_poke_mm utility MM that got allocated during bootup
   for text-patching purposes alone:

	temp_mm_state_t prev;

        /*
         * Loading the temporary mm behaves as a compiler barrier, which
         * guarantees that the PTE will be set at the time memcpy() is done.
         */
        prev = use_temporary_mm(text_poke_mm);

 - Yet the value that gets saved in the temp_mm_state_t variable
   is not the temporary MM ... but the previous MM...

 - Ie. we temporarily put the non-temporary MM into a variable
   that has the temp_mm_state_t type. This makes no sense whatsoever.

 - The confusion continues in unuse_temporary_mm():

	static inline void unuse_temporary_mm(temp_mm_state_t prev_state)

   Here we unuse an MM that is ... not the temporary MM, but the
   previous MM. :-/

Fix up all this confusion by removing the unnecessary layer of
abstraction and using a bog-standard 'struct mm_struct *prev_mm'
variable to save the MM to.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/alternative.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 056872aebe6e..edff7714556e 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2139,10 +2139,6 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 	}
 }
 
-typedef struct {
-	struct mm_struct *mm;
-} temp_mm_state_t;
-
 /*
  * Using a temporary mm allows to set temporary mappings that are not accessible
  * by other CPUs. Such mappings are needed to perform sensitive memory writes
@@ -2156,9 +2152,9 @@ typedef struct {
  *          loaded, thereby preventing interrupt handler bugs from overriding
  *          the kernel memory protection.
  */
-static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
+static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 {
-	temp_mm_state_t temp_state;
+	struct mm_struct *prev_mm;
 
 	lockdep_assert_irqs_disabled();
 
@@ -2170,8 +2166,8 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
 		leave_mm();
 
-	temp_state.mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	switch_mm_irqs_off(NULL, mm, current);
+	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, temp_mm, current);
 
 	/*
 	 * If breakpoints are enabled, disable them while the temporary mm is
@@ -2187,17 +2183,17 @@ static inline temp_mm_state_t use_temporary_mm(struct mm_struct *mm)
 	if (hw_breakpoint_active())
 		hw_breakpoint_disable();
 
-	return temp_state;
+	return prev_mm;
 }
 
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
+static inline void unuse_temporary_mm(struct mm_struct *prev_mm)
 {
 	lockdep_assert_irqs_disabled();
 
-	switch_mm_irqs_off(NULL, prev_state.mm, current);
+	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
 	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(text_poke_mm));
@@ -2228,7 +2224,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 {
 	bool cross_page_boundary = offset_in_page(addr) + len > PAGE_SIZE;
 	struct page *pages[2] = {NULL};
-	temp_mm_state_t prev;
+	struct mm_struct *prev_mm;
 	unsigned long flags;
 	pte_t pte, *ptep;
 	spinlock_t *ptl;
@@ -2286,7 +2282,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * Loading the temporary mm behaves as a compiler barrier, which
 	 * guarantees that the PTE will be set at the time memcpy() is done.
 	 */
-	prev = use_temporary_mm(text_poke_mm);
+	prev_mm = use_temporary_mm(text_poke_mm);
 
 	kasan_disable_current();
 	func((u8 *)text_poke_mm_addr + offset_in_page(addr), src, len);
@@ -2307,7 +2303,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(prev);
+	unuse_temporary_mm(prev_mm);
 
 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled
-- 
2.45.2
Re: [PATCH 11/49] x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction
Posted by Peter Zijlstra 8 months, 2 weeks ago
On Fri, Mar 28, 2025 at 02:26:26PM +0100, Ingo Molnar wrote:
> So the temp_mm_state_t abstraction used by use_temporary_mm() and
> unuse_temporary_mm() is super confusing:

I thing I see what you're saying, but we also still have these patches
pending:

  https://lkml.kernel.org/r/20241119162527.952745944@infradead.org

:-(
Re: [PATCH 11/49] x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction
Posted by Ingo Molnar 8 months, 2 weeks ago
* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Mar 28, 2025 at 02:26:26PM +0100, Ingo Molnar wrote:
> > So the temp_mm_state_t abstraction used by use_temporary_mm() and
> > unuse_temporary_mm() is super confusing:
> 
> I thing I see what you're saying, but we also still have these patches
> pending:
> 
>   https://lkml.kernel.org/r/20241119162527.952745944@infradead.org
> 
> :-(

Yeah, so I think we should do your queue on top of mine, the
whole temp_mm_state_t abstraction was rather nonsensical,
and we shouldn't be iterating confusing code...

I've ported patches #1 and #3 from your queue on top, see
attached below - these should be the two that represent 99%
of the conflicts between these two efforts AFAICS.

Does that work for you?

Thanks,

	Ingo

================>
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 19 Nov 2024 17:25:28 +0100
Subject: [PATCH] x86/mm: Add mm argument to unuse_temporary_mm()

In commit 209954cbc7d0 ("x86/mm/tlb: Update mm_cpumask lazily")
unuse_temporary_mm() grew the assumption that it gets used on
poking_nn exclusively. While this is currently true, lets not hard
code this assumption.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20241119163035.322525475@infradead.org
---
 arch/x86/kernel/alternative.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5b1a6252a4b9..cfffcb80f564 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2161,14 +2161,14 @@ static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(struct mm_struct *prev_mm)
+static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
 {
 	lockdep_assert_irqs_disabled();
 
 	switch_mm_irqs_off(NULL, prev_mm, current);
 
 	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(text_poke_mm));
+	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
 
 	/*
 	 * Restore the breakpoints if they were disabled before the temporary mm
@@ -2275,7 +2275,7 @@ static void *__text_poke(text_poke_f func, void *addr, const void *src, size_t l
 	 * instruction that already allows the core to see the updated version.
 	 * Xen-PV is assumed to serialize execution in a similar manner.
 	 */
-	unuse_temporary_mm(prev_mm);
+	unuse_temporary_mm(text_poke_mm, prev_mm);
 
 	/*
 	 * Flushing the TLB might involve IPIs, which would require enabled


======================================>
From: Andy Lutomirski <luto@kernel.org>
Date: Tue, 19 Nov 2024 17:25:30 +0100
Subject: [PATCH] x86/mm: Make use/unuse_temporary_mm() non-static

This prepares them for use outside of the alternative machinery.
The code is unchanged.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/d1205bc7e165e249c52b7fe8cb1254f06e8a0e2a.1641659630.git.luto@kernel.org
Link: https://lore.kernel.org/r/20241119163035.533822339@infradead.org
---
 arch/x86/include/asm/mmu_context.h |  3 ++
 arch/x86/kernel/alternative.c      | 64 --------------------------------------
 arch/x86/mm/tlb.c                  | 64 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 67 insertions(+), 64 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 2398058b6e83..b103e1709a67 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -272,4 +272,7 @@ unsigned long __get_current_cr3_fast(void);
 
 #include <asm-generic/mmu_context.h>
 
+extern struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm);
+extern void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm);
+
 #endif /* _ASM_X86_MMU_CONTEXT_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index cfffcb80f564..25abadaf8751 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2111,73 +2111,9 @@ void __init_or_module text_poke_early(void *addr, const void *opcode,
 	}
 }
 
-/*
- * Using a temporary mm allows to set temporary mappings that are not accessible
- * by other CPUs. Such mappings are needed to perform sensitive memory writes
- * that override the kernel memory protections (e.g., W^X), without exposing the
- * temporary page-table mappings that are required for these write operations to
- * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
- * mapping is torn down.
- *
- * Context: The temporary mm needs to be used exclusively by a single core. To
- *          harden security IRQs must be disabled while the temporary mm is
- *          loaded, thereby preventing interrupt handler bugs from overriding
- *          the kernel memory protection.
- */
-static inline struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
-{
-	struct mm_struct *prev_mm;
-
-	lockdep_assert_irqs_disabled();
-
-	/*
-	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
-	 * with a stale address space WITHOUT being in lazy mode after
-	 * restoring the previous mm.
-	 */
-	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
-		leave_mm();
-
-	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
-	switch_mm_irqs_off(NULL, temp_mm, current);
-
-	/*
-	 * If breakpoints are enabled, disable them while the temporary mm is
-	 * used. Userspace might set up watchpoints on addresses that are used
-	 * in the temporary mm, which would lead to wrong signals being sent or
-	 * crashes.
-	 *
-	 * Note that breakpoints are not disabled selectively, which also causes
-	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
-	 * undesirable, but still seems reasonable as the code that runs in the
-	 * temporary mm should be short.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_disable();
-
-	return prev_mm;
-}
-
 __ro_after_init struct mm_struct *text_poke_mm;
 __ro_after_init unsigned long text_poke_mm_addr;
 
-static inline void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
-{
-	lockdep_assert_irqs_disabled();
-
-	switch_mm_irqs_off(NULL, prev_mm, current);
-
-	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
-	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
-
-	/*
-	 * Restore the breakpoints if they were disabled before the temporary mm
-	 * was loaded.
-	 */
-	if (hw_breakpoint_active())
-		hw_breakpoint_restore();
-}
-
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
 	memcpy(dst, src, len);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0925768d00cb..06a1ad39be74 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -972,6 +972,70 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 	this_cpu_write(cpu_tlbstate_shared.is_lazy, true);
 }
 
+/*
+ * Using a temporary mm allows to set temporary mappings that are not accessible
+ * by other CPUs. Such mappings are needed to perform sensitive memory writes
+ * that override the kernel memory protections (e.g., W^X), without exposing the
+ * temporary page-table mappings that are required for these write operations to
+ * other CPUs. Using a temporary mm also allows to avoid TLB shootdowns when the
+ * mapping is torn down.
+ *
+ * Context: The temporary mm needs to be used exclusively by a single core. To
+ *          harden security IRQs must be disabled while the temporary mm is
+ *          loaded, thereby preventing interrupt handler bugs from overriding
+ *          the kernel memory protection.
+ */
+struct mm_struct *use_temporary_mm(struct mm_struct *temp_mm)
+{
+	struct mm_struct *prev_mm;
+
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Make sure not to be in TLB lazy mode, as otherwise we'll end up
+	 * with a stale address space WITHOUT being in lazy mode after
+	 * restoring the previous mm.
+	 */
+	if (this_cpu_read(cpu_tlbstate_shared.is_lazy))
+		leave_mm();
+
+	prev_mm = this_cpu_read(cpu_tlbstate.loaded_mm);
+	switch_mm_irqs_off(NULL, temp_mm, current);
+
+	/*
+	 * If breakpoints are enabled, disable them while the temporary mm is
+	 * used. Userspace might set up watchpoints on addresses that are used
+	 * in the temporary mm, which would lead to wrong signals being sent or
+	 * crashes.
+	 *
+	 * Note that breakpoints are not disabled selectively, which also causes
+	 * kernel breakpoints (e.g., perf's) to be disabled. This might be
+	 * undesirable, but still seems reasonable as the code that runs in the
+	 * temporary mm should be short.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_disable();
+
+	return prev_mm;
+}
+
+void unuse_temporary_mm(struct mm_struct *mm, struct mm_struct *prev_mm)
+{
+	lockdep_assert_irqs_disabled();
+
+	switch_mm_irqs_off(NULL, prev_mm, current);
+
+	/* Clear the cpumask, to indicate no TLB flushing is needed anywhere */
+	cpumask_clear_cpu(raw_smp_processor_id(), mm_cpumask(mm));
+
+	/*
+	 * Restore the breakpoints if they were disabled before the temporary mm
+	 * was loaded.
+	 */
+	if (hw_breakpoint_active())
+		hw_breakpoint_restore();
+}
+
 /*
  * Call this when reinitializing a CPU.  It fixes the following potential
  * problems:
Re: [PATCH 11/49] x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction
Posted by Ingo Molnar 8 months, 2 weeks ago
* Ingo Molnar <mingo@kernel.org> wrote:

> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Mar 28, 2025 at 02:26:26PM +0100, Ingo Molnar wrote:
> > > So the temp_mm_state_t abstraction used by use_temporary_mm() and
> > > unuse_temporary_mm() is super confusing:
> > 
> > I thing I see what you're saying, but we also still have these patches
> > pending:
> > 
> >   https://lkml.kernel.org/r/20241119162527.952745944@infradead.org
> > 
> > :-(
> 
> Yeah, so I think we should do your queue on top of mine, the
> whole temp_mm_state_t abstraction was rather nonsensical,
> and we shouldn't be iterating confusing code...
> 
> I've ported patches #1 and #3 from your queue on top, see
> attached below - these should be the two that represent 99%
> of the conflicts between these two efforts AFAICS.
> 
> Does that work for you?

To make this an easier decision, I've ported Andy's and your patches on 
top of the x86/alternatives series, into WIP.x86/mm, resolving the 
conflicts, with a few touchups:

  git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/mm

Seems to work fine, after some light testing.

I'll send it out for another round of review if that's fine to you.

Thanks,

	Ingo

==========>

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/mm

Andy Lutomirski (5):
      x86/events, x86/insn-eval: Remove incorrect current->active_mm references
      x86/mm: Make use/unuse_temporary_mm() non-static
      x86/mm: Allow temporary mms when IRQs are on
      x86/efi: Make efi_enter/leave_mm use the *_temporary_mm() machinery
      x86/mm: Opt in to IRQs-off activate_mm()

Peter Zijlstra (2):
      x86/mm: Add mm argument to unuse_temporary_mm()
      x86/mm: Remove 'mm' argument from unuse_temporary_mm() again

 arch/x86/Kconfig                   |  1 +
 arch/x86/events/core.c             |  9 ++++-
 arch/x86/include/asm/mmu_context.h |  5 ++-
 arch/x86/kernel/alternative.c      | 64 -----------------------------------
 arch/x86/lib/insn-eval.c           | 13 +++++--
 arch/x86/mm/tlb.c                  | 69 ++++++++++++++++++++++++++++++++++++++
 arch/x86/platform/efi/efi_64.c     |  7 ++--
 7 files changed, 94 insertions(+), 74 deletions(-)
Re: [PATCH 11/49] x86/alternatives: Remove the confusing, inaccurate & unnecessary 'temp_mm_state_t' abstraction
Posted by Peter Zijlstra 8 months, 2 weeks ago
On Wed, Apr 02, 2025 at 06:07:19AM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > On Fri, Mar 28, 2025 at 02:26:26PM +0100, Ingo Molnar wrote:
> > > > So the temp_mm_state_t abstraction used by use_temporary_mm() and
> > > > unuse_temporary_mm() is super confusing:
> > > 
> > > I thing I see what you're saying, but we also still have these patches
> > > pending:
> > > 
> > >   https://lkml.kernel.org/r/20241119162527.952745944@infradead.org
> > > 
> > > :-(
> > 
> > Yeah, so I think we should do your queue on top of mine, the
> > whole temp_mm_state_t abstraction was rather nonsensical,
> > and we shouldn't be iterating confusing code...
> > 
> > I've ported patches #1 and #3 from your queue on top, see
> > attached below - these should be the two that represent 99%
> > of the conflicts between these two efforts AFAICS.
> > 
> > Does that work for you?
> 
> To make this an easier decision, I've ported Andy's and your patches on 
> top of the x86/alternatives series, into WIP.x86/mm, resolving the 
> conflicts, with a few touchups:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/mm
> 
> Seems to work fine, after some light testing.
> 
> I'll send it out for another round of review if that's fine to you.

Yep, that looks right, Thanks!