[PATCH] arm64/signal: Avoid corruption of SME state when entering signal handler

Mark Brown posted 1 patch 1 month ago
There is a newer version of this series
arch/arm64/include/asm/fpsimd.h |  1 +
arch/arm64/kernel/fpsimd.c      | 30 ++++++++++++++++++++++++++++++
arch/arm64/kernel/signal.c      | 19 +------------------
3 files changed, 32 insertions(+), 18 deletions(-)
[PATCH] arm64/signal: Avoid corruption of SME state when entering signal handler
Posted by Mark Brown 1 month ago
When we enter a signal handler we exit streaming mode in order to ensure
that signal handlers can run normal FPSIMD code, and while we're at it we
also clear PSTATE.ZA. Currently the code in setup_return() updates both the
in memory copy of the state and the register state. Not only is this
redundant it can also lead to corruption if we are preempted.

Consider two tasks on one CPU:

 A: Begins signal entry in kernel mode, is preempted prior to SMSTOP.
 B: Using SM and/or ZA in userspace with register state current on the
    CPU, is preempted.
 A: Scheduled in, no register state changes made as in kernel mode.
 A: Executes SMSTOP, modifying live register state.
 A: Scheduled out.
 B: Scheduled in, fpsimd_thread_switch() sees the register state on the
    CPU is tracked as being that for task B so the state is not reloaded
    prior to returning to userspace.

Task B is now running with SM and ZA incorrectly cleared.

Fix this by check TIF_FOREIGN_FPSTATE and only updating one of the live
register context or the in memory copy when entering a signal handler.
Since this needs to happen atomically and all code that atomically
accesses FP state is in fpsimd.c also move the code there to ensure
consistency.

This race has been observed intermittently with fp-stress, especially
with preempt disabled.

Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals")
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
---
 arch/arm64/include/asm/fpsimd.h |  1 +
 arch/arm64/kernel/fpsimd.c      | 30 ++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c      | 19 +------------------
 3 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state);
 extern void fpsimd_thread_switch(struct task_struct *next);
 extern void fpsimd_flush_thread(void);
 
+extern void fpsimd_enter_sighandler(void);
 extern void fpsimd_signal_preserve_current_state(void);
 extern void fpsimd_preserve_current_state(void);
 extern void fpsimd_restore_current_state(void);
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 77006df20a75aee7c991cf116b6d06bfe953d1a4..e6b086dc09f21e7f30df32ab4f6875b53c4228fd 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1693,6 +1693,36 @@ void fpsimd_signal_preserve_current_state(void)
 		sve_to_fpsimd(current);
 }
 
+/*
+ * Called by the signal handling code when preparing current to enter
+ * a signal handler. Currently this only needs to take care of exiting
+ * streaming mode and clearing ZA on SME systems.
+ */
+void fpsimd_enter_sighandler(void)
+{
+	if (!system_supports_sme())
+		return;
+
+	get_cpu_fpsimd_context();
+
+	if (test_thread_flag(TIF_FOREIGN_FPSTATE)) {
+		/* Exiting streaming mode zeros the FPSIMD state */
+		if (current->thread.svcr & SVCR_SM_MASK) {
+			memset(&current->thread.uw.fpsimd_state, 0,
+			       sizeof(current->thread.uw.fpsimd_state));
+			current->thread.fp_type = FP_STATE_FPSIMD;
+		}
+
+		current->thread.svcr &= ~(SVCR_ZA_MASK |
+					  SVCR_SM_MASK);
+	} else {
+		/* The register state is current, just update it. */
+		sme_smstop();
+	}
+
+	put_cpu_fpsimd_context();
+}
+
 /*
  * Called by KVM when entering the guest.
  */
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index 5619869475304776fc005fe24a385bf86bfdd253..fe07d0bd9f7978d73973f07ce38b7bdd7914abb2 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -1218,24 +1218,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka,
 	/* TCO (Tag Check Override) always cleared for signal handlers */
 	regs->pstate &= ~PSR_TCO_BIT;
 
-	/* Signal handlers are invoked with ZA and streaming mode disabled */
-	if (system_supports_sme()) {
-		/*
-		 * If we were in streaming mode the saved register
-		 * state was SVE but we will exit SM and use the
-		 * FPSIMD register state - flush the saved FPSIMD
-		 * register state in case it gets loaded.
-		 */
-		if (current->thread.svcr & SVCR_SM_MASK) {
-			memset(&current->thread.uw.fpsimd_state, 0,
-			       sizeof(current->thread.uw.fpsimd_state));
-			current->thread.fp_type = FP_STATE_FPSIMD;
-		}
-
-		current->thread.svcr &= ~(SVCR_ZA_MASK |
-					  SVCR_SM_MASK);
-		sme_smstop();
-	}
+	fpsimd_enter_sighandler();
 
 	if (system_supports_poe())
 		write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0);

---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241023-arm64-fp-sme-sigentry-a2bd7187e71b

Best regards,
-- 
Mark Brown <broonie@kernel.org>
Re: [PATCH] arm64/signal: Avoid corruption of SME state when entering signal handler
Posted by Mark Rutland 3 weeks, 5 days ago
Hi Mark,

Thanks for this.

I originally just had a few comments on the commit message, but I
believe I've found a logic issue in this patch, and more general issue
throughout our FPSIMD/SVE/SME manipulation -- more details below.

On Wed, Oct 23, 2024 at 10:31:24PM +0100, Mark Brown wrote:
> When we enter a signal handler we exit streaming mode in order to ensure
> that signal handlers can run normal FPSIMD code, and while we're at it we
> also clear PSTATE.ZA. Currently the code in setup_return() updates both the
> in memory copy of the state and the register state. Not only is this
> redundant it can also lead to corruption if we are preempted.

It would be nice if we could be clearer regarding the implications, e.g.
that this has no effect on tasks which only use plain FPSIMD or SVE.

How about:

| We intend that signal handlers are entered with PSTATE.{SM,ZA}={0,0}.
| The logic for this in setup_return() manipulates the saved state and
| live CPU state in an unsafe manner, and consequently, when a task enters
| a signal handler:
| 
| * The task entering the signal handler might not have its PSTATE.{SM,ZA}
|   bits cleared, and other register state that is affected by changes to
|   PSTATE.{SM,ZA} might not be zeroed as expected.
| 
| * An unrelated task might have its PSTATE.{SM,ZA} bits cleared
|   unexpectedly, potentially zeroing other register state that is
|   affected by changes to PSTATE.{SM,ZA}.
| 
|   Tasks which do not set PSTATE.{SM,ZA} (i.e. those only using plain
|   FPSIMD or non-streaming SVE) are not affected, as there is no
|   resulting change to PSTATE.{SM,ZA}.

> Consider two tasks on one CPU:

Minor nit, but can we say:

| For example, consider two tasks on one CPU:

... since there are other races possible.

>  A: Begins signal entry in kernel mode, is preempted prior to SMSTOP.
>  B: Using SM and/or ZA in userspace with register state current on the
>     CPU, is preempted.
>  A: Scheduled in, no register state changes made as in kernel mode.
>  A: Executes SMSTOP, modifying live register state.
>  A: Scheduled out.
>  B: Scheduled in, fpsimd_thread_switch() sees the register state on the
>     CPU is tracked as being that for task B so the state is not reloaded
>     prior to returning to userspace.
> 
> Task B is now running with SM and ZA incorrectly cleared.

[ moving the "Fix ..." later ]

> This race has been observed intermittently with fp-stress, especially
> with preempt disabled.

It would be nice to have the signature of the failure as well, e.g.

| This is intermittently detected by the fp-stress test, which
| intermittently reports "ZA-VL-*-*: Bad SVCR: 0".

> Fix this by check TIF_FOREIGN_FPSTATE and only updating one of the live
> register context or the in memory copy when entering a signal handler.
> Since this needs to happen atomically and all code that atomically
> accesses FP state is in fpsimd.c also move the code there to ensure
> consistency.

How about:

| Fix this by:
| 
| * Checking TIF_FOREIGN_FPSTATE, and only updating the saved or live
|   state as appropriate.
| 
| * Using {get,put}_cpu_fpsimd_context() to ensure mutual exclusion
|   against other code which manipulates this state. To allow their use,
|   the logic is moved into a new fpsimd_enter_sighandler() helper in
|   fpsimd.c.

> Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals")
> Signed-off-by: Mark Brown <broonie@kernel.org>
> Cc: stable@vger.kernel.org
> ---
>  arch/arm64/include/asm/fpsimd.h |  1 +
>  arch/arm64/kernel/fpsimd.c      | 30 ++++++++++++++++++++++++++++++
>  arch/arm64/kernel/signal.c      | 19 +------------------
>  3 files changed, 32 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state);
>  extern void fpsimd_thread_switch(struct task_struct *next);
>  extern void fpsimd_flush_thread(void);
>  
> +extern void fpsimd_enter_sighandler(void);
>  extern void fpsimd_signal_preserve_current_state(void);
>  extern void fpsimd_preserve_current_state(void);
>  extern void fpsimd_restore_current_state(void);
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 77006df20a75aee7c991cf116b6d06bfe953d1a4..e6b086dc09f21e7f30df32ab4f6875b53c4228fd 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -1693,6 +1693,36 @@ void fpsimd_signal_preserve_current_state(void)
>  		sve_to_fpsimd(current);
>  }
>  
> +/*
> + * Called by the signal handling code when preparing current to enter
> + * a signal handler. Currently this only needs to take care of exiting
> + * streaming mode and clearing ZA on SME systems.
> + */
> +void fpsimd_enter_sighandler(void)
> +{
> +	if (!system_supports_sme())
> +		return;
> +
> +	get_cpu_fpsimd_context();
> +
> +	if (test_thread_flag(TIF_FOREIGN_FPSTATE)) {
> +		/* Exiting streaming mode zeros the FPSIMD state */
> +		if (current->thread.svcr & SVCR_SM_MASK) {
> +			memset(&current->thread.uw.fpsimd_state, 0,
> +			       sizeof(current->thread.uw.fpsimd_state));
> +			current->thread.fp_type = FP_STATE_FPSIMD;
> +		}
> +
> +		current->thread.svcr &= ~(SVCR_ZA_MASK |
> +					  SVCR_SM_MASK);
> +	} else {
> +		/* The register state is current, just update it. */
> +		sme_smstop();
> +	}
> +
> +	put_cpu_fpsimd_context();
> +}

I don't think this is correct in the TIF_FOREIGN_FPSTATE case. We don't
unbind the saved state from another CPU it might still be resident on,
and so IIUC there's a race whereby the updates to the saved state can
end up discarded:

	CPU 0				CPU 1

	1. trap from user->kernel
	   with live state.
	2. context-switch out
	   - fpsimd_thread_switch()
	     saves the HW state
					3. context-switch in.
					   - fpsimd_thread_switch()
					     sets TIF_FOREIGN_FPSTATE

					4. fpsimd_enter_sighandler()
					5. context-switch out
					   - fpsimd_thread_switch() sees
					     TIF_FOREIGN_FPSTATE, saves
					     nothing

	6. context-switch in
	   - fpsimd_last_state.st is
	     this task's state
	   - this task's fpsimd_cIpu
	     is CPU 0
	   ... so fpsimd_thread_switch()
	   clears TIF_FOREIGN_FPSTATE
	
	7. running with stale live
	   state from step 1.

... and either:

* A subsequent return to userspace will see TIF_FOREIGN_FPSTATE is
  clear and not restore the in-memory state.

* A subsequent context-switch will see TIF_FOREIGN_FPSTATE is clear an 
  save the (stale) HW state again.

It looks like we have a similar pattern all over the place, e.g.  in
do_sve_acc():

void do_sve_acc(unsigned long esr, struct pt_regs *regs)
{
	...

	<< preempt; migrate from CPU 0 to CPU 1 >>
        
        get_cpu_fpsimd_context();

        if (test_and_set_thread_flag(TIF_SVE))
                WARN_ON(1); /* SVE access shouldn't have trapped */

        /*
         * Even if the task can have used streaming mode we can only
         * generate SVE access traps in normal SVE mode and
         * transitioning out of streaming mode may discard any
         * streaming mode state.  Always clear the high bits to avoid
         * any potential errors tracking what is properly initialised.
         */
        sve_init_regs();

        put_cpu_fpsimd_context();
	
	<< preempt; migrate from CPU 1 to CPU 0 >>

	<< TIF_SVE is set, as above >>
	<< TIF_FOREIGN_FPSTATE clear due to reusing old HW state >>
	<< Old HW state has CPACR_EL1.ZEN set to trap SVE >>
	<< Return to userapce won't reload HW state because >>
}

... which would explain the do_sve_acc() issue, and why it's so rare.

This is going to need a careful audit and a proper series of
fixes that can be backported to stable.

Mark.
Re: [PATCH] arm64/signal: Avoid corruption of SME state when entering signal handler
Posted by Mark Brown 3 weeks, 5 days ago
On Wed, Oct 30, 2024 at 05:34:16PM +0000, Mark Rutland wrote:

> I originally just had a few comments on the commit message, but I
> believe I've found a logic issue in this patch, and more general issue
> throughout our FPSIMD/SVE/SME manipulation -- more details below.

I'm fairly sure there's at least one other issue lurking somewhere with
TIF_SVE tearing, yes.  I've not been able to get that to reproduce, and
I've probably stared at this code too much to see it by pure inspection
however it looks like you might've spotted the issue here.

> On Wed, Oct 23, 2024 at 10:31:24PM +0100, Mark Brown wrote:

> It would be nice to have the signature of the failure as well, e.g.

> | This is intermittently detected by the fp-stress test, which
> | intermittently reports "ZA-VL-*-*: Bad SVCR: 0".

That's a common one for timing reasons, but it does also manifest with
other outputs (eg, if we turn off ZA while trying to execute
instructions that access ZA).

> I don't think this is correct in the TIF_FOREIGN_FPSTATE case. We don't
> unbind the saved state from another CPU it might still be resident on,
> and so IIUC there's a race whereby the updates to the saved state can
> end up discarded:

...

> ... and either:

> * A subsequent return to userspace will see TIF_FOREIGN_FPSTATE is
>   clear and not restore the in-memory state.

> * A subsequent context-switch will see TIF_FOREIGN_FPSTATE is clear an 
>   save the (stale) HW state again.

> It looks like we have a similar pattern all over the place, e.g.  in
> do_sve_acc():

Yes, indeed - I think that's a separate bug caused by the recalcuation
of TIF_FOREIGN_FPSTATE.

> This is going to need a careful audit and a proper series of
> fixes that can be backported to stable.

It feels like a separate thing at any rate.  We can do a simple and
robust but performance impacting fix by having fpsimd_thread_switch()
only ever set TIF_FOREIGN_FPSTATE, never clear it.  That'd cause extra
reloads in the case where we switch to a thread but stay in kernel mode
which probably happens often enough to be palatable.

Otherwise I'm not sure it's *too* hard, TIF_FOREIGN_FPSTATE is a bit of
a giveaway for places that could have issues.