arm64/gcs: Allow reuse of user managed shadow stacks

[PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Mark Brown 1 week, 3 days ago

Currently, when a thread with a userspace-allocated stack exits, it is not
possible for the GCS that was in use by the thread to be reused, since no
stack switch token will be left on the stack; this prevents use with
clone3() or userspace stack switching. The only thing userspace can
realistically do with the GCS is inspect it or unmap it, which is not
ideal.

Enable reuse by modelling thread exit like pivoting in userspace with
the stack pivot instructions, writing a stack switch token at the top
entry of the GCS of the exiting thread.  This allows userspace to switch
back to the GCS in future.  Using the current stack location should
work well with glibc's current behaviour of fully unwinding the stack of
threads that exit cleanly.
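The following userspace model makes the token layout concrete. It shows what the exit path writes and how a later consumer, such as clone3() with a shadow stack token, would check it. It is only a sketch. The GCS is a plain array. The cap layout (address bits [63:12] plus a low type field, where 0x1 means valid) is my reading of the kernel's GCS_CAP() macro and is not defined by this patch. Placing the token one entry below GCSPR follows the diff. make_cap(), write_exit_token() and consume_token() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap layout: bits [63:12] = slot address, bits [11:0] = type. */
#define CAP_ADDR_MASK   (~(uint64_t)0xfff)
#define CAP_VALID_TOKEN ((uint64_t)0x1)

/* Build a valid cap entry for the slot it will be stored in. */
static uint64_t make_cap(const uint64_t *slot)
{
	return ((uint64_t)(uintptr_t)slot & CAP_ADDR_MASK) | CAP_VALID_TOKEN;
}

/*
 * Model of the exit path: the GCS grows downwards, so the token goes in
 * the entry just below the exit-time GCSPR. Returns the token slot.
 */
static uint64_t *write_exit_token(uint64_t *gcspr)
{
	uint64_t *cap_ptr;

	assert(((uintptr_t)gcspr & 7) == 0); /* GCS entries are 8 bytes */
	cap_ptr = gcspr - 1;
	*cap_ptr = make_cap(cap_ptr);
	return cap_ptr;
}

/*
 * Model of consuming the token: the slot must hold a valid cap for its
 * own address. On success the token is cleared and the new GCSPR is the
 * entry above it, i.e. the GCSPR value the exiting thread left behind.
 */
static bool consume_token(uint64_t *token, uint64_t **new_gcspr)
{
	if (*token != make_cap(token))
		return false;
	*token = 0;
	*new_gcspr = token + 1;
	return true;
}
```

Because a consumed token is cleared, a second attempt to switch to the same slot fails. In the real code the write and the check are done with GCS-specific accessors, not plain stores.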

This patch is an RFC and should not be applied as-is. Currently the
token will only be written for the current thread, but it will be written
regardless of the reason the thread is exiting. This version of the
patch does not handle scheduling during exit() at all; the code is racy.

The feature is gated behind a new GCS mode flag PR_SHADOW_STACK_EXIT_TOKEN
to ensure that userspace that does not wish to use the tokens never has
to see them.
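Adding the flag to PR_SHADOW_STACK_SUPPORTED_STATUS_MASK is what lets userspace request it. Below is a minimal model of that effect, assuming the set-status handler rejects any bit outside the supported mask. Under that assumption, the exit-token bit is refused with the old mask and accepted with the new one. Code that never sets the bit never gets tokens. check_status_bits() and wants_exit_token() are hypothetical names. The flag values are copied from the prctl.h hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in this patch's prctl.h hunk. */
#define PR_SHADOW_STACK_ENABLE     (1UL << 0)
#define PR_SHADOW_STACK_WRITE      (1UL << 1)
#define PR_SHADOW_STACK_PUSH       (1UL << 2)
#define PR_SHADOW_STACK_EXIT_TOKEN (1UL << 3)

/* Supported-status masks before and after this patch. */
#define OLD_SUPPORTED_MASK \
	(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH)
#define NEW_SUPPORTED_MASK (OLD_SUPPORTED_MASK | PR_SHADOW_STACK_EXIT_TOKEN)

/* Hypothetical model: reject any requested bit outside the mask. */
static int check_status_bits(unsigned long arg, unsigned long supported)
{
	if (arg & ~supported)
		return -EINVAL;
	return 0;
}

/* The exit token is only written when userspace opted in. */
static bool wants_exit_token(unsigned long mode)
{
	assert(check_status_bits(mode, NEW_SUPPORTED_MASK) == 0);
	return (mode & PR_SHADOW_STACK_EXIT_TOKEN) != 0;
}
```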

Signed-off-by: Mark Brown <broonie@kernel.org>
---
 arch/arm64/include/asm/gcs.h |  3 ++-
 arch/arm64/mm/gcs.c          | 25 ++++++++++++++++++++++++-
 include/uapi/linux/prctl.h   |  1 +
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/gcs.h b/arch/arm64/include/asm/gcs.h
index b4bbec9382a1..1ec359d0ad51 100644
--- a/arch/arm64/include/asm/gcs.h
+++ b/arch/arm64/include/asm/gcs.h
@@ -52,7 +52,8 @@ static inline u64 gcsss2(void)
 }
 
 #define PR_SHADOW_STACK_SUPPORTED_STATUS_MASK \
-	(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | PR_SHADOW_STACK_PUSH)
+	(PR_SHADOW_STACK_ENABLE | PR_SHADOW_STACK_WRITE | \
+	 PR_SHADOW_STACK_PUSH | PR_SHADOW_STACK_EXIT_TOKEN)
 
 #ifdef CONFIG_ARM64_GCS
 
diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
index fd1d5a6655de..4649c2b107a7 100644
--- a/arch/arm64/mm/gcs.c
+++ b/arch/arm64/mm/gcs.c
@@ -199,14 +199,37 @@ void gcs_set_el0_mode(struct task_struct *task)
 
 void gcs_free(struct task_struct *task)
 {
+	unsigned long __user *cap_ptr;
+	unsigned long cap_val;
+	int ret;
+
 	if (!system_supports_gcs())
 		return;
 
 	if (!task->mm || task->mm != current->mm)
 		return;
 
-	if (task->thread.gcs_base)
+	if (task->thread.gcs_base) {
 		vm_munmap(task->thread.gcs_base, task->thread.gcs_size);
+	} else if (task == current &&
+		   task->thread.gcs_el0_mode & PR_SHADOW_STACK_EXIT_TOKEN) {
+		cap_ptr = (unsigned long __user *)read_sysreg_s(SYS_GCSPR_EL0);
+		cap_ptr--;
+		cap_val = GCS_CAP(cap_ptr);
+
+		/*
+		 * We can't do anything constructive if this fails,
+		 * and the thread might be exiting due to being in a
+		 * bad state anyway.
+		 */
+		put_user_gcs(cap_val, cap_ptr, &ret);
+
+		/*
+		 * Ensure the new cap is ordered before standard
+		 * memory accesses to the same location.
+		 */
+		gcsb_dsync();
+	}
 
 	task->thread.gcspr_el0 = 0;
 	task->thread.gcs_base = 0;
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index ed3aed264aeb..c3c37c39639f 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -352,6 +352,7 @@ struct prctl_mm_map {
 # define PR_SHADOW_STACK_ENABLE         (1UL << 0)
 # define PR_SHADOW_STACK_WRITE		(1UL << 1)
 # define PR_SHADOW_STACK_PUSH		(1UL << 2)
+# define PR_SHADOW_STACK_EXIT_TOKEN	(1UL << 3)
 
 /*
  * Prevent further changes to the specified shadow stack

-- 
2.47.2

Re: [PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Catalin Marinas 6 days, 13 hours ago

On Sun, Sep 21, 2025 at 02:21:35PM +0100, Mark Brown wrote:
> diff --git a/arch/arm64/mm/gcs.c b/arch/arm64/mm/gcs.c
> index fd1d5a6655de..4649c2b107a7 100644
> --- a/arch/arm64/mm/gcs.c
> +++ b/arch/arm64/mm/gcs.c
> @@ -199,14 +199,37 @@ void gcs_set_el0_mode(struct task_struct *task)
>  
>  void gcs_free(struct task_struct *task)
>  {
> +	unsigned long __user *cap_ptr;
> +	unsigned long cap_val;
> +	int ret;
> +
>  	if (!system_supports_gcs())
>  		return;
>  
>  	if (!task->mm || task->mm != current->mm)
>  		return;
> -	if (task->thread.gcs_base)
> +	if (task->thread.gcs_base) {
>  		vm_munmap(task->thread.gcs_base, task->thread.gcs_size);
> +	} else if (task == current &&
> +		   task->thread.gcs_el0_mode & PR_SHADOW_STACK_EXIT_TOKEN) {

I checked the code paths leading here and task is always current. But
better to keep the test in case the core code ever changes.

> +		cap_ptr = (unsigned long __user *)read_sysreg_s(SYS_GCSPR_EL0);
> +		cap_ptr--;
> +		cap_val = GCS_CAP(cap_ptr);
> +
> +		/*
> +		 * We can't do anything constructive if this fails,
> +		 * and the thread might be exiting due to being in a
> +		 * bad state anyway.
> +		 */
> +		put_user_gcs(cap_val, cap_ptr, &ret);
> +
> +		/*
> +		 * Ensure the new cap is ordered before standard
> +		 * memory accesses to the same location.
> +		 */
> +		gcsb_dsync();
> +	}

The only downside is that, if the thread did not unwind properly, we
don't write the token where it was initially. We could save the token
address from clone3() and restore it there instead.
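A toy model illustrates the point. Assume the token goes just below wherever GCSPR is at exit, as in the patch. After a full unwind that is the slot clone3() originally consumed. After an unclean exit it is deeper in the stack. The array-based GCS and the helper names gcspr_at_exit() and token_slot_at_exit() are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical simulation: the thread starts with GCSPR just above the
 * consumed clone3() token, makes `depth` calls (each pushing one entry,
 * i.e. moving GCSPR down) and returns from `unwound` of them.
 */
static uint64_t *gcspr_at_exit(uint64_t *clone3_token, size_t depth,
			       size_t unwound)
{
	uint64_t *gcspr = clone3_token + 1;

	assert(unwound <= depth);
	gcspr -= depth;   /* calls push return addresses */
	gcspr += unwound; /* returns pop them */
	return gcspr;
}

/* Token placement used by the patch: one entry below exit-time GCSPR. */
static uint64_t *token_slot_at_exit(uint64_t *gcspr)
{
	return gcspr - 1;
}
```

In this model, a clean exit puts the token back exactly where clone3() found it. An exit with three frames still on the GCS leaves it three entries lower.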

-- 
Catalin

Re: [PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Mark Brown 6 days, 13 hours ago

On Thu, Sep 25, 2025 at 05:46:46PM +0100, Catalin Marinas wrote:
> On Sun, Sep 21, 2025 at 02:21:35PM +0100, Mark Brown wrote:

> > +	} else if (task == current &&
> > +		   task->thread.gcs_el0_mode & PR_SHADOW_STACK_EXIT_TOKEN) {

> I checked the code paths leading here and task is always current. But
> better to keep the test in case the core code ever changes.

We can't have scheduled?  That's actually a pleasant surprise, that was
the main hole I was thinking of in the cover letter.

> > +		/*
> > +		 * We can't do anything constructive if this fails,
> > +		 * and the thread might be exiting due to being in a
> > +		 * bad state anyway.
> > +		 */
> > +		put_user_gcs(cap_val, cap_ptr, &ret);
> > +
> > +		/*
> > +		 * Ensure the new cap is ordered before standard
> > +		 * memory accesses to the same location.
> > +		 */
> > +		gcsb_dsync();
> > +	}

> The only downside is that, if the thread did not unwind properly, we
> don't write the token where it was initially. We could save the token
> address from clone3() and restore it there instead.

If we do that and the thread pivots away to another GCS and exits from
there, then we'll write the token onto a different stack.  Writing onto
the location that userspace provided when creating the thread should be
fine for glibc's needs, but it feels like the wrong assumption to bake
in.  To me it feels less bad to have to map a new GCS in the case where
we didn't unwind properly.  There will be overhead in doing that, but the
thread is already exiting uncleanly, so imposing a cost doesn't seem
disproportionate.
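The pivot case can be sketched the same way, using two toy GCS arrays and a hypothetical exit_token_slot() helper. If the token is placed at the exit-time GCSPR, it lands on the stack the thread was last running on. If the clone3() address is reused instead, it lands on the original stack. This is a simulation under stated assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_GCS_ENTRIES 16

/* The two placements discussed in this thread. */
enum token_policy {
	TOKEN_AT_EXIT_GCSPR,  /* what the RFC does */
	TOKEN_AT_CLONE3_ADDR, /* the suggested alternative */
};

static uint64_t *exit_token_slot(enum token_policy policy,
				 uint64_t *gcspr_at_exit,
				 uint64_t *clone3_token)
{
	assert(policy == TOKEN_AT_EXIT_GCSPR ||
	       policy == TOKEN_AT_CLONE3_ADDR);
	if (policy == TOKEN_AT_EXIT_GCSPR)
		return gcspr_at_exit - 1;
	return clone3_token;
}

/* True if p points at an entry of the toy GCS starting at gcs. */
static bool on_stack(const uint64_t *gcs, const uint64_t *p)
{
	uintptr_t lo = (uintptr_t)gcs;
	uintptr_t hi = (uintptr_t)(gcs + TOY_GCS_ENTRIES);
	uintptr_t a = (uintptr_t)p;

	return a >= lo && a < hi;
}
```

For a thread created on stack A that pivots to stack B and exits there, the first policy writes on B. The second writes on A, which may not be the stack userspace expects to find a token on.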

Re: [PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Catalin Marinas 6 days, 11 hours ago

On Thu, Sep 25, 2025 at 06:01:07PM +0100, Mark Brown wrote:
> On Thu, Sep 25, 2025 at 05:46:46PM +0100, Catalin Marinas wrote:
> > On Sun, Sep 21, 2025 at 02:21:35PM +0100, Mark Brown wrote:
> 
> > > +	} else if (task == current &&
> > > +		   task->thread.gcs_el0_mode & PR_SHADOW_STACK_EXIT_TOKEN) {
> 
> > I checked the code paths leading here and task is always current. But
> > better to keep the test in case the core code ever changes.
> 
> We can't have scheduled?  That's actually a pleasant surprise, that was
> the main hole I was thinking of in the cover letter.

Well, double-check. AFAICT, gcs_free() is only called on the exit_mm()
path when a thread dies.

I think gcs_free() may have been called in other contexts before the
cleanups you had in 6.16 (there were two more call sites for
gcs_free()). If that's the case, we could turn these checks into
WARN_ON_ONCE().
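If the checks do become warnings, they would look roughly like the sketch below. This is a userspace stand-in. warn_on_once() only mimics two behaviours of the kernel's WARN_ON_ONCE(): it reports the first time at a given call site, and it returns the condition so it can be used in an if (). The real macro does more, such as printing a backtrace. struct toy_task and gcs_free_checks_pass() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_count; /* number of warnings actually reported */

/* Report `what` the first time `cond` is true; always return `cond`. */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

struct toy_task {
	int id;
};

/* Hypothetical shape of the early-return check in gcs_free(). */
static bool gcs_free_checks_pass(const struct toy_task *task,
				 const struct toy_task *current_task)
{
	static bool warned_not_current; /* per-call-site "once" state */

	assert(current_task != NULL);
	if (warn_on_once(task != current_task, &warned_not_current,
			 "gcs_free() called for a task other than current"))
		return false;
	return true;
}
```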

> > > +		/*
> > > +		 * We can't do anything constructive if this fails,
> > > +		 * and the thread might be exiting due to being in a
> > > +		 * bad state anyway.
> > > +		 */
> > > +		put_user_gcs(cap_val, cap_ptr, &ret);
> > > +
> > > +		/*
> > > +		 * Ensure the new cap is ordered before standard
> > > +		 * memory accesses to the same location.
> > > +		 */
> > > +		gcsb_dsync();
> > > +	}
> 
> > The only downside is that, if the thread did not unwind properly, we
> > don't write the token where it was initially. We could save the token
> > address from clone3() and restore it there instead.
> 
> If we do that and the thread pivots away to another GCS and exits from
> there, then we'll write the token onto a different stack.  Writing onto
> the location that userspace provided when creating the thread should be
> fine for glibc's needs, but it feels like the wrong assumption to bake
> in.  To me it feels less bad to have to map a new GCS in the case where
> we didn't unwind properly.  There will be overhead in doing that, but the
> thread is already exiting uncleanly, so imposing a cost doesn't seem
> disproportionate.

You are right, that's the safest. glibc can always unmap the shadow
stack if the thread did not exit properly.

That said, does glibc ensure the thread unwinds its stack (and shadow
stack) on pthread_exit()? IIUC, it does, at least for the normal stack,
but I'm not familiar with the codebase.

-- 
Catalin

Re: [PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Mark Brown 6 days, 11 hours ago

On Thu, Sep 25, 2025 at 07:36:50PM +0100, Catalin Marinas wrote:
> On Thu, Sep 25, 2025 at 06:01:07PM +0100, Mark Brown wrote:
> > On Thu, Sep 25, 2025 at 05:46:46PM +0100, Catalin Marinas wrote:
> > > On Sun, Sep 21, 2025 at 02:21:35PM +0100, Mark Brown wrote:

> > We can't have scheduled?  That's actually a pleasant surprise, that was
> > the main hole I was thinking of in the cover letter.

> Well, double-check. AFAICT, gcs_free() is only called on the exit_mm()
> path when a thread dies.

> I think gcs_free() may have been called in other contexts before the
> cleanups you had in 6.16 (there were two more call sites for
> gcs_free()). If that's the case, we could turn these checks into
> WARN_ON_ONCE().

Yeah, I just need to convince myself that we're always running the
exit_mm() path in the context of the exiting thread.  Like you say, it
needs checking, but hopefully you're right and the current code is more
correct than I had thought.

> > > The only downside is that, if the thread did not unwind properly, we
> > > don't write the token where it was initially. We could save the token
> > > address from clone3() and restore it there instead.

> > If we do that and the thread pivots away to another GCS and exits from
> > there, then we'll write the token onto a different stack.  Writing onto
> > the location that userspace provided when creating the thread should be
> > fine for glibc's needs, but it feels like the wrong assumption to bake
> > in.  To me it feels less bad to have to map a new GCS in the case where
> > we didn't unwind properly.  There will be overhead in doing that, but the
> > thread is already exiting uncleanly, so imposing a cost doesn't seem
> > disproportionate.

> You are right, that's the safest. glibc can always unmap the shadow
> stack if the thread did not exit properly.

> That said, does glibc ensure the thread unwinds its stack (and shadow
> stack) on pthread_exit()? IIUC, it does, at least for the normal stack,
> but I'm not familiar with the codebase.

Florian indicated that it did in:

   https://marc.info/?l=glibc-alpha&m=175733266913483&w=2

I did look at the code to check, though I'm not at all familiar with the
codebase either, so I'm not sure how much that check was worth.  If the
unwinder doesn't handle the shadow stack, then userspace will be having a
very bad time anyway whenever it tries to run on an unwound stack, so
doing so shouldn't be an additional cost there.

Re: [PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Catalin Marinas 5 days, 19 hours ago

On Thu, Sep 25, 2025 at 08:00:40PM +0100, Mark Brown wrote:
> On Thu, Sep 25, 2025 at 07:36:50PM +0100, Catalin Marinas wrote:
> > On Thu, Sep 25, 2025 at 06:01:07PM +0100, Mark Brown wrote:
> > > On Thu, Sep 25, 2025 at 05:46:46PM +0100, Catalin Marinas wrote:
> > > > On Sun, Sep 21, 2025 at 02:21:35PM +0100, Mark Brown wrote:
> 
> > > We can't have scheduled?  That's actually a pleasant surprise, that was
> > > the main hole I was thinking of in the cover letter.
> 
> > Well, double-check. AFAICT, gcs_free() is only called on the exit_mm()
> > path when a thread dies.
> 
> > I think gcs_free() may have been called in other contexts before the
> > cleanups you had in 6.16 (there were two more call sites for
> > gcs_free()). If that's the case, we could turn these checks into
> > WARN_ON_ONCE().
> 
> Yeah, I just need to convince myself that we're always running the
> exit_mm() path in the context of the exiting thread.  Like you say, it
> needs checking, but hopefully you're right and the current code is more
> correct than I had thought.

The only path to gcs_free() is via mm_release() -> deactivate_mm().
mm_release() is called from either exit_mm_release() or
exec_mm_release(). These two functions are only called with current and
current->mm.

I guess for historical reasons, they take task and mm parameters but in
recent mainline, they don't seem to get anything other than current.

-- 
Catalin

Re: [PATCH RFC 1/3] arm64/gcs: Support reuse of GCS for exited threads

Posted by Mark Brown 5 days, 18 hours ago

On Fri, Sep 26, 2025 at 12:14:21PM +0100, Catalin Marinas wrote:
> On Thu, Sep 25, 2025 at 08:00:40PM +0100, Mark Brown wrote:

> > Yeah, I just need to convince myself that we're always running the
> > exit_mm() path in the context of the exiting thread.  Like you say, it
> > needs checking, but hopefully you're right and the current code is more
> > correct than I had thought.

> The only path to gcs_free() is via mm_release() -> deactivate_mm().
> mm_release() is called from either exit_mm_release() or
> exec_mm_release(). These two functions are only called with current and
> current->mm.

> I guess for historical reasons, they take task and mm parameters but in
> recent mainline, they don't seem to get anything other than current.

Thanks for checking for me.  I guess some refactoring might be in
order to make all this clear.