[v1] x86/fpu: don't abuse x86_task_fpu(PF_USER_WORKER) in .regset_get() paths

[PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 3 weeks ago

If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
stack for no reason, the new (kernel) thread will never return to usermode.

Plus the current code doesn't even look correct, in this case fpu_clone()
won't call update_fpu_shstk().

Add the new "bool minimal = !!args->fn" argument (which matches that of
fpu_clone()) to shstk_alloc_thread_stack() and change it to do
reset_thread_features(tsk) if "minimal" is true.

With this patch ssp_get() -> ssp_active(target) should never return true
if target->flags & PF_USER_WORKER.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 arch/x86/include/asm/shstk.h |  4 ++--
 arch/x86/kernel/process.c    |  2 +-
 arch/x86/kernel/shstk.c      | 11 ++++++++++-
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
index 92d449cc352a..dfb2aeebc25f 100644
--- a/arch/x86/include/asm/shstk.h
+++ b/arch/x86/include/asm/shstk.h
@@ -17,7 +17,7 @@ struct thread_shstk {
 long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
 void reset_thread_features(struct task_struct *task);
 unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
-				       unsigned long stack_size);
+				       bool minimal, unsigned long stack_size);
 void shstk_free(struct task_struct *p);
 int setup_signal_shadow_stack(struct ksignal *ksig);
 int restore_signal_shadow_stack(void);
@@ -28,7 +28,7 @@ static inline long shstk_prctl(struct task_struct *task, int option,
 			       unsigned long arg2) { return -EINVAL; }
 static inline void reset_thread_features(struct task_struct *task) {}
 static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
-						     unsigned long clone_flags,
+						     unsigned long clone_flags, bool minimal,
 						     unsigned long stack_size) { return 0; }
 static inline void shstk_free(struct task_struct *p) {}
 static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 1b7960cf6eb0..e932e0e53972 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -209,7 +209,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
 	 * is disabled, new_ssp will remain 0, and fpu_clone() will know not to
 	 * update it.
 	 */
-	new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
+	new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->fn, args->stack_size);
 	if (IS_ERR_VALUE(new_ssp))
 		return PTR_ERR((void *)new_ssp);
 
diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
index e6d3b1371b11..3da22c6f5874 100644
--- a/arch/x86/kernel/shstk.c
+++ b/arch/x86/kernel/shstk.c
@@ -192,11 +192,20 @@ void reset_thread_features(struct task_struct *tsk)
 }
 
 unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
-				       unsigned long stack_size)
+				       bool minimal, unsigned long stack_size)
 {
 	struct thread_shstk *shstk = &tsk->thread.shstk;
 	unsigned long addr, size;
 
+	/*
+	 * Kernel threads cloned from userspace thread never return to
+	 * usermode.
+	 */
+	if (minimal) {
+		reset_thread_features(tsk);
+		return 0;
+	}
+
 	/*
 	 * If shadow stack is not enabled on the new thread, skip any
 	 * switch to a new shadow stack.
-- 
2.25.1.362.g51ebf55

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Edgecombe, Rick P 1 month, 2 weeks ago

+Mark and Deepak

On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> stack for no reason, the new (kernel) thread will never return to usermode.
> 
> Plus the current code doesn't even look correct, in this case fpu_clone()
> won't call update_fpu_shstk().

The actual SSP register gets set when returning to userspace, it will get this
from MSR_IA32_PL3_SSP, which is restored from the xsave buffer. What I am seeing
is that fpu_clone() clears out the buffer in the 'minimal' case. So the xstate
copy of the SSP should already be zero and the problem is just that a shadow
stack gets allocated when one doesn't need to be.

I agree we don't need to allocate a shadow stack in this case, but I'm not sure
it is right to fully disable shadow stack in thread.features. First of all,
disabling it from shstk_alloc_thread_stack() seems weird. It just handles
allocating shadow stacks.

But also it seems the requirements are very similar to the vfork case where we
don't allocate a shadow stack. If we implement it similarly we can have less
special cases.

Lastly, it doesn't seem there is any way to clone from IO uring today, but there
were proposals. The general idea from the security POV is that copied threads
will keep shadow stack enabled. So it seems to fit better with the concepts
involved to not clear the thread.features.

How about just adding the 'minimal' condition to:
	if (clone_flags & CLONE_VFORK) {
		shstk->base = 0;
		shstk->size = 0;
		return 0;
	}
...then update all the comments where vfork is called out as the only case that
does this?

> 
> Add the new "bool minimal = !!args->fn" argument (which matches that of
> fpu_clone()) to shstk_alloc_thread_stack() and change it to do
> reset_thread_features(tsk) if "minimal" is true.
> 
> With this patch ssp_get() -> ssp_active(target) should never return true
> if target->flags & PF_USER_WORKER.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  arch/x86/include/asm/shstk.h |  4 ++--
>  arch/x86/kernel/process.c    |  2 +-
>  arch/x86/kernel/shstk.c      | 11 ++++++++++-
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/shstk.h b/arch/x86/include/asm/shstk.h
> index 92d449cc352a..dfb2aeebc25f 100644
> --- a/arch/x86/include/asm/shstk.h
> +++ b/arch/x86/include/asm/shstk.h
> @@ -17,7 +17,7 @@ struct thread_shstk {
>  long shstk_prctl(struct task_struct *task, int option, unsigned long arg2);
>  void reset_thread_features(struct task_struct *task);
>  unsigned long shstk_alloc_thread_stack(struct task_struct *p, unsigned long clone_flags,
> -				       unsigned long stack_size);
> +				       bool minimal, unsigned long stack_size);
>  void shstk_free(struct task_struct *p);
>  int setup_signal_shadow_stack(struct ksignal *ksig);
>  int restore_signal_shadow_stack(void);
> @@ -28,7 +28,7 @@ static inline long shstk_prctl(struct task_struct *task, int option,
>  			       unsigned long arg2) { return -EINVAL; }
>  static inline void reset_thread_features(struct task_struct *task) {}
>  static inline unsigned long shstk_alloc_thread_stack(struct task_struct *p,
> -						     unsigned long clone_flags,
> +						     unsigned long clone_flags, bool minimal,
>  						     unsigned long stack_size) { return 0; }
>  static inline void shstk_free(struct task_struct *p) {}
>  static inline int setup_signal_shadow_stack(struct ksignal *ksig) { return 0; }
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 1b7960cf6eb0..e932e0e53972 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -209,7 +209,7 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
>  	 * is disabled, new_ssp will remain 0, and fpu_clone() will know not to
>  	 * update it.
>  	 */
> -	new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->stack_size);
> +	new_ssp = shstk_alloc_thread_stack(p, clone_flags, args->fn, args->stack_size);
>  	if (IS_ERR_VALUE(new_ssp))
>  		return PTR_ERR((void *)new_ssp);
>  
> diff --git a/arch/x86/kernel/shstk.c b/arch/x86/kernel/shstk.c
> index e6d3b1371b11..3da22c6f5874 100644
> --- a/arch/x86/kernel/shstk.c
> +++ b/arch/x86/kernel/shstk.c
> @@ -192,11 +192,20 @@ void reset_thread_features(struct task_struct *tsk)
>  }
>  
>  unsigned long shstk_alloc_thread_stack(struct task_struct *tsk, unsigned long clone_flags,
> -				       unsigned long stack_size)
> +				       bool minimal, unsigned long stack_size)
>  {
>  	struct thread_shstk *shstk = &tsk->thread.shstk;
>  	unsigned long addr, size;
>  
> +	/*
> +	 * Kernel threads cloned from userspace thread never return to
> +	 * usermode.
> +	 */
> +	if (minimal) {
> +		reset_thread_features(tsk);
> +		return 0;
> +	}
> +
>  	/*
>  	 * If shadow stack is not enabled on the new thread, skip any
>  	 * switch to a new shadow stack.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

On 08/14, Edgecombe, Rick P wrote:
>
> +Mark and Deepak
>
> On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > stack for no reason, the new (kernel) thread will never return to usermode.
> >
> > Plus the current code doesn't even look correct, in this case fpu_clone()
> > won't call update_fpu_shstk().

...

> I agree we don't need to allocate a shadow stack in this case,

Great,

> but I'm not sure
> it is right to fully disable shadow stack in thread.features.

Why?

> First of all,
> disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> allocating shadow stacks.

I agree in advance with any other change.

> Lastly, it doesn't seem there is any way to clone from IO uring today,

Not sure I understand... create_io_thread() ?

> How about just adding the 'minimal' condition to:
> 	if (clone_flags & CLONE_VFORK) {
> 		shstk->base = 0;
> 		shstk->size = 0;
> 		return 0;
> 	}
> ...then update all the comments where vfork is called out as the only case that
> does this?

but create_io_thread() and vhost_task_create() do not use CLONE_VFORK?

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Edgecombe, Rick P 1 month, 2 weeks ago

On Fri, 2025-08-15 at 14:17 +0200, Oleg Nesterov wrote:
> > but I'm not sure
> > it is right to fully disable shadow stack in thread.features.
> 
> Why?

The bit in thread.features is like a sticky bit that is inherrited whenver a
thread is cloned. How it works normally is that the first thread in the app
(really glibc loader) enables shadow stacks, then new threads automatically
inherit that shadow stack is enabled. So in practice it is like a process wide
thing, but stored on each thread. This process-wide behavior is to add to the
security. You don't want to allow a protected app to spawn a new thread that
escapes the enforcement. There is a way to manually disable shadow stack per-
thread, but it is protected by ARCH_SHSTK_LOCK, which gets set by glibc loader
before jumping into the actual app.

When shadow stack is enabled, depending on the circumstances a new shadow stack
will automatically be allocated for a new thread. shstk->base and shstk->size
are about that automatically enabled shadow stack.

So what are we trying to do for PF_USER_WORKER? Prevent them from wasting a VMA
with an unused shadow stack? Or set PF_USER_WORKER's aside from the logic that
is about more than protecting the individual thread in the process?

> 
> > First of all,
> > disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> > allocating shadow stacks.
> 
> I agree in advance with any other change.
> 
> > Lastly, it doesn't seem there is any way to clone from IO uring today,
> 
> Not sure I understand... create_io_thread() ?

There was some discussion in the past about adding a clone, but the comment was
more about whether it fit the concepts in involved.

https://lwn.net/Articles/908268/

> 
> > How about just adding the 'minimal' condition to:
> >  	if (clone_flags & CLONE_VFORK) {
> >  		shstk->base = 0;
> >  		shstk->size = 0;
> >  		return 0;
> >  	}
> > ...then update all the comments where vfork is called out as the only case
> > that
> > does this?
> 
> but create_io_thread() and vhost_task_create() do not use CLONE_VFORK?

No, to make it have the same logic as the vfork case (which doesn't allocate a
new shadow stack).

Like:
 	if ((clone_flags & CLONE_VFORK) || minimal) {
 		shstk->base = 0;
 		shstk->size = 0;
 		return 0;
 	}

Or as Mark was suggesting, replace it with something like:
 	if (needs_new_shstk(clone_args)) {
 		shstk->base = 0;
 		shstk->size = 0;
 		return 0;
 	}

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

On 08/15, Edgecombe, Rick P wrote:
>
> The bit in thread.features is like a sticky bit that is inherrited whenver a
> thread is cloned.

...

> You don't want to allow a protected app to spawn a new thread that
> escapes the enforcement.

Ah, this is clear. But again, PF_USER_WORKER is the kernel thread cloned
by the kernel. Yes, it shares the same thread-group, but this is only to
make SIGKILL/exit_group/etc work. It is not that userspace app can create
it via something like pthread_create().

> So what are we trying to do for PF_USER_WORKER? Prevent them from wasting a VMA
> with an unused shadow stack? Or set PF_USER_WORKER's aside from the logic that
> is about more than protecting the individual thread in the process?

Let me quote my answer to Mark:

	The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
	the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.

	I'd like to make this logic consistent with PF_KTHREAD, and in the longer
	term change the x86 FPU code so that the kernel threads can run without
	without "struct fpu" attached to task_struct.

And again, please see

	Warning from x86_task_fpu()
	https://lore.kernel.org/all/aJVuZZgYjEMxiUYq@ly-workstation/

	PF_USER_WORKERs and shadow stack
	https://lore.kernel.org/all/20250813162824.GA15234@redhat.com/

and 6/6 in this series.

> No, to make it have the same logic as the vfork case (which doesn't allocate a
> new shadow stack).
>
> Like:
>  	if ((clone_flags & CLONE_VFORK) || minimal) {
>  		shstk->base = 0;
>  		shstk->size = 0;
>  		return 0;
>  	}

Aha, got it. Agreed, but I think we also need to clear ARCH_SHSTK_SHSTK
copied by arch_dup_task_struct() ?

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Edgecombe, Rick P 1 month, 2 weeks ago

On Fri, 2025-08-15 at 18:54 +0200, Oleg Nesterov wrote:
> > So what are we trying to do for PF_USER_WORKER? Prevent them from wasting a
> > VMA
> > with an unused shadow stack? Or set PF_USER_WORKER's aside from the logic
> > that
> > is about more than protecting the individual thread in the process?
> 
> Let me quote my answer to Mark:
> 
> 	The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK
> is
> 	the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.

Maybe you can explain the exact failure mode here? ARCH_SHSTK_SHSTK isn't part
of the FPU infrastructure, so maybe you can explain how there is some cascade.

> 
> 	I'd like to make this logic consistent with PF_KTHREAD, and in the
> longer
> 	term change the x86 FPU code so that the kernel threads can run
> without
> 	without "struct fpu" attached to task_struct.

For PF_USER_WORKER it still has access to the user MM, right? Shouldn't it
inherit PKRU from the parent?

Stop me if I'm telling you something you already know... For better or worse,
the x86 FPU state has grown from the classic "extra math registers", to include
other things like PKRU and supervisor state. Supervisor state controls other
thread specific state that *only* the kernel is supposed to have complete access
to. Some of the xfeatures are even about saving and restoring state that *only*
affects the kernel. So I think each xfeature needs to be evaluated, and they are
all annoyingly different. I think LBR works in the kernel too?

> 
> And again, please see
> 
> 	Warning from x86_task_fpu()
> 	https://lore.kernel.org/all/aJVuZZgYjEMxiUYq@ly-workstation/
> 
> 	PF_USER_WORKERs and shadow stack
> 	https://lore.kernel.org/all/20250813162824.GA15234@redhat.com/
> 
> and 6/6 in this series.

I kind of think it would be more appropriate for you to explain more about what
you are trying to do. I've read three things:
 - Prevent wasting a shstk VMA (which Dave suggested maybe wasn't worth it)
 - Prevent issue around update_fpu_shstk(), which I'm not sure is an issue
 - Prevent ptrace from setting FPU state on user workers because it caused
problems (details unclear), by changing the FPU design in a way that apparently
has impacts across FPU-using features (unclear why this is the best way to
prevent it)

TBH, based on my current understanding, all three sound dubious to me. Let's get
on the same page as far as the goals before we discuss shstk changes further.
You might be aware that the x86 FPU is seen as too complex already and so adding
special cases tends to have a high bar. So consider making a strong, clear
justification for the overall problem/solution you have in mind.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

Rick,

I'll try to write another email and send V2 next week.

Although perhaps I'll send some other unrelated and "obvious" cleanups
before that...

Until then:

On 08/15, Edgecombe, Rick P wrote:
>
> Maybe you can explain

Damn yes, agreed, my fault.

> Stop me if I'm telling you something you already know...

Or, quite possibly, you should stop me to send the patches which change
the code I can hardly understand ;) And that is why I CC'ed the experts
like you.

See you next week,

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Mark Brown 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 05:03:36PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:

> > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > stack for no reason, the new (kernel) thread will never return to usermode.

> I agree we don't need to allocate a shadow stack in this case, but I'm not sure
> it is right to fully disable shadow stack in thread.features. First of all,
> disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> allocating shadow stacks.

I agree that it's better to leave userspace shadow stacks enabled, given
that the reason we're not allocating the shadow stack is that we don't
expect to ever return to userspace then it should be fine to leave the
feature turned on for userspace.  If we mess up and do somehow return to
userspace it seems better to have the feature enabled and hopefully give
us some protection against our mistake, and if something causes the
worker thread to start a normal thread then things should work as
expected.

> How about just adding the 'minimal' condition to:
> 	if (clone_flags & CLONE_VFORK) {
> 		shstk->base = 0;
> 		shstk->size = 0;
> 		return 0;
> 	}
> ...then update all the comments where vfork is called out as the only case that
> does this?

Perhaps we should factor the logic for deciding if we need to allocate a
userspace shadow stack out into the arch independent code and do
something like set a flag in kernel_clone_args that the arches can
check?  I think the logic is the same for all arches at the minute and
don't see a reason why it'd diverge.  That'd collide a bit with my
clone3() series, there's some overlap there with that creating another
reason why the decision would change.  Reducing the duplication would be
nice.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

On 08/14, Mark Brown wrote:
>
> On Thu, Aug 14, 2025 at 05:03:36PM +0000, Edgecombe, Rick P wrote:
> > On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
>
> > > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > > stack for no reason, the new (kernel) thread will never return to usermode.
>
> > I agree we don't need to allocate a shadow stack in this case, but I'm not sure
> > it is right to fully disable shadow stack in thread.features. First of all,
> > disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> > allocating shadow stacks.
>
> I agree that it's better to leave userspace shadow stacks enabled, given
> that the reason we're not allocating the shadow stack is that we don't
> expect to ever return to userspace then it should be fine to leave the
> feature turned on for userspace.  If we mess up and do somehow return to
> userspace

But a PF_USER_WORKER task can never return to userspace. It doesn't differ
from PF_KTHREAD in this respect.

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

On 08/15, Oleg Nesterov wrote:
>
> On 08/14, Mark Brown wrote:
> >
> > On Thu, Aug 14, 2025 at 05:03:36PM +0000, Edgecombe, Rick P wrote:
> > > On Thu, 2025-08-14 at 12:14 +0200, Oleg Nesterov wrote:
> >
> > > > If a features_enabled(ARCH_SHSTK_SHSTK) userspace thread creates a
> > > > PF_USER_WORKER thread, shstk_alloc_thread_stack() allocates the shadow
> > > > stack for no reason, the new (kernel) thread will never return to usermode.
> >
> > > I agree we don't need to allocate a shadow stack in this case, but I'm not sure
> > > it is right to fully disable shadow stack in thread.features. First of all,
> > > disabling it from shstk_alloc_thread_stack() seems weird. It just handles
> > > allocating shadow stacks.
> >
> > I agree that it's better to leave userspace shadow stacks enabled, given
> > that the reason we're not allocating the shadow stack is that we don't
> > expect to ever return to userspace then it should be fine to leave the
> > feature turned on for userspace.  If we mess up and do somehow return to
> > userspace
>
> But a PF_USER_WORKER task can never return to userspace. It doesn't differ
> from PF_KTHREAD in this respect.

... of course unless it does exec.

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Mark Brown 1 month, 2 weeks ago

On Fri, Aug 15, 2025 at 03:08:52PM +0200, Oleg Nesterov wrote:
> On 08/15, Oleg Nesterov wrote:
> > On 08/14, Mark Brown wrote:

> > > I agree that it's better to leave userspace shadow stacks enabled, given
> > > that the reason we're not allocating the shadow stack is that we don't
> > > expect to ever return to userspace then it should be fine to leave the
> > > feature turned on for userspace.  If we mess up and do somehow return to
> > > userspace

> > But a PF_USER_WORKER task can never return to userspace. It doesn't differ
> > from PF_KTHREAD in this respect.

> ... of course unless it does exec.

Sure, but OTOH at least for arm64 there's no cost to leaving the feature
enabled unless you actually execute userspace code so if we never return
to userspace writing the code to disable isn't really buying us anything.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

On 08/15, Mark Brown wrote:
>
> On Fri, Aug 15, 2025 at 03:08:52PM +0200, Oleg Nesterov wrote:
> > On 08/15, Oleg Nesterov wrote:
> > > On 08/14, Mark Brown wrote:
>
> > > > I agree that it's better to leave userspace shadow stacks enabled, given
> > > > that the reason we're not allocating the shadow stack is that we don't
> > > > expect to ever return to userspace then it should be fine to leave the
> > > > feature turned on for userspace.  If we mess up and do somehow return to
> > > > userspace
>
> > > But a PF_USER_WORKER task can never return to userspace. It doesn't differ
> > > from PF_KTHREAD in this respect.
>
> > ... of course unless it does exec.
>
> Sure, but OTOH at least for arm64 there's no cost to leaving the feature
> enabled unless you actually execute userspace code so if we never return
> to userspace writing the code to disable isn't really buying us anything.

The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.

I'd like to make this logic consistent with PF_KTHREAD, and in the longer
term change the x86 FPU code so that the kernel threads can run without
without "struct fpu" attached to task_struct.

Again, please see
https://lore.kernel.org/all/20250813191441.GA26754@redhat.com/

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Mark Brown 1 month, 2 weeks ago

On Fri, Aug 15, 2025 at 05:43:11PM +0200, Oleg Nesterov wrote:
> On 08/15, Mark Brown wrote:

> > Sure, but OTOH at least for arm64 there's no cost to leaving the feature
> > enabled unless you actually execute userspace code so if we never return
> > to userspace writing the code to disable isn't really buying us anything.

> The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
> the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.

> I'd like to make this logic consistent with PF_KTHREAD, and in the longer
> term change the x86 FPU code so that the kernel threads can run without
> without "struct fpu" attached to task_struct.

OK, that's entirely x86 specific - there's no reason we'd want to do
that for arm64.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Oleg Nesterov 1 month, 2 weeks ago

On 08/15, Mark Brown wrote:
>
> On Fri, Aug 15, 2025 at 05:43:11PM +0200, Oleg Nesterov wrote:
> > On 08/15, Mark Brown wrote:
>
> > > Sure, but OTOH at least for arm64 there's no cost to leaving the feature
> > > enabled unless you actually execute userspace code so if we never return
> > > to userspace writing the code to disable isn't really buying us anything.
>
> > The fact that a kernel thread can have the pointless ARCH_SHSTK_SHSTK is
> > the only reason I know why x86_task_fpu(PF_USER_WORKER) has to work.
>
> > I'd like to make this logic consistent with PF_KTHREAD, and in the longer
> > term change the x86 FPU code so that the kernel threads can run without
> > without "struct fpu" attached to task_struct.
>
> OK, that's entirely x86 specific - there's no reason we'd want to do
> that for arm64.

Since I know nothing about arm64. Any reason we do want to have the unnecessary
ARCH_SHSTK_SHSTK/shstk on arm64?

And... do you agree that shstk_alloc_thread_stack() without update_fpu_shstk()
in copy_thread() path doesn't look right? Even if nothing really bad can happen.

Oleg.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Mark Brown 1 month, 2 weeks ago

On Fri, Aug 15, 2025 at 06:00:23PM +0200, Oleg Nesterov wrote:
> On 08/15, Mark Brown wrote:

> > OK, that's entirely x86 specific - there's no reason we'd want to do
> > that for arm64.

> Since I know nothing about arm64. Any reason we do want to have the unnecessary
> ARCH_SHSTK_SHSTK/shstk on arm64?

If you mean allocating the userspace shadow stack for threads that never
go to userspace then no, it's not needed.

> And... do you agree that shstk_alloc_thread_stack() without update_fpu_shstk()
> in copy_thread() path doesn't look right? Even if nothing really bad can happen.

Honestly the update_fpu_ stuff is sufficently x86 specific that it
doesn't really register as something I'd expect to see in that path.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Edgecombe, Rick P 1 month, 2 weeks ago

On Thu, 2025-08-14 at 19:33 +0100, Mark Brown wrote:
> > How about just adding the 'minimal' condition to:
> >  	if (clone_flags & CLONE_VFORK) {
> >  		shstk->base = 0;
> >  		shstk->size = 0;
> >  		return 0;
> >  	}
> > ...then update all the comments where vfork is called out as the only case
> > that
> > does this?
> 
> Perhaps we should factor the logic for deciding if we need to allocate a
> userspace shadow stack out into the arch independent code and do
> something like set a flag in kernel_clone_args that the arches can
> check?  I think the logic is the same for all arches at the minute and
> don't see a reason why it'd diverge.

Sounds good. Like a little start to this:
https://lore.kernel.org/lkml/20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com/

>   That'd collide a bit with my
> clone3() series, there's some overlap there with that creating another
> reason why the decision would change.  Reducing the duplication would be
> nice.

Argh, I need to test the latest of that still.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Mark Brown 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 10:43:45PM +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-08-14 at 19:33 +0100, Mark Brown wrote:

> > Perhaps we should factor the logic for deciding if we need to allocate a
> > userspace shadow stack out into the arch independent code and do
> > something like set a flag in kernel_clone_args that the arches can
> > check?  I think the logic is the same for all arches at the minute and
> > don't see a reason why it'd diverge.

> Sounds good. Like a little start to this:
> https://lore.kernel.org/lkml/20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com/

Yes, exactly.

> >   That'd collide a bit with my
> > clone3() series, there's some overlap there with that creating another
> > reason why the decision would change.  Reducing the duplication would be
> > nice.

> Argh, I need to test the latest of that still.

Yury pointed me at some newer x86 systems I was able to get access to so
I was finally able to test that one myself before sending it out,
confirmation would be good but hopefully it's fine.  I've been holding
back on sending a rebased version out since Deepak was going to help me
get set up to test it on RISC-V.  Though I see now that the RISC-V code
has vanished from -next (I guess due to fallout from the issues with the
merge to Linus, it looks like there's almost nothing in the branch
currently), not sure what the plan is there?

Perhaps I should just send it out, but given the difficulty getting
anyone to pay attention I was trying to avoid issues with missing
updates for newly added RISC-V shadow stacks.

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Deepak Gupta 1 month, 2 weeks ago

On Fri, Aug 15, 2025 at 12:44:14PM +0100, Mark Brown wrote:
>On Thu, Aug 14, 2025 at 10:43:45PM +0000, Edgecombe, Rick P wrote:
>> On Thu, 2025-08-14 at 19:33 +0100, Mark Brown wrote:
>
>> > Perhaps we should factor the logic for deciding if we need to allocate a
>> > userspace shadow stack out into the arch independent code and do
>> > something like set a flag in kernel_clone_args that the arches can
>> > check?  I think the logic is the same for all arches at the minute and
>> > don't see a reason why it'd diverge.
>
>> Sounds good. Like a little start to this:
>> https://lore.kernel.org/lkml/20241010-shstk_converge-v1-0-631beca676e7@rivosinc.com/
>
>Yes, exactly.
>
>> >   That'd collide a bit with my
>> > clone3() series, there's some overlap there with that creating another
>> > reason why the decision would change.  Reducing the duplication would be
>> > nice.
>
>> Argh, I need to test the latest of that still.
>
>Yury pointed me at some newer x86 systems I was able to get access to so
>I was finally able to test that one myself before sending it out,
>confirmation would be good but hopefully it's fine.  I've been holding
>back on sending a rebased version out since Deepak was going to help me
>get set up to test it on RISC-V.  Though I see now that the RISC-V code
>has vanished from -next (I guess due to fallout from the issues with the
>merge to Linus, it looks like there's almost nothing in the branch
>currently), not sure what the plan is there?
>
>Perhaps I should just send it out, but given the difficulty getting
>anyone to pay attention I was trying to avoid issues with missing
>updates for newly added RISC-V shadow stacks.

Yes I was trying to get that sorted as well. Because now I'll have to
rebase my changes to 6.17. So I wanted to make sure that it applies
cleanly. I suggest that you send it out because risc-v was left out
anyways. I'll apply your patch series on my risc-v shadow stack changes
(on top of 6.17) and will report back. It might be easier that way.

How does that sound?

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Mark Brown 1 month, 2 weeks ago

On Fri, Aug 15, 2025 at 12:11:47PM -0700, Deepak Gupta wrote:
> On Fri, Aug 15, 2025 at 12:44:14PM +0100, Mark Brown wrote:

> > confirmation would be good but hopefully it's fine.  I've been holding
> > back on sending a rebased version out since Deepak was going to help me
> > get set up to test it on RISC-V.  Though I see now that the RISC-V code
> > has vanished from -next (I guess due to fallout from the issues with the
> > merge to Linus, it looks like there's almost nothing in the branch
> > currently), not sure what the plan is there?

> > Perhaps I should just send it out, but given the difficulty getting
> > anyone to pay attention I was trying to avoid issues with missing
> > updates for newly added RISC-V shadow stacks.

> Yes I was trying to get that sorted as well. Because now I'll have to
> rebase my changes to 6.17. So I wanted to make sure that it applies
> cleanly. I suggest that you send it out because risc-v was left out
> anyways. I'll apply your patch series on my risc-v shadow stack changes
> (on top of 6.17) and will report back. It might be easier that way.

> How does that sound?

Sounds good.

My main concern is that I don't want to end up needlessly holding off
either series due to dependencies/cross tree issues - I remain
(endlessly!) hopeful that the everyone's happy with the clone3() work at
this point and it could get merged, but if RISC-V support is going in
then it should support the new interface too.  Hopefully we can do
something like apply this on a branch and then merge that into the
RISC-V tree?

Re: [PATCH 5/6] x86/shstk: don't create the shadow stack for PF_USER_WORKERs

Posted by Deepak Gupta 1 month, 2 weeks ago

On Mon, Aug 18, 2025 at 06:27:02PM +0100, Mark Brown wrote:
>On Fri, Aug 15, 2025 at 12:11:47PM -0700, Deepak Gupta wrote:
>> On Fri, Aug 15, 2025 at 12:44:14PM +0100, Mark Brown wrote:
>
>> > confirmation would be good but hopefully it's fine.  I've been holding
>> > back on sending a rebased version out since Deepak was going to help me
>> > get set up to test it on RISC-V.  Though I see now that the RISC-V code
>> > has vanished from -next (I guess due to fallout from the issues with the
>> > merge to Linus, it looks like there's almost nothing in the branch
>> > currently), not sure what the plan is there?
>
>> > Perhaps I should just send it out, but given the difficulty getting
>> > anyone to pay attention I was trying to avoid issues with missing
>> > updates for newly added RISC-V shadow stacks.
>
>> Yes I was trying to get that sorted as well. Because now I'll have to
>> rebase my changes to 6.17. So I wanted to make sure that it applies
>> cleanly. I suggest that you send it out because risc-v was left out
>> anyways. I'll apply your patch series on my risc-v shadow stack changes
>> (on top of 6.17) and will report back. It might be easier that way.
>
>> How does that sound?
>
>Sounds good.
>
>My main concern is that I don't want to end up needlessly holding off
>either series due to dependencies/cross tree issues - I remain
>(endlessly!) hopeful that the everyone's happy with the clone3() work at
>this point and it could get merged, but if RISC-V support is going in
>then it should support the new interface too.  Hopefully we can do
>something like apply this on a branch and then merge that into the
>RISC-V tree?

Yes they should make to upstream independent of each other. There are some
changes which I can adopt in my current set of changes (like change
`shstk_alloc_thread_stack` prototype). In your current patchsets
"fork: Add shadow stack support to clone3()" will need riscv support. Other
than that I see that support for `my_syscall2` for riscv arch will be needed.

If clone3 changes make in before riscv shadow stack changes, then its easy
I'll just add riscv specific changes pertaining to clone3 in my current
patchsets. 

If clone3 changes are later than riscv shadow stack changes, then I'll point
you to a branch from where you can pick riscv specific clone3 + shadow stack
changes.

If they both are going in together (may happen for 6.18 window), then we will
have to coordinate on which one applies first. So let's keep an eye on that.

If I push my branch somewhere on github (riscv shadow stack changes and then
clone3 changes on top of them), Is that fine?

-Deepak