[PATCH v4 1/3] randomize_kstack: Maintain kstack_offset per task

Posted by Ryan Roberts 2 weeks, 6 days ago
kstack_offset was previously maintained per-cpu, but this caused a
couple of issues. So let's instead make it per-task.

Issue 1: add_random_kstack_offset() and choose_random_kstack_offset()
expect to be called with interrupts and preemption disabled so that they
can safely manipulate per-cpu state. But arm64, loongarch and riscv call
them with interrupts and preemption enabled. I don't _think_ this causes
any functional issues, but it is certainly unexpected and can lead to
manipulating the wrong cpu's state, which could cause a minor performance
degradation due to bouncing cache lines. By maintaining the state
per-task, those functions can safely be called in preemptible context.
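
Schematically, the window in question with the current per-cpu scheme
(simplified; not any particular arch's entry code):

	u32 offset = raw_cpu_read(kstack_offset);	/* may run on cpu A */
	offset = ror32(offset, 5) ^ (rand);
	/* nothing prevents migration at this point on those arches */
	raw_cpu_write(kstack_offset, offset);		/* may run on cpu B */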

Issue 2: add_random_kstack_offset() is called before executing the
syscall and expands the stack using a previously chosen random offset.
choose_random_kstack_offset() is called after executing the syscall and
chooses and stores a new random offset for the next syscall. With
per-cpu storage for this offset, an attacker could force cpu migration
during the execution of the syscall and prevent the offset from being
updated for the original cpu such that it is predictable for the next
syscall on that cpu. By maintaining the state per-task, this problem
goes away because the per-task random offset is updated after the
syscall regardless of which cpu it is executing on.
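
With the state per-task, the same window looks like this (schematic,
using the names from the hunks below):

	/* syscall entry, on whichever cpu we start on */
	__kstack_alloca(KSTACK_OFFSET_MAX(current->kstack_offset));
	/* ... handle the syscall, possibly migrating ... */
	/* syscall exit, on whichever cpu we end up on */
	current->kstack_offset = ror32(current->kstack_offset, 5) ^ rand;

A forced migration mid-syscall no longer leaves a stale, predictable
offset behind on the entry cpu.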

Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall")
Closes: https://lore.kernel.org/all/dd8c37bc-795f-4c7a-9086-69e584d8ab24@arm.com/
Cc: stable@vger.kernel.org
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 include/linux/randomize_kstack.h | 26 +++++++++++++++-----------
 include/linux/sched.h            |  4 ++++
 init/main.c                      |  1 -
 kernel/fork.c                    |  2 ++
 4 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
index 1d982dbdd0d0..5d3916ca747c 100644
--- a/include/linux/randomize_kstack.h
+++ b/include/linux/randomize_kstack.h
@@ -9,7 +9,6 @@
 
 DECLARE_STATIC_KEY_MAYBE(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
 			 randomize_kstack_offset);
-DECLARE_PER_CPU(u32, kstack_offset);
 
 /*
  * Do not use this anywhere else in the kernel. This is used here because
@@ -50,15 +49,14 @@ DECLARE_PER_CPU(u32, kstack_offset);
  * add_random_kstack_offset - Increase stack utilization by previously
  *			      chosen random offset
  *
- * This should be used in the syscall entry path when interrupts and
- * preempt are disabled, and after user registers have been stored to
- * the stack. For testing the resulting entropy, please see:
- * tools/testing/selftests/lkdtm/stack-entropy.sh
+ * This should be used in the syscall entry path after user registers have been
+ * stored to the stack. Preemption may be enabled. For testing the resulting
+ * entropy, please see: tools/testing/selftests/lkdtm/stack-entropy.sh
  */
 #define add_random_kstack_offset() do {					\
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\
-		u32 offset = raw_cpu_read(kstack_offset);		\
+		u32 offset = current->kstack_offset;			\
 		u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset));	\
 		/* Keep allocation even after "ptr" loses scope. */	\
 		asm volatile("" :: "r"(ptr) : "memory");		\
@@ -69,9 +67,9 @@ DECLARE_PER_CPU(u32, kstack_offset);
  * choose_random_kstack_offset - Choose the random offset for the next
  *				 add_random_kstack_offset()
  *
- * This should only be used during syscall exit when interrupts and
- * preempt are disabled. This position in the syscall flow is done to
- * frustrate attacks from userspace attempting to learn the next offset:
+ * This should only be used during syscall exit. Preemption may be enabled. This
+ * position in the syscall flow is done to frustrate attacks from userspace
+ * attempting to learn the next offset:
  * - Maximize the timing uncertainty visible from userspace: if the
  *   offset is chosen at syscall entry, userspace has much more control
  *   over the timing between choosing offsets. "How long will we be in
@@ -85,14 +83,20 @@ DECLARE_PER_CPU(u32, kstack_offset);
 #define choose_random_kstack_offset(rand) do {				\
 	if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,	\
 				&randomize_kstack_offset)) {		\
-		u32 offset = raw_cpu_read(kstack_offset);		\
+		u32 offset = current->kstack_offset;			\
 		offset = ror32(offset, 5) ^ (rand);			\
-		raw_cpu_write(kstack_offset, offset);			\
+		current->kstack_offset = offset;			\
 	}								\
 } while (0)
+
+static inline void random_kstack_task_init(struct task_struct *tsk)
+{
+	tsk->kstack_offset = 0;
+}
 #else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
 #define add_random_kstack_offset()		do { } while (0)
 #define choose_random_kstack_offset(rand)	do { } while (0)
+#define random_kstack_task_init(tsk)		do { } while (0)
 #endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */
 
 #endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index da0133524d08..23081a702ecf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1591,6 +1591,10 @@ struct task_struct {
 	unsigned long			prev_lowest_stack;
 #endif
 
+#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
+	u32				kstack_offset;
+#endif
+
 #ifdef CONFIG_X86_MCE
 	void __user			*mce_vaddr;
 	__u64				mce_kflags;
diff --git a/init/main.c b/init/main.c
index b84818ad9685..27fcbbde933e 100644
--- a/init/main.c
+++ b/init/main.c
@@ -830,7 +830,6 @@ static inline void initcall_debug_enable(void)
 #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
 DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT,
 			   randomize_kstack_offset);
-DEFINE_PER_CPU(u32, kstack_offset);
 
 static int __init early_randomize_kstack_offset(char *buf)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index b1f3915d5f8e..b061e1edbc43 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -95,6 +95,7 @@
 #include <linux/thread_info.h>
 #include <linux/kstack_erase.h>
 #include <linux/kasan.h>
+#include <linux/randomize_kstack.h>
 #include <linux/scs.h>
 #include <linux/io_uring.h>
 #include <linux/bpf.h>
@@ -2231,6 +2232,7 @@ __latent_entropy struct task_struct *copy_process(
 	if (retval)
 		goto bad_fork_cleanup_io;
 
+	random_kstack_task_init(p);
 	stackleak_task_init(p);
 
 	if (pid != &init_struct_pid) {
-- 
2.43.0
Re: [PATCH v4 1/3] randomize_kstack: Maintain kstack_offset per task
Posted by Dave Hansen 2 weeks, 6 days ago
On 1/19/26 05:01, Ryan Roberts wrote:
...
> Cc: stable@vger.kernel.org

Since this doesn't fix any known functional issues, if it were me, I'd
leave stable@ alone. It isn't clear that this is stable material.

> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1591,6 +1591,10 @@ struct task_struct {
>  	unsigned long			prev_lowest_stack;
>  #endif
>  
> +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
> +	u32				kstack_offset;
> +#endif
> +
>  #ifdef CONFIG_X86_MCE
>  	void __user			*mce_vaddr;

Nit: This seems to be throwing a u32 potentially in between a couple of
void*/ulong sized objects.

It probably doesn't matter with struct randomization and it's really
hard to get right among the web of task_struct #ifdefs. But, it would be
nice to at _least_ nestle this next to another int-sized thing.

Does it really even need to be 32 bits? x86 has this comment:

>         /*
>          * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
>          * bits. The actual entropy will be further reduced by the compiler
>          * when applying stack alignment constraints (see cc_stack_align4/8 in
>          * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
>          * low bits from any entropy chosen here.
>          *
>          * Therefore, final stack offset entropy will be 7 (x86_64) or
>          * 8 (ia32) bits.
>          */
Re: [PATCH v4 1/3] randomize_kstack: Maintain kstack_offset per task
Posted by Ryan Roberts 2 weeks, 6 days ago
Thanks for the review!

On 19/01/2026 16:10, Dave Hansen wrote:
> On 1/19/26 05:01, Ryan Roberts wrote:
> ...
>> Cc: stable@vger.kernel.org
> 
> Since this doesn't fix any known functional issues, if it were me, I'd
> leave stable@ alone. It isn't clear that this is stable material.

I listed 2 issues in the commit log; I agree that issue 1 falls into the
category of "don't really care", but issue 2 means that kstack randomization is
currently trivial to defeat. That's the reason I thought it would be valuable in
stable.

But if you're saying don't bother and others agree, then this whole patch can be
dropped; this is just intended to be the backportable fix. Patch 3 reimplements
this entirely for upstream.

I'll wait and see if others have opinions if that's ok?

> 
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1591,6 +1591,10 @@ struct task_struct {
>>  	unsigned long			prev_lowest_stack;
>>  #endif
>>  
>> +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET
>> +	u32				kstack_offset;
>> +#endif
>> +
>>  #ifdef CONFIG_X86_MCE
>>  	void __user			*mce_vaddr;
> 
> Nit: This seems to be throwing a u32 potentially in between a couple of
> void*/ulong sized objects.

Yeah, I spent a bit of time with pahole but concluded that it was difficult to
find somewhere to nestle it that would pack reliably across arches. In the end
I just decided to group it with the other stack metadata.
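
(For the record, the kind of layout check I mean, assuming a vmlinux built
with debug info:

	$ pahole -C task_struct vmlinux | grep -B2 -A2 kstack_offset

which shows whether the new field leaves a hole for a given config.)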

> 
> It probably doesn't matter with struct randomization and it's really
> hard to get right among the web of task_struct #ifdefs. But, it would be
> nice to at _least_ nestle this next to another int-sized thing.
> 
> Does it really even need to be 32 bits? x86 has this comment:
> 
>>         /*
>>          * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
>>          * bits. The actual entropy will be further reduced by the compiler
>>          * when applying stack alignment constraints (see cc_stack_align4/8 in
>>          * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
>>          * low bits from any entropy chosen here.
>>          *
>>          * Therefore, final stack offset entropy will be 7 (x86_64) or
>>          * 8 (ia32) bits.
>>          */

For more recent kernels it's 6 bits shifted by 4 for 64-bit kernels, or 8 bits
shifted by 2 for 32-bit kernels, regardless of arch. So we could probably make
it work with 8 bits of storage, although I was deliberately trying to keep the
change simple since it's intended for backporting. Patch 3 rips it out.
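
Roughly, the masking being described (paraphrasing, not quoting, the current
KSTACK_OFFSET_MAX() behaviour):

	/* 64-bit: 6 bits of entropy at 16-byte granularity -> 0..1008 */
	offset &= 0x3f << 4;
	/* 32-bit: 8 bits of entropy at  4-byte granularity -> 0..1020 */
	offset &= 0xff << 2;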

Overall I'd prefer to leave it all as is. But if people don't think we should
backport, then let's just drop the whole patch.

Thanks,
Ryan
Re: [PATCH v4 1/3] randomize_kstack: Maintain kstack_offset per task
Posted by Dave Hansen 2 weeks, 6 days ago
On 1/19/26 08:51, Ryan Roberts wrote:
>> Since this doesn't fix any known functional issues, if it were me, I'd
>> leave stable@ alone. It isn't clear that this is stable material.
> I listed 2 issues in the commit log; I agree that issue 1 falls into the
> category of "don't really care", but issue 2 means that kstack randomization is
> currently trivial to defeat. That's the reason I thought it would be valuable in
> stable.
> 
> But if you're saying don't bother and others agree, then this whole patch can be
> dropped; this is just intended to be the backportable fix. Patch 3 reimplements
> this entirely for upstream.
> 
> I'll wait and see if others have opinions if that's ok?

Sure. I don't feel that strongly about it.