riscv: uaccess: optimisations

[PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Cyril Bur 10 months ago

From: Ben Dooks <ben.dooks@codethink.co.uk>

When threads/tasks are switched we need to ensure the old execution's
SR_SUM state is saved and the new thread has the old SR_SUM state
restored.

The issue was seen under heavy load especially with the syz-stress tool
running, with crashes as follows in schedule_tail:

Unable to handle kernel access to user memory without uaccess routines
at virtual address 000000002749f0d0
Oops [#1]
Modules linked in:
CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
Hardware name: riscv-virtio,qemu (DT)
epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
 ra : task_pid_vnr include/linux/sched.h:1421 [inline]
 ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
 gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
 t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
 s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
 a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
 a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
 s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
 s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
 s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
 s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
 t5 : ffffffc4043cafba t6 : 0000000000040000
status: 0000000000000120 badaddr: 000000002749f0d0 cause:
000000000000000f
Call Trace:
[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
[<ffffffe000005570>] ret_from_exception+0x0/0x14
Dumping ftrace buffer:
   (ftrace buffer empty)
---[ end trace b5f8f9231dc87dda ]---

The issue comes from the put_user() in schedule_tail
(kernel/sched/core.c) doing the following:

asmlinkage __visible void schedule_tail(struct task_struct *prev)
{
...
        if (current->set_child_tid)
                put_user(task_pid_vnr(current), current->set_child_tid);
...
}

the put_user() macro causes the code sequence to come out as follows:

1:	__enable_user_access()
2:	reg = task_pid_vnr(current);
3:	*current->set_child_tid = reg;
4:	__disable_user_access()

The problem is that we may have a sleeping function as argument which
could clear SR_SUM causing the panic above. This was fixed by
evaluating the argument of the put_user() macro outside the user-enabled
section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
enabling user access")"

In order for riscv to take advantage of unsafe_get/put_XXX() macros and
to avoid the same issue we had with put_user() and sleeping functions we
must ensure code flow can go through switch_to() from within a region of
code with SR_SUM enabled and come back with SR_SUM still enabled. This
patch addresses the problem allowing future work to enable full use of
unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
on every access. Make switch_to() save and restore SR_SUM.

Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
---
 arch/riscv/include/asm/processor.h | 1 +
 arch/riscv/kernel/asm-offsets.c    | 5 +++++
 arch/riscv/kernel/entry.S          | 8 ++++++++
 3 files changed, 14 insertions(+)

diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 5f56eb9d114a..58fd11c89fe9 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -103,6 +103,7 @@ struct thread_struct {
 	struct __riscv_d_ext_state fstate;
 	unsigned long bad_cause;
 	unsigned long envcfg;
+	unsigned long status;
 	u32 riscv_v_flags;
 	u32 vstate_ctrl;
 	struct __riscv_v_ext_state vstate;
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index 16490755304e..969c65b1fe41 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -34,6 +34,7 @@ void asm_offsets(void)
 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
+	OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
 
 	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
@@ -346,6 +347,10 @@ void asm_offsets(void)
 		  offsetof(struct task_struct, thread.s[11])
 		- offsetof(struct task_struct, thread.ra)
 	);
+	DEFINE(TASK_THREAD_STATUS_RA,
+		  offsetof(struct task_struct, thread.status)
+		- offsetof(struct task_struct, thread.ra)
+	);
 
 	DEFINE(TASK_THREAD_F0_F0,
 		  offsetof(struct task_struct, thread.fstate.f[0])
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 33a5a9f2a0d4..00bd0de9faa2 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
 	REG_S s9,  TASK_THREAD_S9_RA(a3)
 	REG_S s10, TASK_THREAD_S10_RA(a3)
 	REG_S s11, TASK_THREAD_S11_RA(a3)
+
+	/* save the user space access flag */
+	li    s0, SR_SUM
+	csrr  s1, CSR_STATUS
+	REG_S s1, TASK_THREAD_STATUS_RA(a3)
+
 	/* Save the kernel shadow call stack pointer */
 	scs_save_current
 	/* Restore context from next->thread */
+	REG_L s0,  TASK_THREAD_STATUS_RA(a4)
+	csrs  CSR_STATUS, s0
 	REG_L ra,  TASK_THREAD_RA_RA(a4)
 	REG_L sp,  TASK_THREAD_SP_RA(a4)
 	REG_L s0,  TASK_THREAD_S0_RA(a4)
-- 
2.34.1

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 9 months, 3 weeks ago

On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>From: Ben Dooks <ben.dooks@codethink.co.uk>
>
>When threads/tasks are switched we need to ensure the old execution's
>SR_SUM state is saved and the new thread has the old SR_SUM state
>restored.
>
>The issue was seen under heavy load especially with the syz-stress tool
>running, with crashes as follows in schedule_tail:
>
>Unable to handle kernel access to user memory without uaccess routines
>at virtual address 000000002749f0d0
>Oops [#1]
>Modules linked in:
>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>Hardware name: riscv-virtio,qemu (DT)
>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> t5 : ffffffc4043cafba t6 : 0000000000040000
>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>000000000000000f
>Call Trace:
>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>Dumping ftrace buffer:
>   (ftrace buffer empty)
>---[ end trace b5f8f9231dc87dda ]---
>
>The issue comes from the put_user() in schedule_tail
>(kernel/sched/core.c) doing the following:
>
>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>{
>...
>        if (current->set_child_tid)
>                put_user(task_pid_vnr(current), current->set_child_tid);
>...
>}
>
>the put_user() macro causes the code sequence to come out as follows:
>
>1:	__enable_user_access()
>2:	reg = task_pid_vnr(current);
>3:	*current->set_child_tid = reg;
>4:	__disable_user_access()
>
>The problem is that we may have a sleeping function as argument which
>could clear SR_SUM causing the panic above. This was fixed by
>evaluating the argument of the put_user() macro outside the user-enabled
>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>enabling user access")"
>
>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>to avoid the same issue we had with put_user() and sleeping functions we
>must ensure code flow can go through switch_to() from within a region of
>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>patch addresses the problem allowing future work to enable full use of
>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>on every access. Make switch_to() save and restore SR_SUM.
>
>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>---
> arch/riscv/include/asm/processor.h | 1 +
> arch/riscv/kernel/asm-offsets.c    | 5 +++++
> arch/riscv/kernel/entry.S          | 8 ++++++++
> 3 files changed, 14 insertions(+)
>
>diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>index 5f56eb9d114a..58fd11c89fe9 100644
>--- a/arch/riscv/include/asm/processor.h
>+++ b/arch/riscv/include/asm/processor.h
>@@ -103,6 +103,7 @@ struct thread_struct {
> 	struct __riscv_d_ext_state fstate;
> 	unsigned long bad_cause;
> 	unsigned long envcfg;
>+	unsigned long status;
> 	u32 riscv_v_flags;
> 	u32 vstate_ctrl;
> 	struct __riscv_v_ext_state vstate;
>diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>index 16490755304e..969c65b1fe41 100644
>--- a/arch/riscv/kernel/asm-offsets.c
>+++ b/arch/riscv/kernel/asm-offsets.c
>@@ -34,6 +34,7 @@ void asm_offsets(void)
> 	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> 	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> 	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>+	OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>
> 	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> 	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>@@ -346,6 +347,10 @@ void asm_offsets(void)
> 		  offsetof(struct task_struct, thread.s[11])
> 		- offsetof(struct task_struct, thread.ra)
> 	);
>+	DEFINE(TASK_THREAD_STATUS_RA,
>+		  offsetof(struct task_struct, thread.status)
>+		- offsetof(struct task_struct, thread.ra)
>+	);
>
> 	DEFINE(TASK_THREAD_F0_F0,
> 		  offsetof(struct task_struct, thread.fstate.f[0])
>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>index 33a5a9f2a0d4..00bd0de9faa2 100644
>--- a/arch/riscv/kernel/entry.S
>+++ b/arch/riscv/kernel/entry.S
>@@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
> 	REG_S s9,  TASK_THREAD_S9_RA(a3)
> 	REG_S s10, TASK_THREAD_S10_RA(a3)
> 	REG_S s11, TASK_THREAD_S11_RA(a3)
>+
>+	/* save the user space access flag */
>+	li    s0, SR_SUM
>+	csrr  s1, CSR_STATUS
>+	REG_S s1, TASK_THREAD_STATUS_RA(a3)
>+
> 	/* Save the kernel shadow call stack pointer */
> 	scs_save_current
> 	/* Restore context from next->thread */
>+	REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>+	csrs  CSR_STATUS, s0
> 	REG_L ra,  TASK_THREAD_RA_RA(a4)
> 	REG_L sp,  TASK_THREAD_SP_RA(a4)
> 	REG_L s0,  TASK_THREAD_S0_RA(a4)

Reviewed-by: Deepak Gupta <debug@rivosinc.com>

Note to alex ghiti,

If this goes in before cfi changes, I might have to re-work some of the
changes with respect to zicfilp handling. zicfilp introduces `elp` state
in `sstatus`.

>-- 
>2.34.1
>
>

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 8 months, 3 weeks ago

I did give this patch my RB and had planned to come back to it to see
if it impacts cfi related patches. Thanks to alex for brinigng to my
attention again. As it stands today, it doesn't impact cfi related
changes but I've some concerns.

Overall I do agree we should reduce number of SSTATUS accesses.

Couple of questions on introducing new `sstatus` field (inline)

On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>>When threads/tasks are switched we need to ensure the old execution's
>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>restored.
>>
>>The issue was seen under heavy load especially with the syz-stress tool
>>running, with crashes as follows in schedule_tail:
>>
>>Unable to handle kernel access to user memory without uaccess routines
>>at virtual address 000000002749f0d0
>>Oops [#1]
>>Modules linked in:
>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>Hardware name: riscv-virtio,qemu (DT)
>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>000000000000000f
>>Call Trace:
>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>Dumping ftrace buffer:
>>  (ftrace buffer empty)
>>---[ end trace b5f8f9231dc87dda ]---
>>
>>The issue comes from the put_user() in schedule_tail
>>(kernel/sched/core.c) doing the following:
>>
>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>{
>>...
>>       if (current->set_child_tid)
>>               put_user(task_pid_vnr(current), current->set_child_tid);
>>...
>>}
>>
>>the put_user() macro causes the code sequence to come out as follows:
>>
>>1:	__enable_user_access()
>>2:	reg = task_pid_vnr(current);
>>3:	*current->set_child_tid = reg;
>>4:	__disable_user_access()
>>
>>The problem is that we may have a sleeping function as argument which
>>could clear SR_SUM causing the panic above. This was fixed by
>>evaluating the argument of the put_user() macro outside the user-enabled
>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>enabling user access")"
>>
>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>to avoid the same issue we had with put_user() and sleeping functions we
>>must ensure code flow can go through switch_to() from within a region of
>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>patch addresses the problem allowing future work to enable full use of
>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>on every access. Make switch_to() save and restore SR_SUM.
>>
>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>---
>>arch/riscv/include/asm/processor.h | 1 +
>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>3 files changed, 14 insertions(+)
>>
>>diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
>>index 5f56eb9d114a..58fd11c89fe9 100644
>>--- a/arch/riscv/include/asm/processor.h
>>+++ b/arch/riscv/include/asm/processor.h
>>@@ -103,6 +103,7 @@ struct thread_struct {
>>	struct __riscv_d_ext_state fstate;
>>	unsigned long bad_cause;
>>	unsigned long envcfg;
>>+	unsigned long status;

Do we really need a new member field in `thread_struct`. We already have
`sstatus` in `pt_regs` which reflects overall execution environment situation
for current thread. This gets saved and restored on trap entry and exit.

If we put `status` in `thread_struct` it creates ambiguity in terms of which
`status` to save to and pick from from future maintainibility purposes as the
fields get introduced to this CSR.

Why can't we access current trap frame's `sstatus` image in `__switch_to` to
save and restore?

Let me know if I am missing something obvious here. If there is a complication,
I am missing here and we do end up using this member field, I would rename it
to something like `status_kernel` to reflect that. So that future changes are
cognizant of the fact that we have split `status`. One for kernel execution env
per thread and one for controlling user execution env per thread.


>>	u32 riscv_v_flags;
>>	u32 vstate_ctrl;
>>	struct __riscv_v_ext_state vstate;
>>diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>index 16490755304e..969c65b1fe41 100644
>>--- a/arch/riscv/kernel/asm-offsets.c
>>+++ b/arch/riscv/kernel/asm-offsets.c
>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Ben Dooks 8 months, 3 weeks ago

On 20/05/2025 17:49, Deepak Gupta wrote:
> I did give this patch my RB and had planned to come back to it to see
> if it impacts cfi related patches. Thanks to alex for brinigng to my
> attention again. As it stands today, it doesn't impact cfi related
> changes but I've some concerns.
> 
> Overall I do agree we should reduce number of SSTATUS accesses.
> 
> Couple of questions on introducing new `sstatus` field (inline)
> 
> On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>
>>> When threads/tasks are switched we need to ensure the old execution's
>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>> restored.
>>>
>>> The issue was seen under heavy load especially with the syz-stress tool
>>> running, with crashes as follows in schedule_tail:
>>>
>>> Unable to handle kernel access to user memory without uaccess routines
>>> at virtual address 000000002749f0d0
>>> Oops [#1]
>>> Modules linked in:
>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>> Hardware name: riscv-virtio,qemu (DT)
>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>> t5 : ffffffc4043cafba t6 : 0000000000040000
>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>> 000000000000000f
>>> Call Trace:
>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>> Dumping ftrace buffer:
>>>  (ftrace buffer empty)
>>> ---[ end trace b5f8f9231dc87dda ]---
>>>
>>> The issue comes from the put_user() in schedule_tail
>>> (kernel/sched/core.c) doing the following:
>>>
>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>> {
>>> ...
>>>       if (current->set_child_tid)
>>>               put_user(task_pid_vnr(current), current->set_child_tid);
>>> ...
>>> }
>>>
>>> the put_user() macro causes the code sequence to come out as follows:
>>>
>>> 1:    __enable_user_access()
>>> 2:    reg = task_pid_vnr(current);
>>> 3:    *current->set_child_tid = reg;
>>> 4:    __disable_user_access()
>>>
>>> The problem is that we may have a sleeping function as argument which
>>> could clear SR_SUM causing the panic above. This was fixed by
>>> evaluating the argument of the put_user() macro outside the user-enabled
>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>> enabling user access")"
>>>
>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>> to avoid the same issue we had with put_user() and sleeping functions we
>>> must ensure code flow can go through switch_to() from within a region of
>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>> patch addresses the problem allowing future work to enable full use of
>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>> on every access. Make switch_to() save and restore SR_SUM.
>>>
>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>> ---
>>> arch/riscv/include/asm/processor.h | 1 +
>>> arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>> arch/riscv/kernel/entry.S          | 8 ++++++++
>>> 3 files changed, 14 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ 
>>> asm/processor.h
>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>> --- a/arch/riscv/include/asm/processor.h
>>> +++ b/arch/riscv/include/asm/processor.h
>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>     struct __riscv_d_ext_state fstate;
>>>     unsigned long bad_cause;
>>>     unsigned long envcfg;
>>> +    unsigned long status;
> 
> Do we really need a new member field in `thread_struct`. We already have
> `sstatus` in `pt_regs` which reflects overall execution environment 
> situation
> for current thread. This gets saved and restored on trap entry and exit.
> 
> If we put `status` in `thread_struct` it creates ambiguity in terms of 
> which
> `status` to save to and pick from from future maintainibility purposes 
> as the
> fields get introduced to this CSR.
> 
> Why can't we access current trap frame's `sstatus` image in 
> `__switch_to` to
> save and restore?
> 
> Let me know if I am missing something obvious here. If there is a 
> complication,
> I am missing here and we do end up using this member field, I would 
> rename it
> to something like `status_kernel` to reflect that. So that future 
> changes are
> cognizant of the fact that we have split `status`. One for kernel 
> execution env
> per thread and one for controlling user execution env per thread.

This is so long ago now I cannot remember if there was any sstatus in
the pt_regs field, and if kernel threads have the same context as their
userland parts.

Does anyone else have any comment on this?

> 
>>>     u32 riscv_v_flags;
>>>     u32 vstate_ctrl;
>>>     struct __riscv_v_ext_state vstate;
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- 
>>> offsets.c
>>> index 16490755304e..969c65b1fe41 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>     OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>     OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>     OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 8 months, 3 weeks ago

On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>On 20/05/2025 17:49, Deepak Gupta wrote:
>>I did give this patch my RB and had planned to come back to it to see
>>if it impacts cfi related patches. Thanks to alex for brinigng to my
>>attention again. As it stands today, it doesn't impact cfi related
>>changes but I've some concerns.
>>
>>Overall I do agree we should reduce number of SSTATUS accesses.
>>
>>Couple of questions on introducing new `sstatus` field (inline)
>>
>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>
>>>>When threads/tasks are switched we need to ensure the old execution's
>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>restored.
>>>>
>>>>The issue was seen under heavy load especially with the syz-stress tool
>>>>running, with crashes as follows in schedule_tail:
>>>>
>>>>Unable to handle kernel access to user memory without uaccess routines
>>>>at virtual address 000000002749f0d0
>>>>Oops [#1]
>>>>Modules linked in:
>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>000000000000000f
>>>>Call Trace:
>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>Dumping ftrace buffer:
>>>> (ftrace buffer empty)
>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>
>>>>The issue comes from the put_user() in schedule_tail
>>>>(kernel/sched/core.c) doing the following:
>>>>
>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>{
>>>>...
>>>>      if (current->set_child_tid)
>>>>              put_user(task_pid_vnr(current), current->set_child_tid);
>>>>...
>>>>}
>>>>
>>>>the put_user() macro causes the code sequence to come out as follows:
>>>>
>>>>1:    __enable_user_access()
>>>>2:    reg = task_pid_vnr(current);
>>>>3:    *current->set_child_tid = reg;
>>>>4:    __disable_user_access()
>>>>
>>>>The problem is that we may have a sleeping function as argument which
>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>evaluating the argument of the put_user() macro outside the user-enabled
>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>enabling user access")"
>>>>
>>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>>to avoid the same issue we had with put_user() and sleeping functions we
>>>>must ensure code flow can go through switch_to() from within a region of
>>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>patch addresses the problem allowing future work to enable full use of
>>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>
>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>---
>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>3 files changed, 14 insertions(+)
>>>>
>>>>diff --git a/arch/riscv/include/asm/processor.h 
>>>>b/arch/riscv/include/ asm/processor.h
>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>--- a/arch/riscv/include/asm/processor.h
>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>    struct __riscv_d_ext_state fstate;
>>>>    unsigned long bad_cause;
>>>>    unsigned long envcfg;
>>>>+    unsigned long status;
>>
>>Do we really need a new member field in `thread_struct`. We already have
>>`sstatus` in `pt_regs` which reflects overall execution environment 
>>situation
>>for current thread. This gets saved and restored on trap entry and exit.
>>
>>If we put `status` in `thread_struct` it creates ambiguity in terms 
>>of which
>>`status` to save to and pick from from future maintainibility 
>>purposes as the
>>fields get introduced to this CSR.
>>
>>Why can't we access current trap frame's `sstatus` image in 
>>`__switch_to` to
>>save and restore?
>>
>>Let me know if I am missing something obvious here. If there is a 
>>complication,
>>I am missing here and we do end up using this member field, I would 
>>rename it
>>to something like `status_kernel` to reflect that. So that future 
>>changes are
>>cognizant of the fact that we have split `status`. One for kernel 
>>execution env
>>per thread and one for controlling user execution env per thread.
>
>This is so long ago now I cannot remember if there was any sstatus in
>the pt_regs field, 

FS/VS bits encode status of floating point and vector on per-thread basis.
So `status` has been part of `pt_regs` for quite a while. 

> and if kernel threads have the same context as their
>userland parts.

I didn't mean kernel thread. What I meant was kernel execution environment
per-thread. A userland thread does spend sometime in kernel and kernel does
things on its behalf. One of those thing is touching user memory and that
requires mucking with this CSR. So what I meant was are we splitting `status`
on per-thread basis for their time spent in user and kernel.

Getting back to original question--
As I said, each thread spends sometime in user or in kernel. `status` in
`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
`status` field in `pt_regs` is reflecting execution status of the thread on per
trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
for today, certainly for future) of which `status` to pick from when we are
doing save/restore.

So my first question was why not to use `status` in `pt_regs`. It is granular
as it can get (it is available per thread context per trap basis). 


I did ask Alex as well. I'll ping him again.

>
>Does anyone else have any comment on this?
>
>>
>>>>    u32 riscv_v_flags;
>>>>    u32 vstate_ctrl;
>>>>    struct __riscv_v_ext_state vstate;
>>>>diff --git a/arch/riscv/kernel/asm-offsets.c 
>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>index 16490755304e..969c65b1fe41 100644
>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>
>
>-- 
>Ben Dooks				http://www.codethink.co.uk/
>Senior Engineer				Codethink - Providing Genius
>
>https://www.codethink.co.uk/privacy.html

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Andy Chiu 8 months, 3 weeks ago

On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
> >On 20/05/2025 17:49, Deepak Gupta wrote:
> >>I did give this patch my RB and had planned to come back to it to see
> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
> >>attention again. As it stands today, it doesn't impact cfi related
> >>changes but I've some concerns.
> >>
> >>Overall I do agree we should reduce number of SSTATUS accesses.
> >>
> >>Couple of questions on introducing new `sstatus` field (inline)
> >>
> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
> >>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>
> >>>>When threads/tasks are switched we need to ensure the old execution's
> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>>restored.
> >>>>
> >>>>The issue was seen under heavy load especially with the syz-stress tool
> >>>>running, with crashes as follows in schedule_tail:
> >>>>
> >>>>Unable to handle kernel access to user memory without uaccess routines
> >>>>at virtual address 000000002749f0d0
> >>>>Oops [#1]
> >>>>Modules linked in:
> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>>Hardware name: riscv-virtio,qemu (DT)
> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>>000000000000000f
> >>>>Call Trace:
> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>>Dumping ftrace buffer:
> >>>> (ftrace buffer empty)
> >>>>---[ end trace b5f8f9231dc87dda ]---
> >>>>
> >>>>The issue comes from the put_user() in schedule_tail
> >>>>(kernel/sched/core.c) doing the following:
> >>>>
> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>>{
> >>>>...
> >>>>      if (current->set_child_tid)
> >>>>              put_user(task_pid_vnr(current), current->set_child_tid);
> >>>>...
> >>>>}
> >>>>
> >>>>the put_user() macro causes the code sequence to come out as follows:
> >>>>
> >>>>1:    __enable_user_access()
> >>>>2:    reg = task_pid_vnr(current);
> >>>>3:    *current->set_child_tid = reg;
> >>>>4:    __disable_user_access()
> >>>>
> >>>>The problem is that we may have a sleeping function as argument which
> >>>>could clear SR_SUM causing the panic above. This was fixed by
> >>>>evaluating the argument of the put_user() macro outside the user-enabled
> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
> >>>>enabling user access")"
> >>>>
> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> >>>>to avoid the same issue we had with put_user() and sleeping functions we
> >>>>must ensure code flow can go through switch_to() from within a region of
> >>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
> >>>>patch addresses the problem allowing future work to enable full use of
> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> >>>>on every access. Make switch_to() save and restore SR_SUM.
> >>>>
> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> >>>>---
> >>>>arch/riscv/include/asm/processor.h | 1 +
> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>3 files changed, 14 insertions(+)
> >>>>
> >>>>diff --git a/arch/riscv/include/asm/processor.h
> >>>>b/arch/riscv/include/ asm/processor.h
> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
> >>>>--- a/arch/riscv/include/asm/processor.h
> >>>>+++ b/arch/riscv/include/asm/processor.h
> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
> >>>>    struct __riscv_d_ext_state fstate;
> >>>>    unsigned long bad_cause;
> >>>>    unsigned long envcfg;
> >>>>+    unsigned long status;
> >>
> >>Do we really need a new member field in `thread_struct`. We already have
> >>`sstatus` in `pt_regs` which reflects overall execution environment
> >>situation
> >>for current thread. This gets saved and restored on trap entry and exit.
> >>
> >>If we put `status` in `thread_struct` it creates ambiguity in terms
> >>of which
> >>`status` to save to and pick from from future maintainibility
> >>purposes as the
> >>fields get introduced to this CSR.
> >>
> >>Why can't we access current trap frame's `sstatus` image in
> >>`__switch_to` to
> >>save and restore?
> >>
> >>Let me know if I am missing something obvious here. If there is a
> >>complication,
> >>I am missing here and we do end up using this member field, I would
> >>rename it
> >>to something like `status_kernel` to reflect that. So that future
> >>changes are
> >>cognizant of the fact that we have split `status`. One for kernel
> >>execution env
> >>per thread and one for controlling user execution env per thread.
> >
> >This is so long ago now I cannot remember if there was any sstatus in
> >the pt_regs field,
>
> FS/VS bits encode status of floating point and vector on per-thread basis.
> So `status` has been part of `pt_regs` for quite a while.
>
> > and if kernel threads have the same context as their
> >userland parts.
>
> I didn't mean kernel thread. What I meant was kernel execution environment
> per-thread. A userland thread does spend sometime in kernel and kernel does
> things on its behalf. One of those thing is touching user memory and that
> requires mucking with this CSR. So what I meant was are we splitting `status`
> on per-thread basis for their time spent in user and kernel.
>
> Getting back to original question--
> As I said, each thread spends sometime in user or in kernel. `status` in
> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
> `status` field in `pt_regs` is reflecting execution status of the thread on per
> trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
> for today, certainly for future) of which `status` to pick from when we are
> doing save/restore.

I agree that it's a confusion. sstatus is already saved on pt_regs on
trap entries/return, adding another entry adds code complexity and
makes data inconsistent. But, perhaps we'd eventually need something
like this (I will explain why). Still, there might be a better
approach.

Yes, we can always reflect pt_regs for sstatus. We all know that
pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
point refers to "user's" pt_regs whenever it first enters kernel mode. Here
are reasons why SR_SUM here may or may not be properly tracked. First,
if this is a trap introduced context switch (such as interrupting in a
preemptible context after we manually enable user access in put_user),
then SR_SUM is saved somewhere in the kernel stack, and is not
reference-able with task_pt_reg during context switch. But we are safe
because the trap exit asm would help us restore the correct SR_SUM
back. However, if this is a self-initiating context switch (calling
into schedule()), then SR_SUM is not saved anywhere, and possibly
causing this error.

Preemptible Vector in the kernel mode also had this problem where a
self-initiating context switch loses the track of sstatus.vs. The way
I managed it is to track the VS bit at context switch time. However,
this bug shows that people are repeatedly facing the problem, and
maybe it suggests that we'd need a better way of managing sstatus
across context switches. Given the complex nature of this register,
which also touches the interrupt enable status, I don't think naively
saving/restoring the entire register is the way to go. Maybe the
variable deserves a more specific naming and documentation. And if
we'd need a centralized place for managing these statuses, then it
also has to take care of sstatus.VS.

Thanks,
Andy




>
> So my first question was why not to use `status` in `pt_regs`. It is granular
> as it can get (it is available per thread context per trap basis).
>
>
> I did ask Alex as well. I'll ping him again.
>
> >
> >Does anyone else have any comment on this?
> >
> >>
> >>>>    u32 riscv_v_flags;
> >>>>    u32 vstate_ctrl;
> >>>>    struct __riscv_v_ext_state vstate;
> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
> >>>>b/arch/riscv/kernel/asm- offsets.c
> >>>>index 16490755304e..969c65b1fe41 100644
> >>>>--- a/arch/riscv/kernel/asm-offsets.c
> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>
> >>_______________________________________________
> >>linux-riscv mailing list
> >>linux-riscv@lists.infradead.org
> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>
> >
> >
> >--
> >Ben Dooks                              http://www.codethink.co.uk/
> >Senior Engineer                                Codethink - Providing Genius
> >
> >https://www.codethink.co.uk/privacy.html
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 8 months, 3 weeks ago

On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> wrote:
>>
>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>> >On 20/05/2025 17:49, Deepak Gupta wrote:
>> >>I did give this patch my RB and had planned to come back to it to see
>> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
>> >>attention again. As it stands today, it doesn't impact cfi related
>> >>changes but I've some concerns.
>> >>
>> >>Overall I do agree we should reduce number of SSTATUS accesses.
>> >>
>> >>Couple of questions on introducing new `sstatus` field (inline)
>> >>
>> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> >>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>
>> >>>>When threads/tasks are switched we need to ensure the old execution's
>> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>> >>>>restored.
>> >>>>
>> >>>>The issue was seen under heavy load especially with the syz-stress tool
>> >>>>running, with crashes as follows in schedule_tail:
>> >>>>
>> >>>>Unable to handle kernel access to user memory without uaccess routines
>> >>>>at virtual address 000000002749f0d0
>> >>>>Oops [#1]
>> >>>>Modules linked in:
>> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> >>>>Hardware name: riscv-virtio,qemu (DT)
>> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> >>>>000000000000000f
>> >>>>Call Trace:
>> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>> >>>>Dumping ftrace buffer:
>> >>>> (ftrace buffer empty)
>> >>>>---[ end trace b5f8f9231dc87dda ]---
>> >>>>
>> >>>>The issue comes from the put_user() in schedule_tail
>> >>>>(kernel/sched/core.c) doing the following:
>> >>>>
>> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> >>>>{
>> >>>>...
>> >>>>      if (current->set_child_tid)
>> >>>>              put_user(task_pid_vnr(current), current->set_child_tid);
>> >>>>...
>> >>>>}
>> >>>>
>> >>>>the put_user() macro causes the code sequence to come out as follows:
>> >>>>
>> >>>>1:    __enable_user_access()
>> >>>>2:    reg = task_pid_vnr(current);
>> >>>>3:    *current->set_child_tid = reg;
>> >>>>4:    __disable_user_access()
>> >>>>
>> >>>>The problem is that we may have a sleeping function as argument which
>> >>>>could clear SR_SUM causing the panic above. This was fixed by
>> >>>>evaluating the argument of the put_user() macro outside the user-enabled
>> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> >>>>enabling user access")"
>> >>>>
>> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> >>>>to avoid the same issue we had with put_user() and sleeping functions we
>> >>>>must ensure code flow can go through switch_to() from within a region of
>> >>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> >>>>patch addresses the problem allowing future work to enable full use of
>> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> >>>>on every access. Make switch_to() save and restore SR_SUM.
>> >>>>
>> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> >>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> >>>>---
>> >>>>arch/riscv/include/asm/processor.h | 1 +
>> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>> >>>>3 files changed, 14 insertions(+)
>> >>>>
>> >>>>diff --git a/arch/riscv/include/asm/processor.h
>> >>>>b/arch/riscv/include/ asm/processor.h
>> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
>> >>>>--- a/arch/riscv/include/asm/processor.h
>> >>>>+++ b/arch/riscv/include/asm/processor.h
>> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
>> >>>>    struct __riscv_d_ext_state fstate;
>> >>>>    unsigned long bad_cause;
>> >>>>    unsigned long envcfg;
>> >>>>+    unsigned long status;
>> >>
>> >>Do we really need a new member field in `thread_struct`. We already have
>> >>`sstatus` in `pt_regs` which reflects overall execution environment
>> >>situation
>> >>for current thread. This gets saved and restored on trap entry and exit.
>> >>
>> >>If we put `status` in `thread_struct` it creates ambiguity in terms
>> >>of which
>> >>`status` to save to and pick from from future maintainibility
>> >>purposes as the
>> >>fields get introduced to this CSR.
>> >>
>> >>Why can't we access current trap frame's `sstatus` image in
>> >>`__switch_to` to
>> >>save and restore?
>> >>
>> >>Let me know if I am missing something obvious here. If there is a
>> >>complication,
>> >>I am missing here and we do end up using this member field, I would
>> >>rename it
>> >>to something like `status_kernel` to reflect that. So that future
>> >>changes are
>> >>cognizant of the fact that we have split `status`. One for kernel
>> >>execution env
>> >>per thread and one for controlling user execution env per thread.
>> >
>> >This is so long ago now I cannot remember if there was any sstatus in
>> >the pt_regs field,
>>
>> FS/VS bits encode status of floating point and vector on per-thread basis.
>> So `status` has been part of `pt_regs` for quite a while.
>>
>> > and if kernel threads have the same context as their
>> >userland parts.
>>
>> I didn't mean kernel thread. What I meant was kernel execution environment
>> per-thread. A userland thread does spend sometime in kernel and kernel does
>> things on its behalf. One of those thing is touching user memory and that
>> requires mucking with this CSR. So what I meant was are we splitting `status`
>> on per-thread basis for their time spent in user and kernel.
>>
>> Getting back to original question--
>> As I said, each thread spends sometime in user or in kernel. `status` in
>> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>> `status` field in `pt_regs` is reflecting execution status of the thread on per
>> trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
>> for today, certainly for future) of which `status` to pick from when we are
>> doing save/restore.
>
>I agree that it's a confusion. sstatus is already saved on pt_regs on
>trap entries/return, adding another entry adds code complexity and
>makes data inconsistent. But, perhaps we'd eventually need something
>like this (I will explain why). Still, there might be a better
>approach.
>
>Yes, we can always reflect pt_regs for sstatus. We all know that
>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>point refers to "user's" pt_regs whenever it first enters kernel mode. Here
>are reasons why SR_SUM here may or may not be properly tracked. First,
>if this is a trap introduced context switch (such as interrupting in a
>preemptible context after we manually enable user access in put_user),
>then SR_SUM is saved somewhere in the kernel stack, and is not
>reference-able with task_pt_reg during context switch. But we are safe
>because the trap exit asm would help us restore the correct SR_SUM
>back. However, if this is a self-initiating context switch (calling
>into schedule()), then SR_SUM is not saved anywhere, and possibly
>causing this error.
>
>Preemptible Vector in the kernel mode also had this problem where a
>self-initiating context switch loses the track of sstatus.vs. The way
>I managed it is to track the VS bit at context switch time. However,
>this bug shows that people are repeatedly facing the problem, and
>maybe it suggests that we'd need a better way of managing sstatus
>across context switches. Given the complex nature of this register,
>which also touches the interrupt enable status, I don't think naively
>saving/restoring the entire register is the way to go. Maybe the
>variable deserves a more specific naming and documentation. And if
>we'd need a centralized place for managing these statuses, then it
>also has to take care of sstatus.VS.


IMHO, the problem we are trying to solve in this patch is easily solvable in
below manner.


diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..499d00a6fb67 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct task_struct *prev,
         fstate_restore(next, task_pt_regs(next));
  }
  
+static inline void __switch_to_status(struct task_struct *prev,
+                                  struct task_struct *next)
+{
+       struct pt_regs *regs;
+
+       /* save status */
+       regs = task_pt_regs(prev);
+       regs->status = csr_read(CSR_STATUS);
+
+       /* restore status */
+       regs = task_pt_regs(next);
+       csr_write(CSR_STATUS, regs->status);
+}
+
  static __always_inline bool has_fpu(void)
  {
         return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
@@ -115,6 +129,7 @@ do {                                                        \
         struct task_struct *__prev = (prev);            \
         struct task_struct *__next = (next);            \
         __set_prev_cpu(__prev->thread);                 \
+       __switch_to_status(__prev, __next)              \
         if (has_fpu())                                  \
                 __switch_to_fpu(__prev, __next);        \
         if (has_vector() || has_xtheadvector())         \
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 8d25837a9384..a3b98c1be055 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
         REG_S x5,  PT_T0(sp)
         save_from_x6_to_x31
  
-       /*
-        * Disable user-mode memory access as it should only be set in the
-        * actual user copy routines.
-        *
-        * Disable the FPU/Vector to detect illegal usage of floating point
-        * or vector in kernel space.
-        */
-       li t0, SR_SUM | SR_FS_VS | SR_ELP
-
         REG_L s0, TASK_TI_USER_SP(tp)
-       csrrc s1, CSR_STATUS, t0
+       csrr s1, CSR_STATUS
         save_userssp s2, s1
         csrr s2, CSR_EPC
         csrr s3, CSR_TVAL
@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
         REG_S s4, PT_CAUSE(sp)
         REG_S s5, PT_TP(sp)
  
+       /*
+        * It is fresh trap entry. Disable user-mode memory access as it should only be set in the
+        * actual user copy routines.
+        *
+        * Disable the FPU/Vector to detect illegal usage of floating point
+        * or vector in kernel space.
+        */
+       li t0, SR_SUM | SR_FS_VS | SR_ELP
+       csrrc s1, CSR_STATUS, t0
+
         /*
          * Set the scratch register to 0, so that if a recursive exception
          * occurs, the exception vector knows it came from the kernel



During the time spent in kernel if sets SUM bit in status then, above
`__switch_to_status` will ensure that `status` will get saved for current
thread and restored for next thread.

Furthermore, current trap entry code clears FS/VS/SUM (for right reasons). It
represents non-linear change of control flow and thus whatever will execute next
shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
modifies the flow by first saving the `status` on trap frame (thus if previous
trap frame had SUM=1, it will be saved and restored). And then it
unconditionally clears the SUM/FS/VS to ensure that this new trap context runs
without needing SUM=1. This ensures nesting of trap frames without diluting
security properties of SUM.

>
>Thanks,
>Andy
>
>
>
>
>>
>> So my first question was why not to use `status` in `pt_regs`. It is granular
>> as it can get (it is available per thread context per trap basis).
>>
>>
>> I did ask Alex as well. I'll ping him again.
>>
>> >
>> >Does anyone else have any comment on this?
>> >
>> >>
>> >>>>    u32 riscv_v_flags;
>> >>>>    u32 vstate_ctrl;
>> >>>>    struct __riscv_v_ext_state vstate;
>> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>> >>>>b/arch/riscv/kernel/asm- offsets.c
>> >>>>index 16490755304e..969c65b1fe41 100644
>> >>>>--- a/arch/riscv/kernel/asm-offsets.c
>> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
>> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> >>
>> >>_______________________________________________
>> >>linux-riscv mailing list
>> >>linux-riscv@lists.infradead.org
>> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
>> >>
>> >
>> >
>> >--
>> >Ben Dooks                              http://www.codethink.co.uk/
>> >Senior Engineer                                Codethink - Providing Genius
>> >
>> >https://www.codethink.co.uk/privacy.html
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Alexandre Ghiti 8 months, 3 weeks ago

Hi Andy, Deepak,

On 5/23/25 00:43, Deepak Gupta wrote:
> On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>> On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> 
>> wrote:
>>>
>>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>> >On 20/05/2025 17:49, Deepak Gupta wrote:
>>> >>I did give this patch my RB and had planned to come back to it to see
>>> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
>>> >>attention again. As it stands today, it doesn't impact cfi related
>>> >>changes but I've some concerns.
>>> >>
>>> >>Overall I do agree we should reduce number of SSTATUS accesses.
>>> >>
>>> >>Couple of questions on introducing new `sstatus` field (inline)
>>> >>
>>> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>> >>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>> >>>>
>>> >>>>When threads/tasks are switched we need to ensure the old 
>>> execution's
>>> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>> >>>>restored.
>>> >>>>
>>> >>>>The issue was seen under heavy load especially with the 
>>> syz-stress tool
>>> >>>>running, with crashes as follows in schedule_tail:
>>> >>>>
>>> >>>>Unable to handle kernel access to user memory without uaccess 
>>> routines
>>> >>>>at virtual address 000000002749f0d0
>>> >>>>Oops [#1]
>>> >>>>Modules linked in:
>>> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>> >>>>Hardware name: riscv-virtio,qemu (DT)
>>> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>> >>>>000000000000000f
>>> >>>>Call Trace:
>>> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 
>>> kernel/sched/core.c:4264
>>> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>> >>>>Dumping ftrace buffer:
>>> >>>> (ftrace buffer empty)
>>> >>>>---[ end trace b5f8f9231dc87dda ]---
>>> >>>>
>>> >>>>The issue comes from the put_user() in schedule_tail
>>> >>>>(kernel/sched/core.c) doing the following:
>>> >>>>
>>> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>> >>>>{
>>> >>>>...
>>> >>>>      if (current->set_child_tid)
>>> >>>>              put_user(task_pid_vnr(current), 
>>> current->set_child_tid);
>>> >>>>...
>>> >>>>}
>>> >>>>
>>> >>>>the put_user() macro causes the code sequence to come out as 
>>> follows:
>>> >>>>
>>> >>>>1:    __enable_user_access()
>>> >>>>2:    reg = task_pid_vnr(current);
>>> >>>>3:    *current->set_child_tid = reg;
>>> >>>>4:    __disable_user_access()
>>> >>>>
>>> >>>>The problem is that we may have a sleeping function as argument 
>>> which
>>> >>>>could clear SR_SUM causing the panic above. This was fixed by
>>> >>>>evaluating the argument of the put_user() macro outside the 
>>> user-enabled
>>> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg 
>>> before
>>> >>>>enabling user access")"
>>> >>>>
>>> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() 
>>> macros and
>>> >>>>to avoid the same issue we had with put_user() and sleeping 
>>> functions we
>>> >>>>must ensure code flow can go through switch_to() from within a 
>>> region of
>>> >>>>code with SR_SUM enabled and come back with SR_SUM still 
>>> enabled. This
>>> >>>>patch addresses the problem allowing future work to enable full 
>>> use of
>>> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit 
>>> flip cost
>>> >>>>on every access. Make switch_to() save and restore SR_SUM.
>>> >>>>
>>> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>> >>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>> >>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>> >>>>---
>>> >>>>arch/riscv/include/asm/processor.h | 1 +
>>> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>> >>>>3 files changed, 14 insertions(+)
>>> >>>>
>>> >>>>diff --git a/arch/riscv/include/asm/processor.h
>>> >>>>b/arch/riscv/include/ asm/processor.h
>>> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>> >>>>--- a/arch/riscv/include/asm/processor.h
>>> >>>>+++ b/arch/riscv/include/asm/processor.h
>>> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>> >>>>    struct __riscv_d_ext_state fstate;
>>> >>>>    unsigned long bad_cause;
>>> >>>>    unsigned long envcfg;
>>> >>>>+    unsigned long status;
>>> >>
>>> >>Do we really need a new member field in `thread_struct`. We 
>>> already have
>>> >>`sstatus` in `pt_regs` which reflects overall execution environment
>>> >>situation
>>> >>for current thread. This gets saved and restored on trap entry and 
>>> exit.
>>> >>
>>> >>If we put `status` in `thread_struct` it creates ambiguity in terms
>>> >>of which
>>> >>`status` to save to and pick from from future maintainibility
>>> >>purposes as the
>>> >>fields get introduced to this CSR.
>>> >>
>>> >>Why can't we access current trap frame's `sstatus` image in
>>> >>`__switch_to` to
>>> >>save and restore?
>>> >>
>>> >>Let me know if I am missing something obvious here. If there is a
>>> >>complication,
>>> >>I am missing here and we do end up using this member field, I would
>>> >>rename it
>>> >>to something like `status_kernel` to reflect that. So that future
>>> >>changes are
>>> >>cognizant of the fact that we have split `status`. One for kernel
>>> >>execution env
>>> >>per thread and one for controlling user execution env per thread.
>>> >
>>> >This is so long ago now I cannot remember if there was any sstatus in
>>> >the pt_regs field,
>>>
>>> FS/VS bits encode status of floating point and vector on per-thread 
>>> basis.
>>> So `status` has been part of `pt_regs` for quite a while.
>>>
>>> > and if kernel threads have the same context as their
>>> >userland parts.
>>>
>>> I didn't mean kernel thread. What I meant was kernel execution 
>>> environment
>>> per-thread. A userland thread does spend sometime in kernel and 
>>> kernel does
>>> things on its behalf. One of those thing is touching user memory and 
>>> that
>>> requires mucking with this CSR. So what I meant was are we splitting 
>>> `status`
>>> on per-thread basis for their time spent in user and kernel.
>>>
>>> Getting back to original question--
>>> As I said, each thread spends sometime in user or in kernel. 
>>> `status` in
>>> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>>> `status` field in `pt_regs` is reflecting execution status of the 
>>> thread on per
>>> trap basis. Introducing `status` in `thread_struct` creates a 
>>> confusion (if not
>>> for today, certainly for future) of which `status` to pick from when 
>>> we are
>>> doing save/restore.
>>
>> I agree that it's a confusion. sstatus is already saved on pt_regs on
>> trap entries/return, adding another entry adds code complexity and
>> makes data inconsistent. But, perhaps we'd eventually need something
>> like this (I will explain why). Still, there might be a better
>> approach.
>>
>> Yes, we can always reflect pt_regs for sstatus. We all know that
>> pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>> point refers to "user's" pt_regs whenever it first enters kernel 
>> mode. Here
>> are reasons why SR_SUM here may or may not be properly tracked. First,
>> if this is a trap introduced context switch (such as interrupting in a
>> preemptible context after we manually enable user access in put_user),
>> then SR_SUM is saved somewhere in the kernel stack, and is not
>> reference-able with task_pt_reg during context switch. But we are safe
>> because the trap exit asm would help us restore the correct SR_SUM
>> back. However, if this is a self-initiating context switch (calling
>> into schedule()), then SR_SUM is not saved anywhere, and possibly
>> causing this error.
>>
>> Preemptible Vector in the kernel mode also had this problem where a
>> self-initiating context switch loses the track of sstatus.vs. The way
>> I managed it is to track the VS bit at context switch time. However,
>> this bug shows that people are repeatedly facing the problem, and
>> maybe it suggests that we'd need a better way of managing sstatus
>> across context switches. Given the complex nature of this register,
>> which also touches the interrupt enable status, I don't think naively
>> saving/restoring the entire register is the way to go. Maybe the
>> variable deserves a more specific naming and documentation. And if
>> we'd need a centralized place for managing these statuses, then it
>> also has to take care of sstatus.VS.


Andy, thanks for the precise explanation of the problem :)

So it took me some time but here are my thoughts on this. We should 
treat pt_regs and thread_struct differently as they do not represent the 
same thing:
- pt_regs represents the context of a thread when it takes a trap
- thread_struct represents a "kernel-induced" (or a "in-kernel") context 
not caused by traps

That's why I don't really like Deepak's proposal below as it mixes both 
and I find it tricky.

I can't find a situation where saving/restoring the entire sstatus at 
context-switch is a problem though, does anyone have such thing in mind?

Finally I understand that having another copy of sstatus in 
thread_struct is not intuitive and we should, either explain why or only 
store the SUM bit (like for sstatus.VS).

Please continue the discussion as we need to find a solution that 
pleases everyone soon :)

Thanks all for jumping in,

Alex


>
>
> IMHO, the problem we are trying to solve in this patch is easily 
> solvable in
> below manner.
>
>
> diff --git a/arch/riscv/include/asm/switch_to.h 
> b/arch/riscv/include/asm/switch_to.h
> index 0e71eb82f920..499d00a6fb67 100644
> --- a/arch/riscv/include/asm/switch_to.h
> +++ b/arch/riscv/include/asm/switch_to.h
> @@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
> task_struct *prev,
>         fstate_restore(next, task_pt_regs(next));
>  }
>
> +static inline void __switch_to_status(struct task_struct *prev,
> +                                  struct task_struct *next)
> +{
> +       struct pt_regs *regs;
> +
> +       /* save status */
> +       regs = task_pt_regs(prev);
> +       regs->status = csr_read(CSR_STATUS);
> +
> +       /* restore status */
> +       regs = task_pt_regs(next);
> +       csr_write(CSR_STATUS, regs->status);
> +}
> +
>  static __always_inline bool has_fpu(void)
>  {
>         return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
> @@ -115,6 +129,7 @@ do 
> {                                                        \
>         struct task_struct *__prev = (prev);            \
>         struct task_struct *__next = (next);            \
>         __set_prev_cpu(__prev->thread);                 \
> +       __switch_to_status(__prev, __next)              \
>         if (has_fpu())                                  \
>                 __switch_to_fpu(__prev, __next);        \
>         if (has_vector() || has_xtheadvector())         \
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 8d25837a9384..a3b98c1be055 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>         REG_S x5,  PT_T0(sp)
>         save_from_x6_to_x31
>
> -       /*
> -        * Disable user-mode memory access as it should only be set in 
> the
> -        * actual user copy routines.
> -        *
> -        * Disable the FPU/Vector to detect illegal usage of floating 
> point
> -        * or vector in kernel space.
> -        */
> -       li t0, SR_SUM | SR_FS_VS | SR_ELP
> -
>         REG_L s0, TASK_TI_USER_SP(tp)
> -       csrrc s1, CSR_STATUS, t0
> +       csrr s1, CSR_STATUS
>         save_userssp s2, s1
>         csrr s2, CSR_EPC
>         csrr s3, CSR_TVAL
> @@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>         REG_S s4, PT_CAUSE(sp)
>         REG_S s5, PT_TP(sp)
>
> +       /*
> +        * It is fresh trap entry. Disable user-mode memory access as 
> it should only be set in the
> +        * actual user copy routines.
> +        *
> +        * Disable the FPU/Vector to detect illegal usage of floating 
> point
> +        * or vector in kernel space.
> +        */
> +       li t0, SR_SUM | SR_FS_VS | SR_ELP
> +       csrrc s1, CSR_STATUS, t0
> +
>         /*
>          * Set the scratch register to 0, so that if a recursive 
> exception
>          * occurs, the exception vector knows it came from the kernel
>
>
>
> During the time spent in kernel if sets SUM bit in status then, above
> `__switch_to_status` will ensure that `status` will get saved for current
> thread and restored for next thread.
>
> Furthermore, current trap entry code clears FS/VS/SUM (for right 
> reasons). It
> represents non-linear change of control flow and thus whatever will 
> execute next
> shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
> modifies the flow by first saving the `status` on trap frame (thus if 
> previous
> trap frame had SUM=1, it will be saved and restored). And then it
> unconditionally clears the SUM/FS/VS to ensure that this new trap 
> context runs
> without needing SUM=1. This ensures nesting of trap frames without 
> diluting
> security properties of SUM.
>
>>
>> Thanks,
>> Andy
>>
>>
>>
>>
>>>
>>> So my first question was why not to use `status` in `pt_regs`. It is 
>>> granular
>>> as it can get (it is available per thread context per trap basis).
>>>
>>>
>>> I did ask Alex as well. I'll ping him again.
>>>
>>> >
>>> >Does anyone else have any comment on this?
>>> >
>>> >>
>>> >>>>    u32 riscv_v_flags;
>>> >>>>    u32 vstate_ctrl;
>>> >>>>    struct __riscv_v_ext_state vstate;
>>> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>>> >>>>b/arch/riscv/kernel/asm- offsets.c
>>> >>>>index 16490755304e..969c65b1fe41 100644
>>> >>>>--- a/arch/riscv/kernel/asm-offsets.c
>>> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>> >>
>>> >>_______________________________________________
>>> >>linux-riscv mailing list
>>> >>linux-riscv@lists.infradead.org
>>> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>> >>
>>> >
>>> >
>>> >--
>>> >Ben Dooks http://www.codethink.co.uk/
>>> >Senior Engineer                                Codethink - 
>>> Providing Genius
>>> >
>>> >https://www.codethink.co.uk/privacy.html
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 8 months, 3 weeks ago

On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>Hi Andy, Deepak,
>
>On 5/23/25 00:43, Deepak Gupta wrote:
>>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> 
>>>wrote:
>>>>
>>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
>>>>>>I did give this patch my RB and had planned to come back to it to see
>>>>>>if it impacts cfi related patches. Thanks to alex for brinigng to my
>>>>>>attention again. As it stands today, it doesn't impact cfi related
>>>>>>changes but I've some concerns.
>>>>>>
>>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
>>>>>>
>>>>>>Couple of questions on introducing new `sstatus` field (inline)
>>>>>>
>>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>
>>>>>>>>When threads/tasks are switched we need to ensure the old 
>>>>execution's
>>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>>>>restored.
>>>>>>>>
>>>>>>>>The issue was seen under heavy load especially with the 
>>>>syz-stress tool
>>>>>>>>running, with crashes as follows in schedule_tail:
>>>>>>>>
>>>>>>>>Unable to handle kernel access to user memory without 
>>>>uaccess routines
>>>>>>>>at virtual address 000000002749f0d0
>>>>>>>>Oops [#1]
>>>>>>>>Modules linked in:
>>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>>>>000000000000000f
>>>>>>>>Call Trace:
>>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 
>>>>kernel/sched/core.c:4264
>>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>>>>Dumping ftrace buffer:
>>>>>>>> (ftrace buffer empty)
>>>>>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>>>>>
>>>>>>>>The issue comes from the put_user() in schedule_tail
>>>>>>>>(kernel/sched/core.c) doing the following:
>>>>>>>>
>>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>>>>{
>>>>>>>>...
>>>>>>>>      if (current->set_child_tid)
>>>>>>>>              put_user(task_pid_vnr(current), 
>>>>current->set_child_tid);
>>>>>>>>...
>>>>>>>>}
>>>>>>>>
>>>>>>>>the put_user() macro causes the code sequence to come out as 
>>>>follows:
>>>>>>>>
>>>>>>>>1:    __enable_user_access()
>>>>>>>>2:    reg = task_pid_vnr(current);
>>>>>>>>3:    *current->set_child_tid = reg;
>>>>>>>>4:    __disable_user_access()
>>>>>>>>
>>>>>>>>The problem is that we may have a sleeping function as 
>>>>argument which
>>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>>>>>evaluating the argument of the put_user() macro outside the 
>>>>user-enabled
>>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() 
>>>>arg before
>>>>>>>>enabling user access")"
>>>>>>>>
>>>>>>>>In order for riscv to take advantage of unsafe_get/put_XXX() 
>>>>macros and
>>>>>>>>to avoid the same issue we had with put_user() and sleeping 
>>>>functions we
>>>>>>>>must ensure code flow can go through switch_to() from within 
>>>>a region of
>>>>>>>>code with SR_SUM enabled and come back with SR_SUM still 
>>>>enabled. This
>>>>>>>>patch addresses the problem allowing future work to enable 
>>>>full use of
>>>>>>>>unsafe_get/put_XXX() macros without needing to take a CSR 
>>>>bit flip cost
>>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>>>>>
>>>>>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>>>>---
>>>>>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>>>3 files changed, 14 insertions(+)
>>>>>>>>
>>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
>>>>>>>>b/arch/riscv/include/ asm/processor.h
>>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>>>>--- a/arch/riscv/include/asm/processor.h
>>>>>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>>>    struct __riscv_d_ext_state fstate;
>>>>>>>>    unsigned long bad_cause;
>>>>>>>>    unsigned long envcfg;
>>>>>>>>+    unsigned long status;
>>>>>>
>>>>>>Do we really need a new member field in `thread_struct`. We 
>>>>already have
>>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
>>>>>>situation
>>>>>>for current thread. This gets saved and restored on trap entry 
>>>>and exit.
>>>>>>
>>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
>>>>>>of which
>>>>>>`status` to save to and pick from from future maintainibility
>>>>>>purposes as the
>>>>>>fields get introduced to this CSR.
>>>>>>
>>>>>>Why can't we access current trap frame's `sstatus` image in
>>>>>>`__switch_to` to
>>>>>>save and restore?
>>>>>>
>>>>>>Let me know if I am missing something obvious here. If there is a
>>>>>>complication,
>>>>>>I am missing here and we do end up using this member field, I would
>>>>>>rename it
>>>>>>to something like `status_kernel` to reflect that. So that future
>>>>>>changes are
>>>>>>cognizant of the fact that we have split `status`. One for kernel
>>>>>>execution env
>>>>>>per thread and one for controlling user execution env per thread.
>>>>>
>>>>>This is so long ago now I cannot remember if there was any sstatus in
>>>>>the pt_regs field,
>>>>
>>>>FS/VS bits encode status of floating point and vector on 
>>>>per-thread basis.
>>>>So `status` has been part of `pt_regs` for quite a while.
>>>>
>>>>> and if kernel threads have the same context as their
>>>>>userland parts.
>>>>
>>>>I didn't mean kernel thread. What I meant was kernel execution 
>>>>environment
>>>>per-thread. A userland thread does spend sometime in kernel and 
>>>>kernel does
>>>>things on its behalf. One of those thing is touching user memory 
>>>>and that
>>>>requires mucking with this CSR. So what I meant was are we 
>>>>splitting `status`
>>>>on per-thread basis for their time spent in user and kernel.
>>>>
>>>>Getting back to original question--
>>>>As I said, each thread spends sometime in user or in kernel. 
>>>>`status` in
>>>>`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>>>>`status` field in `pt_regs` is reflecting execution status of 
>>>>the thread on per
>>>>trap basis. Introducing `status` in `thread_struct` creates a 
>>>>confusion (if not
>>>>for today, certainly for future) of which `status` to pick from 
>>>>when we are
>>>>doing save/restore.
>>>
>>>I agree that it's a confusion. sstatus is already saved on pt_regs on
>>>trap entries/return, adding another entry adds code complexity and
>>>makes data inconsistent. But, perhaps we'd eventually need something
>>>like this (I will explain why). Still, there might be a better
>>>approach.
>>>
>>>Yes, we can always reflect pt_regs for sstatus. We all know that
>>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>>>point refers to "user's" pt_regs whenever it first enters kernel 
>>>mode. Here
>>>are reasons why SR_SUM here may or may not be properly tracked. First,
>>>if this is a trap introduced context switch (such as interrupting in a
>>>preemptible context after we manually enable user access in put_user),
>>>then SR_SUM is saved somewhere in the kernel stack, and is not
>>>reference-able with task_pt_reg during context switch. But we are safe
>>>because the trap exit asm would help us restore the correct SR_SUM
>>>back. However, if this is a self-initiating context switch (calling
>>>into schedule()), then SR_SUM is not saved anywhere, and possibly
>>>causing this error.
>>>
>>>Preemptible Vector in the kernel mode also had this problem where a
>>>self-initiating context switch loses the track of sstatus.vs. The way
>>>I managed it is to track the VS bit at context switch time. However,
>>>this bug shows that people are repeatedly facing the problem, and
>>>maybe it suggests that we'd need a better way of managing sstatus
>>>across context switches. Given the complex nature of this register,
>>>which also touches the interrupt enable status, I don't think naively
>>>saving/restoring the entire register is the way to go. Maybe the
>>>variable deserves a more specific naming and documentation. And if
>>>we'd need a centralized place for managing these statuses, then it
>>>also has to take care of sstatus.VS.
>
>
>Andy, thanks for the precise explanation of the problem :)
>
>So it took me some time but here are my thoughts on this. We should 
>treat pt_regs and thread_struct differently as they do not represent 
>the same thing:
>- pt_regs represents the context of a thread when it takes a trap
>- thread_struct represents a "kernel-induced" (or a "in-kernel") 
>context not caused by traps

Exactly they represent different context of execution. Trap represents a
non-linear control flow change and thus a fresh start of execution control
flow into kernel while `kernel-induced` one's are again non-linear but
fully a kernel/software construct.

A fresh trapped execution context shouldn't have SUM set which is how it is
currently in kernel. This bit gets cleared in trap entry and `sstatus` gets
saved in `pt_regs` (including SR_IE) so that it could be restored whenever
`sret` happens.

The problem we'are seeing here is two fold--

1) We don't want to set and clear when we are accessing array/string for each
    word. This is software problem and this entire series is addressing it.

2) To avoid first problem we are optimizing the access to CSR by setting it
    once and clearing it once. But now we don't want to loose this bit if there
    were:

	a) trap in between 
         b) kernel induced schedule out
         c) a) followed by b)
         d) a) followed by another a)
         e) nested traps

If a) occurs, we are definitley loosing the bit as per current code. If b)
happens then also the same situation.

Saving it in `thread_struct` only addresses `b`. And not `a`, `c`, `d` and
`e`. IMHO `e` is far-fetched situation but I believe `a`, `b`, `c` and `d` happen
during normal runtime of kernel.

So it all depends on nesting level of traps supported by riscv kernel.

Illustraing `c + d` example, if kernel can take 2 nested level of traps with
first trap context having had the SUM bit set, but the second trap had it clear
and now comes the switch out of this thread, at this point if it were saved in
`thread_struct` SUM would be lost for the first trap.

Later when the thread gets switched in again, you would go in 2nd trap
context without SUM (because `thread_context` didnt had it saved), which is
fine. Although when 2nd trap context eventually performs `sret`, it will
go back to first trap context where SUM was expected to be set because it
touching a user memory.

A good example would be a syscall, so that's the first trap. SUM bit is set,
touched user memory and took a trap (page fault). Now code is in second trap
which should clear the SUM bit. Somewhere in memory manager stack, thread is
scheduled out and now `sstatus` is saved in `thread_struct`. This is only
serving current trap context needs and not the one where `SUM` needed to be
set.

We can support such nesting only by ensuring below

On trap entry do 
- save `status` in `pt_regs` or some other FILO data structure
- clear SUM (and other bits needed to be cleared)

On trap return do
- reload `status` from `pt_regs` or some FILO data structure

Quite analogous to what we do for SR_IE as well.

>
>That's why I don't really like Deepak's proposal below as it mixes 
>both and I find it tricky.
>
>I can't find a situation where saving/restoring the entire sstatus at 
>context-switch is a problem though, does anyone have such thing in 
>mind?
>
>Finally I understand that having another copy of sstatus in 
>thread_struct is not intuitive and we should, either explain why or 
>only store the SUM bit (like for sstatus.VS).
>
>Please continue the discussion as we need to find a solution that 
>pleases everyone soon :)
>
>Thanks all for jumping in,
>
>Alex
>
>
>>
>>
>>IMHO, the problem we are trying to solve in this patch is easily 
>>solvable in
>>below manner.
>>
>>
>>diff --git a/arch/riscv/include/asm/switch_to.h 
>>b/arch/riscv/include/asm/switch_to.h
>>index 0e71eb82f920..499d00a6fb67 100644
>>--- a/arch/riscv/include/asm/switch_to.h
>>+++ b/arch/riscv/include/asm/switch_to.h
>>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
>>task_struct *prev,
>>        fstate_restore(next, task_pt_regs(next));
>> }
>>
>>+static inline void __switch_to_status(struct task_struct *prev,
>>+                                  struct task_struct *next)
>>+{
>>+       struct pt_regs *regs;
>>+
>>+       /* save status */
>>+       regs = task_pt_regs(prev);
>>+       regs->status = csr_read(CSR_STATUS);
>>+
>>+       /* restore status */
>>+       regs = task_pt_regs(next);
>>+       csr_write(CSR_STATUS, regs->status);
>>+}
>>+
>> static __always_inline bool has_fpu(void)
>> {
>>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>>@@ -115,6 +129,7 @@ do 
>>{                                                        \
>>        struct task_struct *__prev = (prev);            \
>>        struct task_struct *__next = (next);            \
>>        __set_prev_cpu(__prev->thread);                 \
>>+       __switch_to_status(__prev, __next)              \
>>        if (has_fpu())                                  \
>>                __switch_to_fpu(__prev, __next);        \
>>        if (has_vector() || has_xtheadvector())         \
>>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>index 8d25837a9384..a3b98c1be055 100644
>>--- a/arch/riscv/kernel/entry.S
>>+++ b/arch/riscv/kernel/entry.S
>>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>>        REG_S x5,  PT_T0(sp)
>>        save_from_x6_to_x31
>>
>>-       /*
>>-        * Disable user-mode memory access as it should only be set 
>>in the
>>-        * actual user copy routines.
>>-        *
>>-        * Disable the FPU/Vector to detect illegal usage of 
>>floating point
>>-        * or vector in kernel space.
>>-        */
>>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>-
>>        REG_L s0, TASK_TI_USER_SP(tp)
>>-       csrrc s1, CSR_STATUS, t0
>>+       csrr s1, CSR_STATUS
>>        save_userssp s2, s1
>>        csrr s2, CSR_EPC
>>        csrr s3, CSR_TVAL
>>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>>        REG_S s4, PT_CAUSE(sp)
>>        REG_S s5, PT_TP(sp)
>>
>>+       /*
>>+        * It is fresh trap entry. Disable user-mode memory access 
>>as it should only be set in the
>>+        * actual user copy routines.
>>+        *
>>+        * Disable the FPU/Vector to detect illegal usage of 
>>floating point
>>+        * or vector in kernel space.
>>+        */
>>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>+       csrrc s1, CSR_STATUS, t0
>>+
>>        /*
>>         * Set the scratch register to 0, so that if a recursive 
>>exception
>>         * occurs, the exception vector knows it came from the kernel
>>
>>
>>
>>During the time spent in kernel if sets SUM bit in status then, above
>>`__switch_to_status` will ensure that `status` will get saved for current
>>thread and restored for next thread.
>>
>>Furthermore, current trap entry code clears FS/VS/SUM (for right 
>>reasons). It
>>represents non-linear change of control flow and thus whatever will 
>>execute next
>>shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
>>modifies the flow by first saving the `status` on trap frame (thus 
>>if previous
>>trap frame had SUM=1, it will be saved and restored). And then it
>>unconditionally clears the SUM/FS/VS to ensure that this new trap 
>>context runs
>>without needing SUM=1. This ensures nesting of trap frames without 
>>diluting
>>security properties of SUM.
>>
>>>
>>>Thanks,
>>>Andy
>>>
>>>
>>>
>>>
>>>>
>>>>So my first question was why not to use `status` in `pt_regs`. 
>>>>It is granular
>>>>as it can get (it is available per thread context per trap basis).
>>>>
>>>>
>>>>I did ask Alex as well. I'll ping him again.
>>>>
>>>>>
>>>>>Does anyone else have any comment on this?
>>>>>
>>>>>>
>>>>>>>>    u32 riscv_v_flags;
>>>>>>>>    u32 vstate_ctrl;
>>>>>>>>    struct __riscv_v_ext_state vstate;
>>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>>>>>index 16490755304e..969c65b1fe41 100644
>>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>>
>>>>>>_______________________________________________
>>>>>>linux-riscv mailing list
>>>>>>linux-riscv@lists.infradead.org
>>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>>
>>>>>
>>>>>
>>>>>--
>>>>>Ben Dooks http://www.codethink.co.uk/
>>>>>Senior Engineer                                Codethink - 
>>>>Providing Genius
>>>>>
>>>>>https://www.codethink.co.uk/privacy.html
>>>>
>>>>_______________________________________________
>>>>linux-riscv mailing list
>>>>linux-riscv@lists.infradead.org
>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Andy Chiu 8 months, 3 weeks ago

On Sat, May 24, 2025 at 1:14 AM Deepak Gupta <debug@rivosinc.com> wrote:
>
> On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
> >Hi Andy, Deepak,
> >
> >On 5/23/25 00:43, Deepak Gupta wrote:
> >>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
> >>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com>
> >>>wrote:
> >>>>
> >>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
> >>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
> >>>>>>I did give this patch my RB and had planned to come back to it to see
> >>>>>>if it impacts cfi related patches. Thanks to alex for brinigng to my
> >>>>>>attention again. As it stands today, it doesn't impact cfi related
> >>>>>>changes but I've some concerns.
> >>>>>>
> >>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
> >>>>>>
> >>>>>>Couple of questions on introducing new `sstatus` field (inline)
> >>>>>>
> >>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
> >>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
> >>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>>>>>
> >>>>>>>>When threads/tasks are switched we need to ensure the old
> >>>>execution's
> >>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>>>>>>restored.
> >>>>>>>>
> >>>>>>>>The issue was seen under heavy load especially with the
> >>>>syz-stress tool
> >>>>>>>>running, with crashes as follows in schedule_tail:
> >>>>>>>>
> >>>>>>>>Unable to handle kernel access to user memory without
> >>>>uaccess routines
> >>>>>>>>at virtual address 000000002749f0d0
> >>>>>>>>Oops [#1]
> >>>>>>>>Modules linked in:
> >>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>>>>>>Hardware name: riscv-virtio,qemu (DT)
> >>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>>>>>>000000000000000f
> >>>>>>>>Call Trace:
> >>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
> >>>>kernel/sched/core.c:4264
> >>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>>>>>>Dumping ftrace buffer:
> >>>>>>>> (ftrace buffer empty)
> >>>>>>>>---[ end trace b5f8f9231dc87dda ]---
> >>>>>>>>
> >>>>>>>>The issue comes from the put_user() in schedule_tail
> >>>>>>>>(kernel/sched/core.c) doing the following:
> >>>>>>>>
> >>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>>>>>>{
> >>>>>>>>...
> >>>>>>>>      if (current->set_child_tid)
> >>>>>>>>              put_user(task_pid_vnr(current),
> >>>>current->set_child_tid);
> >>>>>>>>...
> >>>>>>>>}
> >>>>>>>>
> >>>>>>>>the put_user() macro causes the code sequence to come out as
> >>>>follows:
> >>>>>>>>
> >>>>>>>>1:    __enable_user_access()
> >>>>>>>>2:    reg = task_pid_vnr(current);
> >>>>>>>>3:    *current->set_child_tid = reg;
> >>>>>>>>4:    __disable_user_access()
> >>>>>>>>
> >>>>>>>>The problem is that we may have a sleeping function as
> >>>>argument which
> >>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
> >>>>>>>>evaluating the argument of the put_user() macro outside the
> >>>>user-enabled
> >>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user()
> >>>>arg before
> >>>>>>>>enabling user access")"
> >>>>>>>>
> >>>>>>>>In order for riscv to take advantage of unsafe_get/put_XXX()
> >>>>macros and
> >>>>>>>>to avoid the same issue we had with put_user() and sleeping
> >>>>functions we
> >>>>>>>>must ensure code flow can go through switch_to() from within
> >>>>a region of
> >>>>>>>>code with SR_SUM enabled and come back with SR_SUM still
> >>>>enabled. This
> >>>>>>>>patch addresses the problem allowing future work to enable
> >>>>full use of
> >>>>>>>>unsafe_get/put_XXX() macros without needing to take a CSR
> >>>>bit flip cost
> >>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
> >>>>>>>>
> >>>>>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> >>>>>>>>---
> >>>>>>>>arch/riscv/include/asm/processor.h | 1 +
> >>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>>>>>3 files changed, 14 insertions(+)
> >>>>>>>>
> >>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
> >>>>>>>>b/arch/riscv/include/ asm/processor.h
> >>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
> >>>>>>>>--- a/arch/riscv/include/asm/processor.h
> >>>>>>>>+++ b/arch/riscv/include/asm/processor.h
> >>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
> >>>>>>>>    struct __riscv_d_ext_state fstate;
> >>>>>>>>    unsigned long bad_cause;
> >>>>>>>>    unsigned long envcfg;
> >>>>>>>>+    unsigned long status;
> >>>>>>
> >>>>>>Do we really need a new member field in `thread_struct`. We
> >>>>already have
> >>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
> >>>>>>situation
> >>>>>>for current thread. This gets saved and restored on trap entry
> >>>>and exit.
> >>>>>>
> >>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
> >>>>>>of which
> >>>>>>`status` to save to and pick from from future maintainibility
> >>>>>>purposes as the
> >>>>>>fields get introduced to this CSR.
> >>>>>>
> >>>>>>Why can't we access current trap frame's `sstatus` image in
> >>>>>>`__switch_to` to
> >>>>>>save and restore?
> >>>>>>
> >>>>>>Let me know if I am missing something obvious here. If there is a
> >>>>>>complication,
> >>>>>>I am missing here and we do end up using this member field, I would
> >>>>>>rename it
> >>>>>>to something like `status_kernel` to reflect that. So that future
> >>>>>>changes are
> >>>>>>cognizant of the fact that we have split `status`. One for kernel
> >>>>>>execution env
> >>>>>>per thread and one for controlling user execution env per thread.
> >>>>>
> >>>>>This is so long ago now I cannot remember if there was any sstatus in
> >>>>>the pt_regs field,
> >>>>
> >>>>FS/VS bits encode status of floating point and vector on
> >>>>per-thread basis.
> >>>>So `status` has been part of `pt_regs` for quite a while.
> >>>>
> >>>>> and if kernel threads have the same context as their
> >>>>>userland parts.
> >>>>
> >>>>I didn't mean kernel thread. What I meant was kernel execution
> >>>>environment
> >>>>per-thread. A userland thread does spend sometime in kernel and
> >>>>kernel does
> >>>>things on its behalf. One of those thing is touching user memory
> >>>>and that
> >>>>requires mucking with this CSR. So what I meant was are we
> >>>>splitting `status`
> >>>>on per-thread basis for their time spent in user and kernel.
> >>>>
> >>>>Getting back to original question--
> >>>>As I said, each thread spends sometime in user or in kernel.
> >>>>`status` in
> >>>>`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
> >>>>`status` field in `pt_regs` is reflecting execution status of
> >>>>the thread on per
> >>>>trap basis. Introducing `status` in `thread_struct` creates a
> >>>>confusion (if not
> >>>>for today, certainly for future) of which `status` to pick from
> >>>>when we are
> >>>>doing save/restore.
> >>>
> >>>I agree that it's a confusion. sstatus is already saved on pt_regs on
> >>>trap entries/return, adding another entry adds code complexity and
> >>>makes data inconsistent. But, perhaps we'd eventually need something
> >>>like this (I will explain why). Still, there might be a better
> >>>approach.
> >>>
> >>>Yes, we can always reflect pt_regs for sstatus. We all know that
> >>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
> >>>point refers to "user's" pt_regs whenever it first enters kernel
> >>>mode. Here
> >>>are reasons why SR_SUM here may or may not be properly tracked. First,
> >>>if this is a trap introduced context switch (such as interrupting in a
> >>>preemptible context after we manually enable user access in put_user),
> >>>then SR_SUM is saved somewhere in the kernel stack, and is not
> >>>reference-able with task_pt_reg during context switch. But we are safe
> >>>because the trap exit asm would help us restore the correct SR_SUM
> >>>back. However, if this is a self-initiating context switch (calling
> >>>into schedule()), then SR_SUM is not saved anywhere, and possibly
> >>>causing this error.
> >>>
> >>>Preemptible Vector in the kernel mode also had this problem where a
> >>>self-initiating context switch loses the track of sstatus.vs. The way
> >>>I managed it is to track the VS bit at context switch time. However,
> >>>this bug shows that people are repeatedly facing the problem, and
> >>>maybe it suggests that we'd need a better way of managing sstatus
> >>>across context switches. Given the complex nature of this register,
> >>>which also touches the interrupt enable status, I don't think naively
> >>>saving/restoring the entire register is the way to go. Maybe the
> >>>variable deserves a more specific naming and documentation. And if
> >>>we'd need a centralized place for managing these statuses, then it
> >>>also has to take care of sstatus.VS.
> >
> >
> >Andy, thanks for the precise explanation of the problem :)

Thanks for reading it Alex! It's my bad making it wordy

> >
> >So it took me some time but here are my thoughts on this. We should
> >treat pt_regs and thread_struct differently as they do not represent
> >the same thing:
> >- pt_regs represents the context of a thread when it takes a trap
> >- thread_struct represents a "kernel-induced" (or a "in-kernel")
> >context not caused by traps
>
> Exactly they represent different context of execution. Trap represents a
> non-linear control flow change and thus a fresh start of execution control
> flow into kernel while `kernel-induced` one's are again non-linear but
> fully a kernel/software construct.
>
> A fresh trapped execution context shouldn't have SUM set which is how it is
> currently in kernel. This bit gets cleared in trap entry and `sstatus` gets
> saved in `pt_regs` (including SR_IE) so that it could be restored whenever
> `sret` happens.
>
> The problem we'are seeing here is two fold--
>
> 1) We don't want to set and clear when we are accessing array/string for each
>     word. This is software problem and this entire series is addressing it.
>
> 2) To avoid first problem we are optimizing the access to CSR by setting it
>     once and clearing it once. But now we don't want to loose this bit if there
>     were:
>
>         a) trap in between
>          b) kernel induced schedule out
>          c) a) followed by b)
>          d) a) followed by another a)
>          e) nested traps
>
> If a) occurs, we are definitley loosing the bit as per current code. If b)
> happens then also the same situation.
>
> Saving it in `thread_struct` only addresses `b`. And not `a`, `c`, `d` and
> `e`. IMHO `e` is far-fetched situation but I believe `a`, `b`, `c` and `d` happen
> during normal runtime of kernel.

The trap entry/exit routine should always take care of trap cases,
whenever the kernel traps, SUM is saved to pt_regs somewhere in the
kernel stack. Yes, a task may be scheduled out after a trap, which is
common, but please be aware of that after scheduling back to the
original task, it then has to execute the trap exit and thus restore
the SUM before going back to the original code (where it receives an
exception).

>
> So it all depends on nesting level of traps supported by riscv kernel.
>
> Illustraing `c + d` example, if kernel can take 2 nested level of traps with
> first trap context having had the SUM bit set, but the second trap had it clear
> and now comes the switch out of this thread, at this point if it were saved in
> `thread_struct` SUM would be lost for the first trap.

No, the trap exit always restores the in-context (correct) sstatus back

>
> Later when the thread gets switched in again, you would go in 2nd trap
> context without SUM (because `thread_context` didnt had it saved), which is
> fine. Although when 2nd trap context eventually performs `sret`, it will
> go back to first trap context where SUM was expected to be set because it
> touching a user memory.
>
> A good example would be a syscall, so that's the first trap. SUM bit is set,
> touched user memory and took a trap (page fault). Now code is in second trap
> which should clear the SUM bit. Somewhere in memory manager stack, thread is
> scheduled out and now `sstatus` is saved in `thread_struct`. This is only
> serving current trap context needs and not the one where `SUM` needed to be
> set.
>
> We can support such nesting only by ensuring below
>
> On trap entry do
> - save `status` in `pt_regs` or some other FILO data structure
> - clear SUM (and other bits needed to be cleared)
>
> On trap return do
> - reload `status` from `pt_regs` or some FILO data structure
>
> Quite analogous to what we do for SR_IE as well.

I am not sure if I understand what FILO is, but the current trap
handling routines do save/restore sstatus, which can be found at
handle_exception and ret_from_exception, as of today.

>
> >
> >That's why I don't really like Deepak's proposal below as it mixes
> >both and I find it tricky.
> >
> >I can't find a situation where saving/restoring the entire sstatus at
> >context-switch is a problem though, does anyone have such thing in
> >mind?

I agree that we should keep track of sstatus somewhere and be explicit
about what context it tracks.

sstatus not just tracks per-thread status, some are machine-wide.
Though __switch_to are always called with interrupt disabled, I think
conceptually interrupt enable status should not be saved/restore on a
per-thread basis.

Just FYI that some statuses are currently managed by individual
modules (such as the live sstatus.VS are managed in asm/vector.h). We
can discuss what is prefered. The final patch should take care of
this, or should document that VS is managed elsewhere, if we would
like a centralized sstatus management.

Personally, I would prefer a centralized sstatus management that only
touches SUM. This prevents duplicating condition matchings for vector
out to other places. But maybe there are better ways

Thanks,
Andy




> >
> >Finally I understand that having another copy of sstatus in
> >thread_struct is not intuitive and we should, either explain why or
> >only store the SUM bit (like for sstatus.VS).
> >
> >Please continue the discussion as we need to find a solution that
> >pleases everyone soon :)
> >
> >Thanks all for jumping in,
> >
> >Alex
> >
> >
> >>
> >>
> >>IMHO, the problem we are trying to solve in this patch is easily
> >>solvable in
> >>below manner.
> >>
> >>
> >>diff --git a/arch/riscv/include/asm/switch_to.h
> >>b/arch/riscv/include/asm/switch_to.h
> >>index 0e71eb82f920..499d00a6fb67 100644
> >>--- a/arch/riscv/include/asm/switch_to.h
> >>+++ b/arch/riscv/include/asm/switch_to.h
> >>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct
> >>task_struct *prev,
> >>        fstate_restore(next, task_pt_regs(next));
> >> }
> >>
> >>+static inline void __switch_to_status(struct task_struct *prev,
> >>+                                  struct task_struct *next)
> >>+{
> >>+       struct pt_regs *regs;
> >>+
> >>+       /* save status */
> >>+       regs = task_pt_regs(prev);
> >>+       regs->status = csr_read(CSR_STATUS);
> >>+
> >>+       /* restore status */
> >>+       regs = task_pt_regs(next);
> >>+       csr_write(CSR_STATUS, regs->status);
> >>+}
> >>+
> >> static __always_inline bool has_fpu(void)
> >> {
> >>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
> >>@@ -115,6 +129,7 @@ do
> >>{                                                        \
> >>        struct task_struct *__prev = (prev);            \
> >>        struct task_struct *__next = (next);            \
> >>        __set_prev_cpu(__prev->thread);                 \
> >>+       __switch_to_status(__prev, __next)              \
> >>        if (has_fpu())                                  \
> >>                __switch_to_fpu(__prev, __next);        \
> >>        if (has_vector() || has_xtheadvector())         \
> >>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> >>index 8d25837a9384..a3b98c1be055 100644
> >>--- a/arch/riscv/kernel/entry.S
> >>+++ b/arch/riscv/kernel/entry.S
> >>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
> >>        REG_S x5,  PT_T0(sp)
> >>        save_from_x6_to_x31
> >>
> >>-       /*
> >>-        * Disable user-mode memory access as it should only be set
> >>in the
> >>-        * actual user copy routines.
> >>-        *
> >>-        * Disable the FPU/Vector to detect illegal usage of
> >>floating point
> >>-        * or vector in kernel space.
> >>-        */
> >>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
> >>-
> >>        REG_L s0, TASK_TI_USER_SP(tp)
> >>-       csrrc s1, CSR_STATUS, t0
> >>+       csrr s1, CSR_STATUS
> >>        save_userssp s2, s1
> >>        csrr s2, CSR_EPC
> >>        csrr s3, CSR_TVAL
> >>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
> >>        REG_S s4, PT_CAUSE(sp)
> >>        REG_S s5, PT_TP(sp)
> >>
> >>+       /*
> >>+        * It is fresh trap entry. Disable user-mode memory access
> >>as it should only be set in the
> >>+        * actual user copy routines.
> >>+        *
> >>+        * Disable the FPU/Vector to detect illegal usage of
> >>floating point
> >>+        * or vector in kernel space.
> >>+        */
> >>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
> >>+       csrrc s1, CSR_STATUS, t0
> >>+
> >>        /*
> >>         * Set the scratch register to 0, so that if a recursive
> >>exception
> >>         * occurs, the exception vector knows it came from the kernel
> >>
> >>
> >>
> >>During the time spent in kernel if sets SUM bit in status then, above
> >>`__switch_to_status` will ensure that `status` will get saved for current
> >>thread and restored for next thread.
> >>
> >>Furthermore, current trap entry code clears FS/VS/SUM (for right
> >>reasons). It
> >>represents non-linear change of control flow and thus whatever will
> >>execute next
> >>shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
> >>modifies the flow by first saving the `status` on trap frame (thus
> >>if previous
> >>trap frame had SUM=1, it will be saved and restored). And then it
> >>unconditionally clears the SUM/FS/VS to ensure that this new trap
> >>context runs
> >>without needing SUM=1. This ensures nesting of trap frames without
> >>diluting
> >>security properties of SUM.
> >>
> >>>
> >>>Thanks,
> >>>Andy
> >>>
> >>>
> >>>
> >>>
> >>>>
> >>>>So my first question was why not to use `status` in `pt_regs`.
> >>>>It is granular
> >>>>as it can get (it is available per thread context per trap basis).
> >>>>
> >>>>
> >>>>I did ask Alex as well. I'll ping him again.
> >>>>
> >>>>>
> >>>>>Does anyone else have any comment on this?
> >>>>>
> >>>>>>
> >>>>>>>>    u32 riscv_v_flags;
> >>>>>>>>    u32 vstate_ctrl;
> >>>>>>>>    struct __riscv_v_ext_state vstate;
> >>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
> >>>>>>>>b/arch/riscv/kernel/asm- offsets.c
> >>>>>>>>index 16490755304e..969c65b1fe41 100644
> >>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
> >>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
> >>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>>>>>
> >>>>>>_______________________________________________
> >>>>>>linux-riscv mailing list
> >>>>>>linux-riscv@lists.infradead.org
> >>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>>>>>
> >>>>>
> >>>>>
> >>>>>--
> >>>>>Ben Dooks http://www.codethink.co.uk/
> >>>>>Senior Engineer                                Codethink -
> >>>>Providing Genius
> >>>>>
> >>>>>https://www.codethink.co.uk/privacy.html
> >>>>
> >>>>_______________________________________________
> >>>>linux-riscv mailing list
> >>>>linux-riscv@lists.infradead.org
> >>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>
> >>_______________________________________________
> >>linux-riscv mailing list
> >>linux-riscv@lists.infradead.org
> >>http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 8 months, 2 weeks ago

On Sat, May 24, 2025 at 06:00:00PM +0800, Andy Chiu wrote:
>On Sat, May 24, 2025 at 1:14 AM Deepak Gupta <debug@rivosinc.com> wrote:
>>
>> On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>> >Hi Andy, Deepak,
>> >
>> >On 5/23/25 00:43, Deepak Gupta wrote:
>> >>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>> >>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com>
>> >>>wrote:
>> >>>>
>> >>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>> >>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
>> >>>>>>I did give this patch my RB and had planned to come back to it to see
>> >>>>>>if it impacts cfi related patches. Thanks to alex for brinigng to my
>> >>>>>>attention again. As it stands today, it doesn't impact cfi related
>> >>>>>>changes but I've some concerns.
>> >>>>>>
>> >>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
>> >>>>>>
>> >>>>>>Couple of questions on introducing new `sstatus` field (inline)
>> >>>>>>
>> >>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> >>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> >>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>>>>>
>> >>>>>>>>When threads/tasks are switched we need to ensure the old
>> >>>>execution's
>> >>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>> >>>>>>>>restored.
>> >>>>>>>>
>> >>>>>>>>The issue was seen under heavy load especially with the
>> >>>>syz-stress tool
>> >>>>>>>>running, with crashes as follows in schedule_tail:
>> >>>>>>>>
>> >>>>>>>>Unable to handle kernel access to user memory without
>> >>>>uaccess routines
>> >>>>>>>>at virtual address 000000002749f0d0
>> >>>>>>>>Oops [#1]
>> >>>>>>>>Modules linked in:
>> >>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> >>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> >>>>>>>>Hardware name: riscv-virtio,qemu (DT)
>> >>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> >>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> >>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> >>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> >>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> >>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> >>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> >>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> >>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> >>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> >>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> >>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> >>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>> >>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> >>>>>>>>000000000000000f
>> >>>>>>>>Call Trace:
>> >>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
>> >>>>kernel/sched/core.c:4264
>> >>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>> >>>>>>>>Dumping ftrace buffer:
>> >>>>>>>> (ftrace buffer empty)
>> >>>>>>>>---[ end trace b5f8f9231dc87dda ]---
>> >>>>>>>>
>> >>>>>>>>The issue comes from the put_user() in schedule_tail
>> >>>>>>>>(kernel/sched/core.c) doing the following:
>> >>>>>>>>
>> >>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> >>>>>>>>{
>> >>>>>>>>...
>> >>>>>>>>      if (current->set_child_tid)
>> >>>>>>>>              put_user(task_pid_vnr(current),
>> >>>>current->set_child_tid);
>> >>>>>>>>...
>> >>>>>>>>}
>> >>>>>>>>
>> >>>>>>>>the put_user() macro causes the code sequence to come out as
>> >>>>follows:
>> >>>>>>>>
>> >>>>>>>>1:    __enable_user_access()
>> >>>>>>>>2:    reg = task_pid_vnr(current);
>> >>>>>>>>3:    *current->set_child_tid = reg;
>> >>>>>>>>4:    __disable_user_access()
>> >>>>>>>>
>> >>>>>>>>The problem is that we may have a sleeping function as
>> >>>>argument which
>> >>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
>> >>>>>>>>evaluating the argument of the put_user() macro outside the
>> >>>>user-enabled
>> >>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user()
>> >>>>arg before
>> >>>>>>>>enabling user access")"
>> >>>>>>>>
>> >>>>>>>>In order for riscv to take advantage of unsafe_get/put_XXX()
>> >>>>macros and
>> >>>>>>>>to avoid the same issue we had with put_user() and sleeping
>> >>>>functions we
>> >>>>>>>>must ensure code flow can go through switch_to() from within
>> >>>>a region of
>> >>>>>>>>code with SR_SUM enabled and come back with SR_SUM still
>> >>>>enabled. This
>> >>>>>>>>patch addresses the problem allowing future work to enable
>> >>>>full use of
>> >>>>>>>>unsafe_get/put_XXX() macros without needing to take a CSR
>> >>>>bit flip cost
>> >>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
>> >>>>>>>>
>> >>>>>>>>Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> >>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> >>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> >>>>>>>>---
>> >>>>>>>>arch/riscv/include/asm/processor.h | 1 +
>> >>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>> >>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>> >>>>>>>>3 files changed, 14 insertions(+)
>> >>>>>>>>
>> >>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
>> >>>>>>>>b/arch/riscv/include/ asm/processor.h
>> >>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>> >>>>>>>>--- a/arch/riscv/include/asm/processor.h
>> >>>>>>>>+++ b/arch/riscv/include/asm/processor.h
>> >>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>> >>>>>>>>    struct __riscv_d_ext_state fstate;
>> >>>>>>>>    unsigned long bad_cause;
>> >>>>>>>>    unsigned long envcfg;
>> >>>>>>>>+    unsigned long status;
>> >>>>>>
>> >>>>>>Do we really need a new member field in `thread_struct`. We
>> >>>>already have
>> >>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
>> >>>>>>situation
>> >>>>>>for current thread. This gets saved and restored on trap entry
>> >>>>and exit.
>> >>>>>>
>> >>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
>> >>>>>>of which
>> >>>>>>`status` to save to and pick from from future maintainibility
>> >>>>>>purposes as the
>> >>>>>>fields get introduced to this CSR.
>> >>>>>>
>> >>>>>>Why can't we access current trap frame's `sstatus` image in
>> >>>>>>`__switch_to` to
>> >>>>>>save and restore?
>> >>>>>>
>> >>>>>>Let me know if I am missing something obvious here. If there is a
>> >>>>>>complication,
>> >>>>>>I am missing here and we do end up using this member field, I would
>> >>>>>>rename it
>> >>>>>>to something like `status_kernel` to reflect that. So that future
>> >>>>>>changes are
>> >>>>>>cognizant of the fact that we have split `status`. One for kernel
>> >>>>>>execution env
>> >>>>>>per thread and one for controlling user execution env per thread.
>> >>>>>
>> >>>>>This is so long ago now I cannot remember if there was any sstatus in
>> >>>>>the pt_regs field,
>> >>>>
>> >>>>FS/VS bits encode status of floating point and vector on
>> >>>>per-thread basis.
>> >>>>So `status` has been part of `pt_regs` for quite a while.
>> >>>>
>> >>>>> and if kernel threads have the same context as their
>> >>>>>userland parts.
>> >>>>
>> >>>>I didn't mean kernel thread. What I meant was kernel execution
>> >>>>environment
>> >>>>per-thread. A userland thread does spend sometime in kernel and
>> >>>>kernel does
>> >>>>things on its behalf. One of those thing is touching user memory
>> >>>>and that
>> >>>>requires mucking with this CSR. So what I meant was are we
>> >>>>splitting `status`
>> >>>>on per-thread basis for their time spent in user and kernel.
>> >>>>
>> >>>>Getting back to original question--
>> >>>>As I said, each thread spends sometime in user or in kernel.
>> >>>>`status` in
>> >>>>`pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>> >>>>`status` field in `pt_regs` is reflecting execution status of
>> >>>>the thread on per
>> >>>>trap basis. Introducing `status` in `thread_struct` creates a
>> >>>>confusion (if not
>> >>>>for today, certainly for future) of which `status` to pick from
>> >>>>when we are
>> >>>>doing save/restore.
>> >>>
>> >>>I agree that it's a confusion. sstatus is already saved on pt_regs on
>> >>>trap entries/return, adding another entry adds code complexity and
>> >>>makes data inconsistent. But, perhaps we'd eventually need something
>> >>>like this (I will explain why). Still, there might be a better
>> >>>approach.
>> >>>
>> >>>Yes, we can always reflect pt_regs for sstatus. We all know that
>> >>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>> >>>point refers to "user's" pt_regs whenever it first enters kernel
>> >>>mode. Here
>> >>>are reasons why SR_SUM here may or may not be properly tracked. First,
>> >>>if this is a trap introduced context switch (such as interrupting in a
>> >>>preemptible context after we manually enable user access in put_user),
>> >>>then SR_SUM is saved somewhere in the kernel stack, and is not
>> >>>reference-able with task_pt_reg during context switch. But we are safe
>> >>>because the trap exit asm would help us restore the correct SR_SUM
>> >>>back. However, if this is a self-initiating context switch (calling
>> >>>into schedule()), then SR_SUM is not saved anywhere, and possibly
>> >>>causing this error.
>> >>>
>> >>>Preemptible Vector in the kernel mode also had this problem where a
>> >>>self-initiating context switch loses the track of sstatus.vs. The way
>> >>>I managed it is to track the VS bit at context switch time. However,
>> >>>this bug shows that people are repeatedly facing the problem, and
>> >>>maybe it suggests that we'd need a better way of managing sstatus
>> >>>across context switches. Given the complex nature of this register,
>> >>>which also touches the interrupt enable status, I don't think naively
>> >>>saving/restoring the entire register is the way to go. Maybe the
>> >>>variable deserves a more specific naming and documentation. And if
>> >>>we'd need a centralized place for managing these statuses, then it
>> >>>also has to take care of sstatus.VS.
>> >
>> >
>> >Andy, thanks for the precise explanation of the problem :)
>
>Thanks for reading it Alex! It's my bad making it wordy
>
>> >
>> >So it took me some time but here are my thoughts on this. We should
>> >treat pt_regs and thread_struct differently as they do not represent
>> >the same thing:
>> >- pt_regs represents the context of a thread when it takes a trap
>> >- thread_struct represents a "kernel-induced" (or a "in-kernel")
>> >context not caused by traps
>>
>> Exactly they represent different context of execution. Trap represents a
>> non-linear control flow change and thus a fresh start of execution control
>> flow into kernel while `kernel-induced` one's are again non-linear but
>> fully a kernel/software construct.
>>
>> A fresh trapped execution context shouldn't have SUM set which is how it is
>> currently in kernel. This bit gets cleared in trap entry and `sstatus` gets
>> saved in `pt_regs` (including SR_IE) so that it could be restored whenever
>> `sret` happens.
>>
>> The problem we'are seeing here is two fold--
>>
>> 1) We don't want to set and clear when we are accessing array/string for each
>>     word. This is software problem and this entire series is addressing it.
>>
>> 2) To avoid first problem we are optimizing the access to CSR by setting it
>>     once and clearing it once. But now we don't want to loose this bit if there
>>     were:
>>
>>         a) trap in between
>>          b) kernel induced schedule out
>>          c) a) followed by b)
>>          d) a) followed by another a)
>>          e) nested traps
>>
>> If a) occurs, we are definitley loosing the bit as per current code. If b)
>> happens then also the same situation.
>>
>> Saving it in `thread_struct` only addresses `b`. And not `a`, `c`, `d` and
>> `e`. IMHO `e` is far-fetched situation but I believe `a`, `b`, `c` and `d` happen
>> during normal runtime of kernel.
>
>The trap entry/exit routine should always take care of trap cases,
>whenever the kernel traps, SUM is saved to pt_regs somewhere in the
>kernel stack. Yes, a task may be scheduled out after a trap, which is
>common, but please be aware of that after scheduling back to the
>original task, it then has to execute the trap exit and thus restore
>the SUM before going back to the original code (where it receives an
>exception).

Yes you are right. Thanks for correcting me.

As I mentioned in another fork of the thread. The nesting of traps is taken
care of by trap entry/exit. 
It's all about kernel induced event then.

Is there nesting of kernel induced event?
If there is no nesting then sure a field in `thread_struct` is fine.
But then in that case save/restore is in `pt_regs` is also fine and keep
a single image which truly represents current context and trap together.

>
>>
>> So it all depends on nesting level of traps supported by riscv kernel.
>>
>> Illustraing `c + d` example, if kernel can take 2 nested level of traps with
>> first trap context having had the SUM bit set, but the second trap had it clear
>> and now comes the switch out of this thread, at this point if it were saved in
>> `thread_struct` SUM would be lost for the first trap.
>
>No, the trap exit always restores the in-context (correct) sstatus back
>
>>
>> Later when the thread gets switched in again, you would go in 2nd trap
>> context without SUM (because `thread_context` didnt had it saved), which is
>> fine. Although when 2nd trap context eventually performs `sret`, it will
>> go back to first trap context where SUM was expected to be set because it
>> touching a user memory.
>>
>> A good example would be a syscall, so that's the first trap. SUM bit is set,
>> touched user memory and took a trap (page fault). Now code is in second trap
>> which should clear the SUM bit. Somewhere in memory manager stack, thread is
>> scheduled out and now `sstatus` is saved in `thread_struct`. This is only
>> serving current trap context needs and not the one where `SUM` needed to be
>> set.
>>
>> We can support such nesting only by ensuring below
>>
>> On trap entry do
>> - save `status` in `pt_regs` or some other FILO data structure
>> - clear SUM (and other bits needed to be cleared)
>>
>> On trap return do
>> - reload `status` from `pt_regs` or some FILO data structure
>>
>> Quite analogous to what we do for SR_IE as well.
>
>I am not sure if I understand what FILO is, but the current trap
>handling routines do save/restore sstatus, which can be found at
>handle_exception and ret_from_exception, as of today.
>
>>
>> >
>> >That's why I don't really like Deepak's proposal below as it mixes
>> >both and I find it tricky.
>> >
>> >I can't find a situation where saving/restoring the entire sstatus at
>> >context-switch is a problem though, does anyone have such thing in
>> >mind?
>
>I agree that we should keep track of sstatus somewhere and be explicit
>about what context it tracks.
>
>sstatus not just tracks per-thread status, some are machine-wide.
>Though __switch_to are always called with interrupt disabled, I think
>conceptually interrupt enable status should not be saved/restore on a
>per-thread basis.
>
>Just FYI that some statuses are currently managed by individual
>modules (such as the live sstatus.VS are managed in asm/vector.h). We
>can discuss what is prefered. The final patch should take care of
>this, or should document that VS is managed elsewhere, if we would
>like a centralized sstatus management.
>
>Personally, I would prefer a centralized sstatus management that only
>touches SUM. This prevents duplicating condition matchings for vector
>out to other places. But maybe there are better ways
>
>Thanks,
>Andy
>
>
>
>
>> >
>> >Finally I understand that having another copy of sstatus in
>> >thread_struct is not intuitive and we should, either explain why or
>> >only store the SUM bit (like for sstatus.VS).
>> >
>> >Please continue the discussion as we need to find a solution that
>> >pleases everyone soon :)
>> >
>> >Thanks all for jumping in,
>> >
>> >Alex
>> >
>> >
>> >>
>> >>
>> >>IMHO, the problem we are trying to solve in this patch is easily
>> >>solvable in
>> >>below manner.
>> >>
>> >>
>> >>diff --git a/arch/riscv/include/asm/switch_to.h
>> >>b/arch/riscv/include/asm/switch_to.h
>> >>index 0e71eb82f920..499d00a6fb67 100644
>> >>--- a/arch/riscv/include/asm/switch_to.h
>> >>+++ b/arch/riscv/include/asm/switch_to.h
>> >>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct
>> >>task_struct *prev,
>> >>        fstate_restore(next, task_pt_regs(next));
>> >> }
>> >>
>> >>+static inline void __switch_to_status(struct task_struct *prev,
>> >>+                                  struct task_struct *next)
>> >>+{
>> >>+       struct pt_regs *regs;
>> >>+
>> >>+       /* save status */
>> >>+       regs = task_pt_regs(prev);
>> >>+       regs->status = csr_read(CSR_STATUS);
>> >>+
>> >>+       /* restore status */
>> >>+       regs = task_pt_regs(next);
>> >>+       csr_write(CSR_STATUS, regs->status);
>> >>+}
>> >>+
>> >> static __always_inline bool has_fpu(void)
>> >> {
>> >>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>> >>@@ -115,6 +129,7 @@ do
>> >>{                                                        \
>> >>        struct task_struct *__prev = (prev);            \
>> >>        struct task_struct *__next = (next);            \
>> >>        __set_prev_cpu(__prev->thread);                 \
>> >>+       __switch_to_status(__prev, __next)              \
>> >>        if (has_fpu())                                  \
>> >>                __switch_to_fpu(__prev, __next);        \
>> >>        if (has_vector() || has_xtheadvector())         \
>> >>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> >>index 8d25837a9384..a3b98c1be055 100644
>> >>--- a/arch/riscv/kernel/entry.S
>> >>+++ b/arch/riscv/kernel/entry.S
>> >>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>> >>        REG_S x5,  PT_T0(sp)
>> >>        save_from_x6_to_x31
>> >>
>> >>-       /*
>> >>-        * Disable user-mode memory access as it should only be set
>> >>in the
>> >>-        * actual user copy routines.
>> >>-        *
>> >>-        * Disable the FPU/Vector to detect illegal usage of
>> >>floating point
>> >>-        * or vector in kernel space.
>> >>-        */
>> >>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
>> >>-
>> >>        REG_L s0, TASK_TI_USER_SP(tp)
>> >>-       csrrc s1, CSR_STATUS, t0
>> >>+       csrr s1, CSR_STATUS
>> >>        save_userssp s2, s1
>> >>        csrr s2, CSR_EPC
>> >>        csrr s3, CSR_TVAL
>> >>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>> >>        REG_S s4, PT_CAUSE(sp)
>> >>        REG_S s5, PT_TP(sp)
>> >>
>> >>+       /*
>> >>+        * It is fresh trap entry. Disable user-mode memory access
>> >>as it should only be set in the
>> >>+        * actual user copy routines.
>> >>+        *
>> >>+        * Disable the FPU/Vector to detect illegal usage of
>> >>floating point
>> >>+        * or vector in kernel space.
>> >>+        */
>> >>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
>> >>+       csrrc s1, CSR_STATUS, t0
>> >>+
>> >>        /*
>> >>         * Set the scratch register to 0, so that if a recursive
>> >>exception
>> >>         * occurs, the exception vector knows it came from the kernel
>> >>
>> >>
>> >>
>> >>During the time spent in kernel if sets SUM bit in status then, above
>> >>`__switch_to_status` will ensure that `status` will get saved for current
>> >>thread and restored for next thread.
>> >>
>> >>Furthermore, current trap entry code clears FS/VS/SUM (for right
>> >>reasons). It
>> >>represents non-linear change of control flow and thus whatever will
>> >>execute next
>> >>shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
>> >>modifies the flow by first saving the `status` on trap frame (thus
>> >>if previous
>> >>trap frame had SUM=1, it will be saved and restored). And then it
>> >>unconditionally clears the SUM/FS/VS to ensure that this new trap
>> >>context runs
>> >>without needing SUM=1. This ensures nesting of trap frames without
>> >>diluting
>> >>security properties of SUM.
>> >>
>> >>>
>> >>>Thanks,
>> >>>Andy
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>>
>> >>>>So my first question was why not to use `status` in `pt_regs`.
>> >>>>It is granular
>> >>>>as it can get (it is available per thread context per trap basis).
>> >>>>
>> >>>>
>> >>>>I did ask Alex as well. I'll ping him again.
>> >>>>
>> >>>>>
>> >>>>>Does anyone else have any comment on this?
>> >>>>>
>> >>>>>>
>> >>>>>>>>    u32 riscv_v_flags;
>> >>>>>>>>    u32 vstate_ctrl;
>> >>>>>>>>    struct __riscv_v_ext_state vstate;
>> >>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>> >>>>>>>>b/arch/riscv/kernel/asm- offsets.c
>> >>>>>>>>index 16490755304e..969c65b1fe41 100644
>> >>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
>> >>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>> >>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>> >>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>> >>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>> >>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> >>>>>>
>> >>>>>>_______________________________________________
>> >>>>>>linux-riscv mailing list
>> >>>>>>linux-riscv@lists.infradead.org
>> >>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>>--
>> >>>>>Ben Dooks http://www.codethink.co.uk/
>> >>>>>Senior Engineer                                Codethink -
>> >>>>Providing Genius
>> >>>>>
>> >>>>>https://www.codethink.co.uk/privacy.html
>> >>>>
>> >>>>_______________________________________________
>> >>>>linux-riscv mailing list
>> >>>>linux-riscv@lists.infradead.org
>> >>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>> >>
>> >>_______________________________________________
>> >>linux-riscv mailing list
>> >>linux-riscv@lists.infradead.org
>> >>http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Alexandre Ghiti 8 months, 3 weeks ago

On 5/23/25 19:14, Deepak Gupta wrote:
> On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>> Hi Andy, Deepak,
>>
>> On 5/23/25 00:43, Deepak Gupta wrote:
>>> On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>>>> On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@rivosinc.com> 
>>>> wrote:
>>>>>
>>>>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>>>>> On 20/05/2025 17:49, Deepak Gupta wrote:
>>>>>>> I did give this patch my RB and had planned to come back to it 
>>>>>>> to see
>>>>>>> if it impacts cfi related patches. Thanks to alex for brinigng 
>>>>>>> to my
>>>>>>> attention again. As it stands today, it doesn't impact cfi related
>>>>>>> changes but I've some concerns.
>>>>>>>
>>>>>>> Overall I do agree we should reduce number of SSTATUS accesses.
>>>>>>>
>>>>>>> Couple of questions on introducing new `sstatus` field (inline)
>>>>>>>
>>>>>>> On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>>>>>> On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>>
>>>>>>>>> When threads/tasks are switched we need to ensure the old 
>>>>> execution's
>>>>>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>>>>> restored.
>>>>>>>>>
>>>>>>>>> The issue was seen under heavy load especially with the 
>>>>> syz-stress tool
>>>>>>>>> running, with crashes as follows in schedule_tail:
>>>>>>>>>
>>>>>>>>> Unable to handle kernel access to user memory without 
>>>>> uaccess routines
>>>>>>>>> at virtual address 000000002749f0d0
>>>>>>>>> Oops [#1]
>>>>>>>>> Modules linked in:
>>>>>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>>>> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>>>> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : 
>>>>>>>>> ffffffe025d17ec0
>>>>>>>>> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>>>> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>>>> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>>>> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>>>> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>>>> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>>>> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>>>> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>>>> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>>>> t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>>>>> 000000000000000f
>>>>>>>>> Call Trace:
>>>>>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 
>>>>> kernel/sched/core.c:4264
>>>>>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>>>>> Dumping ftrace buffer:
>>>>>>>>> (ftrace buffer empty)
>>>>>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>>>>>
>>>>>>>>> The issue comes from the put_user() in schedule_tail
>>>>>>>>> (kernel/sched/core.c) doing the following:
>>>>>>>>>
>>>>>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>>>>> {
>>>>>>>>> ...
>>>>>>>>>       if (current->set_child_tid)
>>>>>>>>>               put_user(task_pid_vnr(current), 
>>>>> current->set_child_tid);
>>>>>>>>> ...
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> the put_user() macro causes the code sequence to come out as 
>>>>> follows:
>>>>>>>>>
>>>>>>>>> 1:    __enable_user_access()
>>>>>>>>> 2:    reg = task_pid_vnr(current);
>>>>>>>>> 3:    *current->set_child_tid = reg;
>>>>>>>>> 4:    __disable_user_access()
>>>>>>>>>
>>>>>>>>> The problem is that we may have a sleeping function as 
>>>>> argument which
>>>>>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>>>>>> evaluating the argument of the put_user() macro outside the 
>>>>> user-enabled
>>>>>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() 
>>>>> arg before
>>>>>>>>> enabling user access")"
>>>>>>>>>
>>>>>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() 
>>>>> macros and
>>>>>>>>> to avoid the same issue we had with put_user() and sleeping 
>>>>> functions we
>>>>>>>>> must ensure code flow can go through switch_to() from within 
>>>>> a region of
>>>>>>>>> code with SR_SUM enabled and come back with SR_SUM still 
>>>>> enabled. This
>>>>>>>>> patch addresses the problem allowing future work to enable 
>>>>> full use of
>>>>>>>>> unsafe_get/put_XXX() macros without needing to take a CSR 
>>>>> bit flip cost
>>>>>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>>>>>
>>>>>>>>> Reported-by: 
>>>>>>>>> syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>>>>> ---
>>>>>>>>> arch/riscv/include/asm/processor.h | 1 +
>>>>>>>>> arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>>>> arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>>>> 3 files changed, 14 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/riscv/include/asm/processor.h
>>>>>>>>> b/arch/riscv/include/ asm/processor.h
>>>>>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>>>>     struct __riscv_d_ext_state fstate;
>>>>>>>>>     unsigned long bad_cause;
>>>>>>>>>     unsigned long envcfg;
>>>>>>>>> +    unsigned long status;
>>>>>>>
>>>>>>> Do we really need a new member field in `thread_struct`. We 
>>>>> already have
>>>>>>> `sstatus` in `pt_regs` which reflects overall execution environment
>>>>>>> situation
>>>>>>> for current thread. This gets saved and restored on trap entry 
>>>>> and exit.
>>>>>>>
>>>>>>> If we put `status` in `thread_struct` it creates ambiguity in terms
>>>>>>> of which
>>>>>>> `status` to save to and pick from from future maintainibility
>>>>>>> purposes as the
>>>>>>> fields get introduced to this CSR.
>>>>>>>
>>>>>>> Why can't we access current trap frame's `sstatus` image in
>>>>>>> `__switch_to` to
>>>>>>> save and restore?
>>>>>>>
>>>>>>> Let me know if I am missing something obvious here. If there is a
>>>>>>> complication,
>>>>>>> I am missing here and we do end up using this member field, I would
>>>>>>> rename it
>>>>>>> to something like `status_kernel` to reflect that. So that future
>>>>>>> changes are
>>>>>>> cognizant of the fact that we have split `status`. One for kernel
>>>>>>> execution env
>>>>>>> per thread and one for controlling user execution env per thread.
>>>>>>
>>>>>> This is so long ago now I cannot remember if there was any 
>>>>>> sstatus in
>>>>>> the pt_regs field,
>>>>>
>>>>> FS/VS bits encode status of floating point and vector on 
>>>>> per-thread basis.
>>>>> So `status` has been part of `pt_regs` for quite a while.
>>>>>
>>>>>> and if kernel threads have the same context as their
>>>>>> userland parts.
>>>>>
>>>>> I didn't mean kernel thread. What I meant was kernel execution 
>>>>> environment
>>>>> per-thread. A userland thread does spend sometime in kernel and 
>>>>> kernel does
>>>>> things on its behalf. One of those thing is touching user memory 
>>>>> and that
>>>>> requires mucking with this CSR. So what I meant was are we 
>>>>> splitting `status`
>>>>> on per-thread basis for their time spent in user and kernel.
>>>>>
>>>>> Getting back to original question--
>>>>> As I said, each thread spends sometime in user or in kernel. 
>>>>> `status` in
>>>>> `pt_regs` is saved on trap entry and restored on trap exit. In a 
>>>>> sense,
>>>>> `status` field in `pt_regs` is reflecting execution status of the 
>>>>> thread on per
>>>>> trap basis. Introducing `status` in `thread_struct` creates a 
>>>>> confusion (if not
>>>>> for today, certainly for future) of which `status` to pick from 
>>>>> when we are
>>>>> doing save/restore.
>>>>
>>>> I agree that it's a confusion. sstatus is already saved on pt_regs on
>>>> trap entries/return, adding another entry adds code complexity and
>>>> makes data inconsistent. But, perhaps we'd eventually need something
>>>> like this (I will explain why). Still, there might be a better
>>>> approach.
>>>>
>>>> Yes, we can always reflect pt_regs for sstatus. We all know that
>>>> pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>>>> point refers to "user's" pt_regs whenever it first enters kernel 
>>>> mode. Here
>>>> are reasons why SR_SUM here may or may not be properly tracked. First,
>>>> if this is a trap introduced context switch (such as interrupting in a
>>>> preemptible context after we manually enable user access in put_user),
>>>> then SR_SUM is saved somewhere in the kernel stack, and is not
>>>> reference-able with task_pt_reg during context switch. But we are safe
>>>> because the trap exit asm would help us restore the correct SR_SUM
>>>> back. However, if this is a self-initiating context switch (calling
>>>> into schedule()), then SR_SUM is not saved anywhere, and possibly
>>>> causing this error.
>>>>
>>>> Preemptible Vector in the kernel mode also had this problem where a
>>>> self-initiating context switch loses the track of sstatus.vs. The way
>>>> I managed it is to track the VS bit at context switch time. However,
>>>> this bug shows that people are repeatedly facing the problem, and
>>>> maybe it suggests that we'd need a better way of managing sstatus
>>>> across context switches. Given the complex nature of this register,
>>>> which also touches the interrupt enable status, I don't think naively
>>>> saving/restoring the entire register is the way to go. Maybe the
>>>> variable deserves a more specific naming and documentation. And if
>>>> we'd need a centralized place for managing these statuses, then it
>>>> also has to take care of sstatus.VS.
>>
>>
>> Andy, thanks for the precise explanation of the problem :)
>>
>> So it took me some time but here are my thoughts on this. We should 
>> treat pt_regs and thread_struct differently as they do not represent 
>> the same thing:
>> - pt_regs represents the context of a thread when it takes a trap
>> - thread_struct represents a "kernel-induced" (or a "in-kernel") 
>> context not caused by traps
>
> Exactly they represent different context of execution. Trap represents a
> non-linear control flow change and thus a fresh start of execution 
> control
> flow into kernel while `kernel-induced` one's are again non-linear but
> fully a kernel/software construct.
>
> A fresh trapped execution context shouldn't have SUM set which is how 
> it is
> currently in kernel. This bit gets cleared in trap entry and `sstatus` 
> gets
> saved in `pt_regs` (including SR_IE) so that it could be restored 
> whenever
> `sret` happens.
>
> The problem we'are seeing here is two fold--
>
> 1) We don't want to set and clear when we are accessing array/string 
> for each
>    word. This is software problem and this entire series is addressing 
> it.
>
> 2) To avoid first problem we are optimizing the access to CSR by 
> setting it
>    once and clearing it once. But now we don't want to loose this bit 
> if there
>    were:
>
>     a) trap in between         b) kernel induced schedule out
>         c) a) followed by b)
>         d) a) followed by another a)
>         e) nested traps
>
> If a) occurs, we are definitley loosing the bit as per current code.


If a trap occurs while the SUM bit is set, the SUM bit will be saved in 
pt_regs and restored when we come back so we don't lose it when a) occurs.


> If b)
> happens then also the same situation.


Currently, we do lose it in that case indeed.


>
> Saving it in `thread_struct` only addresses `b`. And not `a`, `c`, `d` 
> and
> `e`. IMHO `e` is far-fetched situation but I believe `a`, `b`, `c` and 
> `d` happen
> during normal runtime of kernel.
>
> So it all depends on nesting level of traps supported by riscv kernel.
>
> Illustraing `c + d` example, if kernel can take 2 nested level of 
> traps with
> first trap context having had the SUM bit set, but the second trap had 
> it clear
> and now comes the switch out of this thread, at this point if it were 
> saved in
> `thread_struct` SUM would be lost for the first trap.
>
> Later when the thread gets switched in again, you would go in 2nd trap
> context without SUM (because `thread_context` didnt had it saved), 
> which is
> fine. Although when 2nd trap context eventually performs `sret`, it will
> go back to first trap context where SUM was expected to be set because it
> touching a user memory.
>
> A good example would be a syscall, so that's the first trap. SUM bit 
> is set,
> touched user memory and took a trap (page fault). Now code is in 
> second trap
> which should clear the SUM bit. Somewhere in memory manager stack, 
> thread is
> scheduled out and now `sstatus` is saved in `thread_struct`. This is only
> serving current trap context needs and not the one where `SUM` needed 
> to be
> set.


Hmm to me we don't lose the SUM bit in case of a trap, only when eager 
schedule happens:

thread A
|
|-> syscall
       |
       SUM bit is set
       |
        -> page fault (trap)
             |
              sstatus with SUM bit set is saved on pt_regs
              SUM bit is cleared
             |
              -> eager schedule
                  |
                  -> we save SUM bit cleared in thread_struct
                      |
                      |
                       schedule thread B....
                      |
                      |
                     <- switch_to thread A again
                  |
                  we restore SUM bit cleared from thread_struct
                  |
                <- we resume execution of page fault trap
               |
               so we restore SUM bit saved on pt_regs which *has* SUM 
bit set
               |
             <- sret
           |
           SUM bit is set and we continue the first syscall.

So based on my wonderful ascii art, it works :) Or did I miss something?


>
> We can support such nesting only by ensuring below
>
> On trap entry do - save `status` in `pt_regs` or some other FILO data 
> structure
> - clear SUM (and other bits needed to be cleared)
>
> On trap return do
> - reload `status` from `pt_regs` or some FILO data structure
>
> Quite analogous to what we do for SR_IE as well.
>
>>
>> That's why I don't really like Deepak's proposal below as it mixes 
>> both and I find it tricky.
>>
>> I can't find a situation where saving/restoring the entire sstatus at 
>> context-switch is a problem though, does anyone have such thing in mind?
>>
>> Finally I understand that having another copy of sstatus in 
>> thread_struct is not intuitive and we should, either explain why or 
>> only store the SUM bit (like for sstatus.VS).
>>
>> Please continue the discussion as we need to find a solution that 
>> pleases everyone soon :)
>>
>> Thanks all for jumping in,
>>
>> Alex
>>
>>
>>>
>>>
>>> IMHO, the problem we are trying to solve in this patch is easily 
>>> solvable in
>>> below manner.
>>>
>>>
>>> diff --git a/arch/riscv/include/asm/switch_to.h 
>>> b/arch/riscv/include/asm/switch_to.h
>>> index 0e71eb82f920..499d00a6fb67 100644
>>> --- a/arch/riscv/include/asm/switch_to.h
>>> +++ b/arch/riscv/include/asm/switch_to.h
>>> @@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
>>> task_struct *prev,
>>>         fstate_restore(next, task_pt_regs(next));
>>>  }
>>>
>>> +static inline void __switch_to_status(struct task_struct *prev,
>>> +                                  struct task_struct *next)
>>> +{
>>> +       struct pt_regs *regs;
>>> +
>>> +       /* save status */
>>> +       regs = task_pt_regs(prev);
>>> +       regs->status = csr_read(CSR_STATUS);
>>> +
>>> +       /* restore status */
>>> +       regs = task_pt_regs(next);
>>> +       csr_write(CSR_STATUS, regs->status);
>>> +}
>>> +
>>>  static __always_inline bool has_fpu(void)
>>>  {
>>>         return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>>> @@ -115,6 +129,7 @@ do 
>>> {                                                        \
>>>         struct task_struct *__prev = (prev);            \
>>>         struct task_struct *__next = (next);            \
>>>         __set_prev_cpu(__prev->thread);                 \
>>> +       __switch_to_status(__prev, __next)              \
>>>         if (has_fpu())                                  \
>>>                 __switch_to_fpu(__prev, __next);        \
>>>         if (has_vector() || has_xtheadvector())         \
>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>> index 8d25837a9384..a3b98c1be055 100644
>>> --- a/arch/riscv/kernel/entry.S
>>> +++ b/arch/riscv/kernel/entry.S
>>> @@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>>>         REG_S x5,  PT_T0(sp)
>>>         save_from_x6_to_x31
>>>
>>> -       /*
>>> -        * Disable user-mode memory access as it should only be set 
>>> in the
>>> -        * actual user copy routines.
>>> -        *
>>> -        * Disable the FPU/Vector to detect illegal usage of 
>>> floating point
>>> -        * or vector in kernel space.
>>> -        */
>>> -       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>> -
>>>         REG_L s0, TASK_TI_USER_SP(tp)
>>> -       csrrc s1, CSR_STATUS, t0
>>> +       csrr s1, CSR_STATUS
>>>         save_userssp s2, s1
>>>         csrr s2, CSR_EPC
>>>         csrr s3, CSR_TVAL
>>> @@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>>>         REG_S s4, PT_CAUSE(sp)
>>>         REG_S s5, PT_TP(sp)
>>>
>>> +       /*
>>> +        * It is fresh trap entry. Disable user-mode memory access 
>>> as it should only be set in the
>>> +        * actual user copy routines.
>>> +        *
>>> +        * Disable the FPU/Vector to detect illegal usage of 
>>> floating point
>>> +        * or vector in kernel space.
>>> +        */
>>> +       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>> +       csrrc s1, CSR_STATUS, t0
>>> +
>>>         /*
>>>          * Set the scratch register to 0, so that if a recursive 
>>> exception
>>>          * occurs, the exception vector knows it came from the kernel
>>>
>>>
>>>
>>> During the time spent in kernel if sets SUM bit in status then, above
>>> `__switch_to_status` will ensure that `status` will get saved for 
>>> current
>>> thread and restored for next thread.
>>>
>>> Furthermore, current trap entry code clears FS/VS/SUM (for right 
>>> reasons). It
>>> represents non-linear change of control flow and thus whatever will 
>>> execute next
>>> shouldn't need SUM/FS/VS unless it wants to set it). This patch 
>>> slightly
>>> modifies the flow by first saving the `status` on trap frame (thus 
>>> if previous
>>> trap frame had SUM=1, it will be saved and restored). And then it
>>> unconditionally clears the SUM/FS/VS to ensure that this new trap 
>>> context runs
>>> without needing SUM=1. This ensures nesting of trap frames without 
>>> diluting
>>> security properties of SUM.
>>>
>>>>
>>>> Thanks,
>>>> Andy
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> So my first question was why not to use `status` in `pt_regs`. It 
>>>>> is granular
>>>>> as it can get (it is available per thread context per trap basis).
>>>>>
>>>>>
>>>>> I did ask Alex as well. I'll ping him again.
>>>>>
>>>>>>
>>>>>> Does anyone else have any comment on this?
>>>>>>
>>>>>>>
>>>>>>>>>     u32 riscv_v_flags;
>>>>>>>>>     u32 vstate_ctrl;
>>>>>>>>>     struct __riscv_v_ext_state vstate;
>>>>>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>> b/arch/riscv/kernel/asm- offsets.c
>>>>>>>>> index 16490755304e..969c65b1fe41 100644
>>>>>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>>>>     OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>>>>     OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>>>>     OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> linux-riscv mailing list
>>>>>>> linux-riscv@lists.infradead.org
>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>>>
>>>>>>
>>>>>>
>>>>>> -- 
>>>>>> Ben Dooks http://www.codethink.co.uk/
>>>>>> Senior Engineer                                Codethink - 
>>>>> Providing Genius
>>>>>>
>>>>>> https://www.codethink.co.uk/privacy.html
>>>>>
>>>>> _______________________________________________
>>>>> linux-riscv mailing list
>>>>> linux-riscv@lists.infradead.org
>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Deepak Gupta 8 months, 2 weeks ago

On Fri, May 23, 2025 at 10:00:11PM +0200, Alexandre Ghiti wrote:
>
>On 5/23/25 19:14, Deepak Gupta wrote:
>>On Fri, May 23, 2025 at 02:22:21PM +0200, Alexandre Ghiti wrote:
>>>Hi Andy, Deepak,
>>>
>>>On 5/23/25 00:43, Deepak Gupta wrote:
>>>>On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>>>>>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta 
>>>>><debug@rivosinc.com> wrote:
>>>>>>
>>>>>>On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>>>>>>>On 20/05/2025 17:49, Deepak Gupta wrote:
>>>>>>>>I did give this patch my RB and had planned to come back 
>>>>>>>>to it to see
>>>>>>>>if it impacts cfi related patches. Thanks to alex for 
>>>>>>>>brinigng to my
>>>>>>>>attention again. As it stands today, it doesn't impact cfi related
>>>>>>>>changes but I've some concerns.
>>>>>>>>
>>>>>>>>Overall I do agree we should reduce number of SSTATUS accesses.
>>>>>>>>
>>>>>>>>Couple of questions on introducing new `sstatus` field (inline)
>>>>>>>>
>>>>>>>>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>>>>>>>>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>>>>>>>>>>From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>>>
>>>>>>>>>>When threads/tasks are switched we need to ensure 
>>>>>>>>>>the old
>>>>>>execution's
>>>>>>>>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>>>>>>restored.
>>>>>>>>>>
>>>>>>>>>>The issue was seen under heavy load especially with 
>>>>>>>>>>the
>>>>>>syz-stress tool
>>>>>>>>>>running, with crashes as follows in schedule_tail:
>>>>>>>>>>
>>>>>>>>>>Unable to handle kernel access to user memory 
>>>>>>>>>>without
>>>>>>uaccess routines
>>>>>>>>>>at virtual address 000000002749f0d0
>>>>>>>>>>Oops [#1]
>>>>>>>>>>Modules linked in:
>>>>>>>>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>>>>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>>>>>>Hardware name: riscv-virtio,qemu (DT)
>>>>>>>>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>>>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>>>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>>>>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : 
>>>>>>>>>>ffffffe025d17ec0
>>>>>>>>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>>>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>>>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>>>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>>>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>>>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>>>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>>>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>>>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>>>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>>>>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>>>>>>000000000000000f
>>>>>>>>>>Call Trace:
>>>>>>>>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2
>>>>>>kernel/sched/core.c:4264
>>>>>>>>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>>>>>>Dumping ftrace buffer:
>>>>>>>>>>(ftrace buffer empty)
>>>>>>>>>>---[ end trace b5f8f9231dc87dda ]---
>>>>>>>>>>
>>>>>>>>>>The issue comes from the put_user() in schedule_tail
>>>>>>>>>>(kernel/sched/core.c) doing the following:
>>>>>>>>>>
>>>>>>>>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>>>>>>{
>>>>>>>>>>...
>>>>>>>>>>      if (current->set_child_tid)
>>>>>>>>>>              put_user(task_pid_vnr(current),
>>>>>>current->set_child_tid);
>>>>>>>>>>...
>>>>>>>>>>}
>>>>>>>>>>
>>>>>>>>>>the put_user() macro causes the code sequence to 
>>>>>>>>>>come out as
>>>>>>follows:
>>>>>>>>>>
>>>>>>>>>>1:    __enable_user_access()
>>>>>>>>>>2:    reg = task_pid_vnr(current);
>>>>>>>>>>3:    *current->set_child_tid = reg;
>>>>>>>>>>4:    __disable_user_access()
>>>>>>>>>>
>>>>>>>>>>The problem is that we may have a sleeping function 
>>>>>>>>>>as
>>>>>>argument which
>>>>>>>>>>could clear SR_SUM causing the panic above. This was fixed by
>>>>>>>>>>evaluating the argument of the put_user() macro 
>>>>>>>>>>outside the
>>>>>>user-enabled
>>>>>>>>>>section in commit 285a76bb2cf5 ("riscv: evaluate 
>>>>>>>>>>put_user()
>>>>>>arg before
>>>>>>>>>>enabling user access")"
>>>>>>>>>>
>>>>>>>>>>In order for riscv to take advantage of 
>>>>>>>>>>unsafe_get/put_XXX()
>>>>>>macros and
>>>>>>>>>>to avoid the same issue we had with put_user() and 
>>>>>>>>>>sleeping
>>>>>>functions we
>>>>>>>>>>must ensure code flow can go through switch_to() 
>>>>>>>>>>from within
>>>>>>a region of
>>>>>>>>>>code with SR_SUM enabled and come back with SR_SUM 
>>>>>>>>>>still
>>>>>>enabled. This
>>>>>>>>>>patch addresses the problem allowing future work to 
>>>>>>>>>>enable
>>>>>>full use of
>>>>>>>>>>unsafe_get/put_XXX() macros without needing to take 
>>>>>>>>>>a CSR
>>>>>>bit flip cost
>>>>>>>>>>on every access. Make switch_to() save and restore SR_SUM.
>>>>>>>>>>
>>>>>>>>>>Reported-by: 
>>>>>>>>>>syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>>>>>>Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>>>>>Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>>>>>>---
>>>>>>>>>>arch/riscv/include/asm/processor.h | 1 +
>>>>>>>>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>>>>>arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>>>>>3 files changed, 14 insertions(+)
>>>>>>>>>>
>>>>>>>>>>diff --git a/arch/riscv/include/asm/processor.h
>>>>>>>>>>b/arch/riscv/include/ asm/processor.h
>>>>>>>>>>index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>>>>>>--- a/arch/riscv/include/asm/processor.h
>>>>>>>>>>+++ b/arch/riscv/include/asm/processor.h
>>>>>>>>>>@@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>>>>>    struct __riscv_d_ext_state fstate;
>>>>>>>>>>    unsigned long bad_cause;
>>>>>>>>>>    unsigned long envcfg;
>>>>>>>>>>+    unsigned long status;
>>>>>>>>
>>>>>>>>Do we really need a new member field in `thread_struct`. 
>>>>>>>>We
>>>>>>already have
>>>>>>>>`sstatus` in `pt_regs` which reflects overall execution environment
>>>>>>>>situation
>>>>>>>>for current thread. This gets saved and restored on trap 
>>>>>>>>entry
>>>>>>and exit.
>>>>>>>>
>>>>>>>>If we put `status` in `thread_struct` it creates ambiguity in terms
>>>>>>>>of which
>>>>>>>>`status` to save to and pick from from future maintainibility
>>>>>>>>purposes as the
>>>>>>>>fields get introduced to this CSR.
>>>>>>>>
>>>>>>>>Why can't we access current trap frame's `sstatus` image in
>>>>>>>>`__switch_to` to
>>>>>>>>save and restore?
>>>>>>>>
>>>>>>>>Let me know if I am missing something obvious here. If there is a
>>>>>>>>complication,
>>>>>>>>I am missing here and we do end up using this member field, I would
>>>>>>>>rename it
>>>>>>>>to something like `status_kernel` to reflect that. So that future
>>>>>>>>changes are
>>>>>>>>cognizant of the fact that we have split `status`. One for kernel
>>>>>>>>execution env
>>>>>>>>per thread and one for controlling user execution env per thread.
>>>>>>>
>>>>>>>This is so long ago now I cannot remember if there was any 
>>>>>>>sstatus in
>>>>>>>the pt_regs field,
>>>>>>
>>>>>>FS/VS bits encode status of floating point and vector on 
>>>>>>per-thread basis.
>>>>>>So `status` has been part of `pt_regs` for quite a while.
>>>>>>
>>>>>>>and if kernel threads have the same context as their
>>>>>>>userland parts.
>>>>>>
>>>>>>I didn't mean kernel thread. What I meant was kernel 
>>>>>>execution environment
>>>>>>per-thread. A userland thread does spend sometime in kernel 
>>>>>>and kernel does
>>>>>>things on its behalf. One of those thing is touching user 
>>>>>>memory and that
>>>>>>requires mucking with this CSR. So what I meant was are we 
>>>>>>splitting `status`
>>>>>>on per-thread basis for their time spent in user and kernel.
>>>>>>
>>>>>>Getting back to original question--
>>>>>>As I said, each thread spends sometime in user or in kernel. 
>>>>>>`status` in
>>>>>>`pt_regs` is saved on trap entry and restored on trap exit. 
>>>>>>In a sense,
>>>>>>`status` field in `pt_regs` is reflecting execution status 
>>>>>>of the thread on per
>>>>>>trap basis. Introducing `status` in `thread_struct` creates 
>>>>>>a confusion (if not
>>>>>>for today, certainly for future) of which `status` to pick 
>>>>>>from when we are
>>>>>>doing save/restore.
>>>>>
>>>>>I agree that it's a confusion. sstatus is already saved on pt_regs on
>>>>>trap entries/return, adding another entry adds code complexity and
>>>>>makes data inconsistent. But, perhaps we'd eventually need something
>>>>>like this (I will explain why). Still, there might be a better
>>>>>approach.
>>>>>
>>>>>Yes, we can always reflect pt_regs for sstatus. We all know that
>>>>>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>>>>>point refers to "user's" pt_regs whenever it first enters 
>>>>>kernel mode. Here
>>>>>are reasons why SR_SUM here may or may not be properly tracked. First,
>>>>>if this is a trap introduced context switch (such as interrupting in a
>>>>>preemptible context after we manually enable user access in put_user),
>>>>>then SR_SUM is saved somewhere in the kernel stack, and is not
>>>>>reference-able with task_pt_reg during context switch. But we are safe
>>>>>because the trap exit asm would help us restore the correct SR_SUM
>>>>>back. However, if this is a self-initiating context switch (calling
>>>>>into schedule()), then SR_SUM is not saved anywhere, and possibly
>>>>>causing this error.
>>>>>
>>>>>Preemptible Vector in the kernel mode also had this problem where a
>>>>>self-initiating context switch loses the track of sstatus.vs. The way
>>>>>I managed it is to track the VS bit at context switch time. However,
>>>>>this bug shows that people are repeatedly facing the problem, and
>>>>>maybe it suggests that we'd need a better way of managing sstatus
>>>>>across context switches. Given the complex nature of this register,
>>>>>which also touches the interrupt enable status, I don't think naively
>>>>>saving/restoring the entire register is the way to go. Maybe the
>>>>>variable deserves a more specific naming and documentation. And if
>>>>>we'd need a centralized place for managing these statuses, then it
>>>>>also has to take care of sstatus.VS.
>>>
>>>
>>>Andy, thanks for the precise explanation of the problem :)
>>>
>>>So it took me some time but here are my thoughts on this. We 
>>>should treat pt_regs and thread_struct differently as they do not 
>>>represent the same thing:
>>>- pt_regs represents the context of a thread when it takes a trap
>>>- thread_struct represents a "kernel-induced" (or a "in-kernel") 
>>>context not caused by traps
>>
>>Exactly they represent different context of execution. Trap represents a
>>non-linear control flow change and thus a fresh start of execution 
>>control
>>flow into kernel while `kernel-induced` one's are again non-linear but
>>fully a kernel/software construct.
>>
>>A fresh trapped execution context shouldn't have SUM set which is 
>>how it is
>>currently in kernel. This bit gets cleared in trap entry and 
>>`sstatus` gets
>>saved in `pt_regs` (including SR_IE) so that it could be restored 
>>whenever
>>`sret` happens.
>>
>>The problem we'are seeing here is two fold--
>>
>>1) We don't want to set and clear when we are accessing array/string 
>>for each
>>   word. This is software problem and this entire series is 
>>addressing it.
>>
>>2) To avoid first problem we are optimizing the access to CSR by 
>>setting it
>>   once and clearing it once. But now we don't want to loose this 
>>bit if there
>>   were:
>>
>>    a) trap in between         b) kernel induced schedule out
>>        c) a) followed by b)
>>        d) a) followed by another a)
>>        e) nested traps
>>
>>If a) occurs, we are definitley loosing the bit as per current code.
>
>
>If a trap occurs while the SUM bit is set, the SUM bit will be saved 
>in pt_regs and restored when we come back so we don't lose it when a) 
>occurs.

yes. My bad on that, Sorry about that.

a) is fine with current `status` save/restore on pt_regs on trap frame.

>
>
>>If b)
>>happens then also the same situation.
>
>
>Currently, we do lose it in that case indeed.
>
>
>>
>>Saving it in `thread_struct` only addresses `b`. And not `a`, `c`, 
>>`d` and
>>`e`. IMHO `e` is far-fetched situation but I believe `a`, `b`, `c` 
>>and `d` happen
>>during normal runtime of kernel.
>>
>>So it all depends on nesting level of traps supported by riscv kernel.
>>
>>Illustraing `c + d` example, if kernel can take 2 nested level of 
>>traps with
>>first trap context having had the SUM bit set, but the second trap 
>>had it clear
>>and now comes the switch out of this thread, at this point if it 
>>were saved in
>>`thread_struct` SUM would be lost for the first trap.
>>
>>Later when the thread gets switched in again, you would go in 2nd trap
>>context without SUM (because `thread_context` didnt had it saved), 
>>which is
>>fine. Although when 2nd trap context eventually performs `sret`, it will
>>go back to first trap context where SUM was expected to be set because it
>>touching a user memory.
>>
>>A good example would be a syscall, so that's the first trap. SUM bit 
>>is set,
>>touched user memory and took a trap (page fault). Now code is in 
>>second trap
>>which should clear the SUM bit. Somewhere in memory manager stack, 
>>thread is
>>scheduled out and now `sstatus` is saved in `thread_struct`. This is only
>>serving current trap context needs and not the one where `SUM` 
>>needed to be
>>set.
>
>
>Hmm to me we don't lose the SUM bit in case of a trap, only when eager 
>schedule happens:
>
>thread A
>|
>|-> syscall
>      |
>      SUM bit is set
>      |
>       -> page fault (trap)
>            |
>             sstatus with SUM bit set is saved on pt_regs
>             SUM bit is cleared
>            |
>             -> eager schedule
>                 |
>                 -> we save SUM bit cleared in thread_struct
>                     |
>                     |
>                      schedule thread B....
>                     |
>                     |
>                    <- switch_to thread A again
>                 |
>                 we restore SUM bit cleared from thread_struct
>                 |
>               <- we resume execution of page fault trap
>              |
>              so we restore SUM bit saved on pt_regs which *has* SUM 
>bit set
>              |
>            <- sret
>          |
>          SUM bit is set and we continue the first syscall.
>
>So based on my wonderful ascii art, it works :) Or did I miss something?

Again I think I missed/confused it in my head when I was trying to ascertain
which `status` will be picked in which situation.

Two questions:

1) In this particular case, there won't be any yielding (kernel induced) between
    `set SUM` and `clear SUM`, right?


2) Will there be nesting of kernel induced events? If not, then I believe current
    patch is good enough. 


If I have to summarize--
- Nesting of `SUM` save/restore across traps is already served by trap entry/exit.
- Kernel induced control flow changes (scheduling) are not allowed between set
   and clear of SUM (and likely future status bits)
- If nesting of kernel induced events dont need to be supported and their
   invocation follow the 2nd rule, then having it in thread_struct makes sense.
   I would ideally call it something else to indicate intentionality.


Let me know if I got it right this time?


>
>
>>
>>We can support such nesting only by ensuring below
>>
>>On trap entry do - save `status` in `pt_regs` or some other FILO 
>>data structure
>>- clear SUM (and other bits needed to be cleared)
>>
>>On trap return do
>>- reload `status` from `pt_regs` or some FILO data structure
>>
>>Quite analogous to what we do for SR_IE as well.
>>
>>>
>>>That's why I don't really like Deepak's proposal below as it mixes 
>>>both and I find it tricky.
>>>
>>>I can't find a situation where saving/restoring the entire sstatus 
>>>at context-switch is a problem though, does anyone have such thing 
>>>in mind?
>>>
>>>Finally I understand that having another copy of sstatus in 
>>>thread_struct is not intuitive and we should, either explain why 
>>>or only store the SUM bit (like for sstatus.VS).
>>>
>>>Please continue the discussion as we need to find a solution that 
>>>pleases everyone soon :)
>>>
>>>Thanks all for jumping in,
>>>
>>>Alex
>>>
>>>
>>>>
>>>>
>>>>IMHO, the problem we are trying to solve in this patch is easily 
>>>>solvable in
>>>>below manner.
>>>>
>>>>
>>>>diff --git a/arch/riscv/include/asm/switch_to.h 
>>>>b/arch/riscv/include/asm/switch_to.h
>>>>index 0e71eb82f920..499d00a6fb67 100644
>>>>--- a/arch/riscv/include/asm/switch_to.h
>>>>+++ b/arch/riscv/include/asm/switch_to.h
>>>>@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct 
>>>>task_struct *prev,
>>>>        fstate_restore(next, task_pt_regs(next));
>>>> }
>>>>
>>>>+static inline void __switch_to_status(struct task_struct *prev,
>>>>+                                  struct task_struct *next)
>>>>+{
>>>>+       struct pt_regs *regs;
>>>>+
>>>>+       /* save status */
>>>>+       regs = task_pt_regs(prev);
>>>>+       regs->status = csr_read(CSR_STATUS);
>>>>+
>>>>+       /* restore status */
>>>>+       regs = task_pt_regs(next);
>>>>+       csr_write(CSR_STATUS, regs->status);
>>>>+}
>>>>+
>>>> static __always_inline bool has_fpu(void)
>>>> {
>>>>        return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
>>>>@@ -115,6 +129,7 @@ do 
>>>>{                                                        \
>>>>        struct task_struct *__prev = (prev);            \
>>>>        struct task_struct *__next = (next);            \
>>>>        __set_prev_cpu(__prev->thread);                 \
>>>>+       __switch_to_status(__prev, __next)              \
>>>>        if (has_fpu())                                  \
>>>>                __switch_to_fpu(__prev, __next);        \
>>>>        if (has_vector() || has_xtheadvector())         \
>>>>diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>index 8d25837a9384..a3b98c1be055 100644
>>>>--- a/arch/riscv/kernel/entry.S
>>>>+++ b/arch/riscv/kernel/entry.S
>>>>@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
>>>>        REG_S x5,  PT_T0(sp)
>>>>        save_from_x6_to_x31
>>>>
>>>>-       /*
>>>>-        * Disable user-mode memory access as it should only be 
>>>>set in the
>>>>-        * actual user copy routines.
>>>>-        *
>>>>-        * Disable the FPU/Vector to detect illegal usage of 
>>>>floating point
>>>>-        * or vector in kernel space.
>>>>-        */
>>>>-       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>>>-
>>>>        REG_L s0, TASK_TI_USER_SP(tp)
>>>>-       csrrc s1, CSR_STATUS, t0
>>>>+       csrr s1, CSR_STATUS
>>>>        save_userssp s2, s1
>>>>        csrr s2, CSR_EPC
>>>>        csrr s3, CSR_TVAL
>>>>@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
>>>>        REG_S s4, PT_CAUSE(sp)
>>>>        REG_S s5, PT_TP(sp)
>>>>
>>>>+       /*
>>>>+        * It is fresh trap entry. Disable user-mode memory 
>>>>access as it should only be set in the
>>>>+        * actual user copy routines.
>>>>+        *
>>>>+        * Disable the FPU/Vector to detect illegal usage of 
>>>>floating point
>>>>+        * or vector in kernel space.
>>>>+        */
>>>>+       li t0, SR_SUM | SR_FS_VS | SR_ELP
>>>>+       csrrc s1, CSR_STATUS, t0
>>>>+
>>>>        /*
>>>>         * Set the scratch register to 0, so that if a recursive 
>>>>exception
>>>>         * occurs, the exception vector knows it came from the kernel
>>>>
>>>>
>>>>
>>>>During the time spent in kernel if sets SUM bit in status then, above
>>>>`__switch_to_status` will ensure that `status` will get saved 
>>>>for current
>>>>thread and restored for next thread.
>>>>
>>>>Furthermore, current trap entry code clears FS/VS/SUM (for right 
>>>>reasons). It
>>>>represents non-linear change of control flow and thus whatever 
>>>>will execute next
>>>>shouldn't need SUM/FS/VS unless it wants to set it). This patch 
>>>>slightly
>>>>modifies the flow by first saving the `status` on trap frame 
>>>>(thus if previous
>>>>trap frame had SUM=1, it will be saved and restored). And then it
>>>>unconditionally clears the SUM/FS/VS to ensure that this new 
>>>>trap context runs
>>>>without needing SUM=1. This ensures nesting of trap frames 
>>>>without diluting
>>>>security properties of SUM.
>>>>
>>>>>
>>>>>Thanks,
>>>>>Andy
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>So my first question was why not to use `status` in 
>>>>>>`pt_regs`. It is granular
>>>>>>as it can get (it is available per thread context per trap basis).
>>>>>>
>>>>>>
>>>>>>I did ask Alex as well. I'll ping him again.
>>>>>>
>>>>>>>
>>>>>>>Does anyone else have any comment on this?
>>>>>>>
>>>>>>>>
>>>>>>>>>>    u32 riscv_v_flags;
>>>>>>>>>>    u32 vstate_ctrl;
>>>>>>>>>>    struct __riscv_v_ext_state vstate;
>>>>>>>>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>>>b/arch/riscv/kernel/asm- offsets.c
>>>>>>>>>>index 16490755304e..969c65b1fe41 100644
>>>>>>>>>>--- a/arch/riscv/kernel/asm-offsets.c
>>>>>>>>>>+++ b/arch/riscv/kernel/asm-offsets.c
>>>>>>>>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>>>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>>>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>>>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>linux-riscv mailing list
>>>>>>>>linux-riscv@lists.infradead.org
>>>>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>-- 
>>>>>>>Ben Dooks http://www.codethink.co.uk/
>>>>>>>Senior Engineer                                Codethink -
>>>>>>Providing Genius
>>>>>>>
>>>>>>>https://www.codethink.co.uk/privacy.html
>>>>>>
>>>>>>_______________________________________________
>>>>>>linux-riscv mailing list
>>>>>>linux-riscv@lists.infradead.org
>>>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>
>>>>_______________________________________________
>>>>linux-riscv mailing list
>>>>linux-riscv@lists.infradead.org
>>>>http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
>>_______________________________________________
>>linux-riscv mailing list
>>linux-riscv@lists.infradead.org
>>http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Alexandre Ghiti 9 months, 3 weeks ago

Hi Deepak,

On 23/04/2025 01:01, Deepak Gupta wrote:
> On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>> When threads/tasks are switched we need to ensure the old execution's
>> SR_SUM state is saved and the new thread has the old SR_SUM state
>> restored.
>>
>> The issue was seen under heavy load especially with the syz-stress tool
>> running, with crashes as follows in schedule_tail:
>>
>> Unable to handle kernel access to user memory without uaccess routines
>> at virtual address 000000002749f0d0
>> Oops [#1]
>> Modules linked in:
>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> Hardware name: riscv-virtio,qemu (DT)
>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> t5 : ffffffc4043cafba t6 : 0000000000040000
>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> 000000000000000f
>> Call Trace:
>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>> Dumping ftrace buffer:
>>   (ftrace buffer empty)
>> ---[ end trace b5f8f9231dc87dda ]---
>>
>> The issue comes from the put_user() in schedule_tail
>> (kernel/sched/core.c) doing the following:
>>
>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> {
>> ...
>>        if (current->set_child_tid)
>>                put_user(task_pid_vnr(current), current->set_child_tid);
>> ...
>> }
>>
>> the put_user() macro causes the code sequence to come out as follows:
>>
>> 1:    __enable_user_access()
>> 2:    reg = task_pid_vnr(current);
>> 3:    *current->set_child_tid = reg;
>> 4:    __disable_user_access()
>>
>> The problem is that we may have a sleeping function as argument which
>> could clear SR_SUM causing the panic above. This was fixed by
>> evaluating the argument of the put_user() macro outside the user-enabled
>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> enabling user access")"
>>
>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> to avoid the same issue we had with put_user() and sleeping functions we
>> must ensure code flow can go through switch_to() from within a region of
>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> patch addresses the problem allowing future work to enable full use of
>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> on every access. Make switch_to() save and restore SR_SUM.
>>
>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> ---
>> arch/riscv/include/asm/processor.h | 1 +
>> arch/riscv/kernel/asm-offsets.c    | 5 +++++
>> arch/riscv/kernel/entry.S          | 8 ++++++++
>> 3 files changed, 14 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/processor.h 
>> b/arch/riscv/include/asm/processor.h
>> index 5f56eb9d114a..58fd11c89fe9 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -103,6 +103,7 @@ struct thread_struct {
>>     struct __riscv_d_ext_state fstate;
>>     unsigned long bad_cause;
>>     unsigned long envcfg;
>> +    unsigned long status;
>>     u32 riscv_v_flags;
>>     u32 vstate_ctrl;
>>     struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/kernel/asm-offsets.c 
>> b/arch/riscv/kernel/asm-offsets.c
>> index 16490755304e..969c65b1fe41 100644
>> --- a/arch/riscv/kernel/asm-offsets.c
>> +++ b/arch/riscv/kernel/asm-offsets.c
>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>     OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>     OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>     OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>
>>     OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>     OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>> thread_info.preempt_count);
>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>           offsetof(struct task_struct, thread.s[11])
>>         - offsetof(struct task_struct, thread.ra)
>>     );
>> +    DEFINE(TASK_THREAD_STATUS_RA,
>> +          offsetof(struct task_struct, thread.status)
>> +        - offsetof(struct task_struct, thread.ra)
>> +    );
>>
>>     DEFINE(TASK_THREAD_F0_F0,
>>           offsetof(struct task_struct, thread.fstate.f[0])
>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>> --- a/arch/riscv/kernel/entry.S
>> +++ b/arch/riscv/kernel/entry.S
>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>     REG_S s9,  TASK_THREAD_S9_RA(a3)
>>     REG_S s10, TASK_THREAD_S10_RA(a3)
>>     REG_S s11, TASK_THREAD_S11_RA(a3)
>> +
>> +    /* save the user space access flag */
>> +    li    s0, SR_SUM
>> +    csrr  s1, CSR_STATUS
>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>> +
>>     /* Save the kernel shadow call stack pointer */
>>     scs_save_current
>>     /* Restore context from next->thread */
>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>> +    csrs  CSR_STATUS, s0
>>     REG_L ra,  TASK_THREAD_RA_RA(a4)
>>     REG_L sp,  TASK_THREAD_SP_RA(a4)
>>     REG_L s0,  TASK_THREAD_S0_RA(a4)
>
> Reviewed-by: Deepak Gupta <debug@rivosinc.com>
>
> Note to alex ghiti,
>
> If this goes in before cfi changes, I might have to re-work some of the
> changes with respect to zicfilp handling. zicfilp introduces `elp` state
> in `sstatus`.


This patchset is in my for-next branch, CFI depends on SBI v3.0 so we 
can't know for sure it will get merged in 6.16.

So I advise you to rebase on top of this patchset :)

Thanks,

Alex


>
>> -- 
>> 2.34.1
>>
>>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Alexandre Ghiti 9 months, 3 weeks ago

Hi Cyril,

On 10/04/2025 09:05, Cyril Bur wrote:
> From: Ben Dooks <ben.dooks@codethink.co.uk>
>
> When threads/tasks are switched we need to ensure the old execution's
> SR_SUM state is saved and the new thread has the old SR_SUM state
> restored.
>
> The issue was seen under heavy load especially with the syz-stress tool
> running, with crashes as follows in schedule_tail:
>
> Unable to handle kernel access to user memory without uaccess routines
> at virtual address 000000002749f0d0
> Oops [#1]
> Modules linked in:
> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> Hardware name: riscv-virtio,qemu (DT)
> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>   ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>   ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>   gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>   t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>   s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>   a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>   a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>   s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>   s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>   s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>   s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>   t5 : ffffffc4043cafba t6 : 0000000000040000
> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> 000000000000000f
> Call Trace:
> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> [<ffffffe000005570>] ret_from_exception+0x0/0x14
> Dumping ftrace buffer:
>     (ftrace buffer empty)
> ---[ end trace b5f8f9231dc87dda ]---
>
> The issue comes from the put_user() in schedule_tail
> (kernel/sched/core.c) doing the following:
>
> asmlinkage __visible void schedule_tail(struct task_struct *prev)
> {
> ...
>          if (current->set_child_tid)
>                  put_user(task_pid_vnr(current), current->set_child_tid);
> ...
> }
>
> the put_user() macro causes the code sequence to come out as follows:
>
> 1:	__enable_user_access()
> 2:	reg = task_pid_vnr(current);
> 3:	*current->set_child_tid = reg;
> 4:	__disable_user_access()
>
> The problem is that we may have a sleeping function as argument which
> could clear SR_SUM causing the panic above. This was fixed by
> evaluating the argument of the put_user() macro outside the user-enabled
> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
> enabling user access")"
>
> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> to avoid the same issue we had with put_user() and sleeping functions we
> must ensure code flow can go through switch_to() from within a region of
> code with SR_SUM enabled and come back with SR_SUM still enabled. This
> patch addresses the problem allowing future work to enable full use of
> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> on every access. Make switch_to() save and restore SR_SUM.
>
> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> ---
>   arch/riscv/include/asm/processor.h | 1 +
>   arch/riscv/kernel/asm-offsets.c    | 5 +++++
>   arch/riscv/kernel/entry.S          | 8 ++++++++
>   3 files changed, 14 insertions(+)
>
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 5f56eb9d114a..58fd11c89fe9 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -103,6 +103,7 @@ struct thread_struct {
>   	struct __riscv_d_ext_state fstate;
>   	unsigned long bad_cause;
>   	unsigned long envcfg;
> +	unsigned long status;
>   	u32 riscv_v_flags;
>   	u32 vstate_ctrl;
>   	struct __riscv_v_ext_state vstate;
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index 16490755304e..969c65b1fe41 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -34,6 +34,7 @@ void asm_offsets(void)
>   	OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>   	OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>   	OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> +	OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>   
>   	OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>   	OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> @@ -346,6 +347,10 @@ void asm_offsets(void)
>   		  offsetof(struct task_struct, thread.s[11])
>   		- offsetof(struct task_struct, thread.ra)
>   	);
> +	DEFINE(TASK_THREAD_STATUS_RA,
> +		  offsetof(struct task_struct, thread.status)
> +		- offsetof(struct task_struct, thread.ra)
> +	);
>   
>   	DEFINE(TASK_THREAD_F0_F0,
>   		  offsetof(struct task_struct, thread.fstate.f[0])
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 33a5a9f2a0d4..00bd0de9faa2 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>   	REG_S s9,  TASK_THREAD_S9_RA(a3)
>   	REG_S s10, TASK_THREAD_S10_RA(a3)
>   	REG_S s11, TASK_THREAD_S11_RA(a3)
> +
> +	/* save the user space access flag */
> +	li    s0, SR_SUM


This is not needed anymore ^ but I'll remove it when merging your patchset.


> +	csrr  s1, CSR_STATUS
> +	REG_S s1, TASK_THREAD_STATUS_RA(a3)
> +
>   	/* Save the kernel shadow call stack pointer */
>   	scs_save_current
>   	/* Restore context from next->thread */
> +	REG_L s0,  TASK_THREAD_STATUS_RA(a4)
> +	csrs  CSR_STATUS, s0
>   	REG_L ra,  TASK_THREAD_RA_RA(a4)
>   	REG_L sp,  TASK_THREAD_SP_RA(a4)
>   	REG_L s0,  TASK_THREAD_S0_RA(a4)

Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>

Thanks for the multiple revisions!

Alex

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Ben Dooks 8 months, 3 weeks ago

On 22/04/2025 11:22, Alexandre Ghiti wrote:
> Hi Cyril,
> 
> On 10/04/2025 09:05, Cyril Bur wrote:
>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>
>> When threads/tasks are switched we need to ensure the old execution's
>> SR_SUM state is saved and the new thread has the old SR_SUM state
>> restored.
>>
>> The issue was seen under heavy load especially with the syz-stress tool
>> running, with crashes as follows in schedule_tail:
>>
>> Unable to handle kernel access to user memory without uaccess routines
>> at virtual address 000000002749f0d0
>> Oops [#1]
>> Modules linked in:
>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> Hardware name: riscv-virtio,qemu (DT)
>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>   ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>   ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>   gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>   t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>   s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>   a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>   a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>   s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>   s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>   s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>   s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>   t5 : ffffffc4043cafba t6 : 0000000000040000
>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> 000000000000000f
>> Call Trace:
>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>> Dumping ftrace buffer:
>>     (ftrace buffer empty)
>> ---[ end trace b5f8f9231dc87dda ]---
>>
>> The issue comes from the put_user() in schedule_tail
>> (kernel/sched/core.c) doing the following:
>>
>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> {
>> ...
>>          if (current->set_child_tid)
>>                  put_user(task_pid_vnr(current), current->set_child_tid);
>> ...
>> }
>>
>> the put_user() macro causes the code sequence to come out as follows:
>>
>> 1:    __enable_user_access()
>> 2:    reg = task_pid_vnr(current);
>> 3:    *current->set_child_tid = reg;
>> 4:    __disable_user_access()
>>
>> The problem is that we may have a sleeping function as argument which
>> could clear SR_SUM causing the panic above. This was fixed by
>> evaluating the argument of the put_user() macro outside the user-enabled
>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> enabling user access")"
>>
>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> to avoid the same issue we had with put_user() and sleeping functions we
>> must ensure code flow can go through switch_to() from within a region of
>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> patch addresses the problem allowing future work to enable full use of
>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> on every access. Make switch_to() save and restore SR_SUM.
>>
>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>> ---
>>   arch/riscv/include/asm/processor.h | 1 +
>>   arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>   arch/riscv/kernel/entry.S          | 8 ++++++++
>>   3 files changed, 14 insertions(+)
>>
>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ 
>> asm/processor.h
>> index 5f56eb9d114a..58fd11c89fe9 100644
>> --- a/arch/riscv/include/asm/processor.h
>> +++ b/arch/riscv/include/asm/processor.h
>> @@ -103,6 +103,7 @@ struct thread_struct {
>>       struct __riscv_d_ext_state fstate;
>>       unsigned long bad_cause;
>>       unsigned long envcfg;
>> +    unsigned long status;
>>       u32 riscv_v_flags;
>>       u32 vstate_ctrl;
>>       struct __riscv_v_ext_state vstate;
>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- 
>> offsets.c
>> index 16490755304e..969c65b1fe41 100644
>> --- a/arch/riscv/kernel/asm-offsets.c
>> +++ b/arch/riscv/kernel/asm-offsets.c
>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>       OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>       OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>       OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>       OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>> thread_info.preempt_count);
>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>             offsetof(struct task_struct, thread.s[11])
>>           - offsetof(struct task_struct, thread.ra)
>>       );
>> +    DEFINE(TASK_THREAD_STATUS_RA,
>> +          offsetof(struct task_struct, thread.status)
>> +        - offsetof(struct task_struct, thread.ra)
>> +    );
>>       DEFINE(TASK_THREAD_F0_F0,
>>             offsetof(struct task_struct, thread.fstate.f[0])
>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>> --- a/arch/riscv/kernel/entry.S
>> +++ b/arch/riscv/kernel/entry.S
>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>       REG_S s9,  TASK_THREAD_S9_RA(a3)
>>       REG_S s10, TASK_THREAD_S10_RA(a3)
>>       REG_S s11, TASK_THREAD_S11_RA(a3)
>> +
>> +    /* save the user space access flag */
>> +    li    s0, SR_SUM
> 
> 
> This is not needed anymore ^ but I'll remove it when merging your patchset.
> 

Could you be more specific about what "this" is?

If we don't save/restore the SR_SUM bit I think our old friend
the sched_tail bug will just return.


>> +    csrr  s1, CSR_STATUS
>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>> +
>>       /* Save the kernel shadow call stack pointer */
>>       scs_save_current
>>       /* Restore context from next->thread */
>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>> +    csrs  CSR_STATUS, s0
>>       REG_L ra,  TASK_THREAD_RA_RA(a4)
>>       REG_L sp,  TASK_THREAD_SP_RA(a4)
>>       REG_L s0,  TASK_THREAD_S0_RA(a4)
> 
> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> 
> Thanks for the multiple revisions!
> 
> Alex
> 
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
> 


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Samuel Holland 8 months, 3 weeks ago

Hi Alex, Ben,

On 2025-05-21 3:26 AM, Ben Dooks wrote:
> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>> Hi Cyril,
>>
>> On 10/04/2025 09:05, Cyril Bur wrote:
>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>
>>> When threads/tasks are switched we need to ensure the old execution's
>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>> restored.
>>>
>>> The issue was seen under heavy load especially with the syz-stress tool
>>> running, with crashes as follows in schedule_tail:
>>>
>>> Unable to handle kernel access to user memory without uaccess routines
>>> at virtual address 000000002749f0d0
>>> Oops [#1]
>>> Modules linked in:
>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>> Hardware name: riscv-virtio,qemu (DT)
>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>   ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>   ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>   gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>   t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>   s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>   a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>   a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>   s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>   s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>   s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>   s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>   t5 : ffffffc4043cafba t6 : 0000000000040000
>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>> 000000000000000f
>>> Call Trace:
>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>> Dumping ftrace buffer:
>>>     (ftrace buffer empty)
>>> ---[ end trace b5f8f9231dc87dda ]---
>>>
>>> The issue comes from the put_user() in schedule_tail
>>> (kernel/sched/core.c) doing the following:
>>>
>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>> {
>>> ...
>>>          if (current->set_child_tid)
>>>                  put_user(task_pid_vnr(current), current->set_child_tid);
>>> ...
>>> }
>>>
>>> the put_user() macro causes the code sequence to come out as follows:
>>>
>>> 1:    __enable_user_access()
>>> 2:    reg = task_pid_vnr(current);
>>> 3:    *current->set_child_tid = reg;
>>> 4:    __disable_user_access()
>>>
>>> The problem is that we may have a sleeping function as argument which
>>> could clear SR_SUM causing the panic above. This was fixed by
>>> evaluating the argument of the put_user() macro outside the user-enabled
>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>> enabling user access")"
>>>
>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>> to avoid the same issue we had with put_user() and sleeping functions we
>>> must ensure code flow can go through switch_to() from within a region of
>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>> patch addresses the problem allowing future work to enable full use of
>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>> on every access. Make switch_to() save and restore SR_SUM.
>>>
>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>> ---
>>>   arch/riscv/include/asm/processor.h | 1 +
>>>   arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>   arch/riscv/kernel/entry.S          | 8 ++++++++
>>>   3 files changed, 14 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
>>> processor.h
>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>> --- a/arch/riscv/include/asm/processor.h
>>> +++ b/arch/riscv/include/asm/processor.h
>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>       struct __riscv_d_ext_state fstate;
>>>       unsigned long bad_cause;
>>>       unsigned long envcfg;
>>> +    unsigned long status;
>>>       u32 riscv_v_flags;
>>>       u32 vstate_ctrl;
>>>       struct __riscv_v_ext_state vstate;
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
>>> index 16490755304e..969c65b1fe41 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>       OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>       OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>       OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>       OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>       OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>             offsetof(struct task_struct, thread.s[11])
>>>           - offsetof(struct task_struct, thread.ra)
>>>       );
>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>> +          offsetof(struct task_struct, thread.status)
>>> +        - offsetof(struct task_struct, thread.ra)
>>> +    );
>>>       DEFINE(TASK_THREAD_F0_F0,
>>>             offsetof(struct task_struct, thread.fstate.f[0])
>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>> --- a/arch/riscv/kernel/entry.S
>>> +++ b/arch/riscv/kernel/entry.S
>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>       REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>       REG_S s10, TASK_THREAD_S10_RA(a3)
>>>       REG_S s11, TASK_THREAD_S11_RA(a3)
>>> +
>>> +    /* save the user space access flag */
>>> +    li    s0, SR_SUM
>>
>>
>> This is not needed anymore ^ but I'll remove it when merging your patchset.
>>
> 
> Could you be more specific about what "this" is?
> 
> If we don't save/restore the SR_SUM bit I think our old friend
> the sched_tail bug will just return.

I think Alex is saying the `li` instruction above is not needed because s0 is
unused. But instead I think there is an `and` instruction missing here. The
patch as merged ORs the entirety of the old sstatus with the new sstatus, not
just the SUM bit, which seems extremely dangerous.

Regards,
Samuel

>>> +    csrr  s1, CSR_STATUS
>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>> +
>>>       /* Save the kernel shadow call stack pointer */
>>>       scs_save_current
>>>       /* Restore context from next->thread */
>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>> +    csrs  CSR_STATUS, s0
>>>       REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>       REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>       REG_L s0,  TASK_THREAD_S0_RA(a4)
>>
>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>
>> Thanks for the multiple revisions!
>>
>> Alex
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>
> 
>

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Alexandre Ghiti 8 months, 3 weeks ago

Hi Samuel,

On 5/21/25 15:38, Samuel Holland wrote:
> Hi Alex, Ben,
>
> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>> Hi Cyril,
>>>
>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>
>>>> When threads/tasks are switched we need to ensure the old execution's
>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>> restored.
>>>>
>>>> The issue was seen under heavy load especially with the syz-stress tool
>>>> running, with crashes as follows in schedule_tail:
>>>>
>>>> Unable to handle kernel access to user memory without uaccess routines
>>>> at virtual address 000000002749f0d0
>>>> Oops [#1]
>>>> Modules linked in:
>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>> Hardware name: riscv-virtio,qemu (DT)
>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>> 000000000000000f
>>>> Call Trace:
>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>> Dumping ftrace buffer:
>>>>      (ftrace buffer empty)
>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>
>>>> The issue comes from the put_user() in schedule_tail
>>>> (kernel/sched/core.c) doing the following:
>>>>
>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>> {
>>>> ...
>>>>           if (current->set_child_tid)
>>>>                   put_user(task_pid_vnr(current), current->set_child_tid);
>>>> ...
>>>> }
>>>>
>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>
>>>> 1:    __enable_user_access()
>>>> 2:    reg = task_pid_vnr(current);
>>>> 3:    *current->set_child_tid = reg;
>>>> 4:    __disable_user_access()
>>>>
>>>> The problem is that we may have a sleeping function as argument which
>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>> evaluating the argument of the put_user() macro outside the user-enabled
>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>> enabling user access")"
>>>>
>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>> to avoid the same issue we had with put_user() and sleeping functions we
>>>> must ensure code flow can go through switch_to() from within a region of
>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>> patch addresses the problem allowing future work to enable full use of
>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>
>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>> ---
>>>>    arch/riscv/include/asm/processor.h | 1 +
>>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>    3 files changed, 14 insertions(+)
>>>>
>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
>>>> processor.h
>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>> --- a/arch/riscv/include/asm/processor.h
>>>> +++ b/arch/riscv/include/asm/processor.h
>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>        struct __riscv_d_ext_state fstate;
>>>>        unsigned long bad_cause;
>>>>        unsigned long envcfg;
>>>> +    unsigned long status;
>>>>        u32 riscv_v_flags;
>>>>        u32 vstate_ctrl;
>>>>        struct __riscv_v_ext_state vstate;
>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
>>>> index 16490755304e..969c65b1fe41 100644
>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>              offsetof(struct task_struct, thread.s[11])
>>>>            - offsetof(struct task_struct, thread.ra)
>>>>        );
>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>> +          offsetof(struct task_struct, thread.status)
>>>> +        - offsetof(struct task_struct, thread.ra)
>>>> +    );
>>>>        DEFINE(TASK_THREAD_F0_F0,
>>>>              offsetof(struct task_struct, thread.fstate.f[0])
>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>> --- a/arch/riscv/kernel/entry.S
>>>> +++ b/arch/riscv/kernel/entry.S
>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
>>>> +
>>>> +    /* save the user space access flag */
>>>> +    li    s0, SR_SUM
>>>
>>> This is not needed anymore ^ but I'll remove it when merging your patchset.
>>>
>> Could you be more specific about what "this" is?
>>
>> If we don't save/restore the SR_SUM bit I think our old friend
>> the sched_tail bug will just return.
> I think Alex is saying the `li` instruction above is not needed because s0 is
> unused. But instead I think there is an `and` instruction missing here. The
> patch as merged ORs the entirety of the old sstatus with the new sstatus, not
> just the SUM bit, which seems extremely dangerous.


I should have checked the definition of csrs, I thought it would write 
the csr, but you're right it ORs with the current csr value which isn't 
good at all.

@Cyril Can you send a patch for that? Which also removes the `li` 
instruction that I forgot to remove :) I think we can even ask Palmer to 
squash those fixes directly into the patch.

Let me know if you can't do it and I'll do.

Thanks Samuel for noticing,

Alex


>
> Regards,
> Samuel
>
>>>> +    csrr  s1, CSR_STATUS
>>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>>> +
>>>>        /* Save the kernel shadow call stack pointer */
>>>>        scs_save_current
>>>>        /* Restore context from next->thread */
>>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>>> +    csrs  CSR_STATUS, s0
>>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>
>>> Thanks for the multiple revisions!
>>>
>>> Alex
>>>
>>>
>>> _______________________________________________
>>> linux-riscv mailing list
>>> linux-riscv@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>
>>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Andy Chiu 8 months, 3 weeks ago

Hi Samuel and Alex,

On Wed, May 21, 2025 at 10:35 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>
> Hi Samuel,
>
> On 5/21/25 15:38, Samuel Holland wrote:
> > Hi Alex, Ben,
> >
> > On 2025-05-21 3:26 AM, Ben Dooks wrote:
> >> On 22/04/2025 11:22, Alexandre Ghiti wrote:
> >>> Hi Cyril,
> >>>
> >>> On 10/04/2025 09:05, Cyril Bur wrote:
> >>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>>
> >>>> When threads/tasks are switched we need to ensure the old execution's
> >>>> SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>> restored.
> >>>>
> >>>> The issue was seen under heavy load especially with the syz-stress tool
> >>>> running, with crashes as follows in schedule_tail:
> >>>>
> >>>> Unable to handle kernel access to user memory without uaccess routines
> >>>> at virtual address 000000002749f0d0
> >>>> Oops [#1]
> >>>> Modules linked in:
> >>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>> Hardware name: riscv-virtio,qemu (DT)
> >>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>> 000000000000000f
> >>>> Call Trace:
> >>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>> Dumping ftrace buffer:
> >>>>      (ftrace buffer empty)
> >>>> ---[ end trace b5f8f9231dc87dda ]---
> >>>>
> >>>> The issue comes from the put_user() in schedule_tail
> >>>> (kernel/sched/core.c) doing the following:
> >>>>
> >>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>> {
> >>>> ...
> >>>>           if (current->set_child_tid)
> >>>>                   put_user(task_pid_vnr(current), current->set_child_tid);
> >>>> ...
> >>>> }
> >>>>
> >>>> the put_user() macro causes the code sequence to come out as follows:
> >>>>
> >>>> 1:    __enable_user_access()
> >>>> 2:    reg = task_pid_vnr(current);
> >>>> 3:    *current->set_child_tid = reg;
> >>>> 4:    __disable_user_access()
> >>>>
> >>>> The problem is that we may have a sleeping function as argument which
> >>>> could clear SR_SUM causing the panic above. This was fixed by
> >>>> evaluating the argument of the put_user() macro outside the user-enabled
> >>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
> >>>> enabling user access")"
> >>>>
> >>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> >>>> to avoid the same issue we had with put_user() and sleeping functions we
> >>>> must ensure code flow can go through switch_to() from within a region of
> >>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
> >>>> patch addresses the problem allowing future work to enable full use of
> >>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> >>>> on every access. Make switch_to() save and restore SR_SUM.
> >>>>
> >>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
> >>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
> >>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
> >>>> ---
> >>>>    arch/riscv/include/asm/processor.h | 1 +
> >>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>    3 files changed, 14 insertions(+)
> >>>>
> >>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
> >>>> processor.h
> >>>> index 5f56eb9d114a..58fd11c89fe9 100644
> >>>> --- a/arch/riscv/include/asm/processor.h
> >>>> +++ b/arch/riscv/include/asm/processor.h
> >>>> @@ -103,6 +103,7 @@ struct thread_struct {
> >>>>        struct __riscv_d_ext_state fstate;
> >>>>        unsigned long bad_cause;
> >>>>        unsigned long envcfg;
> >>>> +    unsigned long status;
> >>>>        u32 riscv_v_flags;
> >>>>        u32 vstate_ctrl;
> >>>>        struct __riscv_v_ext_state vstate;
> >>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
> >>>> index 16490755304e..969c65b1fe41 100644
> >>>> --- a/arch/riscv/kernel/asm-offsets.c
> >>>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
> >>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
> >>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
> >>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
> >>>>              offsetof(struct task_struct, thread.s[11])
> >>>>            - offsetof(struct task_struct, thread.ra)
> >>>>        );
> >>>> +    DEFINE(TASK_THREAD_STATUS_RA,
> >>>> +          offsetof(struct task_struct, thread.status)
> >>>> +        - offsetof(struct task_struct, thread.ra)
> >>>> +    );
> >>>>        DEFINE(TASK_THREAD_F0_F0,
> >>>>              offsetof(struct task_struct, thread.fstate.f[0])
> >>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> >>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
> >>>> --- a/arch/riscv/kernel/entry.S
> >>>> +++ b/arch/riscv/kernel/entry.S
> >>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
> >>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
> >>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
> >>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
> >>>> +
> >>>> +    /* save the user space access flag */
> >>>> +    li    s0, SR_SUM
> >>>
> >>> This is not needed anymore ^ but I'll remove it when merging your patchset.
> >>>
> >> Could you be more specific about what "this" is?
> >>
> >> If we don't save/restore the SR_SUM bit I think our old friend
> >> the sched_tail bug will just return.
> > I think Alex is saying the `li` instruction above is not needed because s0 is
> > unused. But instead I think there is an `and` instruction missing here. The
> > patch as merged ORs the entirety of the old sstatus with the new sstatus, not
> > just the SUM bit, which seems extremely dangerous.
>

Thanks for noticing this. I've also spent a bit of time pondering...

If this were an "and" instruction, I think we should rename the struct
to "status_sum" to prevent confusions, as it only holds the SUM bit
now. Or maybe we could create a bitfield "any only touch "and
save/restore" the specified bit.

Thanks,
Andy


>
> I should have checked the definition of csrs, I thought it would write
> the csr, but you're right it ORs with the current csr value which isn't
> good at all.
>
> @Cyril Can you send a patch for that? Which also removes the `li`
> instruction that I forgot to remove :) I think we can even ask Palmer to
> squash those fixes directly into the patch.
>
> Let me know if you can't do it and I'll do.
>
> Thanks Samuel for noticing,
>
> Alex
>
>
> >
> > Regards,
> > Samuel
> >
> >>>> +    csrr  s1, CSR_STATUS
> >>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
> >>>> +
> >>>>        /* Save the kernel shadow call stack pointer */
> >>>>        scs_save_current
> >>>>        /* Restore context from next->thread */
> >>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
> >>>> +    csrs  CSR_STATUS, s0
> >>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
> >>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
> >>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
> >>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> >>>
> >>> Thanks for the multiple revisions!
> >>>
> >>> Alex
> >>>
> >>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@lists.infradead.org
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>>
> >>
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-riscv
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Ben Dooks 8 months, 3 weeks ago

On 22/05/2025 18:40, Andy Chiu wrote:
> Hi Samuel and Alex,
> 
> On Wed, May 21, 2025 at 10:35 PM Alexandre Ghiti <alex@ghiti.fr> wrote:
>>
>> Hi Samuel,
>>
>> On 5/21/25 15:38, Samuel Holland wrote:
>>> Hi Alex, Ben,
>>>
>>> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>>>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>>>> Hi Cyril,
>>>>>
>>>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>>
>>>>>> When threads/tasks are switched we need to ensure the old execution's
>>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>>> restored.
>>>>>>
>>>>>> The issue was seen under heavy load especially with the syz-stress tool
>>>>>> running, with crashes as follows in schedule_tail:
>>>>>>
>>>>>> Unable to handle kernel access to user memory without uaccess routines
>>>>>> at virtual address 000000002749f0d0
>>>>>> Oops [#1]
>>>>>> Modules linked in:
>>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>>     ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>>     ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>>     gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>>     t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>>     s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>>     a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>>     a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>>     s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>>     s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>>     s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>>     s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>>     t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>>> 000000000000000f
>>>>>> Call Trace:
>>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>>> Dumping ftrace buffer:
>>>>>>       (ftrace buffer empty)
>>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>>
>>>>>> The issue comes from the put_user() in schedule_tail
>>>>>> (kernel/sched/core.c) doing the following:
>>>>>>
>>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>>> {
>>>>>> ...
>>>>>>            if (current->set_child_tid)
>>>>>>                    put_user(task_pid_vnr(current), current->set_child_tid);
>>>>>> ...
>>>>>> }
>>>>>>
>>>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>>>
>>>>>> 1:    __enable_user_access()
>>>>>> 2:    reg = task_pid_vnr(current);
>>>>>> 3:    *current->set_child_tid = reg;
>>>>>> 4:    __disable_user_access()
>>>>>>
>>>>>> The problem is that we may have a sleeping function as argument which
>>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>>> evaluating the argument of the put_user() macro outside the user-enabled
>>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>>> enabling user access")"
>>>>>>
>>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>>>>>> to avoid the same issue we had with put_user() and sleeping functions we
>>>>>> must ensure code flow can go through switch_to() from within a region of
>>>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>>> patch addresses the problem allowing future work to enable full use of
>>>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>>
>>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>>> ---
>>>>>>     arch/riscv/include/asm/processor.h | 1 +
>>>>>>     arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>>     arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>>     3 files changed, 14 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/ asm/
>>>>>> processor.h
>>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>>         struct __riscv_d_ext_state fstate;
>>>>>>         unsigned long bad_cause;
>>>>>>         unsigned long envcfg;
>>>>>> +    unsigned long status;
>>>>>>         u32 riscv_v_flags;
>>>>>>         u32 vstate_ctrl;
>>>>>>         struct __riscv_v_ext_state vstate;
>>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm- offsets.c
>>>>>> index 16490755304e..969c65b1fe41 100644
>>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>>         OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>>         OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>>         OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>>>         OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>>>         OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, thread_info.preempt_count);
>>>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>>>               offsetof(struct task_struct, thread.s[11])
>>>>>>             - offsetof(struct task_struct, thread.ra)
>>>>>>         );
>>>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>>>> +          offsetof(struct task_struct, thread.status)
>>>>>> +        - offsetof(struct task_struct, thread.ra)
>>>>>> +    );
>>>>>>         DEFINE(TASK_THREAD_F0_F0,
>>>>>>               offsetof(struct task_struct, thread.fstate.f[0])
>>>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>>>> --- a/arch/riscv/kernel/entry.S
>>>>>> +++ b/arch/riscv/kernel/entry.S
>>>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>>>         REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>>>         REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>>>         REG_S s11, TASK_THREAD_S11_RA(a3)
>>>>>> +
>>>>>> +    /* save the user space access flag */
>>>>>> +    li    s0, SR_SUM
>>>>>
>>>>> This is not needed anymore ^ but I'll remove it when merging your patchset.
>>>>>
>>>> Could you be more specific about what "this" is?
>>>>
>>>> If we don't save/restore the SR_SUM bit I think our old friend
>>>> the sched_tail bug will just return.
>>> I think Alex is saying the `li` instruction above is not needed because s0 is
>>> unused. But instead I think there is an `and` instruction missing here. The
>>> patch as merged ORs the entirety of the old sstatus with the new sstatus, not
>>> just the SUM bit, which seems extremely dangerous.
>>
> 
> Thanks for noticing this. I've also spent a bit of time pondering...
> 
> If this were an "and" instruction, I think we should rename the struct
> to "status_sum" to prevent confusions, as it only holds the SUM bit
> now. Or maybe we could create a bitfield "any only touch "and
> save/restore" the specified bit.
> 
> Thanks,
> Andy

So, is it worth just saving/restoring all the flags in the SSTATUS
or do we need to have some sort of mask (and if so, are there other
flags we should make sure get saved?)

I don't have time to setup a test system at the moment and I am out
of office until Tuesday 27th anyway with limited email access to my
codethink emails.

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html

Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Cyril Bur 8 months, 3 weeks ago

Hi Alex,

On 21/5/2025 12:30 am, Alexandre Ghiti wrote:
> Hi Samuel,
> 
> On 5/21/25 15:38, Samuel Holland wrote:
>> Hi Alex, Ben,
>>
>> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>>> Hi Cyril,
>>>>
>>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>
>>>>> When threads/tasks are switched we need to ensure the old execution's
>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>> restored.
>>>>>
>>>>> The issue was seen under heavy load especially with the syz-stress 
>>>>> tool
>>>>> running, with crashes as follows in schedule_tail:
>>>>>
>>>>> Unable to handle kernel access to user memory without uaccess routines
>>>>> at virtual address 000000002749f0d0
>>>>> Oops [#1]
>>>>> Modules linked in:
>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>> 000000000000000f
>>>>> Call Trace:
>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>> Dumping ftrace buffer:
>>>>>      (ftrace buffer empty)
>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>
>>>>> The issue comes from the put_user() in schedule_tail
>>>>> (kernel/sched/core.c) doing the following:
>>>>>
>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>> {
>>>>> ...
>>>>>           if (current->set_child_tid)
>>>>>                   put_user(task_pid_vnr(current), current- 
>>>>> >set_child_tid);
>>>>> ...
>>>>> }
>>>>>
>>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>>
>>>>> 1:    __enable_user_access()
>>>>> 2:    reg = task_pid_vnr(current);
>>>>> 3:    *current->set_child_tid = reg;
>>>>> 4:    __disable_user_access()
>>>>>
>>>>> The problem is that we may have a sleeping function as argument which
>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>> evaluating the argument of the put_user() macro outside the user- 
>>>>> enabled
>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>> enabling user access")"
>>>>>
>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros 
>>>>> and
>>>>> to avoid the same issue we had with put_user() and sleeping 
>>>>> functions we
>>>>> must ensure code flow can go through switch_to() from within a 
>>>>> region of
>>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>> patch addresses the problem allowing future work to enable full use of
>>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip 
>>>>> cost
>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>
>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>> ---
>>>>>    arch/riscv/include/asm/processor.h | 1 +
>>>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>    3 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/ 
>>>>> include/ asm/
>>>>> processor.h
>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>        struct __riscv_d_ext_state fstate;
>>>>>        unsigned long bad_cause;
>>>>>        unsigned long envcfg;
>>>>> +    unsigned long status;
>>>>>        u32 riscv_v_flags;
>>>>>        u32 vstate_ctrl;
>>>>>        struct __riscv_v_ext_state vstate;
>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/ 
>>>>> asm- offsets.c
>>>>> index 16490755304e..969c65b1fe41 100644
>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>>>>> thread_info.preempt_count);
>>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>>              offsetof(struct task_struct, thread.s[11])
>>>>>            - offsetof(struct task_struct, thread.ra)
>>>>>        );
>>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>>> +          offsetof(struct task_struct, thread.status)
>>>>> +        - offsetof(struct task_struct, thread.ra)
>>>>> +    );
>>>>>        DEFINE(TASK_THREAD_F0_F0,
>>>>>              offsetof(struct task_struct, thread.fstate.f[0])
>>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>>> --- a/arch/riscv/kernel/entry.S
>>>>> +++ b/arch/riscv/kernel/entry.S
>>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
>>>>> +
>>>>> +    /* save the user space access flag */
>>>>> +    li    s0, SR_SUM
>>>>
>>>> This is not needed anymore ^ but I'll remove it when merging your 
>>>> patchset.
>>>>
>>> Could you be more specific about what "this" is?
>>>
>>> If we don't save/restore the SR_SUM bit I think our old friend
>>> the sched_tail bug will just return.
>> I think Alex is saying the `li` instruction above is not needed 
>> because s0 is
>> unused. But instead I think there is an `and` instruction missing 
>> here. The
>> patch as merged ORs the entirety of the old sstatus with the new 
>> sstatus, not
>> just the SUM bit, which seems extremely dangerous.
> 
> 
> I should have checked the definition of csrs, I thought it would write 
> the csr, but you're right it ORs with the current csr value which isn't 
> good at all.
> 
> @Cyril Can you send a patch for that? Which also removes the `li` 
> instruction that I forgot to remove :) I think we can even ask Palmer to 
> squash those fixes directly into the patch.

Yes can do, I'll whip something up.

Cyril
> 
> Let me know if you can't do it and I'll do.
> 
> Thanks Samuel for noticing,
> 
> Alex
> 
> 
>>
>> Regards,
>> Samuel
>>
>>>>> +    csrr  s1, CSR_STATUS
>>>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>>>> +
>>>>>        /* Save the kernel shadow call stack pointer */
>>>>>        scs_save_current
>>>>>        /* Restore context from next->thread */
>>>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>>>> +    csrs  CSR_STATUS, s0
>>>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
>>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>>
>>>> Thanks for the multiple revisions!
>>>>
>>>> Alex
>>>>
>>>>
>>>> _______________________________________________
>>>> linux-riscv mailing list
>>>> linux-riscv@lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>
>>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv

Re: [EXT] Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Posted by Cyril Bur 8 months, 3 weeks ago

Hi all,

On 21/5/2025 12:30 am, Alexandre Ghiti wrote:
> Hi Samuel,
> 
> On 5/21/25 15:38, Samuel Holland wrote:
>> Hi Alex, Ben,
>>
>> On 2025-05-21 3:26 AM, Ben Dooks wrote:
>>> On 22/04/2025 11:22, Alexandre Ghiti wrote:
>>>> Hi Cyril,
>>>>
>>>> On 10/04/2025 09:05, Cyril Bur wrote:
>>>>> From: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>>
>>>>> When threads/tasks are switched we need to ensure the old execution's
>>>>> SR_SUM state is saved and the new thread has the old SR_SUM state
>>>>> restored.
>>>>>
>>>>> The issue was seen under heavy load especially with the syz-stress 
>>>>> tool
>>>>> running, with crashes as follows in schedule_tail:
>>>>>
>>>>> Unable to handle kernel access to user memory without uaccess routines
>>>>> at virtual address 000000002749f0d0
>>>>> Oops [#1]
>>>>> Modules linked in:
>>>>> CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>>>>> 5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>>>>> Hardware name: riscv-virtio,qemu (DT)
>>>>> epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>>    ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>>>>>    ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>>>>> epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>>>>>    gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>>>>>    t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>>>>>    s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>>>>>    a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>>>>>    a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>>>>>    s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>>>>>    s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>>>>>    s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>>>>>    s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>>>>>    t5 : ffffffc4043cafba t6 : 0000000000040000
>>>>> status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>>>>> 000000000000000f
>>>>> Call Trace:
>>>>> [<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>>>>> [<ffffffe000005570>] ret_from_exception+0x0/0x14
>>>>> Dumping ftrace buffer:
>>>>>      (ftrace buffer empty)
>>>>> ---[ end trace b5f8f9231dc87dda ]---
>>>>>
>>>>> The issue comes from the put_user() in schedule_tail
>>>>> (kernel/sched/core.c) doing the following:
>>>>>
>>>>> asmlinkage __visible void schedule_tail(struct task_struct *prev)
>>>>> {
>>>>> ...
>>>>>           if (current->set_child_tid)
>>>>>                   put_user(task_pid_vnr(current), current- 
>>>>> >set_child_tid);
>>>>> ...
>>>>> }
>>>>>
>>>>> the put_user() macro causes the code sequence to come out as follows:
>>>>>
>>>>> 1:    __enable_user_access()
>>>>> 2:    reg = task_pid_vnr(current);
>>>>> 3:    *current->set_child_tid = reg;
>>>>> 4:    __disable_user_access()
>>>>>
>>>>> The problem is that we may have a sleeping function as argument which
>>>>> could clear SR_SUM causing the panic above. This was fixed by
>>>>> evaluating the argument of the put_user() macro outside the user- 
>>>>> enabled
>>>>> section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>>>>> enabling user access")"
>>>>>
>>>>> In order for riscv to take advantage of unsafe_get/put_XXX() macros 
>>>>> and
>>>>> to avoid the same issue we had with put_user() and sleeping 
>>>>> functions we
>>>>> must ensure code flow can go through switch_to() from within a 
>>>>> region of
>>>>> code with SR_SUM enabled and come back with SR_SUM still enabled. This
>>>>> patch addresses the problem allowing future work to enable full use of
>>>>> unsafe_get/put_XXX() macros without needing to take a CSR bit flip 
>>>>> cost
>>>>> on every access. Make switch_to() save and restore SR_SUM.
>>>>>
>>>>> Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
>>>>> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
>>>>> Signed-off-by: Cyril Bur <cyrilbur@tenstorrent.com>
>>>>> ---
>>>>>    arch/riscv/include/asm/processor.h | 1 +
>>>>>    arch/riscv/kernel/asm-offsets.c    | 5 +++++
>>>>>    arch/riscv/kernel/entry.S          | 8 ++++++++
>>>>>    3 files changed, 14 insertions(+)
>>>>>
>>>>> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/ 
>>>>> include/ asm/
>>>>> processor.h
>>>>> index 5f56eb9d114a..58fd11c89fe9 100644
>>>>> --- a/arch/riscv/include/asm/processor.h
>>>>> +++ b/arch/riscv/include/asm/processor.h
>>>>> @@ -103,6 +103,7 @@ struct thread_struct {
>>>>>        struct __riscv_d_ext_state fstate;
>>>>>        unsigned long bad_cause;
>>>>>        unsigned long envcfg;
>>>>> +    unsigned long status;
>>>>>        u32 riscv_v_flags;
>>>>>        u32 vstate_ctrl;
>>>>>        struct __riscv_v_ext_state vstate;
>>>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/ 
>>>>> asm- offsets.c
>>>>> index 16490755304e..969c65b1fe41 100644
>>>>> --- a/arch/riscv/kernel/asm-offsets.c
>>>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>>>> @@ -34,6 +34,7 @@ void asm_offsets(void)
>>>>>        OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>>>>>        OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>>>>>        OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>>>>> +    OFFSET(TASK_THREAD_STATUS, task_struct, thread.status);
>>>>>        OFFSET(TASK_TI_CPU, task_struct, thread_info.cpu);
>>>>>        OFFSET(TASK_TI_PREEMPT_COUNT, task_struct, 
>>>>> thread_info.preempt_count);
>>>>> @@ -346,6 +347,10 @@ void asm_offsets(void)
>>>>>              offsetof(struct task_struct, thread.s[11])
>>>>>            - offsetof(struct task_struct, thread.ra)
>>>>>        );
>>>>> +    DEFINE(TASK_THREAD_STATUS_RA,
>>>>> +          offsetof(struct task_struct, thread.status)
>>>>> +        - offsetof(struct task_struct, thread.ra)
>>>>> +    );
>>>>>        DEFINE(TASK_THREAD_F0_F0,
>>>>>              offsetof(struct task_struct, thread.fstate.f[0])
>>>>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>>>>> index 33a5a9f2a0d4..00bd0de9faa2 100644
>>>>> --- a/arch/riscv/kernel/entry.S
>>>>> +++ b/arch/riscv/kernel/entry.S
>>>>> @@ -397,9 +397,17 @@ SYM_FUNC_START(__switch_to)
>>>>>        REG_S s9,  TASK_THREAD_S9_RA(a3)
>>>>>        REG_S s10, TASK_THREAD_S10_RA(a3)
>>>>>        REG_S s11, TASK_THREAD_S11_RA(a3)
>>>>> +
>>>>> +    /* save the user space access flag */
>>>>> +    li    s0, SR_SUM
>>>>
>>>> This is not needed anymore ^ but I'll remove it when merging your 
>>>> patchset.
>>>>
>>> Could you be more specific about what "this" is?
>>>
>>> If we don't save/restore the SR_SUM bit I think our old friend
>>> the sched_tail bug will just return.
>> I think Alex is saying the `li` instruction above is not needed 
>> because s0 is
>> unused. But instead I think there is an `and` instruction missing 
>> here. The
>> patch as merged ORs the entirety of the old sstatus with the new 
>> sstatus, not
>> just the SUM bit, which seems extremely dangerous.
> 
> 
> I should have checked the definition of csrs, I thought it would write 
> the csr, but you're right it ORs with the current csr value which isn't 
> good at all.
> 
> @Cyril Can you send a patch for that? Which also removes the `li` 
> instruction that I forgot to remove :) I think we can even ask Palmer to 
> squash those fixes directly into the patch.

So I've sent a patch. In writing it, I think Ben was correct to have the 
original patch clear the SUM bit. The way we have it now, if the SUM bit 
is ever set, we don't clear it when swapping to the new thread. The 
condition is unlikely but if you extrapolate far enough, in theory, we 
could start running with the SUM bit effectively permanently on.

Should I resend with also clearing the SUM bit in between?

Cyril
> 
> Let me know if you can't do it and I'll do.
> 
> Thanks Samuel for noticing,
> 
> Alex
> 
> 
>>
>> Regards,
>> Samuel
>>
>>>>> +    csrr  s1, CSR_STATUS
>>>>> +    REG_S s1, TASK_THREAD_STATUS_RA(a3)
>>>>> +
>>>>>        /* Save the kernel shadow call stack pointer */
>>>>>        scs_save_current
>>>>>        /* Restore context from next->thread */
>>>>> +    REG_L s0,  TASK_THREAD_STATUS_RA(a4)
>>>>> +    csrs  CSR_STATUS, s0
>>>>>        REG_L ra,  TASK_THREAD_RA_RA(a4)
>>>>>        REG_L sp,  TASK_THREAD_SP_RA(a4)
>>>>>        REG_L s0,  TASK_THREAD_S0_RA(a4)
>>>> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>>>>
>>>> Thanks for the multiple revisions!
>>>>
>>>> Alex
>>>>
>>>>
>>>> _______________________________________________
>>>> linux-riscv mailing list
>>>> linux-riscv@lists.infradead.org
>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>
>>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv

[PATCH v6 1/5] riscv: save the SR_SUM status over switches
[PATCH v6 2/5] riscv: implement user_access_begin() and families
[PATCH v6 3/5] riscv: uaccess: use input constraints for ptr of __put_user()
[PATCH v6 4/5] riscv: uaccess: use 'asm goto' for put_user()
[PATCH v6 5/5] riscv: uaccess: use 'asm_goto_output' for get_user()