[PATCH v2] riscv: entry: always initialize regs->a0 to -ENOSYS

Posted by Celeste Liu 1 year, 5 months ago
Otherwise when the tracer changes syscall number to -1, the kernel fails
to initialize a0 with -ENOSYS and subsequently fails to return the error
code of the failed syscall to userspace. For example, it will break
strace syscall tampering.

Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")
Reported-by: "Dmitry V. Levin" <ldv@strace.io>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
---
 arch/riscv/kernel/traps.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
index 05a16b1f0aee..51ebfd23e007 100644
--- a/arch/riscv/kernel/traps.c
+++ b/arch/riscv/kernel/traps.c
@@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
 
 		regs->epc += 4;
 		regs->orig_a0 = regs->a0;
+		regs->a0 = -ENOSYS;
 
 		riscv_v_vstate_discard(regs);
 
@@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
 
 		if (syscall >= 0 && syscall < NR_syscalls)
 			syscall_handler(regs, syscall);
-		else if (syscall != -1)
-			regs->a0 = -ENOSYS;
+
 		/*
 		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
 		 * so the maximum stack offset is 1k bytes (10 bits).
-- 
2.45.2

Re: [PATCH v2] riscv: entry: always initialize regs->a0 to -ENOSYS
Posted by Andrea Bolognani 1 year, 3 months ago
On Thu, Jun 27, 2024 at 10:23:39PM GMT, Celeste Liu wrote:
> Otherwise when the tracer changes syscall number to -1, the kernel fails
> to initialize a0 with -ENOSYS and subsequently fails to return the error
> code of the failed syscall to userspace. For example, it will break
> strace syscall tampering.
>
> Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")
> Reported-by: "Dmitry V. Levin" <ldv@strace.io>
> Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
> ---
>  arch/riscv/kernel/traps.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index 05a16b1f0aee..51ebfd23e007 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
>  		regs->epc += 4;
>  		regs->orig_a0 = regs->a0;
> +		regs->a0 = -ENOSYS;
>
>  		riscv_v_vstate_discard(regs);
>
> @@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
>
>  		if (syscall >= 0 && syscall < NR_syscalls)
>  			syscall_handler(regs, syscall);
> -		else if (syscall != -1)
> -			regs->a0 = -ENOSYS;
> +
>  		/*
>  		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
>  		 * so the maximum stack offset is 1k bytes (10 bits).

Hi,

this change seems to have broken strace's test suite.

In particular, the "legacy_syscall_info" test, which is meant to
verify that strace behaves correctly when PTRACE_GET_SYSCALL_INFO is
not available, reports a bogus value for the first argument of the
syscall (the one passed via a0).

The bogus value comes directly from the ptrace() call, before strace
has a chance to meddle with it, which is why the maintainer suggested
that the issue most likely lies in the kernel.

I have built a kernel with this change reverted and, as expected, the
strace test suite passes. Admittedly I've used the 6.11-rc7 Fedora
kernel as the baseline for this test, but none of the Fedora patches
touch the RISC-V code at all and the file itself hasn't been touched
since rc7, so I'm fairly confident the same behavior is present in
vanilla 6.11 too.

See

  https://github.com/strace/strace/issues/315

for the original report. Please let me know if I need to provide
additional information, report this anywhere else (bugzilla?), and so
on...

Thanks!

-- 
Andrea Bolognani / Red Hat / Virtualization

Re: [PATCH v2] riscv: entry: always initialize regs->a0 to -ENOSYS
Posted by Dmitry V. Levin 1 year, 3 months ago
On Tue, Sep 17, 2024 at 01:49:52AM +0900, Andrea Bolognani wrote:
> On Thu, Jun 27, 2024 at 10:23:39PM GMT, Celeste Liu wrote:
> > Otherwise when the tracer changes syscall number to -1, the kernel fails
> > to initialize a0 with -ENOSYS and subsequently fails to return the error
> > code of the failed syscall to userspace. For example, it will break
> > strace syscall tampering.
> >
> > Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1")
> > Reported-by: "Dmitry V. Levin" <ldv@strace.io>
> > Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Celeste Liu <CoelacanthusHex@gmail.com>
> > ---
> >  arch/riscv/kernel/traps.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> > index 05a16b1f0aee..51ebfd23e007 100644
> > --- a/arch/riscv/kernel/traps.c
> > +++ b/arch/riscv/kernel/traps.c
> > @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
> >
> >  		regs->epc += 4;
> >  		regs->orig_a0 = regs->a0;
> > +		regs->a0 = -ENOSYS;
> >
> >  		riscv_v_vstate_discard(regs);
> >
> > @@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *regs)
> >
> >  		if (syscall >= 0 && syscall < NR_syscalls)
> >  			syscall_handler(regs, syscall);
> > -		else if (syscall != -1)
> > -			regs->a0 = -ENOSYS;
> > +
> >  		/*
> >  		 * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
> >  		 * so the maximum stack offset is 1k bytes (10 bits).
> 
> Hi,
> 
> this change seems to have broken strace's test suite.
> 
> In particular, the "legacy_syscall_info" test, which is meant to
> verify that strace behaves correctly when PTRACE_GET_SYSCALL_INFO is
> not available, reports a bogus value for the first argument of the
> syscall (the one passed via a0).
> 
> The bogus value comes directly from the ptrace() call, before strace
> has a chance to meddle with it, hence why the maintainer suggested
> that the issue would likely be traced back to the kernel.
> 
> I have built a kernel with this change reverted and, as expected, the
> strace test suite passes. Admittedly I've used the 6.11-rc7 Fedora
> kernel as the baseline for this test, but none of the Fedora patches
> touch the RISC-V code at all and the file itself hasn't been touched
> since rc7, so I'm fairly confident the same behavior is present in
> vanilla 6.11 too.
> 
> See
> 
>   https://github.com/strace/strace/issues/315
> 
> for the original report. Please let me know if I need to provide
> additional information, report this anywhere else (bugzilla?), and so
> on...

By the way, in strace we had to apply a workaround [1] for the riscv ptrace
regression caused by commit 52449c17bdd1540940e21511612b58acebc49c06.

As a result, reverting commit 61119394631f219e23ce98bcc3eb993a64a8ea64 seems
to be the lesser of two evils: it fixed that regression, but it introduced
a more serious one, the clobbering of the first syscall argument as seen
via PTRACE_GETREGSET.

This essentially means strace would have to keep the workaround
indefinitely, but we can live with that.

[1] https://github.com/strace/strace/commit/c3ae2b27732952663a3600269884e363cb77a024


-- 
ldv
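
The trade-off discussed in this thread can be sketched with a small
user-space model. This is not kernel code: model_regs, model_ecall(),
the variant names and MODEL_NR_SYSCALLS are hypothetical names made up
for illustration. The model assumes that the tracer's syscall-entry stop
happens between the a0 setup and the dispatch, and that the tracer reads
a0 (not orig_a0) through PTRACE_GETREGSET. It only compares the
"else if (syscall != -1)" variant with the "initialize a0 at entry"
variant from the patch.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_NR_SYSCALLS 16L	/* hypothetical table size, model only */

enum entry_variant {
	ELSE_IF_NOT_MINUS_ONE,	/* models the code the patch removes */
	INIT_A0_AT_ENTRY,	/* models the code the patch adds */
};

/* Only the registers that matter for this discussion. */
struct model_regs {
	long a0;
	long orig_a0;
	long a7;
};

struct outcome {
	long a0_at_entry_stop;	/* what a tracer would read as a0 */
	long a0_at_return;	/* what userspace gets back in a0 */
};

struct outcome model_ecall(enum entry_variant v, long first_arg,
			   int tracer_sets_minus_one)
{
	struct model_regs regs = { .a0 = first_arg, .orig_a0 = 0, .a7 = 0 };
	struct outcome out;
	long syscall = regs.a7;

	regs.orig_a0 = regs.a0;
	if (v == INIT_A0_AT_ENTRY)
		regs.a0 = -ENOSYS;

	/* Assumed position of the syscall-entry stop. */
	out.a0_at_entry_stop = regs.a0;
	if (tracer_sets_minus_one)
		syscall = -1;

	if (syscall >= 0 && syscall < MODEL_NR_SYSCALLS)
		regs.a0 = 0;	/* pretend the handler succeeded */
	else if (v == ELSE_IF_NOT_MINUS_ONE && syscall != -1)
		regs.a0 = -ENOSYS;

	out.a0_at_return = regs.a0;
	return out;
}

/* Checks the two behaviors described in the thread. */
void model_selftest(void)
{
	struct outcome o;

	/* Old variant: the argument is visible, but no -ENOSYS on skip. */
	o = model_ecall(ELSE_IF_NOT_MINUS_ONE, 42, 1);
	assert(o.a0_at_entry_stop == 42 && o.a0_at_return == 42);

	/* Patched variant: -ENOSYS on skip, but the argument is clobbered. */
	o = model_ecall(INIT_A0_AT_ENTRY, 42, 1);
	assert(o.a0_at_entry_stop == -ENOSYS && o.a0_at_return == -ENOSYS);
}
```

Calling model_selftest() from a main() exercises both cases. In this
model neither variant gives the tracer the original argument and also
returns -ENOSYS for a skipped syscall, which is the dilemma described
above.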