Commit a9f3a74a29af ("entry: Provide generic syscall exit function")
introduced the generic syscall exit function, which calls rseq_syscall()
before audit_syscall_exit() and arch_syscall_exit_tracehook().
Commit b74406f37737 ("arm: Add syscall detection for restartable
sequences") added rseq support for arm32, which also calls rseq_syscall()
before audit_syscall_exit() and tracehook_report_syscall().
However, commit 409d5db49867c ("arm64: rseq: Implement backend rseq
calls and select HAVE_RSEQ") implemented rseq for arm64 and calls
rseq_syscall() after audit_syscall_exit() and tracehook_report_syscall().
So compared to the generic entry and arm32 code, arm64 terminates
the process a bit later if a syscall is issued within
a restartable sequence.
But as commit b74406f37737 ("arm: Add syscall detection for restartable
sequences") said, syscalls are not allowed inside restartable sequences,
so rseq_syscall() should be called at the very beginning of the system
call exit path on CONFIG_DEBUG_RSEQ=y kernels. This helps to detect
whether a syscall was issued inside a restartable sequence.
It makes sense to raise SIGSEGV via rseq_syscall() before auditing
and ptrace syscall exit, because this guarantees that the process is
already in an error state with SIGSEGV pending when those later steps
run. Although it makes no practical difference to signal delivery (signals
are processed at the very end in arm64_exit_to_user_mode()), the ordering
is more logical: detect and flag the error first, then proceed with
the remaining work.
To make the ordering more consistent, and in preparation for moving arm64
over to the generic entry code, move rseq_syscall() ahead of
audit_syscall_exit().
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
arch/arm64/kernel/ptrace.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 79762ff33945..983d8d1104df 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -2442,6 +2442,8 @@ int syscall_trace_enter(struct pt_regs *regs, long syscall, unsigned long flags)
 void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 {
+	rseq_syscall(regs);
+
 	audit_syscall_exit(regs);
 	if (flags & _TIF_SYSCALL_TRACEPOINT)
@@ -2449,8 +2451,6 @@ void syscall_trace_exit(struct pt_regs *regs, unsigned long flags)
 	if (flags & (_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
 		report_syscall_exit(regs);
-
-	rseq_syscall(regs);
 }
/*
--
2.34.1
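For readers following the thread, the reordering in the patch can be sketched as a small user-space model. This is only an illustration of the call order described in the commit message, not kernel code: the enum labels, `struct exit_log`, and both `model_exit_*` functions are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step labels for this user-space model; not kernel symbols. */
enum exit_step {
	STEP_RSEQ,          /* stands in for rseq_syscall() */
	STEP_AUDIT,         /* stands in for audit_syscall_exit() */
	STEP_TRACEPOINT,    /* stands in for the _TIF_SYSCALL_TRACEPOINT branch */
	STEP_PTRACE_REPORT  /* stands in for report_syscall_exit() */
};

#define MAX_STEPS 4

struct exit_log {
	enum exit_step steps[MAX_STEPS];
	size_t count;
};

/* Append one step to the log, in the order it was "called". */
void log_step(struct exit_log *log, enum exit_step step)
{
	assert(log->count < MAX_STEPS);
	log->steps[log->count++] = step;
}

/* Order before the patch: the rseq debug check runs last. */
void model_exit_before_patch(struct exit_log *log)
{
	log->count = 0;
	log_step(log, STEP_AUDIT);
	log_step(log, STEP_TRACEPOINT);
	log_step(log, STEP_PTRACE_REPORT);
	log_step(log, STEP_RSEQ);
}

/* Order after the patch: the rseq debug check runs first. */
void model_exit_after_patch(struct exit_log *log)
{
	log->count = 0;
	log_step(log, STEP_RSEQ);
	log_step(log, STEP_AUDIT);
	log_step(log, STEP_TRACEPOINT);
	log_step(log, STEP_PTRACE_REPORT);
}
```

The model only records ordering. It says nothing about when the resulting SIGSEGV is delivered, which the commit message notes happens later in arm64_exit_to_user_mode() in both cases.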
On Mon, Dec 22, 2025 at 07:47:26PM +0800, Jinjie Ruan wrote:
> Commit a9f3a74a29af ("entry: Provide generic syscall exit function")
> introduced the generic syscall exit function, which calls rseq_syscall()
> before audit_syscall_exit() and arch_syscall_exit_tracehook().
>
> Commit b74406f37737 ("arm: Add syscall detection for restartable
> sequences") added rseq support for arm32, which also calls rseq_syscall()
> before audit_syscall_exit() and tracehook_report_syscall().
>
> However, commit 409d5db49867c ("arm64: rseq: Implement backend rseq
> calls and select HAVE_RSEQ") implemented rseq for arm64 and calls
> rseq_syscall() after audit_syscall_exit() and tracehook_report_syscall().
>
> So compared to the generic entry and arm32 code, arm64 terminates
> the process a bit later if a syscall is issued within
> a restartable sequence.
Given that signals aren't processed until later, is this actually true?
> But as commit b74406f37737 ("arm: Add syscall detection for restartable
> sequences") said, syscalls are not allowed inside restartable sequences,
> so rseq_syscall() should be called at the very beginning of the system
> call exit path on CONFIG_DEBUG_RSEQ=y kernels. This helps to detect
> whether a syscall was issued inside a restartable sequence.
>
> It makes sense to raise SIGSEGV via rseq_syscall() before auditing
> and ptrace syscall exit, because this guarantees that the process is
> already in an error state with SIGSEGV pending when those later steps
> run. Although it makes no practical difference to signal delivery (signals
> are processed at the very end in arm64_exit_to_user_mode()), the ordering
> is more logical: detect and flag the error first, then proceed with
> the remaining work.
>
> To make the ordering more consistent, and in preparation for moving arm64
> over to the generic entry code, move rseq_syscall() ahead of
> audit_syscall_exit().
I've been struggling a bit to see how this helps to align with the
generic code. I'm also concerned that rseq_debug_update_user_cs()
operates on instruction_pointer(regs) which is something that can be
changed by ptrace.
So, I'm not saying this is wrong, but it feels like a user-visible
change that needs better justification.
Will
On 26/01/2026 20:02, Will Deacon wrote:
> On Mon, Dec 22, 2025 at 07:47:26PM +0800, Jinjie Ruan wrote:
>> [...]
>>
>> To make the ordering more consistent, and in preparation for moving arm64
>> over to the generic entry code, move rseq_syscall() ahead of
>> audit_syscall_exit().
> I've been struggling a bit to see how this helps to align with the
> generic code.
rseq_syscall(), or rather rseq_debug_syscall_return() since eaa9088d568c
("rseq: Use static branch for syscall exit debug when
GENERIC_IRQ_ENTRY=y"), is called first in the generic
syscall_exit_to_user_mode_work(), so the aim of that patch is to align
the order of calls with generic entry.
> I'm also concerned that rseq_debug_update_user_cs()
> operates on instruction_pointer(regs) which is something that can be
> changed by ptrace.
Isn't that true regardless of where rseq_syscall() is called on the
syscall exit path, though?
> So, I'm not saying this is wrong, but it feels like a user-visible
> change that needs better justification.
This seems to hang on whether the force_sig(SIGSEGV) that rseq_syscall()
might issue interacts in any way with the tracing calls. My feeling is
that it doesn't, but I haven't confirmed it. Worth noting this is only
relevant if rseq debugging is enabled, so any potential user-visible
effect is limited.
- Kevin
On 2026/1/27 17:44, Kevin Brodsky wrote:
> On 26/01/2026 20:02, Will Deacon wrote:
>> On Mon, Dec 22, 2025 at 07:47:26PM +0800, Jinjie Ruan wrote:
>>> [...]
>>>
>>> To make the ordering more consistent, and in preparation for moving arm64
>>> over to the generic entry code, move rseq_syscall() ahead of
>>> audit_syscall_exit().
>> I've been struggling a bit to see how this helps to align with the
>> generic code.
>
> rseq_syscall(), or rather rseq_debug_syscall_return() since eaa9088d568c
> ("rseq: Use static branch for syscall exit debug when
> GENERIC_IRQ_ENTRY=y"), is called first in the generic
> syscall_exit_to_user_mode_work(), so the aim of that patch is to align
> the order of calls with generic entry.
>
>> I'm also concerned that rseq_debug_update_user_cs()
>> operates on instruction_pointer(regs) which is something that can be
>> changed by ptrace.
>
> Isn't that true regardless of where rseq_syscall() is called on the
> syscall exit path, though?
My understanding is that if instruction_pointer(regs) is hijacked and
modified via ptrace at the syscall exit (ptrace_report_syscall_exit()),
this modification will not be observed by rseq. Specifically, in the
generic entry syscall exit path, rseq_syscall() is unable to detect such
a PC modification.
Regards,
Jinjie
>
>> So, I'm not saying this is wrong, but it feels like a user-visible
>> change that needs better justification.
>
> This seems to hang on whether the force_sig(SIGSEGV) that rseq_syscall()
> might issue interacts in any way with the tracing calls. My feeling is
> that it doesn't, but I haven't confirmed it. Worth noting this is only
> relevant if rseq debugging is enabled, so any potential user-visible
> effect is limited.
>
> - Kevin
>
On 27/01/2026 12:34, Jinjie Ruan wrote:
>> [...]
>>
>>> I'm also concerned that rseq_debug_update_user_cs()
>>> operates on instruction_pointer(regs) which is something that can be
>>> changed by ptrace.
>> Isn't that true regardless of where rseq_syscall() is called on the
>> syscall exit path, though?
> My understanding is that if instruction_pointer(regs) is hijacked and
> modified via ptrace at the syscall exit (ptrace_report_syscall_exit()),
> this modification will not be observed by rseq. Specifically, in the
> generic entry syscall exit path, rseq_syscall() is unable to detect such
> a PC modification.
Good point. So concretely that means that currently on arm64, one could
make the rseq debug check pass/fail by using the syscall exit trap to
modify PC. OTOH this is impossible with generic entry because the rseq
check is performed first. I'm not sure this is a feature anyone has even
noticed, but it is a user-visible change indeed.
- Kevin
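The user-visible difference described above can be sketched with a toy model. It assumes, as a simplification, that the debug check flags the task when the PC lies inside the registered critical section. `struct model_regs`, `struct model_cs`, and every function below are hypothetical names for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for pt_regs and the registered critical section. */
struct model_regs { uint64_t pc; };
struct model_cs { uint64_t start; uint64_t len; };

/* True if pc lies inside [start, start + len); unsigned wraparound
 * makes a pc below start compare as a huge offset, so it is rejected. */
bool pc_in_critical_section(const struct model_regs *regs,
			    const struct model_cs *cs)
{
	assert(regs && cs);
	return regs->pc - cs->start < cs->len;
}

/* Models a tracer rewriting the PC during the syscall-exit stop. */
void tracer_sets_pc(struct model_regs *regs, uint64_t new_pc)
{
	regs->pc = new_pc;
}

/* Check after the ptrace report (arm64 before the patch):
 * the check sees whatever PC the tracer left behind. */
bool flagged_check_last(struct model_regs regs, const struct model_cs *cs,
			uint64_t tracer_pc)
{
	tracer_sets_pc(&regs, tracer_pc);
	return pc_in_critical_section(&regs, cs);
}

/* Check before the ptrace report (generic entry, arm64 after the patch):
 * the tracer's PC change comes too late to affect the result. */
bool flagged_check_first(struct model_regs regs, const struct model_cs *cs,
			 uint64_t tracer_pc)
{
	bool flagged = pc_in_critical_section(&regs, cs);

	tracer_sets_pc(&regs, tracer_pc);
	return flagged;
}
```

With the check last, a tracer can flip the outcome in either direction; with the check first, the outcome depends only on the PC at syscall return.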
On 2026/1/27 23:06, Kevin Brodsky wrote:
> On 27/01/2026 12:34, Jinjie Ruan wrote:
>>> [...]
>>>
>>>> I'm also concerned that rseq_debug_update_user_cs()
>>>> operates on instruction_pointer(regs) which is something that can be
>>>> changed by ptrace.
>>> Isn't that true regardless of where rseq_syscall() is called on the
>>> syscall exit path, though?
>> My understanding is that if instruction_pointer(regs) is hijacked and
>> modified via ptrace at the syscall exit (ptrace_report_syscall_exit()),
>> this modification will not be observed by rseq. Specifically, in the
>> generic entry syscall exit path, rseq_syscall() is unable to detect such
>> a PC modification.
>
> Good point. So concretely that means that currently on arm64, one could
> make the rseq debug check pass/fail by using the syscall exit trap to
> modify PC. OTOH this is impossible with generic entry because the rseq
> check is performed first. I'm not sure this is a feature anyone has even
> noticed, but it is a user-visible change indeed.
After digging into the ptrace code, I found that ptrace does not modify
instruction_pointer(regs) on the syscall exit path; it only changes the
return value as below.
Therefore, if my understanding is correct, Will's concern does not apply
here.
ptrace_set_syscall_info()
  -> ptrace_set_syscall_info_exit()
    -> syscall_set_return_value(child, regs, 0, rval)
Regards,
Jinjie
>
> - Kevin
>
On 28/01/2026 02:09, Jinjie Ruan wrote:
>
> On 2026/1/27 23:06, Kevin Brodsky wrote:
>> On 27/01/2026 12:34, Jinjie Ruan wrote:
>>>> [...]
>>>>
>>>>> I'm also concerned that rseq_debug_update_user_cs()
>>>>> operates on instruction_pointer(regs) which is something that can be
>>>>> changed by ptrace.
>>>> Isn't that true regardless of where rseq_syscall() is called on the
>>>> syscall exit path, though?
>>> My understanding is that if instruction_pointer(regs) is hijacked and
>>> modified via ptrace at the syscall exit (ptrace_report_syscall_exit()),
>>> this modification will not be observed by rseq. Specifically, in the
>>> generic entry syscall exit path, rseq_syscall() is unable to detect such
>>> a PC modification.
>> Good point. So concretely that means that currently on arm64, one could
>> make the rseq debug check pass/fail by using the syscall exit trap to
>> modify PC. OTOH this is impossible with generic entry because the rseq
>> check is performed first. I'm not sure this is a feature anyone has even
>> noticed, but it is a user-visible change indeed.
> After digging into the ptrace code, I found that ptrace does not modify
> instruction_pointer(regs) on the syscall exit path; it only changes the
> return value as below.
> Therefore, if my understanding is correct, Will's concern does not apply
> here.
>
> ptrace_set_syscall_info()
>   -> ptrace_set_syscall_info_exit()
>     -> syscall_set_return_value(child, regs, 0, rval)
I'm not following, how is that related to the call to
ptrace_report_syscall_exit()? That eventually results in a call to
ptrace_stop() (via ptrace_notify()), which synchronously causes the
tracee to sleep and allows the tracer to issue ptrace commands, e.g.
setting PC.
- Kevin
On 2026/1/28 22:53, Kevin Brodsky wrote:
> On 28/01/2026 02:09, Jinjie Ruan wrote:
>>
>> On 2026/1/27 23:06, Kevin Brodsky wrote:
>>> On 27/01/2026 12:34, Jinjie Ruan wrote:
>>>>> [...]
>>>>>
>>>>>> I'm also concerned that rseq_debug_update_user_cs()
>>>>>> operates on instruction_pointer(regs) which is something that can be
>>>>>> changed by ptrace.
>>>>> Isn't that true regardless of where rseq_syscall() is called on the
>>>>> syscall exit path, though?
>>>> My understanding is that if instruction_pointer(regs) is hijacked and
>>>> modified via ptrace at the syscall exit (ptrace_report_syscall_exit()),
>>>> this modification will not be observed by rseq. Specifically, in the
>>>> generic entry syscall exit path, rseq_syscall() is unable to detect such
>>>> a PC modification.
>>> Good point. So concretely that means that currently on arm64, one could
>>> make the rseq debug check pass/fail by using the syscall exit trap to
>>> modify PC. OTOH this is impossible with generic entry because the rseq
>>> check is performed first. I'm not sure this is a feature anyone has even
>>> noticed, but it is a user-visible change indeed.
>> After digging into the ptrace code, I found that ptrace does not modify
>> instruction_pointer(regs) on the syscall exit path; it only changes the
>> return value as below.
>> Therefore, if my understanding is correct, Will's concern does not apply
>> here.
>>
>> ptrace_set_syscall_info()
>>   -> ptrace_set_syscall_info_exit()
>>     -> syscall_set_return_value(child, regs, 0, rval)
>
> I'm not following, how is that related to the call to
> ptrace_report_syscall_exit()? That eventually results in a call to
> ptrace_stop() (via ptrace_notify()), which synchronously causes the
> tracee to sleep and allows the tracer to issue ptrace commands, e.g.
> setting PC.
I realize I had a misunderstanding: PTRACE_SET_SYSCALL_INFO is only one
possible ptrace command.
What I actually tried was to modify regs->pc on the syscall return path
using PTRACE_SETREGSET, and the result shows that I can indeed change
regs->pc to make the tracee take a segmentation fault.
>
> - Kevin
>
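For reference, a tracer-side PC rewrite of the kind described above might look roughly like the sketch below. This is not the test program used in the thread, which was not posted. `set_tracee_pc()` is a hypothetical helper, and it assumes the tracee is already attached and stopped (for example in a syscall-exit stop reached via PTRACE_SYSCALL). The x86_64 branch is included only so the sketch builds on a non-arm64 host.

```c
#include <elf.h>         /* NT_PRSTATUS */
#include <errno.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>     /* struct iovec */
#include <sys/user.h>    /* struct user_regs_struct */

/*
 * Hypothetical helper: rewrite the PC of a tracee that is currently
 * stopped. Returns 0 on success, -1 with errno set on failure.
 */
int set_tracee_pc(pid_t pid, unsigned long long new_pc)
{
#if defined(__aarch64__) || defined(__x86_64__)
	struct user_regs_struct regs;
	struct iovec iov = { .iov_base = &regs, .iov_len = sizeof(regs) };

	/* Read the general-purpose register set of the stopped tracee. */
	if (ptrace(PTRACE_GETREGSET, pid, (void *)(uintptr_t)NT_PRSTATUS, &iov) == -1)
		return -1;

#if defined(__aarch64__)
	regs.pc = new_pc;
#else
	regs.rip = new_pc;
#endif

	/* Write the modified register set back. */
	if (ptrace(PTRACE_SETREGSET, pid, (void *)(uintptr_t)NT_PRSTATUS, &iov) == -1)
		return -1;

	return 0;
#else
	/* Other architectures are outside the scope of this sketch. */
	(void)pid;
	(void)new_pc;
	errno = ENOSYS;
	return -1;
#endif
}
```

Calling this from the tracer while the tracee sits in the syscall-exit stop is what lets the later rseq debug check on unpatched arm64 observe the modified PC.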