[PATCH v13 RESEND 10/14] arm64: Inline el0_svc_common()

Jinjie Ruan posted 14 patches 2 weeks, 6 days ago
There is a newer version of this series
Posted by Jinjie Ruan 2 weeks, 6 days ago
After converting arm64 to the Generic Entry framework, the compiler no
longer inlines el0_svc_common() into its caller do_el0_svc(). This adds
a small but measurable overhead to the critical system call path.

Marking el0_svc_common() as __always_inline forces it to be inlined
again and restores the performance. Benchmarking with
perf bench syscall basic on a Kunpeng 920 platform (based on v6.19-rc1)
shows a ~1% performance uplift.

Inlining this function reduces function prologue/epilogue overhead
and allows for better compiler optimization in the hot system call
dispatch path.
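
For illustration only, here is a minimal userspace sketch of the same
pattern: a common dispatch helper forced inline into a thin entry point.
All demo_* names are hypothetical stand-ins and not kernel code; the
macro mirrors how the kernel's __always_inline is commonly defined for
GCC and Clang, which is an assumption about the toolchain in use.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __always_inline (GCC/Clang). */
#define demo_always_inline inline __attribute__((__always_inline__))

/* Hypothetical handler type, loosely analogous to syscall_fn_t. */
typedef long (*demo_fn_t)(long arg);

static long demo_double(long arg) { return arg * 2; }
static long demo_negate(long arg) { return -arg; }

/* Hypothetical dispatch table indexed by call number. */
static const demo_fn_t demo_table[] = { demo_double, demo_negate };

/*
 * Common dispatch helper. The attribute asks the compiler to expand
 * the body in every caller instead of emitting a separate call.
 */
static demo_always_inline long demo_common(long arg, int nr, int table_len,
					   const demo_fn_t table[])
{
	/* Reject call numbers outside the table. */
	if (nr < 0 || nr >= table_len)
		return -1;
	return table[nr](arg);
}

/* Thin entry point, similar in shape to a caller such as do_el0_svc(). */
long demo_entry(long arg, int nr)
{
	return demo_common(arg, nr,
			   (int)(sizeof(demo_table) / sizeof(demo_table[0])),
			   demo_table);
}
```

With the helper expanded in place, the table pointer and length are
visible as constants in demo_entry(), which is the kind of extra
optimization opportunity the paragraph above refers to. Whether that
translates into a measurable gain depends on the compiler and target.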

| Metric     | Without this patch | With this patch | Change |
| ---------- | ------------------ | --------------- | ------ |
| Total time | 2.195 [sec]        | 2.171 [sec]     | ↓1.1%  |
| usecs/op   | 0.219575           | 0.217192        | ↓1.1%  |
| ops/sec    | 4,554,260          | 4,604,225       | ↑1.1%  |
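
As a cross-check of the Change column, the relative differences can be
recomputed from the raw figures. The sketch below is plain arithmetic
with hypothetical helper names and is not part of perf. It assumes
ops/sec is roughly the reciprocal of usecs/op, which holds only up to
rounding in the reported values.

```c
/* Relative change from 'before' to 'after', in percent. */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/*
 * Approximate operations per second from microseconds per operation.
 * Assumption: ops/sec ~= 1e6 / usecs_per_op, up to rounding.
 */
double ops_per_sec(double usecs_per_op)
{
	return 1e6 / usecs_per_op;
}
```

Calling pct_change(2.195, 2.171), pct_change(0.219575, 0.217192) and
pct_change(4554260, 4604225) gives values that all round to the 1.1%
shown in the table, with a negative sign for the two time metrics and a
positive sign for throughput.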

Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
---
 arch/arm64/kernel/syscall.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 77d00a5cf0e9..6fcd97c46716 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -66,8 +66,8 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
 	choose_random_kstack_offset(get_random_u16());
 }
 
-static void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
-			   const syscall_fn_t syscall_table[])
+static __always_inline void el0_svc_common(struct pt_regs *regs, int scno, int sc_nr,
+					   const syscall_fn_t syscall_table[])
 {
 	unsigned long work = READ_ONCE(current_thread_info()->syscall_work);
 	unsigned long flags = read_thread_flags();
-- 
2.34.1

Re: [PATCH v13 RESEND 10/14] arm64: Inline el0_svc_common()
Posted by Linus Walleij 2 weeks, 4 days ago
On Tue, Mar 17, 2026 at 9:20 AM Jinjie Ruan <ruanjinjie@huawei.com> wrote:

> After converting arm64 to the Generic Entry framework, the compiler no
> longer inlines el0_svc_common() into its caller do_el0_svc(). This adds
> a small but measurable overhead to the critical system call path.
>
> Manually forcing el0_svc_common() to be inlined restores the
> performance. Benchmarking with perf bench syscall basic on a
> Kunpeng 920 platform (based on v6.19-rc1) shows a ~1% performance
> uplift.
>
> Inlining this function reduces function prologue/epilogue overhead
> and allows for better compiler optimization in the hot system call
> dispatch path.
>
> | Metric     | W/O this patch | With this patch | Change    |
> | ---------- | -------------- | --------------- | --------- |
> | Total time | 2.195 [sec]    | 2.171 [sec]     |  ↓1.1%   |
> | usecs/op   | 0.219575       | 0.217192        |  ↓1.1%   |
> | ops/sec    | 4,554,260      | 4,604,225       |  ↑1.1%    |
>
> Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>

Reviewed-by: Linus Walleij <linusw@kernel.org>

Yours,
Linus Walleij