clang needs __always_inline instead of inline, even for tiny helpers.
This saves some cycles in the system call fast path, and saves 195 bytes
on an x86_64 build:
$ size vmlinux.before vmlinux.after
text data bss dec hex filename
34652814 22291961 5875180 62819955 3be8e73 vmlinux.before
34652619 22291961 5875180 62819760 3be8db0 vmlinux.after
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/irq-entry-common.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 6ab913e57da0a8acde84a1002645a9dfa5e6303a..d26d1b1bcbfb9798885426fbb2b978f43fcfcdc1 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -110,7 +110,7 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
#ifndef local_irq_enable_exit_to_user
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
+static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
{
local_irq_enable();
}
@@ -125,7 +125,7 @@ static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
static inline void local_irq_disable_exit_to_user(void);
#ifndef local_irq_disable_exit_to_user
-static inline void local_irq_disable_exit_to_user(void)
+static __always_inline void local_irq_disable_exit_to_user(void)
{
local_irq_disable();
}
--
2.52.0.177.g9f829587af-goog
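For context, the #ifndef around the generic definitions exists so that an architecture can supply its own variant and suppress the fallback. A minimal sketch of such an override, assuming the usual kernel convention of defining the macro to its own name (arch_exit_to_user_prepare() is a hypothetical hook, not something from this patch):

/* Hypothetical arch header (e.g. its asm/entry-common.h): provide a
 * custom implementation and define the macro so the generic header in
 * include/linux/irq-entry-common.h skips its fallback definition. */
static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
{
	arch_exit_to_user_prepare(ti_work);	/* hypothetical arch-specific hook */
	local_irq_enable();
}
#define local_irq_enable_exit_to_user local_irq_enable_exit_to_user

The generic fallback is the common case, which is why keeping it (and its sibling local_irq_disable_exit_to_user()) truly inline matters on every syscall.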
On Thu, Dec 04, 2025 at 03:31:27PM +0000, Eric Dumazet wrote:
> clang needs __always_inline instead of inline, even for tiny helpers.
>
> This saves some cycles in system call fast path, and saves 195 bytes
> on x86_64 build:
>
> $ size vmlinux.before vmlinux.after
>     text     data      bss      dec     hex filename
> 34652814 22291961  5875180 62819955 3be8e73 vmlinux.before
> 34652619 22291961  5875180 62819760 3be8db0 vmlinux.after
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Yeah, sometimes these inline heuristics drive me mad. I've picked up
this and the rseq one. I'll do something with them after rc1.
On Fri, Dec 5, 2025 at 2:51 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Yeah, sometimes these inline heuristics drive me mad. I've picked up
> this and the rseq one. I'll do something with them after rc1.

Thanks Peter.

I forgot to include perf numbers for this one, but apparently having a
local_irq_enable() in an out-of-line function in the syscall path was
adding a 5% penalty on some platforms.

Crazy...
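The exact benchmark is not given here; one rough way to see this kind of syscall fast-path cost from user space is a tight loop over a trivial syscall, as in the sketch below (purely illustrative, not the workload behind the 5% figure):

/* Illustrative only: times a cheap syscall whose cost is dominated by
 * kernel entry/exit, not by the work done inside the syscall itself. */
#define _GNU_SOURCE
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	const long iters = 10 * 1000 * 1000;
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (long i = 0; i < iters; i++)
		syscall(SYS_getppid);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%.1f ns per getppid() syscall\n", ns / iters);
	return 0;
}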
On Fri, Dec 05, 2025 at 02:54:26AM -0800, Eric Dumazet wrote:
> I forgot to include perf numbers for this one, but apparently having a
> local_irq_enable() in an out-of-line function in the syscall path was
> adding a 5% penalty on some platforms.
>
> Crazy...

Earlier Zen with RET mitigation? ;-)
On Fri, Dec 5, 2025 at 4:45 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Earlier Zen with RET mitigation? ;-)

This was AMD Rome "AMD EPYC 7B12 64-Core Processor", but also AMD Turin
"AMD EPYC 9B45 128-Core Processor" to a certain extent.

When you say RET mitigation, this is the five int3 after retq, right?
On Fri, Dec 05, 2025 at 05:03:33AM -0800, Eric Dumazet wrote:
> This was AMD Rome "AMD EPYC 7B12 64-Core Processor", but also AMD Turin
> "AMD EPYC 9B45 128-Core Processor" to a certain extent.
>
> When you say RET mitigation, this is the five int3 after retq, right?

Nope, that one is SLS. AMD has BTB type confusion on return prediction
(the AMD RetBleed) and patches all the RET sites with jumps to
retbleed_return_thunk(), or one of the srso*return_thunk() thingies.
All are somewhat expensive.

So while normally CALL+RET is well optimized and hardly noticeable, the
moment your uarch needs one of these return thunks, you're going to
notice them.
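In other words, with the helper out of line every syscall exit pays a CALL plus a RET, and on affected parts that RET is not a plain ret but a jump into a return thunk; forcing the helper inline removes both. A schematic comparison (the comments describe the general rewrite under the rethunk mitigations, not the exact code generated for this header):

#include <linux/compiler.h>	/* noinline, __always_inline */
#include <linux/irqflags.h>	/* local_irq_enable() */

/* Out of line: the helper ends in a RET, and with the retbleed/SRSO
 * mitigations enabled the compiler and objtool turn that return site
 * into a jump to a return thunk (e.g. __x86_return_thunk), so the
 * syscall exit path pays a CALL plus a comparatively expensive
 * thunked return on every syscall. */
static noinline void irq_enable_out_of_line(void)
{
	local_irq_enable();
}

/* Forced inline: the interrupt-enable instruction lands directly in
 * the caller; there is no CALL, no RET, and nothing for the
 * return-thunk machinery to slow down. */
static __always_inline void irq_enable_inlined(void)
{
	local_irq_enable();
}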
The following commit has been merged into the core/urgent branch of tip:
Commit-ID: 4a824c3128998158a093eaadd776a79abe3a601a
Gitweb: https://git.kernel.org/tip/4a824c3128998158a093eaadd776a79abe3a601a
Author: Eric Dumazet <edumazet@google.com>
AuthorDate: Thu, 04 Dec 2025 15:31:27
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 18 Dec 2025 10:43:52 +01:00
entry: Always inline local_irq_{enable,disable}_exit_to_user()
clang needs __always_inline instead of inline, even for tiny helpers.
This saves some cycles in the system call fast path, and saves 195 bytes
on an x86_64 build:
$ size vmlinux.before vmlinux.after
text data bss dec hex filename
34652814 22291961 5875180 62819955 3be8e73 vmlinux.before
34652619 22291961 5875180 62819760 3be8db0 vmlinux.after
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20251204153127.1321824-1-edumazet@google.com
---
include/linux/irq-entry-common.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/irq-entry-common.h b/include/linux/irq-entry-common.h
index 6ab913e..d26d1b1 100644
--- a/include/linux/irq-entry-common.h
+++ b/include/linux/irq-entry-common.h
@@ -110,7 +110,7 @@ static __always_inline void enter_from_user_mode(struct pt_regs *regs)
static inline void local_irq_enable_exit_to_user(unsigned long ti_work);
#ifndef local_irq_enable_exit_to_user
-static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
+static __always_inline void local_irq_enable_exit_to_user(unsigned long ti_work)
{
local_irq_enable();
}
@@ -125,7 +125,7 @@ static inline void local_irq_enable_exit_to_user(unsigned long ti_work)
static inline void local_irq_disable_exit_to_user(void);
#ifndef local_irq_disable_exit_to_user
-static inline void local_irq_disable_exit_to_user(void)
+static __always_inline void local_irq_disable_exit_to_user(void)
{
local_irq_disable();
}