On GENERIC_IRQ_MULTI_HANDLER platforms, handle_arch_irq is a pointer
that is set once during boot and read on every interrupt, so it sits in
a hot code path. Use the runtime constant mechanism introduced by Linus
to speed up access to it.
Tested on a Sipeed Lichee Pi 4A (riscv64) board: the perf sched benchmark
improves by ~5.8%.
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
include/linux/irq.h | 4 +++-
kernel/irq/handle.c | 8 +++++---
2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 951acbdb9f84..2ba4a8afb71e 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -1274,6 +1274,7 @@ void ipi_mux_process(void);
int ipi_mux_create(unsigned int nr_ipi, void (*mux_send)(unsigned int cpu));
#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
+#include <asm/runtime-const.h>
/*
* Registers a generic IRQ handling function as the top-level IRQ handler in
* the system, which is generally the first C code called from an assembly
@@ -1288,7 +1289,8 @@ int __init set_handle_irq(void (*handle_irq)(struct pt_regs *));
* Allows interrupt handlers to find the irqchip that's been registered as the
* top-level IRQ handler.
*/
-extern void (*handle_arch_irq)(struct pt_regs *) __ro_after_init;
+extern void (*_handle_arch_irq)(struct pt_regs *) __ro_after_init;
+#define handle_arch_irq runtime_const_ptr(_handle_arch_irq)
asmlinkage void generic_handle_arch_irq(struct pt_regs *regs);
#else
#ifndef set_handle_irq
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index b7d52821837b..aac9e7b1301e 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -15,13 +15,14 @@
#include <linux/kernel_stat.h>
#include <asm/irq_regs.h>
+#include <asm/runtime-const.h>
#include <trace/events/irq.h>
#include "internals.h"
#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
-void (*handle_arch_irq)(struct pt_regs *) __ro_after_init;
+void (*_handle_arch_irq)(struct pt_regs *) __ro_after_init;
#endif
/**
@@ -270,10 +271,11 @@ irqreturn_t handle_irq_event(struct irq_desc *desc)
#ifdef CONFIG_GENERIC_IRQ_MULTI_HANDLER
int __init set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
- if (handle_arch_irq)
+ if (_handle_arch_irq)
return -EBUSY;
- handle_arch_irq = handle_irq;
+ _handle_arch_irq = handle_irq;
+ runtime_const_init(ptr, _handle_arch_irq);
return 0;
}
--
2.51.0
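For readers who want to try the pattern from the patch outside the kernel, here is a minimal userspace sketch. It assumes the generic fallback definitions of runtime_const_ptr() and runtime_const_init(), where the read is an ordinary load and the init step does nothing; architectures with real support patch the value into the instruction stream instead, which this sketch does not model. struct pt_regs, demo_handler() and demo_take_irq() are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's register frame; its layout is irrelevant here. */
struct pt_regs { unsigned long dummy; };

/*
 * Assumption: these mirror the generic fallback for architectures without
 * runtime-constant patching. The read is a plain load, the init is a no-op.
 */
#define runtime_const_ptr(sym)        (sym)
#define runtime_const_init(type, sym) do { } while (0)

/* Backing variable plus accessor macro, as in the patch. */
static void (*_handle_arch_irq)(struct pt_regs *);
#define handle_arch_irq runtime_const_ptr(_handle_arch_irq)

/* Hypothetical handler that only counts how often it ran. */
static int demo_calls;
void demo_handler(struct pt_regs *regs)
{
	(void)regs;
	demo_calls++;
}

/* Same control flow as the patched set_handle_irq(): register once only. */
int set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
	if (_handle_arch_irq)
		return -EBUSY;

	_handle_arch_irq = handle_irq;
	runtime_const_init(ptr, _handle_arch_irq);
	return 0;
}

/* Hypothetical, heavily simplified stand-in for the hot-path caller. */
void demo_take_irq(struct pt_regs *regs)
{
	assert(handle_arch_irq != NULL);
	handle_arch_irq(regs);
}
```

The first set_handle_irq() call succeeds and a second one returns -EBUSY. Note that the registration check reads the backing variable directly, while the hot path goes through the accessor macro.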
On Fri, Feb 20, 2026 at 5:28 PM Jisheng Zhang <jszhang@kernel.org> wrote:
>
> Currently, on GENERIC_IRQ_MULTI_HANDLER platforms, the handle_arch_irq
> is a pointer which is set during booting, and every irq processing needs
> to access it, so it sits in hot code path. We can use the
> runtime constant mechanism which was introduced by Linus to speed up
> its accessing.
>
> Tested on Sipeed Lichee Pi 4A (riscv64) board, the perf sched benchmark is
> improved by ~5.8%
Thx for the work. It's a visible improvement.
Compared to the original handler pointer approach, this solution
introduces no observable drawbacks. Since the static_call alternative
requires further analysis and consensus, adopting this implementation
serves as a pragmatic interim step.
So, I give:
Reviewed-by: Guo Ren (Alibaba Damo Academy) <guoren@kernel.org>
--
Best Regards
Guo Ren
On Tue, Feb 24, 2026 at 9:40 AM Guo Ren <guoren@kernel.org> wrote:
>
> On Fri, Feb 20, 2026 at 5:28 PM Jisheng Zhang <jszhang@kernel.org> wrote:
> >
> > Currently, on GENERIC_IRQ_MULTI_HANDLER platforms, the handle_arch_irq
> > is a pointer which is set during booting, and every irq processing needs
> > to access it, so it sits in hot code path. We can use the
> > runtime constant mechanism which was introduced by Linus to speed up
> > its accessing.
> >
> > Tested on Sipeed Lichee Pi 4A (riscv64) board, the perf sched benchmark is
> > improved by ~5.8%
> Thx for the work. It's a visible improvement.
>
> Compared to the original handler pointer approach, this solution
> introduces no observable drawbacks. Since the static_call alternative
> requires further analysis and consensus, adopting this implementation
> serves as a pragmatic interim step.
>
> So, I give:
> Reviewed-by: Guo Ren (Alibaba Damo Academy) <guoren@kernel.org>
Oh, don't forget Mark's suggestion for the uninitialized handler,
which is also suitable for genirq:
"That means that if set_handle_irq() isn't called, an IRQ will result in
a call to that bogus address rather than default_handle_irq(), ..."
Maybe we need an empty _handle_arch_irq instance as a default for the
early stage.
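One way to read the suggestion above, as a userspace sketch only: start the pointer at a harmless default handler instead of leaving it unset, and let registration replace the default exactly once. All names here are hypothetical stand-ins for the kernel symbols. The sketch models only the pointer; with a real runtime constant the patched call site would also need to hold the default before runtime_const_init() runs, and that part is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's register frame; its layout is irrelevant here. */
struct pt_regs { unsigned long dummy; };

static int spurious_irqs;	/* interrupts taken before registration */
static int chip_irqs;		/* interrupts seen by the registered handler */

/* Hypothetical default: account for the early interrupt, do nothing else. */
void default_handle_irq(struct pt_regs *regs)
{
	(void)regs;
	spurious_irqs++;
}

/* The pointer never holds NULL or a bogus address. */
static void (*_handle_arch_irq)(struct pt_regs *) = default_handle_irq;

/* "Already registered" now means "no longer the default". */
int set_handle_irq(void (*handle_irq)(struct pt_regs *))
{
	assert(handle_irq != NULL);

	if (_handle_arch_irq != default_handle_irq)
		return -EBUSY;

	_handle_arch_irq = handle_irq;
	return 0;
}

/* Hypothetical irqchip handler used for the demonstration. */
void chip_handle_irq(struct pt_regs *regs)
{
	(void)regs;
	chip_irqs++;
}

/* Hypothetical stand-in for the low-level entry path. */
void take_irq(struct pt_regs *regs)
{
	_handle_arch_irq(regs);
}
```

An interrupt taken before set_handle_irq() lands in default_handle_irq() rather than at an undefined address, and the single-registration rule still holds.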
--
Best Regards
Guo Ren
On Fri, Feb 20 2026 at 17:09, Jisheng Zhang wrote:
> Currently, on GENERIC_IRQ_MULTI_HANDLER platforms, the handle_arch_irq
> is a pointer which is set during booting, and every irq processing needs
> to access it, so it sits in hot code path. We can use the
> runtime constant mechanism which was introduced by Linus to speed up
> its accessing.
The proper solution is to use a static call and update it in
set_handle_irq(). That removes the complete indirect call issue from
the hot path.
Thanks,
tglx
On Sun, Feb 22, 2026 at 11:06:11PM +0100, Thomas Gleixner wrote:
> On Fri, Feb 20 2026 at 17:09, Jisheng Zhang wrote:
> > Currently, on GENERIC_IRQ_MULTI_HANDLER platforms, the handle_arch_irq
> > is a pointer which is set during booting, and every irq processing needs
> > to access it, so it sits in hot code path. We can use the
> > runtime constant mechanism which was introduced by Linus to speed up
> > its accessing.
>
> The proper solution is to use a static call and update it in
> set_handle_irq(). That removes the complete indirect call issue from
> the hot path.

+ Ard, Mark,

Good idea. The remaining problem is no static call support for current
GENERIC_IRQ_MULTI_HANDLER (or similar, arm64 e.g) platforms.

For arm64, Ard tried to add the static call support[1] in 2021, but
Mark concerned "compiler could easily violate our expectations in
future"[2], and asked for static calls "critical rather than a nice-to-have"
usage.

Hi Ard, Mark,

Could this irq performance improvement be used as a "critical" usage for
arm64 static call? Per my test, about 6.5% improvement was seen on quad CA55.

Another alternative: disable static call if CFI is enabled, and give
the platform/SoC users chance to enable static call to benefit from
it.

Any comment is appreciated.

Thanks

[1] https://www.spinics.net/lists/arm-kernel/msg931861.html
[2] https://www.spinics.net/lists/arm-kernel/msg932481.html

>
> Thanks,
>
> tglx
On Mon, Feb 23, 2026 at 08:41:55PM +0800, Jisheng Zhang wrote:
> On Sun, Feb 22, 2026 at 11:06:11PM +0100, Thomas Gleixner wrote:
> > On Fri, Feb 20 2026 at 17:09, Jisheng Zhang wrote:
> > > Currently, on GENERIC_IRQ_MULTI_HANDLER platforms, the handle_arch_irq
> > > is a pointer which is set during booting, and every irq processing needs
> > > to access it, so it sits in hot code path. We can use the
> > > runtime constant mechanism which was introduced by Linus to speed up
> > > its accessing.
> >
> > The proper solution is to use a static call and update it in
> > set_handle_irq(). That removes the complete indirect call issue from
> > the hot path.
>
> + Ard, Mark,
>
> Good idea. The remaining problem is no static call support for current
> GENERIC_IRQ_MULTI_HANDLER (or similar, arm64 e.g) platforms.

There are various reasons for not supporting static calls, and in
general we end up having to have a fall-back path that's *more*
expensive than just loading the pointer.

> For arm64, Ard tried to add the static call support[1] in 2021, but
> Mark concerned "compiler could easily violate our expectations in
> future"[2],

To be clear, that's ONE specific concern, not the ONLY reason.

> and asked for static calls "critical rather than a nice-to-have"
> usage.
>
> Hi Ard, Mark,
>
> Could this irq performance improvement be used as a "critical" usage for
> arm64 static call? Per my test, about 6.5% improvement was seen on quad CA55.

As per my other mail, does this meaningfully affect a real workload?

> Another alternative: disable static call if CFI is enabled, and give
> the platform/SoC users chance to enable static call to benefit from
> it.

Who is this actually going to matter to?

Mark.

> Any comment is appreciated.
>
> Thanks
>
> [1] https://www.spinics.net/lists/arm-kernel/msg931861.html
> [2] https://www.spinics.net/lists/arm-kernel/msg932481.html
> >
> > Thanks,
> >
> > tglx
On Mon, Feb 23, 2026 at 01:11:46PM +0000, Mark Rutland wrote:
> On Mon, Feb 23, 2026 at 08:41:55PM +0800, Jisheng Zhang wrote:
> > On Sun, Feb 22, 2026 at 11:06:11PM +0100, Thomas Gleixner wrote:
> > > On Fri, Feb 20 2026 at 17:09, Jisheng Zhang wrote:
> > > > Currently, on GENERIC_IRQ_MULTI_HANDLER platforms, the handle_arch_irq
> > > > is a pointer which is set during booting, and every irq processing needs
> > > > to access it, so it sits in hot code path. We can use the
> > > > runtime constant mechanism which was introduced by Linus to speed up
> > > > its accessing.
> > >
> > > The proper solution is to use a static call and update it in
> > > set_handle_irq(). That removes the complete indirect call issue from
> > > the hot path.
> >
> > + Ard, Mark,
> >
> > Good idea. The remaining problem is no static call support for current
> > GENERIC_IRQ_MULTI_HANDLER (or similar, arm64 e.g) platforms.
>
> There are various reasons for not supporting static calls, and in
> general we end up having to have a fall-back path that's *more*
> expensive than just loading the pointer.

Indeed, if the arch doesn't support static calls, the fall-back adds the
overhead of one more load.

> > For arm64, Ard tried to add the static call support[1] in 2021, but
> > Mark concerned "compiler could easily violate our expectations in
> > future"[2],
>
> To be clear, that's ONE specific concern, not the ONLY reason.
>
> > and asked for static calls "critical rather than a nice-to-have"
> > usage.
> >
> > Hi Ard, Mark,
> >
> > Could this irq performance improvement be used as a "critical" usage for
> > arm64 static call? Per my test, about 6.5% improvement was seen on quad CA55.
>
> As per my other mail, does this meaningfully affect a real workload?

This improves generic irq processing; I think all real workloads are affected.

> > Another alternative: disable static call if CFI is enabled, and give
> > the platform/SoC users chance to enable static call to benefit from
> > it.
>
> Who is this actually going to matter to?
On Mon, Feb 23, 2026 at 09:22:44PM +0800, Jisheng Zhang wrote:
> On Mon, Feb 23, 2026 at 01:11:46PM +0000, Mark Rutland wrote:
> > There are various reasons for not supporting static calls, and in
> > general we end up having to have a fall-back path that's *more*
> > expensive than just loading the pointer.
>
> Indeed, if the arch doesn't support static calls, the fall-back adds the
> overhead of one more load.

I think you've misunderstood my point. I'm saying that *even if* arm64
supported static calls, we'd have to have dynamic fallback paths that
are more expensive. For example, where branch range limitations force
indirection via an out-of-line stub, adding an extra BL+RET pair. Note
that was the case in the patches you linked from Ard.

[...]

> > > and asked for static calls "critical rather than a nice-to-have"
> > > usage.
> > >
> > > Hi Ard, Mark,
> > >
> > > Could this irq performance improvement be used as a "critical" usage for
> > > arm64 static call? Per my test, about 6.5% improvement was seen on quad CA55.
> >
> > As per my other mail, does this meaningfully affect a real workload?
>
> This improves generic irq processing; I think all real workloads are affected.

I asked about meaningful impact. For a real workload, does this show up
at all, or does this fall within the noise?

At present, I don't think your singular microbenchmark result changes
our previous decisions regarding static calls. For various reasons, on
arm64 static calls are nowhere near as significant an optimization (and
can be worse).

I'd be happy to use a runtime constant (modulo my concerns about the
initial value) given that we already have the infrastructure and the
maintenance impact is minimal.

Mark.