It was noticed that newer kernels (>= 6.1) raise a lot of "IRQ work"
interrupts whereas older ones (<= 5.10) didn't use this at all on our
system.
Root cause seems to be commit b4c6f86ec2f6 ('irq_work: Handle some
irq_work in a per-CPU thread on PREEMPT_RT'). This commit tries to avoid
calling irq_work callbacks from hardirq context as much as possible.
Therefore irq_work items marked as IRQ_WORK_LAZY and (on PREEMPT_RT) items
not marked as IRQ_WORK_HARD_IRQ should be handled from a per-CPU thread.
Running the remaining items from hardirq context is triggered using
irq_work_raise(). But on PREEMPT_RT, irq_work_raise() is called for
all items not marked as IRQ_WORK_LAZY, which results in unnecessary
"IRQ work" interrupts.
Fixes: b4c6f86ec2f6 ('irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT')
Signed-off-by: Oliver Brandt <oliver.brandt@lenze.com>
---
kernel/irq_work.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 2f4fb336dda1..df08b7dde7d5 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -108,7 +108,7 @@ static void __irq_work_queue_local(struct irq_work *work)
return;
/* If the work is "lazy", handle it from next tick if any */
- if (!lazy_work || tick_nohz_tick_stopped())
+ if (!(lazy_work || rt_lazy_work) || tick_nohz_tick_stopped())
irq_work_raise(work);
}
--
2.46.0
On 2024-08-27 15:45:42 [+0000], Brandt, Oliver - Lenze wrote:
> It was noticed that newer kernels (>= 6.1) raise a lot of "IRQ work"
> interrupts whereas older ones (<= 5.10) didn't use this at all on our
> system.
>
> Root cause seems to be commit b4c6f86ec2f6 ('irq_work: Handle some
> irq_work in a per-CPU thread on PREEMPT_RT'). This commit tries to avoid
> calling irq_work callbacks from hardirq context as much as possible.
> Therefore irq_work items marked as IRQ_WORK_LAZY and (on PREEMPT_RT) items
> not marked as IRQ_WORK_HARD_IRQ should be handled from a per-CPU thread.
>
> Running the remaining items from hardirq context is triggered using
> irq_work_raise(). But on PREEMPT_RT, irq_work_raise() is called for
> all items not marked as IRQ_WORK_LAZY, which results in unnecessary
> "IRQ work" interrupts.
Good catch I think.
> Fixes: b4c6f86ec2f6 ('irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT')
> Signed-off-by: Oliver Brandt <oliver.brandt@lenze.com>
> ---
> kernel/irq_work.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 2f4fb336dda1..df08b7dde7d5 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -108,7 +108,7 @@ static void __irq_work_queue_local(struct irq_work *work)
> return;
>
> /* If the work is "lazy", handle it from next tick if any */
> - if (!lazy_work || tick_nohz_tick_stopped())
> + if (!(lazy_work || rt_lazy_work) || tick_nohz_tick_stopped())
> irq_work_raise(work);
Looking at this I *think* rt_lazy_work was needed earlier due to
different code but not anymore. Couldn't you just remove rt_lazy_work
and set lazy_work in the RT path? That should work.
> }
>
Sebastian
On 2024-08-28 11:37:20 [+0200], To Brandt, Oliver - Lenze wrote:
> > Fixes: b4c6f86ec2f6 ('irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT')
> > Signed-off-by: Oliver Brandt <oliver.brandt@lenze.com>
> > ---
> > kernel/irq_work.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > index 2f4fb336dda1..df08b7dde7d5 100644
> > --- a/kernel/irq_work.c
> > +++ b/kernel/irq_work.c
> > @@ -108,7 +108,7 @@ static void __irq_work_queue_local(struct irq_work *work)
> > return;
> >
> > /* If the work is "lazy", handle it from next tick if any */
> > - if (!lazy_work || tick_nohz_tick_stopped())
> > + if (!(lazy_work || rt_lazy_work) || tick_nohz_tick_stopped())
> > irq_work_raise(work);
>
> Looking at this I *think* rt_lazy_work was needed earlier due to
> different code but not anymore. Couldn't you just remove rt_lazy_work
> and set lazy_work in the RT path? That should work.
Actually no. If we merge rt_lazy_work into lazy_work then we would have
the behaviour you have here. But we need irq_work_raise() in order to
irq_work_run();
-> wake_irq_workd();
-> wake_up_process(irq_workd);
If we don't irq_work_raise() here then it will be delayed until the next
tick and we didn't want that if my memory serves me.
> > }
> >
Sebastian
On Wed, 2024-08-28 at 11:54 +0200, Sebastian Andrzej Siewior wrote:
> On 2024-08-28 11:37:20 [+0200], To Brandt, Oliver - Lenze wrote:
> > > Fixes: b4c6f86ec2f6 ('irq_work: Handle some irq_work in a per-CPU thread on PREEMPT_RT')
> > > Signed-off-by: Oliver Brandt <oliver.brandt@lenze.com>
> > > ---
> > > kernel/irq_work.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> > > index 2f4fb336dda1..df08b7dde7d5 100644
> > > --- a/kernel/irq_work.c
> > > +++ b/kernel/irq_work.c
> > > @@ -108,7 +108,7 @@ static void __irq_work_queue_local(struct irq_work *work)
> > > return;
> > >
> > > /* If the work is "lazy", handle it from next tick if any */
> > > - if (!lazy_work || tick_nohz_tick_stopped())
> > > + if (!(lazy_work || rt_lazy_work) || tick_nohz_tick_stopped())
> > > irq_work_raise(work);
> >
> > Looking at this I *think* rt_lazy_work was needed earlier due to
> > different code but not anymore. Couldn't you just remove rt_lazy_work
> > and set lazy_work in the RT path? That should work.
>
> Actually no. If we merge rt_lazy_work into lazy_work then we would have
> the behaviour you have here. But we need irq_work_raise() in order to
> irq_work_run();
> -> wake_irq_workd();
> -> wake_up_process(irq_workd);
>
> If we don't irq_work_raise() here then it will be delayed until the next
> tick and we didn't want that if my memory serves me.
Hmm.... I see. What about calling wake_irq_workd() directly; something
like
	if (rt_lazy_work)
		wake_irq_workd();
	else if (!lazy_work || tick_nohz_tick_stopped())
		irq_work_raise(work);
>
> Sebastian
Regards,
Oliver
On 2024-08-28 13:26:42 [+0000], Brandt, Oliver - Lenze wrote:
> Hmm.... I see. What about calling wake_irq_workd() directly; something
> like
>
> 	if (rt_lazy_work)
> 		wake_irq_workd();
> 	else if (!lazy_work || tick_nohz_tick_stopped())
> 		irq_work_raise(work);

this might work but I'm not too sure about it. This will become a
problem if irq_work_queue() is invoked from a path where scheduling is
not possible due to recursion or acquired locks.

How much of a problem is it and how much do you gain by doing so?

> Regards,
> Oliver

Sebastian
On Wed, 2024-08-28 at 16:02 +0200, bigeasy@linutronix.de wrote:
> On 2024-08-28 13:26:42 [+0000], Brandt, Oliver - Lenze wrote:
> > Hmm.... I see. What about calling wake_irq_workd() directly; something
> > like
> >
> > 	if (rt_lazy_work)
> > 		wake_irq_workd();
> > 	else if (!lazy_work || tick_nohz_tick_stopped())
> > 		irq_work_raise(work);
>
> this might work but I'm not too sure about it. This will become a
> problem if irq_work_queue() is invoked from a path where scheduling is
> not possible due to recursion or acquired locks.
>
> How much of a problem is it and how much do you gain by doing so?

To be honest I haven't made any measurements. But we have a system with
*very* tight timing constraints: one thing is a 16kHz irq; the other a
4kHz RT-task; both running on an isolated core. So if we assume ~2us
"overhead" for an irq (this should be more or less the time on our
system; Cortex-A9 at 800MHz) we would spend ~1% of our 250us grid on
the additional irq. Not really something we would like.

Additionally we may get an (additional) latency of ~2us before our
16kHz irq could go to work. Also something we wouldn't like.

What I didn't understand: the "IRQ work" irqs are needed in order to
start the execution of something. Ok. But how was this done before? It
seems that "softirqs" were used for this purpose on kernel v4.14 (don't
ask why we are using such an old version). But I didn't get the idea of
these "softirqs". Are they triggered via "real" irqs (e.g. IPIs)? In
this case we would most likely have the same count of irqs on a kernel
4.14 and a kernel 6.1 (our goal for now; I don't lose hope that even a
v6.6 may be used next year).

Any ideas about this assumption?

> Sebastian

Oliver
On 2024-08-28 14:52:17 [+0000], Brandt, Oliver - Lenze wrote:
> To be honest I haven't made any measurements. But we have a system with
> *very* tight timing constraints: one thing is a 16kHz irq; the other a
> 4kHz RT-task; both running on an isolated core. So if we assume ~2us
> "overhead" for an irq (this should be more or less the time on our
> system; Cortex-A9 at 800MHz) we would spend ~1% of our 250us grid on
> the additional irq. Not really something we would like.
>
> Additionally we may get an (additional) latency of ~2us before our
> 16kHz irq could go to work. Also something we wouldn't like.

This is a local IRQ. It will be slightly more expensive than doing the
wakeup directly.

> What I didn't understand: the "IRQ work" irqs are needed in order to
> start the execution of something. Ok. But how was this done before? It
> seems that "softirqs" were used for this purpose on kernel v4.14 (don't
> ask why we are using such an old version). But I didn't get the idea of
> these "softirqs". Are they triggered via "real" irqs (e.g. IPIs)? In
> this case we would most likely have the same count of irqs on a kernel
> 4.14 and a kernel 6.1 (our goal for now; I don't lose hope that even a
> v6.6 may be used next year).

You schedule a "work item" via irq_work. This can be compared with a
tasklet or a workqueue, for instance. In the past a softirq was raised
and the item was handled as part of it. Raising a softirq is simply
ORing a bit, and the softirq will be handled on the IRQ-exit path or
ksoftirqd will be scheduled (= task wakeup). This is all CPU-local in
general. Cross-CPU requires sending an interrupt (IPI), and within that
IPI you need to raise (OR) the bit for the softirq. It is likely I had
some hacks to get it to work.

However, some of the things require hard-irq context and IRQs-off, and
it might be triggered from an NMI, and the former infrastructure was
not really fit for that. So we went closer to what mainline is doing
and this is what we have now.

You could look at what is triggering the irq_work requests and maybe,
based on that understanding, disable it (say it is issued by driver X
and you can disable it because you don't use it).

You could, without breaking much (I think), avoid triggering irq-work
locally for the "only-lazy-work" case and then ensure the timer tick
does the wake-up of the irq-work thread. This is done in case there is
no HW support for signaling irq-work. So you would have fewer irqs but
the work will be delayed to the next jiffy, which will make it take a
little bit longer. Ideally you would avoid having to deal with irq-work
in the first place.

> Any ideas about this assumption?
>
> Oliver

Sebastian