Crystal reports that the PCIe Advanced Error Reporting driver gets stuck
in an infinite loop on PREEMPT_RT:
Both the primary interrupt handler aer_irq() as well as the secondary
handler aer_isr() are forced into threads with identical priority.
Crystal writes that on the ARM system in question, the primary handler
has to clear an error in the Root Error Status register...
"before the next error happens, or else the hardware will set the
Multiple ERR_COR Received bit. If that bit is set, then aer_isr()
can't rely on the Error Source Identification register, so it scans
through all devices looking for errors -- and for some reason, on
this system, accessing the AER registers (or any Config Space above
0x400, even though there are capabilities located there) generates
an Unsupported Request Error (but returns valid data). Since this
happens more than once, without aer_irq() preempting, it causes
another multi error and we get stuck in a loop."
The issue does not show on non-PREEMPT_RT because the primary handler
runs in hardirq context and thus can preempt the threaded secondary
handler, clear the Root Error Status register and prevent the secondary
handler from getting stuck.
Emulate the same behavior on PREEMPT_RT by assigning a lower default
priority to the secondary handler if the primary handler is forced into
a thread.
Reported-by: Crystal Wood <crwood@redhat.com>
Tested-by: Crystal Wood <crwood@redhat.com>
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
Changes v1 -> v2:
* Rename to sched_set_fifo_secondary() (Thomas)
* Rephrase commit message and code comment (Thomas)
Link to v1:
https://lore.kernel.org/r/83f58870043e2ae64f19b3a2169b5c3cf3f95130.1757346718.git.lukas@wunner.de/
include/linux/sched.h | 1 +
kernel/irq/manage.c | 5 ++++-
kernel/sched/syscalls.c | 13 +++++++++++++
3 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index cbb7340..cd6be74 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1901,6 +1901,7 @@ static inline int task_nice(const struct task_struct *p)
extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
extern void sched_set_fifo(struct task_struct *p);
extern void sched_set_fifo_low(struct task_struct *p);
+extern void sched_set_fifo_secondary(struct task_struct *p);
extern void sched_set_normal(struct task_struct *p, int nice);
extern int sched_setattr(struct task_struct *, const struct sched_attr *);
extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c948373..268d751 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1239,7 +1239,10 @@ static int irq_thread(void *data)
 	irq_thread_set_ready(desc, action);
-	sched_set_fifo(current);
+	if (action->handler == irq_forced_secondary_handler)
+		sched_set_fifo_secondary(current);
+	else
+		sched_set_fifo(current);
 	if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
 					   &action->thread_flags))
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77ae87f..0f00ac7 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -856,6 +856,19 @@ void sched_set_fifo_low(struct task_struct *p)
}
EXPORT_SYMBOL_GPL(sched_set_fifo_low);
+/*
+ * For when the primary interrupt handler is forced into a thread, in addition
+ * to the (always threaded) secondary handler. The secondary handler gets a
+ * slightly lower priority so that the primary handler can preempt it, thereby
+ * emulating the behavior of a non-PREEMPT_RT system where the primary handler
+ * runs in hardirq context.
+ */
+void sched_set_fifo_secondary(struct task_struct *p)
+{
+	struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 - 1 };
+	WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
+}
+
void sched_set_normal(struct task_struct *p, int nice)
{
struct sched_attr attr = {
--
2.51.0
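For readers following the priority arithmetic in the patch, here is a rough userspace sketch in plain C. It is a model under stated assumptions, not kernel code: MAX_RT_PRIO is assumed to be 100, sched_set_fifo() is assumed to pick MAX_RT_PRIO / 2, and every model_* name is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: mirrors the kernel's MAX_RT_PRIO, taken to be 100 here. */
#define MODEL_MAX_RT_PRIO 100

/* The secondary priority must remain a valid SCHED_FIFO priority (>= 1). */
static_assert(MODEL_MAX_RT_PRIO / 2 - 1 >= 1, "secondary priority too low");

/* Hypothetical stand-ins for the two kinds of IRQ thread; not kernel types. */
enum model_irq_thread_kind {
	MODEL_IRQ_THREAD_DEFAULT,          /* any IRQ thread other than a forced secondary */
	MODEL_IRQ_THREAD_FORCED_SECONDARY, /* secondary handler of a force-threaded primary */
};

/* Models the branch the patch adds to irq_thread(). */
int model_irq_thread_prio(enum model_irq_thread_kind kind)
{
	if (kind == MODEL_IRQ_THREAD_FORCED_SECONDARY)
		return MODEL_MAX_RT_PRIO / 2 - 1; /* sched_set_fifo_secondary() */
	return MODEL_MAX_RT_PRIO / 2;             /* sched_set_fifo(), assumed value */
}

/*
 * Under SCHED_FIFO, a task that becomes runnable preempts the running task
 * only if its priority is strictly higher; equal priority does not preempt.
 */
bool model_can_preempt(int woken_prio, int running_prio)
{
	return woken_prio > running_prio;
}
```

With these assumptions the primary thread ends up at 50 and the forced secondary at 49, so model_can_preempt(50, 49) holds, whereas the pre-patch case of 50 against 50 does not.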
On 2025-10-27 13:59:31 [+0100], Lukas Wunner wrote:
> Crystal reports that the PCIe Advanced Error Reporting driver gets stuck
> in an infinite loop on PREEMPT_RT:
>
> Both the primary interrupt handler aer_irq() as well as the secondary
> handler aer_isr() are forced into threads with identical priority.
> Crystal writes that on the ARM system in question, the primary handler
> has to clear an error in the Root Error Status register...
>
> "before the next error happens, or else the hardware will set the
> Multiple ERR_COR Received bit. If that bit is set, then aer_isr()
> can't rely on the Error Source Identification register, so it scans
> through all devices looking for errors -- and for some reason, on
> this system, accessing the AER registers (or any Config Space above
> 0x400, even though there are capabilities located there) generates
> an Unsupported Request Error (but returns valid data). Since this
> happens more than once, without aer_irq() preempting, it causes
> another multi error and we get stuck in a loop."
>
> The issue does not show on non-PREEMPT_RT because the primary handler
> runs in hardirq context and thus can preempt the threaded secondary
> handler, clear the Root Error Status register and prevent the secondary
> handler from getting stuck.
Not sure if I mentioned it before but this is due to forced threaded
IRQs which can also be enabled on non-PREEMPT_RT systems via `threadirqs`.
> Emulate the same behavior on PREEMPT_RT by assigning a lower default
> priority to the secondary handler if the primary handler is forced into
> a thread.
>
> Reported-by: Crystal Wood <crwood@redhat.com>
> Tested-by: Crystal Wood <crwood@redhat.com>
> Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -856,6 +856,19 @@ void sched_set_fifo_low(struct task_struct *p)
> }
> EXPORT_SYMBOL_GPL(sched_set_fifo_low);
>
> +/*
> + * For when the primary interrupt handler is forced into a thread, in addition
> + * to the (always threaded) secondary handler. The secondary handler gets a
> + * slightly lower priority so that the primary handler can preempt it, thereby
> + * emulating the behavior of a non-PREEMPT_RT system where the primary handler
> + * runs in hardirq context.
s/non-PREEMPT_RT/non-forced threaded/ ?
Other than that,
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Sebastian
On Tue, Oct 28, 2025 at 01:06:52PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-10-27 13:59:31 [+0100], Lukas Wunner wrote:
> > The issue does not show on non-PREEMPT_RT because the primary handler
> > runs in hardirq context and thus can preempt the threaded secondary
> > handler, clear the Root Error Status register and prevent the secondary
> > handler from getting stuck.
>
> Not sure if I mentioned it before but this is due to forced threaded
> IRQs which can also be enabled on non-PREEMPT_RT systems via `threadirqs`.
According to the commit which introduced the "threadirqs" command line
option, 8d32a307e4fa ("genirq: Provide forced interrupt threading"),
it is "mostly a debug option". I guess the option allows testing
the waters on arches which do not yet "select ARCH_SUPPORTS_RT"
to see if force-threaded interrupts break anything. I recall the
option being available in mainline for much longer than PREEMPT_RT
and it was definitely useful as a justification to upstream changes
which were otherwise only needed by the out-of-tree PREEMPT_RT patches.
Intuitively I would assume that debug options are not worth calling out
in commit messages or code comments as users and developers will
primarily be interested in the real deal (i.e. PREEMPT_RT) and not
an option which gets us only halfway there. However if you
(or anyone else) feels strongly about it, I'll be happy to respin.
Thanks for taking a look!
Lukas
On 2025-10-28 14:44:38 [+0100], Lukas Wunner wrote:
> On Tue, Oct 28, 2025 at 01:06:52PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-10-27 13:59:31 [+0100], Lukas Wunner wrote:
> > > The issue does not show on non-PREEMPT_RT because the primary handler
> > > runs in hardirq context and thus can preempt the threaded secondary
> > > handler, clear the Root Error Status register and prevent the secondary
> > > handler from getting stuck.
> >
> > Not sure if I mentioned it before but this is due to forced threaded
> > IRQs which can also be enabled on non-PREEMPT_RT systems via `threadirqs`.
>
> According to the commit which introduced the "threadirqs" command line
> option, 8d32a307e4fa ("genirq: Provide forced interrupt threading"),
> it is "mostly a debug option". I guess the option allows testing
> the waters on arches which do not yet "select ARCH_SUPPORTS_RT"
> to see if force-threaded interrupts break anything. I recall the
> option being available in mainline for much longer than PREEMPT_RT
> and it was definitely useful as a justification to upstream changes
> which were otherwise only needed by the out-of-tree PREEMPT_RT patches.
There are people using it without PREEMPT_RT due to $reasons. It is not
documented in Documentation/admin-guide/kernel-parameters.txt as an
option meant only for debugging.
> Intuitively I would assume that debug options are not worth calling out
> in commit messages or code comments as users and developers will
> primarily be interested in the real deal (i.e. PREEMPT_RT) and not
> an option which gets us only halfway there. However if you
> (or anyone else) feels strongly about it, I'll be happy to respin.
I argued and sent patches to fix code which was wrong on PREEMPT_RT due
to threadirqs and was also wrong without PREEMPT_RT enabled but solely
with forced-threaded irqs.
But please wait until tglx/irq or peterz/sched says something here before
respinning.
> Thanks for taking a look!
>
> Lukas
Sebastian
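The distinction discussed above, that the trigger is forced interrupt threading rather than PREEMPT_RT as such, can be illustrated with a small userspace model in C. This is a simplified sketch, not kernel code. It assumes forced threading is always active on PREEMPT_RT and is otherwise active only when `threadirqs` is given on the command line, and it ignores handlers that opt out of forced threading. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a booted system; not a kernel structure. */
struct model_boot_config {
	bool preempt_rt;       /* kernel built with PREEMPT_RT */
	bool threadirqs_param; /* "threadirqs" present on the command line */
};

/* Simplified model of when primary handlers are forced into threads. */
bool model_force_irqthreads(const struct model_boot_config *cfg)
{
	assert(cfg);
	return cfg->preempt_rt || cfg->threadirqs_param;
}

/*
 * Can the primary handler interrupt a running secondary handler?
 * Without forced threading the primary runs in hardirq context and always
 * can. With forced threading both are threads, so it can only do so if the
 * secondary thread has a lower priority, which is what the patch arranges.
 */
bool model_primary_can_preempt_secondary(const struct model_boot_config *cfg,
					 bool patch_applied)
{
	if (!model_force_irqthreads(cfg))
		return true;
	return patch_applied;
}
```

In this model a non-PREEMPT_RT kernel booted with `threadirqs` behaves like a PREEMPT_RT kernel: without the patch the primary handler cannot preempt the secondary one, and with the patch it can.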