include/linux/sched.h | 1 + kernel/irq/manage.c | 5 ++++- kernel/sched/syscalls.c | 11 +++++++++++ 3 files changed, 16 insertions(+), 1 deletion(-)
Crystal reports the PCIe Advanced Error Reporting driver getting stuck
in an infinite loop on PREEMPT_RT: Both the primary IRQ handler
aer_irq() as well as the secondary handler aer_isr() are forced into
threads with identical priority.
Crystal writes that on the ARM system in question, the primary IRQ
handler has to clear an error in the Root Error Status register...
"before the next error happens, or else the hardware will set the
Multiple ERR_COR Received bit. If that bit is set, then aer_isr()
can't rely on the Error Source Identification register, so it scans
through all devices looking for errors -- and for some reason, on
this system, accessing the AER registers (or any Config Space above
0x400, even though there are capabilities located there) generates
an Unsupported Request Error (but returns valid data). Since this
happens more than once, without aer_irq() preempting, it causes
another multi error and we get stuck in a loop."
The issue does not show on non-PREEMPT_RT since the primary IRQ handler
runs in hardirq context and thus clears the Root Error Status register
before waking the secondary IRQ handler's thread.
Simulate the same behavior on PREEMPT_RT by assigning a lower default
priority to forced secondary IRQ handlers.
Reported-by: Crystal Wood <crwood@redhat.com>
Tested-by: Crystal Wood <crwood@redhat.com>
Closes: https://lore.kernel.org/r/20250902224441.368483-1-crwood@redhat.com/
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Crystal provided a Tested-by off-list. I've used Crystal's suggestion
for the code comment, but if anyone prefers Sebastian's or some other
wording, I don't mind either respinning myself or a maintainer amending
this or any other aspect of the patch when applying. Thanks!
include/linux/sched.h | 1 +
kernel/irq/manage.c | 5 ++++-
kernel/sched/syscalls.c | 11 +++++++++++
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2b27238..ceed23d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1894,6 +1894,7 @@ static inline int task_nice(const struct task_struct *p)
extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
extern void sched_set_fifo(struct task_struct *p);
+extern void sched_set_fifo_minus_one(struct task_struct *p);
extern void sched_set_fifo_low(struct task_struct *p);
extern void sched_set_normal(struct task_struct *p, int nice);
extern int sched_setattr(struct task_struct *, const struct sched_attr *);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index c948373..b09c18a 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1239,7 +1239,10 @@ static int irq_thread(void *data)
irq_thread_set_ready(desc, action);
- sched_set_fifo(current);
+ if (action->handler == irq_forced_secondary_handler)
+ sched_set_fifo_minus_one(current);
+ else
+ sched_set_fifo(current);
if (force_irqthreads() && test_bit(IRQTF_FORCED_THREAD,
&action->thread_flags))
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 77ae87f..4be85aa 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -847,6 +847,17 @@ void sched_set_fifo(struct task_struct *p)
EXPORT_SYMBOL_GPL(sched_set_fifo);
/*
+ * For tasks that must be preemptible by a sched_set_fifo() task, such as
+ * to simulate the behavior of a non-PREEMPT_RT system where the
+ * sched_set_fifo() task is a hard interrupt.
+ */
+void sched_set_fifo_minus_one(struct task_struct *p)
+{
+ struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 - 1 };
+ WARN_ON_ONCE(sched_setscheduler_nocheck(p, SCHED_FIFO, &sp) != 0);
+}
+
+/*
* For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
*/
void sched_set_fifo_low(struct task_struct *p)
--
2.50.1
© 2016 - 2025 Red Hat, Inc.