Allowing irq_work to be scheduled while trying to suspend has shown
to cause problems as some architectures interpret the pending
interrupts as a reason to not suspend. This became a problem for
printk() with the introduction of NBCON consoles. With every
printk() call, NBCON console printing kthreads are woken by queueing
irq_work. This means that irq_work continues to be queued due to
printk() calls late in the suspend procedure.
Avoid this problem by preventing printk() from queueing irq_work
once console suspending has begun. This applies to triggering NBCON
and legacy deferred printing as well as klogd waiters.
Since triggering of NBCON threaded printing relies on irq_work, the
pr_flush() within console_suspend_all() is used to perform the final
flushing before suspending consoles and blocking irq_work queueing.
NBCON consoles that are not suspended (due to the usage of the
"no_console_suspend" boot argument) transition to atomic flushing.
Introduce a new global variable @console_irqwork_blocked to flag
when irq_work queueing is to be avoided. The flag is used by
printk_get_console_flush_type() to avoid allowing deferred printing
and switch NBCON consoles to atomic flushing. It is also used by
vprintk_emit() to avoid klogd waking.
Add WARN_ON_ONCE(console_irqwork_blocked) to the irq_work queuing
functions to catch any code that attempts to queue printk irq_work
during the suspending/resuming procedure.
Cc: <stable@vger.kernel.org> # 6.13.x because no drivers in 6.12.x
Fixes: 6b93bb41f6ea ("printk: Add non-BKL (nbcon) console basic infrastructure")
Closes: https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
@sherry.sun: This patch is essentially the same as v1, but since
two WARN_ON_ONCE() were added, I decided not to use your
Tested-by. It would be great if you could test again with this
series.
kernel/printk/internal.h | 8 +++---
kernel/printk/nbcon.c | 7 +++++
kernel/printk/printk.c | 58 +++++++++++++++++++++++++++++-----------
3 files changed, 55 insertions(+), 18 deletions(-)
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index f72bbfa266d6c..b20929b7d71f5 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -230,6 +230,8 @@ struct console_flush_type {
bool legacy_offload;
};
+extern bool console_irqwork_blocked;
+
/*
* Identify which console flushing methods should be used in the context of
* the caller.
@@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
switch (nbcon_get_default_prio()) {
case NBCON_PRIO_NORMAL:
if (have_nbcon_console && !have_boot_console) {
- if (printk_kthreads_running)
+ if (printk_kthreads_running && !console_irqwork_blocked)
ft->nbcon_offload = true;
else
ft->nbcon_atomic = true;
@@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
if (have_legacy_console || have_boot_console) {
if (!is_printk_legacy_deferred())
ft->legacy_direct = true;
- else
+ else if (!console_irqwork_blocked)
ft->legacy_offload = true;
}
break;
@@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
if (have_legacy_console || have_boot_console) {
if (!is_printk_legacy_deferred())
ft->legacy_direct = true;
- else
+ else if (!console_irqwork_blocked)
ft->legacy_offload = true;
}
break;
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 73f315fd97a3e..730d14f6cbc58 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1276,6 +1276,13 @@ void nbcon_kthreads_wake(void)
if (!printk_kthreads_running)
return;
+ /*
+ * It is not allowed to call this function when console irq_work
+ * is blocked.
+ */
+ if (WARN_ON_ONCE(console_irqwork_blocked))
+ return;
+
cookie = console_srcu_read_lock();
for_each_console_srcu(con) {
if (!(console_srcu_read_flags(con) & CON_NBCON))
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index dc89239cf1b58..b1c0d35cf3caa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -462,6 +462,9 @@ bool have_boot_console;
/* See printk_legacy_allow_panic_sync() for details. */
bool legacy_allow_panic_sync;
+/* Avoid using irq_work when suspending. */
+bool console_irqwork_blocked;
+
#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
@@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
if (ft.legacy_offload)
defer_console_output();
- else
+ else if (!console_irqwork_blocked)
wake_up_klogd();
return printed_len;
@@ -2730,10 +2733,20 @@ void console_suspend_all(void)
{
struct console *con;
+ if (console_suspend_enabled)
+ pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
+
+ /*
+ * Flush any console backlog and then avoid queueing irq_work until
+ * console_resume_all(). Until then deferred printing is no longer
+ * triggered, NBCON consoles transition to atomic flushing, and
+ * any klogd waiters are not triggered.
+ */
+ pr_flush(1000, true);
+ console_irqwork_blocked = true;
+
if (!console_suspend_enabled)
return;
- pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
- pr_flush(1000, true);
console_list_lock();
for_each_console(con)
@@ -2754,26 +2767,34 @@ void console_resume_all(void)
struct console_flush_type ft;
struct console *con;
- if (!console_suspend_enabled)
- return;
-
- console_list_lock();
- for_each_console(con)
- console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
- console_list_unlock();
-
/*
- * Ensure that all SRCU list walks have completed. All printing
- * contexts must be able to see they are no longer suspended so
- * that they are guaranteed to wake up and resume printing.
+ * Allow queueing irq_work. After restoring console state, deferred
+ * printing and any klogd waiters need to be triggered in case there
+ * is now a console backlog.
*/
- synchronize_srcu(&console_srcu);
+ console_irqwork_blocked = false;
+
+ if (console_suspend_enabled) {
+ console_list_lock();
+ for_each_console(con)
+ console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
+ console_list_unlock();
+
+ /*
+ * Ensure that all SRCU list walks have completed. All printing
+ * contexts must be able to see they are no longer suspended so
+ * that they are guaranteed to wake up and resume printing.
+ */
+ synchronize_srcu(&console_srcu);
+ }
printk_get_console_flush_type(&ft);
if (ft.nbcon_offload)
nbcon_kthreads_wake();
if (ft.legacy_offload)
defer_console_output();
+ else
+ wake_up_klogd();
pr_flush(1000, true);
}
@@ -4511,6 +4532,13 @@ static void __wake_up_klogd(int val)
if (!printk_percpu_data_ready())
return;
+ /*
+ * It is not allowed to call this function when console irq_work
+ * is blocked.
+ */
+ if (WARN_ON_ONCE(console_irqwork_blocked))
+ return;
+
preempt_disable();
/*
* Guarantee any new records can be seen by tasks preparing to wait
--
2.47.3
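For readers following the thread, the effect of the new flag on flush-type selection can be modelled outside the kernel. The sketch below is a simplified user-space approximation, not kernel code: struct flush_model_state, model_flush_type_normal() and model_queues_irq_work() are hypothetical names invented for illustration, only the NBCON_PRIO_NORMAL branch is mirrored, and the tail of vprintk_emit() is reduced to the three calls that the commit message describes as relying on irq_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the kernel's global state. */
struct flush_model_state {
	bool have_nbcon_console;
	bool have_legacy_console;
	bool have_boot_console;
	bool kthreads_running;	/* models printk_kthreads_running */
	bool legacy_deferred;	/* models is_printk_legacy_deferred() */
	bool irqwork_blocked;	/* models console_irqwork_blocked */
};

struct flush_type {
	bool nbcon_atomic;
	bool nbcon_offload;
	bool legacy_direct;
	bool legacy_offload;
};

/* Mirrors only the NBCON_PRIO_NORMAL branch as changed by the patch. */
static struct flush_type
model_flush_type_normal(const struct flush_model_state *s)
{
	struct flush_type ft = { 0 };

	if (s->have_nbcon_console && !s->have_boot_console) {
		if (s->kthreads_running && !s->irqwork_blocked)
			ft.nbcon_offload = true;
		else
			ft.nbcon_atomic = true;
	}

	if (s->have_legacy_console || s->have_boot_console) {
		if (!s->legacy_deferred)
			ft.legacy_direct = true;
		else if (!s->irqwork_blocked)
			ft.legacy_offload = true;
	}

	return ft;
}

/* Simplified vprintk_emit() tail: would this call queue any irq_work? */
static bool model_queues_irq_work(const struct flush_model_state *s)
{
	struct flush_type ft = model_flush_type_normal(s);

	if (ft.nbcon_offload)		/* nbcon_kthreads_wake() */
		return true;
	if (ft.legacy_offload)		/* defer_console_output() */
		return true;
	return !s->irqwork_blocked;	/* wake_up_klogd() */
}

/* Walk one state through the suspend window. */
static void model_selfcheck(void)
{
	struct flush_model_state s = {
		.have_nbcon_console = true,
		.kthreads_running = true,
	};

	assert(model_queues_irq_work(&s));	/* normal operation */
	s.irqwork_blocked = true;		/* console_suspend_all() */
	assert(!model_queues_irq_work(&s));
	s.irqwork_blocked = false;		/* console_resume_all() */
	assert(model_queues_irq_work(&s));
}
```

In this model, setting irqwork_blocked moves an NBCON console from nbcon_offload to nbcon_atomic and leaves legacy deferred printing untriggered, which matches the behaviour the commit message describes for the window between console_suspend_all() and console_resume_all().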
On Thu 2025-11-13 17:09:48, John Ogness wrote:
> Allowing irq_work to be scheduled while trying to suspend has shown
> to cause problems as some architectures interpret the pending
> interrupts as a reason to not suspend. This became a problem for
> printk() with the introduction of NBCON consoles. With every
> printk() call, NBCON console printing kthreads are woken by queueing
> irq_work. This means that irq_work continues to be queued due to
> printk() calls late in the suspend procedure.
>
> Avoid this problem by preventing printk() from queueing irq_work
> once console suspending has begun. This applies to triggering NBCON
> and legacy deferred printing as well as klogd waiters.
>
> Since triggering of NBCON threaded printing relies on irq_work, the
> pr_flush() within console_suspend_all() is used to perform the final
> flushing before suspending consoles and blocking irq_work queueing.
> NBCON consoles that are not suspended (due to the usage of the
> "no_console_suspend" boot argument) transition to atomic flushing.
>
> Introduce a new global variable @console_irqwork_blocked to flag
> when irq_work queueing is to be avoided. The flag is used by
> printk_get_console_flush_type() to avoid allowing deferred printing
> and switch NBCON consoles to atomic flushing. It is also used by
> vprintk_emit() to avoid klogd waking.
>
> Add WARN_ON_ONCE(console_irqwork_blocked) to the irq_work queuing
> functions to catch any code that attempts to queue printk irq_work
> during the suspending/resuming procedure.
>
> Cc: <stable@vger.kernel.org> # 6.13.x because no drivers in 6.12.x
> Fixes: 6b93bb41f6ea ("printk: Add non-BKL (nbcon) console basic infrastructure")
> Closes: https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
The changes look good to me:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
On Thu, Nov 13, 2025 at 05:09:48PM +0106, John Ogness wrote:
> ---
> @sherry.sun: This patch is essentially the same as v1, but since
> two WARN_ON_ONCE() were added, I decided not to use your
> Tested-by. It would be great if you could test again with this
> series.
>
> kernel/printk/internal.h | 8 +++---
> kernel/printk/nbcon.c | 7 +++++
> kernel/printk/printk.c | 58 +++++++++++++++++++++++++++++-----------
> 3 files changed, 55 insertions(+), 18 deletions(-)
>
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 73f315fd97a3e..730d14f6cbc58 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -1276,6 +1276,13 @@ void nbcon_kthreads_wake(void)
> if (!printk_kthreads_running)
> return;
>
> + /*
> + * It is not allowed to call this function when console irq_work
> + * is blocked.
> + */
> + if (WARN_ON_ONCE(console_irqwork_blocked))
> + return;
> +
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index dc89239cf1b58..b1c0d35cf3caa 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -462,6 +462,9 @@ bool have_boot_console;
> /* See printk_legacy_allow_panic_sync() for details. */
> bool legacy_allow_panic_sync;
>
> +/* Avoid using irq_work when suspending. */
> +bool console_irqwork_blocked;
> +
> #ifdef CONFIG_PRINTK
> DECLARE_WAIT_QUEUE_HEAD(log_wait);
> static DECLARE_WAIT_QUEUE_HEAD(legacy_wait);
> @@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>
> if (ft.legacy_offload)
> defer_console_output();
> - else
> + else if (!console_irqwork_blocked)
> wake_up_klogd();
>
> return printed_len;
> @@ -2730,10 +2733,20 @@ void console_suspend_all(void)
> {
> struct console *con;
>
> + if (console_suspend_enabled)
> + pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
> +
> + /*
> + * Flush any console backlog and then avoid queueing irq_work until
> + * console_resume_all(). Until then deferred printing is no longer
> + * triggered, NBCON consoles transition to atomic flushing, and
> + * any klogd waiters are not triggered.
> + */
> + pr_flush(1000, true);
> + console_irqwork_blocked = true;
> +
Thanks for this. I have recently been seeing the same issue with a large-CPU
workstation system in which the serial console has been locking up on entry/exit
of the S4 Hibernation sleep state at different intervals.
I am still running tests on the V1 of the series to determine reproducibility,
but I will try to get this version tested in a timely manner as well.
I did, however, test the proto-patch at [0]. The original issue was reproducible
with this patch applied. Avoiding klogd waking in vprintk_emit() and the
addition of the check in nbcon.c (new in this series) as opposed to aborting
callers outright seems more airtight.
[0] https://github.com/Linutronix/linux/commit/ae173249d9028ef159fba040bdab260d80dda43f
--
Derek <debarbos@redhat.com>
Hi Derek,
On 2025-11-13, Derek Barbosa <debarbos@redhat.com> wrote:
> Thanks for this. I have recently been seeing the same issue with a large-CPU
> workstation system in which the serial console has been locking up on entry/exit
> of the S4 Hibernation sleep state at different intervals.
>
> I am still running tests on the V1 of the series to determine reproducibility,
> but I will try to get this version tested in a timely manner as well.
>
> I did, however, test the proto-patch at [0]. The original issue was reproducible
> with this patch applied. Avoiding klogd waking in vprintk_emit() and the
> addition of the check in nbcon.c (new in this series) as opposed to aborting
> callers outright seems more airtight.
I assume the problem you are seeing is with the PREEMPT_RT patches
applied (i.e. with the 8250-NBCON included). If that is the case, note
that recent versions of the 8250 driver introduce its own irq_work that
is also problematic. I am currently reworking the 8250-NBCON series so
that it does not introduce irq_work.
Since you probably are not doing anything related to modem control,
maybe you could test with the following hack (assuming you are using a
v6.14 or later PREEMPT_RT patched kernel).
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 96d32db9f8872..2ad0f91ad467a 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -3459,7 +3459,7 @@ void serial8250_console_write(struct uart_8250_port *up,
* may be a context that does not permit waking up tasks.
*/
if (is_atomic)
- irq_work_queue(&up->modem_status_work);
+ ;//irq_work_queue(&up->modem_status_work);
else
serial8250_modem_status(up);
}
> [0] https://github.com/Linutronix/linux/commit/ae173249d9028ef159fba040bdab260d80dda43f
John
Hi John,
On Thu, Nov 13, 2025 at 06:12:57PM +0106, John Ogness wrote:
>
> I assume the problem you are seeing is with the PREEMPT_RT patches
> applied (i.e. with the 8250-NBCON included). If that is the case, note
> that recent versions of the 8250 driver introduce its own irq_work that
> is also problematic. I am currently reworking the 8250-NBCON series so
> that it does not introduce irq_work.
>
IIRC the aforementioned scenario was just recently tested with an rc5 kernel
from Torvalds' tree. Sorry for any confusion
> Since you probably are not doing anything related to modem control,
> maybe you could test with the following hack (assuming you are using a
> v6.14 or later PREEMPT_RT patched kernel).
I'll give this a shot as a follow up, thanks for the suggestion
>
> diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
> index 96d32db9f8872..2ad0f91ad467a 100644
> --- a/drivers/tty/serial/8250/8250_port.c
> +++ b/drivers/tty/serial/8250/8250_port.c
> @@ -3459,7 +3459,7 @@ void serial8250_console_write(struct uart_8250_port *up,
> * may be a context that does not permit waking up tasks.
> */
> if (is_atomic)
> - irq_work_queue(&up->modem_status_work);
> + ;//irq_work_queue(&up->modem_status_work);
> else
> serial8250_modem_status(up);
> }
>
> > [0] https://github.com/Linutronix/linux/commit/ae173249d9028ef159fba040bdab260d80dda43f
>
> John
>
--
Derek <debarbos@redhat.com>
On Thu, Nov 13, 2025 at 02:15:09PM -0500, Derek Barbosa wrote:
> Hi John,
>
> On Thu, Nov 13, 2025 at 06:12:57PM +0106, John Ogness wrote:
> >
> > I assume the problem you are seeing is with the PREEMPT_RT patches
> > applied (i.e. with the 8250-NBCON included). If that is the case, note
> > that recent versions of the 8250 driver introduce its own irq_work that
> > is also problematic. I am currently reworking the 8250-NBCON series so
> > that it does not introduce irq_work.
> >
Hi John,
Apologies for the late reply here. Just now got some results in.
Testing this patch series atop of Linus' tree resolves the suspend issue seen on
these large CPU workstation systems.
I see this has already landed in the maintainers tree at printk/linux.git.
Cheers,
--
Derek <debarbos@redhat.com>
On Tue 2025-11-25 14:24:55, Derek Barbosa wrote:
> On Thu, Nov 13, 2025 at 02:15:09PM -0500, Derek Barbosa wrote:
> > Hi John,
> >
> > On Thu, Nov 13, 2025 at 06:12:57PM +0106, John Ogness wrote:
> > >
> > > I assume the problem you are seeing is with the PREEMPT_RT patches
> > > applied (i.e. with the 8250-NBCON included). If that is the case, note
> > > that recent versions of the 8250 driver introduce its own irq_work that
> > > is also problematic. I am currently reworking the 8250-NBCON series so
> > > that it does not introduce irq_work.
> > >
> >
> Hi John,
>
> Apologies for the late reply here. Just now got some results in.
No problem at all.
> Testing this patch series atop of Linus' tree resolves the suspend issue seen on
> these large CPU workstation systems.
Thanks a lot for checking the patches. It is great to know that it
resolved the problem.
> I see this has already landed in the maintainers tree at printk/linux.git.
Yes, I wanted to have it in linux-next in time before the merge window
opens for 6.19 (likely next week).
Best Regards,
Petr