[PATCH printk v2 0/2] Fix reported suspend failures

John Ogness posted 2 patches 2 months, 3 weeks ago
kernel/printk/internal.h |  8 ++--
kernel/printk/nbcon.c    |  9 ++++-
kernel/printk/printk.c   | 81 ++++++++++++++++++++++++++++++++--------
3 files changed, 78 insertions(+), 20 deletions(-)
[PATCH printk v2 0/2] Fix reported suspend failures
Posted by John Ogness 2 months, 3 weeks ago
This is v2 of a series to address multiple reports [0][1]
(+ 2 offlist) of suspend failing when NBCON console drivers are
in use. With the help of NXP and NVIDIA we were able to isolate
the problem and verify the fix.

v1 is here [2].

The first NBCON drivers appeared in 6.13, so currently there is
no LTS kernel that requires this series. But it should go into
6.17.x and 6.18.

The changes since v1:

- For printk_trigger_flush() add support for all flush types
  that are available. This will prevent printk_trigger_flush()
  from trying to inappropriately queue irq_work after this
  series is applied.

- Add WARN_ON_ONCE() to the printk irq_work queueing functions
  in case they are called when irq_work is blocked. There
  should never be (and currently are no) such callers, but
  these functions are externally available.

John Ogness

[0] https://lore.kernel.org/lkml/80b020fc-c18a-4da4-b222-16da1cab2f4c@nvidia.com
[1] https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
[2] https://lore.kernel.org/lkml/20251111144328.887159-1-john.ogness@linutronix.de

John Ogness (2):
  printk: Allow printk_trigger_flush() to flush all types
  printk: Avoid scheduling irq_work on suspend

 kernel/printk/internal.h |  8 ++--
 kernel/printk/nbcon.c    |  9 ++++-
 kernel/printk/printk.c   | 81 ++++++++++++++++++++++++++++++++--------
 3 files changed, 78 insertions(+), 20 deletions(-)


base-commit: e9a6fb0bcdd7609be6969112f3fbfcce3b1d4a7c
-- 
2.47.3
Re: [PATCH printk v2 0/2] Fix reported suspend failures
Posted by Thierry Reding 2 months, 2 weeks ago
On Thu, Nov 13, 2025 at 05:09:46PM +0106, John Ogness wrote:
> This is v2 of a series to address multiple reports [0][1]
> (+ 2 offlist) of suspend failing when NBCON console drivers are
> in use. With the help of NXP and NVIDIA we were able to isolate
> the problem and verify the fix.
> 
> v1 is here [2].
> 
> The first NBCON drivers appeared in 6.13, so currently there is
> no LTS kernel that requires this series. But it should go into
> 6.17.x and 6.18.
> 
> The changes since v1:
> 
> - For printk_trigger_flush() add support for all flush types
>   that are available. This will prevent printk_trigger_flush()
>   from trying to inappropriately queue irq_work after this
>   series is applied.
> 
> - Add WARN_ON_ONCE() to the printk irq_work queueing functions
>   in case they are called when irq_work is blocked. There
>   should never be (and currently are no) such callers, but
>   these functions are externally available.
> 
> John Ogness
> 
> [0] https://lore.kernel.org/lkml/80b020fc-c18a-4da4-b222-16da1cab2f4c@nvidia.com
> [1] https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
> [2] https://lore.kernel.org/lkml/20251111144328.887159-1-john.ogness@linutronix.de
> 
> John Ogness (2):
>   printk: Allow printk_trigger_flush() to flush all types
>   printk: Avoid scheduling irq_work on suspend
> 
>  kernel/printk/internal.h |  8 ++--
>  kernel/printk/nbcon.c    |  9 ++++-
>  kernel/printk/printk.c   | 81 ++++++++++++++++++++++++++++++++--------
>  3 files changed, 78 insertions(+), 20 deletions(-)

Sorry, I'm a bit late, but this seems to solve all the issues we were
seeing on Tegra boards, so for the record:

Tested-by: Thierry Reding <treding@nvidia.com>
Re: [PATCH printk v2 0/2] Fix reported suspend failures
Posted by Petr Mladek 2 months, 2 weeks ago
On Thu 2025-11-13 17:09:46, John Ogness wrote:
> This is v2 of a series to address multiple reports [0][1]
> (+ 2 offlist) of suspend failing when NBCON console drivers are
> in use. With the help of NXP and NVIDIA we were able to isolate
> the problem and verify the fix.
> 
> The first NBCON drivers appeared in 6.13, so currently there is
> no LTS kernel that requires this series. But it should go into
> 6.17.x and 6.18.
> 
> John Ogness (2):
>   printk: Allow printk_trigger_flush() to flush all types
>   printk: Avoid scheduling irq_work on suspend
> 
>  kernel/printk/internal.h |  8 ++--
>  kernel/printk/nbcon.c    |  9 ++++-
>  kernel/printk/printk.c   | 81 ++++++++++++++++++++++++++++++++--------
>  3 files changed, 78 insertions(+), 20 deletions(-)

JFYI, the patchset has been committed into printk/linux.git,
branch rework/suspend-fixes.

Best Regards,
Petr
Re: [PATCH printk v2 0/2] Fix reported suspend failures
Posted by John Ogness 2 months, 2 weeks ago
Hi Petr,

On 2025-11-19, Petr Mladek <pmladek@suse.com> wrote:
> JFYI, the patchset has been committed into printk/linux.git,
> branch rework/suspend-fixes.

While doing more testing I hit the new WARN_ON_ONCE() in
__wake_up_klogd():

[  125.306075][   T92] Timekeeping suspended for 9.749 seconds
[  125.306093][   T92] ------------[ cut here ]------------
[  125.306108][   T92] WARNING: CPU: 0 PID: 92 at kernel/printk/printk.c:4539 vprintk_emit+0x134/0x2e8
[  125.306151][   T92] Modules linked in: pm33xx ti_emif_sram wkup_m3_ipc wkup_m3_rproc omap_mailbox rtc_omap
[  125.306249][   T92] CPU: 0 UID: 0 PID: 92 Comm: rtcwake Not tainted 6.18.0-rc5-00005-g3d7d27fc1b14 #162 PREEMPT
[  125.306276][   T92] Hardware name: Generic AM33XX (Flattened Device Tree)
[  125.306290][   T92] Call trace:
[  125.306308][   T92]  unwind_backtrace from show_stack+0x18/0x1c
[  125.306356][   T92]  show_stack from dump_stack_lvl+0x50/0x64
[  125.306398][   T92]  dump_stack_lvl from __warn+0x7c/0x160
[  125.306433][   T92]  __warn from warn_slowpath_fmt+0x158/0x1f0
[  125.306459][   T92]  warn_slowpath_fmt from vprintk_emit+0x134/0x2e8
[  125.306487][   T92]  vprintk_emit from _printk_deferred+0x44/0x84
[  125.306520][   T92]  _printk_deferred from tk_debug_account_sleep_time+0x78/0x88
[  125.306574][   T92]  tk_debug_account_sleep_time from timekeeping_inject_sleeptime64+0x3c/0x6c
[  125.306624][   T92]  timekeeping_inject_sleeptime64 from rtc_resume.part.0+0x158/0x178
[  125.306666][   T92]  rtc_resume.part.0 from rtc_resume+0x54/0x64
[  125.306705][   T92]  rtc_resume from dpm_run_callback+0x68/0x1d4
[  125.306747][   T92]  dpm_run_callback from device_resume+0xc8/0x200
[  125.306779][   T92]  device_resume from dpm_resume+0x208/0x304
[  125.306813][   T92]  dpm_resume from dpm_resume_end+0x14/0x24
[  125.306846][   T92]  dpm_resume_end from suspend_devices_and_enter+0x1e8/0x8a4
[  125.306892][   T92]  suspend_devices_and_enter from pm_suspend+0x328/0x3c0
[  125.306924][   T92]  pm_suspend from state_store+0x70/0xd0
[  125.306955][   T92]  state_store from kernfs_fop_write_iter+0x124/0x1e4
[  125.307001][   T92]  kernfs_fop_write_iter from vfs_write+0x1f0/0x2bc
[  125.307049][   T92]  vfs_write from ksys_write+0x68/0xe8
[  125.307085][   T92]  ksys_write from ret_fast_syscall+0x0/0x58
[  125.307113][   T92] Exception stack(0xd025dfa8 to 0xd025dff0)
[  125.307137][   T92] dfa0:                   00000004 bed09f71 00000004 bed09f71 00000003 00000001
[  125.307157][   T92] dfc0: 00000004 bed09f71 00000003 00000004 00510bd4 00000000 00000000 0050e634
[  125.307172][   T92] dfe0: 00000004 bed09bd8 b6ebc20b b6e35616
[  125.307185][   T92] ---[ end trace 0000000000000000 ]---

It is due to a use of printk_deferred(). This goes through the special
case of "level == LOGLEVEL_SCHED" in vprintk_emit(). Originally I had
patched this code as well, but then later removed it thinking that it
was not needed. But it is needed. :-/ Something like:

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b1c0d35cf3ca..c27fc7fc64eb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2393,7 +2393,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	/* If called from the scheduler, we can not call up(). */
 	if (level == LOGLEVEL_SCHED) {
 		level = LOGLEVEL_DEFAULT;
-		ft.legacy_offload |= ft.legacy_direct;
+		ft.legacy_offload |= ft.legacy_direct && !console_irqwork_blocked;
 		ft.legacy_direct = false;
 	}
 
Is this solution ok for you? Do you prefer a follow-up patch or a v3?

John
Re: [PATCH printk v2 0/2] Fix reported suspend failures
Posted by Petr Mladek 2 months, 2 weeks ago
On Thu 2025-11-20 12:09:43, John Ogness wrote:
> Hi Petr,
> 
> On 2025-11-19, Petr Mladek <pmladek@suse.com> wrote:
> > JFYI, the patchset has been committed into printk/linux.git,
> > branch rework/suspend-fixes.
> 
> While doing more testing I hit the new WARN_ON_ONCE() in
> __wake_up_klogd():
> 
> [  125.306075][   T92] Timekeeping suspended for 9.749 seconds
> [  125.306093][   T92] ------------[ cut here ]------------
> [  125.306108][   T92] WARNING: CPU: 0 PID: 92 at kernel/printk/printk.c:4539 vprintk_emit+0x134/0x2e8
> [  125.306151][   T92] Modules linked in: pm33xx ti_emif_sram wkup_m3_ipc wkup_m3_rproc omap_mailbox rtc_omap
> [  125.306249][   T92] CPU: 0 UID: 0 PID: 92 Comm: rtcwake Not tainted 6.18.0-rc5-00005-g3d7d27fc1b14 #162 PREEMPT
> [  125.306276][   T92] Hardware name: Generic AM33XX (Flattened Device Tree)
> [  125.306290][   T92] Call trace:
> [  125.306308][   T92]  unwind_backtrace from show_stack+0x18/0x1c
> [  125.306356][   T92]  show_stack from dump_stack_lvl+0x50/0x64
> [  125.306398][   T92]  dump_stack_lvl from __warn+0x7c/0x160
> [  125.306433][   T92]  __warn from warn_slowpath_fmt+0x158/0x1f0
> [  125.306459][   T92]  warn_slowpath_fmt from vprintk_emit+0x134/0x2e8
> [  125.306487][   T92]  vprintk_emit from _printk_deferred+0x44/0x84
> [  125.306520][   T92]  _printk_deferred from tk_debug_account_sleep_time+0x78/0x88
> [  125.306574][   T92]  tk_debug_account_sleep_time from timekeeping_inject_sleeptime64+0x3c/0x6c
> [  125.306624][   T92]  timekeeping_inject_sleeptime64 from rtc_resume.part.0+0x158/0x178
> [  125.306666][   T92]  rtc_resume.part.0 from rtc_resume+0x54/0x64
> [  125.306705][   T92]  rtc_resume from dpm_run_callback+0x68/0x1d4
> [  125.306747][   T92]  dpm_run_callback from device_resume+0xc8/0x200
> [  125.306779][   T92]  device_resume from dpm_resume+0x208/0x304
> [  125.306813][   T92]  dpm_resume from dpm_resume_end+0x14/0x24
> [  125.306846][   T92]  dpm_resume_end from suspend_devices_and_enter+0x1e8/0x8a4
> [  125.306892][   T92]  suspend_devices_and_enter from pm_suspend+0x328/0x3c0
> [  125.306924][   T92]  pm_suspend from state_store+0x70/0xd0
> [  125.306955][   T92]  state_store from kernfs_fop_write_iter+0x124/0x1e4
> [  125.307001][   T92]  kernfs_fop_write_iter from vfs_write+0x1f0/0x2bc
> [  125.307049][   T92]  vfs_write from ksys_write+0x68/0xe8
> [  125.307085][   T92]  ksys_write from ret_fast_syscall+0x0/0x58
> [  125.307113][   T92] Exception stack(0xd025dfa8 to 0xd025dff0)
> [  125.307137][   T92] dfa0:                   00000004 bed09f71 00000004 bed09f71 00000003 00000001
> [  125.307157][   T92] dfc0: 00000004 bed09f71 00000003 00000004 00510bd4 00000000 00000000 0050e634
> [  125.307172][   T92] dfe0: 00000004 bed09bd8 b6ebc20b b6e35616
> [  125.307185][   T92] ---[ end trace 0000000000000000 ]---
> 
> It is due to a use of printk_deferred(). This goes through the special
> case of "level == LOGLEVEL_SCHED" in vprintk_emit(). Originally I had
> patched this code as well, but then later removed it thinking that it
> was not needed. But it is needed. :-/ Something like:

Great catch!

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index b1c0d35cf3ca..c27fc7fc64eb 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2393,7 +2393,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>  	/* If called from the scheduler, we can not call up(). */
>  	if (level == LOGLEVEL_SCHED) {
>  		level = LOGLEVEL_DEFAULT;
> -		ft.legacy_offload |= ft.legacy_direct;
> +		ft.legacy_offload |= ft.legacy_direct && !console_irqwork_blocked;
>  		ft.legacy_direct = false;
>  	}
>  
> Is this solution ok for you? Do you prefer a follow-up patch or a v3?

Nothing better comes to my mind ;-) A follow-up patch would be
lovely. Please, go ahead.

Best Regards,
Petr
Re: [PATCH printk v2 0/2] Fix reported suspend failures
Posted by Petr Mladek 2 months, 3 weeks ago
On Thu 2025-11-13 17:09:46, John Ogness wrote:
> This is v2 of a series to address multiple reports [0][1]
> (+ 2 offlist) of suspend failing when NBCON console drivers are
> in use. With the help of NXP and NVIDIA we were able to isolate
> the problem and verify the fix.
> 
> v1 is here [2].
> 
> The first NBCON drivers appeared in 6.13, so currently there is
> no LTS kernel that requires this series. But it should go into
> 6.17.x and 6.18.
> 
> The changes since v1:
> 
> - For printk_trigger_flush() add support for all flush types
>   that are available. This will prevent printk_trigger_flush()
>   from trying to inappropriately queue irq_work after this
>   series is applied.
> 
> - Add WARN_ON_ONCE() to the printk irq_work queueing functions
>   in case they are called when irq_work is blocked. There
>   should never be (and currently are no) such callers, but
>   these functions are externally available.
> 
> John Ogness
> 
> [0] https://lore.kernel.org/lkml/80b020fc-c18a-4da4-b222-16da1cab2f4c@nvidia.com
> [1] https://lore.kernel.org/lkml/DB9PR04MB8429E7DDF2D93C2695DE401D92C4A@DB9PR04MB8429.eurprd04.prod.outlook.com
> [2] https://lore.kernel.org/lkml/20251111144328.887159-1-john.ogness@linutronix.de
> 
> John Ogness (2):
>   printk: Allow printk_trigger_flush() to flush all types
>   printk: Avoid scheduling irq_work on suspend
> 
>  kernel/printk/internal.h |  8 ++--
>  kernel/printk/nbcon.c    |  9 ++++-
>  kernel/printk/printk.c   | 81 ++++++++++++++++++++++++++++++++--------
>  3 files changed, 78 insertions(+), 20 deletions(-)

The patchset seems to be ready for linux-next from my POV. I am going
to wait few more days for potential feedback. I'll push it later the
following week unless anyone complains.

Best Regards,
Petr