serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

[RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by Petr Mladek 1 month, 1 week ago

Hi,

this is a followup for the patchset which converted serial 8250
console driver to nbcon [1]. It was added into upstream but
the last two patches were later reverted because they caused
regressions.

We have seen the following hardlockup report when using the converted
serial 8250 console driver:

[    0.000000][    T0] Command line: elfcorehdr=0x145f000000  mitigations=auto security=selinux selinux=1 console=ttyS0,115200 earlyprintk=ttyS0,115200 ignore_loglevel panic=13 intel_iommu=on sysrq=yes reset_devices acpi_no_memhotplug cgroup_disable=memory nokaslr numa=off irqpoll nr_cpus=4 root=kdump rootflags=bind rd.udev.children-max=8 disable_cpu_apicid=0  
[...]
[    3.771531][    T1] pci 0000:3e:08.1: [8086:324
** replaying previous printk message **
[    3.771531][    T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint
[ ... more than 2000 lines, about 200kB messages ... ]
[    3.837752][    T1] pci 0000:20:01.0: Adding to iommu group 18
[    3.837851][    T
** replaying previous printk message **
[    3.837851][    T1] pci 0000:20:03.0: Adding to iommu group 19
[    3.837946][    T1] pci 0000:20:05.0: Adding to iommu group 20
[ ... more than 500 messages for iommu groups 21-590 ...]
[    3.912932][    T1] pci 0000:f6:00.1: Adding to iommu group 591
[    3.913070][    T1] pci 0000:f6:00.2: Adding to iommu group 592
[    3.913243][    T1] DMAR: Intel(R) Virtualization Technology for Directed I/O
[    3.913245][    T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    3.913245][    T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB)
[    3.913324][    T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
[    3.913325][    T1] RAPL PMU: hw unit of domain package 2^-14 Joules
[    3.913326][    T1] RAPL PMU: hw unit of domain dram 2^-14 Joules
[    3.913327][    T1] RAPL PMU: hw unit of domain psys 2^-0 Joules
[    3.933486][    T1] ------------[ cut here ]------------
[    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
[    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
[    3.930291][    C0] Modules linked in:
[    3.930291][    C0] Supported: No, Unreleased kernel
[    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
[    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
[    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb eb 02 f3 90 <48> 89 df e8 b7 fc ff ff 84 c0 74 f2 48 8b 03 48 c7 43 28 00 00 00
[    3.930291][    C0] RSP: 0000:ffa0000000147d38 EFLAGS: 00000046
[    3.930291][    C0] RAX: 0000000000000000 RBX: ffa0000000147e78 RCX: 0000000000000001
[    3.930291][    C0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffa0000000147e78
[    3.930291][    C0] RBP: ffa0000000147e78 R08: 0000000000000010 R09: 0000000000000000
[    3.930291][    C0] R10: 312e38303a65333a R11: 3030303020696370 R12: ff1100145f53f032
[    3.930291][    C0] R13: 0000000000000000 R14: ff1100145f53f032 R15: ff1100145f53f072
[    3.930291][    C0] FS:  0000000000000000(0000) GS:ff1100147ea00000(0000) knlGS:0000000000000000
[    3.930291][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.930291][    C0] CR2: ff1100147e402000 CR3: 000000147d238001 CR4: 0000000000f71ef0
[    3.930291][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    3.930291][    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[    3.930291][    C0] PKRU: 55555554
[    3.930291][    C0] Call Trace:
[    3.930291][    C0]  <NMI>
[    3.930291][    C0]  ? watchdog_hardlockup_check.cold+0xe7/0xec
[    3.930291][    C0]  ? __perf_event_overflow+0x11b/0x380
[    3.930291][    C0]  ? handle_pmi_common+0x15c/0x3d0
[    3.930291][    C0]  ? intel_pmu_handle_irq+0x10f/0x5b0
[    3.930291][    C0]  ? perf_event_nmi_handler+0x2a/0x50
[    3.930291][    C0]  ? nmi_handle+0x5e/0x120
[    3.930291][    C0]  ? default_do_nmi+0x40/0x130
[    3.930291][    C0]  ? exc_nmi+0x187/0x240
[    3.930291][    C0]  ? end_repeat_nmi+0xf/0x53
[    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0]  </NMI>
[    3.930291][    C0]  <TASK>
[    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
[    3.930291][    C0]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
[    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
[    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
[    3.930291][    C0]  ? __pfx_nbcon_kthread_func+0x10/0x10
[    3.930291][    C0]  kthread+0xcd/0x110
[    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
[    3.930291][    C0]  ret_from_fork+0x31/0x50
[    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
[    3.930291][    C0]  ret_from_fork_asm+0x1a/0x30
[    3.930291][    C0]  </TASK>
[    3.930291][    C0] Kernel panic - not syncing: Hard LOCKUP
[    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
[    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
[    3.930291][    C0] Call Trace:
[    3.930291][    C0]  <NMI>
[    3.930291][    C0]  dump_stack_lvl+0x4b/0x70
[    3.930291][    C0]  panic+0x106/0x2d3
[    3.930291][    C0]  nmi_panic.cold+0xc/0xc
[    3.930291][    C0]  watchdog_hardlockup_check.cold+0xca/0xec
[    3.930291][    C0]  __perf_event_overflow+0x11b/0x380
[    3.930291][    C0]  handle_pmi_common+0x15c/0x3d0
[    3.930291][    C0]  intel_pmu_handle_irq+0x10f/0x5b0
[    3.930291][    C0]  perf_event_nmi_handler+0x2a/0x50
[    3.930291][    C0]  nmi_handle+0x5e/0x120
[    3.930291][    C0]  default_do_nmi+0x40/0x130
[    3.930291][    C0]  exc_nmi+0x187/0x240
[    3.930291][    C0]  end_repeat_nmi+0xf/0x53
[    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb eb 02 f3 90 <48> 89 df e8 b7 fc ff ff 84 c0 74 f2 48 8b 03 48 c7 43 28 00 00 00
[    3.930291][    C0] RSP: 0000:ffa0000000147d38 EFLAGS: 00000046
[    3.930291][    C0] RAX: 0000000000000000 RBX: ffa0000000147e78 RCX: 0000000000000001
[    3.930291][    C0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffa0000000147e78
[    3.930291][    C0] RBP: ffa0000000147e78 R08: 0000000000000010 R09: 0000000000000000
[    3.930291][    C0] R10: 312e38303a65333a R11: 3030303020696370 R12: ff1100145f53f032
[    3.930291][    C0] R13: 0000000000000000 R14: ff1100145f53f032 R15: ff1100145f53f072
[    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
[    3.930291][    C0]  </NMI>
[    3.930291][    C0]  <TASK>
[    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
[    3.930291][    C0]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
[    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
[    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
[    3.930291][    C0]  ? __pfx_nbcon_kthread_func+0x10/0x10
[    3.930291][    C0]  kthread+0xcd/0x110
[    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
[    3.930291][    C0]  ret_from_fork+0x31/0x50
[    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
[    3.930291][    C0]  ret_from_fork_asm+0x1a/0x30
[    3.930291][    C0]  </TASK>
[    3.930291][    C0] Shutting down cpus with NMI
[    3.930291][    C0] Rebooting in 13 seconds..  


There are clearly visible two points where nbcon_atomic_flush_pending()
took over the ownership from a lover priority context. I believe that:

  + 1st occurrence is triggered by the "WARNING: CPU: 2 PID: 1 at
    arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.

  + 2nd occurrence is triggered by the "Kernel panic - not syncing:
    Hard LOCKUP" line printed with NBCON_PRIO_PANIC.

There were flushed more than 2500lines, about 240kB of characters,
in the NBCON_PRIO_EMERGENCY before the hardlockup detector
triggered panic.

If I count it correctly, a serial console with the speed 115200 baud/sec
would be able to emit about 11.5kB/sec. And it would take about 20sec
to emit the 240kB of messages.

=> softlockup is quite realistic

Solution:

IMHO, we really should flush all pending messages atomically.
It means that the watchdog reports need to be prevented
by touching the watchdog. It is not needed in
univ8250_console_write_thread()

=> put back touch_nmi_watchdog() into univ8250_console_write_atomic().


Warning:

The log shows report from the hardlockup report. And it triggered
only after about 0.2 sec. It looks suspicious but the softlockup
scenario seems realistic.

Note that the kernel parameter "root=kdump". The problem happened
when booting the crash kernel using __crash_kexec(). The original
kernel was crashed by "SysRq C" for testing purposes.

It means that it was a nested boot. There might be a bug in
the hardlockup detector in the nested boot so that perf event
was scheduled by the original kernel and happened too early.
Or something like this. I am going to investigate this as
well.

But still, it seems that the softlockup in
nbcon_atomic_flush_pending() is realistic and we should put
back the touch_nmi_watchdog() in the write_atomic() callbacks.


[1] https://lore.kernel.org/r/20250107212702.169493-1-john.ogness@linutronix.de

Petr Mladek (1):
  serial: 8250: Touch watchdogs in write_atomic()

 drivers/tty/serial/8250/8250_core.c | 5 +++++
 1 file changed, 5 insertions(+)

-- 
2.50.1

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by John Ogness 1 month, 1 week ago

On 2025-08-22, Petr Mladek <pmladek@suse.com> wrote:
> Hi,
>
> this is a followup for the patchset which converted serial 8250
> console driver to nbcon [1]. It was added into upstream but
> the last two patches were later reverted because they caused
> regressions.
>
> We have seen the following hardlockup report when using the converted
> serial 8250 console driver:
>
> [    0.000000][    T0] Command line: elfcorehdr=0x145f000000  mitigations=auto security=selinux selinux=1 console=ttyS0,115200 earlyprintk=ttyS0,115200 ignore_loglevel panic=13 intel_iommu=on sysrq=yes reset_devices acpi_no_memhotplug cgroup_disable=memory nokaslr numa=off irqpoll nr_cpus=4 root=kdump rootflags=bind rd.udev.children-max=8 disable_cpu_apicid=0  
> [...]
> [    3.771531][    T1] pci 0000:3e:08.1: [8086:324
> ** replaying previous printk message **
> [    3.771531][    T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint
> [ ... more than 2000 lines, about 200kB messages ... ]
> [    3.837752][    T1] pci 0000:20:01.0: Adding to iommu group 18
> [    3.837851][    T
> ** replaying previous printk message **
> [    3.837851][    T1] pci 0000:20:03.0: Adding to iommu group 19
> [    3.837946][    T1] pci 0000:20:05.0: Adding to iommu group 20
> [ ... more than 500 messages for iommu groups 21-590 ...]
> [    3.912932][    T1] pci 0000:f6:00.1: Adding to iommu group 591
> [    3.913070][    T1] pci 0000:f6:00.2: Adding to iommu group 592
> [    3.913243][    T1] DMAR: Intel(R) Virtualization Technology for Directed I/O
> [    3.913245][    T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [    3.913245][    T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB)
> [    3.913324][    T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
> [    3.913325][    T1] RAPL PMU: hw unit of domain package 2^-14 Joules
> [    3.913326][    T1] RAPL PMU: hw unit of domain dram 2^-14 Joules
> [    3.913327][    T1] RAPL PMU: hw unit of domain psys 2^-0 Joules
> [    3.933486][    T1] ------------[ cut here ]------------
> [    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
> [    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
> [    3.930291][    C0] Modules linked in:
> [    3.930291][    C0] Supported: No, Unreleased kernel
> [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
> [    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
> [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb eb 02 f3 90 <48> 89 df e8 b7 fc ff ff 84 c0 74 f2 48 8b 03 48 c7 43 28 00 00 00
> [    3.930291][    C0] RSP: 0000:ffa0000000147d38 EFLAGS: 00000046
> [    3.930291][    C0] RAX: 0000000000000000 RBX: ffa0000000147e78 RCX: 0000000000000001
> [    3.930291][    C0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffa0000000147e78
> [    3.930291][    C0] RBP: ffa0000000147e78 R08: 0000000000000010 R09: 0000000000000000
> [    3.930291][    C0] R10: 312e38303a65333a R11: 3030303020696370 R12: ff1100145f53f032
> [    3.930291][    C0] R13: 0000000000000000 R14: ff1100145f53f032 R15: ff1100145f53f072
> [    3.930291][    C0] FS:  0000000000000000(0000) GS:ff1100147ea00000(0000) knlGS:0000000000000000
> [    3.930291][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    3.930291][    C0] CR2: ff1100147e402000 CR3: 000000147d238001 CR4: 0000000000f71ef0
> [    3.930291][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    3.930291][    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> [    3.930291][    C0] PKRU: 55555554
> [    3.930291][    C0] Call Trace:
> [    3.930291][    C0]  <NMI>
> [    3.930291][    C0]  ? watchdog_hardlockup_check.cold+0xe7/0xec
> [    3.930291][    C0]  ? __perf_event_overflow+0x11b/0x380
> [    3.930291][    C0]  ? handle_pmi_common+0x15c/0x3d0
> [    3.930291][    C0]  ? intel_pmu_handle_irq+0x10f/0x5b0
> [    3.930291][    C0]  ? perf_event_nmi_handler+0x2a/0x50
> [    3.930291][    C0]  ? nmi_handle+0x5e/0x120
> [    3.930291][    C0]  ? default_do_nmi+0x40/0x130
> [    3.930291][    C0]  ? exc_nmi+0x187/0x240
> [    3.930291][    C0]  ? end_repeat_nmi+0xf/0x53
> [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0]  </NMI>
> [    3.930291][    C0]  <TASK>
> [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
> [    3.930291][    C0]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
> [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
> [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
> [    3.930291][    C0]  ? __pfx_nbcon_kthread_func+0x10/0x10
> [    3.930291][    C0]  kthread+0xcd/0x110
> [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> [    3.930291][    C0]  ret_from_fork+0x31/0x50
> [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> [    3.930291][    C0]  ret_from_fork_asm+0x1a/0x30
> [    3.930291][    C0]  </TASK>
> [    3.930291][    C0] Kernel panic - not syncing: Hard LOCKUP
> [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
> [    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
> [    3.930291][    C0] Call Trace:
> [    3.930291][    C0]  <NMI>
> [    3.930291][    C0]  dump_stack_lvl+0x4b/0x70
> [    3.930291][    C0]  panic+0x106/0x2d3
> [    3.930291][    C0]  nmi_panic.cold+0xc/0xc
> [    3.930291][    C0]  watchdog_hardlockup_check.cold+0xca/0xec
> [    3.930291][    C0]  __perf_event_overflow+0x11b/0x380
> [    3.930291][    C0]  handle_pmi_common+0x15c/0x3d0
> [    3.930291][    C0]  intel_pmu_handle_irq+0x10f/0x5b0
> [    3.930291][    C0]  perf_event_nmi_handler+0x2a/0x50
> [    3.930291][    C0]  nmi_handle+0x5e/0x120
> [    3.930291][    C0]  default_do_nmi+0x40/0x130
> [    3.930291][    C0]  exc_nmi+0x187/0x240
> [    3.930291][    C0]  end_repeat_nmi+0xf/0x53
> [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb eb 02 f3 90 <48> 89 df e8 b7 fc ff ff 84 c0 74 f2 48 8b 03 48 c7 43 28 00 00 00
> [    3.930291][    C0] RSP: 0000:ffa0000000147d38 EFLAGS: 00000046
> [    3.930291][    C0] RAX: 0000000000000000 RBX: ffa0000000147e78 RCX: 0000000000000001
> [    3.930291][    C0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffa0000000147e78
> [    3.930291][    C0] RBP: ffa0000000147e78 R08: 0000000000000010 R09: 0000000000000000
> [    3.930291][    C0] R10: 312e38303a65333a R11: 3030303020696370 R12: ff1100145f53f032
> [    3.930291][    C0] R13: 0000000000000000 R14: ff1100145f53f032 R15: ff1100145f53f072
> [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> [    3.930291][    C0]  </NMI>
> [    3.930291][    C0]  <TASK>
> [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
> [    3.930291][    C0]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
> [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
> [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
> [    3.930291][    C0]  ? __pfx_nbcon_kthread_func+0x10/0x10
> [    3.930291][    C0]  kthread+0xcd/0x110
> [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> [    3.930291][    C0]  ret_from_fork+0x31/0x50
> [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> [    3.930291][    C0]  ret_from_fork_asm+0x1a/0x30
> [    3.930291][    C0]  </TASK>
> [    3.930291][    C0] Shutting down cpus with NMI
> [    3.930291][    C0] Rebooting in 13 seconds..  
>
>
> There are clearly visible two points where nbcon_atomic_flush_pending()
> took over the ownership from a lover priority context. I believe that:
>
>   + 1st occurrence is triggered by the "WARNING: CPU: 2 PID: 1 at
>     arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.
>
>   + 2nd occurrence is triggered by the "Kernel panic - not syncing:
>     Hard LOCKUP" line printed with NBCON_PRIO_PANIC.
>
> There were flushed more than 2500lines, about 240kB of characters,
> in the NBCON_PRIO_EMERGENCY before the hardlockup detector
> triggered panic.
>
> If I count it correctly, a serial console with the speed 115200 baud/sec
> would be able to emit about 11.5kB/sec. And it would take about 20sec
> to emit the 240kB of messages.
>
> => softlockup is quite realistic
>
> Solution:
>
> IMHO, we really should flush all pending messages atomically.
> It means that the watchdog reports need to be prevented
> by touching the watchdog. It is not needed in
> univ8250_console_write_thread()
>
> => put back touch_nmi_watchdog() into univ8250_console_write_atomic().

I would expect the touch_nmi_watchdog() within wait_for_lsr() to be
sufficient. After all, that is the loop that leads to the large emit
times.

For QEMU, the touch_nmi_watchdog() within wait_for_lsr() will never be
called because QEMU does not implement baud rates. So that may be reason
enough to accept this change.

John

> [1] https://lore.kernel.org/r/20250107212702.169493-1-john.ogness@linutronix.de

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by Petr Mladek 1 week, 4 days ago

On Mon 2025-08-25 13:06:27, John Ogness wrote:
> On 2025-08-22, Petr Mladek <pmladek@suse.com> wrote:
> > Hi,
> >
> > this is a followup for the patchset which converted serial 8250
> > console driver to nbcon [1]. It was added into upstream but
> > the last two patches were later reverted because they caused
> > regressions.
> >
> > We have seen the following hardlockup report when using the converted
> > serial 8250 console driver:
> >
> > [    0.000000][    T0] Command line: elfcorehdr=0x145f000000  mitigations=auto security=selinux selinux=1 console=ttyS0,115200 earlyprintk=ttyS0,115200 ignore_loglevel panic=13 intel_iommu=on sysrq=yes reset_devices acpi_no_memhotplug cgroup_disable=memory nokaslr numa=off irqpoll nr_cpus=4 root=kdump rootflags=bind rd.udev.children-max=8 disable_cpu_apicid=0  
> > [...]
> > [    3.771531][    T1] pci 0000:3e:08.1: [8086:324
> > ** replaying previous printk message **
> > [    3.771531][    T1] pci 0000:3e:08.1: [8086:3246] type 00 class 0x088000 PCIe Root Complex Integrated Endpoint
> > [ ... more than 2000 lines, about 200kB messages ... ]
> > [    3.837752][    T1] pci 0000:20:01.0: Adding to iommu group 18
> > [    3.837851][    T
> > ** replaying previous printk message **
> > [    3.837851][    T1] pci 0000:20:03.0: Adding to iommu group 19
> > [    3.837946][    T1] pci 0000:20:05.0: Adding to iommu group 20
> > [ ... more than 500 messages for iommu groups 21-590 ...]
> > [    3.912932][    T1] pci 0000:f6:00.1: Adding to iommu group 591
> > [    3.913070][    T1] pci 0000:f6:00.2: Adding to iommu group 592
> > [    3.913243][    T1] DMAR: Intel(R) Virtualization Technology for Directed I/O
> > [    3.913245][    T1] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> > [    3.913245][    T1] software IO TLB: mapped [mem 0x000000004f000000-0x0000000053000000] (64MB)
> > [    3.913324][    T1] RAPL PMU: API unit is 2^-32 Joules, 3 fixed counters, 655360 ms ovfl timer
> > [    3.913325][    T1] RAPL PMU: hw unit of domain package 2^-14 Joules
> > [    3.913326][    T1] RAPL PMU: hw unit of domain dram 2^-14 Joules
> > [    3.913327][    T1] RAPL PMU: hw unit of domain psys 2^-0 Joules
> > [    3.933486][    T1] ------------[ cut here ]------------
> > [    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
> > [    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
> > [    3.930291][    C0] Modules linked in:
> > [    3.930291][    C0] Supported: No, Unreleased kernel
> > [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
> > [    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
> > [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb eb 02 f3 90 <48> 89 df e8 b7 fc ff ff 84 c0 74 f2 48 8b 03 48 c7 43 28 00 00 00
> > [    3.930291][    C0] RSP: 0000:ffa0000000147d38 EFLAGS: 00000046
> > [    3.930291][    C0] RAX: 0000000000000000 RBX: ffa0000000147e78 RCX: 0000000000000001
> > [    3.930291][    C0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffa0000000147e78
> > [    3.930291][    C0] RBP: ffa0000000147e78 R08: 0000000000000010 R09: 0000000000000000
> > [    3.930291][    C0] R10: 312e38303a65333a R11: 3030303020696370 R12: ff1100145f53f032
> > [    3.930291][    C0] R13: 0000000000000000 R14: ff1100145f53f032 R15: ff1100145f53f072
> > [    3.930291][    C0] FS:  0000000000000000(0000) GS:ff1100147ea00000(0000) knlGS:0000000000000000
> > [    3.930291][    C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    3.930291][    C0] CR2: ff1100147e402000 CR3: 000000147d238001 CR4: 0000000000f71ef0
> > [    3.930291][    C0] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [    3.930291][    C0] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
> > [    3.930291][    C0] PKRU: 55555554
> > [    3.930291][    C0] Call Trace:
> > [    3.930291][    C0]  <NMI>
> > [    3.930291][    C0]  ? watchdog_hardlockup_check.cold+0xe7/0xec
> > [    3.930291][    C0]  ? __perf_event_overflow+0x11b/0x380
> > [    3.930291][    C0]  ? handle_pmi_common+0x15c/0x3d0
> > [    3.930291][    C0]  ? intel_pmu_handle_irq+0x10f/0x5b0
> > [    3.930291][    C0]  ? perf_event_nmi_handler+0x2a/0x50
> > [    3.930291][    C0]  ? nmi_handle+0x5e/0x120
> > [    3.930291][    C0]  ? default_do_nmi+0x40/0x130
> > [    3.930291][    C0]  ? exc_nmi+0x187/0x240
> > [    3.930291][    C0]  ? end_repeat_nmi+0xf/0x53
> > [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0]  </NMI>
> > [    3.930291][    C0]  <TASK>
> > [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
> > [    3.930291][    C0]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> > [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
> > [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
> > [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
> > [    3.930291][    C0]  ? __pfx_nbcon_kthread_func+0x10/0x10
> > [    3.930291][    C0]  kthread+0xcd/0x110
> > [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> > [    3.930291][    C0]  ret_from_fork+0x31/0x50
> > [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> > [    3.930291][    C0]  ret_from_fork_asm+0x1a/0x30
> > [    3.930291][    C0]  </TASK>
> > [    3.930291][    C0] Kernel panic - not syncing: Hard LOCKUP
> > [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
> > [    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
> > [    3.930291][    C0] Call Trace:
> > [    3.930291][    C0]  <NMI>
> > [    3.930291][    C0]  dump_stack_lvl+0x4b/0x70
> > [    3.930291][    C0]  panic+0x106/0x2d3
> > [    3.930291][    C0]  nmi_panic.cold+0xc/0xc
> > [    3.930291][    C0]  watchdog_hardlockup_check.cold+0xca/0xec
> > [    3.930291][    C0]  __perf_event_overflow+0x11b/0x380
> > [    3.930291][    C0]  handle_pmi_common+0x15c/0x3d0
> > [    3.930291][    C0]  intel_pmu_handle_irq+0x10f/0x5b0
> > [    3.930291][    C0]  perf_event_nmi_handler+0x2a/0x50
> > [    3.930291][    C0]  nmi_handle+0x5e/0x120
> > [    3.930291][    C0]  default_do_nmi+0x40/0x130
> > [    3.930291][    C0]  exc_nmi+0x187/0x240
> > [    3.930291][    C0]  end_repeat_nmi+0xf/0x53
> > [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0] Code: 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb eb 02 f3 90 <48> 89 df e8 b7 fc ff ff 84 c0 74 f2 48 8b 03 48 c7 43 28 00 00 00
> > [    3.930291][    C0] RSP: 0000:ffa0000000147d38 EFLAGS: 00000046
> > [    3.930291][    C0] RAX: 0000000000000000 RBX: ffa0000000147e78 RCX: 0000000000000001
> > [    3.930291][    C0] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffa0000000147e78
> > [    3.930291][    C0] RBP: ffa0000000147e78 R08: 0000000000000010 R09: 0000000000000000
> > [    3.930291][    C0] R10: 312e38303a65333a R11: 3030303020696370 R12: ff1100145f53f032
> > [    3.930291][    C0] R13: 0000000000000000 R14: ff1100145f53f032 R15: ff1100145f53f072
> > [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0]  ? nbcon_reacquire_nobuf+0x11/0x50
> > [    3.930291][    C0]  </NMI>
> > [    3.930291][    C0]  <TASK>
> > [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
> > [    3.930291][    C0]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> > [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
> > [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
> > [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
> > [    3.930291][    C0]  ? __pfx_nbcon_kthread_func+0x10/0x10
> > [    3.930291][    C0]  kthread+0xcd/0x110
> > [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> > [    3.930291][    C0]  ret_from_fork+0x31/0x50
> > [    3.930291][    C0]  ? __pfx_kthread+0x10/0x10
> > [    3.930291][    C0]  ret_from_fork_asm+0x1a/0x30
> > [    3.930291][    C0]  </TASK>
> > [    3.930291][    C0] Shutting down cpus with NMI
> > [    3.930291][    C0] Rebooting in 13 seconds..  
> >
> >
> > There are clearly visible two points where nbcon_atomic_flush_pending()
> > took over the ownership from a lover priority context. I believe that:
> >
> >   + 1st occurrence is triggered by the "WARNING: CPU: 2 PID: 1 at
> >     arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.
> >
> >   + 2nd occurrence is triggered by the "Kernel panic - not syncing:
> >     Hard LOCKUP" line printed with NBCON_PRIO_PANIC.
> >
> > There were flushed more than 2500lines, about 240kB of characters,
> > in the NBCON_PRIO_EMERGENCY before the hardlockup detector
> > triggered panic.
> >
> > If I count it correctly, a serial console with the speed 115200 baud/sec
> > would be able to emit about 11.5kB/sec. And it would take about 20sec
> > to emit the 240kB of messages.
> >
> > => softlockup is quite realistic
> >
> > Solution:
> >
> > IMHO, we really should flush all pending messages atomically.
> > It means that the watchdog reports need to be prevented
> > by touching the watchdog. It is not needed in
> > univ8250_console_write_thread()
> >
> > => put back touch_nmi_watchdog() into univ8250_console_write_atomic().
> 
> I would expect the touch_nmi_watchdog() within wait_for_lsr() to be
> sufficient. After all, that is the loop that leads to the large emit
> times.

Good point. I was not aware of this touch_nmi_watchdog().

> For QEMU, the touch_nmi_watchdog() within wait_for_lsr() will never be
> called because QEMU does not implement baud rates. So that may be reason
> enough to accept this change.

Another good point.

Well, the original problem happened on bare metal. And the problem
was reporoducible even with the extra touch_nmi_watchog() in
univ8250_console_write_atomic().

I was confused _until_ I realized that touch_nmi_watchog()
modified per-CPU variable:

notrace void arch_touch_nmi_watchdog(void)
{
	raw_cpu_write(watchdog_hardlockup_touched, true);
}

And the hardlockup detector checked only the one per-CPU variable
as well:

void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
{
	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
		per_cpu(watchdog_hardlockup_touched, cpu) = false;
		return;
	}
[...]
}

By other words, touch_nmi_watchog() delays hardlockup report
only on the given CPU.

But we have two CPUs stuck by printk:

1. CPU2 is calling WARN():

   [    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180

   It gets busy with flushing the backlog of pending messages
   in the emergency context.

   This context regularly touches the watchodog.
   So far, so good.


2. CPU0 tries to reacquire the console ownership so that it could
   restore IRQ settting from the printk kthread.

   The nbcon_reacquire_nobuf() is called with disabled IRQs
   so that it might trigger hardlockup. And it clearly
   happens:

   [    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
   [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
   [    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
   [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
   [...]
   [    3.930291][    C0]  <TASK>
   [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
   [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
   [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
   [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0


Note that CPU2 keeps the nbcon console ownership until all pending
messages are flushed and the ownership is blocked for a long
time:

static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
					    bool allow_unsafe_takeover)
{
	if (!nbcon_context_try_acquire(ctxt, false))
		return -EPERM;

	while (nbcon_seq_read(con) < stop_seq) {
		if (!nbcon_emit_next_record(&wctxt, true))
			return -EAGAIN;
	}

	nbcon_context_release(ctxt);
}

An solution is to touch the watchdog also in nbcon_reacquire_nobuf()
because it might get blocked from known reasons. Something like:

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415..dd5966261b09 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -12,6 +12,7 @@
 #include <linux/irqflags.h>
 #include <linux/kthread.h>
 #include <linux/minmax.h>
+#include <linux/nmi.h>
 #include <linux/percpu.h>
 #include <linux/preempt.h>
 #include <linux/slab.h>
@@ -932,8 +933,10 @@ void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
 {
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 
-	while (!nbcon_context_try_acquire(ctxt, true))
+	while (!nbcon_context_try_acquire(ctxt, true)) {
+		touch_nmi_watchdog();
 		cpu_relax();
+	}
 
 	nbcon_write_context_set_buf(wctxt, NULL, 0);
 }


Alternative solution would be to release the console ownership in
__nbcon_atomic_flush_pending_con() between each record. It might
give the kthread a chance to restore the IRQ setting an continue.

It might be better. But we would need to make sure that the kthread
would stay blocked until the emergency context flushes all messages.
Otherwise, the kthread would repeatedly lose the console ownership
in the middle of the message when __nbcon_atomic_flush_pending_con()
would acquire the context with NBCON_EMERGENCY_PRIO for the next
pending message.

We might need similar handshake also between panic and emergency
context.

I am not sure if this is worth the complexity.

What do you think?

Best Regards,
Petr

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by John Ogness 1 week, 4 days ago

On 2025-09-22, Petr Mladek <pmladek@suse.com> wrote:
> On Mon 2025-08-25 13:06:27, John Ogness wrote:
>> On 2025-08-22, Petr Mladek <pmladek@suse.com> wrote:
>> > There are clearly visible two points where nbcon_atomic_flush_pending()
>> > took over the ownership from a lover priority context. I believe that:
>> >
>> >   + 1st occurrence is triggered by the "WARNING: CPU: 2 PID: 1 at
>> >     arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.
>> >
>> >   + 2nd occurrence is triggered by the "Kernel panic - not syncing:
>> >     Hard LOCKUP" line printed with NBCON_PRIO_PANIC.
>> >
>> > There were flushed more than 2500lines, about 240kB of characters,
>> > in the NBCON_PRIO_EMERGENCY before the hardlockup detector
>> > triggered panic.
>> >
>> > If I count it correctly, a serial console with the speed 115200 baud/sec
>> > would be able to emit about 11.5kB/sec. And it would take about 20sec
>> > to emit the 240kB of messages.
>> >
>> > => softlockup is quite realistic
>> >
>> > Solution:
>> >
>> > IMHO, we really should flush all pending messages atomically.
>> > It means that the watchdog reports need to be prevented
>> > by touching the watchdog. It is not needed in
>> > univ8250_console_write_thread()
>> >
>> > => put back touch_nmi_watchdog() into univ8250_console_write_atomic().
>> 
>> I would expect the touch_nmi_watchdog() within wait_for_lsr() to be
>> sufficient. After all, that is the loop that leads to the large emit
>> times.
>
> Good point. I was not aware of this touch_nmi_watchdog().
>
>> For QEMU, the touch_nmi_watchdog() within wait_for_lsr() will never be
>> called because QEMU does not implement baud rates. So that may be reason
>> enough to accept this change.
>
> Another good point.
>
> Well, the original problem happened on bare metal. And the problem
> was reporoducible even with the extra touch_nmi_watchog() in
> univ8250_console_write_atomic().
>
> I was confused _until_ I realized that touch_nmi_watchog()
> modified per-CPU variable:
>
> notrace void arch_touch_nmi_watchdog(void)
> {
> 	raw_cpu_write(watchdog_hardlockup_touched, true);
> }
>
> And the hardlockup detector checked only the one per-CPU variable
> as well:
>
> void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> {
> 	if (per_cpu(watchdog_hardlockup_touched, cpu)) {
> 		per_cpu(watchdog_hardlockup_touched, cpu) = false;
> 		return;
> 	}
> [...]
> }
>
> By other words, touch_nmi_watchog() delays hardlockup report
> only on the given CPU.
>
> But we have two CPUs stuck by printk:
>
> 1. CPU2 is calling WARN():
>
>    [    3.933488][    T1] WARNING: CPU: 2 PID: 1 at arch/x86/events/intel/uncore.c:1156 uncore_pci_pmu_register+0x15e/0x180
>
>    It gets busy with flushing the backlog of pending messages
>    in the emergency context.
>
>    This context regularly touches the watchodog.
>    So far, so good.
>
>
> 2. CPU0 tries to reacquire the console ownership so that it could
>    restore IRQ settting from the printk kthread.
>
>    The nbcon_reacquire_nobuf() is called with disabled IRQs
>    so that it might trigger hardlockup. And it clearly
>    happens:
>
>    [    3.930291][    C0] watchdog: Watchdog detected hard LOCKUP on cpu 0
>    [    3.930291][    C0] CPU: 0 UID: 0 PID: 18 Comm: pr/ttyS0 Not tainted 6.12.0-160000.18-default #1 PREEMPT(voluntary) SLFO-1.2 (unreleased) dd174c2cca19586eee16eaccfeba02f4d5b57c67
>    [    3.930291][    C0] Hardware name: HPE ProLiant DL560 Gen11/ProLiant DL560 Gen11, BIOS 2.48 03/11/2025
>    [    3.930291][    C0] RIP: 0010:nbcon_reacquire_nobuf+0x11/0x50
>    [...]
>    [    3.930291][    C0]  <TASK>
>    [    3.930291][    C0]  serial8250_console_write+0x16d/0x5c0
>    [    3.930291][    C0]  nbcon_emit_next_record+0x22c/0x250
>    [    3.930291][    C0]  nbcon_emit_one+0x93/0xe0
>    [    3.930291][    C0]  nbcon_kthread_func+0x13c/0x1c0
>
>
> Note that CPU2 keeps the nbcon console ownership until all pending
> messages are flushed and the ownership is blocked for a long
> time:
>
> static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
> 					    bool allow_unsafe_takeover)
> {
> 	if (!nbcon_context_try_acquire(ctxt, false))
> 		return -EPERM;
>
> 	while (nbcon_seq_read(con) < stop_seq) {
> 		if (!nbcon_emit_next_record(&wctxt, true))
> 			return -EAGAIN;
> 	}
>
> 	nbcon_context_release(ctxt);
> }
>
> An solution is to touch the watchdog also in nbcon_reacquire_nobuf()
> because it might get blocked from known reasons. Something like:
>
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 646801813415..dd5966261b09 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -12,6 +12,7 @@
>  #include <linux/irqflags.h>
>  #include <linux/kthread.h>
>  #include <linux/minmax.h>
> +#include <linux/nmi.h>
>  #include <linux/percpu.h>
>  #include <linux/preempt.h>
>  #include <linux/slab.h>
> @@ -932,8 +933,10 @@ void nbcon_reacquire_nobuf(struct nbcon_write_context *wctxt)
>  {
>  	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
>  
> -	while (!nbcon_context_try_acquire(ctxt, true))
> +	while (!nbcon_context_try_acquire(ctxt, true)) {
> +		touch_nmi_watchdog();
>  		cpu_relax();
> +	}
>  
>  	nbcon_write_context_set_buf(wctxt, NULL, 0);
>  }
>
>
> Alternative solution would be to release the console ownership in
> __nbcon_atomic_flush_pending_con() between each record. It might
> give the kthread a chance to restore the IRQ setting an continue.
>
> It might be better. But we would need to make sure that the kthread
> would stay blocked until the emergency context flushes all messages.
> Otherwise, the kthread would repeatedly lose the console ownership
> in the middle of the message when __nbcon_atomic_flush_pending_con()
> would acquire the context with NBCON_EMERGENCY_PRIO for the next
> pending message.
>
> We might need similar handshake also between panic and emergency
> context.
>
> I am not sure if this is worth the complexity.
>
> What do you think?

Originally I had implemented the atomic flushing to release between
records. The problem is, as you mentioned, that the threaded printers
keep jumping back in. So you end up with lots of "replaying previous
printk message" from the atomic printer taking over all the time. This
is visible from a simple WARN() and it is ugly as hell.

Trying to make the output clean is quite tricky. Mainly because the
lower-prio context (which may or may not be the kthread printer) and the
higher-prio context need to understand each other's intentions and
somehow coordinate. My code started to look like I was implementing a
second layer of ownership (indended ownership) and/or some type of
bizarre scheduling with "printing-prio boosting" and/or "proxy console
ownership". It was a lot of code to make emergency blocks look sane.

In the end I decided to keep things simple and let the kthread printer
busy-wait, possibly with interrupts disabled. Your suggestion of adding
touch_nmi_watchdog() to nbcon_reacquire_nobuf() would also follow that
line of simplicity. The simplicity comes at the cost of possibly having
two CPUs dedicated to atomically flushing a single console (one that is
actually printing and one that is the busy-waiting normal-prio printer).

Note that for PREEMPT_RT the hardware interrupts are not actually
disabled. That is not an excuse to keep things this way, just a
reminder. Non-RT may also want to use that 2nd CPU for something useful,
in which case we would need the higher-prio printer to somehow
temporarily yield ownership to the lower-prio printer. And quite
frankly, that is not something the nbcon console ownership model was
designed to support.

If we can come up with an elegant way to handle the temporary transfer
while preserving clean output, I am all for it. I will take another look
and see if I can come up with a _proper_ (no duct tape) solution.

John

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by Petr Mladek 1 week, 2 days ago

On Mon 2025-09-22 19:08:45, John Ogness wrote:
> On 2025-09-22, Petr Mladek <pmladek@suse.com> wrote:
> > On Mon 2025-08-25 13:06:27, John Ogness wrote:
> >> On 2025-08-22, Petr Mladek <pmladek@suse.com> wrote:
> >> > There are clearly visible two points where nbcon_atomic_flush_pending()
> >> > took over the ownership from a lover priority context. I believe that:
> >> >
> >> >   + 1st occurrence is triggered by the "WARNING: CPU: 2 PID: 1 at
> >> >     arch/x86/..." line printed with NBCON_PRIO_EMERGENCY.
> >> >
> >> >   + 2nd occurrence is triggered by the "Kernel panic - not syncing:
> >> >     Hard LOCKUP" line printed with NBCON_PRIO_PANIC.
> >> >
> >> > There were flushed more than 2500lines, about 240kB of characters,
> >> > in the NBCON_PRIO_EMERGENCY before the hardlockup detector
> >> > triggered panic.
> >> >
> >> > If I count it correctly, a serial console with the speed 115200 baud/sec
> >> > would be able to emit about 11.5kB/sec. And it would take about 20sec
> >> > to emit the 240kB of messages.
> >> >
> >> > => softlockup is quite realistic
> >> >
> > Well, the original problem happened on bare metal. And the problem
> > was reporoducible even with the extra touch_nmi_watchog() in
> > univ8250_console_write_atomic().
> >
> > By other words, touch_nmi_watchog() delays hardlockup report
> > only on the given CPU.
> >
> > An solution is to touch the watchdog also in nbcon_reacquire_nobuf()
> > because it might get blocked from known reasons. Something like:
> >
> > Alternative solution would be to release the console ownership in
> > __nbcon_atomic_flush_pending_con() between each record. It might
> > give the kthread a chance to restore the IRQ setting an continue.
> >
> > It might be better. But we would need to make sure that the kthread
> > would stay blocked until the emergency context flushes all messages.
> > Otherwise, the kthread would repeatedly lose the console ownership
> > in the middle of the message when __nbcon_atomic_flush_pending_con()
> > would acquire the context with NBCON_EMERGENCY_PRIO for the next
> > pending message.
> >
> > We might need similar handshake also between panic and emergency
> > context.
> >
> > I am not sure if this is worth the complexity.
> >
> > What do you think?
> 
> Originally I had implemented the atomic flushing to release between
> records. The problem is, as you mentioned, that the threaded printers
> keep jumping back in. So you end up with lots of "replaying previous
> printk message" from the atomic printer taking over all the time. This
> is visible from a simple WARN() and it is ugly as hell.
> 
> Trying to make the output clean is quite tricky. Mainly because the
> lower-prio context (which may or may not be the kthread printer) and the
> higher-prio context need to understand each other's intentions and
> somehow coordinate. My code started to look like I was implementing a
> second layer of ownership (indended ownership) and/or some type of
> bizarre scheduling with "printing-prio boosting" and/or "proxy console
> ownership". It was a lot of code to make emergency blocks look sane.

I tried to implement some naive solution, see below. I think that
it can be described as the 2nd layer of ownership.

It does not look that complicated. It might be because I used it only
in two paths which do the writing. And it is possible that it does
not work properly.

Then I got another idea. It might even easier.

> In the end I decided to keep things simple and let the kthread printer
> busy-wait,

I think that it might actually be enough to block the kthread when
any CPU is in emergency context, for example, by using a global
atomit counter.

There should not be problem with panic because
nbcon_context_try_acquire() fails on non-panic CPUs anyway.

I am going to try this approach.

> possibly with interrupts disabled. Your suggestion of adding
> touch_nmi_watchdog() to nbcon_reacquire_nobuf() would also follow that
> line of simplicity. The simplicity comes at the cost of possibly having
> two CPUs dedicated to atomically flushing a single console (one that is
> actually printing and one that is the busy-waiting normal-prio printer).

Yup.

> Note that for PREEMPT_RT the hardware interrupts are not actually
> disabled. That is not an excuse to keep things this way, just a
> reminder. Non-RT may also want to use that 2nd CPU for something useful,
> in which case we would need the higher-prio printer to somehow
> temporarily yield ownership to the lower-prio printer. And quite
> frankly, that is not something the nbcon console ownership model was
> designed to support.

Good to know.

> If we can come up with an elegant way to handle the temporary transfer
> while preserving clean output, I am all for it. I will take another look
> and see if I can come up with a _proper_ (no duct tape) solution.

I am not sure if you already started working on it. I rather share
my naive ideas early so that I do not duplicate the effort.
It is possible that you have been there.

Anyway, here is POC of an API which blocks writing/flushing
in a context with a lower priority:

From 5d8f9c61c8f67096feca5972c5e7f751c8371b9f Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@suse.com>
Date: Wed, 24 Sep 2025 10:42:26 +0200
Subject: [POC] printk/nbcon: Allow to release console context after each
 record in atomic_flush

It gives nbcon_reacquire_nobuf() to acquire the ownership to clean up
the console after the interrupted write_thread() call and allow
to preempt the printk kthread.

Add an API to block writing/flushing messages in a context with
lower priority. Otherwise, it would start flushing a pending
message and get interrupted again by the higher priority context.

The API is used in both code paths which try to acquire the nbcon
console ownership and try to write a message using
nbcon_emit_next_record().

The priority field is set in nbcon unsafe context which
prevents takeovers. The context must not clear it after
loosing the console ownership.

Failure to reserve the flush priority is handled the same way
as a failure to get the context ownership. So, it might somehow
work.

Warning: This is just a POC. The code is not tested.

FIXME: The clearing of the flush priority is racy. It might
       clear a value set by another context when it was
       cleared by a higher priority context in the mean
       time.

       There are ways to fix it.

       But wait! It might be enough to synchronize normal vs. emergency
       context. Non-panic context won't be able to get
       the ownership anyway.

Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/console.h |  2 +
 kernel/printk/nbcon.c   | 86 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 8f10d0a85bb4..51750f92673f 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -326,6 +326,7 @@ struct nbcon_write_context {
  * @nbcon_seq:		Sequence number of the next record for nbcon to print
  * @nbcon_device_ctxt:	Context available for non-printing operations
  * @nbcon_prev_seq:	Seq num the previous nbcon owner was assigned to print
+ * @nbcon_flush_prio:   Priority of a context flushing the console
  * @pbufs:		Pointer to nbcon private buffer
  * @kthread:		Printer kthread for this console
  * @rcuwait:		RCU-safe wait object for @kthread waking
@@ -461,6 +462,7 @@ struct console {
 	atomic_long_t		__private nbcon_seq;
 	struct nbcon_context	__private nbcon_device_ctxt;
 	atomic_long_t           __private nbcon_prev_seq;
+	enum nbcon_prio		__private nbcon_flush_prio;
 
 	struct printk_buffers	*pbufs;
 	struct task_struct	*kthread;
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 646801813415..575b2628e0b2 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -911,6 +911,78 @@ bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt)
 }
 EXPORT_SYMBOL_GPL(nbcon_exit_unsafe);
 
+/**
+ * nbcon_context_get_flush_prio - Reserve writing by the given context priority
+ * @wctxt:	The write context which wants to write messages
+ *
+ * The function allows to reserve rights for emitting/flushing messages with
+ * the priority of the given context.
+ *
+ * It signalizes an intention to flush pending messages.
+ *
+ * The motivation is to allow releasing the nbcon console ownership after
+ * each emitted message and still block any context with a lower priority
+ * from flushing the pending messages. It prevents repeated interrupts
+ * of the lower priority context in the middle of the message.
+ *
+ * Return:	True when the write context might try to flush messages.
+ *		False when a context with a higher priority is flushing
+ *		messages.
+ *
+ * FIXME: Maybe, only the NORMAL vs. EMERGENCY context is interesting.
+ *	  Both these context will get blocked when there is a panic
+ *	  in progress,
+ *
+ * It might be enough to synchronize kthread vs. emergency_enter/exit API.
+ * Well, there is also the legacy kthread.
+ */
+int nbcon_context_get_flush_prio(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	ret = true;
+
+	if (!nbcon_context_try_acquire(ctxt, false))
+		return false;
+
+	if (!nbcon_context_enter_unsafe(ctxt))
+		return false;
+
+	if (con->nbcon_flush_prio <= ctxt->prio)
+		con->nbcon_flush_prio = ctxt->prio;
+	else
+		ret = false;
+
+	if (!nbcon_context_exit_unsafe(ctxt))
+		ret = false;
+
+	nbcon_context_release(ctxt);
+
+	return ret;
+}
+
+void nbcon_context_put_flush_prio(struct nbcon_write_context *wctxt)
+{
+	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+	struct console *con = ctxt->console;
+	ret = 0;
+
+	if (!nbcon_context_try_acquire(ctxt, false))
+		return -EPERM;
+
+	if (!nbcon_context_enter_unsafe(ctxt))
+		return -EAGAIN;
+
+	con->nbcon_flush_prio = NBCON_PRIO_NONE;
+
+	if (!nbcon_context_exit_unsafe(ctxt))
+		ret = -EAGAIN;
+
+	nbcon_context_release(ctxt);
+
+	return ret;
+}
+
 /**
  * nbcon_reacquire_nobuf - Reacquire a console after losing ownership
  *				while printing
@@ -1120,6 +1192,8 @@ static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
 		cant_migrate();
 	}
 
+	if (!nbcon_context_get_flush_prio(wctxt))
+		goto out;
 	if (!nbcon_context_try_acquire(ctxt, false))
 		goto out;
 
@@ -1135,6 +1209,7 @@ static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
 		goto out;
 
 	nbcon_context_release(ctxt);
+	nbcon_context_put_flush_prio(wctxt);
 
 	ret = ctxt->backlog;
 out:
@@ -1505,10 +1580,13 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 	ctxt->prio			= nbcon_get_default_prio();
 	ctxt->allow_unsafe_takeover	= allow_unsafe_takeover;
 
-	if (!nbcon_context_try_acquire(ctxt, false))
+	if (!nbcon_context_get_flush_prio(&wctxt);
 		return -EPERM;
 
 	while (nbcon_seq_read(con) < stop_seq) {
+		if (!nbcon_context_try_acquire(ctxt, false))
+			return -EPERM;
+
 		/*
 		 * nbcon_emit_next_record() returns false when the console was
 		 * handed over or taken over. In both cases the context is no
@@ -1523,10 +1601,14 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
 				err = -ENOENT;
 			break;
 		}
+
+		nbcon_context_release(ctxt);
 	}
 
-	nbcon_context_release(ctxt);
+	nbcon_context_put_flush_prio(&wctxt);
+
 	return err;
+
 }
 
 /**
-- 
2.51.0

I am going to try an easier approach which would just block the
kthread when any CPU is in emergency context.

Best Regards,
Petr

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by John Ogness 1 week, 2 days ago

On 2025-09-24, Petr Mladek <pmladek@suse.com> wrote:
> I tried to implement some naive solution, see below. I think that
> it can be described as the 2nd layer of ownership.
>
> It does not look that complicated. It might be because I used it only
> in two paths which do the writing. And it is possible that it does
> not work properly.

It is an interesting approach and is one that I tend to prefer.

> Then I got another idea. It might even easier.
>
> I think that it might actually be enough to block the kthread when
> any CPU is in emergency context, for example, by using a global
> atomit counter.

This is the quick idea that usually comes first. Basically introducing
an emergency_in_progress() to keep the kthreads out. My problem with
this is that it causes _all_ consoles to synchronize with each other,
which we worked hard to get away from. Yes, I realize Linus rejected the
"store the backtrace, then print it" implementation, which limits the
effectiveness of parallel printing. Nevertheless, I would prefer to work
towards returning to parallelization, rather than reducing it further.

> I am not sure if you already started working on it. I rather share
> my naive ideas early so that I do not duplicate the effort.
> It is possible that you have been there.

Thanks. Yes, I tried various ideas like this. But your version does go
about it differently. Although I am seeing the same problems. My
comments below.

> Anyway, here is POC of an API which blocks writing/flushing
> in a context with a lower priority:
>
> From 5d8f9c61c8f67096feca5972c5e7f751c8371b9f Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@suse.com>
> Date: Wed, 24 Sep 2025 10:42:26 +0200
> Subject: [POC] printk/nbcon: Allow to release console context after each
>  record in atomic_flush
>
> It gives nbcon_reacquire_nobuf() to acquire the ownership to clean up
> the console after the interrupted write_thread() call and allow
> to preempt the printk kthread.
>
> Add an API to block writing/flushing messages in a context with
> lower priority. Otherwise, it would start flushing a pending
> message and get interrupted again by the higher priority context.
>
> The API is used in both code paths which try to acquire the nbcon
> console ownership and try to write a message using
> nbcon_emit_next_record().
>
> The priority field is set in nbcon unsafe context which
> prevents takeovers. The context must not clear it after
> loosing the console ownership.
>
> Failure to reserve the flush priority is handled the same way
> as a failure to get the context ownership. So, it might somehow
> work.
>
> Warning: This is just a POC. The code is not tested.
>
> FIXME: The clearing of the flush priority is racy. It might
>        clear a value set by another context when it was
>        cleared by a higher priority context in the mean
>        time.
>
>        There are ways to fix it.
>
>        But wait! It might be enough to synchronize normal vs. emergency
>        context. Non-panic context won't be able to get
>        the ownership anyway.
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
> ---
>  include/linux/console.h |  2 +
>  kernel/printk/nbcon.c   | 86 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 86 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 8f10d0a85bb4..51750f92673f 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -326,6 +326,7 @@ struct nbcon_write_context {
>   * @nbcon_seq:		Sequence number of the next record for nbcon to print
>   * @nbcon_device_ctxt:	Context available for non-printing operations
>   * @nbcon_prev_seq:	Seq num the previous nbcon owner was assigned to print
> + * @nbcon_flush_prio:   Priority of a context flushing the console
>   * @pbufs:		Pointer to nbcon private buffer
>   * @kthread:		Printer kthread for this console
>   * @rcuwait:		RCU-safe wait object for @kthread waking
> @@ -461,6 +462,7 @@ struct console {
>  	atomic_long_t		__private nbcon_seq;
>  	struct nbcon_context	__private nbcon_device_ctxt;
>  	atomic_long_t           __private nbcon_prev_seq;
> +	enum nbcon_prio		__private nbcon_flush_prio;
>  
>  	struct printk_buffers	*pbufs;
>  	struct task_struct	*kthread;
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 646801813415..575b2628e0b2 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -911,6 +911,78 @@ bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt)
>  }
>  EXPORT_SYMBOL_GPL(nbcon_exit_unsafe);
>  
> +/**
> + * nbcon_context_get_flush_prio - Reserve writing by the given context priority
> + * @wctxt:	The write context which wants to write messages
> + *
> + * The function allows to reserve rights for emitting/flushing messages with
> + * the priority of the given context.
> + *
> + * It signalizes an intention to flush pending messages.
> + *
> + * The motivation is to allow releasing the nbcon console ownership after
> + * each emitted message and still block any context with a lower priority
> + * from flushing the pending messages. It prevents repeated interrupts
> + * of the lower priority context in the middle of the message.
> + *
> + * Return:	True when the write context might try to flush messages.
> + *		False when a context with a higher priority is flushing
> + *		messages.
> + *
> + * FIXME: Maybe, only the NORMAL vs. EMERGENCY context is interesting.
> + *	  Both these context will get blocked when there is a panic
> + *	  in progress,
> + *
> + * It might be enough to synchronize kthread vs. emergency_enter/exit API.
> + * Well, there is also the legacy kthread.
> + */
> +int nbcon_context_get_flush_prio(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	ret = true;
> +
> +	if (!nbcon_context_try_acquire(ctxt, false))
> +		return false;
> +
> +	if (!nbcon_context_enter_unsafe(ctxt))
> +		return false;

Since a printer will want to do a console acquire right after setting
the flush_prio, it seems to me it would be better to require the caller
has already performed the console acquire. Then it does not need to give
it up in between. This enter_unsafe is enough to guarantee that this
context is still the owner.

> +
> +	if (con->nbcon_flush_prio <= ctxt->prio)
> +		con->nbcon_flush_prio = ctxt->prio;
> +	else
> +		ret = false;
> +
> +	if (!nbcon_context_exit_unsafe(ctxt))
> +		ret = false;
> +
> +	nbcon_context_release(ctxt);
> +
> +	return ret;
> +}
> +
> +void nbcon_context_put_flush_prio(struct nbcon_write_context *wctxt)
> +{
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> +	struct console *con = ctxt->console;
> +	ret = 0;
> +
> +	if (!nbcon_context_try_acquire(ctxt, false))
> +		return -EPERM;

This is my main concern. If a context has set the flush_prio to
something higher, it _MUST_ set it back down later. This cannot be best
effort. A failed acquire does not mean that a context with the same
priority is the owner (for example, it could be a NORMAL context that is
in an unsafe section). Remember that there are owners that are not
printers.

So now we have a similar situation as nbcon_reacquire_nobuf(): we need
to regain ownership so that we can undo something we setup. And that is
the problem we are trying to solve in the first place. Maybe since this
moves the problem from NORMAL to EMERGENCY, the busy-waiting is
acceptable.

> +	if (!nbcon_context_enter_unsafe(ctxt))
> +		return -EAGAIN;
> +
> +	con->nbcon_flush_prio = NBCON_PRIO_NONE;
> +
> +	if (!nbcon_context_exit_unsafe(ctxt))
> +		ret = -EAGAIN;
> +
> +	nbcon_context_release(ctxt);
> +
> +	return ret;
> +}
> +
>  /**
>   * nbcon_reacquire_nobuf - Reacquire a console after losing ownership
>   *				while printing
> @@ -1120,6 +1192,8 @@ static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
>  		cant_migrate();
>  	}
>  
> +	if (!nbcon_context_get_flush_prio(wctxt))
> +		goto out;
>  	if (!nbcon_context_try_acquire(ctxt, false))
>  		goto out;

All "out paths" must restore the flush prio.

>  
> @@ -1135,6 +1209,7 @@ static bool nbcon_emit_one(struct nbcon_write_context *wctxt, bool use_atomic)
>  		goto out;
>  
>  	nbcon_context_release(ctxt);
> +	nbcon_context_put_flush_prio(wctxt);
>  
>  	ret = ctxt->backlog;
>  out:
> @@ -1505,10 +1580,13 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>  	ctxt->prio			= nbcon_get_default_prio();
>  	ctxt->allow_unsafe_takeover	= allow_unsafe_takeover;
>  
> -	if (!nbcon_context_try_acquire(ctxt, false))
> +	if (!nbcon_context_get_flush_prio(&wctxt);
>  		return -EPERM;
>  
>  	while (nbcon_seq_read(con) < stop_seq) {
> +		if (!nbcon_context_try_acquire(ctxt, false))
> +			return -EPERM;
> +
>  		/*
>  		 * nbcon_emit_next_record() returns false when the console was
>  		 * handed over or taken over. In both cases the context is no
> @@ -1523,10 +1601,14 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
>  				err = -ENOENT;
>  			break;
>  		}
> +
> +		nbcon_context_release(ctxt);
>  	}
>  
> -	nbcon_context_release(ctxt);
> +	nbcon_context_put_flush_prio(&wctxt);
> +
>  	return err;
> +
>  }
>  
>  /**
> -- 
> 2.51.0

As you mentioned, there is the problem that the flushing context could
change hands multiple times before the flush_prio is restored. And there
is also the recursive case where a WARN could happen within a WARN, or a
WARN could happen and then in an NMI another WARN. All these cases
probably mean that it needs to be a per-prio counter rather than simply
a single prio so that each context can increment/decrement their prio.

John

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by Petr Mladek 1 week, 2 days ago

Added Andred Murray into Cc.

On Wed 2025-09-24 14:48:12, John Ogness wrote:
> On 2025-09-24, Petr Mladek <pmladek@suse.com> wrote:
> > I tried to implement some naive solution, see below. I think that
> > it can be described as the 2nd layer of ownership.
> >
> > It does not look that complicated. It might be because I used it only
> > in two paths which do the writing. And it is possible that it does
> > not work properly.
> 
> It is an interesting approach and is one that I tend to prefer.
> 
> > Then I got another idea. It might even easier.
> >
> > I think that it might actually be enough to block the kthread when
> > any CPU is in emergency context, for example, by using a global
> > atomit counter.
> 
> This is the quick idea that usually comes first. Basically introducing
> an emergency_in_progress() to keep the kthreads out. My problem with
> this is that it causes _all_ consoles to synchronize with each other,
> which we worked hard to get away from. Yes, I realize Linus rejected the
> "store the backtrace, then print it" implementation, which limits the
> effectiveness of parallel printing. Nevertheless, I would prefer to work
> towards returning to parallelization, rather than reducing it further.

Let me play the devil advocate for a while.

IMHO, keeping the kthreads running in parallel for a synchronous
emergency report opens a can of worms.

Yes, the kthreads might be fast enough when the emergency context
is busy with a slow console. But they might also cause repeated
takeovers for "every" message when a kthread starts emitting
each new message just to lose the ownership after few characters.

Honestly, blocking the kthreads during an emergency does not look
that bad to me.

> > I am not sure if you already started working on it. I rather share
> > my naive ideas early so that I do not duplicate the effort.
> > It is possible that you have been there.
> 
> Thanks. Yes, I tried various ideas like this. But your version does go
> about it differently. Although I am seeing the same problems. My
> comments below.
> 
> > +void nbcon_context_put_flush_prio(struct nbcon_write_context *wctxt)
> > +{
> > +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
> > +	struct console *con = ctxt->console;
> > +	ret = 0;
> > +
> > +	if (!nbcon_context_try_acquire(ctxt, false))
> > +		return -EPERM;
> 
> This is my main concern. If a context has set the flush_prio to
> something higher, it _MUST_ set it back down later. This cannot be best
> effort. A failed acquire does not mean that a context with the same
> priority is the owner (for example, it could be a NORMAL context that is
> in an unsafe section). Remember that there are owners that are not
> printers.
> 
> So now we have a similar situation as nbcon_reacquire_nobuf(): we need
> to regain ownership so that we can undo something we setup. And that is
> the problem we are trying to solve in the first place. Maybe since this
> moves the problem from NORMAL to EMERGENCY, the busy-waiting is
> acceptable.
>
> As you mentioned, there is the problem that the flushing context could
> change hands multiple times before the flush_prio is restored. And there
> is also the recursive case where a WARN could happen within a WARN, or a
> WARN could happen and then in an NMI another WARN. All these cases
> probably mean that it needs to be a per-prio counter rather than simply
> a single prio so that each context can increment/decrement their prio.

Thanks a lot for showing all the problems.

It might be easier when we combine it with the first approach. I mean
to block only the console-specific kthread from
__nbcon_atomic_flush_pending_con(). By other words, block
the kthread from __nbcon_atomic_flush_pending_con() instead of
emergency_enter() and store the counter in struct console.


Summary:

We currently have the following solutions for the original
problem (hardlockup in nbcon_reacquire_nobuf()):


1. Touch the watchdog in nbcon_reacquire_nobuf()

   Pros:
	+ trivial

   Cons:
	+ Two CPUs might be blocked by slow serial consoles.


2. Yield nbcon console context ownership between each record
   and block all kthreads from emergency_enter/exit API

   Pros:
	+ Only one CPU is blocked by slow serial console
	+ Prevents repeated takeovers for "every" new message

   Cons:
	+ More complex than 1
	+ Completely give up on parallel console handling in emergency


3. Yield nbcon console context ownership between each record
   and block only one kthread from __nbcon_atomic_flush_pending_con()

   Pros:
	+ Only one CPU is blocked by slow serial console
	+ Parallel console handling still possible in emergency

   Cons:
	+ More complex than 1   (similar to 2)
	+ Possible repeated takeovers for "every" new emergency message


Well, releasing the console context ownership after each record
might solve also some other problems [*]

I am going to try implementing the 3rd solution and see how
complicated  it would be.

It would be possible to change it two 2nd easily just by
using a global counter and updating it in emergency_enter/exit API.


[*] Andrew Murray is trying to do similar thing with console_lock
    and the legacy_kthread, see
    https://lore.kernel.org/r/20250915-printk_legacy_thread_console_lock-v1-0-f34d42a9bcb3@thegoodpenguin.co.uk

    He told me off-list that he saw similar problems also with nbcon_thread.
    I am not sure but it will likely be related to
    __nbcon_atomic_flush_pending_con() blocking a nbcon console context
    for too long.

Best Regards,
Petr

Re: [RFC 0/1] serial: 8250: nbcon_atomic_flush_pending() might trigger watchdog warnigns

Posted by Andrew Murray 4 days, 13 hours ago

On Wed, 24 Sept 2025 at 16:32, Petr Mladek <pmladek@suse.com> wrote:
>
> Added Andred Murray into Cc.
>

Thanks for the CC.

>
> Well, releasing the console context ownership after each record
> might solve also some other problems [*]
>
> I am going to try implementing the 3rd solution and see how
> complicated  it would be.
>
> It would be possible to change it two 2nd easily just by
> using a global counter and updating it in emergency_enter/exit API.
>
>
> [*] Andrew Murray is trying to do similar thing with console_lock
>     and the legacy_kthread, see
>     https://lore.kernel.org/r/20250915-printk_legacy_thread_console_lock-v1-0-f34d42a9bcb3@thegoodpenguin.co.uk
>
>     He told me off-list that he saw similar problems also with nbcon_thread.
>     I am not sure but it will likely be related to
>     __nbcon_atomic_flush_pending_con() blocking a nbcon console context
>     for too long.

I replicated my 'slow serial causes delays boot via contention at
filp_open call in console_on_rootfs' test, both with and without the
last patch [1] in your "printk/nbcon: Prevent hardlockup reports
caused by atomic nbcon flush" series. Though I found no obvious
improvement from your patch. I found without the patch the delay at
filp_open (with 8250 nbcon and with PREEMPT_RT) would range from 1.6
second through to 40 seconds, and with the patch from 0.07 through to
26 seconds. Potentially promising, but the numbers are all over the
place, so there must be something else going on. I'll continue to
investigate. In any case, that's not the problem your series was
trying to solve.

[1] https://lore.kernel.org/all/20250926124912.243464-4-pmladek@suse.com/

>
> Best Regards,
> Petr

Thanks,

Andrew Murray