From: Petr Mladek
To: John Ogness
Cc: Sergey Senozhatsky, Steven Rostedt, Breno Leitao,
	linux@armlinux.org.uk, paulmck@kernel.org, usamaarif642@gmail.com,
	leo.yan@arm.com, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com, rmikey@meta.com,
	Petr Mladek
Subject: [PATCH v2] printk/nbcon: Restore IRQ in atomic flush after each emitted record
Date: Fri, 12 Dec 2025 13:45:20 +0100
Message-ID: <20251212124520.244483-1-pmladek@suse.com>
X-Mailer: git-send-email 2.52.0
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The commit d5d399efff6577 ("printk/nbcon: Release nbcon consoles ownership
in atomic flush after each emitted record") prevented a stall of a CPU
which lost nbcon console ownership because another CPU entered
an emergency flush.

But there is still the problem that the CPU doing the emergency flush
might cause a stall on its own.

Let's go even further and restore IRQs in the atomic flush after
each emitted record.

It is not a complete solution. The interrupts and/or scheduling might
still be blocked when the emergency atomic flush is called with
IRQs and/or scheduling disabled.
But it should remove the following lockup:

  mlx5_core 0000:03:00.0: Shutdown was called
  kvm: exiting hardware virtualization
  arm-smmu-v3 arm-smmu-v3.10.auto: CMD_SYNC timeout at 0x00000103 [hwprod 0x00000104, hwcons 0x00000102]
  smp: csd: Detected non-responsive CSD lock (#1) on CPU#4, waiting 5000000032 ns for CPU#00 do_nothing (kernel/smp.c:1057)
  smp: csd: CSD lock (#1) unresponsive.
  [...]
  Call trace:
  pl011_console_write_atomic (./arch/arm64/include/asm/vdso/processor.h:12 drivers/tty/serial/amba-pl011.c:2540) (P)
  nbcon_emit_next_record (kernel/printk/nbcon.c:1049)
  __nbcon_atomic_flush_pending_con (kernel/printk/nbcon.c:1517)
  __nbcon_atomic_flush_pending.llvm.15488114865160659019 (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:192 kernel/printk/nbcon.c:1562 kernel/printk/nbcon.c:1612)
  nbcon_atomic_flush_pending (kernel/printk/nbcon.c:1629)
  printk_kthreads_shutdown (kernel/printk/printk.c:?)
  syscore_shutdown (drivers/base/syscore.c:120)
  kernel_kexec (kernel/kexec_core.c:1045)
  __arm64_sys_reboot (kernel/reboot.c:794 kernel/reboot.c:722 kernel/reboot.c:722)
  invoke_syscall (arch/arm64/kernel/syscall.c:50)
  el0_svc_common.llvm.14158405452757855239 (arch/arm64/kernel/syscall.c:?)
  do_el0_svc (arch/arm64/kernel/syscall.c:152)
  el0_svc (./arch/arm64/include/asm/alternative-macros.h:254 ./arch/arm64/include/asm/cpufeature.h:808 ./arch/arm64/include/asm/irqflags.h:73 arch/arm64/kernel/entry-common.c:169 arch/arm64/kernel/entry-common.c:182 arch/arm64/kernel/entry-common.c:749)
  el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:820)
  el0t_64_sync (arch/arm64/kernel/entry.S:600)

In this case, nbcon_atomic_flush_pending() is called from
printk_kthreads_shutdown() with IRQs and scheduling enabled.

Note that __nbcon_atomic_flush_pending_con() is also called directly
from nbcon_device_release() where the disabled IRQs might break
PREEMPT_RT guarantees.
But the atomic flush is called only in emergency or panic situations
where the latencies are irrelevant anyway.

An ultimate solution would be to touch the watchdogs. But it would hide
all problems. Let's do it later, if anyone reports a stall which does
not have a better solution.

Closes: https://lore.kernel.org/r/sqwajvt7utnt463tzxgwu2yctyn5m6bjwrslsnupfexeml6hkd@v6sqmpbu3vvu
Tested-by: Breno Leitao
Signed-off-by: Petr Mladek
Reviewed-by: John Ogness
---
Changes against [v1]:

  + Use scoped_guard(irqsave) [Leo Yan, John]
  + Kept Tested-by Breno because there was no functional change.

Changes against [RFC]:

  + Added note about __nbcon_atomic_flush_pending_con() called from
    nbcon_device_release() and PREEMPT_RT into the commit message [John].
  + Added Tested-by [Breno]

[v1] https://lore.kernel.org/r/20251202135832.156559-1-pmladek@suse.com
[RFC] https://lore.kernel.org/all/aSnI8UQRNICSKxAb@pathway.suse.cz/

 kernel/printk/nbcon.c | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 3fa403f9831f..32fc12e53675 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1557,18 +1557,27 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 	ctxt->allow_unsafe_takeover = nbcon_allow_unsafe_takeover();
 
 	while (nbcon_seq_read(con) < stop_seq) {
-		if (!nbcon_context_try_acquire(ctxt, false))
-			return -EPERM;
-
 		/*
-		 * nbcon_emit_next_record() returns false when the console was
-		 * handed over or taken over. In both cases the context is no
-		 * longer valid.
+		 * Atomic flushing does not use console driver synchronization
+		 * (i.e. it does not hold the port lock for uart consoles).
+		 * Therefore IRQs must be disabled to avoid being interrupted
+		 * and then calling into a driver that will deadlock trying
+		 * to acquire console ownership.
 		 */
-		if (!nbcon_emit_next_record(&wctxt, true))
-			return -EAGAIN;
+		scoped_guard(irqsave) {
+			if (!nbcon_context_try_acquire(ctxt, false))
+				return -EPERM;
 
-		nbcon_context_release(ctxt);
+			/*
+			 * nbcon_emit_next_record() returns false when
+			 * the console was handed over or taken over.
+			 * In both cases the context is no longer valid.
+			 */
+			if (!nbcon_emit_next_record(&wctxt, true))
+				return -EAGAIN;
+
+			nbcon_context_release(ctxt);
+		}
 
 		if (!ctxt->backlog) {
 			/* Are there reserved but not yet finalized records? */
@@ -1595,22 +1604,11 @@ static int __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 static void nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
 {
 	struct console_flush_type ft;
-	unsigned long flags;
 	int err;
 
 again:
-	/*
-	 * Atomic flushing does not use console driver synchronization (i.e.
-	 * it does not hold the port lock for uart consoles). Therefore IRQs
-	 * must be disabled to avoid being interrupted and then calling into
-	 * a driver that will deadlock trying to acquire console ownership.
-	 */
-	local_irq_save(flags);
-
 	err = __nbcon_atomic_flush_pending_con(con, stop_seq);
 
-	local_irq_restore(flags);
-
 	/*
 	 * If there was a new owner (-EPERM, -EAGAIN), that context is
 	 * responsible for completing.
-- 
2.52.0
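
Illustration only, not part of the patch: a minimal userspace sketch of why moving the IRQ save/restore from the caller into the per-record loop shortens the IRQ-off window. It assumes a toy model in which "IRQs" are a boolean and the cost of a window is counted in emitted records. All sim_* names are hypothetical and do not exist in the kernel; the real change uses scoped_guard(irqsave) as shown in the diff above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; not a kernel structure. */
struct sim_cpu {
	bool irqs_off;		/* models the local IRQ-disabled state */
	unsigned int cur_run;	/* records emitted in the current IRQ-off window */
	unsigned int max_run;	/* longest IRQ-off window seen, in records */
};

/* Models local_irq_save(): returns the previous state. */
static bool sim_irq_save(struct sim_cpu *cpu)
{
	bool was_off = cpu->irqs_off;

	cpu->irqs_off = true;
	if (!was_off)
		cpu->cur_run = 0;	/* a new IRQ-off window starts here */
	return was_off;
}

/* Models local_irq_restore(): puts the previous state back. */
static void sim_irq_restore(struct sim_cpu *cpu, bool was_off)
{
	cpu->irqs_off = was_off;
}

/* Models emitting one record; must run with "IRQs" off. */
static void sim_emit_record(struct sim_cpu *cpu)
{
	assert(cpu->irqs_off);
	cpu->cur_run++;
	if (cpu->cur_run > cpu->max_run)
		cpu->max_run = cpu->cur_run;
}

/* Old scheme: one IRQ-off window around the whole flush loop. */
static unsigned int sim_flush_old(struct sim_cpu *cpu, unsigned int nr_records)
{
	bool flags = sim_irq_save(cpu);

	for (unsigned int i = 0; i < nr_records; i++)
		sim_emit_record(cpu);
	sim_irq_restore(cpu, flags);
	return cpu->max_run;
}

/* New scheme: save and restore around each emitted record. */
static unsigned int sim_flush_new(struct sim_cpu *cpu, unsigned int nr_records)
{
	for (unsigned int i = 0; i < nr_records; i++) {
		bool flags = sim_irq_save(cpu);

		sim_emit_record(cpu);
		sim_irq_restore(cpu, flags);
	}
	return cpu->max_run;
}
```

In this model, a caller that enters with IRQs enabled sees the longest window drop from nr_records to a single record. A caller that enters with IRQs already disabled sees no change, which matches the "not a complete solution" caveat in the commit message.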