[PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic

John Ogness posted 1 patch 2 weeks, 6 days ago
include/linux/console.h |  3 +++
kernel/printk/nbcon.c   | 17 ++++++++++++++---
kernel/printk/printk.c  | 23 ++++++++++++++++-------
3 files changed, 33 insertions(+), 10 deletions(-)
[PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
Posted by John Ogness 2 weeks, 6 days ago
Hi,

An effort is underway to update netconsole to support nbcon [0].
Currently it is not known exactly how a safe ->write_atomic()
callback for netconsole could be implemented. However, without
->write_atomic() implemented, there is guaranteed to be _no_
panic output.

We decided to allow unsafe ->write_atomic() implementations to
be used only as a last resort at the end of panic. This should
allow netconsole to be updated to support nbcon and still
provide panic output in "typical panic" situations.

I considered extending the feature to also allow such consoles
to print using their ->write_thread() callback whenever it is
known to be a sleepable context. However, this started to get
quite complex. We can revisit this extended feature if it is
determined that it is necessary.

John Ogness

[0] https://lore.kernel.org/lkml/b2qps3uywhmjaym4mht2wpxul4yqtuuayeoq4iv4k3zf5wdgh3@tocu6c7mj4lt

John Ogness (1):
  printk: nbcon: Allow unsafe write_atomic() for panic

 include/linux/console.h |  3 +++
 kernel/printk/nbcon.c   | 17 ++++++++++++++---
 kernel/printk/printk.c  | 23 ++++++++++++++++-------
 3 files changed, 33 insertions(+), 10 deletions(-)


base-commit: 37dbd4203b42c10b76d55471bb866900f99d6bc1
-- 
2.39.5
Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
Posted by Breno Leitao 2 weeks, 1 day ago
On Fri, Sep 12, 2025 at 02:24:51PM +0206, John Ogness wrote:
> Hi,
> 
> An effort is underway to update netconsole to support nbcon [0].
> Currently it is not known exactly how a safe ->write_atomic()
> callback for netconsole could be implemented. However, without
> ->write_atomic() implemented, there is guaranteed to be _no_
> panic output.
> 
> We decided to allow unsafe ->write_atomic() implementations to
> be used only as a last resort at the end of panic. This should
> allow netconsole to be updated to support nbcon and still
> provide panic output in "typical panic" situations.
> 
> I considered extending the feature to also allow such consoles
> to print using their ->write_thread() callback whenever it is
> known to be a sleepable context. However, this started to get
> quite complex. We can revisit this extended feature if it is
> determined that it is necessary.

Thanks for this patch! I've successfully adapted my PoC netconsole
implementation to work with NBCON using your modifications, and the
initial results are encouraging. With your patch and a corresponding
->write_atomic callback, I'm receiving the complete set of messages that
were previously available through the regular console interface.

Upon further consideration, it's worth noting that not all network
drivers rely on irq-unsafe locks. In practice, only a subset of drivers
use them, while most network drivers I'm familiar with maintain IRQ-safe
TX paths.

If we could determine the IRQ safety characteristics (IRQ-safe vs
IRQ-unsafe TX) during netconsole registration, this would enable more
optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
only when the underlying network adapter uses IRQ-unsafe locks. For
adapters with IRQ-safe implementations, netconsole could safely utilize
the ->write_atomic path without restrictions.

FWIW, I am attaching the patch I used to test netconsole on top of your
changes. Keep in mind this is is calling TX patch with IRQ disabled
given the list is depending on IRQ-safe lock.

This will be fixed once I do the following:

1) move target_list to use RCU
2) Untangling netconsole from netpoll [1]
3) They are depending on a conflicting netpoll fix [2]

Link: https://lore.kernel.org/all/willemdebruijn.kernel.a0f67bb6112a@gmail.com/ [1]
Link: https://lore.kernel.org/all/20250917-netconsole_torture-v4-1-0a5b3b8f81ce@debian.org/ [2]

commit e9f4a292a49ae6d3da29f1dca39754180d2608d7
Author: Breno Leitao <leitao@debian.org>
Date:   Tue Aug 19 04:14:58 2025 -0700

    netconsole: Add support for nbcon
    
    Add support for running netconsole using the new non‑blocking console
    (nbcon) infrastructure.
    
    The nbcon framework improves console handling by avoiding the global
    console lock and enabling asynchronous, non‑blocking writes from
    multiple contexts.
    
    With this option enabled, netconsole can operate as a fully
    asynchronous, lockless nbcon backend, not depending on console lock
    anymore.
    
    This support is marked experimental for now until it receives wider
    testing.
    
    This depends on the CON_NBCON_ATOMIC_UNSAFE nbcon flag, and
    ->write_atomic is unsafe and only called with NBCON_PRIO_PANIC priority.
    
    Signed-off-by: Breno Leitao <leitao@debian.org>

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index b29628d46be9b..ec9a430aa160e 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -382,6 +382,16 @@ config NETCONSOLE_PREPEND_RELEASE
 	  message.  See <file:Documentation/networking/netconsole.rst> for
 	  details.
 
+config NETCONSOLE_NBCON
+	bool "Enable non blocking netconsole (EXPERIMENTAL)"
+	depends on NETCONSOLE
+	default n
+	help
+	  Move netconsole to use non-blocking console (nbcons).  Non-blocking
+	  console (nbcon) is a new console infrastructure introduced to improve
+	  console handling by avoiding the global console lock (Big Kernel
+	  Lock) and enabling non-blocking, asynchronous writes to the console.
+
 config NETPOLL
 	def_bool NETCONSOLE
 
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index e3722de08ea9f..ac2f0ace45d6e 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -1692,6 +1692,97 @@ static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
 				   extradata_len);
 }
 
+static void do_write_msg(struct netconsole_target *nt, const char *msg,
+			 unsigned int len)
+{
+	const char *tmp;
+	int frag, left;
+
+	/*
+	 * We nest this inside the for-each-target loop above
+	 * so that we're able to get as much logging out to
+	 * at least one target if we die inside here, instead
+	 * of unnecessarily keeping all targets in lock-step.
+	 */
+	tmp = msg;
+	for (left = len; left;) {
+		frag = min(left, MAX_PRINT_CHUNK);
+		send_udp(nt, tmp, frag);
+		tmp += frag;
+		left -= frag;
+	}
+}
+
+#ifdef CONFIG_NETCONSOLE_NBCON
+static void netcon_write_ext_atomic(struct console *con,
+				    struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			send_ext_msg_udp(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+static void netcon_write_ext_thread(struct console *con,
+				    struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			send_ext_msg_udp(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+/* regular write call backs */
+static void netcon_write_thread(struct console *con,
+				struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (!nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			do_write_msg(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+static void netcon_write_atomic(struct console *con,
+				struct nbcon_write_context *wctxt)
+{
+	struct netconsole_target *nt;
+
+	list_for_each_entry(nt, &target_list, list)
+		if (!nt->extended && nt->enabled && netif_running(nt->np.dev)) {
+			if (!nbcon_enter_unsafe(wctxt))
+				continue;
+			do_write_msg(nt, wctxt->outbuf, wctxt->len);
+			nbcon_exit_unsafe(wctxt);
+		}
+}
+
+/* locks call backs */
+static void netconsole_device_lock(struct console *con, unsigned long *flags)
+{
+	/* protects all the targets at the same time */
+	spin_lock_irqsave(&target_list_lock, *flags);
+}
+
+static void netconsole_device_unlock(struct console *con, unsigned long flags)
+{
+	spin_unlock_irqrestore(&target_list_lock, flags);
+}
+#else
 static void write_ext_msg(struct console *con, const char *msg,
 			  unsigned int len)
 {
@@ -1710,10 +1801,8 @@ static void write_ext_msg(struct console *con, const char *msg,
 
 static void write_msg(struct console *con, const char *msg, unsigned int len)
 {
-	int frag, left;
 	unsigned long flags;
 	struct netconsole_target *nt;
-	const char *tmp;
 
 	if (oops_only && !oops_in_progress)
 		return;
@@ -1723,24 +1812,13 @@ static void write_msg(struct console *con, const char *msg, unsigned int len)
 
 	spin_lock_irqsave(&target_list_lock, flags);
 	list_for_each_entry(nt, &target_list, list) {
-		if (!nt->extended && nt->enabled && netif_running(nt->np.dev)) {
-			/*
-			 * We nest this inside the for-each-target loop above
-			 * so that we're able to get as much logging out to
-			 * at least one target if we die inside here, instead
-			 * of unnecessarily keeping all targets in lock-step.
-			 */
-			tmp = msg;
-			for (left = len; left;) {
-				frag = min(left, MAX_PRINT_CHUNK);
-				send_udp(nt, tmp, frag);
-				tmp += frag;
-				left -= frag;
-			}
-		}
+		if (!nt->extended && nt->enabled && netif_running(nt->np.dev))
+			do_write_msg(nt, msg, len);
 	}
 	spin_unlock_irqrestore(&target_list_lock, flags);
 }
+#endif
+
 
 static int netpoll_parse_ip_addr(const char *str, union inet_addr *addr)
 {
@@ -1919,14 +1997,30 @@ static void free_param_target(struct netconsole_target *nt)
 
 static struct console netconsole_ext = {
 	.name	= "netcon_ext",
+#ifdef CONFIG_NETCONSOLE_NBCON
+	.flags	= CON_ENABLED | CON_EXTENDED | CON_NBCON | CON_NBCON_ATOMIC_UNSAFE,
+	.write_thread = netcon_write_ext_thread,
+	.write_atomic = netcon_write_ext_atomic,
+	.device_lock = netconsole_device_lock,
+	.device_unlock = netconsole_device_unlock,
+#else
 	.flags	= CON_ENABLED | CON_EXTENDED,
 	.write	= write_ext_msg,
+#endif
 };
 
 static struct console netconsole = {
 	.name	= "netcon",
+#ifdef CONFIG_NETCONSOLE_NBCON
+	.flags	= CON_ENABLED | CON_NBCON | CON_NBCON_ATOMIC_UNSAFE,
+	.write_thread = netcon_write_thread,
+	.write_atomic = netcon_write_atomic,
+	.device_lock = netconsole_device_lock,
+	.device_unlock = netconsole_device_unlock,
+#else
 	.flags	= CON_ENABLED,
 	.write	= write_msg,
+#endif
 };
 
 static int __init init_netconsole(void)
Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
Posted by John Ogness 6 days, 7 hours ago
Hi Breno,

On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
> Upon further consideration, it's worth noting that not all network
> drivers rely on irq-unsafe locks. In practice, only a subset of drivers
> use them, while most network drivers I'm familiar with maintain IRQ-safe
> TX paths.
>
> If we could determine the IRQ safety characteristics (IRQ-safe vs
> IRQ-unsafe TX) during netconsole registration, this would enable more
> optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
> only when the underlying network adapter uses IRQ-unsafe locks. For
> adapters with IRQ-safe implementations, netconsole could safely utilize
> the ->write_atomic path without restrictions.

This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
set, it is expected that ->write_atomic() will also function in NMI. So
being IRQ-safe may not be enough.

John Ogness
Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
Posted by Breno Leitao 6 days, 1 hour ago
Hello John,

On Fri, Sep 26, 2025 at 11:27:49AM +0206, John Ogness wrote:
> On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
> > Upon further consideration, it's worth noting that not all network
> > drivers rely on irq-unsafe locks. In practice, only a subset of drivers
> > use them, while most network drivers I'm familiar with maintain IRQ-safe
> > TX paths.
> >
> > If we could determine the IRQ safety characteristics (IRQ-safe vs
> > IRQ-unsafe TX) during netconsole registration, this would enable more
> > optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
> > only when the underlying network adapter uses IRQ-unsafe locks. For
> > adapters with IRQ-safe implementations, netconsole could safely utilize
> > the ->write_atomic path without restrictions.
> 
> This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
> set, it is expected that ->write_atomic() will also function in NMI. So
> being IRQ-safe may not be enough.

What are the other requirements for ->write_atomic() so it could be
executed inside a NMI?

That brings me another point, I suppose that netconsole callbacks are
currently being called from NMI, given it is registered as a legacy
console, and legacy consoles are printked from inside NMIs, right?

Thanks!
--breno
Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
Posted by Petr Mladek 3 days, 4 hours ago
On Fri 2025-09-26 08:17:53, Breno Leitao wrote:
> Hello John,
> 
> On Fri, Sep 26, 2025 at 11:27:49AM +0206, John Ogness wrote:
> > On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
> > > Upon further consideration, it's worth noting that not all network
> > > drivers rely on irq-unsafe locks. In practice, only a subset of drivers
> > > use them, while most network drivers I'm familiar with maintain IRQ-safe
> > > TX paths.
> > >
> > > If we could determine the IRQ safety characteristics (IRQ-safe vs
> > > IRQ-unsafe TX) during netconsole registration, this would enable more
> > > optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
> > > only when the underlying network adapter uses IRQ-unsafe locks. For
> > > adapters with IRQ-safe implementations, netconsole could safely utilize
> > > the ->write_atomic path without restrictions.
> > 
> > This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
> > set, it is expected that ->write_atomic() will also function in NMI. So
> > being IRQ-safe may not be enough.
> 
> What are the other requirements for ->write_atomic() so it could be
> executed inside a NMI?

Ideally, it should be synchronized only via the nbcon console context
API and should not need any additional lock.

Note that the nbcon console context should get synchronized with
the normal device lock via some wrappers, for example, see
uart_port_lock() in include/linux/serial_core.h.


> That brings me another point, I suppose that netconsole callbacks are
> currently being called from NMI, given it is registered as a legacy
> console, and legacy consoles are printked from inside NMIs, right?

The legacy console handling is automatically deferred in_nmi(), see
is_printk_legacy_deferred().

The only exception is panic() where the consoles are explicitly
flushed. It is not 100% safe. But the risk of a deadlock is reduced
by calling bust_spinlocks(1) which sets oops_in_progress.
Console drivers use trylock when oops_in_progress is set.

I could imagine that we allow the trick with oops_in_progress
and trylock also in the "unsafe" write_atomic() callbacks.
But it would be better to use only the nbcon context ownership
as suggested above.

Best Regards,
Petr
Re: [PATCH printk v1 0/1] Allow unsafe ->write_atomic() for panic
Posted by John Ogness 3 days, 3 hours ago
On 2025-09-29, Petr Mladek <pmladek@suse.com> wrote:
> On Fri 2025-09-26 08:17:53, Breno Leitao wrote:
>> On Fri, Sep 26, 2025 at 11:27:49AM +0206, John Ogness wrote:
>> > On 2025-09-17, Breno Leitao <leitao@debian.org> wrote:
>> > > Upon further consideration, it's worth noting that not all network
>> > > drivers rely on irq-unsafe locks. In practice, only a subset of drivers
>> > > use them, while most network drivers I'm familiar with maintain IRQ-safe
>> > > TX paths.
>> > >
>> > > If we could determine the IRQ safety characteristics (IRQ-safe vs
>> > > IRQ-unsafe TX) during netconsole registration, this would enable more
>> > > optimized behavior: netconsole could register as CON_NBCON_ATOMIC_UNSAFE
>> > > only when the underlying network adapter uses IRQ-unsafe locks. For
>> > > adapters with IRQ-safe implementations, netconsole could safely utilize
>> > > the ->write_atomic path without restrictions.
>> > 
>> > This is good to read. But note that if CON_NBCON_ATOMIC_UNSAFE is not
>> > set, it is expected that ->write_atomic() will also function in NMI. So
>> > being IRQ-safe may not be enough.
>> 
>> What are the other requirements for ->write_atomic() so it could be
>> executed inside a NMI?
>
> Ideally, it should be synchronized only via the nbcon console context
> API and should not need any additional lock.
>
> Note that the nbcon console context should get synchronized with
> the normal device lock via some wrappers, for example, see
> uart_port_lock() in include/linux/serial_core.h.

I would also like to add that write_atomic() calls are synchronized
against "unsafe sections"[0] of write_thread() and write_atomic()

... EXCEPT ...

As a last resort during panic, nbcon will perform hostile
takeovers. This means write_atomic() can be called even though the
driver is in unsafe sections. So your write_atomic() could be called
when your write_thread() is in an unsafe section. However,
write_atomic() can detect this by looking at unsafe_takeover of struct
nbcon_write_context. If true, you know your write_thread() was in an
unsafe section and you might need to be more careful or avoid certain
actions.

John

[0] Code surround by nbcon_enter_unsafe()/nbcon_exit_unsafe().