[PATCH v4 3/5] printk: nbcon: Allow KDB to acquire the NBCON context

Marcos Paulo de Souza posted 5 patches 2 weeks, 3 days ago
There is a newer version of this series
Posted by Marcos Paulo de Souza 2 weeks, 3 days ago
KDB can interrupt any console to execute the "mirrored printing" at any
time, so add an exception to nbcon_context_try_acquire_direct() that
allows the context to be acquired when the current CPU is the same as
kdb_printf_cpu.

This change is needed by the next patch, which fixes kdb_msg_write()
to work with NBCON consoles by calling ->write_atomic on such consoles.
To print, KDB must first acquire ownership of the console, so
nbcon_context_try_acquire_direct() is adjusted here.

Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com>
---
 include/linux/kdb.h   | 6 ++++++
 kernel/printk/nbcon.c | 7 ++++++-
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/linux/kdb.h b/include/linux/kdb.h
index ecbf819deeca118f27e98bf71bb37dd27a257ebb..9417ad7f124e95987caced07bc8684a1a6c04df4 100644
--- a/include/linux/kdb.h
+++ b/include/linux/kdb.h
@@ -207,11 +207,17 @@ static inline const char *kdb_walk_kallsyms(loff_t *pos)
 /* Dynamic kdb shell command registration */
 extern int kdb_register(kdbtab_t *cmd);
 extern void kdb_unregister(kdbtab_t *cmd);
+
+#define KDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id())
+
 #else /* ! CONFIG_KGDB_KDB */
 static inline __printf(1, 2) int kdb_printf(const char *fmt, ...) { return 0; }
 static inline void kdb_init(int level) {}
 static inline int kdb_register(kdbtab_t *cmd) { return 0; }
 static inline void kdb_unregister(kdbtab_t *cmd) {}
+
+#define KDB_IS_ACTIVE() false
+
 #endif	/* CONFIG_KGDB_KDB */
 enum {
 	KDB_NOT_INITIALIZED,
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ff218e95a505fd10521c2c4dfb00ad5ec5773953..8644e019e2391797e623fcc124d37ed4d460ccd9 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -10,6 +10,7 @@
 #include <linux/export.h>
 #include <linux/init.h>
 #include <linux/irqflags.h>
+#include <linux/kdb.h>
 #include <linux/kthread.h>
 #include <linux/minmax.h>
 #include <linux/percpu.h>
@@ -248,13 +249,17 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
 		 * since all non-panic CPUs are stopped during panic(), it
 		 * is safer to have them avoid gaining console ownership.
 		 *
-		 * If this acquire is a reacquire (and an unsafe takeover
+		 * One exception is if kdb is active, which may print
+		 * from multiple CPUs during a panic.
+		 *
+		 * Second exception is a reacquire (and an unsafe takeover
 		 * has not previously occurred) then it is allowed to attempt
 		 * a direct acquire in panic. This gives console drivers an
 		 * opportunity to perform any necessary cleanup if they were
 		 * interrupted by the panic CPU while printing.
 		 */
 		if (other_cpu_in_panic() &&
+		    !KDB_IS_ACTIVE() &&
 		    (!is_reacquire || cur->unsafe_takeover)) {
 			return -EPERM;
 		}

-- 
2.51.0
Re: [PATCH v4 3/5] printk: nbcon: Allow KDB to acquire the NBCON context
Posted by Petr Mladek 2 weeks, 1 day ago
On Mon 2025-09-15 08:20:32, Marcos Paulo de Souza wrote:
> KDB can interrupt any console to execute the "mirrored printing" at any
> time, so add an exception to nbcon_context_try_acquire_direct to allow
> to get the context if the current CPU is the same as kdb_printf_cpu.
> 
> This change will be necessary for the next patch, which fixes
> kdb_msg_write to work with NBCON consoles by calling ->write_atomic on
> such consoles. But to print it first needs to acquire the ownership of
> the console, so nbcon_context_try_acquire_direct is fixed here.
> 
> --- a/include/linux/kdb.h
> +++ b/include/linux/kdb.h
> @@ -207,11 +207,17 @@ static inline const char *kdb_walk_kallsyms(loff_t *pos)
>  /* Dynamic kdb shell command registration */
>  extern int kdb_register(kdbtab_t *cmd);
>  extern void kdb_unregister(kdbtab_t *cmd);
> +
> +#define KDB_IS_ACTIVE() (READ_ONCE(kdb_printf_cpu) != raw_smp_processor_id())

The condition looks inverted. It should be true when the CPU ID matches.

I am actually thinking about using a similar approach and naming scheme
as the similar API that checks @panic_cpu. There are patches
in the -mm tree which consolidate that API, see
https://lore.kernel.org/r/20250825022947.1596226-2-wangjinchao600@gmail.com

In our case, the similar API would be:

/* Return true when KDB has locked for printing a message on this CPU. */
static inline
bool kdb_printf_on_this_cpu(void)
{
	/*
	 * We can use raw_smp_processor_id() here because the task could
	 * not get migrated when KDB has locked for printing on this CPU.
	 */
	return unlikely(READ_ONCE(kdb_printf_cpu) == raw_smp_processor_id());
}
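
To make the inversion concrete, here is a minimal user-space sketch that
compares the condition as posted with the one proposed above. It is not
kernel code: every model_* name is hypothetical, the two plain ints stand
in for kdb_printf_cpu and raw_smp_processor_id(), and the value -1 for
"no CPU holds the kdb printf lock" is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of this is kernel code. */
#define MODEL_NO_CPU (-1)	/* assumed "no owner" value */

/* CPU that holds the kdb printf lock, or MODEL_NO_CPU. */
static int model_kdb_printf_cpu = MODEL_NO_CPU;
/* Stand-in for raw_smp_processor_id(). */
static int model_this_cpu;

/* Pretend that the code is running on the given CPU. */
static void model_set_this_cpu(int cpu)
{
	assert(cpu >= 0);
	model_this_cpu = cpu;
}

/* Condition as posted in v4: true when the CPU IDs differ. */
static bool model_kdb_is_active_v4(void)
{
	return model_kdb_printf_cpu != model_this_cpu;
}

/* Condition as proposed in the review: true when the CPU IDs match. */
static bool model_kdb_printf_on_this_cpu(void)
{
	return model_kdb_printf_cpu == model_this_cpu;
}
```

In this model the v4 form is true on every CPU that does not hold the
lock, including when nobody holds it, while the proposed form is true
only on the CPU that does.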

> +
>  #else /* ! CONFIG_KGDB_KDB */
>  static inline __printf(1, 2) int kdb_printf(const char *fmt, ...) { return 0; }
>  static inline void kdb_init(int level) {}
>  static inline int kdb_register(kdbtab_t *cmd) { return 0; }
>  static inline void kdb_unregister(kdbtab_t *cmd) {}
> +
> +#define KDB_IS_ACTIVE() false

and here to match the style above:

static inline bool kdb_printf_on_this_cpu(void) { return false; }

> +
>  #endif	/* CONFIG_KGDB_KDB */
>  enum {
>  	KDB_NOT_INITIALIZED,
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index ff218e95a505fd10521c2c4dfb00ad5ec5773953..8644e019e2391797e623fcc124d37ed4d460ccd9 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -248,13 +249,17 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
>  		 * since all non-panic CPUs are stopped during panic(), it
>  		 * is safer to have them avoid gaining console ownership.
>  		 *
> -		 * If this acquire is a reacquire (and an unsafe takeover
> +		 * One exception is if kdb is active, which may print
> +		 * from multiple CPUs during a panic.

Also here, "active" is a somewhat ambiguous term. I would use:

		 * One exception is when kdb has locked for printing on this
		 * CPU.

> +		 *
> +		 * Second exception is a reacquire (and an unsafe takeover
>  		 * has not previously occurred) then it is allowed to attempt
>  		 * a direct acquire in panic. This gives console drivers an
>  		 * opportunity to perform any necessary cleanup if they were
>  		 * interrupted by the panic CPU while printing.
>  		 */
>  		if (other_cpu_in_panic() &&
> +		    !KDB_IS_ACTIVE() &&
>  		    (!is_reacquire || cur->unsafe_takeover)) {
>  			return -EPERM;
>  		}
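
For reference, the intended behavior of this check with a correctly
oriented helper can be sketched as a small user-space model. This is only
an illustration under stated assumptions: struct model_state and
model_panic_gate() are hypothetical names, the booleans stand in for
other_cpu_in_panic(), the kdb helper, is_reacquire and
cur->unsafe_takeover, and a return value of 0 only means that this
particular check did not reject the acquire.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical snapshot of the inputs to the panic check. */
struct model_state {
	bool other_cpu_in_panic;	/* another CPU is the panic CPU */
	bool kdb_printf_on_this_cpu;	/* kdb holds the printf lock here */
	bool is_reacquire;		/* this acquire is a reacquire */
	bool unsafe_takeover;		/* an unsafe takeover already happened */
};

/*
 * Model of the check only: returns -EPERM when a non-panic CPU must not
 * take the console, 0 when the check lets the acquire attempt continue.
 */
static int model_panic_gate(const struct model_state *s)
{
	if (s->other_cpu_in_panic &&
	    !s->kdb_printf_on_this_cpu &&
	    (!s->is_reacquire || s->unsafe_takeover))
		return -EPERM;

	return 0;
}
```

With other_cpu_in_panic set, the model returns -EPERM by default, 0 when
kdb holds the printf lock on this CPU, and 0 for a reacquire as long as
no unsafe takeover has occurred.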

I am sorry that I did not suggest the better names already when
this new API was discussed in v3.

Best Regards,
Petr