[PATCH] printk: Make console tracepoint safe in NMI() context

Petr Mladek posted 1 patch 3 years, 9 months ago
There is a newer version of this series
include/trace/events/printk.h | 11 ++++++++++-
kernel/printk/printk.c        |  2 +-
2 files changed, 11 insertions(+), 2 deletions(-)
[PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Petr Mladek 3 years, 9 months ago
The commit 701850dc0c31bfadf75a0 ("printk, tracing: fix console
tracepoint") moved the tracepoint from console_unlock() to
vprintk_store(). As a result, it might be called in any
context and triggered the following warning:

  WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
  Modules linked in: ppdev parport_pc parport
  CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
  Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
  EIP: printk_sprint+0x81/0xda
  Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
  EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
  ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
  CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
  Call Trace:
   vprintk_store+0x24b/0x2ff
   vprintk+0x37/0x4d
   _printk+0x14/0x16
   nmi_handle+0x1ef/0x24e
   ? find_next_bit.part.0+0x13/0x13
   ? find_next_bit.part.0+0x13/0x13
   ? function_trace_call+0xd8/0xd9
   default_do_nmi+0x57/0x1af
   ? trace_hardirqs_off_finish+0x2a/0xd9
   ? to_kthread+0xf/0xf
   exc_nmi+0x9b/0xf4
   asm_exc_nmi+0xae/0x29c

It comes from:

  #define __DO_TRACE(name, args, cond, rcuidle) \
  [...]
		/* srcu can't be used from NMI */	\
		WARN_ON_ONCE(rcuidle && in_nmi());	\

It might be possible to make srcu working in NMI. But it
would be slower on some architectures. It is not worth
doing it just because of this tracepoint.

It would be possible to disable this tracepoint in NMI
or in rcuidle context. Where the rcuidle context looks
more rare and thus more acceptable to be ignored.

Alternative solution would be to move the tracepoint
back to console code. But the location is less reliable
by definition. Also the synchronization against other
tracing messages is much worse.

Let's ignore the tracepoint in rcuidle context as the least
evil solution.

Link: https://lore.kernel.org/r/20220712151655.GU1790663@paulmck-ThinkPad-P17-Gen-1

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
---
 include/trace/events/printk.h | 11 ++++++++++-
 kernel/printk/printk.c        |  2 +-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
index 13d405b2fd8b..a3ee720f41b5 100644
--- a/include/trace/events/printk.h
+++ b/include/trace/events/printk.h
@@ -7,11 +7,20 @@
 
 #include <linux/tracepoint.h>
 
-TRACE_EVENT(console,
+TRACE_EVENT_CONDITION(console,
 	TP_PROTO(const char *text, size_t len),
 
 	TP_ARGS(text, len),
 
+	/*
+	 * trace_console_rcuidle() is not working in NMI. printk()
+	 * is used more often in NMI than in rcuidle context.
+	 * Choose the less evil solution here.
+	 *
+	 * raw_smp_processor_id() is reliable in rcuidle context.
+	 */
+	TP_CONDITION(!rcu_is_idle_cpu(raw_smp_processor_id())),
+
 	TP_STRUCT__entry(
 		__dynamic_array(char, msg, len + 1)
 	),
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b49c6ff6dca0..bd76a45ecc7f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2108,7 +2108,7 @@ static u16 printk_sprint(char *text, u16 size, int facility,
 		}
 	}
 
-	trace_console_rcuidle(text, text_len);
+	trace_console(text, text_len);
 
 	return text_len;
 }
-- 
2.35.3
Re: [PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Steven Rostedt 3 years, 9 months ago
On Fri, 15 Jul 2022 14:01:52 +0200
Petr Mladek <pmladek@suse.com> wrote:

> +	/*
> +	 * trace_console_rcuidle() is not working in NMI. printk()
> +	 * is used more often in NMI than in rcuidle context.
> +	 * Choose the less evil solution here.
> +	 *
> +	 * raw_smp_processor_id() is reliable in rcuidle context.
> +	 */
> +	TP_CONDITION(!rcu_is_idle_cpu(raw_smp_processor_id())),
> +

As Marco mentioned in the other thread, would a check for
'rcu_is_watching()' be better?

-- Steve


>  	TP_STRUCT__entry(
>  		__dynamic_array(char, msg, len + 1)
>  	),
Re: [PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Marco Elver 3 years, 9 months ago
On Fri, 15 Jul 2022 at 14:02, Petr Mladek <pmladek@suse.com> wrote:
>
> The commit 701850dc0c31bfadf75a0 ("printk, tracing: fix console
> tracepoint") moved the tracepoint from console_unlock() to
> vprintk_store(). As a result, it might be called in any
> context and triggered the following warning:
>
>   WARNING: CPU: 1 PID: 16462 at include/trace/events/printk.h:10 printk_sprint+0x81/0xda
>   Modules linked in: ppdev parport_pc parport
>   CPU: 1 PID: 16462 Comm: event_benchmark Not tainted 5.19.0-rc5-test+ #5
>   Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
>   EIP: printk_sprint+0x81/0xda
>   Code: 89 d8 e8 88 fc 33 00 e9 02 00 00 00 eb 6b 64 a1 a4 b8 91 c1 e8 fd d6 ff ff 84 c0 74 5c 64 a1 14 08 92 c1 a9 00 00 f0 00 74 02 <0f> 0b 64 ff 05 14 08 92 c1 b8 e0 c4 6b c1 e8 a5 dc 00 00 89 c7 e8
>   EAX: 80110001 EBX: c20a52f8 ECX: 0000000c EDX: 6d203036
>   ESI: 3df6004c EDI: 00000000 EBP: c61fbd7c ESP: c61fbd70
>   DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010006
>   CR0: 80050033 CR2: b7efc000 CR3: 05b80000 CR4: 001506f0
>   Call Trace:
>    vprintk_store+0x24b/0x2ff
>    vprintk+0x37/0x4d
>    _printk+0x14/0x16
>    nmi_handle+0x1ef/0x24e
>    ? find_next_bit.part.0+0x13/0x13
>    ? find_next_bit.part.0+0x13/0x13
>    ? function_trace_call+0xd8/0xd9
>    default_do_nmi+0x57/0x1af
>    ? trace_hardirqs_off_finish+0x2a/0xd9
>    ? to_kthread+0xf/0xf
>    exc_nmi+0x9b/0xf4
>    asm_exc_nmi+0xae/0x29c
>
> It comes from:
>
>   #define __DO_TRACE(name, args, cond, rcuidle) \
>   [...]
>                 /* srcu can't be used from NMI */       \
>                 WARN_ON_ONCE(rcuidle && in_nmi());      \
>
> It might be possible to make srcu working in NMI. But it
> would be slower on some architectures. It is not worth
> doing it just because of this tracepoint.
>
> It would be possible to disable this tracepoint in NMI
> or in rcuidle context. Where the rcuidle context looks
> more rare and thus more acceptable to be ignored.
>
> Alternative solution would be to move the tracepoint
> back to console code. But the location is less reliable
> by definition. Also the synchronization against other
> tracing messages is much worse.
>
> Let's ignore the tracepoint in rcuidle context as the least
> evil solution.
>
> Link: https://lore.kernel.org/r/20220712151655.GU1790663@paulmck-ThinkPad-P17-Gen-1
>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Petr Mladek <pmladek@suse.com>
> ---
>  include/trace/events/printk.h | 11 ++++++++++-
>  kernel/printk/printk.c        |  2 +-
>  2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/events/printk.h b/include/trace/events/printk.h
> index 13d405b2fd8b..a3ee720f41b5 100644
> --- a/include/trace/events/printk.h
> +++ b/include/trace/events/printk.h
> @@ -7,11 +7,20 @@
>
>  #include <linux/tracepoint.h>
>
> -TRACE_EVENT(console,
> +TRACE_EVENT_CONDITION(console,
>         TP_PROTO(const char *text, size_t len),
>
>         TP_ARGS(text, len),
>
> +       /*
> +        * trace_console_rcuidle() is not working in NMI. printk()
> +        * is used more often in NMI than in rcuidle context.
> +        * Choose the less evil solution here.
> +        *
> +        * raw_smp_processor_id() is reliable in rcuidle context.
> +        */
> +       TP_CONDITION(!rcu_is_idle_cpu(raw_smp_processor_id())),
> +

Couldn't this just use rcu_is_watching()?

  | * rcu_is_watching - see if RCU thinks that the current CPU is not idle

Thanks,
-- Marco
Re: [PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Steven Rostedt 3 years, 9 months ago
On Fri, 15 Jul 2022 14:39:52 +0200
Marco Elver <elver@google.com> wrote:

> Couldn't this just use rcu_is_watching()?
> 
>   | * rcu_is_watching - see if RCU thinks that the current CPU is not idle

Maybe, but I was thinking that Petr had a way to hit the issue that we
worry about. But since the non _rcuide() call requires rcu watching,
prehaps that is better to use.

-- Steve
Re: [PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Paul E. McKenney 3 years, 9 months ago
On Fri, Jul 15, 2022 at 09:51:56AM -0400, Steven Rostedt wrote:
> On Fri, 15 Jul 2022 14:39:52 +0200
> Marco Elver <elver@google.com> wrote:
> 
> > Couldn't this just use rcu_is_watching()?
> > 
> >   | * rcu_is_watching - see if RCU thinks that the current CPU is not idle
> 
> Maybe, but I was thinking that Petr had a way to hit the issue that we
> worry about. But since the non _rcuide() call requires rcu watching,
> prehaps that is better to use.

In case this helps...  ;-)

The rcu_is_watching() function is designed to be used from the current
CPU, so it dispenses with memory ordering.  However, it explicitly
disables preemption in order to avoid weird preemption patterns.

The formulation that Marco used is designed to be used from a remote
CPU, and so it includes explicit memory ordering that is not needed
in this case.  But it does not disable preemption.

So if preemption is enabled at that point in tracing, you really want
to be using rcu_is_watching().

							Thanx, Paul
Re: [PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Petr Mladek 3 years, 9 months ago
On Fri 2022-07-15 08:10:00, Paul E. McKenney wrote:
> On Fri, Jul 15, 2022 at 09:51:56AM -0400, Steven Rostedt wrote:
> > On Fri, 15 Jul 2022 14:39:52 +0200
> > Marco Elver <elver@google.com> wrote:
> > 
> > > Couldn't this just use rcu_is_watching()?
> > > 
> > >   | * rcu_is_watching - see if RCU thinks that the current CPU is not idle
> > 
> > Maybe, but I was thinking that Petr had a way to hit the issue that we
> > worry about. But since the non _rcuide() call requires rcu watching,
> > prehaps that is better to use.

I actually saw the warning even with simple sysrq+l. I wonder why
I have missed it during testing. It was probably well hidden within
the other backtraces.

I was not aware that rcu_is_watching() and rcu_is_idle_cpu() did
basically the same. I used rcu_is_idle_cpu() because of the "idle"
in the name and the function description ;-)

> In case this helps...  ;-)
> 
> The rcu_is_watching() function is designed to be used from the current
> CPU, so it dispenses with memory ordering.  However, it explicitly
> disables preemption in order to avoid weird preemption patterns.
> 
> The formulation that Marco used is designed to be used from a remote
> CPU, and so it includes explicit memory ordering that is not needed
> in this case.  But it does not disable preemption.
> 
> So if preemption is enabled at that point in tracing, you really want
> to be using rcu_is_watching().

rcu_is_watching() is the right variant then. I am going to send v2.

Thanks a lot for the detailed explanation.

Best Regards,
Petr
Re: [PATCH] printk: Make console tracepoint safe in NMI() context
Posted by Steven Rostedt 3 years, 9 months ago
On Fri, 15 Jul 2022 08:10:00 -0700
"Paul E. McKenney" <paulmck@kernel.org> wrote:

> So if preemption is enabled at that point in tracing, you really want
> to be using rcu_is_watching().

And yes, at that point in tracing, preemption is still enabled if the
tracepoint was called with preemption enabled.

Thus, we really need to convert that to rcu_is_watching().

-- Steve