[PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions

Ingo Molnar posted 1 patch 8 months, 4 weeks ago
include/asm-generic/bug.h |  7 +++++++
kernel/sched/core.c       |  2 ++
lib/Kconfig.debug         | 12 ++++++++++++
3 files changed, 21 insertions(+)
[PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
Posted by Ingo Molnar 8 months, 4 weeks ago

* Ingo Molnar <mingo@kernel.org> wrote:

> > >    #define SCHED_WARN_ON(x)      WARN_ONCE(x, #x)
> > > 
> > > Because SCHED_WARN_ON() would output the 'x' condition
> > > as well, while WARN_ONCE() will only show a backtrace.
> > > 
> > > Hopefully these are rare enough to not really matter.
> > > 
> > > If it does, we should probably introduce a new WARN_ON()
> > > variant that outputs the condition in stringified form,
> > > or improve WARN_ON() itself.
> > 
> > So those strings really were useful, trouble is WARN_ONCE() generates
> > utter crap code compared to WARN_ON_ONCE(), but since SCHED_DEBUG that
> > doesn't really matter.
> 
> Why wouldn't it matter? CONFIG_SCHED_DEBUG was turned on for 99.9999% 
> of Linux users, ie. we generated crap code for most of our users.
> 
> And as a side effect of using the standard WARN_ON_ONCE() primitive we 
> now generate better code, at the expense of harder to interpret debug 
> output, right?
> 
> Ie. CONFIG_SCHED_DEBUG has obfuscated crappy code generation under the 
> "it's only debugging code" pretense, right?

So, to argue this via code, we'd like to have something like the patch below?

When enabled it will warn in the following fashion:

  static void super_perfect_kernel_function(void *ptr)
  {
	...
	WARN_ON_ONCE(ptr == 0 && 1);
	...
  }


  ------------[ cut here ]------------
  FAIL: 'ptr == 0 && 1' is true
  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:8511 sched_init+0x44/0x430
  ...

But the real question is, how do we keep distros from enabling 
CONFIG_DEBUG_BUGVERBOSE_EXTRA=y?

It does bloat the defconfig by about +144k .text and ~64k data, so 
maybe that's deterrence enough.

The BSS shift is due to it not using the clever x86 U2D tricks, right?

Thanks,

	Ingo

=================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 25 Mar 2025 11:35:20 +0100
Subject: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions

      text         data	    bss	     dec	    hex	filename
  29522704	7926322	1389904	38838930	250a292	vmlinux.before
  29667392	8017958	1363024	39048374	253d4b6	vmlinux.after

Totally-Not-Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/asm-generic/bug.h |  7 +++++++
 kernel/sched/core.c       |  2 ++
 lib/Kconfig.debug         | 12 ++++++++++++
 3 files changed, 21 insertions(+)

diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 387720933973..5475258a99dc 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -92,6 +92,11 @@ void warn_slowpath_fmt(const char *file, const int line, unsigned taint,
 		       const char *fmt, ...);
 extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 
+#ifdef CONFIG_DEBUG_BUGVERBOSE_EXTRA
+#define WARN_ON_ONCE(condition)						\
+	DO_ONCE_LITE_IF(condition, WARN, 1, "FAIL: '%s' is true", #condition)
+#endif
+
 #ifndef __WARN_FLAGS
 #define __WARN()		__WARN_printf(TAINT_WARN, NULL)
 #define __WARN_printf(taint, arg...) do {				\
@@ -107,6 +112,7 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 		__WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
 		instrumentation_end();					\
 	} while (0)
+#ifndef WARN_ON_ONCE
 #define WARN_ON_ONCE(condition) ({				\
 	int __ret_warn_on = !!(condition);			\
 	if (unlikely(__ret_warn_on))				\
@@ -115,6 +121,7 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 	unlikely(__ret_warn_on);				\
 })
 #endif
+#endif
 
 /* used internally by panic.c */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87540217fc09..71bf94bf68f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8508,6 +8508,8 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	WARN_ON_ONCE(ptr == 0 && 1);
+
 	/* Make sure the linker didn't screw up */
 #ifdef CONFIG_SMP
 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b1b92a9a8f24..88f215f712f8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -206,6 +206,18 @@ config DEBUG_BUGVERBOSE
 	  of the BUG call as well as the EIP and oops trace.  This aids
 	  debugging but costs about 70-100K of memory.
 
+config DEBUG_BUGVERBOSE_EXTRA
+	bool "Extra verbose WARN_ON() reporting" if DEBUG_BUGVERBOSE
+	default n
+	help
+	  Say Y here to make WARN_ON() warnings extra verbose, printing
+	  the condition they warn about.
+
+	  This aids debugging but uses up some memory and causes some
+	  runtime overhead due to worse code generation.
+
+	  If unsure, say N.
+
 endmenu # "printk and dmesg options"
 
 config DEBUG_KERNEL
Re: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
Posted by Peter Zijlstra 8 months, 4 weeks ago
On Tue, Mar 25, 2025 at 12:18:39PM +0100, Ingo Molnar wrote:

> So, to argue this via code, we'd like to have something like the patch below?

I would do it differently. If we know the thing is a simple string, we
can stick it in bug_entry and print from __report_bug() without causing
horrific shite at the call site.

The problem with WARN() is that it is a format string, which must be
filled out in situ. Resulting in calls to snprintf() and arguments and
whatnot.
Re: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
Posted by Linus Torvalds 8 months, 3 weeks ago
 On Tue, 25 Mar 2025 at 05:36, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The problem with WARN() is that it is a format string, which must be
> filled out in situ. Resulting in calls to snprintf() and arguments and
> whatnot.

A fair number of warnings do want the format string, so that you can
print out more information about what went wrong if the warning
triggered.

That said, I do think that the "just give a fixed string that is the
warning condition" is probably the right thing 90% of the time, and is
the much simpler interface both to use and causes much less code
(exactly because it's just a single hardcoded string at compile time).

So I think we end up wanting both.

But I *don't* like Ingo's suggestion of DEBUG_BUGVERBOSE_EXTRA,
because it does that "both" by making the simple case complicated.

How about going a different route instead? Right now we have that
CONFIG_DEBUG_BUGVERBOSE thing which adds the file name and line number
information. That has been very good.

But maybe that should be extended to also always take the compile-time
'#condition' string?

So then all warnings would have the warning condition string (assuming
you end up enabling DEBUG_BUGVERBOSE, of course, which I think
everybody pretty much does). With no extra code.

And then the _dynamic_ string - and associated code generation - would
be only for when you want to print out the actual values that caused
the warning.

Hmm?

             Linus
Re: [PATCH] bug: Introduce CONFIG_DEBUG_BUGVERBOSE_EXTRA=y to also log warning conditions
Posted by Peter Zijlstra 8 months, 3 weeks ago
On Tue, Mar 25, 2025 at 10:48:49AM -0700, Linus Torvalds wrote:
> Hmm?

That is indeed what I was thinking of; far better articulated :-)
[PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output
Posted by Ingo Molnar 8 months, 3 weeks ago

* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> How about going a different route instead? Right now we have that 
> CONFIG_DEBUG_BUGVERBOSE thing which adds the file name and line 
> number information. That has been very good.
> 
> But maybe that should be extended to also always take the 
> compile-time '#condition' string?
> 
> So then all warnings would have the warning condition string 
> (assuming you end up enabling DEBUG_BUGVERBOSE, of course, which I 
> think everybody pretty much does). With no extra code.

So something like the patch below?

Testcase:

  @@ -8508,6 +8508,8 @@ void __init sched_init(void)
          unsigned long ptr = 0;
          int i;
 
  +       WARN_ON_ONCE(ptr == 0 && 1);
  +

Before:

  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:8511 sched_init+0x20/0x410

After:

  WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
                            ^^^^^^^^^^^^^^^

I concatenated the condition string with the file string, so didn't 
have to extend the 'struct bug_entry' backend, and could be shared with 
the regular WARN() and BUG*() code as well without modifying its 
output.

The .text impact is zero, as hoped for:

       text       data        bss         dec        hex    filename
   29523998    7926322    1389904    38840224    250a7a0    vmlinux.before
   29523998    8024626    1389904    38938528    25227a0    vmlinue.after

So this does have the debugging advantages of SCHED_WARN_ON() and the 
code generation benefits of WARN_ON_ONCE().

Note that the patch has still the maturity of a Labradoodle puppy: it 
won't build on the majority of non-x86 architectures, has only been 
built and booted once, etc. - so it's not signed off on.

Thanks,

	Ingo

===================>
From: Ingo Molnar <mingo@kernel.org>
Date: Tue, 25 Mar 2025 12:18:44 +0100
Subject: [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output

       text       data        bss         dec        hex    filename
   29523998    7926322    1389904    38840224    250a7a0    vmlinux.before
   29523998    8024626    1389904    38938528    25227a0    vmlinue.after

Before:

  WARNING: CPU: 0 PID: 0 at kernel/sched/core.c:8511 sched_init+0x20/0x410

After:

  WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
                            ^^^^^^^^^^^^^^^
Not-Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Z-KRD3ODxT9f8Yjw@gmail.com
---
 arch/x86/include/asm/bug.h | 14 +++++++-------
 include/asm-generic/bug.h  |  7 ++++---
 kernel/sched/core.c        |  2 ++
 3 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/bug.h b/arch/x86/include/asm/bug.h
index f0e9acf72547..e966199c8ef7 100644
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -39,7 +39,7 @@
 
 #ifdef CONFIG_DEBUG_BUGVERBOSE
 
-#define _BUG_FLAGS(ins, flags, extra)					\
+#define _BUG_FLAGS(cond_str, ins, flags, extra)				\
 do {									\
 	asm_inline volatile("1:\t" ins "\n"				\
 		     ".pushsection __bug_table,\"aw\"\n"		\
@@ -50,14 +50,14 @@ do {									\
 		     "\t.org 2b+%c3\n"					\
 		     ".popsection\n"					\
 		     extra						\
-		     : : "i" (__FILE__), "i" (__LINE__),		\
+		     : : "i" (cond_str __FILE__), "i" (__LINE__),		\
 			 "i" (flags),					\
 			 "i" (sizeof(struct bug_entry)));		\
 } while (0)
 
 #else /* !CONFIG_DEBUG_BUGVERBOSE */
 
-#define _BUG_FLAGS(ins, flags, extra)					\
+#define _BUG_FLAGS(cond_str, ins, flags, extra)				\
 do {									\
 	asm_inline volatile("1:\t" ins "\n"				\
 		     ".pushsection __bug_table,\"aw\"\n"		\
@@ -74,7 +74,7 @@ do {									\
 
 #else
 
-#define _BUG_FLAGS(ins, flags, extra)  asm volatile(ins)
+#define _BUG_FLAGS(cond_str, ins, flags, extra)  asm volatile(ins)
 
 #endif /* CONFIG_GENERIC_BUG */
 
@@ -82,7 +82,7 @@ do {									\
 #define BUG()							\
 do {								\
 	instrumentation_begin();				\
-	_BUG_FLAGS(ASM_UD2, 0, "");				\
+	_BUG_FLAGS("", ASM_UD2, 0, "");				\
 	__builtin_unreachable();				\
 } while (0)
 
@@ -92,11 +92,11 @@ do {								\
  * were to trigger, we'd rather wreck the machine in an attempt to get the
  * message out than not know about it.
  */
-#define __WARN_FLAGS(flags)					\
+#define __WARN_FLAGS(cond_str, flags)				\
 do {								\
 	__auto_type __flags = BUGFLAG_WARNING|(flags);		\
 	instrumentation_begin();				\
-	_BUG_FLAGS(ASM_UD2, __flags, ANNOTATE_REACHABLE(1b));	\
+	_BUG_FLAGS(cond_str, ASM_UD2, __flags, ANNOTATE_REACHABLE(1b)); \
 	instrumentation_end();					\
 } while (0)
 
diff --git a/include/asm-generic/bug.h b/include/asm-generic/bug.h
index 387720933973..c8e7126bc26e 100644
--- a/include/asm-generic/bug.h
+++ b/include/asm-generic/bug.h
@@ -100,17 +100,18 @@ extern __printf(1, 2) void __warn_printk(const char *fmt, ...);
 		instrumentation_end();					\
 	} while (0)
 #else
-#define __WARN()		__WARN_FLAGS(BUGFLAG_TAINT(TAINT_WARN))
+#define __WARN()		__WARN_FLAGS("", BUGFLAG_TAINT(TAINT_WARN))
 #define __WARN_printf(taint, arg...) do {				\
 		instrumentation_begin();				\
 		__warn_printk(arg);					\
-		__WARN_FLAGS(BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
+		__WARN_FLAGS("", BUGFLAG_NO_CUT_HERE | BUGFLAG_TAINT(taint));\
 		instrumentation_end();					\
 	} while (0)
 #define WARN_ON_ONCE(condition) ({				\
 	int __ret_warn_on = !!(condition);			\
 	if (unlikely(__ret_warn_on))				\
-		__WARN_FLAGS(BUGFLAG_ONCE |			\
+		__WARN_FLAGS("["#condition"] ",			\
+			     BUGFLAG_ONCE |			\
 			     BUGFLAG_TAINT(TAINT_WARN));	\
 	unlikely(__ret_warn_on);				\
 })
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 87540217fc09..71bf94bf68f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8508,6 +8508,8 @@ void __init sched_init(void)
 	unsigned long ptr = 0;
 	int i;
 
+	WARN_ON_ONCE(ptr == 0 && 1);
+
 	/* Make sure the linker didn't screw up */
 #ifdef CONFIG_SMP
 	BUG_ON(!sched_class_above(&stop_sched_class, &dl_sched_class));
Re: [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output
Posted by Linus Torvalds 8 months, 3 weeks ago
On Tue, 25 Mar 2025 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
>
> So something like the patch below?
> [...]
> After:
>
>   WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
>                             ^^^^^^^^^^^^^^^

Hmm. Is that the prettiest output ever? No. But it does seem workable,
and the patch is simple.

And I think the added condition string is useful, in that I often end
up looking up warnings that other people report and where the line
numbers have changed enough that it's not immediately obvious exactly
which warning it is. Not only does it disambiguate which warning it
is, it would probably often would obviate having to look it up
entirely because the warning message is now more useful.

So I think I like it. Let's see how it works in practice.

(I actually think the "CPU: 0 PID: 0" is likely the least useful part
of that warning string, and maybe *that* should be moved away and make
things a bit more legible, but I think that discussion might as well
be part of that "Let's see how it works")

            Linus
Re: [PATCH] bug: Add the condition string to the CONFIG_DEBUG_BUGVERBOSE=y output
Posted by Ingo Molnar 8 months, 3 weeks ago
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, 25 Mar 2025 at 15:42, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > So something like the patch below?
> > [...]
> > After:
> >
> >   WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
> >                             ^^^^^^^^^^^^^^^
> 
> Hmm. Is that the prettiest output ever? No. But it does seem workable,
> and the patch is simple.
> 
> And I think the added condition string is useful, in that I often end
> up looking up warnings that other people report and where the line
> numbers have changed enough that it's not immediately obvious exactly
> which warning it is. Not only does it disambiguate which warning it
> is, it would probably often would obviate having to look it up
> entirely because the warning message is now more useful.

Yeah, that exactly was the original motivation for SCHED_WARN_ON(): 
core kernel code often gets backported on and changed by distributions, 
so line numbers are fuzzy and with large functions it's sometimes 
unclear exactly where the warning originated from.

> So I think I like it. Let's see how it works in practice.
> 
> (I actually think the "CPU: 0 PID: 0" is likely the least useful part 
> of that warning string, and maybe *that* should be moved away and 
> make things a bit more legible, but I think that discussion might as 
> well be part of that "Let's see how it works")

Okay!

The CPU and PID part is particularly useless, given that it's repeated 
in the splat a few lines later:

  ------------[ cut here ]------------^M
  WARNING: CPU: 0 PID: 0 at [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410
  Modules linked in:
  CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.14.0-01616-g94d7af2844aa #4 PREEMPT(undef)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  RIP: 0010:sched_init+0x20/0x410

So I'll just remove it, which will turn this into:

  WARNING: [ptr == 0 && 1] kernel/sched/core.c:8511 sched_init+0x20/0x410

Which is actually pretty nicely formatted IMHO and orders the 
information by expected entropy: most constant, most valuable 
information comes first.

BTW., there's also another option we still have open: by using a unique 
character separator that isn't 0 we could split up the single string 
into cond_str and FILE_str parts, and leave formatting to 
architectures. But I don't think it's needed if we get rid of the "CPU: 
PID:" noise though.

	Ingo