[PATCH v3 0/5] generalize panic_print's dump function to be used by other kernel parts

Feng Tang posted 5 patches 3 months ago
[PATCH v3 0/5] generalize panic_print's dump function to be used by other kernel parts
Posted by Feng Tang 3 months ago
When working on kernel stability issues, panics, hung tasks and
software/hardware lockups are frequently encountered. To debug them, users
may need a lot of system information from that moment, such as task call
stacks, lock info, memory info, etc.

The panic path already has panic_print_sys_info() for this purpose, with
a 'panic_print' bitmask to control which kinds of information are dumped.
The same information is also helpful for debugging hung-task and lockup cases.

So this patchset extracts the function into a new file, 'lib/sys_info.c',
and makes it available to other cases that also need to dump system info
for debugging.

Also, as suggested by Petr Mladek, add a 'panic_sys_info=' interface that
takes a human-readable string like "tasks,mem,locks,timers,ftrace,....",
and eventually obsolete the current 'panic_print' bitmap interface.
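
To illustrate the idea of mapping a human-readable list onto a bitmask, here
is a minimal userspace sketch. It is not the code from this series: the
INFO_* names, the info_names table and parse_info_list() are hypothetical,
and the bit values are only assumed to mirror the documented 'panic_print'
bits. Unknown names are silently ignored here, which may differ from what
the real parser does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag names; values assumed to follow the panic_print bits. */
#define INFO_TASKS	0x01UL
#define INFO_MEM	0x02UL
#define INFO_TIMERS	0x04UL
#define INFO_LOCKS	0x08UL
#define INFO_FTRACE	0x10UL

struct info_name {
	const char *name;
	unsigned long bit;
};

static const struct info_name info_names[] = {
	{ "tasks",  INFO_TASKS  },
	{ "mem",    INFO_MEM    },
	{ "timers", INFO_TIMERS },
	{ "locks",  INFO_LOCKS  },
	{ "ftrace", INFO_FTRACE },
};

/* Turn a comma-separated list such as "tasks,mem" into a bitmask. */
static unsigned long parse_info_list(const char *str)
{
	unsigned long mask = 0;

	while (*str) {
		/* Length of the current token, up to the next ',' or the end. */
		size_t len = strcspn(str, ",");
		size_t i;

		for (i = 0; i < sizeof(info_names) / sizeof(info_names[0]); i++) {
			if (strlen(info_names[i].name) == len &&
			    strncmp(str, info_names[i].name, len) == 0) {
				mask |= info_names[i].bit;
				break;
			}
		}

		str += len;
		if (*str == ',')
			str++;	/* skip the separator */
	}

	return mask;
}
```

With these assumed values, parse_info_list("tasks,mem,locks") sets the bits
for tasks, memory and locks, i.e. the equivalent of a numeric mask of 0xb.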

In the RFC and v1 versions, the hung_task and SW/HW watchdog modules were
hooked up to the new sys_info dump interface. Starting with v2, they are
left out to make the current change easier to review, and will be posted later.

Locally, these patches have been used in our debugging of stability issues
and have proven helpful.

Many thanks to Petr Mladek for great suggestions on both the code and
the architecture!

- Feng

One remaining to-do is adding a note about obsoleting the 'panic_print'
cmdline parameter, which will be posted later, as discussed in
https://lore.kernel.org/lkml/aFvBuOnD0cAEWJfl@U-2FWC9VHC-2323.local/

Changelog:

  Since v2:
     * Rename to PANIC_CONSOLE_REPLAY (Petr Mladek) 
     * Don't let kernel.h include sys_info.h (Petr Mladek)
     * Improve documentation and coding style (Petr Mladek/Lance Yang)
     * Add 'panic_console_replay' parameter (Petr Mladek)
     * Fix compile problem (0Day bot)
     * Add Reviewed-by tag from Petr for patch 1/5

  Since V1:
     * Move the 'sys_show_info' related code into new files sys_info.[ch]
       (Petr Mladek)
     * Clean up the code for panic console replay (Petr Mladek)
     * Add 'panic_sys_info=' cmdline and sysctl interface for taking
       human readable parameters (Petr Mladek)
     * Add note about the obsoleting of 'panic_print' (Petr Mladek)
     * Hold back the changes to hungtask/watchdog

  Since RFC:
     * Don't print all-CPU backtraces if 'sysctl_hung_task_all_cpu_backtrace'
       is 'false' (Lance Yang)
     * Rename the 2 new kernel control knobs to include 'mask', and add
       kernel documentation and code comments for them (Lance Yang)
     * Make sys_show_info() support printk message replay and all-CPU backtraces.

Feng Tang (5):
  panic: clean up code for console replay
  panic: generalize panic_print's function to show sys info
  panic: add 'panic_sys_info' sysctl to take human readable string
    parameter
  panic: add 'panic_sys_info=' setup option for kernel cmdline
  panic: add note that panic_print sysctl interface is deprecated

 .../admin-guide/kernel-parameters.txt         |  21 ++-
 Documentation/admin-guide/sysctl/kernel.rst   |  20 ++-
 include/linux/sys_info.h                      |  27 ++++
 kernel/panic.c                                |  71 +++++-----
 lib/Makefile                                  |   2 +-
 lib/sys_info.c                                | 122 ++++++++++++++++++
 6 files changed, 221 insertions(+), 42 deletions(-)
 create mode 100644 include/linux/sys_info.h
 create mode 100644 lib/sys_info.c

-- 
2.43.5
Re: [PATCH v3 0/5] generalize panic_print's dump function to be used by other kernel parts
Posted by Lance Yang 3 months ago
Just hit a build failure with this patch series when building for arm64
with a minimal configuration:

kernel/panic.c: In function ‘setup_panic_sys_info’:
kernel/panic.c:151:23: error: implicit declaration of function ‘sys_info_parse_param’ [-Wimplicit-function-declaration]
  151 |         panic_print = sys_info_parse_param(buf);
      |                       ^~~~~~~~~~~~~~~~~~~~
make[3]: *** [scripts/Makefile.build:287: kernel/panic.o] Error 1
make[2]: *** [scripts/Makefile.build:554: kernel] Error 2

To reproduce it:
$ make ARCH=arm64 allnoconfig
$ make ARCH=arm64 -j$(nproc)

Thanks,
Lance


On 2025/7/3 10:09, Feng Tang wrote:
> When working on kernel stability issues, panic, task-hung and
> software/hardware lockup are frequently met. And to debug them, user
> may need lots of system information at that time, like task call stacks,
> lock info, memory info etc.
> 
> panic case already has panic_print_sys_info() for this purpose, and has
> a 'panic_print' bitmask to control what kinds of information is needed,
> which is also helpful to debug other task-hung and lockup cases.
> 
> So this patchset extract the function out to a new file 'lib/sys_info.c',
> and make it available for other cases which also need to dump system info
> for debugging.
> 
> Also as suggested by Petr Mladek, add 'panic_sys_info=' interface to
> take human readable string like "tasks,mem,locks,timers,ftrace,....",
> and eventually obsolete the current 'panic_print' bitmap interface.
> 
> In RFC and V1 version, hung_task and SW/HW watchdog modules are enabled
> with the new sys_info dump interface. In v2, they are kept out for
> better review of current change, and will be posted later.
> 
> Locally these have been used in our bug chasing for stability issues
> and was proven helpful.
> 
> Many thanks to Petr Mladek for great suggestions on both the code and
> architectures!
> 
> - Feng
> 
> One to do left is about adding note for obsoleting 'panic_print' cmdline
> as discussed in https://lore.kernel.org/lkml/aFvBuOnD0cAEWJfl@U-2FWC9VHC-2323.local/
> and will be posted later.
> 
> Changelog:
> 
>    Since v2:
>       * Rename to PANIC_CONSOLE_REPLAY (Petr Mladek)
>       * Don't let kernel.h include sys_info.h (Petr Mladek)
>       * Improve documents and coding style (Petr Mladek/Lance Yang)
>       * Add 'panic_console_replay' parameter (Petr Mladek)
>       * Fix compiling problem (0Day bot)
>       * Add reviewed-by tag from Petr for patch 1/5
> 
>    Since V1:
>       * Separate the 'sys_show_info' related code to new file sys_info.[ch]
>         (Petr Mladek)
>       * Clean up the code for panic console replay (Petr Mladek)
>       * Add 'panic_sys_info=' cmdline and sysctl interface for taking
>         human readable parameters (Petr Mladek)
>       * Add note about the obsoleting of 'panic_print' (Petr Mladek)
>       * Hold the changes to hungtask/watchdog
> 
>    Since RFC:
>       * Don't print all cpu backtrace if 'sysctl_hung_task_all_cpu_backtracemay'
>         is 'false' (Lance Yang)
>       * Change the name of 2 new kernel control knob to have 'mask' inside, and
>         add kernel document and code comments for them (Lance Yang)
>       * Make the sys_show_info() support printk msg replay and all CPU backtrace.
> 
> Feng Tang (5):
>    panic: clean up code for console replay
>    panic: generalize panic_print's function to show sys info
>    panic: add 'panic_sys_info' sysctl to take human readable string
>      parameter
>    panic: add 'panic_sys_info=' setup option for kernel cmdline
>    panic: add note that panic_print sysctl interface is deprecated
> 
>   .../admin-guide/kernel-parameters.txt         |  21 ++-
>   Documentation/admin-guide/sysctl/kernel.rst   |  20 ++-
>   include/linux/sys_info.h                      |  27 ++++
>   kernel/panic.c                                |  71 +++++-----
>   lib/Makefile                                  |   2 +-
>   lib/sys_info.c                                | 122 ++++++++++++++++++
>   6 files changed, 221 insertions(+), 42 deletions(-)
>   create mode 100644 include/linux/sys_info.h
>   create mode 100644 lib/sys_info.c
> 

Re: [PATCH v3 0/5] generalize panic_print's dump function to be used by other kernel parts
Posted by Lance Yang 3 months ago

On 2025/7/3 11:23, Lance Yang wrote:
> Just hit a build failure with this patch series when building for arm64
> with a minimal configuration:
> 
> kernel/panic.c: In function ‘setup_panic_sys_info’:
> kernel/panic.c:151:23: error: implicit declaration of function 
> ‘sys_info_parse_param’ [-Wimplicit-function-declaration]
> 151 |         panic_print = sys_info_parse_param(buf);
> |                       ^~~~~~~~~~~~~~~~~~~~
> make[3]: *** [scripts/Makefile.build:287: kernel/panic.o] Error 1
> make[2]: *** [scripts/Makefile.build:554: kernel] Error 2
> 
> 
> To reproduce it:
> $ make ARCH=arm64 allnoconfig
> $ make ARCH=arm64 -j$(nproc)

I just realized the root cause of the build failure I saw: the "v3" tag
is missing from the subject of patch #03. Sorry!

b4 reported that it couldn't find patch #03 when I tried to apply
this patch series, which is why I was getting the "implicit function
declaration" error. Obviously, I missed that b4 error before ;(

```
---
   ✓ [PATCH v3 1/5] panic: clean up code for console replay
     ✓ Signed: DKIM/linux.alibaba.com
   ✓ [PATCH v3 2/5] panic: generalize panic_print's function to show sys info
     ✓ Signed: DKIM/linux.alibaba.com
   ERROR: missing [3/5]!
   ✓ [PATCH v3 4/5] panic: add 'panic_sys_info=' setup option for kernel cmdline
     ✓ Signed: DKIM/linux.alibaba.com
   ✓ [PATCH v3 5/5] panic: add note that panic_print sysctl interface is deprecated
     ✓ Signed: DKIM/linux.alibaba.com
---
Total patches: 4
---
WARNING: Thread incomplete!
Applying: panic: clean up code for console replay
Applying: panic: generalize panic_print's function to show sys info
Applying: panic: add 'panic_sys_info=' setup option for kernel cmdline
Applying: panic: add note that panic_print sysctl interface is deprecated
```
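
As a rough illustration of why one untagged subject can drop a patch from
a series, the sketch below groups subjects by their "[PATCH vN i/M]" prefix.
This is a toy model, not b4's actual logic: parse_patch_tag() and
series_has_patch() are hypothetical helpers, the exact subject of patch #03
is assumed to be "[PATCH 3/5] ...", and treating an un-versioned prefix as
version 1 is also an assumption.

```c
#include <stddef.h>
#include <stdio.h>

struct patch_tag {
	int version;	/* N in "vN"; assumed to be 1 when absent */
	int index;	/* i in "i/M" */
	int total;	/* M in "i/M" */
};

/* Parse "[PATCH vN i/M]" or "[PATCH i/M]"; returns 1 on success. */
static int parse_patch_tag(const char *subject, struct patch_tag *tag)
{
	if (sscanf(subject, "[PATCH v%d %d/%d]",
		   &tag->version, &tag->index, &tag->total) == 3)
		return 1;

	/* No version tag: fall back to version 1 (assumption). */
	tag->version = 1;
	if (sscanf(subject, "[PATCH %d/%d]", &tag->index, &tag->total) == 2)
		return 1;

	return 0;
}

/* Return 1 if patch number 'index' of series 'version' is present. */
static int series_has_patch(const char *const *subjects, size_t n,
			    int version, int index)
{
	struct patch_tag tag;
	size_t i;

	for (i = 0; i < n; i++) {
		if (parse_patch_tag(subjects[i], &tag) &&
		    tag.version == version && tag.index == index)
			return 1;
	}

	return 0;
}
```

Under these assumptions, a subject without "v3" is filed under version 1,
so a lookup for patch 3 of the v3 series finds nothing, while patches 1, 2,
4 and 5 are still applied.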

Thanks,
Lance


Re: [PATCH v3 0/5] generalize panic_print's dump function to be used by other kernel parts
Posted by Feng Tang 3 months ago
On Thu, Jul 03, 2025 at 12:56:54PM +0800, Lance Yang wrote:
> 
> 
> On 2025/7/3 11:23, Lance Yang wrote:
> > Just hit a build failure with this patch series when building for arm64
> > with a minimal configuration:
> > 
> > kernel/panic.c: In function ‘setup_panic_sys_info’:
> > kernel/panic.c:151:23: error: implicit declaration of function
> > ‘sys_info_parse_param’ [-Wimplicit-function-declaration]
> > 151 |         panic_print = sys_info_parse_param(buf);
> > |                       ^~~~~~~~~~~~~~~~~~~~
> > make[3]: *** [scripts/Makefile.build:287: kernel/panic.o] Error 1
> > make[2]: *** [scripts/Makefile.build:554: kernel] Error 2
> > 
> > 
> > To reproduce it:
> > $ make ARCH=arm64 allnoconfig
> > $ make ARCH=arm64 -j$(nproc)
> 
> Realized that now: the root cause of the build failure I saw is the
> missing "v3" tag in the subject of the patch #03 - sorry!
> 
> b4 reported that it couldn't find patch #03 when I tried to apply
> this patch series, which is why I was getting the "implicit function
> declaration" error ... Obviously, I missed that error before ;(
 
I see. I just reran the 'allyes' and 'allno' builds and they passed. I didn't
know that a missing patch version tag could cause a problem when applying
the patches.

Thanks for the review and compile testing!

- Feng
