panic: reduce CPU consumption when finished handling panic

[RFC] panic: reduce CPU consumption when finished handling panic

Posted by Carlos Bilbao 10 months, 3 weeks ago

After the kernel has finished handling a panic, it enters a busy-wait loop.
But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
this negatively impacts the throughput of other VM guests running on the
same hypervisor.

I propose introducing a function cpu_halt_end_panic() to halt the CPU
during this state while still allowing interrupts to be processed. See my
commit below.

Thanks in advance!

Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
---
 kernel/panic.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index fbc59b3b64d0..c00ccaa698d5 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
         crash_smp_send_stop();
 }
 
+static void cpu_halt_end_panic(void)
+{
+#ifdef CONFIG_X86
+    native_safe_halt();
+#elif defined(CONFIG_ARM)
+    cpu_do_idle();
+#else
+    /*
+     * Default to a simple busy-wait if no architecture-specific halt is
+     * defined above
+     */
+    mdelay(PANIC_TIMER_STEP);
+#endif
+}
+
 /**
  *    panic - halt the system
  *    @fmt: The text string to print
@@ -474,7 +489,7 @@ void panic(const char *fmt, ...)
             i += panic_blink(state ^= 1);
             i_next = i + 3600 / PANIC_BLINK_SPD;
         }
-        mdelay(PANIC_TIMER_STEP);
+        cpu_halt_end_panic();
     }
 }
 
-- 
2.47.1

Re: [RFC] panic: reduce CPU consumption when finished handling panic

Posted by Thomas Gleixner 10 months, 3 weeks ago

On Mon, Mar 17 2025 at 17:01, Carlos Bilbao wrote:
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
> this negatively impacts the throughput of other VM guests running on the
> same hypervisor.
>
> I propose introducing a function cpu_halt_end_panic() to halt the CPU
> during this state while still allowing interrupts to be processed. See my
> commit below.

That's not how change logs are written. You explain the problem
and then briefly how it is addressed.

No proposal, no 'see below'. Such wording does not make any sense in a
git commit. See Documentation/process/

> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>          crash_smp_send_stop();

Your patch is malformed due to white space damage:

  patch: **** malformed patch at line 31:          crash_smp_send_stop();

> +static void cpu_halt_end_panic(void)
> +{
> +#ifdef CONFIG_X86
> +    native_safe_halt();
> +#elif defined(CONFIG_ARM)
> +    cpu_do_idle();
> +#else
> +    /*
> +     * Default to a simple busy-wait if no architecture-specific halt is
> +     * defined above
> +     */
> +    mdelay(PANIC_TIMER_STEP);
> +#endif

Architecture specific #ifdefs in core code are not the right way to
go. Split this into three patches:

1) Add a weak fallback function

void __weak cpu_halt_after_panic(void)
{
        mdelay(PANIC_TIMER_STEP);
}

2) Add non weak implementation in arch/x86

    native_safe_halt() is wrong vs. paravirtualization.

    safe_halt() is what you want.
  
3) Add non weak implementation for arch/arm

   Not my playground, so no comment

Thanks,

        tglx
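
The weak-fallback pattern suggested above can be tried outside the kernel.
What follows is a minimal userspace sketch, not kernel code. WEAK_SYM,
busy_wait_steps and run_panic_loop() are hypothetical names introduced for
illustration, and the counter merely stands in for mdelay(PANIC_TIMER_STEP).
It assumes GCC or Clang, which support __attribute__((weak)).

```c
#include <stdio.h>

/* Userspace stand-in for the kernel's __weak annotation (GCC/Clang only). */
#define WEAK_SYM __attribute__((weak))

/* Hypothetical counter replacing the real busy-wait, so the effect is visible. */
static unsigned long busy_wait_steps;

/*
 * Generic fallback, playing the role of the weak function in core code.
 * A non-weak definition of the same symbol in another object file
 * replaces this one at link time.
 */
WEAK_SYM void cpu_halt_after_panic(void)
{
	busy_wait_steps++;	/* stands in for mdelay(PANIC_TIMER_STEP) */
}

/* Simplified stand-in for the tail of panic(): call the hook repeatedly. */
unsigned long run_panic_loop(unsigned int iterations)
{
	unsigned int i;

	busy_wait_steps = 0;
	for (i = 0; i < iterations; i++)
		cpu_halt_after_panic();

	return busy_wait_steps;
}
```

With no other definition linked in, run_panic_loop(5) goes through the
fallback five times. Linking a second object file that provides a non-weak
cpu_halt_after_panic() makes the linker pick that one instead, which is how
an arch/x86 implementation calling safe_halt() would take over without any
#ifdef in core code.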

Re: [RFC] panic: reduce CPU consumption when finished handling panic

Posted by Carlos Bilbao 10 months, 2 weeks ago

Hello,

On 3/21/25 14:03, Thomas Gleixner wrote:
> On Mon, Mar 17 2025 at 17:01, Carlos Bilbao wrote:
>> After the kernel has finished handling a panic, it enters a busy-wait loop.
>> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
>> this negatively impacts the throughput of other VM guests running on the
>> same hypervisor.
>>
>> I propose introducing a function cpu_halt_end_panic() to halt the CPU
>> during this state while still allowing interrupts to be processed. See my
>> commit below.
> That's not how change logs are written. You explain the problem
> and then briefly how it is addressed.
>
> No proposal, no 'see below'. Such wording does not make any sense in a
> git commit. See Documentation/process/
>
>> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>          crash_smp_send_stop();
> Your patch is malformed due to white space damage:
>
>   patch: **** malformed patch at line 31:          crash_smp_send_stop();
>
>> +static void cpu_halt_end_panic(void)
>> +{
>> +#ifdef CONFIG_X86
>> +    native_safe_halt();
>> +#elif defined(CONFIG_ARM)
>> +    cpu_do_idle();
>> +#else
>> +    /*
>> +     * Default to a simple busy-wait if no architecture-specific halt is
>> +     * defined above
>> +     */
>> +    mdelay(PANIC_TIMER_STEP);
>> +#endif
> Architecture specific #ifdefs in core code are not the right way to
> go. Split this into three patches:
>
> 1) Add a weak fallback function
>
> void __weak cpu_halt_after_panic(void)
> {
>         mdelay(PANIC_TIMER_STEP);
> }
>
> 2) Add non weak implementation in arch/x86
>
>     native_safe_halt() is wrong vs. paravirtualization.
>
>     safe_halt() is what you want.
>   
> 3) Add non weak implementation for arch/arm
>
>    Not my playground, so no comment


Sounds good, sending patch set now. Thanks for your time!


>
> Thanks,
>
>         tglx
>
>     


Thanks,

Carlos

[RFC] panic: reduce CPU consumption when finished handling panic

Posted by Carlos Bilbao 10 months, 3 weeks ago

Hello again,


I thought it would be helpful to share some numbers to support my claim
and a couple of ideas to improve the patch. Below are the perf stats from
the hypervisor after triggering a panic on a guest running kernel v5.15
(I'll provide the details of the experiment afterward).


Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
Overhead  Command          Shared Object            Symbol
  42.20%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vmexit
  19.07%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_spec_ctrl_restore_host
   9.73%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_enter_exit
   3.60%  CPU 5/KVM        [kernel.kallsyms]        [k] __flush_smp_call_function_queue
   2.91%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_run
   2.85%  CPU 5/KVM        [kernel.kallsyms]        [k] native_irq_return_iret
   2.67%  CPU 5/KVM        [kernel.kallsyms]        [k] native_flush_tlb_one_user
   2.16%  CPU 5/KVM        [kernel.kallsyms]        [k] llist_reverse_order
   2.10%  CPU 5/KVM        [kernel.kallsyms]        [k] __srcu_read_lock
   2.08%  CPU 5/KVM        [kernel.kallsyms]        [k] flush_tlb_func
   1.52%  CPU 5/KVM        [kernel.kallsyms]        [k] vcpu_enter_guest.constprop.0
   1.50%  CPU 5/KVM        [kernel.kallsyms]        [k] native_apic_msr_eoi
   1.01%  CPU 5/KVM        [kernel.kallsyms]        [k] clear_bhb_loop
   0.66%  CPU 5/KVM        [kernel.kallsyms]        [k] sysvec_call_function_single


And here are the results for the same guest VM after applying my patch:


Samples: 28  of event 'cycles:P', Event count (approx.): 28961952
Overhead  Command          Shared Object            Symbol
  11.03%  qemu-system-x86  [kernel.kallsyms]        [k] task_mm_cid_work
  11.03%  qemu-system-x86  qemu-system-x86_64       [.] 0x0000000000579944
   9.80%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000056512b
   8.45%  IO mon_iothread  libc.so.6                [.] 0x00000000000a3f12
   8.45%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_lock
   7.51%  IO mon_iothread  [kernel.kallsyms]        [k] avg_vruntime
   6.65%  IO mon_iothread  libc.so.6                [.] write
   5.93%  IO mon_iothread  [kernel.kallsyms]        [k] security_file_permission
   4.97%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_thread_self
   4.64%  IO mon_iothread  [kernel.kallsyms]        [k] aa_label_sk_perm.part.0
   4.13%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_release
   3.79%  IO mon_iothread  [kernel.kallsyms]        [k] seccomp_run_filters
   3.42%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_dispatch
   3.42%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004edbab
   3.28%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000005999c8
   3.09%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e636b
   0.22%  qemu-system-x86  [kernel.kallsyms]        [k] __intel_pmu_enable_all.constprop.0


As you can see, CPU consumption during panic is significantly reduced after
applying the proposed change, with KVM-related functions (e.g., vmx_vmexit)
dropping from more than 70% of CPU usage to virtually nothing. Also, the
number of samples decreased from 55K to 28, and the event count dropped
from 36.09 billion to 28.96 million.
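
The arithmetic behind these figures can be checked with a short C sketch.
The percentages and event counts are copied from the two perf reports
above; the struct and function names are hypothetical helpers made up for
this check. Summing only the vmx_-prefixed symbols gives roughly 73.9%, and
the event count falls by a factor of roughly 1246.

```c
#include <stddef.h>
#include <string.h>

/* One row of the pre-patch perf report quoted above. */
struct perf_row {
	const char *symbol;
	double overhead_pct;
};

static const struct perf_row before_patch[] = {
	{ "vmx_vmexit",                      42.20 },
	{ "vmx_spec_ctrl_restore_host",      19.07 },
	{ "vmx_vcpu_enter_exit",              9.73 },
	{ "__flush_smp_call_function_queue",  3.60 },
	{ "vmx_vcpu_run",                     2.91 },
	{ "native_irq_return_iret",           2.85 },
	{ "native_flush_tlb_one_user",        2.67 },
	{ "llist_reverse_order",              2.16 },
	{ "__srcu_read_lock",                 2.10 },
	{ "flush_tlb_func",                   2.08 },
	{ "vcpu_enter_guest.constprop.0",     1.52 },
	{ "native_apic_msr_eoi",              1.50 },
	{ "clear_bhb_loop",                   1.01 },
	{ "sysvec_call_function_single",      0.66 },
};

/* Sum the overhead of every symbol whose name starts with "vmx_". */
double vmx_overhead_pct(void)
{
	double total = 0.0;
	size_t i;

	for (i = 0; i < sizeof(before_patch) / sizeof(before_patch[0]); i++) {
		if (strncmp(before_patch[i].symbol, "vmx_", 4) == 0)
			total += before_patch[i].overhead_pct;
	}
	return total;
}

/* Ratio of the approximate event counts before and after the patch. */
double event_count_ratio(void)
{
	const double before = 36090772574.0;	/* pre-patch report */
	const double after = 28961952.0;	/* post-patch report */

	return before / after;
}
```

The sample counts are left out of the calculation because "55K" is already
rounded in the perf output.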


Jan suggested that a better way to implement cpu_halt_end_panic() (perhaps
cpu_halt_after_panic() is a better name) would be to define it as a weak
function in asm-generic, allowing archs to override it. What do you think?

Thank you in advance!

Regards,
Carlos

---

Details on the experiment:

- Linux kernel v5.15 (commit 8bb7eca)

- VM guest CPU: Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz

- Command used to collect samples:
  /usr/bin/perf record -p 2618527 -a sleep 30

- Image: Ubuntu 22.04 (LTS) x64, 8 vCPUs, 16GB / 100GB Disk


Thanks,

Carlos



Re: [RFC] panic: reduce CPU consumption when finished handling panic

Posted by Carlos Bilbao 10 months, 3 weeks ago

+CC Jan Glauber, as he would be listed as Co-developed-by if this goes through.

Thanks,

Carlos

