After the kernel has finished handling a panic, it enters a busy-wait loop.
But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
this negatively impacts the throughput of other VM guests running on the
same hypervisor.
I propose introducing a function cpu_halt_end_panic() to halt the CPU
during this state while still allowing interrupts to be processed. See my
commit below.
Thanks in advance!
Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
---
kernel/panic.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/kernel/panic.c b/kernel/panic.c
index fbc59b3b64d0..c00ccaa698d5 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
 	crash_smp_send_stop();
 }
 
+static void cpu_halt_end_panic(void)
+{
+#ifdef CONFIG_X86
+	native_safe_halt();
+#elif defined(CONFIG_ARM)
+	cpu_do_idle();
+#else
+	/*
+	 * Default to a simple busy-wait if no architecture-specific halt is
+	 * defined above
+	 */
+	mdelay(PANIC_TIMER_STEP);
+#endif
+}
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -474,7 +489,7 @@ void panic(const char *fmt, ...)
 			i += panic_blink(state ^= 1);
 			i_next = i + 3600 / PANIC_BLINK_SPD;
 		}
-		mdelay(PANIC_TIMER_STEP);
+		cpu_halt_end_panic();
 	}
 }
--
2.47.1
On Mon, Mar 17 2025 at 17:01, Carlos Bilbao wrote:
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
> this negatively impacts the throughput of other VM guests running on the
> same hypervisor.
>
> I propose introducing a function cpu_halt_end_panic() to halt the CPU
> during this state while still allowing interrupts to be processed. See my
> commit below.
That's not how change logs are written. You explain the problem
and then briefly how it is addressed.
No proposal, no 'see below'. Such wording does not make any sense in a
git commit. See Documentation/process/
> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> crash_smp_send_stop();
Your patch is malformed due to white space damage:
patch: **** malformed patch at line 31: crash_smp_send_stop();
> +static void cpu_halt_end_panic(void)
> +{
> +#ifdef CONFIG_X86
> + native_safe_halt();
> +#elif defined(CONFIG_ARM)
> + cpu_do_idle();
> +#else
> + /*
> + * Default to a simple busy-wait if no architecture-specific halt is
> + * defined above
> + */
> + mdelay(PANIC_TIMER_STEP);
> +#endif
Architecture specific #ifdefs in core code are not the right way to
go. Split this into three patches:
  1) Add a weak fallback function

     void __weak cpu_halt_after_panic(void)
     {
             mdelay(PANIC_TIMER_STEP);
     }

  2) Add non weak implementation in arch/x86

     native_safe_halt() is wrong vs. paravirtualization.

     safe_halt() is what you want.

  3) Add non weak implementation for arch/arm

     Not my playground, so no comment
Thanks,
tglx
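The weak-fallback pattern suggested in step 1 can be sketched in userspace as follows. This is only an illustration under stated assumptions, not kernel code: the `__weak` macro is redefined locally as the GCC/Clang attribute, the counter `fallback_delays` and the helper `run_panic_loop()` are hypothetical names, and the counter increment stands in for `mdelay(PANIC_TIMER_STEP)`.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's __weak macro (GCC/Clang attribute). */
#define __weak __attribute__((weak))

/* Hypothetical counter so the sketch can observe that the fallback ran. */
int fallback_delays;

/*
 * Generic fallback. In the kernel the body would be
 * mdelay(PANIC_TIMER_STEP); here a counter stands in for the delay.
 * An architecture would override this by providing a non-weak definition
 * with the same name in its own source file; the linker then prefers
 * the non-weak symbol over this one.
 */
__weak void cpu_halt_after_panic(void)
{
	fallback_delays++;
}

/* Hypothetical helper modelling `steps` iterations of panic()'s final loop. */
int run_panic_loop(int steps)
{
	int i;

	for (i = 0; i < steps; i++)
		cpu_halt_after_panic();
	return fallback_delays;
}
```

With no other definition linked in, `run_panic_loop(3)` goes through the weak fallback three times. Core code never needs an architecture #ifdef with this layout: an x86 override would, per the review above, call safe_halt() rather than native_safe_halt().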
Hello,
On 3/21/25 14:03, Thomas Gleixner wrote:
> On Mon, Mar 17 2025 at 17:01, Carlos Bilbao wrote:
>> After the kernel has finished handling a panic, it enters a busy-wait loop.
>> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
>> this negatively impacts the throughput of other VM guests running on the
>> same hypervisor.
>>
>> I propose introducing a function cpu_halt_end_panic() to halt the CPU
>> during this state while still allowing interrupts to be processed. See my
>> commit below.
> That's not the way how change logs are written. You explain the problem
> and then briefly how it is addressed.
>
> No proposal, no 'see below'. Such wording does not make any sense in a
> git commit. See Documentation/process/
>
>> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>> crash_smp_send_stop();
> Your patch is malformed due to white space damage:
>
> patch: **** malformed patch at line 31: crash_smp_send_stop();
>
>> +static void cpu_halt_end_panic(void)
>> +{
>> +#ifdef CONFIG_X86
>> + native_safe_halt();
>> +#elif defined(CONFIG_ARM)
>> + cpu_do_idle();
>> +#else
>> + /*
>> + * Default to a simple busy-wait if no architecture-specific halt is
>> + * defined above
>> + */
>> + mdelay(PANIC_TIMER_STEP);
>> +#endif
> Architecture specific #ifdefs in core code are not the right way to
> go. Split this into three patches:
>
> 1) Add a weak fallback function
>
> void __weak cpu_halt_after_panic(void)
> {
> mdelay(PANIC_TIMER_STEP);
> }
>
> 2) Add non weak implementation in arch/x86
>
> native_safe_halt() is wrong vs. paravirtualization.
>
> safe_halt() is what you want.
>
> 3) Add non weak implementation for arch/arm
>
> Not my playground, so no comment
Sounds good, sending patch set now. Thanks for your time!
>
> Thanks,
>
> tglx
>
>
Thanks,
Carlos
Hello again,
I thought it would be helpful to share some numbers to support my claim
and a couple of ideas to improve the patch. Below are the perf stats from
the hypervisor after triggering a panic on a guest running kernel v5.15
(details of the experiment follow at the end of this message).
Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
Overhead  Command    Shared Object      Symbol
  42.20%  CPU 5/KVM  [kernel.kallsyms]  [k] vmx_vmexit
  19.07%  CPU 5/KVM  [kernel.kallsyms]  [k] vmx_spec_ctrl_restore_host
   9.73%  CPU 5/KVM  [kernel.kallsyms]  [k] vmx_vcpu_enter_exit
   3.60%  CPU 5/KVM  [kernel.kallsyms]  [k] __flush_smp_call_function_queue
   2.91%  CPU 5/KVM  [kernel.kallsyms]  [k] vmx_vcpu_run
   2.85%  CPU 5/KVM  [kernel.kallsyms]  [k] native_irq_return_iret
   2.67%  CPU 5/KVM  [kernel.kallsyms]  [k] native_flush_tlb_one_user
   2.16%  CPU 5/KVM  [kernel.kallsyms]  [k] llist_reverse_order
   2.10%  CPU 5/KVM  [kernel.kallsyms]  [k] __srcu_read_lock
   2.08%  CPU 5/KVM  [kernel.kallsyms]  [k] flush_tlb_func
   1.52%  CPU 5/KVM  [kernel.kallsyms]  [k] vcpu_enter_guest.constprop.0
   1.50%  CPU 5/KVM  [kernel.kallsyms]  [k] native_apic_msr_eoi
   1.01%  CPU 5/KVM  [kernel.kallsyms]  [k] clear_bhb_loop
   0.66%  CPU 5/KVM  [kernel.kallsyms]  [k] sysvec_call_function_single
And here are the results from the guest VM after applying my patch:
Samples: 28 of event 'cycles:P', Event count (approx.): 28961952
Overhead  Command          Shared Object            Symbol
  11.03%  qemu-system-x86  [kernel.kallsyms]        [k] task_mm_cid_work
  11.03%  qemu-system-x86  qemu-system-x86_64       [.] 0x0000000000579944
   9.80%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000056512b
   8.45%  IO mon_iothread  libc.so.6                [.] 0x00000000000a3f12
   8.45%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_lock
   7.51%  IO mon_iothread  [kernel.kallsyms]        [k] avg_vruntime
   6.65%  IO mon_iothread  libc.so.6                [.] write
   5.93%  IO mon_iothread  [kernel.kallsyms]        [k] security_file_permission
   4.97%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_thread_self
   4.64%  IO mon_iothread  [kernel.kallsyms]        [k] aa_label_sk_perm.part.0
   4.13%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_release
   3.79%  IO mon_iothread  [kernel.kallsyms]        [k] seccomp_run_filters
   3.42%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_dispatch
   3.42%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004edbab
   3.28%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000005999c8
   3.09%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e636b
   0.22%  qemu-system-x86  [kernel.kallsyms]        [k] __intel_pmu_enable_all.constprop.0
As you can see, CPU consumption during panic is significantly reduced after
applying the proposed change, with KVM-related functions (e.g.,
vmx_vmexit) dropping from more than 70% of CPU usage to virtually nothing.
Also, the number of samples decreased from 55K to 28, and the event count
dropped from 36.09 billion to 28.96 million.
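For reference, the ratios implied by the two perf reports can be checked with a short calculation. This is a minimal sketch in C; the function and variable names are illustrative only and are not part of the patch. The inputs are the event counts and the top three KVM entry/exit overheads copied from the reports.

```c
#include <assert.h>

/* Approximate event counts copied from the two perf reports. */
const double cycles_before = 36090772574.0; /* busy-wait panic loop */
const double cycles_after  = 28961952.0;    /* halting panic loop   */

/* How many times fewer cycles were sampled after the change. */
double reduction_factor(double before, double after)
{
	return before / after;
}

/*
 * Combined overhead (in percent) of vmx_vmexit,
 * vmx_spec_ctrl_restore_host and vmx_vcpu_enter_exit in the first report.
 */
double kvm_entry_exit_share(void)
{
	return 42.20 + 19.07 + 9.73;
}
```

Under these figures, reduction_factor(cycles_before, cycles_after) is roughly 1246, i.e. about three orders of magnitude fewer cycles, and kvm_entry_exit_share() is about 71%, which matches the "more than 70%" figure.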
Jan suggested that a better way to implement cpu_halt_end_panic() (perhaps
cpu_halt_after_panic() is a better name) would be to define it as a weak
function in asm-generic, allowing archs to override it. What do you think?
Thank you in advance!
Regards,
Carlos
---
Details on the experiment:
- Linux kernel v5.15 (commit 8bb7eca)
- VM guest CPU: Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz
- Command used to collect samples:
    /usr/bin/perf record -p 2618527 -a sleep 30
- Image: Ubuntu 22.04 (LTS) x64, 8 vCPUs, 16GB / 100GB Disk
Thanks,
Carlos
On 3/17/25 17:01, Carlos Bilbao wrote:
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
> this negatively impacts the throughput of other VM guests running on the
> same hypervisor.
>
> I propose introducing a function cpu_halt_end_panic() to halt the CPU
> during this state while still allowing interrupts to be processed. See my
> commit below.
>
> Thanks in advance!
>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
> ---
> kernel/panic.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..c00ccaa698d5 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> crash_smp_send_stop();
> }
>
> +static void cpu_halt_end_panic(void)
> +{
> +#ifdef CONFIG_X86
> + native_safe_halt();
> +#elif defined(CONFIG_ARM)
> + cpu_do_idle();
> +#else
> + /*
> + * Default to a simple busy-wait if no architecture-specific halt is
> + * defined above
> + */
> + mdelay(PANIC_TIMER_STEP);
> +#endif
> +}
> +
> /**
> * panic - halt the system
> * @fmt: The text string to print
> @@ -474,7 +489,7 @@ void panic(const char *fmt, ...)
> i += panic_blink(state ^= 1);
> i_next = i + 3600 / PANIC_BLINK_SPD;
> }
> - mdelay(PANIC_TIMER_STEP);
> + cpu_halt_end_panic();
> }
> }
>
+CC Jan Glauber, who would be listed as Co-developed-by if this goes through.
Thanks,
Carlos
On 3/17/25 17:01, Carlos Bilbao wrote:
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
> this negatively impacts the throughput of other VM guests running on the
> same hypervisor.
>
> I propose introducing a function cpu_halt_end_panic() to halt the CPU
> during this state while still allowing interrupts to be processed. See my
> commit below.
>
> Thanks in advance!
>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@kernel.org>
> ---
> kernel/panic.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..c00ccaa698d5 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> crash_smp_send_stop();
> }
>
> +static void cpu_halt_end_panic(void)
> +{
> +#ifdef CONFIG_X86
> + native_safe_halt();
> +#elif defined(CONFIG_ARM)
> + cpu_do_idle();
> +#else
> + /*
> + * Default to a simple busy-wait if no architecture-specific halt is
> + * defined above
> + */
> + mdelay(PANIC_TIMER_STEP);
> +#endif
> +}
> +
> /**
> * panic - halt the system
> * @fmt: The text string to print
> @@ -474,7 +489,7 @@ void panic(const char *fmt, ...)
> i += panic_blink(state ^= 1);
> i_next = i + 3600 / PANIC_BLINK_SPD;
> }
> - mdelay(PANIC_TIMER_STEP);
> + cpu_halt_end_panic();
> }
> }
>