Hello again,
I would really appreciate your opinions on this.
Thanks!
Carlos
On 3/26/25 10:12, carlos.bilbao@kernel.org wrote:
> From: Carlos Bilbao <cbilbao@digitalocean.com>
>
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> This unnecessarily consumes CPU power and electricity, and in VMs it
> negatively impacts the throughput of other guests running on the same
> hypervisor.
>
> This patch set introduces a weak function, cpu_halt_after_panic(), to give
> architectures the option to halt the CPU during this state while still
> allowing interrupts to be processed. arch/x86 does so by overriding the
> weak function with a call to safe_halt().
>
> Here are some numbers to support my claim: the perf stats from the
> hypervisor after triggering a panic on a guest Linux kernel.
>
> Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
> Overhead Command Shared Object Symbol
> 42.20% CPU 5/KVM [kernel.kallsyms] [k] vmx_vmexit
> 19.07% CPU 5/KVM [kernel.kallsyms] [k] vmx_spec_ctrl_restore_host
> 9.73% CPU 5/KVM [kernel.kallsyms] [k] vmx_vcpu_enter_exit
> 3.60% CPU 5/KVM [kernel.kallsyms] [k] __flush_smp_call_function_queue
> 2.91% CPU 5/KVM [kernel.kallsyms] [k] vmx_vcpu_run
> 2.85% CPU 5/KVM [kernel.kallsyms] [k] native_irq_return_iret
> 2.67% CPU 5/KVM [kernel.kallsyms] [k] native_flush_tlb_one_user
> 2.16% CPU 5/KVM [kernel.kallsyms] [k] llist_reverse_order
> 2.10% CPU 5/KVM [kernel.kallsyms] [k] __srcu_read_lock
> 2.08% CPU 5/KVM [kernel.kallsyms] [k] flush_tlb_func
> 1.52% CPU 5/KVM [kernel.kallsyms] [k] vcpu_enter_guest.constprop.0
>
> And here are the hypervisor's perf stats after applying my patch to the guest:
>
> Samples: 51 of event 'cycles:P', Event count (approx.): 37553709
> Overhead Command Shared Object Symbol
> 7.94% qemu-system-x86 [kernel.kallsyms] [k] __schedule
> 7.94% qemu-system-x86 libc.so.6 [.] 0x00000000000a2702
> 7.94% qemu-system-x86 qemu-system-x86_64 [.] 0x000000000057603c
> 7.43% qemu-system-x86 libc.so.6 [.] malloc
> 7.43% qemu-system-x86 libc.so.6 [.] 0x00000000001af9c0
> 6.37% IO mon_iothread libglib-2.0.so.0.7200.4 [.] g_mutex_unlock
> 5.21% IO mon_iothread [kernel.kallsyms] [k] __pollwait
> 4.70% IO mon_iothread [kernel.kallsyms] [k] clear_bhb_loop
> 3.56% IO mon_iothread [kernel.kallsyms] [k] __secure_computing
> 3.56% IO mon_iothread libglib-2.0.so.0.7200.4 [.] g_main_context_query
> 3.15% IO mon_iothread [kernel.kallsyms] [k] __hrtimer_start_range_ns
> 3.15% IO mon_iothread [kernel.kallsyms] [k] _raw_spin_lock_irq
> 2.88% IO mon_iothread libglib-2.0.so.0.7200.4 [.] g_main_context_prepare
> 2.83% qemu-system-x86 libglib-2.0.so.0.7200.4 [.] g_slist_foreach
> 2.58% IO mon_iothread qemu-system-x86_64 [.] 0x00000000004e820d
> 2.21% qemu-system-x86 libc.so.6 [.] 0x0000000000088010
> 1.94% IO mon_iothread [kernel.kallsyms] [k] arch_exit_to_user_mode_prepar
>
> As you can see, CPU consumption is significantly reduced after applying the
> proposed change to the after-panic logic, with KVM-related functions (e.g.,
> vmx_vmexit()) dropping from more than 70% of CPU usage to virtually
> nothing. The number of samples also decreased from 55K to 51, and the event
> count dropped from 36.09 billion to 37.55 million.
>
> Carlos Bilbao at DigitalOcean (2):
> panic: Allow archs to reduce CPU consumption after panic
> x86/panic: Use safe_halt() for CPU halt after panic
>
> ---
>
> arch/x86/kernel/Makefile | 1 +
> arch/x86/kernel/panic.c | 9 +++++++++
> kernel/panic.c | 12 +++++++++++-
> 3 files changed, 21 insertions(+), 1 deletion(-)
> create mode 100644 arch/x86/kernel/panic.c
>
>
> From: carlos.bilbao@kernel.org
> Subject: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
> Date: Wed, 26 Mar 2025 10:12:03 -0500
> Message-ID: <20250326151204.67898-2-carlos.bilbao@kernel.org>
> From: Carlos Bilbao <carlos.bilbao@kernel.org>
>
> After handling a panic, the kernel enters a busy-wait loop, unnecessarily
> consuming CPU and potentially impacting other workloads including other
> guest VMs in the case of virtualized setups.
>
> Introduce cpu_halt_after_panic(), a weak function that archs can override
> to halt the CPU more efficiently. By default, it preserves the pre-existing
> delay behavior.
>
> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
> Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
> ---
> kernel/panic.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..fafe3fa22533 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> crash_smp_send_stop();
> }
>
> +/*
> + * Called after a kernel panic has been handled, at which stage halting
> + * the CPU can help reduce unnecessary CPU consumption. In the absence of
> + * arch-specific implementations, just delay
> + */
> +static void __weak cpu_halt_after_panic(void)
> +{
> + mdelay(PANIC_TIMER_STEP);
> +}
> +
> /**
> * panic - halt the system
> * @fmt: The text string to print
> @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
> i += panic_blink(state ^= 1);
> i_next = i + 3600 / PANIC_BLINK_SPD;
> }
> - mdelay(PANIC_TIMER_STEP);
> + cpu_halt_after_panic();
> }
> }
>