[PATCH 0/2] Reduce CPU usage when finished handling panic

carlos.bilbao@kernel.org posted 2 patches 8 months, 3 weeks ago
From: Carlos Bilbao <cbilbao@digitalocean.com>

After the kernel has finished handling a panic, it enters a busy-wait loop.
This unnecessarily consumes CPU power and electricity, and in virtualized
setups it degrades the throughput of other guest VMs running on the same
hypervisor.

This patch set introduces a weak function, cpu_halt_after_panic(), that
gives architectures the option to halt the CPU in this state while still
allowing interrupts to be processed. Implement it for arch/x86 by
overriding the weak function and calling safe_halt().

Here are some numbers to support my claim: the perf stats from the
hypervisor after triggering a panic in a guest Linux kernel.

Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
Overhead  Command          Shared Object            Symbol
  42.20%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vmexit
  19.07%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_spec_ctrl_restore_host
   9.73%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_enter_exit
   3.60%  CPU 5/KVM        [kernel.kallsyms]        [k] __flush_smp_call_function_queue
   2.91%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_run
   2.85%  CPU 5/KVM        [kernel.kallsyms]        [k] native_irq_return_iret
   2.67%  CPU 5/KVM        [kernel.kallsyms]        [k] native_flush_tlb_one_user
   2.16%  CPU 5/KVM        [kernel.kallsyms]        [k] llist_reverse_order
   2.10%  CPU 5/KVM        [kernel.kallsyms]        [k] __srcu_read_lock
   2.08%  CPU 5/KVM        [kernel.kallsyms]        [k] flush_tlb_func
   1.52%  CPU 5/KVM        [kernel.kallsyms]        [k] vcpu_enter_guest.constprop.0

And here are the hypervisor's perf stats after applying my patch to the guest:

Samples: 51  of event 'cycles:P', Event count (approx.): 37553709
Overhead  Command          Shared Object            Symbol
   7.94%  qemu-system-x86  [kernel.kallsyms]        [k] __schedule
   7.94%  qemu-system-x86  libc.so.6                [.] 0x00000000000a2702
   7.94%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000057603c
   7.43%  qemu-system-x86  libc.so.6                [.] malloc
   7.43%  qemu-system-x86  libc.so.6                [.] 0x00000000001af9c0
   6.37%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_unlock
   5.21%  IO mon_iothread  [kernel.kallsyms]        [k] __pollwait
   4.70%  IO mon_iothread  [kernel.kallsyms]        [k] clear_bhb_loop
   3.56%  IO mon_iothread  [kernel.kallsyms]        [k] __secure_computing
   3.56%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_query
   3.15%  IO mon_iothread  [kernel.kallsyms]        [k] __hrtimer_start_range_ns
   3.15%  IO mon_iothread  [kernel.kallsyms]        [k] _raw_spin_lock_irq
   2.88%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_prepare
   2.83%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_slist_foreach
   2.58%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e820d
   2.21%  qemu-system-x86  libc.so.6                [.] 0x0000000000088010
   1.94%  IO mon_iothread  [kernel.kallsyms]        [k] arch_exit_to_user_mode_prepar

As you can see, CPU consumption is significantly reduced after applying the
proposed after-panic change, with KVM-related functions (e.g.,
vmx_vmexit()) dropping from more than 70% of CPU usage to virtually
nothing. The number of samples also decreased from 55K to 51, and the event
count dropped from 36.09 billion to 37.55 million.

Carlos Bilbao at DigitalOcean (2):
  panic: Allow archs to reduce CPU consumption after panic
  x86/panic: Use safe_halt() for CPU halt after panic

---

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/panic.c  |  9 +++++++++
 kernel/panic.c           | 12 +++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/panic.c
Reminder: [PATCH 0/2] Reduce CPU usage when finished handling panic
Posted by Carlos Bilbao 8 months, 1 week ago
Hello again,

I would really appreciate your opinions on this.

Thanks!
Carlos

On 3/26/25 10:12, carlos.bilbao@kernel.org wrote:
> [...]
From: carlos.bilbao@kernel.org
To: tglx@linutronix.de
Cc: bilbao@vt.edu, pmladek@suse.com, akpm@linux-foundation.org,
	jan.glauber@gmail.com, jani.nikula@intel.com,
	linux-kernel@vger.kernel.org, gregkh@linuxfoundation.org,
	takakura@valinux.co.jp, john.ogness@linutronix.de,
	Carlos Bilbao <carlos.bilbao@kernel.org>
Subject: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
Date: Wed, 26 Mar 2025 10:12:03 -0500
Message-ID: <20250326151204.67898-2-carlos.bilbao@kernel.org>
In-Reply-To: <20250326151204.67898-1-carlos.bilbao@kernel.org>

From: Carlos Bilbao <carlos.bilbao@kernel.org>

After handling a panic, the kernel enters a busy-wait loop, unnecessarily
consuming CPU and potentially impacting other workloads, including other
guest VMs in virtualized setups.

Introduce cpu_halt_after_panic(), a weak function that archs can override
to halt the CPU more efficiently. By default, it preserves the
pre-existing delay behavior.

Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
---
 kernel/panic.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index fbc59b3b64d0..fafe3fa22533 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
 		crash_smp_send_stop();
 }
 
+/*
+ * Called after a kernel panic has been handled, at which stage halting
+ * the CPU can help reduce unnecessary CPU consumption. In the absence of
+ * an arch-specific implementation, just delay.
+ */
+static void __weak cpu_halt_after_panic(void)
+{
+	mdelay(PANIC_TIMER_STEP);
+}
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
 			i += panic_blink(state ^= 1);
 			i_next = i + 3600 / PANIC_BLINK_SPD;
 		}
-		mdelay(PANIC_TIMER_STEP);
+		cpu_halt_after_panic();
 	}
 }