Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
This may result in overheating or excess power usage. The kernel
overrides any boot-time EPB "performance" bias to "normal" to avoid this.

When used in data centers it is preferable to keep the EPB at "performance"
when performing a live-update of the host kernel via a kexec to the new
kernel. Boot time is critical during the kexec, because running guest VMs
perceive it as latency or downtime.

On Intel Xeon Ice Lake platforms it has been observed that the combination
of EPB being set to "normal" and HWP (Intel Hardware P-states) being
enabled/configured during or close to the kexec makes the live-update/kexec
downtime 7 times longer than when the EPB is set to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. This maintains prior
functionality when no parameter is set, but adds the ability to stay at
"performance" for a speedy kexec if a user wishes.
Signed-off-by: Jack Allister <jalliste@amazon.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
.../admin-guide/kernel-parameters.txt | 12 +++++++++++
arch/x86/kernel/cpu/intel_epb.c | 21 +++++++++++++++++--
2 files changed, 31 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..5602ee213115 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,18 @@
0 disables intel_idle and fall back on acpi_idle.
1 to 9 specify maximum depth of C-state.
+ intel_epb= [X86]
+ auto
+ Same as not passing a parameter to intel_epb. This will
+ ensure that the intel_epb module will restore the energy
+ performance bias to "normal" at boot-time. This workaround
+ is for buggy BIOSes which may not set this value and cause
+ either overheating or excess power usage.
+ preserve
+ At kernel boot-time if the EPB value is read as "performance"
+ keep it at this value. This prevents the "performance" -> "normal"
+ transition which is a workaround mentioned above.
+
intel_pstate= [X86]
disable
Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..419e699a43e6 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
* the OS will do that anyway. That sometimes is problematic, as it may cause
* the system battery to drain too fast, for example, so it is better to adjust
* it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
*/
static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
};
+static bool intel_epb_no_override __read_mostly;
+
static int intel_epb_save(void)
{
u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
* ('normal').
*/
val = epb & EPB_MASK;
- if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+ if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
val = energ_perf_values[EPB_INDEX_NORMAL];
pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
}
@@ -213,6 +216,20 @@ static const struct x86_cpu_id intel_epb_normal[] = {
{}
};
+static __init int parse_intel_epb(char *str)
+{
+ if (!str)
+ return 0;
+
+ /* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+ if (!strcmp(str, "preserve"))
+ intel_epb_no_override = true;
+
+ return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
static __init int intel_epb_init(void)
{
const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
--
2.40.1
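The functional change in this patch is confined to one condition in intel_epb_restore(). As a rough illustration, the userspace sketch below models only the boot-time check shown in that hunk. The helper name epb_restore_value() and the standalone constants are assumptions for illustration: the 0 ("performance") and 6 ("normal") values come from the comment in intel_epb.c, and the 4-bit mask is assumed to mirror the kernel's EPB_MASK. It is not the kernel code itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration: the EPB field is taken to be the low
 * 4 bits of the MSR; 0 = "performance", 6 = "normal". */
#define EPB_MASK                     0x0fULL
#define ENERGY_PERF_BIAS_PERFORMANCE 0
#define ENERGY_PERF_BIAS_NORMAL      6

/*
 * Hypothetical helper modelled on the boot-time check in
 * intel_epb_restore(): given the MSR value read at boot and the state of
 * the "intel_epb=preserve" flag, return the MSR value that would be
 * written back.
 */
static uint64_t epb_restore_value(uint64_t msr, bool no_override)
{
	uint64_t val = msr & EPB_MASK;

	/* Default ("auto"): force a "performance" bias to "normal". */
	if (!no_override && val == ENERGY_PERF_BIAS_PERFORMANCE)
		val = ENERGY_PERF_BIAS_NORMAL;

	/* Keep the bits outside the EPB field, replace only the field. */
	return (msr & ~EPB_MASK) | val;
}
```

For example, a boot-time value of 0 becomes 6 by default but stays 0 when no_override is true, while any non-zero bias (such as 8) is left alone in both cases.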
On 1/3/24 06:46, Jack Allister wrote:
> + intel_epb= [X86]
> + auto
> + Same as not passing a parameter to intel_epb. This will
> + ensure that the intel_epb module will restore the energy
> + performance bias to "normal" at boot-time. This workaround
> + is for buggy BIOSes which may not set this value and cause
> + either overheating or excess power usage.
> + preserve
> + At kernel boot-time if the EPB value is read as "performance"
> + keep it at this value. This prevents the "performance" -> "normal"
> + transition which is a workaround mentioned above.

This ends up describing the nitty-gritty details of the implementation
instead of what users should take away from the options. Could we
up-level this a bit? How about this?

intel_epb= [X86]
auto (default)
Work around buggy BIOSes to avoid excess power usage by
forcing performance bias to "normal" at boot-time.
preserve
Do not override the existing performance bias setting.
Useful if a previous kernel or bootloader's setting is
more desirable than "normal".

It's better formatted and uses the "(default)" tag instead of trying to
explain it in prose. It also explains when someone might want to use the
override instead of just explaining its function.
Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
This may result in overheating or excess power usage. The kernel
overrides any boot-time EPB "performance" bias to "normal" to avoid this.

When used in data centers it is preferable to keep the EPB at "performance"
when performing a live-update of the host kernel via a kexec to the new
kernel. Boot time is critical during the kexec, because running guest VMs
perceive it as latency or downtime.

On Intel Xeon Ice Lake platforms it has been observed that the combination
of EPB being set to "normal" and HWP (Intel Hardware P-states) being
enabled/configured during or close to the kexec makes the live-update/kexec
downtime 7 times longer than when the EPB is set to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. This maintains prior
functionality when no parameter is set, but adds the ability to stay at
"performance" for a speedy kexec if a user wishes.
Signed-off-by: Jack Allister <jalliste@amazon.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
.../admin-guide/kernel-parameters.txt | 9 ++++++++
arch/x86/kernel/cpu/intel_epb.c | 22 +++++++++++++++++--
2 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..d28f2fc41c0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,15 @@
0 disables intel_idle and fall back on acpi_idle.
1 to 9 specify maximum depth of C-state.
+ intel_epb= [X86]
+ auto (default)
+ Work around buggy BIOSes to avoid excess power usage
+ by forcing the performance bias to "normal" at boot-time.
+ preserve
+ Do not override the existing performance bias setting.
+ Useful if a previous kernel or bootloader's setting is
+ more desirable than "normal".
+
intel_pstate= [X86]
disable
Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..01d406177751 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
* the OS will do that anyway. That sometimes is problematic, as it may cause
* the system battery to drain too fast, for example, so it is better to adjust
* it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
*/
static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
};
+static bool intel_epb_no_override __read_mostly;
+
static int intel_epb_save(void)
{
u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
* ('normal').
*/
val = epb & EPB_MASK;
- if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+ if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
val = energ_perf_values[EPB_INDEX_NORMAL];
pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
}
@@ -213,6 +216,21 @@ static const struct x86_cpu_id intel_epb_normal[] = {
{}
};
+static __init int parse_intel_epb(char *str)
+{
+ if (!str)
+ return 0;
+
+ /* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+ if (!strcmp(str, "preserve"))
+ intel_epb_no_override = true;
+
+ /* "intel_epb=auto" not explicitly checked as default behaviour. */
+ return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
static __init int intel_epb_init(void)
{
const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
--
2.40.1
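To show how the v6 parser treats each spelling of the option, the userspace sketch below copies the logic of parse_intel_epb() with a plain static bool standing in for the __read_mostly kernel variable. It is compiled outside the kernel, so the early_param() registration is omitted; the NULL case is assumed to correspond to a bare "intel_epb" with no "=value".

```c
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for the kernel's intel_epb_no_override flag. */
static bool intel_epb_no_override;

/*
 * Same logic as parse_intel_epb() in the v6 patch: the handler receives
 * the text after "intel_epb=" (assumed NULL when no value is given).
 */
static int parse_intel_epb(char *str)
{
	if (!str)
		return 0;

	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
	if (!strcmp(str, "preserve"))
		intel_epb_no_override = true;

	/* Any other value, including "auto", leaves the default in place. */
	return 0;
}
```

Because only "preserve" is matched, "auto", a misspelling such as "abc", and a missing value all leave intel_epb_no_override false, which is exactly the behaviour the review thread discusses.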
> -----Original Message-----
> From: Jack Allister <jalliste@amazon.com>
> Sent: 04 January 2024 09:06
> Cc: Allister, Jack <jalliste@amazon.co.uk>; Rafael J . Wysocki
> <rafael@kernel.org>; Durrant, Paul <pdurrant@amazon.co.uk>; Wang, Jue
> <juew@amazon.com>; Usama Arif <usama.arif@bytedance.com>; Jonathan Corbet
> <corbet@lwn.net>; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar
> <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; Dave Hansen
> <dave.hansen@linux.intel.com>; x86@kernel.org; H. Peter Anvin
> <hpa@zytor.com>; Paul E. McKenney <paulmck@kernel.org>; Randy Dunlap
> <rdunlap@infradead.org>; Tejun Heo <tj@kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Yan-Jie Wang <yanjiewtw@gmail.com>; Hans de Goede
> <hdegoede@redhat.com>; linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH v6] x86: intel_epb: Add earlyparam option to keep bias at
> performance
>
> Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
> A result of this may be overheating or excess power usage. The kernel
> overrides any boot-time EPB "performance" bias to "normal" to avoid this.
>
> When used in data centers it is preferable to keep the EPB at "performance"
> when performing a live-update of the host kernel via a kexec to the new
> kernel. This is due to boot-time being critical when performing the kexec
> as running guest VMs will perceive this as latency or downtime.
>
> On Intel Xeon Ice Lake platforms it has been observed that a combination of
> EPB being set to "normal" alongside HWP (Intel Hardware P-states) being
> enabled/configured during or close to the kexec increases the
> live-update/kexec downtime by 7 times compared to when the EPB is set to
> "performance".
>
> Introduce a command-line parameter, "intel_epb=preserve", to skip the
> "performance" -> "normal" override/workaround. This maintains prior
> functionality when no parameter is set, but adds the ability to stay at
> performance for a speedy kexec if a user wishes.
>
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: Jue Wang <juew@amazon.com>
> Cc: Usama Arif <usama.arif@bytedance.com>
> ---
> .../admin-guide/kernel-parameters.txt | 9 ++++++++
> arch/x86/kernel/cpu/intel_epb.c | 22 +++++++++++++++++--
> 2 files changed, 29 insertions(+), 2 deletions(-)
>

Reviewed-by: Paul Durrant <pdurrant@amazon.com>
On 03/01/2024 14:46, Jack Allister wrote:
> Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
> A result of this may be overheating or excess power usage. The kernel
> overrides any boot-time EPB "performance" bias to "normal" to avoid this.
>
> When used in data centers it is preferable to keep the EPB at "performance"
> when performing a live-update of the host kernel via a kexec to the new
> kernel. This is due to boot-time being critical when performing the kexec
> as running guest VMs will perceive this as latency or downtime.
>
> On Intel Xeon Ice Lake platforms it has been observed that a combination of
> EPB being set to "normal" alongside HWP (Intel Hardware P-states) being
> enabled/configured during or close to the kexec increases the
> live-update/kexec downtime by 7 times compared to when the EPB is set to
> "performance".
>
> Introduce a command-line parameter, "intel_epb=preserve", to skip the
> "performance" -> "normal" override/workaround. This maintains prior
> functionality when no parameter is set, but adds in the ability to stay at
> performance for a speedy kexec if a user wishes.
>
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: Jue Wang <juew@amazon.com>
> Cc: Usama Arif <usama.arif@bytedance.com>
> ---
> .../admin-guide/kernel-parameters.txt | 12 +++++++++++
> arch/x86/kernel/cpu/intel_epb.c | 21 +++++++++++++++++--
> 2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 65731b060e3f..5602ee213115 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2148,6 +2148,18 @@
> 0 disables intel_idle and fall back on acpi_idle.
> 1 to 9 specify maximum depth of C-state.
>
> + intel_epb= [X86]
> + auto
> + Same as not passing a parameter to intel_epb. This will
> + ensure that the intel_epb module will restore the energy
> + performance bias to "normal" at boot-time. This workaround
> + is for buggy BIOSes which may not set this value and cause
> + either overheating or excess power usage.
Hi,
Thanks for the patch. Is auto needed over here? It was pointed out in an
earlier review that it could be an option, but it doesn't seem to serve
a purpose. It's also not how the code works, i.e. intel_epb=abc would be
the same as intel_epb=auto. You could just add a print when intel_epb=preserve
is not matched, to flag that an unexpected value has been passed in.
> + preserve
> + At kernel boot-time if the EPB value is read as "performance"
> + keep it at this value. This prevents the "performance" -> "normal"
> + transition which is a workaround mentioned above.
> +
> intel_pstate= [X86]
> disable
> Do not enable intel_pstate as the default
> diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> index e4c3ba91321c..419e699a43e6 100644
> --- a/arch/x86/kernel/cpu/intel_epb.c
> +++ b/arch/x86/kernel/cpu/intel_epb.c
> @@ -50,7 +50,8 @@
> * the OS will do that anyway. That sometimes is problematic, as it may cause
> * the system battery to drain too fast, for example, so it is better to adjust
> * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
> - * kernel changes it to 6 ('normal').
> + * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
> + * original initial EPB value, intel_epb=preserve can be set to enforce it.
> */
>
> static DEFINE_PER_CPU(u8, saved_epb);
> @@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
> [EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
> };
>
> +static bool intel_epb_no_override __read_mostly;
> +
> static int intel_epb_save(void)
> {
> u64 epb;
> @@ -106,7 +109,7 @@ static void intel_epb_restore(void)
> * ('normal').
> */
> val = epb & EPB_MASK;
> - if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
> + if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
> val = energ_perf_values[EPB_INDEX_NORMAL];
> pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> }
> @@ -213,6 +216,20 @@ static const struct x86_cpu_id intel_epb_normal[] = {
> {}
> };
>
> +static __init int parse_intel_epb(char *str)
> +{
> + if (!str)
> + return 0;
> +
> + /* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
> + if (!strcmp(str, "preserve"))
> + intel_epb_no_override = true;
> +
Maybe add a print in an else branch here to say that an unexpected value
has been encountered for intel_epb if preserve is not seen.
Thanks,
Usama
> + return 0;
> +}
> +
> +early_param("intel_epb", parse_intel_epb);
> +
> static __init int intel_epb_init(void)
> {
> const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
> Thanks for the patch. Is auto needed over here? It was pointed out in an
> earlier review that it could be an option, but it doesn't seem to serve
> a purpose.

Auto is effectively just the default, as if no parameter were passed in.
In his reply, Dave mentioned that displaying it like this may actually be
clearer:

```
intel_epb= [X86]
auto (default)
```

As we implicitly take no action for this default case, it doesn't make much
sense to add a specific strcmp case for auto. However, what I can do is add a
comment within the code to explicitly show that this is effectively a no-op
when parsing.

> Maybe add a print in an else branch here to say that an unexpected value
> has been encountered for intel_epb if preserve is not seen.

I'd be hesitant to do this, as we already have the pr_warn_once() in the
intel_epb_restore() path when defaulting from perf -> normal.
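For comparison, here is roughly what Usama's suggestion would look like if it were adopted; Jack declined it above, and v6 instead documents "auto" with a comment. This is a hypothetical userspace sketch: the name parse_intel_epb_strict(), the explicit reset on "auto", and the use of fprintf(stderr, ...) in place of the kernel's pr_warn() are all assumptions, not code from the patch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's intel_epb_no_override flag. */
static bool intel_epb_no_override;

/*
 * Hypothetical stricter parser following the review suggestion: accept
 * "auto" explicitly and warn on anything else. fprintf() stands in for
 * pr_warn(); this is not the code from the patch.
 */
static int parse_intel_epb_strict(const char *str)
{
	if (!str)
		return 0;

	if (!strcmp(str, "preserve"))
		intel_epb_no_override = true;
	else if (!strcmp(str, "auto"))
		intel_epb_no_override = false;	/* explicit default */
	else
		fprintf(stderr, "intel_epb: unknown value '%s', using default\n",
			str);

	return 0;
}
```

One design consequence: if the parameter is given more than once, this variant lets a later "auto" undo an earlier "preserve", whereas the patch's parser only ever sets the flag. An unknown value leaves the flag unchanged in both versions; the variant merely reports it.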