[PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance

Posted by Jack Allister 2 years ago
There are certain scenarios where it may be intentional that the EPB was
set to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
data centers a kexec/live-update of the kernel may be performed regularly.

Usually this live-update is time critical, and resetting the bias back
to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
update time if the processors' time to ramp up/boost is affected.

This patch introduces a kernel command-line parameter,
"intel_epb_keep_performance", which leaves the EPB at performance if it
is detected as such during the restoration code path.

Signed-off-by: Jack Allister <jalliste@amazon.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
 arch/x86/kernel/cpu/intel_epb.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..0c7dd092f723 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). This however is overridable via
+ * intel_epb_keep_performance if required.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_keep_performance __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -107,8 +110,12 @@ static void intel_epb_restore(void)
 		 */
 		val = epb & EPB_MASK;
 		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
-			val = energ_perf_values[EPB_INDEX_NORMAL];
-			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
+			if (!intel_epb_keep_performance) {
+				val = energ_perf_values[EPB_INDEX_NORMAL];
+				pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
+			} else {
+				pr_warn_once("ENERGY_PERF_BIAS: Kept at 'performance', no change\n");
+			}
 		}
 	}
 	wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
@@ -213,6 +220,12 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int intel_epb_keep_performance_setup(char *str)
+{
+	return kstrtobool(str, &intel_epb_keep_performance);
+}
+early_param("intel_epb_keep_performance", intel_epb_keep_performance_setup);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Rafael J. Wysocki 2 years ago
On Mon, Dec 4, 2023 at 6:30 PM Jack Allister <jalliste@amazon.com> wrote:
>
> There are certain scenarios where it may be intentional that the EPB was
> set to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
> data centers a kexec/live-update of the kernel may be performed regularly.
>
> Usually this live-update is time critical, and resetting the bias back
> to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
> update time if the processors' time to ramp up/boost is affected.
>
> This patch introduces a kernel command-line parameter,
> "intel_epb_keep_performance", which leaves the EPB at performance if it
> is detected as such during the restoration code path.
>
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: Jue Wang <juew@amazon.com>
> Cc: Usama Arif <usama.arif@bytedance.com>
> ---
>  arch/x86/kernel/cpu/intel_epb.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> index e4c3ba91321c..0c7dd092f723 100644
> --- a/arch/x86/kernel/cpu/intel_epb.c
> +++ b/arch/x86/kernel/cpu/intel_epb.c
> @@ -50,7 +50,8 @@
>   * the OS will do that anyway.  That sometimes is problematic, as it may cause
>   * the system battery to drain too fast, for example, so it is better to adjust
>   * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
> - * kernel changes it to 6 ('normal').
> + * kernel changes it to 6 ('normal'). This however is overridable via
> + * intel_epb_keep_performance if required.
>   */
>
>  static DEFINE_PER_CPU(u8, saved_epb);
> @@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
>         [EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
>  };
>
> +static bool intel_epb_keep_performance __read_mostly;
> +
>  static int intel_epb_save(void)
>  {
>         u64 epb;
> @@ -107,8 +110,12 @@ static void intel_epb_restore(void)
>                  */
>                 val = epb & EPB_MASK;
>                 if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
> -                       val = energ_perf_values[EPB_INDEX_NORMAL];
> -                       pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> +                       if (!intel_epb_keep_performance) {

if (!intel_epb_keep_performance && val == ENERGY_PERF_BIAS_PERFORMANCE) {

and you need not notify the sysadmin that the original value has
been retained - they have set the command line switch for this purpose
after all.

> +                               val = energ_perf_values[EPB_INDEX_NORMAL];
> +                               pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> +                       } else {
> +                               pr_warn_once("ENERGY_PERF_BIAS: Kept at 'performance', no change\n");
> +                       }
>                 }
>         }
>         wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, (epb & ~EPB_MASK) | val);
> @@ -213,6 +220,12 @@ static const struct x86_cpu_id intel_epb_normal[] = {
>         {}
>  };
>
> +static __init int intel_epb_keep_performance_setup(char *str)
> +{
> +       return kstrtobool(str, &intel_epb_keep_performance);
> +}
> +early_param("intel_epb_keep_performance", intel_epb_keep_performance_setup);
> +
>  static __init int intel_epb_init(void)
>  {
>         const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
> --
> 2.40.1
>
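Rafael's suggested restructuring can be modelled outside the kernel. The following is a minimal user-space sketch, not kernel code: `epb_first_boot_value()` is a hypothetical helper covering only the first-bring-up branch visible in the hunk, the constants are restated on the assumption that they match the kernel's definitions, and 6 is assumed for "normal" even though the kernel takes that value from `energ_perf_values[]` (which `intel_epb_normal[]` appears to adjust for some CPU models). It shows the folded condition, and that bits outside the 4-bit EPB field are written back unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Restated for self-containment; assumed to match the kernel's values. */
#define EPB_MASK			0x0fULL
#define ENERGY_PERF_BIAS_PERFORMANCE	0
#define ENERGY_PERF_BIAS_NORMAL		6	/* kernel may use another "normal" */

/*
 * Hypothetical model of the first-bring-up branch of intel_epb_restore()
 * with Rafael's suggestion applied: the flag and the value test are
 * folded into one condition, and nothing extra is logged when the value
 * is kept. Takes the raw MSR value, returns the value that would be
 * written back.
 */
static uint64_t epb_first_boot_value(uint64_t epb, bool no_override)
{
	uint64_t val = epb & EPB_MASK;	/* EPB lives in bits 3:0 */

	if (!no_override && val == ENERGY_PERF_BIAS_PERFORMANCE)
		val = ENERGY_PERF_BIAS_NORMAL;	/* kernel also does pr_warn_once() */

	/* Keep every bit outside the EPB field, replace only the field. */
	return (epb & ~EPB_MASK) | val;
}
```

For example, with the flag clear a raw value of 0x30 becomes 0x36, while with the flag set it is written back as 0x30; any non-zero bias is left alone either way.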
[PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 2 years ago
There are certain scenarios where it may be intentional that the EPB was
set to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
data centers a kexec/live-update of the kernel may be performed regularly.

Usually this live-update is time critical, and resetting the bias back
to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
update time if the processors' time to ramp up/boost is affected.

This patch introduces a kernel command-line parameter,
"intel_epb_keep_performance", which leaves the EPB at performance if it
is detected as such during the restoration code path.

Signed-off-by: Jack Allister <jalliste@amazon.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
 arch/x86/kernel/cpu/intel_epb.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..cbe0e224b8d9 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). This however is overridable via
+ * intel_epb_no_override if required.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,12 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int intel_epb_no_override_setup(char *str)
+{
+	return kstrtobool(str, &intel_epb_no_override);
+}
+early_param("intel_epb_no_override", intel_epb_no_override_setup);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Rafael J. Wysocki 2 years ago
On Tue, Dec 5, 2023 at 2:14 PM Jack Allister <jalliste@amazon.com> wrote:
>
> There are certain scenarios where it may be intentional that the EPB was
> set to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
> data centers a kexec/live-update of the kernel may be performed regularly.
>
> Usually this live-update is time critical, and resetting the bias back
> to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
> update time if the processors' time to ramp up/boost is affected.
>
> This patch introduces a kernel command-line parameter,
> "intel_epb_keep_performance", which leaves the EPB at performance if it
> is detected as such during the restoration code path.
>
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: Jue Wang <juew@amazon.com>
> Cc: Usama Arif <usama.arif@bytedance.com>
> ---
>  arch/x86/kernel/cpu/intel_epb.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> index e4c3ba91321c..cbe0e224b8d9 100644
> --- a/arch/x86/kernel/cpu/intel_epb.c
> +++ b/arch/x86/kernel/cpu/intel_epb.c
> @@ -50,7 +50,8 @@
>   * the OS will do that anyway.  That sometimes is problematic, as it may cause
>   * the system battery to drain too fast, for example, so it is better to adjust
>   * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
> - * kernel changes it to 6 ('normal').
> + * kernel changes it to 6 ('normal'). This however is overridable via
> + * intel_epb_no_override if required.
>   */

In the comment above I would say

"However, if it is desirable to retain the original initial EPB value,
intel_epb_no_override can be set to enforce it."

Otherwise

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

>
>  static DEFINE_PER_CPU(u8, saved_epb);
> @@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
>         [EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
>  };
>
> +static bool intel_epb_no_override __read_mostly;
> +
>  static int intel_epb_save(void)
>  {
>         u64 epb;
> @@ -106,7 +109,7 @@ static void intel_epb_restore(void)
>                  * ('normal').
>                  */
>                 val = epb & EPB_MASK;
> -               if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
> +               if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
>                         val = energ_perf_values[EPB_INDEX_NORMAL];
>                         pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
>                 }
> @@ -213,6 +216,12 @@ static const struct x86_cpu_id intel_epb_normal[] = {
>         {}
>  };
>
> +static __init int intel_epb_no_override_setup(char *str)
> +{
> +       return kstrtobool(str, &intel_epb_no_override);
> +}
> +early_param("intel_epb_no_override", intel_epb_no_override_setup);
> +
>  static __init int intel_epb_init(void)
>  {
>         const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
> --
> 2.40.1
>
[PATCH v4] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 2 years ago
There are certain scenarios where it may be intentional that the EPB was
set to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
data centers a kexec/live-update of the kernel may be performed regularly.

Usually this live-update is time critical, and resetting the bias back
to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
update time if the processors' time to ramp up/boost is affected.

This patch introduces a kernel command-line parameter,
"intel_epb_no_override", which leaves the EPB at performance if it is
detected as such during the restoration code path.

Signed-off-by: Jack Allister <jalliste@amazon.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
 arch/x86/kernel/cpu/intel_epb.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..4a3523225572 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb_no_override can be set to enforce it.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,12 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int intel_epb_no_override_setup(char *str)
+{
+	return kstrtobool(str, &intel_epb_no_override);
+}
+early_param("intel_epb_no_override", intel_epb_no_override_setup);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1
Re: [PATCH v4] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 2 years ago
Jack, I'd really appreciate if you could please slow down.

I saw three versions of this patch between the time that I went to bed
and the time I managed to wake up and get a single set of replies out.
As a result, I don't think my feedback for v2 was incorporated for v4,
condemning this to at least a v5.
Re: [PATCH v4] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 1 year, 12 months ago
> Jack, I'd really appreciate if you could please slow down.

I'm sorry about this, I'm still extremely unfamiliar with the open-source conventions/
workflow when working with mailing lists & upstreaming to the Linux kernel. I have
taken this feedback & have started to look through the maintainer tips & docs for
guidance.

Also sorry for the delay in responding here; general work responsibilities & the
holiday period have had an effect here.

Before I go ahead with posting up a revision 5 with all of your queries/suggestions I do
have a few questions I'd just like to clarify.

> We could, for instance just support this pair:
>	intel_epb=auto		(default, will hack performance=>normal)
>	intel_epb=preserve	(leave it alone)

If the suggestion above were to be implemented, do you think keeping `intel_epb_restore_default`
as a bool is still worth it? e.g.:

```
static __init int intel_epb_no_override_setup(char *str)
{
	if (!str)
		return 0;

	/* Only "preserve" changes behaviour; "auto" keeps the default. */
	if (!strcmp(str, "preserve"))
		intel_epb_no_override = true;

	return 0;
}
```

Or do you think it would be worth actually removing `intel_epb_no_override` and creating
a module variable `intel_epb_restore_default` which is an enum of the performance values?

Doing so would then allow for extensibility in the future, which you had already alluded to,
e.g. setting it to other values such as EPB_INDEX_BALANCE_POWERSAVE/PERFORMANCE.
Re: [PATCH v4] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 1 year, 12 months ago
On 1/2/24 06:46, Jack Allister wrote:
> If the suggestion above were to be implemented, do you think keeping `intel_epb_restore_default`
> as a bool is still worth it? e.g.:
...
> Or do you think it would be worth actually removing `intel_epb_no_override` and creating
> a module variable `intel_epb_restore_default` which is an enum of the performance values?
> 
> Doing so would then allow for extensibility in the future, which you had already alluded to,
> e.g. setting it to other values such as EPB_INDEX_BALANCE_POWERSAVE/PERFORMANCE.

You should leave 'intel_epb_restore_default' as a bool unless there is
some compelling reason to change it.  A _possible_ future need for more
settings isn't compelling.
[PATCH v5] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 1 year, 12 months ago
Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
A result of this may be overheating or excess power usage. The kernel
overrides any boot-time EPB "performance" bias to "normal" to avoid this.

When used in data centers, it is preferable to keep the EPB at "performance"
when performing a live-update of the host kernel via a kexec to the new
kernel, because boot time is critical during the kexec: running guest VMs
perceive it as latency or downtime.

On Intel Xeon Ice Lake platforms it has been observed that the combination
of EPB being set to "normal" and HWP (Intel Hardware P-states) being
enabled/configured during or close to the kexec increases the
live-update/kexec downtime by a factor of 7 compared to when the EPB is set
to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. This maintains prior
functionality when no parameter is set, but adds the ability to stay at
"performance" for a speedy kexec if a user wishes.

Signed-off-by: Jack Allister <jalliste@amazon.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
 .../admin-guide/kernel-parameters.txt         | 12 +++++++++++
 arch/x86/kernel/cpu/intel_epb.c               | 21 +++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..5602ee213115 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,18 @@
 			0	disables intel_idle and fall back on acpi_idle.
 			1 to 9	specify maximum depth of C-state.
 
+	intel_epb=	[X86]
+			auto
+			  Same as not passing a parameter to intel_epb. This will
+			  ensure that the intel_epb module will restore the energy
+			  performance bias to "normal" at boot-time. This workaround
+			  is for buggy BIOSes which may not set this value and cause
+			  either overheating or excess power usage.
+			preserve
+			  At kernel boot-time if the EPB value is read as "performance"
+			  keep it at this value. This prevents the "performance" -> "normal"
+			  transition which is a workaround mentioned above.
+
 	intel_pstate=	[X86]
 			disable
 			  Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..419e699a43e6 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,20 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int parse_intel_epb(char *str)
+{
+	if (!str)
+		return 0;
+
+	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+	if (!strcmp(str, "preserve"))
+		intel_epb_no_override = true;
+
+	return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1
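How `intel_epb=preserve` reaches parse_intel_epb() can be illustrated with a rough user-space model. In the sketch below, `scan_cmdline()` is a hypothetical, heavily simplified stand-in for the kernel's early-parameter handling (the real parser also deals with quoting and many other cases). As far as I understand, the kernel passes NULL to the handler when a parameter has no '=', which is why the `if (!str)` guard matters for a bare `intel_epb`; the model mimics that.

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors the flag added by the patch. */
static bool intel_epb_no_override;

/* Same logic as the v5 parse_intel_epb(), minus the __init annotation. */
static int parse_intel_epb(char *str)
{
	if (!str)		/* bare "intel_epb" with no '=' */
		return 0;

	if (!strcmp(str, "preserve"))
		intel_epb_no_override = true;

	return 0;
}

/*
 * Hypothetical, simplified early-parameter scan: split the command line
 * on spaces, split each token at the first '=', and pass the value (or
 * NULL when there is no '=') to the handler registered for that name.
 * Modifies cmdline in place.
 */
static void scan_cmdline(char *cmdline)
{
	for (char *tok = strtok(cmdline, " "); tok; tok = strtok(NULL, " ")) {
		char *eq = strchr(tok, '=');
		char *val = NULL;

		if (eq) {
			*eq = '\0';	/* terminate the parameter name */
			val = eq + 1;	/* value starts after the '=' */
		}
		if (!strcmp(tok, "intel_epb"))
			parse_intel_epb(val);
	}
}
```

As in the v5 code, any value other than "preserve" (including "auto" or a typo) leaves the flag clear, so the default workaround stays in effect.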
Re: [PATCH v5] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 1 year, 12 months ago
On 1/3/24 06:46, Jack Allister wrote:
> +	intel_epb=	[X86]
> +			auto
> +			  Same as not passing a parameter to intel_epb. This will
> +			  ensure that the intel_epb module will restore the energy
> +			  performance bias to "normal" at boot-time. This workaround
> +			  is for buggy BIOSes which may not set this value and cause
> +			  either overheating or excess power usage.
> +			preserve
> +			  At kernel boot-time if the EPB value is read as "performance"
> +			  keep it at this value. This prevents the "performance" -> "normal"
> +			  transition which is a workaround mentioned above.

This ends up describing the nitty-gritty details of the implementation
instead of what users should take away from the options.  Could we up
level this a bit?

How about this?

	intel_epb=	[X86]
			
			auto (default)
				Work around buggy BIOSes to avoid
				excess power usage by forcing
				performance bias to "normal" at boot-
				time.
			
			preserve
				Do not override the existing performance
				bias setting.  Useful if a previous
				kernel or bootloader's setting is more
				desirable than "normal".

It's better formatted and uses the "(default)" tag instead of trying to
explain it in prose.  It also explains when someone might want to use
the override instead of just explaining its function.
[PATCH v6] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 1 year, 11 months ago
Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
A result of this may be overheating or excess power usage. The kernel
overrides any boot-time EPB "performance" bias to "normal" to avoid this.

When used in data centers, it is preferable to keep the EPB at "performance"
when performing a live-update of the host kernel via a kexec to the new
kernel, because boot time is critical during the kexec: running guest VMs
perceive it as latency or downtime.

On Intel Xeon Ice Lake platforms it has been observed that the combination
of EPB being set to "normal" and HWP (Intel Hardware P-states) being
enabled/configured during or close to the kexec increases the
live-update/kexec downtime by a factor of 7 compared to when the EPB is set
to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. This maintains prior
functionality when no parameter is set, but adds the ability to stay at
"performance" for a speedy kexec if a user wishes.

Signed-off-by: Jack Allister <jalliste@amazon.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
 .../admin-guide/kernel-parameters.txt         |  9 ++++++++
 arch/x86/kernel/cpu/intel_epb.c               | 22 +++++++++++++++++--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..d28f2fc41c0c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,15 @@
 			0	disables intel_idle and fall back on acpi_idle.
 			1 to 9	specify maximum depth of C-state.
 
+	intel_epb=	[X86]
+			auto (default)
+			  Work around buggy BIOSes to avoid excess power usage
+			  by forcing the performance bias to "normal" at boot-time.
+			preserve
+			  Do not override the existing performance bias setting.
+			  Useful if a previous kernel or bootloader's setting is
+			  more desirable than "normal".
+
 	intel_pstate=	[X86]
 			disable
 			  Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..01d406177751 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,21 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int parse_intel_epb(char *str)
+{
+	if (!str)
+		return 0;
+
+	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+	if (!strcmp(str, "preserve"))
+		intel_epb_no_override = true;
+
+	/* "intel_epb=auto" not explicitly checked as default behaviour. */
+	return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1
RE: [PATCH v6] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Durrant, Paul 1 year, 11 months ago
> -----Original Message-----
> From: Jack Allister <jalliste@amazon.com>
> Sent: 04 January 2024 09:06
> Cc: Allister, Jack <jalliste@amazon.co.uk>; Rafael J . Wysocki
> <rafael@kernel.org>; Durrant, Paul <pdurrant@amazon.co.uk>; Wang, Jue
> <juew@amazon.com>; Usama Arif <usama.arif@bytedance.com>; Jonathan Corbet
> <corbet@lwn.net>; Thomas Gleixner <tglx@linutronix.de>; Ingo Molnar
> <mingo@redhat.com>; Borislav Petkov <bp@alien8.de>; Dave Hansen
> <dave.hansen@linux.intel.com>; x86@kernel.org; H. Peter Anvin
> <hpa@zytor.com>; Paul E. McKenney <paulmck@kernel.org>; Randy Dunlap
> <rdunlap@infradead.org>; Tejun Heo <tj@kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Yan-Jie Wang <yanjiewtw@gmail.com>; Hans de Goede
> <hdegoede@redhat.com>; linux-doc@vger.kernel.org; linux-
> kernel@vger.kernel.org
> Subject: [PATCH v6] x86: intel_epb: Add earlyparam option to keep bias at
> performance
> 
> Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
> A result of this may be overheating or excess power usage. The kernel
> overrides any boot-time EPB "performance" bias to "normal" to avoid this.
> 
> When used in data centers, it is preferable to keep the EPB at "performance"
> when performing a live-update of the host kernel via a kexec to the new
> kernel, because boot time is critical during the kexec: running guest VMs
> perceive it as latency or downtime.
> 
> On Intel Xeon Ice Lake platforms it has been observed that the combination
> of EPB being set to "normal" and HWP (Intel Hardware P-states) being
> enabled/configured during or close to the kexec increases the
> live-update/kexec downtime by a factor of 7 compared to when the EPB is set
> to "performance".
> 
> Introduce a command-line parameter, "intel_epb=preserve", to skip the
> "performance" -> "normal" override/workaround. This maintains prior
> functionality when no parameter is set, but adds the ability to stay at
> "performance" for a speedy kexec if a user wishes.
> 
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: Jue Wang <juew@amazon.com>
> Cc: Usama Arif <usama.arif@bytedance.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  9 ++++++++
>  arch/x86/kernel/cpu/intel_epb.c               | 22 +++++++++++++++++--
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 

Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Re: [External] [PATCH v5] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Usama Arif 1 year, 12 months ago

On 03/01/2024 14:46, Jack Allister wrote:
> Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
> A result of this may be overheating or excess power usage. The kernel
> overrides any boot-time EPB "performance" bias to "normal" to avoid this.
> 
> When used in data centers, it is preferable to keep the EPB at "performance"
> when performing a live-update of the host kernel via a kexec to the new
> kernel, because boot time is critical during the kexec: running guest VMs
> perceive it as latency or downtime.
> 
> On Intel Xeon Ice Lake platforms it has been observed that the combination
> of EPB being set to "normal" and HWP (Intel Hardware P-states) being
> enabled/configured during or close to the kexec increases the
> live-update/kexec downtime by a factor of 7 compared to when the EPB is set
> to "performance".
> 
> Introduce a command-line parameter, "intel_epb=preserve", to skip the
> "performance" -> "normal" override/workaround. This maintains prior
> functionality when no parameter is set, but adds the ability to stay at
> "performance" for a speedy kexec if a user wishes.
> 
> Signed-off-by: Jack Allister <jalliste@amazon.com>
> Acked-by: Rafael J. Wysocki <rafael@kernel.org>
> Cc: Paul Durrant <pdurrant@amazon.com>
> Cc: Jue Wang <juew@amazon.com>
> Cc: Usama Arif <usama.arif@bytedance.com>
> ---
>   .../admin-guide/kernel-parameters.txt         | 12 +++++++++++
>   arch/x86/kernel/cpu/intel_epb.c               | 21 +++++++++++++++++--
>   2 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 65731b060e3f..5602ee213115 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2148,6 +2148,18 @@
>   			0	disables intel_idle and fall back on acpi_idle.
>   			1 to 9	specify maximum depth of C-state.
>   
> +	intel_epb=	[X86]
> +			auto
> +			  Same as not passing a parameter to intel_epb. This will
> +			  ensure that the intel_epb module will restore the energy
> +			  performance bias to "normal" at boot-time. This workaround
> +			  is for buggy BIOSes which may not set this value and cause
> +			  either overheating or excess power usage.
Hi,

Thanks for the patch. Is auto needed over here? It was pointed out in an
earlier review that it could be an option, but it doesn't seem to serve
a purpose. It's also not how the code works, i.e. intel_epb=abc would be
the same as intel_epb=auto. You could just add a print saying that an
unexpected value has been passed in if intel_epb=preserve is not
encountered.

> +			preserve
> +			  At kernel boot-time if the EPB value is read as "performance"
> +			  keep it at this value. This prevents the "performance" -> "normal"
> +			  transition which is a workaround mentioned above.
> +
>   	intel_pstate=	[X86]
>   			disable
>   			  Do not enable intel_pstate as the default
> diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> index e4c3ba91321c..419e699a43e6 100644
> --- a/arch/x86/kernel/cpu/intel_epb.c
> +++ b/arch/x86/kernel/cpu/intel_epb.c
> @@ -50,7 +50,8 @@
>    * the OS will do that anyway.  That sometimes is problematic, as it may cause
>    * the system battery to drain too fast, for example, so it is better to adjust
>    * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
> - * kernel changes it to 6 ('normal').
> + * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
> + * original initial EPB value, intel_epb=preserve can be set to enforce it.
>    */
>   
>   static DEFINE_PER_CPU(u8, saved_epb);
> @@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
>   	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
>   };
>   
> +static bool intel_epb_no_override __read_mostly;
> +
>   static int intel_epb_save(void)
>   {
>   	u64 epb;
> @@ -106,7 +109,7 @@ static void intel_epb_restore(void)
>   		 * ('normal').
>   		 */
>   		val = epb & EPB_MASK;
> -		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
> +		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
>   			val = energ_perf_values[EPB_INDEX_NORMAL];
>   			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
>   		}
> @@ -213,6 +216,20 @@ static const struct x86_cpu_id intel_epb_normal[] = {
>   	{}
>   };
>   
> +static __init int parse_intel_epb(char *str)
> +{
> +	if (!str)
> +		return 0;
> +
> +	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
> +	if (!strcmp(str, "preserve"))
> +		intel_epb_no_override = true;
> +
Maybe add a print in an else branch here to say that an unexpected
value has been encountered for intel_epb if preserve is not seen.

Thanks,
Usama
> +	return 0;
> +}
> +
> +early_param("intel_epb", parse_intel_epb);
> +
>   static __init int intel_epb_init(void)
>   {
>   	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
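Usama's suggestion above (warn when the value is neither "preserve" nor the documented default) can be sketched as a userspace C function. This is a hypothetical stand-in for parse_intel_epb(), not the posted code: fprintf() replaces pr_warn(), and treating "auto" as silently accepted is an assumption based on the documented default.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's intel_epb_no_override flag. */
static bool intel_epb_no_override;

/*
 * Hypothetical sketch of parse_intel_epb() with the suggested else branch:
 * "preserve" sets the flag, "auto" (or no value) keeps the default, and any
 * other value is reported. Always returns 0, like the handler in the patch.
 */
static int parse_intel_epb(const char *str)
{
	if (!str)
		return 0;

	if (!strcmp(str, "preserve"))
		intel_epb_no_override = true;
	else if (strcmp(str, "auto"))
		fprintf(stderr, "intel_epb: unexpected value '%s', using auto\n", str);

	return 0;
}
```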
Re: [PATCH v5] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 1 year, 11 months ago
> Thanks for the patch. Is auto needed over here? It was pointed in an 
> earlier review that it could be an option, but it doesn't seem to serve 
> a purpose.

Auto is effectively just the default, as if no parameter is passed in.
In his reply, Dave mentioned that displaying it like this may actually
be clearer.

```
	intel_epb=	[X86]
			
			auto (default)
```

As we're explicitly not taking any action for this default case, it
doesn't make much sense to add a specific strcmp case for auto. However,
what I can do is add a comment within the code to explicitly show that
this is effectively a no-op when parsing.


> Maybe add an print in else here to say that unexpected value has been 
> encountered for intel_epb if preserve is not seen.

I'd be hesitant to do this, as we already have the pr_warn_once() in the
intel_epb_restore() path when defaulting from perf -> normal.
[PATCH v3] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 2 years ago
There are certain scenarios where it may be intentional that the EPB was
set to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
data centers a kexec/live-update of the kernel may be performed regularly.

Usually this live-update is time critical and defaulting of the bias back
to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
update time if processors' time to ramp up/boost is affected.

This patch introduces a kernel command line "intel_epb_no_override"
which will leave the EPB at performance if during the restoration code path
it is detected as such.

Signed-off-by: Jack Allister <jalliste@amazon.com>
Cc: Paul Durrant <pdurrant@amazon.com>
Cc: Jue Wang <juew@amazon.com>
Cc: Usama Arif <usama.arif@bytedance.com>
---
 arch/x86/kernel/cpu/intel_epb.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..cbe0e224b8d9 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway.  That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). This however is overridable via
+ * intel_epb_no_override if required.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,12 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int intel_epb_no_override_setup(char *str)
+{
+	return kstrtobool(str, &intel_epb_no_override);
+}
+early_param("intel_epb_no_override", intel_epb_no_override_setup);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1

Sorry, it looks like I had missed the v2 tag in the subject, and the
commit message also did not reflect the rename made since v1.

This should all be fixed in v3 now.
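The v3 handler passes its argument straight to kstrtobool(). The userspace approximation below mirrors the spellings kstrtobool() accepts in recent kernels (y/n, t/f, 1/0, on/off; the exact set is an assumption and may vary between releases), and the name kstrtobool_sketch() is hypothetical. Assuming an early_param handler receives NULL when the option is given without "=value" (the reason the v5 handler checks !str), a bare "intel_epb_no_override" would be rejected and only forms such as "intel_epb_no_override=1" would set the flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace approximation of kstrtobool(). A NULL string
 * (option given without "=value") is rejected, as is any unknown spelling.
 */
static int kstrtobool_sketch(const char *s, bool *res)
{
	if (!s)
		return -EINVAL;

	switch (s[0]) {
	case 'y': case 'Y': case 't': case 'T': case '1':
		*res = true;
		return 0;
	case 'n': case 'N': case 'f': case 'F': case '0':
		*res = false;
		return 0;
	case 'o': case 'O':
		/* "on" / "off" are distinguished by the second character. */
		if (s[1] == 'n' || s[1] == 'N') {
			*res = true;
			return 0;
		}
		if (s[1] == 'f' || s[1] == 'F') {
			*res = false;
			return 0;
		}
		break;
	}

	return -EINVAL;
}
```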
Re: [PATCH v3] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 2 years ago
On 12/5/23 05:23, Jack Allister wrote:
> There are certain scenarios where it may be intentional that the EPB was
> set at to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
> data centers a kexec/live-update of the kernel may be performed regularly.
> 
> Usually this live-update is time critical and defaulting of the bias back
> to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
> update time if processors' time to ramp up/boost are affected.

If this makes your kexecs 7 times faster, please say that here.

Could we also please make this less wishy-washy?  "May actually be
detrimental" does not scream how critical this is for you.

> This patch introduces a kernel command line "intel_epb_no_override"
> which will leave the EPB at performance if during the restoration code path
> it is detected as such.

No "this patch", please:

	https://www.kernel.org/doc/html/next/process/maintainer-tip.html

This also needs documentation of the parameter in
Documentation/admin-guide/kernel-parameters.txt.

Let me see if I can write a sane changelog, summarizing the discussion
here for posterity.  If there's confusion about a v1 patch that's
cleared up in the discussion, it would be wonderful to capture that in
the v2 changelog as opposed to making minimal changes.  How's this?  I
think it captures some of the things that Rafael related and also
additional information about the use case that motivated this effort.

--

Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB),
which can cause overheating.  The kernel overrides any boot-time EPB
"performance" bias to "normal" to avoid this.

<Hardware name here> platforms can tolerate a "performance" bias during
boot without overheating.  In addition, because of <root cause(s) here>,
a kexec with a "normal" bias is seven times slower than one with a
"performance" bias.  Boot time is critical when performing a
kexec/live-update of a kernel that is running guest VMs, since boot
time appears as guest latency or downtime.

Introduce a command-line parameter, "intel_epb_no_override", to skip the
"performance"=>"normal" override.  This allows folks to get a speedy
kexec without exposing other folks with wonky BIOSes to overheating.
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 2 years ago
On 12/4/23 09:28, Jack Allister wrote:
> There are certain scenarios where it may be intentional that the EPB was
> set at to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
> data centers a kexec/live-update of the kernel may be performed regularly.
> 
> Usually this live-update is time critical and defaulting of the bias back
> to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
> update time if processors' time to ramp up/boost are affected.
> 
> This patch introduces a kernel command line "intel_epb_keep_performance"
> which will leave the EPB at performance if during the restoration code path
> it is detected as such.

Folks, while I appreciate the effort to upstream things that you have
kept out of tree up until now, I don't think this is the right way.

In general new kernel command-line options are a last resort.

> diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> index e4c3ba91321c..0c7dd092f723 100644
> --- a/arch/x86/kernel/cpu/intel_epb.c
> +++ b/arch/x86/kernel/cpu/intel_epb.c
> @@ -50,7 +50,8 @@
>   * the OS will do that anyway.  That sometimes is problematic, as it may cause
>   * the system battery to drain too fast, for example, so it is better to adjust
>   * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
> - * kernel changes it to 6 ('normal').
> + * kernel changes it to 6 ('normal'). This however is overridable via
> + * intel_epb_keep_performance if required.
>   */
>  
>  static DEFINE_PER_CPU(u8, saved_epb);
> @@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
>  	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
>  };
>  
> +static bool intel_epb_keep_performance __read_mostly;
> +
>  static int intel_epb_save(void)
>  {
>  	u64 epb;
> @@ -107,8 +110,12 @@ static void intel_epb_restore(void)
>  		 */
>  		val = epb & EPB_MASK;
>  		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
> -			val = energ_perf_values[EPB_INDEX_NORMAL];
> -			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> +			if (!intel_epb_keep_performance) {
> +				val = energ_perf_values[EPB_INDEX_NORMAL];
> +				pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> +			} else {
> +				pr_warn_once("ENERGY_PERF_BIAS: Kept at 'performance', no change\n");
> +			}
>  		}

This is fundamentally a hack.

It sounds like you want the system default to be at
ENERGY_PERF_BIAS_PERFORMANCE.  You also mentioned that this was done "on
kernel boot".  How did you do that, exactly?  Shouldn't that "on kernel
boot" action be reflected over here rather than introducing another
command-line parameter?
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Jack Allister 2 years ago
> This is fundamentally a hack.

I don't entirely agree that this is a hack. Setting it to
ENERGY_PERF_BIAS_NORMAL is an equivalent workaround, already present,
for systems (laptops etc.) where the platform firmware has not
configured it as expected.

> It sounds like you want the system default to be at
> ENERGY_PERF_BIAS_PERFORMANCE.  You also mentioned that this was done "on
> kernel boot".  How did you do that, exactly?  Shouldn't that "on kernel
> boot" action be reflected over here rather than introducing another
> command-line parameter?

As Paul has mentioned, we perform live-updates of the host kernel running
on servers. This is done while virtual machines are still running so that
there is no perceived downtime for the guest/customer. This requires a
kexec into the new kernel and there are specific areas such as PCI device
enumeration which can take a substantial amount of time in its current
form and can be perceived as downtime while the kernel is loading.

> Shouldn't that "on kernel
> boot" action be reflected over here rather than introducing another
> command-line parameter?

A kernel parameter may not be the most elegant solution; would a proposal
for a kernel build configuration option be more suitable?
RE: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Durrant, Paul 2 years ago
> -----Original Message-----
> From: Dave Hansen <dave.hansen@intel.com>
> Sent: 04 December 2023 17:45
> To: Allister, Jack <jalliste@amazon.co.uk>; tglx@linutronix.de;
> mingo@redhat.com; bp@alien8.de; dave.hansen@linux.intel.com;
> hpa@zytor.com; rafael@kernel.org; len.brown@intel.com
> Cc: Durrant, Paul <pdurrant@amazon.co.uk>; Wang, Jue <juew@amazon.com>;
> Usama Arif <usama.arif@bytedance.com>; x86@kernel.org; Hans de Goede
> <hdegoede@redhat.com>; Peter Zijlstra <peterz@infradead.org>; Rafael J.
> Wysocki <rafael.j.wysocki@intel.com>; linux-kernel@vger.kernel.org
> Subject: RE: [EXTERNAL] [PATCH] x86: intel_epb: Add earlyparam option to
> keep bias at performance
> 
> On 12/4/23 09:28, Jack Allister wrote:
> > There are certain scenarios where it may be intentional that the EPB was
> > set at to 0/ENERGY_PERF_BIAS_PERFORMANCE on kernel boot. For example, in
> > data centers a kexec/live-update of the kernel may be performed regularly.
> >
> > Usually this live-update is time critical and defaulting of the bias back
> > to ENERGY_PERF_BIAS_NORMAL may actually be detrimental to the overall
> > update time if processors' time to ramp up/boost are affected.
> >
> > This patch introduces a kernel command line "intel_epb_keep_performance"
> > which will leave the EPB at performance if during the restoration code path
> > it is detected as such.
> 
> Folks, while I appreciate the effort to upstream thing that you have
> kept out of tree up until now, I don't think this is the right way.
> 
> In general new kernel command-line options are a last resort.
> 
> > diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
> > index e4c3ba91321c..0c7dd092f723 100644
> > --- a/arch/x86/kernel/cpu/intel_epb.c
> > +++ b/arch/x86/kernel/cpu/intel_epb.c
> > @@ -50,7 +50,8 @@
> >   * the OS will do that anyway.  That sometimes is problematic, as it may cause
> >   * the system battery to drain too fast, for example, so it is better to adjust
> >   * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
> > - * kernel changes it to 6 ('normal').
> > + * kernel changes it to 6 ('normal'). This however is overridable via
> > + * intel_epb_keep_performance if required.
> >   */
> >
> >  static DEFINE_PER_CPU(u8, saved_epb);
> > @@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
> >       [EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
> >  };
> >
> > +static bool intel_epb_keep_performance __read_mostly;
> > +
> >  static int intel_epb_save(void)
> >  {
> >       u64 epb;
> > @@ -107,8 +110,12 @@ static void intel_epb_restore(void)
> >                */
> >               val = epb & EPB_MASK;
> >               if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
> > -                     val = energ_perf_values[EPB_INDEX_NORMAL];
> > -                     pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> > +                     if (!intel_epb_keep_performance) {
> > +                             val = energ_perf_values[EPB_INDEX_NORMAL];
> > +                             pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
> > +                     } else {
> > +                             pr_warn_once("ENERGY_PERF_BIAS: Kept at 'performance', no change\n");
> > +                     }
> >               }
> 
> This is fundamentally a hack.
> 

Actually, it's working around a hack. The existing comment in the code just above that hunk is:

		/*
		 * Because intel_epb_save() has not run for the current CPU yet,
		 * it is going online for the first time, so if its EPB value is
		 * 0 ('performance') at this point, assume that it has not been
		 * initialized by the platform firmware and set it to 6
		 * ('normal').
		 */

> It sounds like you want the system default to be at
> ENERGY_PERF_BIAS_PERFORMANCE.  You also mentioned that this was done "on
> kernel boot".  How did you do that, exactly?  Shouldn't that "on kernel
> boot" action be reflected over here rather than introducing another
> command-line parameter?
> 

The problem is that this will take effect even on a kexec, and hence it is throttling
a system that set ENERGY_PERF_BIAS_PERFORMANCE prior to the kexec.  We use kexec to
live update the host kernel of our systems whilst leaving virtual machines running.
This resetting of the perf bias is having a very detrimental effect on the downtime
of our systems across the live update: about a 7-fold increase.

  Paul
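The first-online branch described by that comment can be sketched as a pure function in userspace C. The constants (EPB assumed to occupy the low 4 bits, 0 for 'performance', 6 for 'normal') follow the comment and the patch context; the function name is hypothetical, and the no_override argument plays the role of the intel_epb_no_override flag from the later revisions. The kernel itself takes the replacement from energ_perf_values[EPB_INDEX_NORMAL] rather than hardcoding 6.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPB_MASK			0x0fU	/* assumed: EPB in bits 3:0 */
#define ENERGY_PERF_BIAS_PERFORMANCE	0
#define ENERGY_PERF_BIAS_NORMAL		6

/*
 * Hypothetical sketch: value to write back when a CPU comes online for the
 * first time. A firmware value of 0 ('performance') is treated as
 * uninitialized and replaced by 'normal', unless the override is disabled.
 */
static uint8_t epb_first_online_value(uint64_t msr, bool no_override)
{
	uint8_t val = (uint8_t)(msr & EPB_MASK);

	if (!no_override && val == ENERGY_PERF_BIAS_PERFORMANCE)
		val = ENERGY_PERF_BIAS_NORMAL;

	return val;
}
```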
RE: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by David Woodhouse 2 years ago
Paul writes:
> The problem is that this will take effect even on a kexec and hence it is throttling
> a system that set ENERGY_PERF_BIAS_PERFORMANCE prior to the kexec.  We use kexec to
> live update the host kernel of our systems whilst leaving virtual machines running.
> This resetting of the perf bias is having a very detrimental effect on the downtime
> of our systems across the live update - about a 7 fold increase.

It isn't just about kexec, is it? Even on a clean boot, why wouldn't we want to stay in performance mode until the kernel has *finished* booting? It's literally adding seconds to the startup time in some cases.

And yes, we *particularly* care in the kexec case because guests experience it as excessive steal time. But it ain't great in the general case either, surely?
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Rafael J. Wysocki 2 years ago
On Tue, Dec 5, 2023 at 1:00 PM David Woodhouse <dwmw2@infradead.org> wrote:
>
>
> Paul writes:
> > The problem is that this will take effect even on a kexec and hence it is throttling
> > a system that set ENERGY_PERF_BIAS_PERFORMANCE prior to the kexec.  We use kexec to
> > live update the host kernel of our systems whilst leaving virtual machines running.
> > This resetting of the perf bias is having a very detrimental effect on the downtime
> > of our systems across the live update - about a 7 fold increase.
>
> It isn't just about kexec, is it? Even in a clean boot why wouldn't we want to stay in performance mode until the kernel has *finished* booting?

Because it may overheat during that period.

> It's literally adding seconds to the startup time in some cases.
>
> And yes, we *particularly* care in the kexec case because guests experience it as excessive steal time. But it ain't great in the general case either, surely?

So IMV it would be perfectly fine to add a command line arg to provide
the initial value of energy_perf_bias for the ones who know what they
are doing.
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 2 years ago
On 12/5/23 04:12, Rafael J. Wysocki wrote:
>> And yes, we *particularly* care in the kexec case because guests experience it as excessive steal time. But it ain't great in the general case either, surely?
> So IMV it would be perfectly fine to add a command line arg to provide
> the initial value of energy_perf_bias for the ones who know what they
> are doing.

Let's say we're on a system where the default is "normal" and the user
actually decides they want "performance"?  Is that rational?  Should the
command-line be more general in specifying a desired performance level
instead of just flipping the hack on and off?

We could, for instance just support this pair:

	intel_epb=auto		(default, will hack performance=>normal)
	intel_epb=preserve	(leave it alone)

for now.  That would give us the existing behavior and the override that
folks want for kexec.  But it would also leave open the possibility to
do things like this in the future:

	intel_epb=normal	(always override to normal)
	intel_epb=performance	(always override to performance)
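Dave's auto/preserve proposal, including the possible future normal/performance modes, could be represented as an enum plus a lookup table. The sketch below is hypothetical (none of these names come from the patches) and uses 6 and 0 for 'normal' and 'performance', as in the code comment quoted earlier in the thread. In the kernel, the lookup loop could likely be replaced by the match_string() helper.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical modes; only AUTO and PRESERVE were proposed for now. */
enum intel_epb_mode {
	INTEL_EPB_AUTO,		/* default: override performance => normal */
	INTEL_EPB_PRESERVE,	/* leave the firmware value alone */
	INTEL_EPB_NORMAL,	/* possible future: always force normal */
	INTEL_EPB_PERFORMANCE,	/* possible future: always force performance */
};

static const char *const intel_epb_mode_names[] = {
	[INTEL_EPB_AUTO]	= "auto",
	[INTEL_EPB_PRESERVE]	= "preserve",
	[INTEL_EPB_NORMAL]	= "normal",
	[INTEL_EPB_PERFORMANCE]	= "performance",
};

/* Returns the matching mode, AUTO for no value, or -1 if unrecognized. */
static int intel_epb_parse_mode(const char *str)
{
	if (!str)
		return INTEL_EPB_AUTO;

	for (size_t i = 0; i < sizeof(intel_epb_mode_names) / sizeof(intel_epb_mode_names[0]); i++)
		if (!strcmp(str, intel_epb_mode_names[i]))
			return (int)i;

	return -1;
}

/* EPB value to program on first online, given the firmware value. */
static unsigned int intel_epb_initial_value(enum intel_epb_mode mode, unsigned int fw_val)
{
	switch (mode) {
	case INTEL_EPB_AUTO:
		return fw_val == 0 ? 6 : fw_val;
	case INTEL_EPB_NORMAL:
		return 6;
	case INTEL_EPB_PERFORMANCE:
		return 0;
	case INTEL_EPB_PRESERVE:
	default:
		return fw_val;
	}
}
```

Keeping "auto" and "preserve" as named modes, rather than accepting only a raw number as in Rafael's suggestion, leaves room for explicit values later without changing the meaning of the existing strings.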
Re: [PATCH] x86: intel_epb: Add earlyparam option to keep bias at performance
Posted by Dave Hansen 2 years ago
On 12/5/23 07:19, Dave Hansen wrote:
> We could, for instance just support this pair:
> 
> 	intel_epb=auto		(default, will hack performance=>normal)
> 	intel_epb=preserve	(leave it alone)
> 
> for now. 

Oh, and in code, this is literally as simple as:

-early_param("intel_epb_no_override", intel_epb_no_override_setup);
+early_param("intel_epb=preserve", intel_epb_no_override_setup);

You don't even need to go looking for "=auto" if you only have one other
option.
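How a registered name is matched decides what the handler sees as its argument. Assuming the early-parameter parser splits each command-line token at the first '=' and compares only the name part against each registered string (a simplification of the kernel's parse_args()/next_arg() handling, which also deals with quoting), a handler registered under the bare name "intel_epb" receives "preserve" as its argument, which is the shape the v5 handler uses. A minimal userspace sketch of that matching rule, with hypothetical helper names:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: split a command-line token at the first '='.
 * The name is copied into buf; *val points at the value, or is NULL
 * when the token has no '='. Returns false if the name does not fit.
 */
static bool split_param(const char *token, char *buf, size_t len, const char **val)
{
	const char *eq = strchr(token, '=');
	size_t n = eq ? (size_t)(eq - token) : strlen(token);

	if (n >= len)
		return false;

	memcpy(buf, token, n);
	buf[n] = '\0';
	*val = eq ? eq + 1 : NULL;
	return true;
}

/* A handler registered under `name` fires only if the name part matches exactly. */
static bool param_matches(const char *token, const char *name, const char **val)
{
	char buf[64];

	return split_param(token, buf, sizeof(buf), val) && !strcmp(buf, name);
}
```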