From: Jack Allister
Cc: Jack Allister, "Rafael J . Wysocki", Paul Durrant, Jue Wang,
    Usama Arif, Jonathan Corbet, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, "H. Peter Anvin", "Paul E. McKenney",
    Randy Dunlap, Tejun Heo, Peter Zijlstra, Yan-Jie Wang, Hans de Goede
Subject: [PATCH v5] x86: intel_epb: Add earlyparam option to keep bias at performance
Date: Wed, 3 Jan 2024 14:46:04 +0000
Message-ID: <20240103144607.46369-1-jalliste@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <83431857-7182-471a-9ff1-9dac37e5a02f@intel.com>
References: <83431857-7182-471a-9ff1-9dac37e5a02f@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Buggy BIOSes may not set a sane boot-time Energy Performance Bias (EPB).
A result of this may be overheating or excess power usage. The kernel
overrides any boot-time EPB "performance" bias to "normal" to avoid this.

When used in data centers it is preferable to keep the EPB at
"performance" when performing a live-update of the host kernel via a
kexec to the new kernel. Boot time is critical during the kexec because
running guest VMs perceive it as latency or downtime.

On Intel Xeon Ice Lake platforms it has been observed that EPB being set
to "normal" while HWP (Intel Hardware P-states) is enabled/configured
during or close to the kexec increases the live-update/kexec downtime
sevenfold compared to when the EPB is set to "performance".

Introduce a command-line parameter, "intel_epb=preserve", to skip the
"performance" -> "normal" override/workaround. This maintains the prior
behaviour when no parameter is set, but adds the ability to stay at
"performance" for a speedy kexec if a user wishes.

Signed-off-by: Jack Allister
Acked-by: Rafael J. Wysocki
Cc: Paul Durrant
Cc: Jue Wang
Cc: Usama Arif
---
 .../admin-guide/kernel-parameters.txt         | 12 +++++++++++
 arch/x86/kernel/cpu/intel_epb.c               | 21 +++++++++++++++++--
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..5602ee213115 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2148,6 +2148,18 @@
 			0	disables intel_idle and fall back on acpi_idle.
 			1 to 9	specify maximum depth of C-state.
 
+	intel_epb=	[X86]
+			auto
+			  Same as not passing a parameter to intel_epb. This will
+			  ensure that the intel_epb module will restore the energy
+			  performance bias to "normal" at boot-time. This workaround
+			  is for buggy BIOSes which may not set this value and cause
+			  either overheating or excess power usage.
+			preserve
+			  At kernel boot-time if the EPB value is read as "performance"
+			  keep it at this value. This prevents the "performance" -> "normal"
+			  transition which is a workaround mentioned above.
+
 	intel_pstate=	[X86]
 			disable
 			  Do not enable intel_pstate as the default
diff --git a/arch/x86/kernel/cpu/intel_epb.c b/arch/x86/kernel/cpu/intel_epb.c
index e4c3ba91321c..419e699a43e6 100644
--- a/arch/x86/kernel/cpu/intel_epb.c
+++ b/arch/x86/kernel/cpu/intel_epb.c
@@ -50,7 +50,8 @@
  * the OS will do that anyway. That sometimes is problematic, as it may cause
  * the system battery to drain too fast, for example, so it is better to adjust
  * it on CPU bring-up and if the initial EPB value for a given CPU is 0, the
- * kernel changes it to 6 ('normal').
+ * kernel changes it to 6 ('normal'). However, if it is desirable to retain the
+ * original initial EPB value, intel_epb=preserve can be set to enforce it.
  */
 
 static DEFINE_PER_CPU(u8, saved_epb);
@@ -75,6 +76,8 @@ static u8 energ_perf_values[] = {
 	[EPB_INDEX_POWERSAVE] = ENERGY_PERF_BIAS_POWERSAVE,
 };
 
+static bool intel_epb_no_override __read_mostly;
+
 static int intel_epb_save(void)
 {
 	u64 epb;
@@ -106,7 +109,7 @@ static void intel_epb_restore(void)
 		 * ('normal').
 		 */
 		val = epb & EPB_MASK;
-		if (val == ENERGY_PERF_BIAS_PERFORMANCE) {
+		if (!intel_epb_no_override && val == ENERGY_PERF_BIAS_PERFORMANCE) {
 			val = energ_perf_values[EPB_INDEX_NORMAL];
 			pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'\n");
 		}
@@ -213,6 +216,20 @@ static const struct x86_cpu_id intel_epb_normal[] = {
 	{}
 };
 
+static __init int parse_intel_epb(char *str)
+{
+	if (!str)
+		return 0;
+
+	/* "intel_epb=preserve" prevents PERFORMANCE->NORMAL on restore. */
+	if (!strcmp(str, "preserve"))
+		intel_epb_no_override = true;
+
+	return 0;
+}
+
+early_param("intel_epb", parse_intel_epb);
+
 static __init int intel_epb_init(void)
 {
 	const struct x86_cpu_id *id = x86_match_cpu(intel_epb_normal);
-- 
2.40.1
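
For reviewers who want to check the decision table without booting a
kernel, the override logic can be modelled in user space. This is only a
rough sketch and not part of the patch. The helper names
parse_intel_epb_arg(), epb_boot_value() and epb_selfcheck() are
hypothetical, the mask value is an assumption, and the 0 ('performance')
and 6 ('normal') encodings are taken from the intel_epb.c comment quoted
in the diff. It also ignores the saved_epb path and models only the case
where no value has been saved yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed values: low four bits hold the EPB hint; 0 and 6 per the
 * intel_epb.c comment ("if the initial EPB value ... is 0, the kernel
 * changes it to 6 ('normal')"). */
#define EPB_MASK                      0x0fULL
#define ENERGY_PERF_BIAS_PERFORMANCE  0
#define ENERGY_PERF_BIAS_NORMAL       6

/* Hypothetical stand-in for the early_param handler: true only for
 * "preserve"; a missing value or any other string keeps the default. */
static bool parse_intel_epb_arg(const char *str)
{
	return str && strcmp(str, "preserve") == 0;
}

/* Hypothetical model of the boot-time branch: returns the EPB value
 * that would be written back when no saved value exists. */
static uint8_t epb_boot_value(uint64_t epb_msr, bool no_override)
{
	uint8_t val = (uint8_t)(epb_msr & EPB_MASK);

	if (!no_override && val == ENERGY_PERF_BIAS_PERFORMANCE)
		val = ENERGY_PERF_BIAS_NORMAL;

	return val;
}

/* Self-check covering the three cases the patch distinguishes. */
static void epb_selfcheck(void)
{
	/* Default: 'performance' is rewritten to 'normal'. */
	assert(epb_boot_value(0, false) == ENERGY_PERF_BIAS_NORMAL);
	/* intel_epb=preserve: 'performance' is left alone. */
	assert(epb_boot_value(0, parse_intel_epb_arg("preserve")) ==
	       ENERGY_PERF_BIAS_PERFORMANCE);
	/* Any other initial value is never touched. */
	assert(epb_boot_value(15, false) == 15);
}
```

Calling epb_selfcheck() from a small main() exercises the default, the
"preserve" and the non-performance cases. "auto" takes the same path as
passing no parameter, since only the "preserve" string sets the flag.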