From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com
Subject: [PATCH v4 01/18] PM: EM: Add missing newline for the message log
Date: Mon, 25 Sep 2023 09:11:22 +0100
Message-Id: <20230925081139.1305766-2-lukasz.luba@arm.com>

Add the missing newline to the log string in the error code path.

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 7b44f5b89fa1..8b9dd4a39f63 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -250,7 +250,7 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
 
 	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
 	if (!policy) {
-		dev_warn(dev, "EM: Access to CPUFreq policy failed");
+		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
 		return;
 	}
 
-- 
2.25.1

From: Lukasz Luba
Subject: [PATCH v4 02/18] PM: EM: Refactor em_cpufreq_update_efficiencies() arguments
Date: Mon, 25 Sep 2023 09:11:23 +0100
Message-Id: <20230925081139.1305766-3-lukasz.luba@arm.com>

In order to prepare the code for the modifiable EM perf_state table,
refactor the existing function em_cpufreq_update_efficiencies() so that
the caller passes in the perf_state table to use, instead of the function
reading pd->table itself.

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 8b9dd4a39f63..42486674b834 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -237,10 +237,10 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 }
 
-static void em_cpufreq_update_efficiencies(struct device *dev)
+static void
+em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 {
 	struct em_perf_domain *pd = dev->em_pd;
-	struct em_perf_state *table;
 	struct cpufreq_policy *policy;
 	int found = 0;
 	int i;
@@ -254,8 +254,6 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
 		return;
 	}
 
-	table = pd->table;
-
 	for (i = 0; i < pd->nr_perf_states; i++) {
 		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
 			continue;
@@ -397,7 +395,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 	dev->em_pd->flags |= flags;
 
-	em_cpufreq_update_efficiencies(dev);
+	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
-- 
2.25.1

From: Lukasz Luba
Subject: [PATCH v4 03/18] PM: EM: Find first CPU online while updating OPP efficiency
Date: Mon, 25 Sep 2023 09:11:24 +0100
Message-Id: <20230925081139.1305766-4-lukasz.luba@arm.com>

The Energy Model might be updated at runtime, and the energy efficiency
of each OPP may change. Thus, the cpufreq framework also needs to be
updated to stay aligned with the new values. To do that, use the first
online CPU from the Performance Domain.

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 42486674b834..3dafdd7731c4 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -243,12 +243,19 @@ em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 	struct em_perf_domain *pd = dev->em_pd;
 	struct cpufreq_policy *policy;
 	int found = 0;
-	int i;
+	int i, cpu;
 
 	if (!_is_cpu_device(dev) || !pd)
 		return;
 
-	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
+	/* Try to get a CPU which is online and in this PD */
+	cpu = cpumask_first_and(em_span_cpus(pd), cpu_active_mask);
+	if (cpu >= nr_cpu_ids) {
+		dev_warn(dev, "EM: No online CPU for CPUFreq policy\n");
+		return;
+	}
+
+	policy = cpufreq_cpu_get(cpu);
 	if (!policy) {
 		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
 		return;
-- 
2.25.1

From: Lukasz Luba
Subject: [PATCH v4 04/18] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
Date: Mon, 25 Sep 2023 09:11:25 +0100
Message-Id: <20230925081139.1305766-5-lukasz.luba@arm.com>

The Energy Model (EM) is going to support runtime modification, so there
will be two EM tables storing performance state information. This patch
prepares the code to be generic and to work on either of those tables.
The function will no longer take a pointer to 'struct em_perf_domain'
(the EM), but instead a pointer to 'struct em_perf_state' (one of the
EM's tables).

Prepare em_pd_get_efficient_state() for the upcoming changes and make it
reusable.

Return the index of the best performance state in a given EM table. The
new function arguments allow it to work on different performance state
arrays, and the caller of em_pd_get_efficient_state() can use the
returned index on either the default or the modifiable EM table.

Signed-off-by: Lukasz Luba
Reviewed-by: Daniel Lezcano
---
 include/linux/energy_model.h | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b9caa01dfac4..8069f526c9d8 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -175,33 +175,35 @@ void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
- * @pd   : Performance domain for which we want an efficient frequency
- * @freq : Frequency to map with the EM
+ * @table: List of performance states, in ascending order
+ * @nr_perf_states: Number of performance states
+ * @freq: Frequency to map with the EM
+ * @pd_flags: Performance Domain flags
  *
  * It is called from the scheduler code quite frequently and as a consequence
  * doesn't implement any check.
  *
- * Return: An efficient performance state, high enough to meet @freq
+ * Return: An efficient performance state id, high enough to meet @freq
  * requirement.
  */
-static inline
-struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
-						 unsigned long freq)
+static inline int
+em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
+			  unsigned long freq, unsigned long pd_flags)
 {
 	struct em_perf_state *ps;
 	int i;
 
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
+	for (i = 0; i < nr_perf_states; i++) {
+		ps = &table[i];
 		if (ps->frequency >= freq) {
-			if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
+			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
 			    ps->flags & EM_PERF_STATE_INEFFICIENT)
 				continue;
-			break;
+			return i;
 		}
 	}
 
-	return ps;
+	return nr_perf_states - 1;
 }
 
 /**
@@ -226,7 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, scale_cpu;
 	struct em_perf_state *ps;
-	int cpu;
+	int cpu, i;
 
 	if (!sum_util)
 		return 0;
@@ -251,7 +253,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	ps = em_pd_get_efficient_state(pd, freq);
+	i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
+				      pd->flags);
+	ps = &pd->table[i];
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
-- 
2.25.1

From: Lukasz Luba
Subject: [PATCH v4 05/18] PM: EM: Refactor a new function em_compute_costs()
Date: Mon, 25 Sep 2023 09:11:26 +0100
Message-Id: <20230925081139.1305766-6-lukasz.luba@arm.com>

Move the cost computation into a dedicated function, em_compute_costs(),
which will be easier to maintain and reuse in the future. The upcoming
changes for the modifiable EM perf_state table will use it instead of
duplicating the code.

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 72 ++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 3dafdd7731c4..7ea882401833 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -103,14 +103,52 @@ static void em_debug_create_pd(struct device *dev) {}
 static void em_debug_remove_pd(struct device *dev) {}
 #endif
 
+static int em_compute_costs(struct device *dev, struct em_perf_state *table,
+			    struct em_data_callback *cb, int nr_states,
+			    unsigned long flags)
+{
+	unsigned long prev_cost = ULONG_MAX;
+	u64 fmax;
+	int i, ret;
+
+	/* Compute the cost of each performance state. */
+	fmax = (u64) table[nr_states - 1].frequency;
+	for (i = nr_states - 1; i >= 0; i--) {
+		unsigned long power_res, cost;
+
+		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+			ret = cb->get_cost(dev, table[i].frequency, &cost);
+			if (ret || !cost || cost > EM_MAX_POWER) {
+				dev_err(dev, "EM: invalid cost %lu %d\n",
+					cost, ret);
+				return -EINVAL;
+			}
+		} else {
+			power_res = table[i].power;
+			cost = div64_u64(fmax * power_res, table[i].frequency);
+		}
+
+		table[i].cost = cost;
+
+		if (table[i].cost >= prev_cost) {
+			table[i].flags = EM_PERF_STATE_INEFFICIENT;
+			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
+				table[i].frequency);
+		} else {
+			prev_cost = table[i].cost;
+		}
+	}
+
+	return 0;
+}
+
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				int nr_states, struct em_data_callback *cb,
 				unsigned long flags)
 {
-	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
+	unsigned long power, freq, prev_freq = 0;
 	struct em_perf_state *table;
 	int i, ret;
-	u64 fmax;
 
 	table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
 	if (!table)
@@ -154,33 +192,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].frequency = prev_freq = freq;
 	}
 
-	/* Compute the cost of each performance state. */
-	fmax = (u64) table[nr_states - 1].frequency;
-	for (i = nr_states - 1; i >= 0; i--) {
-		unsigned long power_res, cost;
-
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
-			ret = cb->get_cost(dev, table[i].frequency, &cost);
-			if (ret || !cost || cost > EM_MAX_POWER) {
-				dev_err(dev, "EM: invalid cost %lu %d\n",
-					cost, ret);
-				goto free_ps_table;
-			}
-		} else {
-			power_res = table[i].power;
-			cost = div64_u64(fmax * power_res, table[i].frequency);
-		}
-
-		table[i].cost = cost;
-
-		if (table[i].cost >= prev_cost) {
-			table[i].flags = EM_PERF_STATE_INEFFICIENT;
-			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
-				table[i].frequency);
-		} else {
-			prev_cost = table[i].cost;
-		}
-	}
+	ret = em_compute_costs(dev, table, cb, nr_states, flags);
+	if (ret)
+		goto free_ps_table;
 
 	pd->table = table;
 	pd->nr_perf_states = nr_states;
-- 
2.25.1

From: Lukasz Luba
Subject: [PATCH v4 06/18] PM: EM: Check if the get_cost() callback is present in em_compute_costs()
Date: Mon, 25 Sep 2023 09:11:27 +0100
Message-Id: <20230925081139.1305766-7-lukasz.luba@arm.com>

em_compute_costs() is going to be reused in the runtime-modified EM code
path. Thus, make sure that this common code is safe and won't try to
call through a NULL get_cost() pointer.

Previously, em_compute_costs() didn't have to care about the runtime
modification code path. The upcoming changes introduce that path, but
with a different callback. Both paths, the one that uses get_cost()
(during the first EM registration) and the one that uses update_power()
(during runtime modification), need to be handled safely in
em_compute_costs().
Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 7ea882401833..35e07933b34a 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -116,7 +116,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+		if (flags & EM_PERF_DOMAIN_ARTIFICIAL && cb->get_cost) {
 			ret = cb->get_cost(dev, table[i].frequency, &cost);
 			if (ret || !cost || cost > EM_MAX_POWER) {
 				dev_err(dev, "EM: invalid cost %lu %d\n",
-- 
2.25.1

From: Lukasz Luba
Subject: [PATCH v4 07/18] PM: EM: Refactor struct em_perf_domain and add default_table
Date: Mon, 25 Sep 2023 09:11:28 +0100
Message-Id: <20230925081139.1305766-8-lukasz.luba@arm.com>

The Energy Model is going to support runtime modifications. Refactor the
old implementation, which accessed struct em_perf_state directly, and
introduce em_perf_domain::default_table to clean up the design. This new
field helps to distinguish more clearly between the two performance
state tables.

Update all drivers and frameworks that used the old field
em_perf_domain::table to use em_perf_domain::default_table instead.
Signed-off-by: Lukasz Luba
---
 drivers/powercap/dtpm_cpu.c       | 27 +++++++++++++++++++--------
 drivers/powercap/dtpm_devfreq.c   | 23 ++++++++++++++++-------
 drivers/thermal/cpufreq_cooling.c | 24 ++++++++++++++++--------
 drivers/thermal/devfreq_cooling.c | 23 +++++++++++++++++------
 include/linux/energy_model.h      | 24 ++++++++++++++++++------
 kernel/power/energy_model.c       | 26 ++++++++++++++++++++++----
 6 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index 2ff7717530bf..743a0ac8ecdf 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -43,6 +43,7 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	unsigned long freq;
 	u64 power;
@@ -51,19 +52,21 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
+	table = pd->default_table->state;
+
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * nr_cpus;
+		power = table[i].power * nr_cpus;
 
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
 
 	freq_qos_update_request(&dtpm_cpu->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * nr_cpus;
+	power_limit = table[i - 1].power * nr_cpus;
 
 	return power_limit;
 }
@@ -88,12 +91,14 @@ static u64 scale_pd_power_uw(struct cpumask *pd_mask, u64 power)
 static u64 get_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	struct cpumask *pd_mask;
 	unsigned long freq;
 	int i;
 
 	pd = em_cpu_get(dtpm_cpu->cpu);
+	table = pd->default_table->state;
 
 	pd_mask = em_span_cpus(pd);
 
@@ -101,10 +106,10 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		return scale_pd_power_uw(pd_mask, pd->table[i].power *
+		return scale_pd_power_uw(pd_mask, table[i].power *
 					 MICROWATT_PER_MILLIWATT);
 	}
 
@@ -115,17 +120,20 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *em = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	int nr_cpus;
 
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(em->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
-	dtpm->power_min = em->table[0].power;
+	table = em->default_table->state;
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= MICROWATT_PER_MILLIWATT;
 	dtpm->power_min *= nr_cpus;
 
-	dtpm->power_max = em->table[em->nr_perf_states - 1].power;
+	dtpm->power_max = table[em->nr_perf_states - 1].power;
 	dtpm->power_max *= MICROWATT_PER_MILLIWATT;
 	dtpm->power_max *= nr_cpus;
 
@@ -182,6 +190,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 {
 	struct dtpm_cpu *dtpm_cpu;
 	struct cpufreq_policy *policy;
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	char name[CPUFREQ_NAME_LEN];
 	int ret = -ENOMEM;
@@ -198,6 +207,8 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 	if (!pd || em_is_artificial(pd))
 		return -EINVAL;
 
+	table = pd->default_table->state;
+
 	dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
 	if (!dtpm_cpu)
 		return -ENOMEM;
@@ -216,7 +227,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 
 	ret = freq_qos_add_request(&policy->constraints,
 				   &dtpm_cpu->qos_req, FREQ_QOS_MAX,
-				   pd->table[pd->nr_perf_states - 1].frequency);
+				   table[pd->nr_perf_states - 1].frequency);
 	if (ret)
 		goto out_dtpm_unregister;
 
diff --git a/drivers/powercap/dtpm_devfreq.c b/drivers/powercap/dtpm_devfreq.c
index 91276761a31d..6ef0f2b4a683 100644
--- a/drivers/powercap/dtpm_devfreq.c
+++ b/drivers/powercap/dtpm_devfreq.c
@@ -37,11 +37,14 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 
-	dtpm->power_min = pd->table[0].power;
+	table = pd->default_table->state;
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= MICROWATT_PER_MILLIWATT;
 
-	dtpm->power_max = pd->table[pd->nr_perf_states - 1].power;
+	dtpm->power_max = table[pd->nr_perf_states - 1].power;
 	dtpm->power_max *= MICROWATT_PER_MILLIWATT;
 
 	return 0;
@@ -53,22 +56,25 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 	unsigned long freq;
 	u64 power;
 	int i;
 
+	table = pd->default_table->state;
+
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
+		power = table[i].power * MICROWATT_PER_MILLIWATT;
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
 
 	dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * MICROWATT_PER_MILLIWATT;
+	power_limit = table[i - 1].power * MICROWATT_PER_MILLIWATT;
 
 	return power_limit;
 }
@@ -94,6 +100,7 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long freq;
 	u64 power;
 	int i;
@@ -102,15 +109,17 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	status = devfreq->last_status;
 	mutex_unlock(&devfreq->lock);
 
+	table = pd->default_table->state;
+
 	freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ);
 	_normalize_load(&status);
 
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
+		power = table[i].power * MICROWATT_PER_MILLIWATT;
 		power *= status.busy_time;
 		power >>= 10;
 
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index e2cc7bd30862..d468aca241e2 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -91,10 +91,11 @@ struct cpufreq_cooling_device {
 static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 			       unsigned int freq)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	int i;
 
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
@@ -104,15 +105,16 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 freq)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	unsigned long power_mw;
 	int i;
 
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
-	power_mw = cpufreq_cdev->em->table[i + 1].power;
+	power_mw = table[i + 1].power;
 	power_mw /= MICROWATT_PER_MILLIWATT;
 
 	return power_mw;
@@ -121,18 +123,19 @@ static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 power)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	unsigned long em_power_mw;
 	int i;
 
 	for (i = cpufreq_cdev->max_level; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = cpufreq_cdev->em->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (power >= em_power_mw)
 			break;
 	}
 
-	return cpufreq_cdev->em->table[i].frequency;
+	return table[i].frequency;
 }
 
 /**
@@ -262,8 +265,9 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 			       unsigned long state, u32 *power)
 {
-	unsigned int freq, num_cpus, idx;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	unsigned int freq, num_cpus, idx;
+	struct em_perf_state *table;
 
 	/* Request state should be less than max_level */
 	if (state > cpufreq_cdev->max_level)
@@ -271,8 +275,9 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 
 	num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus);
 
+	table = cpufreq_cdev->em->default_table->state;
 	idx = cpufreq_cdev->max_level - state;
-	freq = cpufreq_cdev->em->table[idx].frequency;
+	freq = table[idx].frequency;
 	*power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
 
 	return 0;
@@ -378,8 +383,11 @@ static unsigned int get_state_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
 	/* Use the Energy Model table if available */
 	if (cpufreq_cdev->em) {
+		struct em_perf_state *table;
+
+		table = cpufreq_cdev->em->default_table->state;
 		idx = cpufreq_cdev->max_level - state;
-		return cpufreq_cdev->em->table[idx].frequency;
+		return table[idx].frequency;
 	}
 #endif
 
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 262e62ab6cf2..4207ef850582 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -87,6 +87,7 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct device *dev = df->dev.parent;
+	struct em_perf_state *table;
 	unsigned long freq;
 	int perf_idx;
=20 @@ -99,8 +100,9 @@ static int devfreq_cooling_set_cur_state(struct thermal_= cooling_device *cdev, return -EINVAL; =20 if (dfc->em_pd) { + table =3D dfc->em_pd->default_table->state; perf_idx =3D dfc->max_state - state; - freq =3D dfc->em_pd->table[perf_idx].frequency * 1000; + freq =3D table[perf_idx].frequency * 1000; } else { freq =3D dfc->freq_table[state]; } @@ -123,10 +125,11 @@ static int devfreq_cooling_set_cur_state(struct therm= al_cooling_device *cdev, */ static int get_perf_idx(struct em_perf_domain *em_pd, unsigned long freq) { + struct em_perf_state *table =3D em_pd->default_table->state; int i; =20 for (i =3D 0; i < em_pd->nr_perf_states; i++) { - if (em_pd->table[i].frequency =3D=3D freq) + if (table[i].frequency =3D=3D freq) return i; } =20 @@ -181,6 +184,7 @@ static int devfreq_cooling_get_requested_power(struct t= hermal_cooling_device *cd struct devfreq_cooling_device *dfc =3D cdev->devdata; struct devfreq *df =3D dfc->devfreq; struct devfreq_dev_status status; + struct em_perf_state *table; unsigned long state; unsigned long freq; unsigned long voltage; @@ -192,6 +196,8 @@ static int devfreq_cooling_get_requested_power(struct t= hermal_cooling_device *cd =20 freq =3D status.current_frequency; =20 + table =3D dfc->em_pd->default_table->state; + if (dfc->power_ops && dfc->power_ops->get_real_power) { voltage =3D get_voltage(df, freq); if (voltage =3D=3D 0) { @@ -204,7 +210,7 @@ static int devfreq_cooling_get_requested_power(struct t= hermal_cooling_device *cd state =3D dfc->capped_state; =20 /* Convert EM power into milli-Watts first */ - dfc->res_util =3D dfc->em_pd->table[state].power; + dfc->res_util =3D table[state].power; dfc->res_util /=3D MICROWATT_PER_MILLIWATT; =20 dfc->res_util *=3D SCALE_ERROR_MITIGATION; @@ -225,7 +231,7 @@ static int devfreq_cooling_get_requested_power(struct t= hermal_cooling_device *cd _normalize_load(&status); =20 /* Convert EM power into milli-Watts first */ - *power =3D dfc->em_pd->table[perf_idx].power; + 
*power =3D table[perf_idx].power; *power /=3D MICROWATT_PER_MILLIWATT; /* Scale power for utilization */ *power *=3D status.busy_time; @@ -245,13 +251,15 @@ static int devfreq_cooling_state2power(struct thermal= _cooling_device *cdev, unsigned long state, u32 *power) { struct devfreq_cooling_device *dfc =3D cdev->devdata; + struct em_perf_state *table; int perf_idx; =20 if (state > dfc->max_state) return -EINVAL; =20 + table =3D dfc->em_pd->default_table->state; perf_idx =3D dfc->max_state - state; - *power =3D dfc->em_pd->table[perf_idx].power; + *power =3D table[perf_idx].power; *power /=3D MICROWATT_PER_MILLIWATT; =20 return 0; @@ -264,6 +272,7 @@ static int devfreq_cooling_power2state(struct thermal_c= ooling_device *cdev, struct devfreq *df =3D dfc->devfreq; struct devfreq_dev_status status; unsigned long freq, em_power_mw; + struct em_perf_state *table; s32 est_power; int i; =20 @@ -273,6 +282,8 @@ static int devfreq_cooling_power2state(struct thermal_c= ooling_device *cdev, =20 freq =3D status.current_frequency; =20 + table =3D dfc->em_pd->default_table->state; + if (dfc->power_ops && dfc->power_ops->get_real_power) { /* Scale for resource utilization */ est_power =3D power * dfc->res_util; @@ -290,7 +301,7 @@ static int devfreq_cooling_power2state(struct thermal_c= ooling_device *cdev, */ for (i =3D dfc->max_state; i > 0; i--) { /* Convert EM power to milli-Watts to make safe comparison */ - em_power_mw =3D dfc->em_pd->table[i].power; + em_power_mw =3D table[i].power; em_power_mw /=3D MICROWATT_PER_MILLIWATT; if (est_power >=3D em_power_mw) break; diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 8069f526c9d8..d236e08e80dc 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -36,9 +36,19 @@ struct em_perf_state { */ #define EM_PERF_STATE_INEFFICIENT BIT(0) =20 +/** + * struct em_perf_table - Performance states table + * @state: List of performance states, in ascending order + * @rcu: RCU used for safe 
access and destruction + */ +struct em_perf_table { + struct em_perf_state *state; + struct rcu_head rcu; +}; + /** * struct em_perf_domain - Performance domain - * @table: List of performance states, in ascending order + * @default_table: Pointer to the default em_perf_table * @nr_perf_states: Number of performance states * @flags: See "em_perf_domain flags" * @cpus: Cpumask covering the CPUs of the domain. It's here @@ -53,7 +63,7 @@ struct em_perf_state { * field is unused. */ struct em_perf_domain { - struct em_perf_state *table; + struct em_perf_table *default_table; int nr_perf_states; unsigned long flags; unsigned long cpus[]; @@ -227,12 +237,14 @@ static inline unsigned long em_cpu_energy(struct em_p= erf_domain *pd, unsigned long allowed_cpu_cap) { unsigned long freq, scale_cpu; - struct em_perf_state *ps; + struct em_perf_state *table, *ps; int cpu, i; =20 if (!sum_util) return 0; =20 + table =3D pd->default_table->state; + /* * In order to predict the performance state, map the utilization of * the most utilized CPU of the performance domain to a requested @@ -243,7 +255,7 @@ static inline unsigned long em_cpu_energy(struct em_per= f_domain *pd, */ cpu =3D cpumask_first(to_cpumask(pd->cpus)); scale_cpu =3D arch_scale_cpu_capacity(cpu); - ps =3D &pd->table[pd->nr_perf_states - 1]; + ps =3D &table[pd->nr_perf_states - 1]; =20 max_util =3D map_util_perf(max_util); max_util =3D min(max_util, allowed_cpu_cap); @@ -253,9 +265,9 @@ static inline unsigned long em_cpu_energy(struct em_per= f_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. 
*/ - i =3D em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq, + i =3D em_pd_get_efficient_state(table, pd->nr_perf_states, freq, pd->flags); - ps =3D &pd->table[i]; + ps =3D &table[i]; =20 /* * The capacity of a CPU in the domain at the performance state (ps) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 35e07933b34a..797141638b29 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -66,6 +66,7 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_flags); =20 static void em_debug_create_pd(struct device *dev) { + struct em_perf_table *table =3D dev->em_pd->default_table; struct dentry *d; int i; =20 @@ -81,7 +82,7 @@ static void em_debug_create_pd(struct device *dev) =20 /* Create a sub-directory for each performance state */ for (i =3D 0; i < dev->em_pd->nr_perf_states; i++) - em_debug_create_ps(&dev->em_pd->table[i], d); + em_debug_create_ps(&table->state[i], d); =20 } =20 @@ -196,7 +197,7 @@ static int em_create_perf_table(struct device *dev, str= uct em_perf_domain *pd, if (ret) goto free_ps_table; =20 - pd->table =3D table; + pd->default_table->state =3D table; pd->nr_perf_states =3D nr_states; =20 return 0; @@ -210,6 +211,7 @@ static int em_create_pd(struct device *dev, int nr_stat= es, struct em_data_callback *cb, cpumask_t *cpus, unsigned long flags) { + struct em_perf_table *default_table; struct em_perf_domain *pd; struct device *cpu_dev; int cpu, ret, num_cpus; @@ -234,8 +236,17 @@ static int em_create_pd(struct device *dev, int nr_sta= tes, return -ENOMEM; } =20 + default_table =3D kzalloc(sizeof(*default_table), GFP_KERNEL); + if (!default_table) { + kfree(pd); + return -ENOMEM; + } + + pd->default_table =3D default_table; + ret =3D em_create_perf_table(dev, pd, nr_states, cb, flags); if (ret) { + kfree(default_table); kfree(pd); return ret; } @@ -358,6 +369,7 @@ int em_dev_register_perf_domain(struct device *dev, uns= igned int nr_states, bool microwatts) { unsigned long cap, prev_cap =3D 0; + struct 
em_perf_state *table; unsigned long flags =3D 0; int cpu, ret; =20 @@ -416,7 +428,8 @@ int em_dev_register_perf_domain(struct device *dev, uns= igned int nr_states, =20 dev->em_pd->flags |=3D flags; =20 - em_cpufreq_update_efficiencies(dev, dev->em_pd->table); + table =3D dev->em_pd->default_table->state; + em_cpufreq_update_efficiencies(dev, table); =20 em_debug_create_pd(dev); dev_info(dev, "EM: created perf domain\n"); @@ -435,12 +448,16 @@ EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); */ void em_dev_unregister_perf_domain(struct device *dev) { + struct em_perf_domain *pd; + if (IS_ERR_OR_NULL(dev) || !dev->em_pd) return; =20 if (_is_cpu_device(dev)) return; =20 + pd =3D dev->em_pd; + /* * The mutex separates all register/unregister requests and protects * from potential clean-up/setup issues in the debugfs directories. @@ -449,7 +466,8 @@ void em_dev_unregister_perf_domain(struct device *dev) mutex_lock(&em_pd_mutex); em_debug_remove_pd(dev); =20 - kfree(dev->em_pd->table); + kfree(pd->default_table->state); + kfree(pd->default_table); kfree(dev->em_pd); dev->em_pd =3D NULL; mutex_unlock(&em_pd_mutex); --=20 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF36BCE7A97 for ; Mon, 25 Sep 2023 08:12:01 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232868AbjIYIMF (ORCPT ); Mon, 25 Sep 2023 04:12:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42418 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232768AbjIYILl (ORCPT ); Mon, 25 Sep 2023 04:11:41 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 672141AC; Mon, 25 Sep 2023 01:11:34 -0700 (PDT) Received: from 
usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5E703DA7; Mon, 25 Sep 2023 01:12:12 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B604D3F5A1; Mon, 25 Sep 2023 01:11:31 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 08/18] PM: EM: Add update_power() callback for runtime modifications Date: Mon, 25 Sep 2023 09:11:29 +0100 Message-Id: <20230925081139.1305766-9-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The Energy Model (EM) is going to support runtime modifications. This new callback will be used by the upcoming EM changes. Drivers or frameworks that want to modify the EM have to implement the update_power() callback.
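A minimal userspace sketch of what a driver-side update_power() implementation might look like, assuming a purely hypothetical model in which power grows by 1% per degree Celsius above 25 C. Only the callback signature is taken from this patch; struct device, struct temp_ctx, temp_update_power() and refresh_power() are illustrative stand-ins, not kernel APIs. refresh_power() mimics the EM core calling the callback once per performance state, with the driver's context passed through priv.

```c
#include <assert.h>
#include <stddef.h>

#define EINVAL 22

/* Hypothetical userspace stand-in for the kernel's struct device. */
struct device {
	const char *name;
};

/* Same signature as the update_power() member of struct em_data_callback. */
typedef int (*update_power_t)(struct device *dev, unsigned long freq,
			      unsigned long *power, void *priv);

/* Hypothetical driver context handed to the callback through @priv. */
struct temp_ctx {
	int temp_c;                     /* current SoC temperature, Celsius */
	const unsigned long *freq_khz;  /* ascending frequencies in kHz */
	const unsigned long *power_25c; /* power characterised at 25 C */
	int nr_states;
};

/*
 * Hypothetical callback: add 1% power per degree above 25 C.
 * A real driver would plug in its own silicon/leakage model here.
 */
static int temp_update_power(struct device *dev, unsigned long freq,
			     unsigned long *power, void *priv)
{
	struct temp_ctx *ctx = priv;
	unsigned long extra;
	int i;

	(void)dev;
	extra = ctx->temp_c > 25 ? (unsigned long)(ctx->temp_c - 25) : 0;

	for (i = 0; i < ctx->nr_states; i++) {
		if (ctx->freq_khz[i] == freq) {
			*power = ctx->power_25c[i] * (100 + extra) / 100;
			return 0;
		}
	}
	return -EINVAL; /* frequency is not part of this table */
}

/*
 * Mimics the EM core loop: call the callback once per performance
 * state, store the returned power and stop at the first error.
 */
static int refresh_power(struct device *dev, update_power_t cb, void *priv,
			 const unsigned long *freq_khz, unsigned long *out,
			 int nr_states)
{
	int i, ret;

	for (i = 0; i < nr_states; i++) {
		ret = cb(dev, freq_khz[i], &out[i], priv);
		if (ret)
			return ret;
	}
	return 0;
}
```

Passing the context through priv keeps the callback free of global state, which is why the signature carries an opaque pointer.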
Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index d236e08e80dc..546dee90f716 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -168,6 +168,26 @@ struct em_data_callback { */ int (*get_cost)(struct device *dev, unsigned long freq, unsigned long *cost); + + /** + * update_power() - Provide new power at the given performance state of + * a device + * @dev : Device for which we do this operation (can be a CPU) + * @freq : Frequency at the performance state in kHz + * @power : New power value at the performance state + * (modified) + * @priv : Pointer to private data useful for tracking context + * during runtime modifications of EM. + * + * The update_power() is used by runtime modifiable EM. It aims to + * provide updated power value for a given frequency, which is stored + * in the performance state. The power value provided by this callback + * should fit in the [0, EM_MAX_POWER] range. + * + * Return 0 on success, or appropriate error value in case of failure. 
+ */ + int (*update_power)(struct device *dev, unsigned long freq, + unsigned long *power, void *priv); }; #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power =3D cb) #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \ @@ -175,6 +195,7 @@ struct em_data_callback { .get_cost =3D _cost_cb } #define EM_DATA_CB(_active_power_cb) \ EM_ADV_DATA_CB(_active_power_cb, NULL) +#define EM_UPDATE_CB(_update_power_cb) { .update_power =3D &_update_power_= cb } =20 struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); @@ -331,6 +352,7 @@ struct em_data_callback {}; #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { } #define EM_DATA_CB(_active_power_cb) { } #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0) +#define EM_UPDATE_CB(_update_cb) { } =20 static inline int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, --=20 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 03910CE7A8C for ; Mon, 25 Sep 2023 08:12:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232882AbjIYIMI (ORCPT ); Mon, 25 Sep 2023 04:12:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42352 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232712AbjIYILo (ORCPT ); Mon, 25 Sep 2023 04:11:44 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8E17611F; Mon, 25 Sep 2023 01:11:37 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 206901424; Mon, 25 Sep 2023 01:12:15 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by 
usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 798103F5A1; Mon, 25 Sep 2023 01:11:34 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 09/18] PM: EM: Introduce runtime modifiable table Date: Mon, 25 Sep 2023 09:11:30 +0100 Message-Id: <20230925081139.1305766-10-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The new runtime table is populated with new power data that better reflects the actual power. The power can vary over time, e.g. due to SoC temperature changes: a higher temperature can increase the power values. In longer-running scenarios, such as gaming or camera use, where other devices (e.g. GPU, ISP) are also active, the CPU power can change. The new EM framework addresses this issue and allows the data to be changed safely at runtime. The runtime modifiable EM data is used by the Energy Aware Scheduler (EAS) for task placement. All other users (thermal, etc.) still use the default (basic) EM. This fact drove the design of this feature.
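The split between a fixed default table and a swappable runtime table can be modelled in userspace as below. This is a sketch under stated assumptions: C11 release/acquire atomics approximate the publish/subscribe ordering of rcu_assign_pointer()/rcu_dereference(), the structs are simplified and only mirror field names used in this series, and runtime_update() is a hypothetical helper, not the kernel's update path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Simplified stand-ins for the structures in this series. */
struct em_perf_state {
	unsigned long frequency; /* kHz */
	unsigned long power;
};

struct em_perf_table {
	struct em_perf_state *state;
};

struct em_perf_domain {
	struct em_perf_table *default_table;         /* never modified */
	struct em_perf_table *_Atomic runtime_table; /* swapped at runtime */
	int nr_perf_states;
};

/* Reader side (e.g. EAS): approximates rcu_dereference(). */
static struct em_perf_table *runtime_get(struct em_perf_domain *pd)
{
	return atomic_load_explicit(&pd->runtime_table, memory_order_acquire);
}

/*
 * Hypothetical writer: build a complete new table first, then publish it
 * in one pointer store, so readers see either the old or the new table,
 * never a half-filled one. Returns the previous table (never NULL here,
 * since runtime_table starts out pointing at the default table), or NULL
 * on allocation failure. The caller may free the previous table only
 * after all readers are done with it, and never if it is the default.
 */
static struct em_perf_table *runtime_update(struct em_perf_domain *pd,
					    const unsigned long *new_power)
{
	struct em_perf_table *t = malloc(sizeof(*t));
	int i;

	if (!t)
		return NULL;
	t->state = calloc(pd->nr_perf_states, sizeof(*t->state));
	if (!t->state) {
		free(t);
		return NULL;
	}
	/* Frequencies come from the default table; only power changes. */
	for (i = 0; i < pd->nr_perf_states; i++) {
		t->state[i].frequency = pd->default_table->state[i].frequency;
		t->state[i].power = new_power[i];
	}
	return atomic_exchange_explicit(&pd->runtime_table, t,
					memory_order_acq_rel);
}
```

Thermal and other users keep reading default_table directly, so only the EAS path pays for the pointer indirection.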
Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 4 +++- kernel/power/energy_model.c | 12 +++++++++++- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 546dee90f716..740e7c25cfff 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -39,7 +39,7 @@ struct em_perf_state { /** * struct em_perf_table - Performance states table * @state: List of performance states, in ascending order - * @rcu: RCU used for safe access and destruction + * @rcu: RCU used only for runtime modifiable table */ struct em_perf_table { struct em_perf_state *state; @@ -49,6 +49,7 @@ struct em_perf_table { /** * struct em_perf_domain - Performance domain * @default_table: Pointer to the default em_perf_table + * @runtime_table: Pointer to the runtime modifiable em_perf_table * @nr_perf_states: Number of performance states * @flags: See "em_perf_domain flags" * @cpus: Cpumask covering the CPUs of the domain. It's here @@ -64,6 +65,7 @@ struct em_perf_table { */ struct em_perf_domain { struct em_perf_table *default_table; + struct em_perf_table __rcu *runtime_table; int nr_perf_states; unsigned long flags; unsigned long cpus[]; diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 797141638b29..5b40db38b745 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -251,6 +251,9 @@ static int em_create_pd(struct device *dev, int nr_stat= es, return ret; } =20 + /* Initialize runtime table as default table. 
*/ + rcu_assign_pointer(pd->runtime_table, default_table); + if (_is_cpu_device(dev)) for_each_cpu(cpu, cpus) { cpu_dev =3D get_cpu_device(cpu); @@ -448,6 +451,7 @@ EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); */ void em_dev_unregister_perf_domain(struct device *dev) { + struct em_perf_table __rcu *runtime_table; struct em_perf_domain *pd; =20 if (IS_ERR_OR_NULL(dev) || !dev->em_pd) @@ -457,18 +461,24 @@ void em_dev_unregister_perf_domain(struct device *dev) return; =20 pd =3D dev->em_pd; - /* * The mutex separates all register/unregister requests and protects * from potential clean-up/setup issues in the debugfs directories. * The debugfs directory name is the same as device's name. */ mutex_lock(&em_pd_mutex); + em_debug_remove_pd(dev); =20 + runtime_table =3D pd->runtime_table; + + rcu_assign_pointer(pd->runtime_table, NULL); + synchronize_rcu(); + kfree(pd->default_table->state); kfree(pd->default_table); kfree(dev->em_pd); + dev->em_pd =3D NULL; mutex_unlock(&em_pd_mutex); } --=20 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6A29DCE7A95 for ; Mon, 25 Sep 2023 08:12:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232781AbjIYIMM (ORCPT ); Mon, 25 Sep 2023 04:12:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232742AbjIYILq (ORCPT ); Mon, 25 Sep 2023 04:11:46 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E30441A7; Mon, 25 Sep 2023 01:11:39 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D7D0DDA7; Mon, 25 Sep 2023 
01:12:17 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 3B8E93F5A1; Mon, 25 Sep 2023 01:11:37 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 10/18] PM: EM: Add RCU mechanism which safely cleans the old data Date: Mon, 25 Sep 2023 09:11:31 +0100 Message-Id: <20230925081139.1305766-11-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The EM is going to support runtime modifications of the power data. Introduce an RCU-safe mechanism to clean up the old allocated EM data. It also adds a mutex for the EM structure to serialize the modifiers.
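The reclamation rule in em_perf_runtime_table_set() (publish the new table, never free the default table, free the replaced runtime table only once readers are done) can be sketched in userspace as follows. There is no real RCU here: a hypothetical queue stands in for call_rcu(), grace_period_elapsed() stands in for the end of a grace period, and the model is single-threaded, so all names are assumptions for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

struct em_perf_table {
	unsigned long *power;
	struct em_perf_table *next_free; /* stand-in for struct rcu_head */
};

struct em_perf_domain {
	struct em_perf_table *default_table;
	struct em_perf_table *runtime_table;
};

/* Hypothetical deferred-free queue standing in for call_rcu(). */
static struct em_perf_table *pending;
static int freed_count;

static void defer_free(struct em_perf_table *t)
{
	t->next_free = pending;
	pending = t;
}

/* Stand-in for the end of a grace period: run the queued frees. */
static void grace_period_elapsed(void)
{
	while (pending) {
		struct em_perf_table *t = pending;

		pending = t->next_free;
		free(t->power);
		free(t);
		freed_count++;
	}
}

/* Same decision as em_perf_runtime_table_set(): keep the default table. */
static void runtime_table_set(struct em_perf_domain *pd,
			      struct em_perf_table *new_table)
{
	struct em_perf_table *old = pd->runtime_table;

	pd->runtime_table = new_table; /* rcu_assign_pointer() in the kernel */
	if (old != pd->default_table)
		defer_free(old);       /* readers may still hold 'old' */
}
```

Deferring the free rather than calling free() right after the pointer swap is what lets readers in the scheduler path use the table without taking em_pd_mutex.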
Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 5b40db38b745..2345837bfd2c 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -23,6 +23,9 @@ */ static DEFINE_MUTEX(em_pd_mutex); =20 +static void em_cpufreq_update_efficiencies(struct device *dev, + struct em_perf_state *table); + static bool _is_cpu_device(struct device *dev) { return (dev->bus =3D=3D &cpu_subsys); @@ -104,6 +107,32 @@ static void em_debug_create_pd(struct device *dev) {} static void em_debug_remove_pd(struct device *dev) {} #endif =20 +static void em_destroy_rt_table_rcu(struct rcu_head *rp) +{ + struct em_perf_table *runtime_table; + + runtime_table =3D container_of(rp, struct em_perf_table, rcu); + kfree(runtime_table->state); + kfree(runtime_table); +} + +static void em_perf_runtime_table_set(struct device *dev, + struct em_perf_table *runtime_table) +{ + struct em_perf_domain *pd =3D dev->em_pd; + struct em_perf_table *tmp; + + tmp =3D pd->runtime_table; + + rcu_assign_pointer(pd->runtime_table, runtime_table); + + em_cpufreq_update_efficiencies(dev, runtime_table->state); + + /* Don't free default table since it's used by other frameworks. 
*/ + if (tmp !=3D pd->default_table) + call_rcu(&tmp->rcu, em_destroy_rt_table_rcu); +} + static int em_compute_costs(struct device *dev, struct em_perf_state *tabl= e, struct em_data_callback *cb, int nr_states, unsigned long flags) --=20 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F502CE7A8C for ; Mon, 25 Sep 2023 08:12:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232798AbjIYIMQ (ORCPT ); Mon, 25 Sep 2023 04:12:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42378 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232792AbjIYIL4 (ORCPT ); Mon, 25 Sep 2023 04:11:56 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id A55DBCC2; Mon, 25 Sep 2023 01:11:42 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9BA73DA7; Mon, 25 Sep 2023 01:12:20 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id F3A433F5A1; Mon, 25 Sep 2023 01:11:39 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 11/18] PM: EM: Add runtime update interface to modify EM power Date: Mon, 25 Sep 2023 09:11:32 +0100 Message-Id: <20230925081139.1305766-12-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 
In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Add an interface which allows modifying the EM power data at runtime. The new power information is populated by the provided callback, which is called for each performance state. The CPU frequencies' efficiency is re-calculated, since it might be affected as well. The old EM memory is going to be freed later using the RCU mechanism. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 8 +++ kernel/power/energy_model.c | 111 +++++++++++++++++++++++++++++++++++ 2 files changed, 119 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 740e7c25cfff..8f055ab356ed 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -201,6 +201,8 @@ struct em_data_callback { =20 struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); +int em_dev_update_perf_domain(struct device *dev, struct em_data_callback = *cb, + void *priv); int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, struct em_data_callback *cb, cpumask_t *span, bool microwatts); @@ -384,6 +386,12 @@ static inline int em_pd_nr_perf_states(struct em_perf_= domain *pd) { return 0; } +static inline +int em_dev_update_perf_domain(struct device *dev, struct em_data_callback = *cb, + void *priv) +{ + return -EINVAL; +} #endif =20 #endif diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 2345837bfd2c..78e1495dc87e 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -172,6 +172,101 @@ static int em_compute_costs(struct device *dev, struc= t em_perf_state *table, return 0; } =20 +/** + * em_dev_update_perf_domain() - Update runtime EM table for a device + * @dev
: Device for which the EM is to be updated + * @cb : Callback function providing the power data for the EM + * @priv : Pointer to private data useful for passing context + * which might be required while calling @cb + * + * Update EM runtime modifiable table for a @dev using the callback + * defined in @cb. The EM new power values are then used for calculating + * the em_perf_state::cost for associated performance state. + * + * This function uses mutex to serialize writers, so it must not be called + * from non-sleeping context. + * + * Return 0 on success or a proper error in case of failure. + */ +int em_dev_update_perf_domain(struct device *dev, struct em_data_callback = *cb, + void *priv) +{ + struct em_perf_table *runtime_table; + unsigned long power, freq; + struct em_perf_domain *pd; + int ret, i; + + if (!cb || !cb->update_power) + return -EINVAL; + + /* + * The lock serializes update and unregister code paths. When the + * EM has been unregistered in the meantime, we should capture that + * when entering this critical section. It also makes sure that + * two concurrent updates will be serialized. + */ + mutex_lock(&em_pd_mutex); + + if (!dev || !dev->em_pd) { + ret =3D -EINVAL; + goto unlock_em; + } + + pd =3D dev->em_pd; + + runtime_table =3D kzalloc(sizeof(*runtime_table), GFP_KERNEL); + if (!runtime_table) { + ret =3D -ENOMEM; + goto unlock_em; + } + + runtime_table->state =3D kcalloc(pd->nr_perf_states, + sizeof(struct em_perf_state), + GFP_KERNEL); + if (!runtime_table->state) { + ret =3D -ENOMEM; + goto free_runtime_table; + } + + /* Populate runtime table with updated values using driver callback */ + for (i =3D 0; i < pd->nr_perf_states; i++) { + freq =3D pd->default_table->state[i].frequency; + runtime_table->state[i].frequency =3D freq; + + /* + * Call driver callback to get a new power value for + * a given frequency. 
+ */ + ret =3D cb->update_power(dev, freq, &power, priv); + if (ret) { + dev_dbg(dev, "EM: runtime update error: %d\n", ret); + goto free_runtime_state_table; + } + + runtime_table->state[i].power =3D power; + } + + ret =3D em_compute_costs(dev, runtime_table->state, cb, + pd->nr_perf_states, pd->flags); + if (ret) + goto free_runtime_state_table; + + em_perf_runtime_table_set(dev, runtime_table); + + mutex_unlock(&em_pd_mutex); + return 0; + +free_runtime_state_table: + kfree(runtime_table->state); +free_runtime_table: + kfree(runtime_table); +unlock_em: + mutex_unlock(&em_pd_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(em_dev_update_perf_domain); + static int em_create_perf_table(struct device *dev, struct em_perf_domain = *pd, int nr_states, struct em_data_callback *cb, unsigned long flags) @@ -494,6 +589,8 @@ void em_dev_unregister_perf_domain(struct device *dev) * The mutex separates all register/unregister requests and protects * from potential clean-up/setup issues in the debugfs directories. * The debugfs directory name is the same as device's name. + * The lock also protects the updater of the runtime modifiable + * EM and this remover. */ mutex_lock(&em_pd_mutex); =20 @@ -501,9 +598,23 @@ void em_dev_unregister_perf_domain(struct device *dev) =20 runtime_table =3D pd->runtime_table; =20 + /* + * Safely destroy runtime modifiable EM. By using the call + * synchronize_rcu() we make sure we don't progress till last user + * finished the RCU section and our update got applied. + */ rcu_assign_pointer(pd->runtime_table, NULL); synchronize_rcu(); =20 + /* + * After the sync no updates will be in-flight, so free the + * memory allocated for runtime table (if there was such).
+ */ + if (runtime_table !=3D pd->default_table) { + kfree(runtime_table->state); + kfree(runtime_table); + } + kfree(pd->default_table->state); kfree(pd->default_table); kfree(dev->em_pd); --=20 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29477CE7A89 for ; Mon, 25 Sep 2023 08:12:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232817AbjIYIMU (ORCPT ); Mon, 25 Sep 2023 04:12:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232810AbjIYIL5 (ORCPT ); Mon, 25 Sep 2023 04:11:57 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 761D7CCF; Mon, 25 Sep 2023 01:11:45 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5F6251424; Mon, 25 Sep 2023 01:12:23 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B74273F5A1; Mon, 25 Sep 2023 01:11:42 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 12/18] PM: EM: Use runtime modified EM for CPUs energy estimation in EAS Date: Mon, 25 Sep 2023 09:11:33 +0100 Message-Id: <20230925081139.1305766-13-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: 
<20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The new Energy Model (EM) supports runtime modification of the performance state table to better model the power used by the SoC. Use this new feature to improve energy estimation and therefore task placement in the Energy Aware Scheduler (EAS). Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 8f055ab356ed..41290ee2cdd0 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -261,15 +261,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long max_util, unsigned long sum_util, unsigned long allowed_cpu_cap) { + struct em_perf_table *runtime_table; unsigned long freq, scale_cpu; - struct em_perf_state *table, *ps; + struct em_perf_state *ps; int cpu, i; if (!sum_util) return 0; - table = pd->default_table->state; - /* * In order to predict the performance state, map the utilization of * the most utilized CPU of the performance domain to a requested @@ -280,7 +279,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, */ cpu = cpumask_first(to_cpumask(pd->cpus)); scale_cpu = arch_scale_cpu_capacity(cpu); - ps = &table[pd->nr_perf_states - 1]; + + /* + * No rcu_read_lock() since it's already called by the task scheduler. + * The runtime_table is always there for CPUs, so we don't check.
+ */ + runtime_table = rcu_dereference(pd->runtime_table); + + ps = &runtime_table->state[pd->nr_perf_states - 1]; max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); @@ -290,9 +296,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. */ - i = em_pd_get_efficient_state(table, pd->nr_perf_states, freq, - pd->flags); - ps = &table[i]; + i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, + freq, pd->flags); + ps = &runtime_table->state[i]; /* * The capacity of a CPU in the domain at the performance state (ps) -- 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CE68CE7A95 for ; Mon, 25 Sep 2023 08:12:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232762AbjIYIM3 (ORCPT ); Mon, 25 Sep 2023 04:12:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56440 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232748AbjIYIL5 (ORCPT ); Mon, 25 Sep 2023 04:11:57 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2BE20CD9; Mon, 25 Sep 2023 01:11:48 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 20F22DA7; Mon, 25 Sep 2023 01:12:26 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 799613F5A1; Mon, 25 Sep 2023 01:11:45 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc:
lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 13/18] Documentation: EM: Update with runtime modification design Date: Mon, 25 Sep 2023 09:11:34 +0100 Message-Id: <20230925081139.1305766-14-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Add a new section 'Design' which covers the design of the Energy Model. It describes the design decisions, the models and how they reflect reality. Add a description of the default EM. Renumber the other sections. Add documentation for the new feature which allows the EM to be modified at runtime. Signed-off-by: Lukasz Luba --- Documentation/power/energy-model.rst | 144 +++++++++++++++++++++++++-- 1 file changed, 134 insertions(+), 10 deletions(-) diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index ef341be2882b..3115411f9839 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -72,16 +72,70 @@ required to have the same micro-architecture. CPUs in different performance domains can have different micro-architectures. -2. Core APIs +2. Design +----------------- + +2.1 Basic EM +^^^^^^^^^^^^ + +The basic EM is built around constant power information for each performance +state, which is accessible via 'dev->em_pd->default_table->state'. This model +can be derived from power measurements of the device (e.g. a CPU) while +running some benchmark.
The benchmark might be integer heavy or floating point +computation with a data set fitting into the CPU cache or registers. Bear in +mind that this model might not cover all possible workloads running on CPUs. +Thus, please run a few different benchmarks and verify your power model values +with some real workloads. The power variation due to the workload +instruction mix and data set is not modeled. The static power, which can +change during runtime due to variation of SoC temperature, is not modeled +either. + +2.2 Runtime modifiable EM +^^^^^^^^^^^^^^^^^^^^^^^^^ + +To better reflect power variation due to static power (leakage), the EM +supports runtime modifications of the power values. The mechanism relies on +RCU to free the modifiable EM perf_state table memory. Its user, the task +scheduler, also uses RCU to access this memory. The EM framework is +responsible for allocating the new memory for the modifiable EM perf_state +table. The old memory is freed automatically using the RCU callback mechanism. +This design was chosen because the task scheduler uses that data, and to +prevent misuse by kernel modules if they were responsible for the +memory management. + +There are two structures with the performance state tables in the EM: +a) dev->em_pd->default_table +b) dev->em_pd->runtime_table +Both point to the same memory location via the +'em_perf_table::state' pointer until the first modification of the values. +This saves memory on platforms which never modify the EM. When +the first modification is made, the 'default_table' (a) contains the old +EM which was created during the setup. The modified EM is available in the +'runtime_table' (b). + +Only EAS uses the 'runtime_table' and benefits from the updates to the +EM values. Other sub-systems (thermal, powercap) use the 'default_table' (a). + +Kernel code that wants to modify the EM values is protected from concurrent +access by a mutex.
Therefore, the EM must be modified from a context +that can sleep. + +With the runtime modifiable EM, we switch from a 'single EM, static during the +entire runtime' (system property) design to a 'single EM which can be +changed during runtime, e.g. according to the workload' (system and workload +property) design. + + +3. Core APIs ------------ -2.1 Config options +3.1 Config options ^^^^^^^^^^^^^^^^^^ CONFIG_ENERGY_MODEL must be enabled to use the EM framework. -2.2 Registration of performance domains +3.2 Registration of performance domains ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Registration of 'advanced' EM @@ -110,8 +164,8 @@ The last argument 'microwatts' is important to set with correct value. Kernel subsystems which use EM might rely on this flag to check if all EM devices use the same scale. If there are different scales, these subsystems might decide to return warning/error, stop working or panic. -See Section 3. for an example of driver implementing this -callback, or Section 2.4 for further documentation on this API +See Section 4.1 for an example of a driver implementing this +callback, or Section 3.5 for further documentation on this API Registration of EM using DT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -156,7 +210,7 @@ The EM which is registered using this method might not reflect correctly the physics of a real device, e.g. when static power (leakage) is important. -2.3 Accessing performance domains +3.3 Accessing performance domains ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There are two API functions which provide the access to the energy model: @@ -175,10 +229,31 @@ CPUfreq governor is in use in case of CPU device. Currently this calculation is not provided for other type of devices.
More details about the above APIs can be found in ```` -or in Section 2.4 +or in Section 3.5 + + +3.4 Runtime modifications +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Drivers willing to modify the EM at runtime should use the following API:: + + int em_dev_update_perf_domain(struct device *dev, + struct em_data_callback *cb, void *priv); -2.4 Description details of this API +Drivers must provide an .update_power() callback returning the power value for each +performance state. The callback function provided by the driver is free +to fetch data from any relevant location (DT, firmware, ...) or sensor. +The EM calls .update_power() for each performance state to +obtain the new power value. Section 4.2 contains an example driver +which shows a simple implementation of this mechanism. The callback can be +declared with the EM_UPDATE_CB() macro. The EM also passes the driver's +private void pointer ('priv') back to every .update_power() call, which +helps to maintain a consistent context across all performance +state calls for a given EM. + + +3.5 Description details of this API ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. kernel-doc:: include/linux/energy_model.h :internal: @@ -187,8 +262,11 @@ or in Section 2.4 :export: -3. Example driver ------------------ +4. Examples +----------- + +4.1 Example driver with EM registration +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CPUFreq framework supports dedicated callback for registering the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em(). @@ -242,3 +320,49 @@ EM framework:: 39 static struct cpufreq_driver foo_cpufreq_driver = { 40 .register_em = foo_cpufreq_register_em, 41 }; + + +4.2 Example driver with EM modification +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This section provides a simple example of a thermal driver modifying the EM. +The driver implements a foo_mod_power() function to be provided to the +EM framework.
The driver is woken up periodically to check the temperature +and modify the EM data if needed:: + + -> drivers/thermal/foo_thermal.c + + 01 static int foo_mod_power(struct device *dev, unsigned long freq, + 02 unsigned long *power, void *priv) + 03 { + 04 struct foo_context *ctx = priv; + 05 + 06 /* Estimate power for the given frequency and temperature */ + 07 *power = foo_estimate_power(dev, freq, ctx->temperature); + 08 if (*power >= EM_MAX_POWER) + 09 return -EINVAL; + 10 + 11 return 0; + 12 } + 13 + 14 /* + 15 * Function called periodically to check the temperature and + 16 * update the EM if needed + 17 */ + 18 static void foo_thermal_em_update(struct foo_context *ctx) + 19 { + 20 struct em_data_callback em_cb = EM_UPDATE_CB(foo_mod_power); + 21 struct cpufreq_policy *policy = ctx->policy; + 22 struct device *cpu_dev; + 23 int ret; + 24 + 25 cpu_dev = get_cpu_device(cpumask_first(policy->cpus)); + 26 ctx->temperature = foo_get_temp(cpu_dev, ctx); + 27 if (ctx->temperature < FOO_EM_UPDATE_TEMP_THRESHOLD) + 28 return; + 29 + 30 /* Update EM for the CPUs' performance domain */ + 31 ret = em_dev_update_perf_domain(cpu_dev, &em_cb, ctx); + 32 if (ret) + 33 pr_warn("foo_thermal: EM update failed\n"); + 34 } -- 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1054ECE7A81 for ; Mon, 25 Sep 2023 08:12:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232686AbjIYIMb (ORCPT ); Mon, 25 Sep 2023 04:12:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232721AbjIYIL6 (ORCPT ); Mon, 25 Sep 2023 04:11:58 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by
lindbergh.monkeyblade.net (Postfix) with ESMTP id E0DD4112; Mon, 25 Sep 2023 01:11:50 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D917BDA7; Mon, 25 Sep 2023 01:12:28 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 3BAB73F5A1; Mon, 25 Sep 2023 01:11:48 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 14/18] PM: EM: Add performance field to struct em_perf_state Date: Mon, 25 Sep 2023 09:11:35 +0100 Message-Id: <20230925081139.1305766-15-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" CPU performance does not scale linearly with frequency, and the relationship may differ between workloads. Some CPUs are designed to be particularly good at certain applications, e.g. image or video processing, while other CPUs are good at different ones. When such different types of CPUs are combined in one SoC, they should be modeled properly so that the Energy Aware Scheduler (EAS) can get the most out of the HW. The Energy Model (EM) provides the power vs. performance curves to EAS, but assumes that the CPU capacity is fixed and scales linearly with the frequency. This patch allows the curve to be adjusted on the 'performance' axis as well.
Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 11 ++++++----- kernel/power/energy_model.c | 27 +++++++++++++++++++++++++++ 2 files changed, 33 insertions(+), 5 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 41290ee2cdd0..37fc8490709d 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -12,6 +12,7 @@ /** * struct em_perf_state - Performance state of a performance domain + * @performance: Non-linear CPU performance at a given frequency * @frequency: The frequency in KHz, for consistency with CPUFreq * @power: The power consumed at this level (by 1 CPU or by a registered * device). It can be a total power: static and dynamic. @@ -20,6 +21,7 @@ * @flags: see "em_perf_state flags" description below. */ struct em_perf_state { + unsigned long performance; unsigned long frequency; unsigned long power; unsigned long cost; @@ -223,14 +225,14 @@ void em_dev_unregister_perf_domain(struct device *dev); */ static inline int em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states, - unsigned long freq, unsigned long pd_flags) + unsigned long max_util, unsigned long pd_flags) { struct em_perf_state *ps; int i; for (i = 0; i < nr_perf_states; i++) { ps = &table[i]; - if (ps->frequency >= freq) { + if (ps->performance >= max_util) { if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES && ps->flags & EM_PERF_STATE_INEFFICIENT) continue; @@ -262,8 +264,8 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long allowed_cpu_cap) { struct em_perf_table *runtime_table; - unsigned long freq, scale_cpu; struct em_perf_state *ps; + unsigned long scale_cpu; int cpu, i; if (!sum_util) @@ -290,14 +292,13 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); - freq = map_util_freq(max_util, ps->frequency, scale_cpu); /* * Find the
lowest performance state of the Energy Model above the * requested frequency. */ i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, - freq, pd->flags); + max_util, pd->flags); ps = &runtime_table->state[i]; /* diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 78e1495dc87e..c7ad42b42c46 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -46,6 +46,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd) debugfs_create_ulong("frequency", 0444, d, &ps->frequency); debugfs_create_ulong("power", 0444, d, &ps->power); debugfs_create_ulong("cost", 0444, d, &ps->cost); + debugfs_create_ulong("performance", 0444, d, &ps->performance); debugfs_create_ulong("inefficient", 0444, d, &ps->flags); } @@ -133,6 +134,30 @@ static void em_perf_runtime_table_set(struct device *dev, call_rcu(&tmp->rcu, em_destroy_rt_table_rcu); } +static void em_init_performance(struct device *dev, struct em_perf_domain *pd, + struct em_perf_state *table, int nr_states) +{ + u64 fmax, max_cap; + int i, cpu; + + /* This is needed only for CPUs; EAS skips other devices */ + if (!_is_cpu_device(dev)) + return; + + cpu = cpumask_first(em_span_cpus(pd)); + + /* + * Calculate the performance value for each frequency with + * linear relationship. The final CPU capacity might not be ready at + * boot time, but the EM will be updated a bit later with the correct one.
+ */ + fmax = (u64) table[nr_states - 1].frequency; + max_cap = (u64) arch_scale_cpu_capacity(cpu); + for (i = 0; i < nr_states; i++) + table[i].performance = div64_u64(max_cap * table[i].frequency, + fmax); +} + static int em_compute_costs(struct device *dev, struct em_perf_state *table, struct em_data_callback *cb, int nr_states, unsigned long flags) @@ -317,6 +342,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, table[i].frequency = prev_freq = freq; } + em_init_performance(dev, pd, table, nr_states); + ret = em_compute_costs(dev, table, cb, nr_states, flags); if (ret) goto free_ps_table; -- 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0A056CE7A89 for ; Mon, 25 Sep 2023 08:12:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232733AbjIYIMg (ORCPT ); Mon, 25 Sep 2023 04:12:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232836AbjIYIMA (ORCPT ); Mon, 25 Sep 2023 04:12:00 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id CA983121; Mon, 25 Sep 2023 01:11:53 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 9A5C2DA7; Mon, 25 Sep 2023 01:12:31 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id F3B753F5A1; Mon, 25 Sep 2023 01:11:50 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com,
rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 15/18] PM: EM: Adjust performance with runtime modification callback Date: Mon, 25 Sep 2023 09:11:36 +0100 Message-Id: <20230925081139.1305766-16-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" The performance value may be modified at runtime together with the power value for each OPP. Together they form a different power and performance profile in the EM. Modify the callback interface to make this possible. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 24 +++++++++++++++--------- kernel/power/energy_model.c | 7 ++++--- 2 files changed, 19 insertions(+), 12 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 37fc8490709d..65a8794d1565 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -174,24 +174,29 @@ struct em_data_callback { unsigned long *cost); /** - * update_power() - Provide new power at the given performance state of - * a device + * update_power_perf() - Provide new power and performance at the given + * performance state of a device * @dev : Device for which we do this operation (can be a CPU) * @freq : Frequency at the performance state in kHz * @power : New power value at the performance state * (modified) + * @perf : New performance value at the performance state + * (modified) * @priv : Pointer to private data useful for tracking context * during runtime modifications of EM. * - * The update_power() is used by runtime modifiable EM. 
It aims to
It aims to - * provide updated power value for a given frequency, which is stored - * in the performance state. The power value provided by this callback - * should fit in the [0, EM_MAX_POWER] range. + * The update_power_perf() is used by runtime modifiable EM. It aims to + * provide updated power and performance value for a given frequency, + * which is stored in the performance state. The power value provided + * by this callback should fit in the [0, EM_MAX_POWER] range. The + * performance value should be lower or equal to the CPU max capacity + * (1024). * * Return 0 on success, or appropriate error value in case of failure. */ - int (*update_power)(struct device *dev, unsigned long freq, - unsigned long *power, void *priv); + int (*update_power_perf)(struct device *dev, unsigned long freq, + unsigned long *power, unsigned long *perf, + void *priv); }; #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power =3D cb) #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \ @@ -199,7 +204,8 @@ struct em_data_callback { .get_cost =3D _cost_cb } #define EM_DATA_CB(_active_power_cb) \ EM_ADV_DATA_CB(_active_power_cb, NULL) -#define EM_UPDATE_CB(_update_power_cb) { .update_power =3D &_update_power_= cb } +#define EM_UPDATE_CB(_update_pwr_perf_cb) \ + { .update_power_perf =3D &_update_pwr_perf_cb } =20 struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index c7ad42b42c46..17a59a7717f7 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -217,11 +217,11 @@ int em_dev_update_perf_domain(struct device *dev, str= uct em_data_callback *cb, void *priv) { struct em_perf_table *runtime_table; - unsigned long power, freq; + unsigned long power, freq, perf; struct em_perf_domain *pd; int ret, i; =20 - if (!cb || !cb->update_power) + if (!cb || !cb->update_power_perf) return -EINVAL; =20 /* @@ -262,13 +262,14 @@ int 
em_dev_update_perf_domain(struct device *dev, struct em_data_callback *cb, * Call driver callback to get a new power value for * a given frequency. */ - ret = cb->update_power(dev, freq, &power, priv); + ret = cb->update_power_perf(dev, freq, &power, &perf, priv); if (ret) { dev_dbg(dev, "EM: runtime update error: %d\n", ret); goto free_runtime_state_table; } runtime_table->state[i].power = power; + runtime_table->state[i].performance = perf; } ret = em_compute_costs(dev, runtime_table->state, cb, -- 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8EAECCE7A89 for ; Mon, 25 Sep 2023 08:12:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232926AbjIYIMm (ORCPT ); Mon, 25 Sep 2023 04:12:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232665AbjIYIMD (ORCPT ); Mon, 25 Sep 2023 04:12:03 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 74AF419C; Mon, 25 Sep 2023 01:11:56 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5B4B4DA7; Mon, 25 Sep 2023 01:12:34 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B44523F5A1; Mon, 25 Sep 2023 01:11:53 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org,
len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 16/18] PM: EM: Support late CPUs booting and capacity adjustment Date: Mon, 25 Sep 2023 09:11:37 +0100 Message-Id: <20230925081139.1305766-17-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" This patch adds the infrastructure needed to handle CPUs that boot late, which might change the capacity values of the CPUs registered earlier. With these changes, new CPUs which register an EM will trigger the needed re-calculations of the other CPUs' EMs. Thanks to that, the em_perf_state::performance values will be aligned with the CPU capacity information after all CPUs finish booting. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 108 ++++++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 17a59a7717f7..6bfd33c2e48c 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -25,6 +25,9 @@ static DEFINE_MUTEX(em_pd_mutex); static void em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table); +static void em_check_capacity_update(void); +static void em_update_workfn(struct work_struct *work); +static DECLARE_DELAYED_WORK(em_update_work, em_update_workfn); static bool _is_cpu_device(struct device *dev) { @@ -591,6 +594,10 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, unlock: mutex_unlock(&em_pd_mutex); + + if (_is_cpu_device(dev)) + em_check_capacity_update(); + return ret; } EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); @@ -651,3 +658,104 @@ void em_dev_unregister_perf_domain(struct device *dev)
mutex_unlock(&em_pd_mutex); } EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain); + +/* + * Adjustment of CPU performance values after boot, when all CPUs' capacities + * are correctly calculated. + */ +static int get_updated_perf(struct device *dev, unsigned long freq, + unsigned long *power, unsigned long *perf, + void *priv) +{ + struct em_perf_state *table = priv; + int i, cpu, nr_states; + u64 fmax, max_cap; + + nr_states = dev->em_pd->nr_perf_states; + + cpu = cpumask_first(em_span_cpus(dev->em_pd)); + + fmax = (u64) table[nr_states - 1].frequency; + max_cap = (u64) arch_scale_cpu_capacity(cpu); + + for (i = 0; i < nr_states; i++) { + if (freq != table[i].frequency) + continue; + + *power = table[i].power; + *perf = div64_u64(max_cap * freq, fmax); + break; + } + + return 0; +} + +static void em_check_capacity_update(void) +{ + struct em_data_callback em_cb = EM_UPDATE_CB(get_updated_perf); + struct em_perf_table *runtime_table; + struct em_perf_domain *em_pd; + cpumask_var_t cpu_done_mask; + unsigned long cpu_capacity; + struct em_perf_state *ps; + struct device *dev; + int cpu, ret; + + if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) { + pr_warn("EM: no free memory\n"); + return; + } + + /* Loop over all EMs and check if the CPU capacity has changed.
*/ + for_each_possible_cpu(cpu) { + unsigned long em_max_performance; + struct cpufreq_policy *policy; + + if (cpumask_test_cpu(cpu, cpu_done_mask)) + continue; + + policy = cpufreq_cpu_get(cpu); + if (!policy) { + pr_debug("EM: Accessing cpu%d policy failed\n", cpu); + schedule_delayed_work(&em_update_work, + msecs_to_jiffies(1000)); + break; + } + + em_pd = em_cpu_get(cpu); + if (!em_pd || em_is_artificial(em_pd)) + continue; + + cpu_capacity = arch_scale_cpu_capacity(cpu); + + rcu_read_lock(); + runtime_table = rcu_dereference(em_pd->runtime_table); + ps = &runtime_table->state[em_pd->nr_perf_states - 1]; + em_max_performance = ps->performance; + rcu_read_unlock(); + + /* + * Check if the CPU capacity has been adjusted during boot + * and trigger the update for new performance values. + */ + if (em_max_performance != cpu_capacity) { + dev = get_cpu_device(cpu); + ret = em_dev_update_perf_domain(dev, &em_cb, + em_pd->default_table->state); + if (ret) + dev_warn(dev, "EM: update failed %d\n", ret); + else + dev_info(dev, "EM: updated\n"); + } + + cpumask_or(cpu_done_mask, cpu_done_mask, + em_span_cpus(em_pd)); + } + + free_cpumask_var(cpu_done_mask); +} + +static void em_update_workfn(struct work_struct *work) +{ + em_check_capacity_update(); +} -- 2.25.1 From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 51328CE7A81 for ; Mon, 25 Sep 2023 08:12:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232889AbjIYIMu (ORCPT ); Mon, 25 Sep 2023 04:12:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38226 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232741AbjIYIMI (ORCPT ); Mon, 25 Sep 2023 04:12:08 -0400 Received: from foss.arm.com
(foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2BA2ECF3; Mon, 25 Sep 2023 01:11:59 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 200B0DA7; Mon, 25 Sep 2023 01:12:37 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 7846F3F5A1; Mon, 25 Sep 2023 01:11:56 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 17/18] PM: EM: Optimize em_cpu_energy() and remove division Date: Mon, 25 Sep 2023 09:11:38 +0100 Message-Id: <20230925081139.1305766-18-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

The Energy Model (EM) can be modified at runtime, which brings new possibilities. em_cpu_energy() is called by the Energy Aware Scheduler (EAS) in its hot path. The energy calculation uses the power value for a given performance state (ps) and the CPU busy time as a percentage for that given frequency, which effectively is:

  pd_nrg = ps->power * busy_time_pct                  (1)

                     cpu_util
  busy_time_pct = ---------------                     (2)
                  ps->performance

The 'ps->performance' is the CPU capacity (performance) at that given ps. Thus, in a situation when the OS is not overloaded and EAS is working, the CPU utilization (busy time) is lower than the 'ps->performance' the CPU is running at.
Therefore, over a longer scheduling period, the power value calculated
above can be treated as energy.

The last arithmetic operation in em_cpu_energy() can be optimized to
remove the division. This is possible because em_perf_state::cost, a
special coefficient, can now hold a pre-calculated value that includes
the 'ps->performance' information for a performance state (ps):

                 ps->power
  ps->cost = ---------------                         (3)
             ps->performance

In the past, 'ps->performance' had to be calculated at runtime every time
em_cpu_energy() was called, using this formula:

                      ps->freq
  ps->performance = ------------- * scale_cpu        (4)
                    cpu_max_freq

Injecting (4) into (2) gives:

                   cpu_util * cpu_max_freq
  busy_time_pct = ------------------------           (5)
                    ps->freq * scale_cpu

Because the correct 'scale_cpu' value was not available at boot time,
during EM initialization, the division by 'scale_cpu' had to be performed
at runtime. There was no safe mechanism to update the EM at runtime. This
has changed with the EM runtime modification feature: the runtime
division by 'scale_cpu' can now be avoided, because the EM is updated
whenever a new maximum-capacity CPU is set in the system, or after boot
has finished and the proper CPU capacity is known.

Use that feature and do the needed division while calculating the
coefficient 'ps->cost'. The enhanced 'ps->cost' value can then simply be
multiplied by the utilization to get the energy needed by the whole
Performance Domain (PD):

  pd_nrg = ps->cost * \Sum cpu_util                  (6)

With this optimization, em_cpu_energy() should run 1.43x faster on the
Big CPU and 1.69x faster on the Little CPU.
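A minimal user-space sketch of the arithmetic described above, comparing the removed runtime path (cost = cpu_max_freq * power / freq, then a division by 'scale_cpu' in em_cpu_energy()) with the new pre-computed 'ps->cost'. The struct perf_state type and the sample numbers (2 GHz maximum frequency, 'scale_cpu' = 1024, 0.5 W at 1 GHz) are hypothetical; only the two formulas, including the '* 10' resolution factor, are taken from the diff below. Because of that factor, the new estimate is roughly 10x the old one, so the two agree only up to that constant scale and integer truncation.

```c
#include <assert.h>

/* Hypothetical stand-in for struct em_perf_state (user-space sketch). */
struct perf_state {
	unsigned long frequency;   /* kHz */
	unsigned long power;       /* micro-Watts */
	unsigned long performance; /* CPU capacity at this frequency */
	unsigned long cost;        /* pre-computed coefficient */
};

/*
 * Removed scheme: cost = cpu_max_freq * power / freq at EM setup time,
 * then a division by scale_cpu on every em_cpu_energy() call.
 */
static unsigned long long old_energy(const struct perf_state *ps,
				     unsigned long cpu_max_freq,
				     unsigned long sum_util,
				     unsigned long scale_cpu)
{
	unsigned long long cost =
		(unsigned long long)cpu_max_freq * ps->power / ps->frequency;

	return cost * sum_util / scale_cpu;
}

/* New scheme: formula (3) with the extra x10 used in em_compute_costs(). */
static unsigned long compute_cost(const struct perf_state *ps)
{
	assert(ps->performance != 0);
	return ps->power * 10 / ps->performance;
}

/* New hot path, formula (6): a single multiplication, no division. */
static unsigned long new_energy(const struct perf_state *ps,
				unsigned long sum_util)
{
	return ps->cost * sum_util;
}
```

For example, with frequency = 1000000, power = 500000 and performance = 512 (that is 1000000 / 2000000 * 1024, formula (4)), compute_cost() gives 9765, new_energy(&ps, 300) gives 2929500, and old_energy(&ps, 2000000, 300, 1024) gives 292968: the same estimate scaled by 10, up to truncation.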
Signed-off-by: Lukasz Luba
---
 include/linux/energy_model.h | 68 +++++-----------------------------------
 kernel/power/energy_model.c  |  7 ++--
 2 files changed, 12 insertions(+), 63 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 65a8794d1565..31c4e3b8f7c3 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -112,27 +112,6 @@ struct em_perf_domain {
 #define EM_MAX_NUM_CPUS 16
 #endif
 
-/*
- * To avoid an overflow on 32bit machines while calculating the energy
- * use a different order in the operation. First divide by the 'cpu_scale'
- * which would reduce big value stored in the 'cost' field, then multiply by
- * the 'sum_util'. This would allow to handle existing platforms, which have
- * e.g. power ~1.3 Watt at max freq, so the 'cost' value > 1mln micro-Watts.
- * In such scenario, where there are 4 CPUs in the Perf. Domain the 'sum_util'
- * could be 4096, then multiplication: 'cost' * 'sum_util' would overflow.
- * This reordering of operations has some limitations, we lose small
- * precision in the estimation (comparing to 64bit platform w/o reordering).
- *
- * We are safe on 64bit machine.
- */
-#ifdef CONFIG_64BIT
-#define em_estimate_energy(cost, sum_util, scale_cpu) \
-	(((cost) * (sum_util)) / (scale_cpu))
-#else
-#define em_estimate_energy(cost, sum_util, scale_cpu) \
-	(((cost) / (scale_cpu)) * (sum_util))
-#endif
-
 struct em_data_callback {
 	/**
 	 * active_power() - Provide power at the next performance state of
@@ -271,29 +250,16 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	struct em_perf_table *runtime_table;
 	struct em_perf_state *ps;
-	unsigned long scale_cpu;
-	int cpu, i;
+	int i;
 
 	if (!sum_util)
 		return 0;
 
-	/*
-	 * In order to predict the performance state, map the utilization of
-	 * the most utilized CPU of the performance domain to a requested
-	 * frequency, like schedutil. Take also into account that the real
-	 * frequency might be set lower (due to thermal capping). Thus, clamp
-	 * max utilization to the allowed CPU capacity before calculating
-	 * effective frequency.
-	 */
-	cpu = cpumask_first(to_cpumask(pd->cpus));
-	scale_cpu = arch_scale_cpu_capacity(cpu);
-
 	/*
 	 * No rcu_read_lock() since it's already called by task scheduler.
 	 * The runtime_table is always there for CPUs, so we don't check.
 	 */
 	runtime_table = rcu_dereference(pd->runtime_table);
-
 	ps = &runtime_table->state[pd->nr_perf_states - 1];
 
 	max_util = map_util_perf(max_util);
@@ -308,35 +274,21 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 		ps = &runtime_table->state[i];
 
 	/*
-	 * The capacity of a CPU in the domain at the performance state (ps)
-	 * can be computed as:
-	 *
-	 *             ps->freq * scale_cpu
-	 *   ps->cap = --------------------                          (1)
-	 *                 cpu_max_freq
-	 *
-	 * So, ignoring the costs of idle states (which are not available in
-	 * the EM), the energy consumed by this CPU at that performance state
+	 * The energy consumed by the CPU at the given performance state (ps)
 	 * is estimated as:
 	 *
-	 *             ps->power * cpu_util
-	 *   cpu_nrg = --------------------                          (2)
-	 *                   ps->cap
+	 *                ps->power
+	 *   cpu_nrg = --------------- * cpu_util           (1)
+	 *             ps->performance
 	 *
-	 * since 'cpu_util / ps->cap' represents its percentage of busy time.
+	 * The 'cpu_util / ps->performance' represents its percentage of
+	 * busy time. The idle cost is ignored (it's not available in the EM).
 	 *
 	 * NOTE: Although the result of this computation actually is in
 	 *       units of power, it can be manipulated as an energy value
 	 *       over a scheduling period, since it is assumed to be
 	 *       constant during that interval.
 	 *
-	 * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product
-	 * of two terms:
-	 *
-	 *             ps->power * cpu_max_freq   cpu_util
-	 *   cpu_nrg = ------------------------ * ---------          (3)
-	 *                    ps->freq            scale_cpu
-	 *
 	 * The first term is static, and is stored in the em_perf_state struct
 	 * as 'ps->cost'.
 	 *
@@ -345,11 +297,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * total energy of the domain (which is the simple sum of the energy of
 	 * all of its CPUs) can be factorized as:
 	 *
-	 *            ps->cost * \Sum cpu_util
-	 *   pd_nrg = ------------------------                       (4)
-	 *                  scale_cpu
+	 *   pd_nrg = ps->cost * \Sum cpu_util                       (2)
 	 */
-	return em_estimate_energy(ps->cost, sum_util, scale_cpu);
+	return ps->cost * sum_util;
 }
 
 /**
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 6bfd33c2e48c..cf9da7259f5e 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -166,11 +166,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 			    unsigned long flags)
 {
 	unsigned long prev_cost = ULONG_MAX;
-	u64 fmax;
 	int i, ret;
 
 	/* Compute the cost of each performance state. */
-	fmax = (u64) table[nr_states - 1].frequency;
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
@@ -182,8 +180,9 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 				return -EINVAL;
 			}
 		} else {
-			power_res = table[i].power;
-			cost = div64_u64(fmax * power_res, table[i].frequency);
+			/* increase resolution of 'cost' precision */
+			power_res = table[i].power * 10;
+			cost = power_res / table[i].performance;
 		}
 
 		table[i].cost = cost;
-- 
2.25.1

From nobody Fri Feb 13 12:33:03 2026 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1160CCE7A89 for ; Mon, 25 Sep 2023 08:12:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232913AbjIYINC (ORCPT ); Mon, 25 Sep 2023 04:13:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232808AbjIYIMS (ORCPT ); Mon, 25 Sep 2023 04:12:18 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0D483115; Mon, 25 Sep 2023 01:12:02 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D510CDA7; Mon, 25 Sep 2023 01:12:39 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 39C503F5A1; Mon, 25 Sep 2023 01:11:59 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 18/18] Documentation: EM: Update information about performance field Date: Mon, 25 Sep 2023 09:11:39 +0100 Message-Id: <20230925081139.1305766-19-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8"

The Energy Model now supports a new piece of information: the performance
of each performance state. Update the relevant part of the documentation
accordingly.

Signed-off-by: Lukasz Luba
---
 Documentation/power/energy-model.rst | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst
index 3115411f9839..da3619c0b98a 100644
--- a/Documentation/power/energy-model.rst
+++ b/Documentation/power/energy-model.rst
@@ -125,6 +125,11 @@ runtime static EM' (system property) design to a 'single EM which can be
 changed during runtime according e.g. to the workload' (system and
 workload property) design.
 
+It is possible also to modify the CPU performance values for each EM's
+performance state. Thus, the full power and performance profile (which
+is an exponential curve) can be changed according e.g. to the workload
+or system property.
+
 
 3. Core APIs
 ------------
@@ -326,18 +331,18 @@ EM framework::
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This section provides a simple example of a thermal driver modifying the EM.
-The driver implements a foo_mod_power() function to be provided to the
+The driver implements a mod_power_perf() function to be provided to the
 EM framework. The driver is woken up periodically to check the temperature
 and modify the EM data if needed::
 
   -> drivers/thermal/foo_thermal.c
 
-  01	static int foo_mod_power(struct device *dev, unsigned long freq,
-  02			unsigned long *power, void *priv)
+  01	static int mod_power_perf(struct device *dev, unsigned long freq,
+  02			unsigned long *power, unsigned long *perf, void *priv)
   03	{
   04		struct foo_context *ctx = priv;
   05
-  06		/* Estimate power for the given frequency and temperature */
+  06		*perf = foo_estimate_performance(dev, freq);
   07		*power = foo_estimate_power(dev, freq, ctx->temperature);
   08		if (*power >= EM_MAX_POWER);
   09			return -EINVAL;
-- 
2.25.1
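A user-space sketch of a callback with the updated mod_power_perf() signature from the documentation example above. 'struct device' is only forward-declared, and the context fields, the power model in foo_estimate_power(), the helpers' parameters and the SKETCH_MAX_POWER_UW limit (standing in for EM_MAX_POWER) are all hypothetical; the performance estimate reuses formula (4) from patch 17/18. Note that line 08 of the documentation example, as quoted, has a ';' directly after the if condition, which would make the following return unconditional; this sketch returns -EINVAL only when the power estimate is out of range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct device; /* opaque: only passed through, never dereferenced */

/* Hypothetical driver context, loosely modelled on 'struct foo_context'. */
struct foo_context {
	unsigned long temperature; /* milli-Celsius (assumed unit) */
	unsigned long max_freq;    /* kHz, highest performance state */
	unsigned long scale_cpu;   /* CPU capacity at max_freq, e.g. 1024 */
};

/* Hypothetical limit standing in for EM_MAX_POWER (micro-Watts). */
#define SKETCH_MAX_POWER_UW 64000000UL

/* Formula (4): performance = freq / cpu_max_freq * scale_cpu. */
static unsigned long foo_estimate_performance(const struct foo_context *ctx,
					      unsigned long freq)
{
	assert(ctx->max_freq != 0);
	return freq * ctx->scale_cpu / ctx->max_freq;
}

/* Made-up model: dynamic power grows with freq, leakage with temperature. */
static unsigned long foo_estimate_power(const struct foo_context *ctx,
					unsigned long freq)
{
	unsigned long dynamic_uw = freq / 4;             /* 0.25 uW per kHz */
	unsigned long leakage_uw = ctx->temperature * 2; /* 2 uW per milli-degree */

	return dynamic_uw + leakage_uw;
}

static int mod_power_perf(struct device *dev, unsigned long freq,
			  unsigned long *power, unsigned long *perf, void *priv)
{
	struct foo_context *ctx = priv;

	(void)dev; /* unused in this sketch */
	*perf = foo_estimate_performance(ctx, freq);
	*power = foo_estimate_power(ctx, freq);
	if (*power >= SKETCH_MAX_POWER_UW)
		return -EINVAL;

	return 0;
}
```

With temperature = 50000, max_freq = 2000000 and scale_cpu = 1024, calling mod_power_perf() for freq = 1000000 yields perf = 512 and power = 350000; raising the temperature far enough pushes the estimate past the limit and the callback returns -EINVAL.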