From: Srinivas Pandruvada
To: rafael@kernel.org, rui.zhang@intel.com, daniel.lezcano@linaro.org
Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Srinivas Pandruvada
Subject: [RFC PATCH] thermal/drivers/intel_powerclamp: Support cpumask and step_size
Date: Fri, 27 Jan 2023 12:23:58 -0800
Message-Id: <20230127202358.992651-1-srinivas.pandruvada@linux.intel.com>
X-Mailer: git-send-email 2.39.1
X-Mailing-List: linux-kernel@vger.kernel.org

In some use cases, it is desirable to inject idle only on a certain set
of CPUs. For example, on Alder Lake systems it may be necessary to force
idle only on the P-cores for thermal reasons. Also, the idle percent can
exceed 50% if only a partial set of the CPUs in the system is chosen.

Introduce module parameters for setting cpumask and step_size. They can
be changed only when the cooling device is inactive. This module already
has other module parameters; those parameters are not changed.

cpumask (Read/Write):
A bit mask of CPUs on which to inject idle. The format of this bitmask is
the same as the one used in other subsystems, for example in
/proc/irq/*/smp_affinity. The mask consists of comma-separated 32-bit
groups. Each CPU is one bit. For example, for a 256-CPU system the full
mask is:
ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
The rightmost group is for CPUs 0-31.

step_size (Read/Write):
Step size specifies a multiplication factor applied to the cooling device
state. The step size can be either 1 or 2. By default, step_size = 1.
When step_size = 1, the max idle percent is 50%. When step_size = 2, the
max idle percent is 100%. For a step size of 2, the cpumask must be
smaller than the full system cpumask. In this way the system-wide idle
injection is restricted to 50%.

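As a rough illustration of the mask layout and the step_size arithmetic
described above, here is a minimal user-space sketch. It is not part of
the patch: set_cpu(), format_cpumask() and run_time_ms() are hypothetical
helper names, the mask is assumed to be held as an array of 32-bit groups
with groups[0] covering CPUs 0-31, and run_time_ms() only mirrors the
expression this patch adds to get_run_time() for step_size > 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mark one CPU in an array of 32-bit groups. */
void set_cpu(uint32_t *groups, unsigned int cpu)
{
	groups[cpu / 32] |= UINT32_C(1) << (cpu % 32);
}

/*
 * Hypothetical helper: print the groups as comma-separated hex words,
 * highest-numbered group first, so the rightmost word is CPUs 0-31.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_cpumask(const uint32_t *groups, size_t ngroups,
		   char *buf, size_t len)
{
	size_t off = 0;

	if (len)
		buf[0] = '\0';

	for (size_t i = ngroups; i-- > 0; ) {
		int n = snprintf(buf + off, len - off, "%08x%s",
				 (unsigned int)groups[i], i ? "," : "");

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}

	return 0;
}

/*
 * Same expression as the step_size > 1 path in get_run_time():
 * run time (ms) between injections for a given idle duration (ms)
 * and target idle ratio (percent). Returns 0 for a zero ratio
 * (a choice made for this sketch only).
 */
unsigned int run_time_ms(unsigned int duration, unsigned int target_ratio)
{
	if (!target_ratio)
		return 0;

	return duration * 100 / target_ratio - duration;
}
```

For example, setting CPUs 0 and 33 in a two-group mask formats as
00000002,00000001. With a 6 ms idle duration (an example value) and a
target ratio of 60, run_time_ms() gives 4 ms, that is 6 ms of idle in
every 10 ms.
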
Since the max state for thermal cooling can't be changed without some
thermal core changes, step_size is a way to allow more than 50% idle
percent.

Signed-off-by: Srinivas Pandruvada
---
 .../driver-api/thermal/intel_powerclamp.rst   |  22 +++
 drivers/thermal/intel/intel_powerclamp.c      | 166 ++++++++++++++++--
 2 files changed, 170 insertions(+), 18 deletions(-)

diff --git a/Documentation/driver-api/thermal/intel_powerclamp.rst b/Documentation/driver-api/thermal/intel_powerclamp.rst
index 3f6dfb0b3ea6..582e3e42750b 100644
--- a/Documentation/driver-api/thermal/intel_powerclamp.rst
+++ b/Documentation/driver-api/thermal/intel_powerclamp.rst
@@ -26,6 +26,8 @@ By:
 	    - Generic Thermal Layer (sysfs)
 	    - Kernel APIs (TBD)
 
+	(*) Module Parameters
+
 INTRODUCTION
 ============
 
@@ -318,3 +320,23 @@ device, a PID based userspace thermal controller can manage to control CPU
 temperature effectively, when no other thermal influence is added. For
 example, a UltraBook user can compile the kernel under certain
 temperature (below most active trip points).
+
+Module Parameters
+=================
+
+``cpumask`` (RW)
+	A bit mask of CPUs to inject idle. The format of the bitmask is the same as
+	used in other subsystems like in /proc/irq/*/smp_affinity. The mask is
+	comma separated 32 bit groups. Each CPU is one bit. For example for a 256
+	CPU system the full mask is:
+	ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
+
+	The rightmost mask is for CPU 0-31.
+
+``step_size`` (RW)
+	Step size specifies a multiplication factor to the cooling device state.
+	The step size can be either 1 or 2. By default,
+	step_size = 1. When the step size = 1, the max idle percent is 50%.
+	When step size = 2, the max idle percent is 100%. For the step size of 2,
+	the cpumask should be smaller than the full system cpumask. In this way
+	the system wide idle injection is restricted to 50%.
diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c
index 320525c3c530..221c6de3e06f 100644
--- a/drivers/thermal/intel/intel_powerclamp.c
+++ b/drivers/thermal/intel/intel_powerclamp.c
@@ -108,6 +108,119 @@ static const struct kernel_param_ops duration_ops = {
 module_param_cb(duration, &duration_ops, &duration, 0644);
 MODULE_PARM_DESC(duration, "forced idle time for each attempt in msec.");
 
+static u8 step_size = 1;
+static cpumask_var_t idle_injection_cpu_mask;
+
+static int cpumask_set(const char *arg, const struct kernel_param *kp)
+{
+	int ret;
+
+	mutex_lock(&powerclamp_lock);
+
+	/* Can't set mask when cooling device is in use */
+	if (powerclamp_data.clamping) {
+		ret = -EAGAIN;
+		goto skip_cpumask_set;
+	}
+
+	if (!idle_injection_cpu_mask) {
+		if (!zalloc_cpumask_var(&idle_injection_cpu_mask, GFP_KERNEL)) {
+			ret = -ENOMEM;
+			goto skip_cpumask_set;
+		}
+	}
+
+	ret = bitmap_parse(arg, strlen(arg), cpumask_bits(idle_injection_cpu_mask),
+			   nr_cpumask_bits);
+	if (ret)
+		goto free_cpu_mask;
+
+	if (cpumask_empty(idle_injection_cpu_mask)) {
+		ret = -EINVAL;
+		goto free_cpu_mask;
+	}
+
+	if (cpumask_equal(cpu_present_mask, idle_injection_cpu_mask) &&
+	    step_size > 1) {
+		ret = -EINVAL;
+		goto free_cpu_mask;
+	}
+
+	mutex_unlock(&powerclamp_lock);
+
+	return 0;
+
+free_cpu_mask:
+	free_cpumask_var(idle_injection_cpu_mask);
+	idle_injection_cpu_mask = NULL;
+
+skip_cpumask_set:
+	mutex_unlock(&powerclamp_lock);
+
+	return ret;
+}
+
+static int cpumask_get(char *buf, const struct kernel_param *kp)
+{
+	if (!idle_injection_cpu_mask)
+		return -EINVAL;
+
+	return bitmap_print_to_pagebuf(false, buf, cpumask_bits(idle_injection_cpu_mask),
+				       nr_cpumask_bits);
+}
+
+static const struct kernel_param_ops cpumask_ops = {
+	.set = cpumask_set,
+	.get = cpumask_get,
+};
+
+module_param_cb(cpumask, &cpumask_ops, NULL, 0644);
+MODULE_PARM_DESC(cpumask, "Mask of CPUs to use for idle injection.");
+
+static int step_size_set(const char *arg, const struct kernel_param *kp)
+{
+	u8 _step_size;
+	int ret = 0;
+
+	mutex_lock(&powerclamp_lock);
+
+	/* Can't set step size when cooling device is in use */
+	if (powerclamp_data.clamping) {
+		ret = -EAGAIN;
+		goto skip_limit_set;
+	}
+
+	ret = kstrtou8(arg, 10, &_step_size);
+	if (ret)
+		goto skip_limit_set;
+
+	if (_step_size < 1 || _step_size > 2) {
+		ret = -EINVAL;
+		goto skip_limit_set;
+	}
+
+	if (idle_injection_cpu_mask && cpumask_equal(cpu_present_mask, idle_injection_cpu_mask) &&
+	    _step_size > 1) {
+		ret = -EINVAL;
+		goto skip_limit_set;
+	}
+
+	step_size = _step_size;
+
+skip_limit_set:
+	mutex_unlock(&powerclamp_lock);
+
+	return ret;
+}
+
+static const struct kernel_param_ops step_size_ops = {
+	.set = step_size_set,
+	.get = param_get_byte,
+};
+
+module_param_cb(step_size, &step_size_ops, &step_size, 0644);
+MODULE_PARM_DESC(step_size, "Step size to increment for each cooling state. range 1-2");
+
 struct powerclamp_calibration_data {
 	unsigned long confidence;  /* used for calibration, basically a counter
 				    * gets incremented each time a clamping
@@ -246,6 +359,9 @@ static unsigned int get_compensation(int ratio)
 {
 	unsigned int comp = 0;
 
+	if (step_size > 1)
+		return 0;
+
 	/* we only use compensation if all adjacent ones are good */
 	if (ratio == 1 &&
 		cal_data[ratio].confidence >= CONFIDENCE_OK &&
@@ -338,6 +454,10 @@ static unsigned int get_run_time(void)
 	unsigned int compensated_ratio;
 	unsigned int runtime;
 
+	/* No compensation for non systemwide idle injection */
+	if (step_size > 1)
+		return (duration * 100 / powerclamp_data.target_ratio - duration);
+
 	/*
 	 * make sure user selected ratio does not take effect until
 	 * the next round. adjust target_ratio if user has changed
@@ -454,21 +574,26 @@ static void trigger_idle_injection(void)
  */
 static int powerclamp_idle_injection_register(void)
 {
-	/*
-	 * The idle inject core will only inject for online CPUs,
-	 * So we can register for all present CPUs. In this way
-	 * if some CPU goes online/offline while idle inject
-	 * is registered, nothing additional calls are required.
-	 * The same runtime and idle time is applicable for
-	 * newly onlined CPUs if any.
-	 *
-	 * Here cpu_present_mask can be used as is.
-	 * cast to (struct cpumask *) is required as the
-	 * cpu_present_mask is const struct cpumask *, otherwise
-	 * there will be compiler warnings.
-	 */
-	ii_dev = idle_inject_register_full((struct cpumask *)cpu_present_mask,
-					   idle_inject_update);
+	if (!idle_injection_cpu_mask) {
+		if (!zalloc_cpumask_var(&idle_injection_cpu_mask, GFP_KERNEL))
+			return -ENOMEM;
+
+		/*
+		 * The idle inject core will only inject for online CPUs,
+		 * so we can register for all present CPUs. In this way
+		 * if some CPU goes online/offline while idle inject
+		 * is registered, no additional calls are required.
+		 * The same runtime and idle time is applicable for
+		 * newly onlined CPUs if any.
+		 */
+		cpumask_copy(idle_injection_cpu_mask, cpu_present_mask);
+	}
+
+	if (cpumask_equal(cpu_present_mask, idle_injection_cpu_mask))
+		ii_dev = idle_inject_register_full(idle_injection_cpu_mask, idle_inject_update);
+	else
+		ii_dev = idle_inject_register(idle_injection_cpu_mask);
+
 	if (!ii_dev) {
 		pr_err("powerclamp: idle_inject_register failed\n");
 		return -EAGAIN;
@@ -504,7 +629,9 @@ static int start_power_clamp(void)
 	ret = powerclamp_idle_injection_register();
 	if (!ret) {
 		trigger_idle_injection();
-		schedule_delayed_work(&poll_pkg_cstate_work, 0);
+		/* No use of package residency counters when partial set is used */
+		if (cpumask_equal(cpu_present_mask, idle_injection_cpu_mask))
+			schedule_delayed_work(&poll_pkg_cstate_work, 0);
 	}
 
 	return ret;
@@ -558,7 +685,7 @@ static int powerclamp_set_cur_state(struct thermal_cooling_device *cdev,
 				       (unsigned long) (MAX_TARGET_RATIO - 1));
 	if (!powerclamp_data.target_ratio && new_target_ratio > 0) {
 		pr_info("Start idle injection to reduce power\n");
-		powerclamp_data.target_ratio = new_target_ratio;
+		powerclamp_data.target_ratio = new_target_ratio * step_size;
 		ret = start_power_clamp();
 		if (ret)
 			powerclamp_data.target_ratio = 0;
@@ -570,7 +697,7 @@ static int powerclamp_set_cur_state(struct thermal_cooling_device *cdev,
 	} else	/* adjust currently running */ {
 		unsigned int runtime;
 
-		powerclamp_data.target_ratio = new_target_ratio;
+		powerclamp_data.target_ratio = new_target_ratio * step_size;
 		runtime = get_run_time();
 		idle_inject_set_duration(ii_dev, runtime, duration);
 	}
@@ -676,6 +803,9 @@ static void __exit powerclamp_exit(void)
 
 	cancel_delayed_work_sync(&poll_pkg_cstate_work);
 	debugfs_remove_recursive(debug_dir);
+
+	if (idle_injection_cpu_mask)
+		free_cpumask_var(idle_injection_cpu_mask);
 }
 module_exit(powerclamp_exit);
 
-- 
2.37.2