When autonomous selection (auto_sel) is enabled, the hardware controls
performance within min_perf/max_perf register bounds making the
scaling_min/max_freq effectively read-only.
Enforce this by setting policy limits to min/max_perf bounds in
cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
to change performance limits in autonomous mode.
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
---
drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b1f570d6de34..b3da263c18b0 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -305,7 +305,37 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
static int cppc_verify_policy(struct cpufreq_policy_data *policy)
{
- cpufreq_verify_within_cpu_limits(policy);
+ unsigned int min_freq = policy->cpuinfo.min_freq;
+ unsigned int max_freq = policy->cpuinfo.max_freq;
+ struct cpufreq_policy *cpu_policy;
+ struct cppc_cpudata *cpu_data;
+ struct cppc_perf_caps *caps;
+
+ cpu_policy = cpufreq_cpu_get(policy->cpu);
+ if (!cpu_policy)
+ return -ENODEV;
+
+ cpu_data = cpu_policy->driver_data;
+ caps = &cpu_data->perf_caps;
+
+ if (cpu_data->perf_ctrls.auto_sel) {
+ u32 min_perf, max_perf;
+
+ /*
+ * Set policy limits to HW min/max_perf bounds. In autonomous
+ * mode, scaling_min/max_freq is effectively read-only.
+ */
+ min_perf = cpu_data->perf_ctrls.min_perf ?:
+ caps->lowest_nonlinear_perf;
+ max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;
+
+ policy->min = cppc_perf_to_khz(caps, min_perf);
+ policy->max = cppc_perf_to_khz(caps, max_perf);
+ } else {
+ cpufreq_verify_within_limits(policy, min_freq, max_freq);
+ }
+
+ cpufreq_cpu_put(cpu_policy);
return 0;
}
--
2.34.1
Hello Sumit, Lifeng,
On 12/23/25 13:13, Sumit Gupta wrote:
> When autonomous selection (auto_sel) is enabled, the hardware controls
> performance within min_perf/max_perf register bounds making the
> scaling_min/max_freq effectively read-only.
If auto_sel is set, the governor associated to the policy will have no
actual control.
E.g.:
If the schedutil governor is used, attempts to set the
frequency based on CPU utilization will be periodically
sent, but they will have no effect.
The same thing will happen for the ondemand, performance,
powersave, userspace, etc. governors. They can only work if
frequency requests are taken into account.
------------
This looks like the intel_pstate governor handling where it is possible
not to have .target() or .target_index() callback and the hardware is in
charge (IIUC).
For this case, only 2 governor seem available: performance and powersave.
------------
In our case, I think it is desired to unload the scaling governor
currently in
use if auto_sel is selected. Letting the rest of the system think it has
control
over the freq. selection seems incorrect.
I am not sure what to replace it with:
-
There are no specific performance/powersave modes for CPPC.
There is a range of values between 0-255
-
A firmware auto-selection governor could be created just for this case.
Being able to switch between OS-driven and firmware driven freq. selection
is not specific to CPPC (for the future).
However I am not really able to say the implications of doing that.
------------
I think it would be better to split your patchset in 2:
1. adding APIs for the CPPC spec.
2. using the APIs, especially for auto_sel
1. is likely to be straightforward as the APIs will still be used
by the driver at some point.
2. is likely to bring more discussion.
> Enforce this by setting policy limits to min/max_perf bounds in
> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
> to change performance limits in autonomous mode.
>
> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
> ---
> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index b1f570d6de34..b3da263c18b0 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -305,7 +305,37 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>
> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
> {
> - cpufreq_verify_within_cpu_limits(policy);
> + unsigned int min_freq = policy->cpuinfo.min_freq;
> + unsigned int max_freq = policy->cpuinfo.max_freq;
> + struct cpufreq_policy *cpu_policy;
> + struct cppc_cpudata *cpu_data;
> + struct cppc_perf_caps *caps;
> +
> + cpu_policy = cpufreq_cpu_get(policy->cpu);
> + if (!cpu_policy)
> + return -ENODEV;
> +
> + cpu_data = cpu_policy->driver_data;
> + caps = &cpu_data->perf_caps;
> +
> + if (cpu_data->perf_ctrls.auto_sel) {
> + u32 min_perf, max_perf;
> +
> + /*
> + * Set policy limits to HW min/max_perf bounds. In autonomous
> + * mode, scaling_min/max_freq is effectively read-only.
> + */
> + min_perf = cpu_data->perf_ctrls.min_perf ?:
> + caps->lowest_nonlinear_perf;
> + max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;
> +
> + policy->min = cppc_perf_to_khz(caps, min_perf);
> + policy->max = cppc_perf_to_khz(caps, max_perf);
policy->min/max values are overwritten, but the governor which is
supposed to use them to select the most fitting frequency will be
ignored by the firmware I think.
> + } else {
> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
> + }
> +
> + cpufreq_cpu_put(cpu_policy);
> return 0;
> }
>
On 08/01/26 22:16, Pierre Gondois wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Sumit, Lifeng,
>
> On 12/23/25 13:13, Sumit Gupta wrote:
>> When autonomous selection (auto_sel) is enabled, the hardware controls
>> performance within min_perf/max_perf register bounds making the
>> scaling_min/max_freq effectively read-only.
>
> If auto_sel is set, the governor associated to the policy will have no
> actual control.
>
> E.g.:
> If the schedutil governor is used, attempts to set the
> frequency based on CPU utilization will be periodically
> sent, but they will have no effect.
>
> The same thing will happen for the ondemand, performance,
> powersave, userspace, etc. governors. They can only work if
> frequency requests are taken into account.
>
> ------------
>
> This looks like the intel_pstate governor handling where it is possible
> not to have .target() or .target_index() callback and the hardware is in
> charge (IIUC).
> For this case, only 2 governor seem available: performance and powersave.
>
In v1 [1], I added a separate cppc_cpufreq_epp_driver instance without
target*() hooks, using setpolicy() instead (similar to AMD pstate).
However, this approach doesn't allow per-CPU control: if we boot with the
EPP driver, we can't dynamically disable auto_sel for individual CPUs and
return to OS governor control (no target hook available). AMD and Intel
pstate drivers seem to set HW autonomous mode for all CPUs globally,
not per-CPU. So, changed it in v2.
[1] https://lore.kernel.org/lkml/20250211103737.447704-6-sumitg@nvidia.com/
> ------------
>
> In our case, I think it is desired to unload the scaling governor
> currently in
> use if auto_sel is selected. Letting the rest of the system think it has
> control
> over the freq. selection seems incorrect.
> I am not sure what to replace it with:
> -
> There are no specific performance/powersave modes for CPPC.
> There is a range of values between 0-255
> -
> A firmware auto-selection governor could be created just for this case.
> Being able to switch between OS-driven and firmware driven freq.
> selection
> is not specific to CPPC (for the future).
> However I am not really able to say the implications of doing that.
>
> ------------
>
> I think it would be better to split your patchset in 2:
> 1. adding APIs for the CPPC spec.
> 2. using the APIs, especially for auto_sel
>
> 1. is likely to be straightforward as the APIs will still be used
> by the driver at some point.
> 2. is likely to bring more discussion.
>
We discussed adding a hw_auto_sel governor as a second step, though the
approach may need refinement during implementation.
Deferred it (to second step) because adding a new governor requires
broader discussion.
This issue already exists in current code - store_auto_select() enables
auto_sel without any governor awareness. These patches improve the
situation by:
- Updating scaling_min/max_freq when toggling auto_sel mode
- Syncing policy limits with actual HW min/max_perf bounds
- Making scaling_min/max_freq read-only in auto_sel mode
Would it be acceptable to merge this as a first step, with the governor
handling as a follow-up?
If not and you prefer splitting, which grouping works better:
A) Patches 1-8 then 9-11.
B) "ACPI: CPPC *" patches then "cpufreq: CPPC *" patches.
>
>> Enforce this by setting policy limits to min/max_perf bounds in
>> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
>> to change performance limits in autonomous mode.
>>
>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>> ---
>> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>> b/drivers/cpufreq/cppc_cpufreq.c
>> index b1f570d6de34..b3da263c18b0 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -305,7 +305,37 @@ static unsigned int
>> cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>>
>> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
>> {
>> - cpufreq_verify_within_cpu_limits(policy);
>> + unsigned int min_freq = policy->cpuinfo.min_freq;
>> + unsigned int max_freq = policy->cpuinfo.max_freq;
>> + struct cpufreq_policy *cpu_policy;
>> + struct cppc_cpudata *cpu_data;
>> + struct cppc_perf_caps *caps;
>> +
>> + cpu_policy = cpufreq_cpu_get(policy->cpu);
>> + if (!cpu_policy)
>> + return -ENODEV;
>> +
>> + cpu_data = cpu_policy->driver_data;
>> + caps = &cpu_data->perf_caps;
>> +
>> + if (cpu_data->perf_ctrls.auto_sel) {
>> + u32 min_perf, max_perf;
>> +
>> + /*
>> + * Set policy limits to HW min/max_perf bounds. In
>> autonomous
>> + * mode, scaling_min/max_freq is effectively read-only.
>> + */
>> + min_perf = cpu_data->perf_ctrls.min_perf ?:
>> + caps->lowest_nonlinear_perf;
>> + max_perf = cpu_data->perf_ctrls.max_perf ?:
>> caps->nominal_perf;
>> +
>> + policy->min = cppc_perf_to_khz(caps, min_perf);
>> + policy->max = cppc_perf_to_khz(caps, max_perf);
>
> policy->min/max values are overwritten, but the governor which is
> supposed to use them to select the most fitting frequency will be
> ignored by the firmware I think.
>
Yes.
>> + } else {
>> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
>> + }
>> +
>> + cpufreq_cpu_put(cpu_policy);
>> return 0;
>> }
>>
Hello Sumit,
On 1/9/26 15:37, Sumit Gupta wrote:
>
> On 08/01/26 22:16, Pierre Gondois wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Hello Sumit, Lifeng,
>>
>> On 12/23/25 13:13, Sumit Gupta wrote:
>>> When autonomous selection (auto_sel) is enabled, the hardware controls
>>> performance within min_perf/max_perf register bounds making the
>>> scaling_min/max_freq effectively read-only.
>>
>> If auto_sel is set, the governor associated to the policy will have no
>> actual control.
>>
>> E.g.:
>> If the schedutil governor is used, attempts to set the
>> frequency based on CPU utilization will be periodically
>> sent, but they will have no effect.
>>
>> The same thing will happen for the ondemand, performance,
>> powersave, userspace, etc. governors. They can only work if
>> frequency requests are taken into account.
>>
>> ------------
>>
>> This looks like the intel_pstate governor handling where it is possible
>> not to have .target() or .target_index() callback and the hardware is in
>> charge (IIUC).
>> For this case, only 2 governor seem available: performance and
>> powersave.
>>
>
Thanks for pointing me to the first version, I forgot how your
first implementation was.
> In v1 [1], I added a separate cppc_cpufreq_epp_driver instance without
> target*() hooks, using setpolicy() instead (similar to AMD pstate).
> However, this approach doesn't allow per-CPU control: if we boot with the
> EPP driver, we can't dynamically disable auto_sel for individual CPUs and
> return to OS governor control (no target hook available). AMD and Intel
> pstate drivers seem to set HW autonomous mode for all CPUs globally,
> not per-CPU. So, changed it in v2.
> [1]
> https://lore.kernel.org/lkml/20250211103737.447704-6-sumitg@nvidia.com/
>
Ok right.
This is something I don't really understand in the current intel/amd cpufreq
drivers. FWIU:
- the cpufreq drivers abstractions allow to access different hardware
- the governor abstraction allows to switch between different algorithms
to select the 'correct' frequency.
So IMO switching to autonomous selection should be done by switching
to another governor and the 'auto_sel' file should not be accessible to
users.
------------
Being able to enable/disable the autonomous selection on a per-policy
base seems a valid use-case. It also seems to fit the per-policy governor
capabilities.
However toggling the auto_sel on different CPUs inside the same policy
seems inappropriate (this is is not what is done in this patchset IIUC).
>
>> ------------
>>
>> In our case, I think it is desired to unload the scaling governor
>> currently in
>> use if auto_sel is selected. Letting the rest of the system think it has
>> control
>> over the freq. selection seems incorrect.
>> I am not sure what to replace it with:
>> -
>> There are no specific performance/powersave modes for CPPC.
>> There is a range of values between 0-255
>> -
>> A firmware auto-selection governor could be created just for this case.
>> Being able to switch between OS-driven and firmware driven freq.
>> selection
>> is not specific to CPPC (for the future).
>> However I am not really able to say the implications of doing that.
>>
>> ------------
>>
>> I think it would be better to split your patchset in 2:
>> 1. adding APIs for the CPPC spec.
>> 2. using the APIs, especially for auto_sel
>>
>> 1. is likely to be straightforward as the APIs will still be used
>> by the driver at some point.
>> 2. is likely to bring more discussion.
>>
>
> We discussed adding a hw_auto_sel governor as a second step, though the
> approach may need refinement during implementation.
I didn't find in the thread adding a new governor was discussed in the
threads, in case you have a direct link.
>
> Deferred it (to second step) because adding a new governor requires
> broader discussion.
>
> This issue already exists in current code - store_auto_select() enables
> auto_sel without any governor awareness. These patches improve the
> situation by:
> - Updating scaling_min/max_freq when toggling auto_sel mode
> - Syncing policy limits with actual HW min/max_perf bounds
> - Making scaling_min/max_freq read-only in auto_sel mode
>
> Would it be acceptable to merge this as a first step, with the governor
> handling as a follow-up?
> If not and you prefer splitting, which grouping works better:
> A) Patches 1-8 then 9-11.
> B) "ACPI: CPPC *" patches then "cpufreq: CPPC *" patches.
>
If it's possible I would like to understand what the end result should
look like. If ultimately enabling auto_sel implies switching governor
I understand, but I didn't find the thread that discussed about that
unfortunately.
>
>>
>>> Enforce this by setting policy limits to min/max_perf bounds in
>>> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
>>> to change performance limits in autonomous mode.
>>>
>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>> ---
>>> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
>>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>> b/drivers/cpufreq/cppc_cpufreq.c
>>> index b1f570d6de34..b3da263c18b0 100644
>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>> @@ -305,7 +305,37 @@ static unsigned int
>>> cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>>>
>>> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
>>> {
>>> - cpufreq_verify_within_cpu_limits(policy);
>>> + unsigned int min_freq = policy->cpuinfo.min_freq;
>>> + unsigned int max_freq = policy->cpuinfo.max_freq;
>>> + struct cpufreq_policy *cpu_policy;
>>> + struct cppc_cpudata *cpu_data;
>>> + struct cppc_perf_caps *caps;
>>> +
>>> + cpu_policy = cpufreq_cpu_get(policy->cpu);
>>> + if (!cpu_policy)
>>> + return -ENODEV;
>>> +
>>> + cpu_data = cpu_policy->driver_data;
>>> + caps = &cpu_data->perf_caps;
>>> +
>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>> + u32 min_perf, max_perf;
>>> +
>>> + /*
>>> + * Set policy limits to HW min/max_perf bounds. In
>>> autonomous
>>> + * mode, scaling_min/max_freq is effectively read-only.
>>> + */
>>> + min_perf = cpu_data->perf_ctrls.min_perf ?:
>>> + caps->lowest_nonlinear_perf;
>>> + max_perf = cpu_data->perf_ctrls.max_perf ?:
>>> caps->nominal_perf;
>>> +
>>> + policy->min = cppc_perf_to_khz(caps, min_perf);
>>> + policy->max = cppc_perf_to_khz(caps, max_perf);
>>
>> policy->min/max values are overwritten, but the governor which is
>> supposed to use them to select the most fitting frequency will be
>> ignored by the firmware I think.
>>
>
> Yes.
>
>>> + } else {
>>> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
>>> + }
>>> +
>>> + cpufreq_cpu_put(cpu_policy);
>>> return 0;
>>> }
>>>
On 12/01/26 17:14, Pierre Gondois wrote:
> External email: Use caution opening links or attachments
>
>
> Hello Sumit,
>
> On 1/9/26 15:37, Sumit Gupta wrote:
>>
>> On 08/01/26 22:16, Pierre Gondois wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello Sumit, Lifeng,
>>>
>>> On 12/23/25 13:13, Sumit Gupta wrote:
>>>> When autonomous selection (auto_sel) is enabled, the hardware controls
>>>> performance within min_perf/max_perf register bounds making the
>>>> scaling_min/max_freq effectively read-only.
>>>
>>> If auto_sel is set, the governor associated to the policy will have no
>>> actual control.
>>>
>>> E.g.:
>>> If the schedutil governor is used, attempts to set the
>>> frequency based on CPU utilization will be periodically
>>> sent, but they will have no effect.
>>>
>>> The same thing will happen for the ondemand, performance,
>>> powersave, userspace, etc. governors. They can only work if
>>> frequency requests are taken into account.
>>>
>>> ------------
>>>
>>> This looks like the intel_pstate governor handling where it is possible
>>> not to have .target() or .target_index() callback and the hardware
>>> is in
>>> charge (IIUC).
>>> For this case, only 2 governor seem available: performance and
>>> powersave.
>>>
>>
> Thanks for pointing me to the first version, I forgot how your
> first implementation was.
>
>
>> In v1 [1], I added a separate cppc_cpufreq_epp_driver instance without
>> target*() hooks, using setpolicy() instead (similar to AMD pstate).
>> However, this approach doesn't allow per-CPU control: if we boot with
>> the
>> EPP driver, we can't dynamically disable auto_sel for individual CPUs
>> and
>> return to OS governor control (no target hook available). AMD and Intel
>> pstate drivers seem to set HW autonomous mode for all CPUs globally,
>> not per-CPU. So, changed it in v2.
>> [1]
>> https://lore.kernel.org/lkml/20250211103737.447704-6-sumitg@nvidia.com/
>>
> Ok right.
> This is something I don't really understand in the current intel/amd
> cpufreq
> drivers. FWIU:
> - the cpufreq drivers abstractions allow to access different hardware
> - the governor abstraction allows to switch between different algorithms
> to select the 'correct' frequency.
>
> So IMO switching to autonomous selection should be done by switching
> to another governor and the 'auto_sel' file should not be accessible to
> users.
>
> ------------
>
> Being able to enable/disable the autonomous selection on a per-policy
> base seems a valid use-case. It also seems to fit the per-policy governor
> capabilities.
> However toggling the auto_sel on different CPUs inside the same policy
> seems inappropriate (this is is not what is done in this patchset IIUC).
>
I agree about the new governor approach.
We can make the auto_select interface read-only to reflect the current
state,
and users would use scaling_governor to switch to/from hw_autonomous.
This keeps governor control in one place.
Alternatively, we could have writes to auto_select trigger the governor
switch for backward compatibility - but that might be confusing to have
two ways to switch governors.
As suggested, i will split this patchset into two:
1) v6 with patches 1-7: CPPC register access APIs and sysfs interfaces
- These are straightforward and provide foundational HW access.
2) Meanwhile work on new patch series for hw_autonomous governor.
Please let me know if any other thoughts.
Thank you,
Sumit Gupta
>
>>
>>> ------------
>>>
>>> In our case, I think it is desired to unload the scaling governor
>>> currently in
>>> use if auto_sel is selected. Letting the rest of the system think it
>>> has
>>> control
>>> over the freq. selection seems incorrect.
>>> I am not sure what to replace it with:
>>> -
>>> There are no specific performance/powersave modes for CPPC.
>>> There is a range of values between 0-255
>>> -
>>> A firmware auto-selection governor could be created just for this case.
>>> Being able to switch between OS-driven and firmware driven freq.
>>> selection
>>> is not specific to CPPC (for the future).
>>> However I am not really able to say the implications of doing that.
>>>
>>> ------------
>>>
>>> I think it would be better to split your patchset in 2:
>>> 1. adding APIs for the CPPC spec.
>>> 2. using the APIs, especially for auto_sel
>>>
>>> 1. is likely to be straightforward as the APIs will still be used
>>> by the driver at some point.
>>> 2. is likely to bring more discussion.
>>>
>>
>> We discussed adding a hw_auto_sel governor as a second step, though the
>> approach may need refinement during implementation.
>
> I didn't find in the thread adding a new governor was discussed in the
> threads, in case you have a direct link.
>
>>
>> Deferred it (to second step) because adding a new governor requires
>> broader discussion.
>>
>> This issue already exists in current code - store_auto_select() enables
>> auto_sel without any governor awareness. These patches improve the
>> situation by:
>> - Updating scaling_min/max_freq when toggling auto_sel mode
>> - Syncing policy limits with actual HW min/max_perf bounds
>> - Making scaling_min/max_freq read-only in auto_sel mode
>>
>> Would it be acceptable to merge this as a first step, with the governor
>> handling as a follow-up?
>> If not and you prefer splitting, which grouping works better:
>> A) Patches 1-8 then 9-11.
>> B) "ACPI: CPPC *" patches then "cpufreq: CPPC *" patches.
>>
> If it's possible I would like to understand what the end result should
> look like. If ultimately enabling auto_sel implies switching governor
> I understand, but I didn't find the thread that discussed about that
> unfortunately.
>
>
>>
>>>
>>>> Enforce this by setting policy limits to min/max_perf bounds in
>>>> cppc_verify_policy(). Users must use min_perf/max_perf sysfs
>>>> interfaces
>>>> to change performance limits in autonomous mode.
>>>>
>>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>>> ---
>>>> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
>>>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c
>>>> b/drivers/cpufreq/cppc_cpufreq.c
>>>> index b1f570d6de34..b3da263c18b0 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -305,7 +305,37 @@ static unsigned int
>>>> cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>>>>
>>>> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
>>>> {
>>>> - cpufreq_verify_within_cpu_limits(policy);
>>>> + unsigned int min_freq = policy->cpuinfo.min_freq;
>>>> + unsigned int max_freq = policy->cpuinfo.max_freq;
>>>> + struct cpufreq_policy *cpu_policy;
>>>> + struct cppc_cpudata *cpu_data;
>>>> + struct cppc_perf_caps *caps;
>>>> +
>>>> + cpu_policy = cpufreq_cpu_get(policy->cpu);
>>>> + if (!cpu_policy)
>>>> + return -ENODEV;
>>>> +
>>>> + cpu_data = cpu_policy->driver_data;
>>>> + caps = &cpu_data->perf_caps;
>>>> +
>>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>>> + u32 min_perf, max_perf;
>>>> +
>>>> + /*
>>>> + * Set policy limits to HW min/max_perf bounds. In
>>>> autonomous
>>>> + * mode, scaling_min/max_freq is effectively read-only.
>>>> + */
>>>> + min_perf = cpu_data->perf_ctrls.min_perf ?:
>>>> + caps->lowest_nonlinear_perf;
>>>> + max_perf = cpu_data->perf_ctrls.max_perf ?:
>>>> caps->nominal_perf;
>>>> +
>>>> + policy->min = cppc_perf_to_khz(caps, min_perf);
>>>> + policy->max = cppc_perf_to_khz(caps, max_perf);
>>>
>>> policy->min/max values are overwritten, but the governor which is
>>> supposed to use them to select the most fitting frequency will be
>>> ignored by the firmware I think.
>>>
>>
>> Yes.
>>
>>>> + } else {
>>>> + cpufreq_verify_within_limits(policy, min_freq,
>>>> max_freq);
>>>> + }
>>>> +
>>>> + cpufreq_cpu_put(cpu_policy);
>>>> return 0;
>>>> }
>>>>
Hello Pierre,
On 2026/1/12 19:44, Pierre Gondois wrote:
> Hello Sumit,
>
> On 1/9/26 15:37, Sumit Gupta wrote:
>>
>> On 08/01/26 22:16, Pierre Gondois wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> Hello Sumit, Lifeng,
>>>
>>> On 12/23/25 13:13, Sumit Gupta wrote:
>>>> When autonomous selection (auto_sel) is enabled, the hardware controls
>>>> performance within min_perf/max_perf register bounds making the
>>>> scaling_min/max_freq effectively read-only.
>>>
>>> If auto_sel is set, the governor associated to the policy will have no
>>> actual control.
>>>
>>> E.g.:
>>> If the schedutil governor is used, attempts to set the
>>> frequency based on CPU utilization will be periodically
>>> sent, but they will have no effect.
>>>
>>> The same thing will happen for the ondemand, performance,
>>> powersave, userspace, etc. governors. They can only work if
>>> frequency requests are taken into account.
>>>
>>> ------------
>>>
>>> This looks like the intel_pstate governor handling where it is possible
>>> not to have .target() or .target_index() callback and the hardware is in
>>> charge (IIUC).
>>> For this case, only 2 governor seem available: performance and powersave.
>>>
As you mentioned in [2], 'it still makes sense to have cpufreq requesting a
certain performance level even though autonomous selection is enabled'. So I
think it's OK to have a governor when auto_selection is enabled.
[2] https://lore.kernel.org/all/9f46991d-98c3-41f5-8133-6612b397e33a@arm.com/
>>
> Thanks for pointing me to the first version, I forgot how your
> first implementation was.
>
>
>> In v1 [1], I added a separate cppc_cpufreq_epp_driver instance without
>> target*() hooks, using setpolicy() instead (similar to AMD pstate).
>> However, this approach doesn't allow per-CPU control: if we boot with the
>> EPP driver, we can't dynamically disable auto_sel for individual CPUs and
>> return to OS governor control (no target hook available). AMD and Intel
>> pstate drivers seem to set HW autonomous mode for all CPUs globally,
>> not per-CPU. So, changed it in v2.
>> [1] https://lore.kernel.org/lkml/20250211103737.447704-6-sumitg@nvidia.com/
>>
> Ok right.
> This is something I don't really understand in the current intel/amd cpufreq
> drivers. FWIU:
> - the cpufreq drivers abstractions allow to access different hardware
> - the governor abstraction allows to switch between different algorithms
> to select the 'correct' frequency.
>
> So IMO switching to autonomous selection should be done by switching
> to another governor and the 'auto_sel' file should not be accessible to users.
>
> ------------
>
> Being able to enable/disable the autonomous selection on a per-policy
> base seems a valid use-case. It also seems to fit the per-policy governor
> capabilities.
I'm OK with adding an auto-selection governor. It's better to keep this
governor only in cppc_cpufreq for now I think.
> However toggling the auto_sel on different CPUs inside the same policy
> seems inappropriate (this is is not what is done in this patchset IIUC).
>
I think Sumit means per-policy when he said per-CPU.
>
>>
>>> ------------
>>>
>>> In our case, I think it is desired to unload the scaling governor
>>> currently in
>>> use if auto_sel is selected. Letting the rest of the system think it has
>>> control
>>> over the freq. selection seems incorrect.
>>> I am not sure what to replace it with:
>>> -
>>> There are no specific performance/powersave modes for CPPC.
>>> There is a range of values between 0-255
>>> -
>>> A firmware auto-selection governor could be created just for this case.
>>> Being able to switch between OS-driven and firmware driven freq. selection
>>> is not specific to CPPC (for the future).
>>> However I am not really able to say the implications of doing that.
>>>
>>> ------------
>>>
>>> I think it would be better to split your patchset in 2:
>>> 1. adding APIs for the CPPC spec.
>>> 2. using the APIs, especially for auto_sel
>>>
>>> 1. is likely to be straightforward as the APIs will still be used
>>> by the driver at some point.
>>> 2. is likely to bring more discussion.
>>>
>>
>> We discussed adding a hw_auto_sel governor as a second step, though the
>> approach may need refinement during implementation.
>
> I didn't find in the thread adding a new governor was discussed in the
> threads, in case you have a direct link.
>
>>
>> Deferred it (to second step) because adding a new governor requires
>> broader discussion.
>>
>> This issue already exists in current code - store_auto_select() enables
>> auto_sel without any governor awareness. These patches improve the
>> situation by:
>> - Updating scaling_min/max_freq when toggling auto_sel mode
>> - Syncing policy limits with actual HW min/max_perf bounds
>> - Making scaling_min/max_freq read-only in auto_sel mode
>>
>> Would it be acceptable to merge this as a first step, with the governor
>> handling as a follow-up?
>> If not and you prefer splitting, which grouping works better:
>> A) Patches 1-8 then 9-11.
>> B) "ACPI: CPPC *" patches then "cpufreq: CPPC *" patches.
>>
> If it's possible I would like to understand what the end result should
> look like. If ultimately enabling auto_sel implies switching governor
> I understand, but I didn't find the thread that discussed about that
> unfortunately.
>
>
>>
>>>
>>>> Enforce this by setting policy limits to min/max_perf bounds in
>>>> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
>>>> to change performance limits in autonomous mode.
>>>>
>>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>>> ---
>>>> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
>>>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>>> index b1f570d6de34..b3da263c18b0 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -305,7 +305,37 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>>>>
>>>> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
>>>> {
>>>> - cpufreq_verify_within_cpu_limits(policy);
>>>> + unsigned int min_freq = policy->cpuinfo.min_freq;
>>>> + unsigned int max_freq = policy->cpuinfo.max_freq;
>>>> + struct cpufreq_policy *cpu_policy;
>>>> + struct cppc_cpudata *cpu_data;
>>>> + struct cppc_perf_caps *caps;
>>>> +
>>>> + cpu_policy = cpufreq_cpu_get(policy->cpu);
>>>> + if (!cpu_policy)
>>>> + return -ENODEV;
>>>> +
>>>> + cpu_data = cpu_policy->driver_data;
>>>> + caps = &cpu_data->perf_caps;
>>>> +
>>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>>> + u32 min_perf, max_perf;
>>>> +
>>>> + /*
>>>> + * Set policy limits to HW min/max_perf bounds. In autonomous
>>>> + * mode, scaling_min/max_freq is effectively read-only.
>>>> + */
>>>> + min_perf = cpu_data->perf_ctrls.min_perf ?:
>>>> + caps->lowest_nonlinear_perf;
>>>> + max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;
>>>> +
>>>> + policy->min = cppc_perf_to_khz(caps, min_perf);
>>>> + policy->max = cppc_perf_to_khz(caps, max_perf);
>>>
>>> policy->min/max values are overwritten, but the governor which is
>>> supposed to use them to select the most fitting frequency will be
>>> ignored by the firmware I think.
>>>
>>
>> Yes.
>>
>>>> + } else {
>>>> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
>>>> + }
>>>> +
>>>> + cpufreq_cpu_put(cpu_policy);
>>>> return 0;
>>>> }
>>>>
>
>>> n 08/01/26 22:16, Pierre Gondois wrote:
>>>> External email: Use caution opening links or attachments
>>>>
>>>>
>>>> Hello Sumit, Lifeng,
>>>>
>>>> On 12/23/25 13:13, Sumit Gupta wrote:
>>>>> When autonomous selection (auto_sel) is enabled, the hardware controls
>>>>> performance within min_perf/max_perf register bounds making the
>>>>> scaling_min/max_freq effectively read-only.
>>>> If auto_sel is set, the governor associated to the policy will have no
>>>> actual control.
>>>>
>>>> E.g.:
>>>> If the schedutil governor is used, attempts to set the
>>>> frequency based on CPU utilization will be periodically
>>>> sent, but they will have no effect.
>>>>
>>>> The same thing will happen for the ondemand, performance,
>>>> powersave, userspace, etc. governors. They can only work if
>>>> frequency requests are taken into account.
>>>>
>>>> ------------
>>>>
>>>> This looks like the intel_pstate governor handling where it is possible
>>>> not to have .target() or .target_index() callback and the hardware is in
>>>> charge (IIUC).
>>>> For this case, only 2 governor seem available: performance and powersave.
>>>>
> As you mentioned in [2], 'it still makes sense to have cpufreq requesting a
> certain performance level even though autonomous selection is enabled'. So I
> think it's OK to have a governor when auto_selection is enabled.
>
> [2] https://lore.kernel.org/all/9f46991d-98c3-41f5-8133-6612b397e33a@arm.com/
>
>> Thanks for pointing me to the first version, I forgot how your
>> first implementation was.
>>
>>
>>> In v1 [1], I added a separate cppc_cpufreq_epp_driver instance without
>>> target*() hooks, using setpolicy() instead (similar to AMD pstate).
>>> However, this approach doesn't allow per-CPU control: if we boot with the
>>> EPP driver, we can't dynamically disable auto_sel for individual CPUs and
>>> return to OS governor control (no target hook available). AMD and Intel
>>> pstate drivers seem to set HW autonomous mode for all CPUs globally,
>>> not per-CPU. So, changed it in v2.
>>> [1] https://lore.kernel.org/lkml/20250211103737.447704-6-sumitg@nvidia.com/
>>>
>> Ok right.
>> This is something I don't really understand in the current intel/amd cpufreq
>> drivers. FWIU:
>> - the cpufreq drivers abstractions allow to access different hardware
>> - the governor abstraction allows to switch between different algorithms
>> to select the 'correct' frequency.
>>
>> So IMO switching to autonomous selection should be done by switching
>> to another governor and the 'auto_sel' file should not be accessible to users.
>>
>> ------------
>>
>> Being able to enable/disable the autonomous selection on a per-policy
>> base seems a valid use-case. It also seems to fit the per-policy governor
>> capabilities.
> I'm OK with adding an auto-selection governor. It's better to keep this
> governor only in cppc_cpufreq for now I think.
>
>> However toggling the auto_sel on different CPUs inside the same policy
>> seems inappropriate (this is is not what is done in this patchset IIUC).
>>
> I think Sumit means per-policy when he said per-CPU.
Yes, it's per-policy.
Thank you,
Sumit Gupta
>>>> ------------
>>>>
>>>> In our case, I think it is desired to unload the scaling governor
>>>> currently in
>>>> use if auto_sel is selected. Letting the rest of the system think it has
>>>> control
>>>> over the freq. selection seems incorrect.
>>>> I am not sure what to replace it with:
>>>> -
>>>> There are no specific performance/powersave modes for CPPC.
>>>> There is a range of values between 0-255
>>>> -
>>>> A firmware auto-selection governor could be created just for this case.
>>>> Being able to switch between OS-driven and firmware driven freq. selection
>>>> is not specific to CPPC (for the future).
>>>> However I am not really able to say the implications of doing that.
>>>>
>>>> ------------
>>>>
>>>> I think it would be better to split your patchset in 2:
>>>> 1. adding APIs for the CPPC spec.
>>>> 2. using the APIs, especially for auto_sel
>>>>
>>>> 1. is likely to be straightforward as the APIs will still be used
>>>> by the driver at some point.
>>>> 2. is likely to bring more discussion.
>>>>
>>> We discussed adding a hw_auto_sel governor as a second step, though the
>>> approach may need refinement during implementation.
>> I didn't find in the thread adding a new governor was discussed in the
>> threads, in case you have a direct link.
>>
>>> Deferred it (to second step) because adding a new governor requires
>>> broader discussion.
>>>
>>> This issue already exists in current code - store_auto_select() enables
>>> auto_sel without any governor awareness. These patches improve the
>>> situation by:
>>> - Updating scaling_min/max_freq when toggling auto_sel mode
>>> - Syncing policy limits with actual HW min/max_perf bounds
>>> - Making scaling_min/max_freq read-only in auto_sel mode
>>>
>>> Would it be acceptable to merge this as a first step, with the governor
>>> handling as a follow-up?
>>> If not and you prefer splitting, which grouping works better:
>>> A) Patches 1-8 then 9-11.
>>> B) "ACPI: CPPC *" patches then "cpufreq: CPPC *" patches.
>>>
>> If it's possible I would like to understand what the end result should
>> look like. If ultimately enabling auto_sel implies switching governor
>> I understand, but I didn't find the thread that discussed about that
>> unfortunately.
>>
>>
>>>>> Enforce this by setting policy limits to min/max_perf bounds in
>>>>> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
>>>>> to change performance limits in autonomous mode.
>>>>>
>>>>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>>>>> ---
>>>>> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
>>>>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>>>> index b1f570d6de34..b3da263c18b0 100644
>>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>>> @@ -305,7 +305,37 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>>>>>
>>>>> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
>>>>> {
>>>>> - cpufreq_verify_within_cpu_limits(policy);
>>>>> + unsigned int min_freq = policy->cpuinfo.min_freq;
>>>>> + unsigned int max_freq = policy->cpuinfo.max_freq;
>>>>> + struct cpufreq_policy *cpu_policy;
>>>>> + struct cppc_cpudata *cpu_data;
>>>>> + struct cppc_perf_caps *caps;
>>>>> +
>>>>> + cpu_policy = cpufreq_cpu_get(policy->cpu);
>>>>> + if (!cpu_policy)
>>>>> + return -ENODEV;
>>>>> +
>>>>> + cpu_data = cpu_policy->driver_data;
>>>>> + caps = &cpu_data->perf_caps;
>>>>> +
>>>>> + if (cpu_data->perf_ctrls.auto_sel) {
>>>>> + u32 min_perf, max_perf;
>>>>> +
>>>>> + /*
>>>>> + * Set policy limits to HW min/max_perf bounds. In autonomous
>>>>> + * mode, scaling_min/max_freq is effectively read-only.
>>>>> + */
>>>>> + min_perf = cpu_data->perf_ctrls.min_perf ?:
>>>>> + caps->lowest_nonlinear_perf;
>>>>> + max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;
>>>>> +
>>>>> + policy->min = cppc_perf_to_khz(caps, min_perf);
>>>>> + policy->max = cppc_perf_to_khz(caps, max_perf);
>>>> policy->min/max values are overwritten, but the governor which is
>>>> supposed to use them to select the most fitting frequency will be
>>>> ignored by the firmware I think.
>>>>
>>> Yes.
>>>
>>>>> + } else {
>>>>> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
>>>>> + }
>>>>> +
>>>>> + cpufreq_cpu_put(cpu_policy);
>>>>> return 0;
>>>>> }
>>>>>
On 1/15/26 16:22, Sumit Gupta wrote: > >>>> n 08/01/26 22:16, Pierre Gondois wrote: >>>>> External email: Use caution opening links or attachments >>>>> >>>>> >>>>> Hello Sumit, Lifeng, >>>>> >>>>> On 12/23/25 13:13, Sumit Gupta wrote: >>>>>> When autonomous selection (auto_sel) is enabled, the hardware >>>>>> controls >>>>>> performance within min_perf/max_perf register bounds making the >>>>>> scaling_min/max_freq effectively read-only. >>>>> If auto_sel is set, the governor associated to the policy will >>>>> have no >>>>> actual control. >>>>> >>>>> E.g.: >>>>> If the schedutil governor is used, attempts to set the >>>>> frequency based on CPU utilization will be periodically >>>>> sent, but they will have no effect. >>>>> >>>>> The same thing will happen for the ondemand, performance, >>>>> powersave, userspace, etc. governors. They can only work if >>>>> frequency requests are taken into account. >>>>> >>>>> ------------ >>>>> >>>>> This looks like the intel_pstate governor handling where it is >>>>> possible >>>>> not to have .target() or .target_index() callback and the hardware >>>>> is in >>>>> charge (IIUC). >>>>> For this case, only 2 governor seem available: performance and >>>>> powersave. >>>>> >> As you mentioned in [2], 'it still makes sense to have cpufreq >> requesting a >> certain performance level even though autonomous selection is >> enabled'. So I >> think it's OK to have a governor when auto_selection is enabled. >> Thanks for the pointer, I forgot the spec said said that the desired_perf register could still be used while autonomous selection is activated. This means that having an autonomous selection governor is not a good idea and that your implementation is correct. I'm asking some questions about this to someone who should know more. Would it still be possible to split you patch in two as you suggested in another message, i.e. PATCH[1-7] and PATCH[8-11] ? I don't have any additional comment for PATCH[1-7], but I might have some questions for PATCH[8-11]. This would give me a bit more time. >> [2] >> https://lore.kernel.org/all/9f46991d-98c3-41f5-8133-6612b397e33a@arm.com/ >> >>> Thanks for pointing me to the first version, I forgot how your >>> first implementation was. >>> >>> >>>> In v1 [1], I added a separate cppc_cpufreq_epp_driver instance without >>>> target*() hooks, using setpolicy() instead (similar to AMD pstate). >>>> However, this approach doesn't allow per-CPU control: if we boot >>>> with the >>>> EPP driver, we can't dynamically disable auto_sel for individual >>>> CPUs and >>>> return to OS governor control (no target hook available). AMD and >>>> Intel >>>> pstate drivers seem to set HW autonomous mode for all CPUs globally, >>>> not per-CPU. So, changed it in v2. >>>> [1] >>>> https://lore.kernel.org/lkml/20250211103737.447704-6-sumitg@nvidia.com/ >>>> >>>> >>> Ok right. >>> This is something I don't really understand in the current intel/amd >>> cpufreq >>> drivers. FWIU: >>> - the cpufreq drivers abstractions allow to access different hardware >>> - the governor abstraction allows to switch between different >>> algorithms >>> to select the 'correct' frequency. >>> >>> So IMO switching to autonomous selection should be done by switching >>> to another governor and the 'auto_sel' file should not be accessible >>> to users. The cpufreq driver / governor abstraction still seems a bit stretched for the CPPC case. Indeed, cpufreq uses: - .setpolicy() callback and CPUFREQ_POLICY_XXX policies to allow external scaling algorithms - .target() and .target_index() callbacks and scaling governors for internal scaling algorithms But if CPPC allows to use both internal/external algorithms at the same time, so for instance it is unknown what the 'performance' governor should be converted to: - using the the linux drivers/cpufreq/cpufreq_performance.c governor - setting the energy performance preference to the maximum perf. value with auto_sel=0 - setting the energy performance preference to the maximum perf. value with auto_sel=1 ------------ Right not selecting the 'performance' or 'powersave' policy gives a clear result for pstate as autonomous selection is alwasy enabled IIUC. It would be nice if we could switch between an 'autonomous_performance' and 'non_autonomous_performance' without manually toggling the auto_sel file.
On 2025/12/23 20:13, Sumit Gupta wrote:
> When autonomous selection (auto_sel) is enabled, the hardware controls
> performance within min_perf/max_perf register bounds making the
> scaling_min/max_freq effectively read-only.
>
> Enforce this by setting policy limits to min/max_perf bounds in
> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
> to change performance limits in autonomous mode.
>
> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
> ---
> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
> 1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index b1f570d6de34..b3da263c18b0 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -305,7 +305,37 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>
> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
> {
> - cpufreq_verify_within_cpu_limits(policy);
> + unsigned int min_freq = policy->cpuinfo.min_freq;
> + unsigned int max_freq = policy->cpuinfo.max_freq;
> + struct cpufreq_policy *cpu_policy;
> + struct cppc_cpudata *cpu_data;
> + struct cppc_perf_caps *caps;
> +
> + cpu_policy = cpufreq_cpu_get(policy->cpu);
Better to use:
struct cpufreq_policy *cpu_policy __free(put_cpufreq_policy) = cpufreq_cpu_get(policy->cpu);
> + if (!cpu_policy)
> + return -ENODEV;
> +
> + cpu_data = cpu_policy->driver_data;
> + caps = &cpu_data->perf_caps;
cpu_policy, cpu_data and cpas are only used in the if branch. Just put them
in it.
> +
> + if (cpu_data->perf_ctrls.auto_sel) {
> + u32 min_perf, max_perf;
> +
> + /*
> + * Set policy limits to HW min/max_perf bounds. In autonomous
> + * mode, scaling_min/max_freq is effectively read-only.
> + */
> + min_perf = cpu_data->perf_ctrls.min_perf ?:
> + caps->lowest_nonlinear_perf;
> + max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;
> +
> + policy->min = cppc_perf_to_khz(caps, min_perf);
> + policy->max = cppc_perf_to_khz(caps, max_perf);
> + } else {
> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
Why not still using cpufreq_verify_within_cpu_limits()?
> + }
> +
> + cpufreq_cpu_put(cpu_policy);
> return 0;
> }
>
On 26/12/25 08:56, zhenglifeng (A) wrote:
> External email: Use caution opening links or attachments
>
>
> On 2025/12/23 20:13, Sumit Gupta wrote:
>> When autonomous selection (auto_sel) is enabled, the hardware controls
>> performance within min_perf/max_perf register bounds making the
>> scaling_min/max_freq effectively read-only.
>>
>> Enforce this by setting policy limits to min/max_perf bounds in
>> cppc_verify_policy(). Users must use min_perf/max_perf sysfs interfaces
>> to change performance limits in autonomous mode.
>>
>> Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
>> ---
>> drivers/cpufreq/cppc_cpufreq.c | 32 +++++++++++++++++++++++++++++++-
>> 1 file changed, 31 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index b1f570d6de34..b3da263c18b0 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -305,7 +305,37 @@ static unsigned int cppc_cpufreq_fast_switch(struct cpufreq_policy *policy,
>>
>> static int cppc_verify_policy(struct cpufreq_policy_data *policy)
>> {
>> - cpufreq_verify_within_cpu_limits(policy);
>> + unsigned int min_freq = policy->cpuinfo.min_freq;
>> + unsigned int max_freq = policy->cpuinfo.max_freq;
>> + struct cpufreq_policy *cpu_policy;
>> + struct cppc_cpudata *cpu_data;
>> + struct cppc_perf_caps *caps;
>> +
>> + cpu_policy = cpufreq_cpu_get(policy->cpu);
> Better to use:
>
> struct cpufreq_policy *cpu_policy __free(put_cpufreq_policy) = cpufreq_cpu_get(policy->cpu);
Will use this in v6.
>> + if (!cpu_policy)
>> + return -ENODEV;
>> +
>> + cpu_data = cpu_policy->driver_data;
>> + caps = &cpu_data->perf_caps;
> cpu_policy, cpu_data and cpas are only used in the if branch. Just put them
> in it.
Can move caps inside the if branch.
cpu_policy and cpu_data can't be moved inside because we need
perf_ctrls.auto_sel to evaluate the condition itself.
>> +
>> + if (cpu_data->perf_ctrls.auto_sel) {
>> + u32 min_perf, max_perf;
>> +
>> + /*
>> + * Set policy limits to HW min/max_perf bounds. In autonomous
>> + * mode, scaling_min/max_freq is effectively read-only.
>> + */
>> + min_perf = cpu_data->perf_ctrls.min_perf ?:
>> + caps->lowest_nonlinear_perf;
>> + max_perf = cpu_data->perf_ctrls.max_perf ?: caps->nominal_perf;
>> +
>> + policy->min = cppc_perf_to_khz(caps, min_perf);
>> + policy->max = cppc_perf_to_khz(caps, max_perf);
>> + } else {
>> + cpufreq_verify_within_limits(policy, min_freq, max_freq);
> Why not still using cpufreq_verify_within_cpu_limits()?
Will change to use it in v6.
Thank you,
Sumit Gupta
>> + }
>> +
>> + cpufreq_cpu_put(cpu_policy);
>> return 0;
>> }
>>
© 2016 - 2026 Red Hat, Inc.