cpufreq: governor: Apply limits with target_freq instead of policy->cur

[PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Lifeng Zheng 1 month, 3 weeks ago

The motivation for this patchset came from a test on our platform:

With the conservative governor and some pressure on the CPU, the
frequency rapidly reaches the max supported frequency, such as 2GHz.

Later, some frequency division strategies on our platform were triggered
and the actual frequency became 500MHz -- 1/4 of the frequency set by
the OS.

At that point, if someone executes 'cat cpuinfo_cur_freq', the actual
frequency drops to 250MHz -- 1/4 of the min supported frequency.

After the platform recovers from the frequency division, the frequency
stays at 1GHz until the pressure disappears.

The reason this happens is that in cpufreq_verify_current_freq(), if
policy->cur != new_freq, policy->update is queued, which ultimately
leads to a call to cpufreq_policy_apply_limits(). Since the newly read
policy->cur (500MHz) is below policy->min (1GHz here), that call updates
the target frequency to policy->min. Then, in cs_dbs_update(), since the
pressure never vanishes, it always hits the following branch:

	if (load > dbs_data->up_threshold) {
		dbs_info->down_skip = 0;

		/* if we are already at full speed then break out early */
		if (requested_freq == policy->max)
			goto out;

Because the governor's cached requested_freq is still policy->max, no
new request is ever issued, so the target frequency always remains at
the lowest frequency.
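To make the sequence above concrete, here is a small userspace model in C. It is a sketch under stated assumptions, not kernel code: the struct, the helper names (model_apply_limits(), model_cs_high_load(), stuck_target_after_query()) and the 100MHz step are hypothetical, and the frequency table is treated as continuous. model_apply_limits() mirrors the comparison against policy->cur done by cpufreq_policy_apply_limits(), and model_cs_high_load() keeps only the early exit quoted above.

```c
#include <assert.h>

/* Hypothetical userspace model; all frequencies in MHz. */
struct model_policy {
	unsigned int min, max;
	unsigned int cur;	/* stands in for policy->cur */
	unsigned int target;	/* last frequency handed to the "driver" */
};

/* Retarget only if policy->cur is outside [min, max]. */
static void model_apply_limits(struct model_policy *p)
{
	assert(p->min <= p->max);
	if (p->max < p->cur)
		p->target = p->cur = p->max;
	else if (p->min > p->cur)
		p->target = p->cur = p->min;
}

/* The "load > up_threshold" path of cs_dbs_update(), heavily reduced. */
static void model_cs_high_load(struct model_policy *p,
			       unsigned int *requested_freq, unsigned int step)
{
	if (*requested_freq == p->max)
		return;		/* "already at full speed": nothing is requested */
	*requested_freq += step;
	if (*requested_freq > p->max)
		*requested_freq = p->max;
	p->target = p->cur = *requested_freq;
}

/* Target left behind after a query during frequency division. */
unsigned int stuck_target_after_query(unsigned int samples)
{
	struct model_policy p = { .min = 1000, .max = 2000,
				  .cur = 2000, .target = 2000 };
	unsigned int requested_freq = 2000;	/* governor already at max */

	p.cur = p.target / 4;	/* cpuinfo_cur_freq read sees 500 MHz */
	model_apply_limits(&p);	/* 500 < min, so the target drops to 1000 */

	for (unsigned int i = 0; i < samples; i++)
		model_cs_high_load(&p, &requested_freq, 100);

	return p.target;
}
```

However many high-load samples run, the modelled target stays at policy->min, because the cached requested_freq never leaves policy->max.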

The branching conditions in cs_dbs_update() may not be strict enough, but
the root cause of this problem is that the target frequency gets updated
when cpuinfo_cur_freq is queried. With the ondemand and schedutil
governors, the frequency does not stay at the lowest level indefinitely,
but it still drops to min_freq for a short period whenever the query
happens.

Using the frequency requested by the governor to decide whether to update
the target frequency in cpufreq_policy_apply_limits() is more reasonable.
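One possible reading of this proposal is sketched below as a userspace model. It is hypothetical and is not the code from the two patches, which are not shown here: the helper compares the limits against the governor's last request (standing in for requested_freq or next_freq) instead of policy->cur, so a resynced policy->cur alone no longer triggers a retarget, while a real limit change still does.

```c
#include <assert.h>

/* Hypothetical userspace model; all frequencies in MHz. */
struct req_policy {
	unsigned int min, max;
	unsigned int cur;		/* policy->cur, possibly resynced from hardware */
	unsigned int last_request;	/* the governor's own last decision */
	unsigned int target;
};

/* Clamp against the governor's last request rather than policy->cur. */
static void apply_limits_by_request(struct req_policy *p)
{
	assert(p->min <= p->max);
	if (p->max < p->last_request)
		p->target = p->last_request = p->max;
	else if (p->min > p->last_request)
		p->target = p->last_request = p->min;
	/* Otherwise the previous target stands, whatever p->cur says. */
}

/* The governor previously asked for 2000 MHz; limits are [1000, max]. */
unsigned int target_after(unsigned int observed_cur, unsigned int max)
{
	struct req_policy p = { .min = 1000, .max = max, .cur = 2000,
				.last_request = 2000, .target = 2000 };

	p.cur = observed_cur;		/* e.g. after a cpuinfo_cur_freq read */
	apply_limits_by_request(&p);
	return p.target;
}
```

In this model, target_after(500, 2000) keeps the 2000MHz target despite the 500MHz reading, while target_after(2000, 1500) still honours a lowered policy->max.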

Lifeng Zheng (2):
  cpufreq: governor: Move requested_freq to policy_dbs_info
  cpufreq: governor: Apply limits with requested_freq or next_freq

 drivers/cpufreq/cpufreq_conservative.c | 14 ++++----------
 drivers/cpufreq/cpufreq_governor.c     |  3 ++-
 drivers/cpufreq/cpufreq_governor.h     | 12 ++++++++++++
 drivers/cpufreq/cpufreq_ondemand.c     | 10 +++++-----
 include/linux/cpufreq.h                |  7 ++++---
 kernel/sched/cpufreq_schedutil.c       |  4 ++--
 6 files changed, 29 insertions(+), 21 deletions(-)

-- 
2.33.0

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Viresh Kumar 3 weeks, 3 days ago

On 10-02-26, 19:54, Lifeng Zheng wrote:
> The motivation for this patchset came from a test on our platform:
>
> With the conservative governor and some pressure on the CPU, the
> frequency rapidly reaches the max supported frequency, such as 2GHz.
>
> Later, some frequency division strategies on our platform were triggered
> and the actual frequency became 500MHz -- 1/4 of the frequency set by
> the OS.
>
> At that point, if someone executes 'cat cpuinfo_cur_freq', the actual
> frequency drops to 250MHz -- 1/4 of the min supported frequency.
>
> After the platform recovers from the frequency division, the frequency
> stays at 1GHz until the pressure disappears.
>
> The reason this happens is that in cpufreq_verify_current_freq(), if
> policy->cur != new_freq, policy->update is queued, which ultimately
> leads to a call to cpufreq_policy_apply_limits(). Since the newly read
> policy->cur (500MHz) is below policy->min (1GHz here), that call updates
> the target frequency to policy->min. Then, in cs_dbs_update(), since the
> pressure never vanishes, it always hits the following branch:
> 
> 	if (load > dbs_data->up_threshold) {
> 		dbs_info->down_skip = 0;
> 
> 		/* if we are already at full speed then break out early */
> 		if (requested_freq == policy->max)
> 			goto out;
> 
> Because the governor's cached requested_freq is still policy->max, no
> new request is ever issued, so the target frequency always remains at
> the lowest frequency.
>
> The branching conditions in cs_dbs_update() may not be strict enough, but
> the root cause of this problem is that the target frequency gets updated
> when cpuinfo_cur_freq is queried. With the ondemand and schedutil
> governors, the frequency does not stay at the lowest level indefinitely,
> but it still drops to min_freq for a short period whenever the query
> happens.
>
> Using the frequency requested by the governor to decide whether to update
> the target frequency in cpufreq_policy_apply_limits() is more reasonable.

I think I understand the problem now. We are tracking the current
frequency state via two cached values, policy->cur and requested_freq,
and a mismatch between them (because of your hardware-specific
quirks/features) is making things tricky.

Rafael, can you think of anything this would break?

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index e0e847764511..c69577e4f941 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -14,7 +14,6 @@
 struct cs_policy_dbs_info {
        struct policy_dbs_info policy_dbs;
        unsigned int down_skip;
-       unsigned int requested_freq;
 };

 static inline struct cs_policy_dbs_info *to_dbs_info(struct policy_dbs_info *policy_dbs)
@@ -59,10 +58,10 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
 {
        struct policy_dbs_info *policy_dbs = policy->governor_data;
        struct cs_policy_dbs_info *dbs_info = to_dbs_info(policy_dbs);
-       unsigned int requested_freq = dbs_info->requested_freq;
        struct dbs_data *dbs_data = policy_dbs->dbs_data;
        struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
        unsigned int load = dbs_update(policy);
+       unsigned int requested_freq = policy->cur;
        unsigned int freq_step;

        /*
@@ -72,16 +71,6 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
        if (cs_tuners->freq_step == 0)
                goto out;

-       /*
-        * If requested_freq is out of range, it is likely that the limits
-        * changed in the meantime, so fall back to current frequency in that
-        * case.
-        */
-       if (requested_freq > policy->max || requested_freq < policy->min) {
-               requested_freq = policy->cur;
-               dbs_info->requested_freq = requested_freq;
-       }
-
        freq_step = get_freq_step(cs_tuners, policy);

        /*
@@ -113,7 +102,6 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)

                __cpufreq_driver_target(policy, requested_freq,
                                        CPUFREQ_RELATION_HE);
-               dbs_info->requested_freq = requested_freq;
                goto out;
        }

@@ -137,7 +125,6 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)

                __cpufreq_driver_target(policy, requested_freq,
                                        CPUFREQ_RELATION_LE);
-               dbs_info->requested_freq = requested_freq;
        }

  out:
@@ -310,7 +297,6 @@ static void cs_start(struct cpufreq_policy *policy)
        struct cs_policy_dbs_info *dbs_info = to_dbs_info(policy->governor_data);

        dbs_info->down_skip = 0;
-       dbs_info->requested_freq = policy->cur;
 }

 static struct dbs_governor cs_governor = {

-------------------------8<-------------------------

This always picks the next freq based on policy->cur instead of the
real last request. The two can differ if:
- the hardware plays with the current frequency, as is the case here, or
- the limits change and that changes the current frequency (in which
  case we will be at policy->min/max anyway).

-- 
viresh

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Rafael J. Wysocki 3 weeks, 3 days ago

On Mon, Mar 9, 2026 at 9:27 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> I think I understand the problem now. We are tracking the current
> frequency state via two cached values, policy->cur and requested_freq,
> and a mismatch between them (because of your hardware-specific
> quirks/features) is making things tricky.
>
> Rafael, can you think of anything this would break?

I can't recall, but the new code is simpler, so unless anyone has a
particular heartburn with it, go for it.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by zhenglifeng (A) 3 weeks, 2 days ago

On 3/9/2026 8:32 PM, Rafael J. Wysocki wrote:
> On Mon, Mar 9, 2026 at 9:27 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> I think I understand the problem now. We are tracking the current
>> frequency state via two cached values, policy->cur and requested_freq,
>> and a mismatch between them (because of your hardware-specific
>> quirks/features) is making things tricky.
>>
>> Rafael, can you think of anything this would break?
> 
> I can't recall, but the new code is simpler, so unless anyone has a
> particular heartburn with it, go for it.

Hi Viresh,

Sorry, I don't think the new code fully solves the problem I hit.

In the example I showed above, policy->cur will become 1GHz when someone
queries cpuinfo_cur_freq, because the real freq is 500MHz. After that,
the conservative governor has to take some time to get back to the max
freq step by step in your code. This means the real freq will fluctuate
whenever cpuinfo_cur_freq is queried. The same happens with the ondemand
and schedutil governors, but they don't need to increase the frequency
step by step.
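The cost described here can be estimated with a toy model of the proposed cs_dbs_update(), in which every sample starts from policy->cur. The model assumes (hypothetically) that the load stays above up_threshold, that the driver honours every request, and that the step is 5% of policy->max, taken here as the default freq_step; samples_to_reach_max() is an illustrative helper, not kernel code.

```c
#include <assert.h>

/*
 * Hypothetical model of the proposed cs_dbs_update() under sustained high
 * load: every sample starts from policy->cur and adds one freq_step.
 * Frequencies in MHz; the driver is assumed to honour every request.
 */
unsigned int samples_to_reach_max(unsigned int cur, unsigned int max,
				  unsigned int freq_step)
{
	unsigned int samples = 0;

	assert(freq_step > 0);
	while (cur < max) {
		unsigned int requested_freq = cur;	/* requested_freq = policy->cur */

		requested_freq += freq_step;
		if (requested_freq > max)
			requested_freq = max;
		cur = requested_freq;	/* policy->cur follows the new target */
		samples++;
	}
	return samples;
}
```

With policy->max at 2GHz, a 100MHz step and policy->cur reset to 1GHz by a query, the model needs 10 sampling periods to get back to the maximum; on real hardware the figure also depends on the sampling rate and the frequency table.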

I still believe using policy->cur to update the target in
cpufreq_policy_apply_limits() is not very reasonable, because the OS never
knows what the platform will do to affect the final frequency. The OS
should determine the new target frequency based on its previous decision.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Viresh Kumar 3 weeks, 2 days ago

On 10-03-26, 10:22, zhenglifeng (A) wrote:
> In the example I showed above, policy->cur will become 1GHz when someone
> queries cpuinfo_cur_freq, because the real freq is 500MHz. After that,
> the conservative governor has to take some time to get back to the max
> freq step by step in your code.

The problem is that your platform is doing some tricky stuff (by
changing the frequency for some reason) and it expects the governor to
still work the same. The software is trying to reach a target frequency
based on an algorithm and you are ending up breaking it. If this is
software controlled (instead of hardware), I think a better solution
is to play with QoS requests and limit the policy->max frequency, so
the cpufreq core remains in sync, instead of reducing the underlying
frequency without informing the software. All the cpufreq core can do
now is adjust based on what you are doing. Yes, the conservative
governor will start again from scratch here, but that is effectively
what you are doing anyway. Either don't change the underlying freq or
use a different governor. I don't know how this can be fixed otherwise.

> I still believe using policy->cur to update the target in
> cpufreq_policy_apply_limits() is not very reasonable, because the OS never
> knows what the platform will do to affect the final frequency.

The software is doing the right thing by using the cached current
frequency (best known to it).

We use policy->cur in too many places, and we will not be able to
change all of them to assume the hardware is not in sync with the
request made earlier. The change you propose may fix your issue for
the time being, but it is just a band-aid which I don't want to apply.
When the hardware changes the underlying freq, far too many things
break elsewhere too.

-- 
viresh

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by zhenglifeng (A) 3 weeks, 1 day ago

On 3/10/2026 12:29 PM, Viresh Kumar wrote:
> On 10-03-26, 10:22, zhenglifeng (A) wrote:
>> In the example I showed above, policy->cur will become 1GHz when someone
>> queries cpuinfo_cur_freq, because the real freq is 500MHz. After that,
>> the conservative governor has to take some time to get back to the max
>> freq step by step in your code.
> 
> The problem is that your platform is doing some tricky stuff (by
> changing the frequency for some reason) and it expects the governor to
> still work the same. The software is trying to reach a target frequency
> based on an algorithm and you are ending up breaking it. If this is
> software controlled (instead of hardware), I think a better solution
> is to play with QoS requests and limit the policy->max frequency, so
> the cpufreq core remains in sync, instead of reducing the underlying
> frequency without informing the software. All the cpufreq core can do
> now is adjust based on what you are doing. Yes, the conservative
> governor will start again from scratch here, but that is effectively
> what you are doing anyway. Either don't change the underlying freq or
> use a different governor. I don't know how this can be fixed otherwise.
> 
>> I still believe using policy->cur to update the target in
>> cpufreq_policy_apply_limits() is not very reasonable, because the OS never
>> knows what the platform will do to affect the final frequency.
> 
> The software is doing the right thing by using the cached current
> frequency (best known to it).
> 
> We use policy->cur in too many places, and we will not be able to
> change all of them to assume the hardware is not in sync with the
> request made earlier. The change you propose may fix your issue for
> the time being, but it is just a band-aid which I don't want to apply.
> When the hardware changes the underlying freq, far too many things
> break elsewhere too.
> 

I basically agree with what you said, but I have a small question. Do
you think the target freq should be updated when cpuinfo_cur_freq is
queried? I think it should only be updated by the governor or when the
limits change.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Viresh Kumar 3 weeks ago

On 10-03-26, 21:11, zhenglifeng (A) wrote:
> I basically agree with what you said, but I have a small question. Do
> you think the target freq should be updated when cpuinfo_cur_freq is
> queried? I think it should only be updated by the governor or when the
> limits change.

I am not sure I understand the question. Where are we updating
target-freq? As far as cpuinfo_cur_freq goes, there is no need to
make any changes when it is queried, but:
- The cpufreq core caches the currently known freq as policy->cur and
  uses it everywhere.
- When cpuinfo_cur_freq is queried, we go and check with the hardware
  what the frequency is.
- If a difference is found, we align policy->cur with it, given it
  is still within policy limits. That's all.
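The three steps above can be modelled in userspace as follows. This is a simplified, hypothetical sketch with invented names and frequencies in kHz; as I understand it, the real path runs through cpufreq_verify_current_freq() and the queued policy->update work, which eventually re-applies the limits against the new policy->cur.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the cached state; frequencies in kHz. */
struct cached_policy {
	unsigned int min, max;
	unsigned int cur;	/* policy->cur: last frequency known to the core */
	bool update_queued;	/* stands in for the queued policy->update work */
};

/* Reading cpuinfo_cur_freq: ask the "hardware" and resync the cache. */
static unsigned int read_cpuinfo_cur_freq(struct cached_policy *p,
					  unsigned int hw_khz)
{
	assert(hw_khz != 0);
	if (p->cur != hw_khz) {
		p->cur = hw_khz;	/* align policy->cur with the hardware */
		p->update_queued = true;
	}
	return hw_khz;
}

/* The queued update: re-apply the limits, comparing against policy->cur. */
static void run_queued_update(struct cached_policy *p)
{
	if (!p->update_queued)
		return;
	p->update_queued = false;
	if (p->cur > p->max)
		p->cur = p->max;	/* retarget to max */
	else if (p->cur < p->min)
		p->cur = p->min;	/* retarget to min */
}

/* Cached value after one query that observed hw_khz. */
unsigned int cur_after_query(unsigned int cached_khz, unsigned int hw_khz)
{
	struct cached_policy p = { .min = 1000000, .max = 2000000,
				   .cur = cached_khz, .update_queued = false };

	read_cpuinfo_cur_freq(&p, hw_khz);
	run_queued_update(&p);
	return p.cur;
}
```

A reading inside the limits (1.5GHz) is simply adopted as policy->cur, while a reading below them (500MHz) ends with a retarget to policy->min, which is the case discussed in this thread.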

-- 
viresh

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by zhenglifeng (A) 2 weeks, 6 days ago

On 3/12/2026 2:16 PM, Viresh Kumar wrote:
> On 10-03-26, 21:11, zhenglifeng (A) wrote:
>> I basically agree with what you said, but I have a small question. Do
>> you think the target freq should be updated when cpuinfo_cur_freq is
>> queried? I think it should only be updated by the governor or when the
>> limits change.
> 
> I am not sure I understand the question. Where are we updating
> target-freq? As far as cpuinfo_cur_freq goes, there is no need to
> make any changes when it is queried, but:
> - The cpufreq core caches the currently known freq as policy->cur and
>   uses it everywhere.
> - When cpuinfo_cur_freq is queried, we go and check with the hardware
>   what the frequency is.
> - If a difference is found, we align policy->cur with it, given it
>   is still within policy limits. That's all.
> 

When querying cpuinfo_cur_freq, based on the following call chain, the
target frequency will eventually be updated if policy->cur is out of
limits:

show_cpuinfo_cur_freq()
  __cpufreq_get()
    cpufreq_verify_current_freq()
      handle_update()
        refresh_frequency_limits()
          cpufreq_set_policy()
            cpufreq_governor_limits()
              cpufreq_dbs_governor_limits()
                cpufreq_policy_apply_limits()
                  __cpufreq_driver_target()

At least to me, it's strange that querying the frequency would cause the
target frequency to be updated.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Viresh Kumar 2 weeks, 6 days ago

On 13-03-26, 11:51, zhenglifeng (A) wrote:
> When querying cpuinfo_cur_freq, based on the following call chain, the
> target frequency will eventually be updated if policy->cur is out of
> limits:
> 
> show_cpuinfo_cur_freq()
>   __cpufreq_get()
>     cpufreq_verify_current_freq()
>       handle_update()
>         refresh_frequency_limits()
>           cpufreq_set_policy()
>             cpufreq_governor_limits()
>               cpufreq_dbs_governor_limits()
>                 cpufreq_policy_apply_limits()
>                   __cpufreq_driver_target()
> 
> At least to me, it's strange that querying the frequency would cause the
> target frequency to be updated.

Yeah, that is an optimization and not a feature. Normally checking the frequency
shouldn't lead to anything, but the cpufreq core needs to fix itself, and it
actually assumes this won't happen. To be honest, the hardware (yours) doing
something secretly isn't right and breaks the design. One way to fix all this
mess is to make your platform stop doing this and play with policy->min/max
like everyone else, instead of making a hard change secretly.

-- 
viresh

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by zhenglifeng (A) 2 weeks, 6 days ago

On 3/13/2026 2:05 PM, Viresh Kumar wrote:
> On 13-03-26, 11:51, zhenglifeng (A) wrote:
>> When querying cpuinfo_cur_freq, based on the following call chain, the
>> target frequency will eventually be updated if policy->cur is out of
>> limits:
>>
>> show_cpuinfo_cur_freq()
>>   __cpufreq_get()
>>     cpufreq_verify_current_freq()
>>       handle_update()
>>         refresh_frequency_limits()
>>           cpufreq_set_policy()
>>             cpufreq_governor_limits()
>>               cpufreq_dbs_governor_limits()
>>                 cpufreq_policy_apply_limits()
>>                   __cpufreq_driver_target()
>>
>> At least to me, it's strange that querying the frequency would cause the
>> target frequency to be updated.
> 
> Yeah, that is an optimization and not a feature. Normally checking the frequency
> shouldn't lead to anything, but the cpufreq core needs to fix itself, and it
> actually assumes this won't happen. To be honest, the hardware (yours) doing
> something secretly isn't right and breaks the design. One way to fix all this
> mess is to make your platform stop doing this and play with policy->min/max
> like everyone else, instead of making a hard change secretly.
> 

Let's forget what happens on our platform. The current discussion is no
longer about applying this patchset. What I really want to discuss are two
questions:

1. Is the platform allowed to adjust to frequencies out of the limits?

My answer is YES. policy->min and policy->max can be modified by the user.
policy->min can even be increased to be as large as policy->max. However,
due to some other limitations (temperature, power, current, etc.), the
platform sometimes has to (secretly, as you say) reduce the actual
frequency, inevitably falling below the OS's frequency limits. I believe
this situation does not occur only on our platform. The OS should not treat
it as an anomaly and end up with unpredictable behavior. The OS
should have the ability to self-regulate. I believe that your patch is for
this purpose.

2. Is it reasonable to update the target frequency when querying
cpuinfo_cur_freq?

My answer is NO. The idea that "observation can affect the outcome" should
only apply in quantum mechanics. If querying cpuinfo_cur_freq results in an
update of the target frequency, it means that the target frequency should
indeed be updated, regardless of the query itself. So I think the best
approach is to handle this properly in gov_dbs_update(). But I am not sure
whether obtaining the actual frequency in gov_dbs_update() would add too much overhead.
Perhaps this is why target freq updates are made when querying
cpuinfo_cur_freq now. But this does not mean that it is reasonable. It is
what you call a band-aid, I think.

Please let me know if you have a different opinion.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Viresh Kumar 2 weeks, 2 days ago

On 13-03-26, 18:12, zhenglifeng (A) wrote:
> Let's forget what happens on our platform. The current discussion is no
> longer about applying this patchset. What I really want to discuss are two
> questions:
> 
> 1. Is the platform allowed to adjust to frequencies out of the limits?
> 
> My answer is YES. policy->min and policy->max can be modified by the user.
> policy->min can even be increased to be as large as policy->max. However,
> due to some other limitations (temperature, power, current, etc.), the
> platform sometimes has to (secretly, as you say) reduce the actual
> frequency, inevitably falling below the OS's frequency limits.

It is fine if you want to do that, just that you should use the OS to make
that change.
- Update the freq table to contain all valid frequencies (1/4 of min in your
  case).
- Update policy->min to the current min, so you don't go to those low
  frequencies.
- Use freq QoS or sysfs to change min/max when you want to go really low,
  like 1/4 of min.
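For the sysfs route from userspace, a minimal sketch could write the scaling_min_freq/scaling_max_freq attributes of a policy directory such as /sys/devices/system/cpu/cpufreq/policy0 (values in kHz). On a real system this needs root, and the value must be one the driver accepts. The helper names are hypothetical; the in-kernel alternative would go through the freq QoS API (for example freq_qos_update_request()), which is not shown here.

```c
#include <stdio.h>

/* Build "<policy_dir>/<attr>"; returns 0 on success, -1 if it does not fit. */
static int attr_path(char *buf, size_t len, const char *policy_dir,
		     const char *attr)
{
	int n = snprintf(buf, len, "%s/%s", policy_dir, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a limit in kHz, e.g. attr = "scaling_max_freq". 0 on success. */
int write_freq_limit(const char *policy_dir, const char *attr, unsigned int khz)
{
	char path[512];
	FILE *f;
	int ret = 0;

	if (attr_path(path, sizeof(path), policy_dir, attr))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%u\n", khz) < 0)
		ret = -1;
	/* stdio buffers the write, so a rejected value surfaces on close */
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/* Read a frequency attribute in kHz (e.g. "scaling_cur_freq"); -1 on error. */
long read_freq_khz(const char *policy_dir, const char *attr)
{
	char path[512];
	unsigned int khz;
	FILE *f;
	int ok;

	if (attr_path(path, sizeof(path), policy_dir, attr))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%u", &khz) == 1;
	fclose(f);
	return ok ? (long)khz : -1;
}
```

For example, write_freq_limit("/sys/devices/system/cpu/cpufreq/policy0", "scaling_max_freq", 250000) would cap policy0 at 250MHz (1/4 of the 1GHz min in this example), assuming the frequency table has been extended to include that frequency.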

> I believe
> this situation does not occur only on our platform. The OS should not treat
> it as an anomaly and end up with unpredictable behavior. The OS
> should have the ability to self-regulate. I believe that your patch is for
> this purpose.

When the OS is kept in the dark about such a change, it needs to figure out a
way to get back. That's all we are trying to do here.

> 2. Is it reasonable to update the target frequency when querying
> cpuinfo_cur_freq?
> 
> My answer is NO. The idea that "observation can affect the outcome" should
> only apply in quantum mechanics. If querying cpuinfo_cur_freq results in an
> update of the target frequency, then that update is needed anyway and
> should happen regardless of the query itself. So I think the best approach
> is to handle this properly in gov_dbs_update(). But I am not sure whether
> reading the actual frequency in gov_dbs_update() would add too much
> overhead.

Exactly the point. We have chosen to do this only when the cpufreq core gets to
know the real frequency. Reading the current frequency is a time-consuming
process and we don't want to do it unnecessarily.

> Perhaps this is why the target frequency is currently updated when
> cpuinfo_cur_freq is queried. But that does not make it reasonable. It is
> what you would call a band-aid, I think.

Of course it is a band-aid. No doubt about that. It may not be the best way out,
but it is how it is right now.

-- 
viresh

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by zhenglifeng (A) 2 weeks, 2 days ago

On 3/17/2026 2:20 PM, Viresh Kumar wrote:
> On 13-03-26, 18:12, zhenglifeng (A) wrote:
>> Let's forget what happens on our platform. The current discussion is no
>> longer about applying this patchset. What I really want to discuss are two
>> questions:
>>
>> 1. Is the platform allowed to set frequencies outside the limits?
>>
>> My answer is YES. policy->min and policy->max can be modified by the user.
>> policy->min can even be increased to be as large as policy->max. However,
>> due to some other limitations (temperature, power, current, etc.), the
>> platform sometimes has to (secretly as you say) reduce the actual
>> frequency, inevitably falling below the OS's frequency limits.
> 
> It is fine if you want to do that, but you should use the OS to make that
> change:
> - Update the freq table to contain all valid frequencies (1/4 of min in
>   your case).
> - Update policy->min to the current min, so you don't go to those low
>   frequencies.
> - Use freq QoS or sysfs to change min/max when you want to go really low,
>   like 1/4 of min.
> 
>> I believe
>> this situation will occur not only on our platform. The OS should not treat
>> this situation as an anomaly and end up with unpredictable behavior. The OS
>> should have the ability to self-regulate. I believe that your patch is for
>> this purpose.
> 
> When the OS is kept in the dark about such a change, it needs to figure out a
> way to get back. That's all we are trying to do here.

I see. Then I think the key question is how to let the OS know when the
hardware changes the frequency. It's worth thinking about.

> 
>> 2. Is it reasonable to update the target frequency when querying
>> cpuinfo_cur_freq?
>>
>> My answer is NO. The idea that "observation can affect the outcome" should
>> only apply in quantum mechanics. If querying cpuinfo_cur_freq results in an
>> update of the target frequency, then that update is needed anyway and
>> should happen regardless of the query itself. So I think the best approach
>> is to handle this properly in gov_dbs_update(). But I am not sure whether
>> reading the actual frequency in gov_dbs_update() would add too much
>> overhead.
> 
> Exactly the point. We have chosen to do this only when the cpufreq core gets to
> know the real frequency. Reading the current frequency is a time-consuming
> process and we don't want to do it unnecessarily.
> 
>> Perhaps this is why the target frequency is currently updated when
>> cpuinfo_cur_freq is queried. But that does not make it reasonable. It is
>> what you would call a band-aid, I think.
> 
> Of course it is a band-aid. No doubt about that. It may not be the best way out,
> but it is how it is right now.
> 

Thanks for the answer.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Rafael J. Wysocki 2 weeks, 2 days ago

On Tue, Mar 17, 2026 at 9:26 AM zhenglifeng (A) <zhenglifeng1@huawei.com> wrote:
>
> On 3/17/2026 2:20 PM, Viresh Kumar wrote:
> > On 13-03-26, 18:12, zhenglifeng (A) wrote:
> >> Let's forget what happens on our platform. The current discussion is no
> >> longer about applying this patchset. What I really want to discuss are two
> >> questions:
> >>
> >> 1. Is the platform allowed to set frequencies outside the limits?
> >>
> >> My answer is YES. policy->min and policy->max can be modified by the user.
> >> policy->min can even be increased to be as large as policy->max. However,
> >> due to some other limitations (temperature, power, current, etc.), the
> >> platform sometimes has to (secretly as you say) reduce the actual
> >> frequency, inevitably falling below the OS's frequency limits.
> >
> > It is fine if you want to do that, but you should use the OS to make that
> > change:
> > - Update the freq table to contain all valid frequencies (1/4 of min in
> >   your case).
> > - Update policy->min to the current min, so you don't go to those low
> >   frequencies.
> > - Use freq QoS or sysfs to change min/max when you want to go really low,
> >   like 1/4 of min.
> >
> >> I believe
> >> this situation will occur not only on our platform. The OS should not treat
> >> this situation as an anomaly and end up with unpredictable behavior. The OS
> >> should have the ability to self-regulate. I believe that your patch is for
> >> this purpose.
> >
> > When the OS is kept in the dark about such a change, it needs to figure out a
> > way to get back. That's all we are trying to do here.
>
> I see. Then I think the key question is how to let the OS know when the
> hardware changes the frequency. It's worth thinking about.

An interrupt needs to be triggered, and then it depends.  On systems with
ACPI, for example, the OS will get a notification on the processor object.

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Jie Zhan 4 weeks ago

On 2/10/2026 7:54 PM, Lifeng Zheng wrote:
> The motivation for this patchset came from a test on our platform:
>
> With the conservative governor and some pressure on the CPU, the frequency
> rapidly reaches the max supported frequency, such as 2GHz.
>
> Later, some frequency division strategies on our platform were triggered
> and the actual frequency became 500MHz -- 1/4 of the frequency set by the
> OS.
>
> At that point, if someone executes 'cat cpuinfo_cur_freq', the actual
> frequency becomes 250MHz -- 1/4 of the min supported frequency.
>
> After the platform recovers from the frequency division, the frequency
> stays at 1GHz until the pressure disappears.
>
> The reason this happens is that in cpufreq_verify_current_freq(), if
> policy->cur != new_freq, policy->update is queued, which ultimately leads
> to a call to cpufreq_policy_apply_limits() that updates the target
> frequency to policy->min. Then in cs_dbs_update(), since the pressure
> never goes away, it always hits the following branch:
> 
> 	if (load > dbs_data->up_threshold) {
> 		dbs_info->down_skip = 0;
> 
> 		/* if we are already at full speed then break out early */
> 		if (requested_freq == policy->max)
> 			goto out;
> 
> Therefore, the target frequency will always remain at the lowest frequency.

I feel like this is a common issue: the governor should do some special
handling when the frequency limits change, i.e. when governor->limits()
gets called.  See 'limits_changed' or 'need_freq_update' in the schedutil
governor.

Do you think it's reasonable to set such a flag in 'cpufreq_policy' when
its limits change, so that any governor can use it in its own code?

Jie
> 
> The branching conditions in cs_dbs_update() may not be strict enough, but
> the root cause of this problem is that the target frequency is updated
> when cpuinfo_cur_freq is queried. With the ondemand and schedutil
> governors, the frequency does not stay at the lowest level indefinitely,
> but it still drops to min_freq for a short period whenever such a query
> occurs.
>
> In cpufreq_policy_apply_limits(), it is more reasonable to use the
> frequency requested by the governor to decide whether to update the
> target frequency.
> 
> Lifeng Zheng (2):
>   cpufreq: governor: Move requested_freq to policy_dbs_info
>   cpufreq: governor: Apply limits with requested_freq or next_freq
> 
>  drivers/cpufreq/cpufreq_conservative.c | 14 ++++----------
>  drivers/cpufreq/cpufreq_governor.c     |  3 ++-
>  drivers/cpufreq/cpufreq_governor.h     | 12 ++++++++++++
>  drivers/cpufreq/cpufreq_ondemand.c     | 10 +++++-----
>  include/linux/cpufreq.h                |  7 ++++---
>  kernel/sched/cpufreq_schedutil.c       |  4 ++--
>  6 files changed, 29 insertions(+), 21 deletions(-)
>
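
The sequence in the cover letter can be reproduced with a small user-space model of the conservative governor's "load above up_threshold" branch. The struct and function names are hypothetical. The idle-period and down-threshold logic of cs_dbs_update() is omitted, and frequencies are in kHz. The model assumes the driver reports each new target as policy->cur. It also includes a fallback to policy->cur when requested_freq is outside [min, max], which is an assumption modelled on the current cs_dbs_update().

```c
/* Hypothetical, simplified stand-ins (all frequencies in kHz). */
struct sim_policy {
	unsigned int min, max, cur, target;
};

struct sim_cs {
	unsigned int requested_freq; /* last frequency the governor asked for */
	unsigned int up_threshold;   /* load percentage */
	unsigned int freq_step;
};

static void sim_set_target(struct sim_policy *p, unsigned int khz)
{
	p->target = khz;
	p->cur = khz; /* assume the driver reports the new target as cur */
}

/* Clamp decided from policy->cur, as described for cpufreq_policy_apply_limits(). */
static void sim_apply_limits(struct sim_policy *p)
{
	if (p->max < p->cur)
		sim_set_target(p, p->max);
	else if (p->min > p->cur)
		sim_set_target(p, p->min);
}

/* The frequency-increase path of cs_dbs_update(), simplified. */
static void sim_cs_update(struct sim_policy *p, struct sim_cs *cs,
			  unsigned int load)
{
	unsigned int req = cs->requested_freq;

	/* An out-of-range request falls back to the current frequency. */
	if (req > p->max || req < p->min)
		req = cs->requested_freq = p->cur;

	if (load <= cs->up_threshold)
		return;

	/* "already at full speed": requested_freq still says max */
	if (req == p->max)
		return;

	req += cs->freq_step;
	if (req > p->max)
		req = p->max;
	sim_set_target(p, req);
	cs->requested_freq = req;
}
```

After the query drags the target down to policy->min, requested_freq is still policy->max, which lies inside [min, max]. The out-of-range fallback therefore never fires, and the early exit keeps the target at policy->min for as long as the load stays high.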

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by zhenglifeng (A) 3 weeks, 6 days ago

On 3/5/2026 8:19 PM, Jie Zhan wrote:
> 
> On 2/10/2026 7:54 PM, Lifeng Zheng wrote:
>> The motivation for this patchset came from a test on our platform:
>>
>> With the conservative governor and some pressure on the CPU, the frequency
>> rapidly reaches the max supported frequency, such as 2GHz.
>>
>> Later, some frequency division strategies on our platform were triggered
>> and the actual frequency became 500MHz -- 1/4 of the frequency set by the
>> OS.
>>
>> At that point, if someone executes 'cat cpuinfo_cur_freq', the actual
>> frequency becomes 250MHz -- 1/4 of the min supported frequency.
>>
>> After the platform recovers from the frequency division, the frequency
>> stays at 1GHz until the pressure disappears.
>>
>> The reason this happens is that in cpufreq_verify_current_freq(), if
>> policy->cur != new_freq, policy->update is queued, which ultimately leads
>> to a call to cpufreq_policy_apply_limits() that updates the target
>> frequency to policy->min. Then in cs_dbs_update(), since the pressure
>> never goes away, it always hits the following branch:
>>
>> 	if (load > dbs_data->up_threshold) {
>> 		dbs_info->down_skip = 0;
>>
>> 		/* if we are already at full speed then break out early */
>> 		if (requested_freq == policy->max)
>> 			goto out;
>>
>> Therefore, the target frequency will always remain at the lowest frequency.
> I feel like this is a common issue: the governor should do some special
> handling when the frequency limits change, i.e. when governor->limits()
> gets called.  See 'limits_changed' or 'need_freq_update' in the schedutil
> governor.
>
> Do you think it's reasonable to set such a flag in 'cpufreq_policy' when
> its limits change, so that any governor can use it in its own code?

Yes. This is why I said 'The branching conditions in cs_dbs_update() may
not be strict enough'. However, using policy->cur to decide the target
frequency still seems unreasonable to me, so I think this patchset is still
meaningful.
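
The decision the series argues for can be contrasted with the current one in a few lines. This is a hypothetical user-space sketch, not the code from the patches. `sim_apply_limits_by_request()` takes the frequency the governor last requested instead of policy->cur. Going by the patch title, that would be requested_freq for the DBS governors and next_freq for schedutil. Frequencies are in kHz.

```c
/* Hypothetical, simplified stand-in (all frequencies in kHz). */
struct sim_policy {
	unsigned int min, max, cur, target;
};

/* Shared clamp: retarget only if `freq` lies outside [min, max]. */
static void sim_clamp_from(struct sim_policy *p, unsigned int freq)
{
	if (p->max < freq)
		p->target = p->max;
	else if (p->min > freq)
		p->target = p->min;
}

/* Current behaviour: decide from the last observed frequency. */
static void sim_apply_limits_by_cur(struct sim_policy *p)
{
	sim_clamp_from(p, p->cur);
}

/* Idea of this series: decide from the governor's own request. */
static void sim_apply_limits_by_request(struct sim_policy *p,
					unsigned int requested)
{
	sim_clamp_from(p, requested);
}
```

Consider the case where cur has been resynced to the throttled 500 MHz while the governor still asks for 2 GHz. The cur-based variant drags the target down to policy->min, while the request-based variant leaves it alone. A genuine limit change, such as lowering max, is still honoured by both.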

Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

Posted by Jie Zhan 3 weeks, 2 days ago


On 3/6/2026 5:21 PM, zhenglifeng (A) wrote:
> On 3/5/2026 8:19 PM, Jie Zhan wrote:
>>
>> On 2/10/2026 7:54 PM, Lifeng Zheng wrote:
>>> The motivation for this patchset came from a test on our platform:
>>>
>>> With the conservative governor and some pressure on the CPU, the frequency
>>> rapidly reaches the max supported frequency, such as 2GHz.
>>>
>>> Later, some frequency division strategies on our platform were triggered
>>> and the actual frequency became 500MHz -- 1/4 of the frequency set by the
>>> OS.
>>>
>>> At that point, if someone executes 'cat cpuinfo_cur_freq', the actual
>>> frequency becomes 250MHz -- 1/4 of the min supported frequency.
>>>
>>> After the platform recovers from the frequency division, the frequency
>>> stays at 1GHz until the pressure disappears.
>>>
>>> The reason this happens is that in cpufreq_verify_current_freq(), if
>>> policy->cur != new_freq, policy->update is queued, which ultimately leads
>>> to a call to cpufreq_policy_apply_limits() that updates the target
>>> frequency to policy->min. Then in cs_dbs_update(), since the pressure
>>> never goes away, it always hits the following branch:
>>>
>>> 	if (load > dbs_data->up_threshold) {
>>> 		dbs_info->down_skip = 0;
>>>
>>> 		/* if we are already at full speed then break out early */
>>> 		if (requested_freq == policy->max)
>>> 			goto out;
>>>
>>> Therefore, the target frequency will always remain at the lowest frequency.
>> I feel like this is a common issue: the governor should do some special
>> handling when the frequency limits change, i.e. when governor->limits()
>> gets called.  See 'limits_changed' or 'need_freq_update' in the schedutil
>> governor.
>>
>> Do you think it's reasonable to set such a flag in 'cpufreq_policy' when
>> its limits change, so that any governor can use it in its own code?
> 
> Yes. This is why I said 'The branching conditions in cs_dbs_update() may
> not be strict enough'. However, using policy->cur to decide the target
> frequency still seems unreasonable to me, so I think this patchset is still
> meaningful.
> 
OK, ignore this.  I was thinking of moving such a flag/variable into a
common struct shared by ondemand, conservative, and schedutil.  But this
is unrealistic at the moment, since the governor data of DBS (ondemand,
conservative) and schedutil are completely different.
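
For comparison, a flag confined to the DBS governors, which share struct policy_dbs_info, might look roughly like the sketch below. It is modelled loosely on schedutil's 'limits_changed'. Everything here is hypothetical. The flag is set from the governor's limits callback and consumed at the next update, where the cached request is resynced to policy->cur. Frequencies are in kHz.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins (all frequencies in kHz). */
struct sim_policy {
	unsigned int min, max, cur;
};

struct sim_dbs {
	unsigned int requested_freq;
	bool limits_changed; /* set by the limits callback, cleared on use */
};

/* Stand-in for the governor's ->limits() callback. */
static void sim_dbs_limits(struct sim_dbs *d)
{
	d->limits_changed = true;
}

/* Called at the start of a governor update to get the base request. */
static unsigned int sim_dbs_base_freq(const struct sim_policy *p,
				      struct sim_dbs *d)
{
	if (d->limits_changed) {
		d->limits_changed = false;
		d->requested_freq = p->cur;
	}
	return d->requested_freq;
}
```

Once requested_freq is resynced to the 1 GHz that the query-triggered update left behind, the conservative governor's early exit for requested_freq == policy->max no longer fires, so it can ramp up again. This addresses the symptom in cs_dbs_update() rather than the policy->cur question raised by the series.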
