Currently, there is no proper way to update the initial lower frequency
limit from cpufreq drivers. Only way is to add a new min_freq qos
request from the driver side, but it leads to the issue explained below.
The QoS infrastructure collates the constraints from multiple
subsystems and saves them in a plist. The "current value" is defined to
be the highest value in the plist for min_freq constraint.
The cpufreq core adds a qos_request for min_freq to be 0 and the amd-pstate
driver today adds qos request for min_freq to be lowest_freq, where
lowest_freq corresponds to CPPC.lowest_perf.
Eg: Suppose WLOG considering amd-pstate driver, lowest_freq is 400000 KHz,
lowest_non_linear_freq is 1200000 KHz.
At this point of time, the min_freq QoS plist looks like:
head--> 400000 KHz (registered by amd-pstate) --> 0 KHz (registered by
cpufreq core)
When a user updates /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq,
it only results in updating the cpufreq-core's node in the plist, where
say 0 becomes the newly echoed value.
Now, if the user echoes a value 1000000 KHz, to scaling_min_freq, then the
new list would be
head--> 1000000 KHz (registered by cpufreq core) --> 400000 KHz (registered
by amd-pstate)
and the new "current value" of the min_freq QoS constraint will be 1000000
KHz, this is the scenario where it works as expected.
Suppose we change the amd-pstate driver code's min_freq qos constraint
to lowest_non_linear_freq instead of lowest_freq, then the user will
never be able to request a value below that, due to the following:
At boot time, the min_freq QoS plist would be
head--> 1200000 KHz (registered by amd-pstate) --> 0 KHz (registered by
cpufreq core)
When the user echoes a value of 1000000 KHz, to
/sys/devices/..../scaling_min_freq, then the new list would be
head--> 1200000 KHz (registered by amd-pstate) --> 1000000 KHz (registered
by cpufreq core)
with the new "current value" of the min_freq QoS remaining 1200000 KHz.
Since the current value has not changed, there won't be any notifications
sent to the subsystems which have added their QoS constraints. In
particular, the amd-pstate driver will not get the notification, and thus,
the user's request to lower the scaling_min_freq will be ineffective.
Hence, it is advisable to have a single source of truth for the min and
max freq QoS constraints between the cpufreq and the cpufreq drivers.
So add a new callback get_init_min_freq() add in struct cpufreq_driver,
which allows amd-pstate (or any other cpufreq driver) to override the
default min_freq value being set in the policy->min_freq_req. Now
scaling_min_freq can be modified by the user to any value (lower or
higher than the init value) later on if desired.
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
---
drivers/cpufreq/cpufreq.c | 6 +++++-
include/linux/cpufreq.h | 6 ++++++
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index f98c9438760c..2923068cf5f4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1361,6 +1361,7 @@ static int cpufreq_online(unsigned int cpu)
bool new_policy;
unsigned long flags;
unsigned int j;
+ u32 init_min_freq = FREQ_QOS_MIN_DEFAULT_VALUE;
int ret;
pr_debug("%s: bringing CPU%u online\n", __func__, cpu);
@@ -1445,9 +1446,12 @@ static int cpufreq_online(unsigned int cpu)
goto out_destroy_policy;
}
+ if (cpufreq_driver->get_init_min_freq)
+ init_min_freq = cpufreq_driver->get_init_min_freq(policy);
+
ret = freq_qos_add_request(&policy->constraints,
policy->min_freq_req, FREQ_QOS_MIN,
- FREQ_QOS_MIN_DEFAULT_VALUE);
+ init_min_freq);
if (ret < 0) {
/*
* So we don't call freq_qos_remove_request() for an
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index e0e19d9c1323..b20488b55f6c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -414,6 +414,12 @@ struct cpufreq_driver {
* policy is properly initialized, but before the governor is started.
*/
void (*register_em)(struct cpufreq_policy *policy);
+
+ /*
+ * Set by drivers that want to initialize the policy->min_freq_req with
+ * a value different from the default value (0) in cpufreq core.
+ */
+ int (*get_init_min_freq)(struct cpufreq_policy *policy);
};
/* flags */
--
2.34.1
On 03-10-24, 08:39, Dhananjay Ugwekar wrote: > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index e0e19d9c1323..b20488b55f6c 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -414,6 +414,12 @@ struct cpufreq_driver { > * policy is properly initialized, but before the governor is started. > */ > void (*register_em)(struct cpufreq_policy *policy); > + > + /* > + * Set by drivers that want to initialize the policy->min_freq_req with > + * a value different from the default value (0) in cpufreq core. > + */ > + int (*get_init_min_freq)(struct cpufreq_policy *policy); > }; Apart from Rafael's concern, I don't see why you need to define a callback for something this basic. If we are going to make this change, why can't we just add another u64 field in policy's structure which gives you the freq directly ? -- viresh
Hello Viresh, On 10/10/2024 1:05 PM, Viresh Kumar wrote: > On 03-10-24, 08:39, Dhananjay Ugwekar wrote: >> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h >> index e0e19d9c1323..b20488b55f6c 100644 >> --- a/include/linux/cpufreq.h >> +++ b/include/linux/cpufreq.h >> @@ -414,6 +414,12 @@ struct cpufreq_driver { >> * policy is properly initialized, but before the governor is started. >> */ >> void (*register_em)(struct cpufreq_policy *policy); >> + >> + /* >> + * Set by drivers that want to initialize the policy->min_freq_req with >> + * a value different from the default value (0) in cpufreq core. >> + */ >> + int (*get_init_min_freq)(struct cpufreq_policy *policy); >> }; > > Apart from Rafael's concern, I don't see why you need to define a callback for > something this basic. If we are going to make this change, why can't we just add > another u64 field in policy's structure which gives you the freq directly ? Sure, that also works for us, if it is okay with you and Rafael, I can add the u64 field in policy struct. Thanks, Dhananjay >
On 10-10-24, 15:24, Dhananjay Ugwekar wrote: > Sure, that also works for us, if it is okay with you and Rafael, I can add the > u64 field in policy struct. Let Rafael continue the discussion on the whole idea first, if we are aligned, then you can make this change. -- viresh
On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> wrote: > > Currently, there is no proper way to update the initial lower frequency > limit from cpufreq drivers. Why do you want to do it? > Only way is to add a new min_freq qos > request from the driver side, but it leads to the issue explained below. > > The QoS infrastructure collates the constraints from multiple > subsystems and saves them in a plist. The "current value" is defined to > be the highest value in the plist for min_freq constraint. > > The cpufreq core adds a qos_request for min_freq to be 0 and the amd-pstate > driver today adds qos request for min_freq to be lowest_freq, where > lowest_freq corresponds to CPPC.lowest_perf. > > Eg: Suppose WLOG considering amd-pstate driver, lowest_freq is 400000 KHz, > lowest_non_linear_freq is 1200000 KHz. > > At this point of time, the min_freq QoS plist looks like: > > head--> 400000 KHz (registered by amd-pstate) --> 0 KHz (registered by > cpufreq core) > > When a user updates /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq, > it only results in updating the cpufreq-core's node in the plist, where > say 0 becomes the newly echoed value. > > Now, if the user echoes a value 1000000 KHz, to scaling_min_freq, then the > new list would be > > head--> 1000000 KHz (registered by cpufreq core) --> 400000 KHz (registered > by amd-pstate) > > and the new "current value" of the min_freq QoS constraint will be 1000000 > KHz, this is the scenario where it works as expected. > > Suppose we change the amd-pstate driver code's min_freq qos constraint > to lowest_non_linear_freq instead of lowest_freq, then the user will > never be able to request a value below that, due to the following: > > At boot time, the min_freq QoS plist would be > > head--> 1200000 KHz (registered by amd-pstate) --> 0 KHz (registered by > cpufreq core) > > When the user echoes a value of 1000000 KHz, to > /sys/devices/..../scaling_min_freq, then the new list would be > > head--> 1200000 KHz (registered by amd-pstate) --> 1000000 KHz (registered > by cpufreq core) > > with the new "current value" of the min_freq QoS remaining 1200000 KHz. Yes, that's how frequency QoS works. > Since the current value has not changed, there won't be any notifications > sent to the subsystems which have added their QoS constraints. In > particular, the amd-pstate driver will not get the notification, and thus, > the user's request to lower the scaling_min_freq will be ineffective. The value written by user space to scaling_min_freq is a vote, not a request. It may not be physically possible to reduce the frequency below a certain minimum level that need not be known to the user. > Hence, it is advisable to have a single source of truth for the min and > max freq QoS constraints between the cpufreq and the cpufreq drivers. > > So add a new callback get_init_min_freq() add in struct cpufreq_driver, > which allows amd-pstate (or any other cpufreq driver) to override the > default min_freq value being set in the policy->min_freq_req. Now > scaling_min_freq can be modified by the user to any value (lower or > higher than the init value) later on if desired. > > Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> > --- > drivers/cpufreq/cpufreq.c | 6 +++++- > include/linux/cpufreq.h | 6 ++++++ > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index f98c9438760c..2923068cf5f4 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1361,6 +1361,7 @@ static int cpufreq_online(unsigned int cpu) > bool new_policy; > unsigned long flags; > unsigned int j; > + u32 init_min_freq = FREQ_QOS_MIN_DEFAULT_VALUE; > int ret; > > pr_debug("%s: bringing CPU%u online\n", __func__, cpu); > @@ -1445,9 +1446,12 @@ static int cpufreq_online(unsigned int cpu) > goto out_destroy_policy; > } > > + if (cpufreq_driver->get_init_min_freq) > + init_min_freq = cpufreq_driver->get_init_min_freq(policy); > + > ret = freq_qos_add_request(&policy->constraints, > policy->min_freq_req, FREQ_QOS_MIN, > - FREQ_QOS_MIN_DEFAULT_VALUE); > + init_min_freq); > if (ret < 0) { > /* > * So we don't call freq_qos_remove_request() for an > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index e0e19d9c1323..b20488b55f6c 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -414,6 +414,12 @@ struct cpufreq_driver { > * policy is properly initialized, but before the governor is started. > */ > void (*register_em)(struct cpufreq_policy *policy); > + > + /* > + * Set by drivers that want to initialize the policy->min_freq_req with > + * a value different from the default value (0) in cpufreq core. > + */ > + int (*get_init_min_freq)(struct cpufreq_policy *policy); > }; > > /* flags */ > -- > 2.34.1 >
Hello Rafael, On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote: > On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar > <Dhananjay.Ugwekar@amd.com> wrote: >> >> Currently, there is no proper way to update the initial lower frequency >> limit from cpufreq drivers. > > Why do you want to do it? We want to set the initial lower frequency limit at a more efficient level (lowest_nonlinear_freq) than the lowest frequency, which helps save power in some idle scenarios, and also improves benchmark results in some scenarios. At the same time, we want to allow the user to set the lower limit back to the inefficient lowest frequency. Thanks, Dhananjay > >> Only way is to add a new min_freq qos >> request from the driver side, but it leads to the issue explained below. >> >> The QoS infrastructure collates the constraints from multiple >> subsystems and saves them in a plist. The "current value" is defined to >> be the highest value in the plist for min_freq constraint. >> >> The cpufreq core adds a qos_request for min_freq to be 0 and the amd-pstate >> driver today adds qos request for min_freq to be lowest_freq, where >> lowest_freq corresponds to CPPC.lowest_perf. >> >> Eg: Suppose WLOG considering amd-pstate driver, lowest_freq is 400000 KHz, >> lowest_non_linear_freq is 1200000 KHz. >> >> At this point of time, the min_freq QoS plist looks like: >> >> head--> 400000 KHz (registered by amd-pstate) --> 0 KHz (registered by >> cpufreq core) >> >> When a user updates /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq, >> it only results in updating the cpufreq-core's node in the plist, where >> say 0 becomes the newly echoed value. >> >> Now, if the user echoes a value 1000000 KHz, to scaling_min_freq, then the >> new list would be >> >> head--> 1000000 KHz (registered by cpufreq core) --> 400000 KHz (registered >> by amd-pstate) >> >> and the new "current value" of the min_freq QoS constraint will be 1000000 >> KHz, this is the scenario where it works as expected. >> >> Suppose we change the amd-pstate driver code's min_freq qos constraint >> to lowest_non_linear_freq instead of lowest_freq, then the user will >> never be able to request a value below that, due to the following: >> >> At boot time, the min_freq QoS plist would be >> >> head--> 1200000 KHz (registered by amd-pstate) --> 0 KHz (registered by >> cpufreq core) >> >> When the user echoes a value of 1000000 KHz, to >> /sys/devices/..../scaling_min_freq, then the new list would be >> >> head--> 1200000 KHz (registered by amd-pstate) --> 1000000 KHz (registered >> by cpufreq core) >> >> with the new "current value" of the min_freq QoS remaining 1200000 KHz. > > Yes, that's how frequency QoS works. > >> Since the current value has not changed, there won't be any notifications >> sent to the subsystems which have added their QoS constraints. In >> particular, the amd-pstate driver will not get the notification, and thus, >> the user's request to lower the scaling_min_freq will be ineffective. > > The value written by user space to scaling_min_freq is a vote, not a > request. It may not be physically possible to reduce the frequency > below a certain minimum level that need not be known to the user. > >> Hence, it is advisable to have a single source of truth for the min and >> max freq QoS constraints between the cpufreq and the cpufreq drivers. >> >> So add a new callback get_init_min_freq() add in struct cpufreq_driver, >> which allows amd-pstate (or any other cpufreq driver) to override the >> default min_freq value being set in the policy->min_freq_req. Now >> scaling_min_freq can be modified by the user to any value (lower or >> higher than the init value) later on if desired. >> >> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> >> --- >> drivers/cpufreq/cpufreq.c | 6 +++++- >> include/linux/cpufreq.h | 6 ++++++ >> 2 files changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c >> index f98c9438760c..2923068cf5f4 100644 >> --- a/drivers/cpufreq/cpufreq.c >> +++ b/drivers/cpufreq/cpufreq.c >> @@ -1361,6 +1361,7 @@ static int cpufreq_online(unsigned int cpu) >> bool new_policy; >> unsigned long flags; >> unsigned int j; >> + u32 init_min_freq = FREQ_QOS_MIN_DEFAULT_VALUE; >> int ret; >> >> pr_debug("%s: bringing CPU%u online\n", __func__, cpu); >> @@ -1445,9 +1446,12 @@ static int cpufreq_online(unsigned int cpu) >> goto out_destroy_policy; >> } >> >> + if (cpufreq_driver->get_init_min_freq) >> + init_min_freq = cpufreq_driver->get_init_min_freq(policy); >> + >> ret = freq_qos_add_request(&policy->constraints, >> policy->min_freq_req, FREQ_QOS_MIN, >> - FREQ_QOS_MIN_DEFAULT_VALUE); >> + init_min_freq); >> if (ret < 0) { >> /* >> * So we don't call freq_qos_remove_request() for an >> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h >> index e0e19d9c1323..b20488b55f6c 100644 >> --- a/include/linux/cpufreq.h >> +++ b/include/linux/cpufreq.h >> @@ -414,6 +414,12 @@ struct cpufreq_driver { >> * policy is properly initialized, but before the governor is started. >> */ >> void (*register_em)(struct cpufreq_policy *policy); >> + >> + /* >> + * Set by drivers that want to initialize the policy->min_freq_req with >> + * a value different from the default value (0) in cpufreq core. >> + */ >> + int (*get_init_min_freq)(struct cpufreq_policy *policy); >> }; >> >> /* flags */ >> -- >> 2.34.1 >>
Hi, On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> wrote: > > Hello Rafael, > > On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote: > > On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar > > <Dhananjay.Ugwekar@amd.com> wrote: > >> > >> Currently, there is no proper way to update the initial lower frequency > >> limit from cpufreq drivers. > > > > Why do you want to do it? > > We want to set the initial lower frequency limit at a more efficient level > (lowest_nonlinear_freq) than the lowest frequency, which helps save power in > some idle scenarios, and also improves benchmark results in some scenarios. > At the same time, we want to allow the user to set the lower limit back to > the inefficient lowest frequency. So you want the default value of scaling_min_freq to be greater than the total floor. I have to say that I'm not particularly fond of this approach because it is adding a new meaning to scaling_min_freq: Setting it below the default would not cause the driver to use inefficient frequencies which user space may not be aware of. Moreover, it would tell the driver how far it could go with that. IMV it would be bettwr to have a separate interface for this kind of tuning.
On Mon, Oct 7, 2024 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > Hi, > > On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar > <Dhananjay.Ugwekar@amd.com> wrote: > > > > Hello Rafael, > > > > On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote: > > > On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar > > > <Dhananjay.Ugwekar@amd.com> wrote: > > >> > > >> Currently, there is no proper way to update the initial lower frequency > > >> limit from cpufreq drivers. > > > > > > Why do you want to do it? > > > > We want to set the initial lower frequency limit at a more efficient level > > (lowest_nonlinear_freq) than the lowest frequency, which helps save power in > > some idle scenarios, and also improves benchmark results in some scenarios. > > At the same time, we want to allow the user to set the lower limit back to > > the inefficient lowest frequency. > > So you want the default value of scaling_min_freq to be greater than > the total floor. > > I have to say that I'm not particularly fond of this approach because > it is adding a new meaning to scaling_min_freq: Setting it below the > default would not cause the driver to use inefficient frequencies s/not/now/ (sorry) I should have double checked this before sending. > which user space may not be aware of. Moreover, it would tell the > driver how far it could go with that. > > IMV it would be bettwr to have a separate interface for this kind of tuning.
Hello Rafael, On 10/7/2024 9:18 PM, Rafael J. Wysocki wrote: > On Mon, Oct 7, 2024 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote: >> >> Hi, >> >> On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar >> <Dhananjay.Ugwekar@amd.com> wrote: >>> >>> Hello Rafael, >>> >>> On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote: >>>> On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar >>>> <Dhananjay.Ugwekar@amd.com> wrote: >>>>> >>>>> Currently, there is no proper way to update the initial lower frequency >>>>> limit from cpufreq drivers. >>>> >>>> Why do you want to do it? >>> >>> We want to set the initial lower frequency limit at a more efficient level >>> (lowest_nonlinear_freq) than the lowest frequency, which helps save power in >>> some idle scenarios, and also improves benchmark results in some scenarios. >>> At the same time, we want to allow the user to set the lower limit back to >>> the inefficient lowest frequency. >> >> So you want the default value of scaling_min_freq to be greater than >> the total floor. Yes, we want to set the default min value to what we think is best for the platform. >> >> I have to say that I'm not particularly fond of this approach because >> it is adding a new meaning to scaling_min_freq: Setting it below the >> default would not cause the driver to use inefficient frequencies > > s/not/now/ (sorry) I believe we are not changing the meaning of the scaling_min_freq just setting it to the best value at boot and then allowing the user to have access to the entire frequency range for the platform. Also, we have cpuinfo_min_freq/max_freq to indicate to the user as to what the entire frequency range is for the platform (depending on boost enabled/disabled). > > I should have double checked this before sending. > >> which user space may not be aware of. I guess, this part we can fix by documenting correctly ? >> Moreover, it would tell the >> driver how far it could go with that. Sorry, I didnt understand this part. >> >> IMV it would be bettwr to have a separate interface for this kind of tuning. I feel like we can incorporate this change cleanly enough into scaling_min_freq, without adding a new interface which might further confuse the user. But please let me know your concerns and thoughts. Thanks, Dhananjay
Hi, On Tue, Oct 8, 2024 at 8:32 AM Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> wrote: > > Hello Rafael, > > On 10/7/2024 9:18 PM, Rafael J. Wysocki wrote: > > On Mon, Oct 7, 2024 at 5:46 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > >> > >> Hi, > >> > >> On Mon, Oct 7, 2024 at 6:40 AM Dhananjay Ugwekar > >> <Dhananjay.Ugwekar@amd.com> wrote: > >>> > >>> Hello Rafael, > >>> > >>> On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote: > >>>> On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar > >>>> <Dhananjay.Ugwekar@amd.com> wrote: > >>>>> > >>>>> Currently, there is no proper way to update the initial lower frequency > >>>>> limit from cpufreq drivers. > >>>> > >>>> Why do you want to do it? > >>> > >>> We want to set the initial lower frequency limit at a more efficient level > >>> (lowest_nonlinear_freq) than the lowest frequency, which helps save power in > >>> some idle scenarios, and also improves benchmark results in some scenarios. > >>> At the same time, we want to allow the user to set the lower limit back to > >>> the inefficient lowest frequency. > >> > >> So you want the default value of scaling_min_freq to be greater than > >> the total floor. > > Yes, we want to set the default min value to what we think is best for the platform. > > >> > >> I have to say that I'm not particularly fond of this approach because > >> it is adding a new meaning to scaling_min_freq: Setting it below the > >> default would not cause the driver to use inefficient frequencies > > > > s/not/now/ (sorry) > > I believe we are not changing the meaning of the scaling_min_freq just setting it > to the best value at boot and then allowing the user to have access to the entire > frequency range for the platform. Also, we have cpuinfo_min_freq/max_freq to > indicate to the user as to what the entire frequency range is for the platform > (depending on boost enabled/disabled). But the default you want to set is the lowest efficient frequency which user space needs to be aware of. Otherwise it may make suboptimal decisions. > > > > I should have double checked this before sending. > > > >> which user space may not be aware of. > > I guess, this part we can fix by documenting correctly ? > > >> Moreover, it would tell the > >> driver how far it could go with that. > > Sorry, I didnt understand this part. Sorry, never mind. I was just repeating myself. > >> > >> IMV it would be bettwr to have a separate interface for this kind of tuning. > > I feel like we can incorporate this change cleanly enough into scaling_min_freq, > without adding a new interface which might further confuse the user. But please > let me know your concerns and thoughts. I'm not sure if you realize that the .show() operation for scaling_min_freq prints policy->max and you can easily make your driver's .verify() callback change it to whatever value is desired. You may as well set it to the lowest efficient frequency if the one passed to you is equal to FREQ_QOS_MIN_DEFAULT_VALUE.
On Thu, Oct 03, 2024 at 08:39:52AM +0000, Dhananjay Ugwekar wrote: > Currently, there is no proper way to update the initial lower frequency > limit from cpufreq drivers. Only way is to add a new min_freq qos > request from the driver side, but it leads to the issue explained below. > > The QoS infrastructure collates the constraints from multiple > subsystems and saves them in a plist. The "current value" is defined to > be the highest value in the plist for min_freq constraint. > > The cpufreq core adds a qos_request for min_freq to be 0 and the amd-pstate > driver today adds qos request for min_freq to be lowest_freq, where > lowest_freq corresponds to CPPC.lowest_perf. > > Eg: Suppose WLOG considering amd-pstate driver, lowest_freq is 400000 KHz, > lowest_non_linear_freq is 1200000 KHz. > > At this point of time, the min_freq QoS plist looks like: > > head--> 400000 KHz (registered by amd-pstate) --> 0 KHz (registered by > cpufreq core) > > When a user updates /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq, > it only results in updating the cpufreq-core's node in the plist, where > say 0 becomes the newly echoed value. > > Now, if the user echoes a value 1000000 KHz, to scaling_min_freq, then the > new list would be > > head--> 1000000 KHz (registered by cpufreq core) --> 400000 KHz (registered > by amd-pstate) > > and the new "current value" of the min_freq QoS constraint will be 1000000 > KHz, this is the scenario where it works as expected. > > Suppose we change the amd-pstate driver code's min_freq qos constraint > to lowest_non_linear_freq instead of lowest_freq, then the user will > never be able to request a value below that, due to the following: > > At boot time, the min_freq QoS plist would be > > head--> 1200000 KHz (registered by amd-pstate) --> 0 KHz (registered by > cpufreq core) > > When the user echoes a value of 1000000 KHz, to > /sys/devices/..../scaling_min_freq, then the new list would be > > head--> 1200000 KHz (registered by amd-pstate) --> 1000000 KHz (registered > by cpufreq core) > > with the new "current value" of the min_freq QoS remaining 1200000 KHz. > Since the current value has not changed, there won't be any notifications > sent to the subsystems which have added their QoS constraints. In > particular, the amd-pstate driver will not get the notification, and thus, > the user's request to lower the scaling_min_freq will be ineffective. > > Hence, it is advisable to have a single source of truth for the min and > max freq QoS constraints between the cpufreq and the cpufreq drivers. > > So add a new callback get_init_min_freq() add in struct cpufreq_driver, > which allows amd-pstate (or any other cpufreq driver) to override the > default min_freq value being set in the policy->min_freq_req. Now > scaling_min_freq can be modified by the user to any value (lower or > higher than the init value) later on if desired. Looks good to me. Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com> Rafael and Viresh, could you please weigh in on this new callback. -- Thanks and Regards gautham. > > Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> > --- > drivers/cpufreq/cpufreq.c | 6 +++++- > include/linux/cpufreq.h | 6 ++++++ > 2 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index f98c9438760c..2923068cf5f4 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1361,6 +1361,7 @@ static int cpufreq_online(unsigned int cpu) > bool new_policy; > unsigned long flags; > unsigned int j; > + u32 init_min_freq = FREQ_QOS_MIN_DEFAULT_VALUE; > int ret; > > pr_debug("%s: bringing CPU%u online\n", __func__, cpu); > @@ -1445,9 +1446,12 @@ static int cpufreq_online(unsigned int cpu) > goto out_destroy_policy; > } > > + if (cpufreq_driver->get_init_min_freq) > + init_min_freq = cpufreq_driver->get_init_min_freq(policy); > + > ret = freq_qos_add_request(&policy->constraints, > policy->min_freq_req, FREQ_QOS_MIN, > - FREQ_QOS_MIN_DEFAULT_VALUE); > + init_min_freq); > if (ret < 0) { > /* > * So we don't call freq_qos_remove_request() for an > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index e0e19d9c1323..b20488b55f6c 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -414,6 +414,12 @@ struct cpufreq_driver { > * policy is properly initialized, but before the governor is started. > */ > void (*register_em)(struct cpufreq_policy *policy); > + > + /* > + * Set by drivers that want to initialize the policy->min_freq_req with > + * a value different from the default value (0) in cpufreq core. > + */ > + int (*get_init_min_freq)(struct cpufreq_policy *policy); > }; > > /* flags */ > -- > 2.34.1 >
© 2016 - 2024 Red Hat, Inc.