cpufreq: CPPC: Changing error message in CPPC FIE

[PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Bowen Yu 2 months, 1 week ago

From: Jie Zhan <zhanjie9@hisilicon.com>

Perf counters could be 0 if the cpu is in a low-power idle state. Just try
it again next time and update the frequency scale when the cpu is active
and perf counters successfully return.

Also, remove the FIE source on an actual failure.

Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
---
 drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 904006027df2..e95844d3d366 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
 	struct cppc_cpudata *cpu_data;
 	unsigned long local_freq_scale;
 	u64 perf;
+	int ret;
 
 	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
 	cpu_data = cppc_fi->cpu_data;
 
-	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
+	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
+	/*
+	 * Perf counters could be 0 if the cpu is in a low-power idle state.
+	 * Just try it again next time.
+	 */
+	if (ret == -EFAULT)
+		return;
+
+	if (ret) {
 		pr_warn("%s: failed to read perf counters\n", __func__);
+		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
+						 cpu_data->shared_cpu_map);
 		return;
 	}
 
-- 
2.33.0

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Beata Michalska 2 months ago

Hi Bowen, Jie
On Wed, Jul 30, 2025 at 11:23:12AM +0800, Bowen Yu wrote:
> From: Jie Zhan <zhanjie9@hisilicon.com>
> 
> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
> it again next time and update the frequency scale when the cpu is active
> and perf counters successfully return.
> 
> Also, remove the FIE source on an actual failure.
> 
> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
> ---
>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 904006027df2..e95844d3d366 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>  	struct cppc_cpudata *cpu_data;
>  	unsigned long local_freq_scale;
>  	u64 perf;
> +	int ret;
>  
>  	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>  	cpu_data = cppc_fi->cpu_data;
>  
> -	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> +	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
> +	/*
> +	 * Perf counters could be 0 if the cpu is in a low-power idle state.
> +	 * Just try it again next time.
> +	 */
> +	if (ret == -EFAULT)
> +		return;
Which counters are we actually talking about here ?
> +
> +	if (ret) {
>  		pr_warn("%s: failed to read perf counters\n", __func__);
> +		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
> +						 cpu_data->shared_cpu_map);
>  		return;
>  	}
And the real error here would be ... ?
That makes me wonder why this has been registered as the source of the freq
scale in the first place if we are to hit some serious issue. Would you be able
to give an example of any?

---
BR
Beata
>  
> -- 
> 2.33.0
> 
>

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 2 months ago


On 31/07/2025 16:19, Beata Michalska wrote:
> Hi Bowen, Jie
> On Wed, Jul 30, 2025 at 11:23:12AM +0800, Bowen Yu wrote:
>> From: Jie Zhan <zhanjie9@hisilicon.com>
>>
>> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
>> it again next time and update the frequency scale when the cpu is active
>> and perf counters successfully return.
>>
>> Also, remove the FIE source on an actual failure.
>>
>> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
>> ---
>>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>> index 904006027df2..e95844d3d366 100644
>> --- a/drivers/cpufreq/cppc_cpufreq.c
>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>>  	struct cppc_cpudata *cpu_data;
>>  	unsigned long local_freq_scale;
>>  	u64 perf;
>> +	int ret;
>>  
>>  	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>>  	cpu_data = cppc_fi->cpu_data;
>>  
>> -	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
>> +	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
>> +	/*
>> +	 * Perf counters could be 0 if the cpu is in a low-power idle state.
>> +	 * Just try it again next time.
>> +	 */
>> +	if (ret == -EFAULT)
>> +		return;
> Which counters are we actually talking about here ?

Delivered performance counter and reference performance counter.
They are actually AMU CPU_CYCLES and CNT_CYCLES event counters.

>> +
>> +	if (ret) {
>>  		pr_warn("%s: failed to read perf counters\n", __func__);
>> +		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
>> +						 cpu_data->shared_cpu_map);
>>  		return;
>>  	}
> And the real error here would be ... ?
> That makes me wonder why this has been registered as the source of the freq
> scale in the first place if we are to hit some serious issue. Would you be able
> to give an example of any?
If it gets here, that would be -ENODEV or -EIO from cppc_get_perf_ctrs(),
which could possibly come from data corruption (no CPC descriptor) or a PCC
failure.

I can't easily fake an error here, but the above -EFAULT path could
happen when it luckily passes the FIE init.

Jie
> 
> ---
> BR
> Beata
>>  
>> -- 
>> 2.33.0
>>
>>
>

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Beata Michalska 2 months ago

On Thu, Jul 31, 2025 at 04:52:05PM +0800, Jie Zhan wrote:
> 
> 
> On 31/07/2025 16:19, Beata Michalska wrote:
> > Hi Bowen, Jie
> > On Wed, Jul 30, 2025 at 11:23:12AM +0800, Bowen Yu wrote:
> >> From: Jie Zhan <zhanjie9@hisilicon.com>
> >>
> >> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
> >> it again next time and update the frequency scale when the cpu is active
> >> and perf counters successfully return.
> >>
> >> Also, remove the FIE source on an actual failure.
> >>
> >> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
> >> ---
> >>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
> >>  1 file changed, 12 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> >> index 904006027df2..e95844d3d366 100644
> >> --- a/drivers/cpufreq/cppc_cpufreq.c
> >> +++ b/drivers/cpufreq/cppc_cpufreq.c
> >> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
> >>  	struct cppc_cpudata *cpu_data;
> >>  	unsigned long local_freq_scale;
> >>  	u64 perf;
> >> +	int ret;
> >>  
> >>  	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
> >>  	cpu_data = cppc_fi->cpu_data;
> >>  
> >> -	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> >> +	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
> >> +	/*
> >> +	 * Perf counters could be 0 if the cpu is in a low-power idle state.
> >> +	 * Just try it again next time.
> >> +	 */
> >> +	if (ret == -EFAULT)
> >> +		return;
> > Which counters are we actually talking about here ?
> 
> Delivered performance counter and reference performance counter.
> They are actually AMU CPU_CYCLES and CNT_CYCLES event counters.
That does track then.
> 
> >> +
> >> +	if (ret) {
> >>  		pr_warn("%s: failed to read perf counters\n", __func__);
> >> +		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
> >> +						 cpu_data->shared_cpu_map);
> >>  		return;
> >>  	}
> > And the real error here would be ... ?
> > That makes me wonder why this has been registered as the source of the freq
> > scale in the first place if we are to hit some serious issue. Would you be able
> > to give an example of any?
> If it gets here, that would be -ENODEV or -EIO from cppc_get_perf_ctrs(),
> which could possibly come from data corruption (no CPC descriptor) or a PCC
> failure.
> 
> I can't easily fake an error here, but the above -EFAULT path could
> happen when it luckily passes the FIE init.
> 
The change seems reasonable. Though I am wondering if some other errors might be
rather transient as well ? Like -EIO ?
Note, I'm not an expert here.

---
BR
Beata
> Jie
> > 
> > ---
> > BR
> > Beata
> >>  
> >> -- 
> >> 2.33.0
> >>
> >>
> >

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 2 months ago


On 31/07/2025 17:42, Beata Michalska wrote:
> On Thu, Jul 31, 2025 at 04:52:05PM +0800, Jie Zhan wrote:
>>
>>
>> On 31/07/2025 16:19, Beata Michalska wrote:
>>> Hi Bowen, Jie
>>> On Wed, Jul 30, 2025 at 11:23:12AM +0800, Bowen Yu wrote:
>>>> From: Jie Zhan <zhanjie9@hisilicon.com>
>>>>
>>>> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
>>>> it again next time and update the frequency scale when the cpu is active
>>>> and perf counters successfully return.
>>>>
>>>> Also, remove the FIE source on an actual failure.
>>>>
>>>> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
>>>> ---
>>>>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
>>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>>> index 904006027df2..e95844d3d366 100644
>>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>>> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>>>>  	struct cppc_cpudata *cpu_data;
>>>>  	unsigned long local_freq_scale;
>>>>  	u64 perf;
>>>> +	int ret;
>>>>  
>>>>  	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>>>>  	cpu_data = cppc_fi->cpu_data;
>>>>  
>>>> -	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
>>>> +	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
>>>> +	/*
>>>> +	 * Perf counters could be 0 if the cpu is in a low-power idle state.
>>>> +	 * Just try it again next time.
>>>> +	 */
>>>> +	if (ret == -EFAULT)
>>>> +		return;
>>> Which counters are we actually talking about here ?
>>
>> Delivered performance counter and reference performance counter.
>> They are actually AMU CPU_CYCLES and CNT_CYCLES event counters.
> That does track then.
>>
>>>> +
>>>> +	if (ret) {
>>>>  		pr_warn("%s: failed to read perf counters\n", __func__);
>>>> +		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
>>>> +						 cpu_data->shared_cpu_map);
>>>>  		return;
>>>>  	}
>>> And the real error here would be ... ?
>>> That makes me wonder why this has been registered as the source of the freq
>>> scale in the first place if we are to hit some serious issue. Would you be able
>>> to give an example of any?
>> If it gets here, that would be -ENODEV or -EIO from cppc_get_perf_ctrs(),
>> which could possibly come from data corruption (no CPC descriptor) or a PCC
>> failure.
>>
>> I can't easily fake an error here, but the above -EFAULT path could
>> happen when it luckily passes the FIE init.
>>
> The change seems reasonable. Though I am wondering if some other errors might be
> rather transient as well ? Like -EIO ?
> Note, I'm not an expert here.
The -EIO from PCC contains much more error cases than this.

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Markus Elfring 2 months, 1 week ago

> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
> it again next time and update the frequency scale when the cpu is active
> and perf counters successfully return.
> 
> Also, remove the FIE source on an actual failure.

How do you think about to add any tags (like “Fixes” and “Cc”) accordingly?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.16#n145

Regards,
Markus

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 2 months ago

On 31/07/2025 02:38, Markus Elfring wrote:
>> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
>> it again next time and update the frequency scale when the cpu is active
>> and perf counters successfully return.
>>
>> Also, remove the FIE source on an actual failure.
> 
> How do you think about to add any tags (like “Fixes” and “Cc”) accordingly?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.16#n145
> 
> Regards,
> Markus

Hi Markus,

We don't think it's necessarily a bugfix that should be backported to
old/stable kernels.  This is mainly an extra error handling for cases that
may happen on our platform when a CPU is in a deep idle state.

Similar patches were merged before.
https://lore.kernel.org/linux-pm/20240929033214.1039485-2-zhanjie9@hisilicon.com/

Perhaps the 'Fix' in the patch title is a bit misleading.  I'll write a
more appropriate patch description in the following version.

Thanks,
Jie

Re: [2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Markus Elfring 2 months ago

> We don't think it's necessarily a bugfix that should be backported to
> old/stable kernels.  This is mainly an extra error handling for cases that
> may happen on our platform when a CPU is in a deep idle state.

I find mentioned tags relevant also according to proposed completion
of error handling.

Regards,
Markus

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Viresh Kumar 2 months, 1 week ago

+ Prashant/Beata/Ionela

On 30-07-25, 11:23, Bowen Yu wrote:
> From: Jie Zhan <zhanjie9@hisilicon.com>
> 
> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
> it again next time and update the frequency scale when the cpu is active
> and perf counters successfully return.
> 
> Also, remove the FIE source on an actual failure.
> 
> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
> ---
>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> index 904006027df2..e95844d3d366 100644
> --- a/drivers/cpufreq/cppc_cpufreq.c
> +++ b/drivers/cpufreq/cppc_cpufreq.c
> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>  	struct cppc_cpudata *cpu_data;
>  	unsigned long local_freq_scale;
>  	u64 perf;
> +	int ret;
>  
>  	cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>  	cpu_data = cppc_fi->cpu_data;
>  
> -	if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> +	ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
> +	/*
> +	 * Perf counters could be 0 if the cpu is in a low-power idle state.
> +	 * Just try it again next time.
> +	 */
> +	if (ret == -EFAULT)
> +		return;
> +
> +	if (ret) {
>  		pr_warn("%s: failed to read perf counters\n", __func__);
> +		topology_clear_scale_freq_source(SCALE_FREQ_SOURCE_CPPC,
> +						 cpu_data->shared_cpu_map);
>  		return;
>  	}
>  
> -- 
> 2.33.0

-- 
viresh

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Prashant Malani 2 months ago

Thanks for adding me, Viresh.

On Tue, 29 Jul 2025 at 23:39, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> + Prashant/Beata/Ionela
>
> On 30-07-25, 11:23, Bowen Yu wrote:
> > From: Jie Zhan <zhanjie9@hisilicon.com>
> >
> > Perf counters could be 0 if the cpu is in a low-power idle state. Just try
> > it again next time and update the frequency scale when the cpu is active
> > and perf counters successfully return.
> >
> > Also, remove the FIE source on an actual failure.
> >
> > Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
> > ---
> >  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
> >  1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > index 904006027df2..e95844d3d366 100644
> > --- a/drivers/cpufreq/cppc_cpufreq.c
> > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
> >       struct cppc_cpudata *cpu_data;
> >       unsigned long local_freq_scale;
> >       u64 perf;
> > +     int ret;
> >
> >       cppc_fi = container_of(work, struct cppc_freq_invariance, work);
> >       cpu_data = cppc_fi->cpu_data;
> >
> > -     if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
> > +     ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
> > +     /*
> > +      * Perf counters could be 0 if the cpu is in a low-power idle state.
> > +      * Just try it again next time.
> > +      */

FWIU the performance counters shouldn't be returning 0 in an idle state.
Per the UEFI spec [1], they increment any time the CPU is active,
so they should just return their last counter value before they went into idle
(of course in the FFH case an IPI is performed on the target CPU, so even
if the CPU was idle, it will get woken up).

As such it is better to either :
- Check for idle_cpu() directly and return (see [2] for the function)
or
- Always clear the source on encountering an error return value.

[1] https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?highlight=cppc#performance-counters
[2] https://lore.kernel.org/linux-pm/20250619000925.415528-2-pmalani@google.com/

HTH,

-Prashant

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 2 months ago


On 31/07/2025 06:34, Prashant Malani wrote:
> Thanks for adding me, Viresh.
> 
> On Tue, 29 Jul 2025 at 23:39, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> + Prashant/Beata/Ionela
>>
>> On 30-07-25, 11:23, Bowen Yu wrote:
>>> From: Jie Zhan <zhanjie9@hisilicon.com>
>>>
>>> Perf counters could be 0 if the cpu is in a low-power idle state. Just try
>>> it again next time and update the frequency scale when the cpu is active
>>> and perf counters successfully return.
>>>
>>> Also, remove the FIE source on an actual failure.
>>>
>>> Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com>
>>> ---
>>>  drivers/cpufreq/cppc_cpufreq.c | 13 ++++++++++++-
>>>  1 file changed, 12 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>> index 904006027df2..e95844d3d366 100644
>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>> @@ -78,12 +78,23 @@ static void cppc_scale_freq_workfn(struct kthread_work *work)
>>>       struct cppc_cpudata *cpu_data;
>>>       unsigned long local_freq_scale;
>>>       u64 perf;
>>> +     int ret;
>>>
>>>       cppc_fi = container_of(work, struct cppc_freq_invariance, work);
>>>       cpu_data = cppc_fi->cpu_data;
>>>
>>> -     if (cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs)) {
>>> +     ret = cppc_get_perf_ctrs(cppc_fi->cpu, &fb_ctrs);
>>> +     /*
>>> +      * Perf counters could be 0 if the cpu is in a low-power idle state.
>>> +      * Just try it again next time.
>>> +      */
> 
> FWIU the performance counters shouldn't be returning 0 in an idle state.
> Per the UEFI spec [1], they increment any time the CPU is active,
> so they should just return their last counter value before they went into idle
> (of course in the FFH case an IPI is performed on the target CPU, so even
> if the CPU was idle, it will get woken up).

Hi Prashant,

The perf counters could return 0 when a CPU is enters a low-power idle
state, e.g. reset or powered down, and the perf counters are in the system
memory space (the target CPU is not woken up unfortunately).

On our platform, and I suppose so on most ARM64 platforms, perf counters
are mapped to AMU counters.  Per ARM spec, AMEVCNTR0 is 0 on reset.  BTW,
that's also why ARM Trusted Firmware needs to save and restore AMU counters
before and after powering down.
https://github.com/ARM-software/arm-trusted-firmware/blob/master/lib/extensions/amu/aarch64/amu.c

I hope this explains your confusion.  Then we can carry on discussion if we
reach on this consensus.

Thanks for taking a look!
Jie

> 
> As such it is better to either :
> - Check for idle_cpu() directly and return (see [2] for the function)
> or
> - Always clear the source on encountering an error return value.
> 
> [1] https://uefi.org/specs/ACPI/6.5/08_Processor_Configuration_and_Control.html?highlight=cppc#performance-counters
> [2] https://lore.kernel.org/linux-pm/20250619000925.415528-2-pmalani@google.com/
> 
> HTH,
> 
> -Prashant

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Prashant Malani 2 months ago

Hi Jie,

On Thu, 31 Jul 2025 at 01:32, Jie Zhan <zhanjie9@hisilicon.com> wrote:
>
>
>
> On 31/07/2025 06:34, Prashant Malani wrote:
>
> Hi Prashant,
>
> The perf counters could return 0 when a CPU is enters a low-power idle
> state, e.g. reset or powered down, and the perf counters are in the system
> memory space (the target CPU is not woken up unfortunately).
>

Thanks for the clarification. Reset and powered down are not typically
considered "low-power idle states".
Please re-word your commit message to specifically call out the "reset and
powered-down" CPU states.

This begs the question: why is this work function being scheduled
for CPUs that are in reset or offline/powered-down at all?
IANAE but it sounds like it would be better to add logic to ensure this
work function doesn't get scheduled/executed for CPUs that
are truly offline/powered-down or in reset.

BR,

-- 
-Prashant

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 2 months ago


On 01/08/2025 16:58, Prashant Malani wrote:
> Hi Jie,
> 
> On Thu, 31 Jul 2025 at 01:32, Jie Zhan <zhanjie9@hisilicon.com> wrote:
>>
>>
>>
>> On 31/07/2025 06:34, Prashant Malani wrote:
>>
>> Hi Prashant,
>>
>> The perf counters could return 0 when a CPU is enters a low-power idle
>> state, e.g. reset or powered down, and the perf counters are in the system
>> memory space (the target CPU is not woken up unfortunately).
>>
> 
> Thanks for the clarification. Reset and powered down are not typically
> considered "low-power idle states".
Actually, power down and reset is a common low-power idle state defined in
ACPI spec and implemented on Intel chips.
Some quick references:
[1] ACPI spec 6.5, 8 PROCESSOR CONFIGURATION AND CONTROL
https://uefi.org/sites/default/files/resources/ACPI_Spec_6_5_Aug29.pdf
Also, the C1, C2, C3 states are refered as Clock Gated, Retention, and
Power Down respectively in 8.4.3 Lower Power Idle States.
[2] Intel 3rd gen Xeon Scalable, the C6 state.
https://www.intel.com/content/www/us/en/content-details/637748/power-management-technology-overview-technology-guide.html?wapkw=c-state

Nevertheless, it's still not widely supported in practice, so I understand
what you said.
> Please re-word your commit message to specifically call out the "reset and
> powered-down" CPU states.
Sure.  I'll try to make it a bit more straightforward in V2.
> 
> This begs the question: why is this work function being scheduled
> for CPUs that are in reset or offline/powered-down at all?
> IANAE but it sounds like it would be better to add logic to ensure this
> work function doesn't get scheduled/executed for CPUs that
> are truly offline/powered-down or in reset.
Yeah good question.  We may discuss that on your thread.
> 
> BR,
> 
Thanks!
Jie

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Prashant Malani 2 months ago

On Sun, 3 Aug 2025 at 23:21, Jie Zhan <zhanjie9@hisilicon.com> wrote
> On 01/08/2025 16:58, Prashant Malani wrote:
> > This begs the question: why is this work function being scheduled
> > for CPUs that are in reset or offline/powered-down at all?
> > IANAE but it sounds like it would be better to add logic to ensure this
> > work function doesn't get scheduled/executed for CPUs that
> > are truly offline/powered-down or in reset.
> Yeah good question.  We may discuss that on your thread.

OK.
Quickly looking around, it sounds having in the CPPC tick function [1]
might be a better option (one probably doesn't want to lift it beyond the
CPPC layer, since other drivers might have different behaviour).
One can add a cpu_online/cpu_enabled check there.

[1] https://elixir.bootlin.com/linux/v6.16/source/include/linux/cpumask.h#L1233

-- 
-Prashant

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Prashant Malani 2 months ago

On Mon, 4 Aug 2025 at 18:12, Prashant Malani <pmalani@google.com> wrote:
>
> On Sun, 3 Aug 2025 at 23:21, Jie Zhan <zhanjie9@hisilicon.com> wrote
> > On 01/08/2025 16:58, Prashant Malani wrote:
> > > This begs the question: why is this work function being scheduled
> > > for CPUs that are in reset or offline/powered-down at all?
> > > IANAE but it sounds like it would be better to add logic to ensure this
> > > work function doesn't get scheduled/executed for CPUs that
> > > are truly offline/powered-down or in reset.
> > Yeah good question.  We may discuss that on your thread.
>
> OK.
> Quickly looking around, it sounds having in the CPPC tick function [1]
> might be a better option (one probably doesn't want to lift it beyond the
> CPPC layer, since other drivers might have different behaviour).
> One can add a cpu_online/cpu_enabled check there.

Fixed link:
[1] https://elixir.bootlin.com/linux/v6.13/source/drivers/cpufreq/cppc_cpufreq.c#L125

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 1 month, 3 weeks ago


On 05/08/2025 12:58, Prashant Malani wrote:
> On Mon, 4 Aug 2025 at 18:12, Prashant Malani <pmalani@google.com> wrote:
>>
>> On Sun, 3 Aug 2025 at 23:21, Jie Zhan <zhanjie9@hisilicon.com> wrote
>>> On 01/08/2025 16:58, Prashant Malani wrote:
>>>> This begs the question: why is this work function being scheduled
>>>> for CPUs that are in reset or offline/powered-down at all?
>>>> IANAE but it sounds like it would be better to add logic to ensure this
>>>> work function doesn't get scheduled/executed for CPUs that
>>>> are truly offline/powered-down or in reset.
>>> Yeah good question.  We may discuss that on your thread.
>>
>> OK.
>> Quickly looking around, it sounds having in the CPPC tick function [1]
>> might be a better option (one probably doesn't want to lift it beyond the
>> CPPC layer, since other drivers might have different behaviour).
>> One can add a cpu_online/cpu_enabled check there.
> 
> Fixed link:
> [1] https://elixir.bootlin.com/linux/v6.13/source/drivers/cpufreq/cppc_cpufreq.c#L125
I don't think a cpu_online/cpu_enabled check there would help.

Offlined CPUs don't make cppc_scale_freq_workfn() fail because they won't
have FIE triggered.  It fails from accessing perf counters on powered-down
CPUs.

Perhaps the CPPC FIE needs a bit rework.  AFAICS, FIE is meant to run in
ticks, but currently the CPPC FIE eventually runs in a thread due to the
possible PCC path when reading CPC regs I guess.

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Beata Michalska 1 month, 3 weeks ago

On Wed, Aug 13, 2025 at 03:15:12PM +0800, Jie Zhan wrote:
> 
> 
> On 05/08/2025 12:58, Prashant Malani wrote:
> > On Mon, 4 Aug 2025 at 18:12, Prashant Malani <pmalani@google.com> wrote:
> >>
> >> On Sun, 3 Aug 2025 at 23:21, Jie Zhan <zhanjie9@hisilicon.com> wrote
> >>> On 01/08/2025 16:58, Prashant Malani wrote:
> >>>> This begs the question: why is this work function being scheduled
> >>>> for CPUs that are in reset or offline/powered-down at all?
> >>>> IANAE but it sounds like it would be better to add logic to ensure this
> >>>> work function doesn't get scheduled/executed for CPUs that
> >>>> are truly offline/powered-down or in reset.
> >>> Yeah good question.  We may discuss that on your thread.
> >>
> >> OK.
> >> Quickly looking around, it sounds having in the CPPC tick function [1]
> >> might be a better option (one probably doesn't want to lift it beyond the
> >> CPPC layer, since other drivers might have different behaviour).
> >> One can add a cpu_online/cpu_enabled check there.
> > 
> > Fixed link:
> > [1] https://elixir.bootlin.com/linux/v6.13/source/drivers/cpufreq/cppc_cpufreq.c#L125
> I don't think a cpu_online/cpu_enabled check there would help.
> 
> Offlined CPUs don't make cppc_scale_freq_workfn() fail because they won't
> have FIE triggered.  It fails from accessing perf counters on powered-down
> CPUs.
> 
> Perhaps the CPPC FIE needs a bit rework.  AFAICS, FIE is meant to run in
> ticks, but currently the CPPC FIE eventually runs in a thread due to the
> possible PCC path when reading CPC regs I guess.
Just for my benefit: the tick is being fired on a given CPU which is when an
irq_work is being queued. Then before this goes through the kworker and finally
ends up in 'cppc_scale_freq_workfn' that CPU is entering a deeper idle state ?
Could the cppc driver register for pm notifications and cancel any pending work
for a CPU that is actually going down, directly or by setting some flag or smth
so that the final worker function is either not triggered or knows it has to
bail out early ?
(Note this is a rough idea and needs verification)

---
BR
Beata

Re: [PATCH 2/2] cpufreq: CPPC: Fix error handling in cppc_scale_freq_workfn()

Posted by Jie Zhan 1 month, 3 weeks ago


On 13/08/2025 17:30, Beata Michalska wrote:
> On Wed, Aug 13, 2025 at 03:15:12PM +0800, Jie Zhan wrote:
>>
>>
>> On 05/08/2025 12:58, Prashant Malani wrote:
>>> On Mon, 4 Aug 2025 at 18:12, Prashant Malani <pmalani@google.com> wrote:
>>>>
>>>> On Sun, 3 Aug 2025 at 23:21, Jie Zhan <zhanjie9@hisilicon.com> wrote
>>>>> On 01/08/2025 16:58, Prashant Malani wrote:
>>>>>> This begs the question: why is this work function being scheduled
>>>>>> for CPUs that are in reset or offline/powered-down at all?
>>>>>> IANAE but it sounds like it would be better to add logic to ensure this
>>>>>> work function doesn't get scheduled/executed for CPUs that
>>>>>> are truly offline/powered-down or in reset.
>>>>> Yeah good question.  We may discuss that on your thread.
>>>>
>>>> OK.
>>>> Quickly looking around, it sounds having in the CPPC tick function [1]
>>>> might be a better option (one probably doesn't want to lift it beyond the
>>>> CPPC layer, since other drivers might have different behaviour).
>>>> One can add a cpu_online/cpu_enabled check there.
>>>
>>> Fixed link:
>>> [1] https://elixir.bootlin.com/linux/v6.13/source/drivers/cpufreq/cppc_cpufreq.c#L125
>> I don't think a cpu_online/cpu_enabled check there would help.
>>
>> Offlined CPUs don't make cppc_scale_freq_workfn() fail because they won't
>> have FIE triggered.  It fails from accessing perf counters on powered-down
>> CPUs.
>>
>> Perhaps the CPPC FIE needs a bit rework.  AFAICS, FIE is meant to run in
>> ticks, but currently the CPPC FIE eventually runs in a thread due to the
>> possible PCC path when reading CPC regs I guess.
> Just for my benefit: the tick is being fired on a given CPU which is when an
> irq_work is being queued. Then before this goes through the kworker and finally
> ends up in 'cppc_scale_freq_workfn' that CPU is entering a deeper idle state ?
Yeah.
> Could the cppc driver register for pm notifications and cancel any pending work
> for a CPU that is actually going down, directly or by setting some flag or smth
> so that the final worker function is either not triggered or knows it has to
> bail out early ?
> (Note this is a rough idea and needs verification)
That could be a feasible workaround, but I prefer to rework the CPPC FIE
rather than try to fix it, i.e. FIE can run in ticks for non-PCC regs I
suppose.

> 
> ---
> BR
> Beata
>