[PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3

Shivnandan Kumar posted 1 patch 1 year, 8 months ago
There is a newer version of this series
drivers/soc/qcom/icc-bwmon.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3
Posted by Shivnandan Kumar 1 year, 8 months ago
Update zone1_thres_count to 3 from 16 so that
driver can reduce bus vote in 3 sample windows instead
of waiting for 16 windows. This is in line with downstream
implementation.

Signed-off-by: Shivnandan Kumar <quic_kshivnan@quicinc.com>
---
 drivers/soc/qcom/icc-bwmon.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/soc/qcom/icc-bwmon.c b/drivers/soc/qcom/icc-bwmon.c
index 656706259353..f1065427bb80 100644
--- a/drivers/soc/qcom/icc-bwmon.c
+++ b/drivers/soc/qcom/icc-bwmon.c
@@ -815,7 +815,7 @@ static const struct icc_bwmon_data msm8998_bwmon_data = {
 static const struct icc_bwmon_data sdm845_cpu_bwmon_data = {
 	.sample_ms = 4,
 	.count_unit_kb = 64,
-	.zone1_thres_count = 16,
+	.zone1_thres_count = 3,
 	.zone3_thres_count = 1,
 	.quirks = BWMON_HAS_GLOBAL_IRQ,
 	.regmap_fields = sdm845_cpu_bwmon_reg_fields,
@@ -834,7 +834,7 @@ static const struct icc_bwmon_data sdm845_llcc_bwmon_data = {
 static const struct icc_bwmon_data sc7280_llcc_bwmon_data = {
 	.sample_ms = 4,
 	.count_unit_kb = 64,
-	.zone1_thres_count = 16,
+	.zone1_thres_count = 3,
 	.zone3_thres_count = 1,
 	.quirks = BWMON_NEEDS_FORCE_CLEAR,
 	.regmap_fields = sdm845_llcc_bwmon_reg_fields,
--
2.25.1
Re: [PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3
Posted by Krzysztof Kozlowski 1 year, 8 months ago
On 22/05/2024 10:15, Shivnandan Kumar wrote:
> Update zone1_thres_count to 3 from 16 so that
> driver can reduce bus vote in 3 sample windows instead
> of waiting for 16 windows. This is in line with downstream
> implementation.
> 

This might make bwmon quite jittery. I don't think downstream is the
source of truth here. Please provide some measurements *on mainline tree*.

Best regards,
Krzysztof
Re: [PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3
Posted by Shivnandan Kumar 1 year, 8 months ago

On 5/22/2024 1:58 PM, Krzysztof Kozlowski wrote:
> On 22/05/2024 10:15, Shivnandan Kumar wrote:
>> Update zone1_thres_count to 3 from 16 so that
>> driver can reduce bus vote in 3 sample windows instead
>> of waiting for 16 windows. This is in line with downstream
>> implementation.
>>
> 
> This might make bwmon quite jittery. I don't think downstream is the
> source of truth here. Please provide some measurements *on mainline tree*.
> 

Hi Krzysztof,

The 16-window (64 ms) waiting time is too long to reduce the bus vote.
At higher FPS, there will be multiple frames in 64ms e.g. 4 frames at 
60FPS in 64ms. Hence, delay of 64ms in decision making will lead to 
higher power regression. I’ve tested this change for 4K video playback 
on mainline tree, and there’s a significant power-saving.

I propose to make it a tunable,so that user space can tune it
based on runtime depending on fps.

USECASE                     zone1_thres_count=16     zone1_thres_count=3
4K video playback           236.15 mA	             203.15 mA

Thanks,
Shivnandan

> Best regards,
> Krzysztof
> 
Re: [PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3
Posted by Bjorn Andersson 1 year, 8 months ago
On Wed, May 22, 2024 at 02:35:21PM +0530, Shivnandan Kumar wrote:
> 
> 
> On 5/22/2024 1:58 PM, Krzysztof Kozlowski wrote:
> > On 22/05/2024 10:15, Shivnandan Kumar wrote:
> > > Update zone1_thres_count to 3 from 16 so that
> > > driver can reduce bus vote in 3 sample windows instead
> > > of waiting for 16 windows. This is in line with downstream
> > > implementation.
> > > 
> > 
> > This might make bwmon quite jittery. I don't think downstream is the
> > source of truth here. Please provide some measurements *on mainline tree*.
> > 
> 
> Hi Krzysztof,
> 
> The 16-window (64 ms) waiting time is too long to reduce the bus vote.
> At higher FPS, there will be multiple frames in 64ms e.g. 4 frames at 60FPS
> in 64ms. Hence, delay of 64ms in decision making will lead to higher power
> regression. I’ve tested this change for 4K video playback on mainline tree,
> and there’s a significant power-saving.
> 

As requested by Krzysztof already, update your commit message to capture
such motivation.

Please read and follow this:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes

> I propose to make it a tunable,so that user space can tune it
> based on runtime depending on fps.
> 

I presume that in e.g. Android there could be some sort of power HAL
that tweaks this value dynamically? In a general purpose system, how do
we make sure that this value stays relevant for multiple types of use
cases?

> USECASE                     zone1_thres_count=16     zone1_thres_count=3
> 4K video playback           236.15 mA	             203.15 mA

4k video playback is a fairly specific (and generally unusual) use case.
Is there any impact (negative or positive) for other use
cases/workloads?

Regards,
Bjorn

> 
> Thanks,
> Shivnandan
> 
> > Best regards,
> > Krzysztof
> > 
Re: [PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3
Posted by Pushpendra Singh 6 months ago
On 5/22/2024 10:27 PM, Bjorn Andersson wrote:
> On Wed, May 22, 2024 at 02:35:21PM +0530, Shivnandan Kumar wrote:
>>
>>
>> On 5/22/2024 1:58 PM, Krzysztof Kozlowski wrote:
>>> On 22/05/2024 10:15, Shivnandan Kumar wrote:
>>>> Update zone1_thres_count to 3 from 16 so that
>>>> driver can reduce bus vote in 3 sample windows instead
>>>> of waiting for 16 windows. This is in line with downstream
>>>> implementation.
>>>>
>>>
>>> This might make bwmon quite jittery. I don't think downstream is the
>>> source of truth here. Please provide some measurements *on mainline tree*.
>>>
>>
>> Hi Krzysztof,
>>
>> The 16-window (64 ms) waiting time is too long to reduce the bus vote.
>> At higher FPS, there will be multiple frames in 64ms e.g. 4 frames at 60FPS
>> in 64ms. Hence, delay of 64ms in decision making will lead to higher power
>> regression. I’ve tested this change for 4K video playback on mainline tree,
>> and there’s a significant power-saving.
>>
> 
> As requested by Krzysztof already, update your commit message to capture
> such motivation.
> 
> Please read and follow this:
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> 

Ack, will fix it in the next re-spin

>> I propose to make it a tunable,so that user space can tune it
>> based on runtime depending on fps.
>>
> 
> I presume that in e.g. Android there could be some sort of power HAL
> that tweaks this value dynamically? In a general purpose system, how do
> we make sure that this value stays relevant for multiple types of use
> cases?
> 

At this point, we’re keeping the value static and not exposing it as a tunable parameter.

>> USECASE                     zone1_thres_count=16     zone1_thres_count=3
>> 4K video playback           236.15 mA	             203.15 mA
> 
> 4k video playback is a fairly specific (and generally unusual) use case.
> Is there any impact (negative or positive) for other use
> cases/workloads?
> 

We ran additional use cases and observed significant power savings with the updated zone1_thres_count value. Below are the results, 
 
USECASE                     zone1_thres_count=16     zone1_thres_count=3
4K video playback           	236.15 mA                  203.15 mA
Sleep			    	7mA			   6.9mA
Display (idle display)      	71.95mA			   67.11mA

Regards,
Pushpendra Singh

> Regards,
> Bjorn
> 
>>
>> Thanks,
>> Shivnandan
>>
>>> Best regards,
>>> Krzysztof
>>>

Re: [PATCH] soc: qcom: icc-bwmon: Update zone1_thres_count to 3
Posted by Krzysztof Kozlowski 1 year, 8 months ago
On 22/05/2024 11:05, Shivnandan Kumar wrote:
> 
> 
> On 5/22/2024 1:58 PM, Krzysztof Kozlowski wrote:
>> On 22/05/2024 10:15, Shivnandan Kumar wrote:
>>> Update zone1_thres_count to 3 from 16 so that
>>> driver can reduce bus vote in 3 sample windows instead
>>> of waiting for 16 windows. This is in line with downstream
>>> implementation.
>>>
>>
>> This might make bwmon quite jittery. I don't think downstream is the
>> source of truth here. Please provide some measurements *on mainline tree*.
>>
> 
> Hi Krzysztof,
> 
> The 16-window (64 ms) waiting time is too long to reduce the bus vote.
> At higher FPS, there will be multiple frames in 64ms e.g. 4 frames at 
> 60FPS in 64ms. Hence, delay of 64ms in decision making will lead to 
> higher power regression. I’ve tested this change for 4K video playback 
> on mainline tree, and there’s a significant power-saving.

Please include it, with measurement below, in the commit msg.

> I propose to make it a tunable,so that user space can tune it
> based on runtime depending on fps.>
> USECASE                     zone1_thres_count=16     zone1_thres_count=3
> 4K video playback           236.15 mA	             203.15 mA
> 
> Thanks,
> Shivnandan
> 
>> Best regards,
>> Krzysztof
>>

Best regards,
Krzysztof