[PATCH] sched/pelt: Use rq_clock_task() for hw_pressure

Chen Yu posted 1 patch 1 year, 4 months ago
There is a newer version of this series
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[PATCH] sched/pelt: Use rq_clock_task() for hw_pressure
Posted by Chen Yu 1 year, 4 months ago
commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
removed the decay_shift for hw_pressure. While looking at a related
bug report, it is found that this commit uses the sched_clock_task()
in sched_tick() while replaces the sched_clock_task() with rq_clock_pelt()
in __update_blocked_others(). This could bring inconsistence. One possible
scenario I can think of is in ___update_load_sum():

u64 delta = now - sa->last_update_time

'now' could be calculated by rq_clock_pelt() from
__update_blocked_others(), and last_update_time was calculated by
rq_clock_task() previously from sched_tick(). Usually the former chases
after the latter, it cause a very large 'delta' and brings unexpected
behavior. Although this should not impact x86 platform in the bug report,
it should be fixed for other platforms.

Fixes: 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202407091527.bb0be229-lkp@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584ec06d..cfd4755954fd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9362,7 +9362,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
 
 	decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
 		  update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
-		  update_hw_load_avg(now, rq, hw_pressure) |
+		  update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
 		  update_irq_load_avg(rq, 0);
 
 	if (others_have_blocked(rq))
-- 
2.25.1
Re: [PATCH] sched/pelt: Use rq_clock_task() for hw_pressure
Posted by Hongyan Xia 1 year, 4 months ago
On 25/07/2024 12:42, Chen Yu wrote:
> commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> removed the decay_shift for hw_pressure. While looking at a related
> bug report, it is found that this commit uses the sched_clock_task()
> in sched_tick() while replaces the sched_clock_task() with rq_clock_pelt()
> in __update_blocked_others(). This could bring inconsistence. One possible
> scenario I can think of is in ___update_load_sum():
> 
> u64 delta = now - sa->last_update_time
> 
> 'now' could be calculated by rq_clock_pelt() from
> __update_blocked_others(), and last_update_time was calculated by
> rq_clock_task() previously from sched_tick(). Usually the former chases
> after the latter, it cause a very large 'delta' and brings unexpected
> behavior. Although this should not impact x86 platform in the bug report,
> it should be fixed for other platforms.

I agree with this patch but I'm a bit confused here. May I know what you 
mean by 'should not impact x86 platform in the bug report'? But it 
closes a bug report on qemu x86_64, so it does have an impact?

> 
> Fixes: 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202407091527.bb0be229-lkp@intel.com
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
>   kernel/sched/fair.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9057584ec06d..cfd4755954fd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9362,7 +9362,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
>   
>   	decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
>   		  update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
> -		  update_hw_load_avg(now, rq, hw_pressure) |
> +		  update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
>   		  update_irq_load_avg(rq, 0);
>   
>   	if (others_have_blocked(rq))

Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Re: [PATCH] sched/pelt: Use rq_clock_task() for hw_pressure
Posted by Chen Yu 1 year, 4 months ago
Hi Hongyan,

On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
> On 25/07/2024 12:42, Chen Yu wrote:
> > commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> > removed the decay_shift for hw_pressure. While looking at a related
> > bug report, it is found that this commit uses the sched_clock_task()
> > in sched_tick() while replaces the sched_clock_task() with rq_clock_pelt()
> > in __update_blocked_others(). This could bring inconsistence. One possible
> > scenario I can think of is in ___update_load_sum():
> > 
> > u64 delta = now - sa->last_update_time
> > 
> > 'now' could be calculated by rq_clock_pelt() from
> > __update_blocked_others(), and last_update_time was calculated by
> > rq_clock_task() previously from sched_tick(). Usually the former chases
> > after the latter, it cause a very large 'delta' and brings unexpected
> > behavior. Although this should not impact x86 platform in the bug report,
> > it should be fixed for other platforms.
> 
> I agree with this patch but I'm a bit confused here. May I know what you
> mean by 'should not impact x86 platform in the bug report'? But it closes a
> bug report on qemu x86_64, so it does have an impact?
>

It should not have any impact on x86_64. I added the bug link here because I checked
the code while looking at that report. But that report might be false positve,
or at least not caused by this logic introduced by this commit, because
CONFIG_SCHED_HW_PRESSURE was not even set in the kernel config[1]. Maybe I should
remove the 'reported-by' and 'closes' tags?

[1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com/config-6.9.0-rc1-00051-g97450eb90965

> > -		  update_hw_load_avg(now, rq, hw_pressure) |
> > +		  update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
> >   		  update_irq_load_avg(rq, 0);
> >   	if (others_have_blocked(rq))
> 
> Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>

Thanks!

Best,
Chenyu
Re: [PATCH] sched/pelt: Use rq_clock_task() for hw_pressure
Posted by Hongyan Xia 1 year, 4 months ago
On 25/07/2024 15:00, Chen Yu wrote:
> Hi Hongyan,
> 
> On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
>> On 25/07/2024 12:42, Chen Yu wrote:
>>> commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
>>> removed the decay_shift for hw_pressure. While looking at a related
>>> bug report, it is found that this commit uses the sched_clock_task()
>>> in sched_tick() while replaces the sched_clock_task() with rq_clock_pelt()
>>> in __update_blocked_others(). This could bring inconsistence. One possible
>>> scenario I can think of is in ___update_load_sum():
>>>
>>> u64 delta = now - sa->last_update_time
>>>
>>> 'now' could be calculated by rq_clock_pelt() from
>>> __update_blocked_others(), and last_update_time was calculated by
>>> rq_clock_task() previously from sched_tick(). Usually the former chases
>>> after the latter, it cause a very large 'delta' and brings unexpected
>>> behavior. Although this should not impact x86 platform in the bug report,
>>> it should be fixed for other platforms.
>>
>> I agree with this patch but I'm a bit confused here. May I know what you
>> mean by 'should not impact x86 platform in the bug report'? But it closes a
>> bug report on qemu x86_64, so it does have an impact?
>>
> 
> It should not have any impact on x86_64. I added the bug link here because I checked
> the code while looking at that report. But that report might be false positve,
> or at least not caused by this logic introduced by this commit, because
> CONFIG_SCHED_HW_PRESSURE was not even set in the kernel config[1]. Maybe I should
> remove the 'reported-by' and 'closes' tags?
> 
> [1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com/config-6.9.0-rc1-00051-g97450eb90965
> 

Yeah, it might be a good idea to remove the link to avoid confusion, 
like you said HW pressure is not compiled in.

Even if there is pressure support, before your patch the big 'delta' 
should only result in a HW pressure value that decays more than it 
should, and should not be able to block tasks like in that bug report, 
so it's very likely that it's unrelated.
Re: [PATCH] sched/pelt: Use rq_clock_task() for hw_pressure
Posted by Chen Yu 1 year, 4 months ago
On 2024-07-25 at 15:18:50 +0100, Hongyan Xia wrote:
> On 25/07/2024 15:00, Chen Yu wrote:
> > Hi Hongyan,
> > 
> > On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
> > > On 25/07/2024 12:42, Chen Yu wrote:
> > > > commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> > > > removed the decay_shift for hw_pressure. While looking at a related
> > > > bug report, it is found that this commit uses the sched_clock_task()
> > > > in sched_tick() while replaces the sched_clock_task() with rq_clock_pelt()
> > > > in __update_blocked_others(). This could bring inconsistence. One possible
> > > > scenario I can think of is in ___update_load_sum():
> > > > 
> > > > u64 delta = now - sa->last_update_time
> > > > 
> > > > 'now' could be calculated by rq_clock_pelt() from
> > > > __update_blocked_others(), and last_update_time was calculated by
> > > > rq_clock_task() previously from sched_tick(). Usually the former chases
> > > > after the latter, it cause a very large 'delta' and brings unexpected
> > > > behavior. Although this should not impact x86 platform in the bug report,
> > > > it should be fixed for other platforms.
> > > 
> > > I agree with this patch but I'm a bit confused here. May I know what you
> > > mean by 'should not impact x86 platform in the bug report'? But it closes a
> > > bug report on qemu x86_64, so it does have an impact?
> > > 
> > 
> > It should not have any impact on x86_64. I added the bug link here because I checked
> > the code while looking at that report. But that report might be false positve,
> > or at least not caused by this logic introduced by this commit, because
> > CONFIG_SCHED_HW_PRESSURE was not even set in the kernel config[1]. Maybe I should
> > remove the 'reported-by' and 'closes' tags?
> > 
> > [1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com/config-6.9.0-rc1-00051-g97450eb90965
> > 
> 
> Yeah, it might be a good idea to remove the link to avoid confusion, like
> you said HW pressure is not compiled in.
>

OK, will do and send a new version.
 
> Even if there is pressure support, before your patch the big 'delta' should
> only result in a HW pressure value that decays more than it should, and
> should not be able to block tasks like in that bug report, so it's very
> likely that it's unrelated.

Yes, for x86_64, the rq->avg_hw.load_avg is even 0 as it does not have a chance to
get accumulated.

thanks,
Chenyu