commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
removed the decay_shift for hw_pressure. While looking at a related
bug report, it was found that this commit uses sched_clock_task()
in sched_tick() but replaces sched_clock_task() with rq_clock_pelt()
in __update_blocked_others(). This could bring inconsistency. One
possible scenario I can think of is in ___update_load_sum():

	u64 delta = now - sa->last_update_time

'now' could be calculated by rq_clock_pelt() from
__update_blocked_others(), while last_update_time was calculated by
rq_clock_task() previously from sched_tick(). Since the former usually
lags behind the latter, 'delta' underflows to a very large value and
brings unexpected behavior. Although this should not impact the x86
platform in the bug report, it should be fixed for other platforms.
Fixes: 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202407091527.bb0be229-lkp@intel.com
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584ec06d..cfd4755954fd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9362,7 +9362,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
- update_hw_load_avg(now, rq, hw_pressure) |
+ update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
update_irq_load_avg(rq, 0);
if (others_have_blocked(rq))
--
2.25.1
On 25/07/2024 12:42, Chen Yu wrote:
> commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> removed the decay_shift for hw_pressure. While looking at a related
> bug report, it was found that this commit uses sched_clock_task()
> in sched_tick() but replaces sched_clock_task() with rq_clock_pelt()
> in __update_blocked_others(). This could bring inconsistency. One
> possible scenario I can think of is in ___update_load_sum():
> 
> 	u64 delta = now - sa->last_update_time
> 
> 'now' could be calculated by rq_clock_pelt() from
> __update_blocked_others(), while last_update_time was calculated by
> rq_clock_task() previously from sched_tick(). Since the former usually
> lags behind the latter, 'delta' underflows to a very large value and
> brings unexpected behavior. Although this should not impact the x86
> platform in the bug report, it should be fixed for other platforms.
I agree with this patch but I'm a bit confused here. May I know what you
mean by 'should not impact x86 platform in the bug report'? But it
closes a bug report on qemu x86_64, so it does have an impact?
>
> Fixes: 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202407091527.bb0be229-lkp@intel.com
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9057584ec06d..cfd4755954fd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9362,7 +9362,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
>
> decayed = update_rt_rq_load_avg(now, rq, curr_class == &rt_sched_class) |
> update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
> - update_hw_load_avg(now, rq, hw_pressure) |
> + update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
> update_irq_load_avg(rq, 0);
>
> if (others_have_blocked(rq))
Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Hi Hongyan,
On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
> On 25/07/2024 12:42, Chen Yu wrote:
> > commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> > removed the decay_shift for hw_pressure. While looking at a related
> > bug report, it was found that this commit uses sched_clock_task()
> > in sched_tick() but replaces sched_clock_task() with rq_clock_pelt()
> > in __update_blocked_others(). This could bring inconsistency. One
> > possible scenario I can think of is in ___update_load_sum():
> >
> > 	u64 delta = now - sa->last_update_time
> >
> > 'now' could be calculated by rq_clock_pelt() from
> > __update_blocked_others(), while last_update_time was calculated by
> > rq_clock_task() previously from sched_tick(). Since the former usually
> > lags behind the latter, 'delta' underflows to a very large value and
> > brings unexpected behavior. Although this should not impact the x86
> > platform in the bug report, it should be fixed for other platforms.
>
> I agree with this patch but I'm a bit confused here. May I know what you
> mean by 'should not impact x86 platform in the bug report'? But it closes a
> bug report on qemu x86_64, so it does have an impact?
>
It should not have any impact on x86_64. I added the bug link here because I
checked the code while looking at that report. But that report might be a
false positive, or at least not caused by the logic introduced by this commit,
because CONFIG_SCHED_HW_PRESSURE was not even set in the kernel config[1].
Maybe I should remove the 'Reported-by' and 'Closes' tags?
[1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com/config-6.9.0-rc1-00051-g97450eb90965
> > - update_hw_load_avg(now, rq, hw_pressure) |
> > + update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
> > update_irq_load_avg(rq, 0);
> > if (others_have_blocked(rq))
>
> Reviewed-by: Hongyan Xia <hongyan.xia2@arm.com>
Thanks!
Best,
Chenyu
On 25/07/2024 15:00, Chen Yu wrote:
> Hi Hongyan,
>
> On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
>> On 25/07/2024 12:42, Chen Yu wrote:
>>> commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
>>> removed the decay_shift for hw_pressure. While looking at a related
>>> bug report, it was found that this commit uses sched_clock_task()
>>> in sched_tick() but replaces sched_clock_task() with rq_clock_pelt()
>>> in __update_blocked_others(). This could bring inconsistency. One
>>> possible scenario I can think of is in ___update_load_sum():
>>>
>>> 	u64 delta = now - sa->last_update_time
>>>
>>> 'now' could be calculated by rq_clock_pelt() from
>>> __update_blocked_others(), while last_update_time was calculated by
>>> rq_clock_task() previously from sched_tick(). Since the former usually
>>> lags behind the latter, 'delta' underflows to a very large value and
>>> brings unexpected behavior. Although this should not impact the x86
>>> platform in the bug report, it should be fixed for other platforms.
>>
>> I agree with this patch but I'm a bit confused here. May I know what you
>> mean by 'should not impact x86 platform in the bug report'? But it closes a
>> bug report on qemu x86_64, so it does have an impact?
>>
>
> It should not have any impact on x86_64. I added the bug link here because I
> checked the code while looking at that report. But that report might be a
> false positive, or at least not caused by the logic introduced by this
> commit, because CONFIG_SCHED_HW_PRESSURE was not even set in the kernel
> config[1]. Maybe I should remove the 'Reported-by' and 'Closes' tags?
>
> [1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com/config-6.9.0-rc1-00051-g97450eb90965
>
Yeah, it might be a good idea to remove the link to avoid confusion since,
like you said, HW pressure is not compiled in.
Even if there were pressure support, before your patch the big 'delta'
should only result in a HW pressure value that decays more than it
should; it should not be able to block tasks like in that bug report,
so it's very likely unrelated.
On 2024-07-25 at 15:18:50 +0100, Hongyan Xia wrote:
> On 25/07/2024 15:00, Chen Yu wrote:
> > Hi Hongyan,
> >
> > On 2024-07-25 at 14:16:30 +0100, Hongyan Xia wrote:
> > > On 25/07/2024 12:42, Chen Yu wrote:
> > > > commit 97450eb90965 ("sched/pelt: Remove shift of thermal clock")
> > > > removed the decay_shift for hw_pressure. While looking at a related
> > > > bug report, it was found that this commit uses sched_clock_task()
> > > > in sched_tick() but replaces sched_clock_task() with rq_clock_pelt()
> > > > in __update_blocked_others(). This could bring inconsistency. One
> > > > possible scenario I can think of is in ___update_load_sum():
> > > >
> > > > 	u64 delta = now - sa->last_update_time
> > > >
> > > > 'now' could be calculated by rq_clock_pelt() from
> > > > __update_blocked_others(), while last_update_time was calculated by
> > > > rq_clock_task() previously from sched_tick(). Since the former usually
> > > > lags behind the latter, 'delta' underflows to a very large value and
> > > > brings unexpected behavior. Although this should not impact the x86
> > > > platform in the bug report, it should be fixed for other platforms.
> > >
> > > I agree with this patch but I'm a bit confused here. May I know what you
> > > mean by 'should not impact x86 platform in the bug report'? But it closes a
> > > bug report on qemu x86_64, so it does have an impact?
> > >
> >
> > It should not have any impact on x86_64. I added the bug link here because I
> > checked the code while looking at that report. But that report might be a
> > false positive, or at least not caused by the logic introduced by this
> > commit, because CONFIG_SCHED_HW_PRESSURE was not even set in the kernel
> > config[1]. Maybe I should remove the 'Reported-by' and 'Closes' tags?
> >
> > [1] https://download.01.org/0day-ci/archive/20240709/202407091527.bb0be229-lkp@intel.com/config-6.9.0-rc1-00051-g97450eb90965
> >
>
> Yeah, it might be a good idea to remove the link to avoid confusion since,
> like you said, HW pressure is not compiled in.
>
OK, will do and send a new version.
> Even if there were pressure support, before your patch the big 'delta'
> should only result in a HW pressure value that decays more than it
> should; it should not be able to block tasks like in that bug report,
> so it's very likely unrelated.
Yes, on x86_64 the rq->avg_hw.load_avg is always 0, as it never gets a
chance to accumulate.
thanks,
Chenyu