Misfit load balance was added to help handle HMP systems, where we can make a wrong decision at wake up thinking a task can run on a smaller core, but its characteristics then change and it needs to migrate to a bigger core to meet its performance demands.

With the addition of uclamp, we can encounter more cases where such wrong placement decisions are made and require the load balancer to take corrective action.

Specifically, if a big task capped by uclamp_max was placed on a big core at wake up because EAS thought it was the most energy efficient core at the time, the dynamics of the system might change: other uncapped tasks might wake up on the cluster, and there could then be a new, more energy efficient placement for the capped task(s).

We can generalize the misfit load balance to handle different types of misfit (whatever they may be) by simply giving it a reason. The reason can then decide the type of action required.

The current misfit implementation is considered MISFIT_PERF, which means we need to move a task to a better CPU to meet its performance requirement.

For UCLAMP_MAX I propose MISFIT_POWER, where we need to find a better placement to control its impact on power.

Once we have an API to annotate latency sensitive tasks, it is anticipated that a MISFIT_LATENCY load balance will be required to handle oversubscribed situations by better distributing the latency sensitive tasks to reduce their wake up latency.

Patch 1 splits misfit status update from misfit detection by adding a new function is_misfit_task().

Patch 2 implements the generalization logic by adding a misfit reason, propagating it correctly, and guarding the current misfit code with the MISFIT_PERF reason.

Patch 3 is an RFC on a potential implementation for MISFIT_POWER.

Patches 1 and 2 were tested standalone with no regression observed. They should not introduce a functional change and can be considered for merge, if they make sense, after addressing any review comments.

Patch 3 was only tested to verify it does what I expected it to do; no real power/perf testing was done. This is mainly because I was expecting to remove uclamp max-aggregation [1], and the RFC I currently have (which I wrote many many months ago) is tied to detecting a task being uncapped by max-aggregation. I need to rethink the detection mechanism.

Besides that, the logic relies on using find_energy_efficient_cpu() to find the best potential new placement for the task. To do that, though, we need to force every CPU to do the MISFIT_POWER load balance, as we don't know which CPU should do the pull. But there might be better thoughts on how to handle this, so feedback and thoughts would be appreciated.

[1] https://lore.kernel.org/lkml/20231208015242.385103-1-qyousef@layalina.io/

Thanks!

--
Qais Yousef

Qais Yousef (3):
  sched/fair: Add is_misfit_task() function
  sched/fair: Generalize misfit lb by adding a misfit reason
  sched/fair: Implement new type of misfit MISFIT_POWER

 kernel/sched/fair.c  | 115 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/sched.h |   9 ++++
 2 files changed, 110 insertions(+), 14 deletions(-)

--
2.34.1
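To make the "misfit with a reason" idea concrete, here is a minimal userspace sketch. It is not the code from the patches, which are not shown in this thread. Only the names MISFIT_PERF and MISFIT_POWER come from the cover letter; MISFIT_NONE, struct toy_task, struct toy_cpu, toy_misfit_reason() and toy_misfit_action() are hypothetical, and both detection conditions are simplifying assumptions. The performance check assumes the roughly 20% headroom rule (1280/1024) that the scheduler's fits_capacity() uses, and the power check assumes "uncapped by max-aggregation" can be modeled as the runqueue's aggregated uclamp_max being higher than the task's own cap.

```c
#include <assert.h>

/* Reason names follow the cover letter; MISFIT_NONE is an assumed placeholder. */
enum misfit_reason { MISFIT_NONE = 0, MISFIT_PERF, MISFIT_POWER };

/* Hypothetical, heavily simplified stand-ins for a task and a CPU runqueue. */
struct toy_task {
	unsigned long util;        /* estimated utilization, 0..1024 */
	unsigned long uclamp_max;  /* the task's own cap, 0..1024 */
};

struct toy_cpu {
	unsigned long capacity;      /* CPU capacity, 0..1024 */
	unsigned long rq_uclamp_max; /* max-aggregated cap of runnable tasks */
};

/* Classify why a task might be misplaced on this CPU (assumed conditions). */
static enum misfit_reason toy_misfit_reason(const struct toy_task *p,
					    const struct toy_cpu *cpu)
{
	unsigned long clamped = p->util < p->uclamp_max ? p->util : p->uclamp_max;

	assert(cpu->capacity > 0);

	/* Performance misfit: clamped demand does not fit with ~20% headroom. */
	if (clamped * 1280 > cpu->capacity * 1024)
		return MISFIT_PERF;

	/* Power misfit: the task wants more than its cap, but the runqueue's
	 * aggregated cap is higher, so the task's cap is not being honoured. */
	if (p->util > p->uclamp_max && cpu->rq_uclamp_max > p->uclamp_max)
		return MISFIT_POWER;

	return MISFIT_NONE;
}

/* The reason then selects the corrective action. */
static const char *toy_misfit_action(enum misfit_reason reason)
{
	switch (reason) {
	case MISFIT_PERF:
		return "pull the task to a CPU with more capacity";
	case MISFIT_POWER:
		return "re-evaluate placement for energy efficiency";
	default:
		return "nothing to do";
	}
}
```

For example, under these assumptions a task with util 800 and uclamp_max 100 on a capacity-1024 CPU is classified as MISFIT_POWER once another runnable task raises the runqueue's aggregated uclamp_max to 1024, and as MISFIT_NONE while the aggregated cap stays at 100.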
Hello Qais,

On 12/9/23 02:17, Qais Yousef wrote:
[...]
> Patch 3 was only tested to verify it does what I expected it to do. But no real
> power/perf testing was done. Mainly because I was expecting to remove uclamp
> max-aggregation [1] and the RFC I currently have (which I wrote many many
> months ago) is tied to detecting a task being uncapped by max-aggregation.
> I need to rethink the detection mechanism.

I tried to trigger the MISFIT_POWER misfit reason without success so far.
Would it be possible to provide a workload/test to reliably trigger the
condition?

Regards,
Pierre
On 12/21/23 16:26, Pierre Gondois wrote:
[...]
> I tried to trigger the MISFIT_POWER misfit reason without success so far.
> Would it be possible to provide a workload/test to reliably trigger the
> condition ?

I spawn a busy loop like

	cat /dev/zero > /dev/null

Then use

	uclampset -M 0 -p $PID

to change uclamp_max between 0 and 1024, back and forth. Try to load the system with some workload and you should see something like the attached picture.

Red boxes are periods where uclamp_max is 0. The rest is for uclamp_max = 1024. Note how the task is constantly moved between CPUs when capped.

Cheers

--
Qais Yousef