On 7/4/2025 3:52 AM, Shrikanth Hegde wrote:
>
>
> On 6/18/25 23:58, Tim Chen wrote:
>> The load balancer attempts to identify the busiest sched_group with
>> the highest load and migrates some tasks to a less busy sched_group
>> to distribute the load across different CPUs.
>>
>> When cache-aware scheduling is enabled, the busiest sched_group is
>> defined as the one with the highest number of tasks preferring to run
>> on the destination LLC. If the busiest group has llc_balance tag,
>> the cache aware load balance will be launched.
>>
>> Introduce the helper function update_llc_busiest() to identify
>> such sched group with most tasks preferring the destination LLC.
>>
>> Co-developed-by: Chen Yu <yu.c.chen@intel.com>
>> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> ---
>> kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
>> 1 file changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 48a090c6e885..ab3d1239d6e4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
>> return false;
>> }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> + struct sg_lb_stats *busiest,
>> + struct sg_lb_stats *sgs)
>> +{
>> + int idx;
>> +
>> + /* Only the candidate with llc_balance need to be taken care of */
>> + if (!sgs->group_llc_balance)
>> + return false;
>> +
>> + /*
>> + * There are more tasks that want to run on dst_cpu's LLC.
>> + */
>> + idx = llc_idx(env->dst_cpu);
>> + return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
>> +}
>> #else
>> static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
>> struct sched_group *group)
>> {
>> return false;
>> }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> + struct sg_lb_stats *busiest,
>> + struct sg_lb_stats *sgs)
>> +{
>> + return false;
>> +}
>> #endif
>> static inline long sibling_imbalance(struct lb_env *env,
>> @@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>> sds->local_stat.group_type != group_has_spare))
>> return false;
>> + /* deal with prefer LLC load balance, if failed, fall into normal load balance */
>> + if (update_llc_busiest(env, busiest, sgs))
>> + return true;
>> +
>> + /* if there is already a busy group, skip the normal load balance */
>> + if (busiest->group_llc_balance)
>> + return false;
>> +
>
> If you had a group which was group_overloaded, it could also have
> group_llc_balance set, right?
Yes.
> In this case the priorities based on group_type are not followed, no?
>
Currently, group_llc_balance takes precedence over the normal group_type
ordering. Whether group_llc_balance is set is determined by
_get_migrate_hint(), and we have tried to set this flag conservatively so
that it does not disrupt normal load balancing.
For example, group_llc_balance is not set when both the destination
LLC and the source LLC surpass 50% of the average utilization.
group_overloaded, by contrast, has its threshold at about 85% utilization
(imbalance_pct=117, i.e. 100/117).
An overloaded source LLC is already well past the 50% mark, so when the
destination LLC is also above 50%, group_llc_balance is not set and
group_overloaded is honored.
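
To make the two thresholds concrete, here is a rough userspace sketch. It
is not the kernel code: the helper names and the 50% gate expressed against
capacity are assumptions made for illustration, and only the utilization
side of the overloaded check is modeled (capacity * 100 < util *
imbalance_pct).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants modeling the thresholds discussed above. */
#define LLC_GATE_PCT	50	/* assumed gate for setting group_llc_balance */
#define IMBALANCE_PCT	117	/* value quoted above, 100/117 is about 85% */

/* Utilization part of the overloaded test: util/capacity > 100/117. */
static bool util_overloaded(unsigned long util, unsigned long capacity)
{
	return capacity * 100 < util * IMBALANCE_PCT;
}

/* Assumed form of the 50% gate, expressed against LLC capacity. */
static bool above_llc_gate(unsigned long util, unsigned long capacity)
{
	return util * 100 > capacity * LLC_GATE_PCT;
}

/*
 * Assumed model of the hint: group_llc_balance is withheld when both
 * the source and the destination LLC are above the gate.
 */
static bool may_set_llc_balance(unsigned long src_util,
				unsigned long dst_util,
				unsigned long capacity)
{
	return !(above_llc_gate(src_util, capacity) &&
		 above_llc_gate(dst_util, capacity));
}

/* Small self-check of the model with capacity 1024. */
static void llc_gate_selfcheck(void)
{
	/* 860/1024 is about 84%: below the overloaded threshold. */
	assert(!util_overloaded(860, 1024));
	/* 880/1024 is about 86%: above the overloaded threshold. */
	assert(util_overloaded(880, 1024));
	/* Both LLCs above 50%: no group_llc_balance, group_type decides. */
	assert(!may_set_llc_balance(900, 600, 1024));
	/* Destination below 50%: the hint may still be set in this model. */
	assert(may_set_llc_balance(900, 300, 1024));
}
```

In this model the hint is only suppressed when the destination LLC is
also above the gate. With a lightly loaded destination it can still be
set for an overloaded source group.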
>> if (sgs->group_type > busiest->group_type)
>> return true;
>> @@ -11991,9 +12023,11 @@ static struct sched_group
>> *sched_balance_find_src_group(struct lb_env *env)
>> /*
>> * Try to move all excess tasks to a sibling domain of the busiest
>> * group's child domain.
>> + * Also do so if we can move some tasks that prefer the local LLC.
>> */
>> if (sds.prefer_sibling && local->group_type == group_has_spare &&
>> - sibling_imbalance(env, &sds, busiest, local) > 1)
>> + (busiest->group_llc_balance ||
>> + sibling_imbalance(env, &sds, busiest, local) > 1))
>> goto force_balance;
>> if (busiest->group_type != group_overloaded) {
>
> Also, this load balancing triggered by LLC preference could be very
> tricky to debug.
> Any stats added to schedstat or sched/debug?
OK, we can add some in the next version.
Thanks,
Chenyu