Cache aware scheduling

[RFC patch v3 14/20] sched: Introduce update_llc_busiest() to deal with groups having preferred LLC tasks

Posted by Tim Chen 3 months, 3 weeks ago

The load balancer attempts to identify the busiest sched_group with
the highest load and migrates some tasks to a less busy sched_group
to distribute the load across different CPUs.

When cache-aware scheduling is enabled, the busiest sched_group is
defined as the one with the highest number of tasks preferring to run
on the destination LLC. If the busiest group has the llc_balance tag,
cache-aware load balancing is launched.

Introduce the helper function update_llc_busiest() to identify
the sched_group with the most tasks preferring the destination LLC.

Co-developed-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 48a090c6e885..ab3d1239d6e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 
 	return false;
 }
+
+static bool update_llc_busiest(struct lb_env *env,
+			       struct sg_lb_stats *busiest,
+			       struct sg_lb_stats *sgs)
+{
+	int idx;
+
+	/* Only candidates tagged with llc_balance need to be considered */
+	if (!sgs->group_llc_balance)
+		return false;
+
+	/*
+	 * There are more tasks that want to run on dst_cpu's LLC.
+	 */
+	idx = llc_idx(env->dst_cpu);
+	return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
+}
 #else
 static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
 			       struct sched_group *group)
 {
 	return false;
 }
+
+static bool update_llc_busiest(struct lb_env *env,
+			       struct sg_lb_stats *busiest,
+			       struct sg_lb_stats *sgs)
+{
+	return false;
+}
 #endif
 
 static inline long sibling_imbalance(struct lb_env *env,
@@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	     sds->local_stat.group_type != group_has_spare))
 		return false;
 
+	/* Try preferred-LLC load balance first; if it fails, fall back to normal load balance */
+	if (update_llc_busiest(env, busiest, sgs))
+		return true;
+
+	/* If the current busiest group is already tagged llc_balance, skip the normal load balance */
+	if (busiest->group_llc_balance)
+		return false;
+
 	if (sgs->group_type > busiest->group_type)
 		return true;
 
@@ -11991,9 +12023,11 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
 	/*
 	 * Try to move all excess tasks to a sibling domain of the busiest
 	 * group's child domain.
+	 * Also do so if we can move some tasks that prefer the local LLC.
 	 */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
-	    sibling_imbalance(env, &sds, busiest, local) > 1)
+	    (busiest->group_llc_balance ||
+	    sibling_imbalance(env, &sds, busiest, local) > 1))
 		goto force_balance;
 
 	if (busiest->group_type != group_overloaded) {
-- 
2.32.0
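
The ordering this patch adds to update_sd_pick_busiest() can be modeled in userspace. The following is a minimal sketch, not kernel code: struct sg_stats, MAX_LLCS, the two-value group type enum, and pick_busiest() are hypothetical stand-ins, and only the first group_type comparison of the real function is kept. It is meant to show that a candidate tagged llc_balance wins on nr_pref_llc alone, and that once the busiest group carries the tag, the group_type comparison is never reached.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LLCS 4	/* hypothetical bound, for this sketch only */

/* Simplified stand-in for the kernel's enum group_type (higher is busier). */
enum { GROUP_HAS_SPARE = 0, GROUP_OVERLOADED = 1 };

/* Simplified stand-in for the kernel's struct sg_lb_stats. */
struct sg_stats {
	int group_type;
	bool group_llc_balance;
	unsigned int nr_pref_llc[MAX_LLCS];
};

/* Same decision as the patch's helper, with dst_llc passed in directly. */
static bool update_llc_busiest(int dst_llc,
			       const struct sg_stats *busiest,
			       const struct sg_stats *sgs)
{
	assert(dst_llc >= 0 && dst_llc < MAX_LLCS);

	/* Candidates without the llc_balance tag are left to the normal path. */
	if (!sgs->group_llc_balance)
		return false;

	/* Strictly more tasks in the candidate prefer the destination LLC. */
	return sgs->nr_pref_llc[dst_llc] > busiest->nr_pref_llc[dst_llc];
}

/* Returns true if sgs should replace busiest (first checks only). */
static bool pick_busiest(int dst_llc,
			 const struct sg_stats *busiest,
			 const struct sg_stats *sgs)
{
	if (update_llc_busiest(dst_llc, busiest, sgs))
		return true;

	/* A tagged busiest group shadows the group_type ordering below. */
	if (busiest->group_llc_balance)
		return false;

	return sgs->group_type > busiest->group_type;
}
```

In this model, an untagged GROUP_OVERLOADED candidate cannot displace a tagged GROUP_HAS_SPARE busiest group, which is the precedence question raised in the review that follows.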

Re: [RFC patch v3 14/20] sched: Introduce update_llc_busiest() to deal with groups having preferred LLC tasks

Posted by Shrikanth Hegde 3 months, 1 week ago


On 6/18/25 23:58, Tim Chen wrote:
> The load balancer attempts to identify the busiest sched_group with
> the highest load and migrates some tasks to a less busy sched_group
> to distribute the load across different CPUs.
> 
> When cache-aware scheduling is enabled, the busiest sched_group is
> defined as the one with the highest number of tasks preferring to run
> on the destination LLC. If the busiest group has llc_balance tag,
> the cache aware load balance will be launched.
> 
> Introduce the helper function update_llc_busiest() to identify
> such sched group with most tasks preferring the destination LLC.
> 
> Co-developed-by: Chen Yu <yu.c.chen@intel.com>
> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>   kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
>   1 file changed, 35 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 48a090c6e885..ab3d1239d6e4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
>   
>   	return false;
>   }
> +
> +static bool update_llc_busiest(struct lb_env *env,
> +			       struct sg_lb_stats *busiest,
> +			       struct sg_lb_stats *sgs)
> +{
> +	int idx;
> +
> +	/* Only the candidate with llc_balance need to be taken care of */
> +	if (!sgs->group_llc_balance)
> +		return false;
> +
> +	/*
> +	 * There are more tasks that want to run on dst_cpu's LLC.
> +	 */
> +	idx = llc_idx(env->dst_cpu);
> +	return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
> +}
>   #else
>   static inline bool llc_balance(struct lb_env *env, struct sg_lb_stats *sgs,
>   			       struct sched_group *group)
>   {
>   	return false;
>   }
> +
> +static bool update_llc_busiest(struct lb_env *env,
> +			       struct sg_lb_stats *busiest,
> +			       struct sg_lb_stats *sgs)
> +{
> +	return false;
> +}
>   #endif
>   
>   static inline long sibling_imbalance(struct lb_env *env,
> @@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
>   	     sds->local_stat.group_type != group_has_spare))
>   		return false;
>   
> +	/* deal with prefer LLC load balance, if failed, fall into normal load balance */
> +	if (update_llc_busiest(env, busiest, sgs))
> +		return true;
> +
> +	/* if there is already a busy group, skip the normal load balance */
> +	if (busiest->group_llc_balance)
> +		return false;
> +

A group that is group_overloaded could also have group_llc_balance set, right?
In that case, the priority ordering based on group_type is not followed, is it?

>   	if (sgs->group_type > busiest->group_type)
>   		return true;
>   
> @@ -11991,9 +12023,11 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
>   	/*
>   	 * Try to move all excess tasks to a sibling domain of the busiest
>   	 * group's child domain.
> +	 * Also do so if we can move some tasks that prefer the local LLC.
>   	 */
>   	if (sds.prefer_sibling && local->group_type == group_has_spare &&
> -	    sibling_imbalance(env, &sds, busiest, local) > 1)
> +	    (busiest->group_llc_balance ||
> +	    sibling_imbalance(env, &sds, busiest, local) > 1))
>   		goto force_balance;
>   
>   	if (busiest->group_type != group_overloaded) {

Also, load balancing triggered by LLC preference could be very tricky to debug.
Are any stats being added to schedstat or sched/debug?

Re: [RFC patch v3 14/20] sched: Introduce update_llc_busiest() to deal with groups having preferred LLC tasks

Posted by Chen, Yu C 3 months ago

On 7/4/2025 3:52 AM, Shrikanth Hegde wrote:
> 
> 
> On 6/18/25 23:58, Tim Chen wrote:
>> The load balancer attempts to identify the busiest sched_group with
>> the highest load and migrates some tasks to a less busy sched_group
>> to distribute the load across different CPUs.
>>
>> When cache-aware scheduling is enabled, the busiest sched_group is
>> defined as the one with the highest number of tasks preferring to run
>> on the destination LLC. If the busiest group has llc_balance tag,
>> the cache aware load balance will be launched.
>>
>> Introduce the helper function update_llc_busiest() to identify
>> such sched group with most tasks preferring the destination LLC.
>>
>> Co-developed-by: Chen Yu <yu.c.chen@intel.com>
>> Signed-off-by: Chen Yu <yu.c.chen@intel.com>
>> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
>> ---
>>   kernel/sched/fair.c | 36 +++++++++++++++++++++++++++++++++++-
>>   1 file changed, 35 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 48a090c6e885..ab3d1239d6e4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -10848,12 +10848,36 @@ static inline bool llc_balance(struct lb_env 
>> *env, struct sg_lb_stats *sgs,
>>       return false;
>>   }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> +                   struct sg_lb_stats *busiest,
>> +                   struct sg_lb_stats *sgs)
>> +{
>> +    int idx;
>> +
>> +    /* Only the candidate with llc_balance need to be taken care of */
>> +    if (!sgs->group_llc_balance)
>> +        return false;
>> +
>> +    /*
>> +     * There are more tasks that want to run on dst_cpu's LLC.
>> +     */
>> +    idx = llc_idx(env->dst_cpu);
>> +    return sgs->nr_pref_llc[idx] > busiest->nr_pref_llc[idx];
>> +}
>>   #else
>>   static inline bool llc_balance(struct lb_env *env, struct 
>> sg_lb_stats *sgs,
>>                      struct sched_group *group)
>>   {
>>       return false;
>>   }
>> +
>> +static bool update_llc_busiest(struct lb_env *env,
>> +                   struct sg_lb_stats *busiest,
>> +                   struct sg_lb_stats *sgs)
>> +{
>> +    return false;
>> +}
>>   #endif
>>   static inline long sibling_imbalance(struct lb_env *env,
>> @@ -11085,6 +11109,14 @@ static bool update_sd_pick_busiest(struct 
>> lb_env *env,
>>            sds->local_stat.group_type != group_has_spare))
>>           return false;
>> +    /* deal with prefer LLC load balance, if failed, fall into normal 
>> load balance */
>> +    if (update_llc_busiest(env, busiest, sgs))
>> +        return true;
>> +
>> +    /* if there is already a busy group, skip the normal load balance */
>> +    if (busiest->group_llc_balance)
>> +        return false;
>> +
> 
> If you had a group which was group_overloaded but it could have 
> group_llc_balance right?

Yes.

> In this case the priorities based on group_type is not followed no?
> 

Currently, group_llc_balance appears to take precedence over the
normal group_type. The setting of group_llc_balance is determined by
_get_migrate_hint(). We've made efforts to set this flag carefully to
avoid disrupting the normal load balancing.

For example, group_llc_balance won't be enabled when both the destination
LLC and the source LLC surpass 50% of the average utilization. As for
group_overloaded, its threshold is set at about 85% utilization
(imbalance_pct=117). So in this case, group_overloaded would be honored.
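
The 85% figure follows from imbalance_pct=117: utilization exceeds the threshold once util * 117 > capacity * 100, that is, above 100/117, or roughly 85.5% of capacity. A minimal sketch of that arithmetic is below. The function names are hypothetical, and the integer comparison is assumed to mirror the utilization check in the kernel's group_is_overloaded(), which applies other conditions as well.

```c
#include <assert.h>

/* Utilization, in percent of capacity, above which the check below fires. */
static double overloaded_threshold_pct(unsigned int imbalance_pct)
{
	assert(imbalance_pct > 0);
	return 100.0 * 100.0 / imbalance_pct;
}

/*
 * Integer form of the same test: capacity * 100 < util * imbalance_pct.
 * Assumed to match the kernel's utilization comparison; hypothetical name.
 */
static int util_exceeds_capacity(unsigned long capacity, unsigned long util,
				 unsigned int imbalance_pct)
{
	return capacity * 100 < util * imbalance_pct;
}
```

With a capacity of 1024 and imbalance_pct of 117, a utilization of 876 trips the check while 875 does not, and a group at 50% utilization (512) stays well below it.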

>>       if (sgs->group_type > busiest->group_type)
>>           return true;
>> @@ -11991,9 +12023,11 @@ static struct sched_group 
>> *sched_balance_find_src_group(struct lb_env *env)
>>       /*
>>        * Try to move all excess tasks to a sibling domain of the busiest
>>        * group's child domain.
>> +     * Also do so if we can move some tasks that prefer the local LLC.
>>        */
>>       if (sds.prefer_sibling && local->group_type == group_has_spare &&
>> -        sibling_imbalance(env, &sds, busiest, local) > 1)
>> +        (busiest->group_llc_balance ||
>> +        sibling_imbalance(env, &sds, busiest, local) > 1))
>>           goto force_balance;
>>       if (busiest->group_type != group_overloaded) {
> 
> Also, This load balancing happening due to llc could be very tricky to 
> debug.
> Any stats added to schedstat or sched/debug?

OK, we can add some in the next version.

Thanks,
Chenyu