[PATCH v2] sched/fair: Prefer cache locality for EAS wakeup
Posted by Shubhang Kaushik via B4 Relay 3 months, 1 week ago
From: Shubhang Kaushik <shubhang@os.amperecomputing.com>

When Energy Aware Scheduling (EAS) is enabled, a task waking up on a
sibling CPU might migrate away from its previous CPU even if that CPU
is not overutilized. This sacrifices cache locality and introduces
unnecessary migration overhead.

This patch refines the wakeup heuristic in `select_idle_sibling()`. If
EAS is active and the task's previous CPU (`prev`) is not overutilized,
the scheduler will prioritize waking the task on `prev`, avoiding an
unneeded migration and preserving cache-hotness.

---
v2:
- Addressed reviewer comments by handling this special case within
  the selection logic: when EAS is enabled, prefer the previous CPU
  if it is not overutilized.
- Link to v1: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/

Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
---
 kernel/sched/fair.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 25970dbbb27959bc130d288d5f80677f75f8db8b..ac94463627778f09522fb5420f67b903a694ad4d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7847,9 +7847,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
 
-	/*
-	 * If the previous CPU is cache affine and idle, don't be stupid:
-	 */
+	/* Reschedule on an idle, cache-sharing sibling to preserve affinity: */
 	if (prev != target && cpus_share_cache(prev, target) &&
 	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
 	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
@@ -7861,6 +7859,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 		prev_aff = prev;
 	}
 
+	/*
+	 * If the previous CPU is not overutilized, prefer it for cache locality.
+	 * This prevents migration away from a cache-hot CPU that can still
+	 * handle the task without causing an overload.
+	 */
+	if (sched_energy_enabled() && !cpu_overutilized(prev))
+		return prev;
+
 	/*
 	 * Allow a per-cpu kthread to stack with the wakee if the
 	 * kworker thread and the tasks previous CPUs are the same.

---
base-commit: e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6
change-id: 20251030-b4-follow-up-ff03b4533a2d

Best regards,
-- 
Shubhang Kaushik <shubhang@os.amperecomputing.com>
Re: [PATCH v2] sched/fair: Prefer cache locality for EAS wakeup
Posted by Vincent Guittot 3 months, 1 week ago
On Thu, 30 Oct 2025 at 20:19, Shubhang Kaushik via B4 Relay
<devnull+shubhang.os.amperecomputing.com@kernel.org> wrote:
>
> From: Shubhang Kaushik <shubhang@os.amperecomputing.com>
>
> When Energy Aware Scheduling (EAS) is enabled, a task waking up on a
> sibling CPU might migrate away from its previous CPU even if that CPU
> is not overutilized. This sacrifices cache locality and introduces
> unnecessary migration overhead.
>
> This patch refines the wakeup heuristic in `select_idle_sibling()`. If
> EAS is active and the task's previous CPU (`prev`) is not overutilized,
> the scheduler will prioritize waking the task on `prev`, avoiding an
> unneeded migration and preserving cache-hotness.
>
> ---
> v2:
> - Addressed reviewer comments by handling this special case within
>   the selection logic: when EAS is enabled, prefer the previous CPU
>   if it is not overutilized.
> - Link to v1: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
>
> Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
> ---
>  kernel/sched/fair.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 25970dbbb27959bc130d288d5f80677f75f8db8b..ac94463627778f09522fb5420f67b903a694ad4d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7847,9 +7847,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>             asym_fits_cpu(task_util, util_min, util_max, target))
>                 return target;
>
> -       /*
> -        * If the previous CPU is cache affine and idle, don't be stupid:
> -        */
> +       /* Reschedule on an idle, cache-sharing sibling to preserve affinity: */
>         if (prev != target && cpus_share_cache(prev, target) &&
>             (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
>             asym_fits_cpu(task_util, util_min, util_max, prev)) {
> @@ -7861,6 +7859,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>                 prev_aff = prev;
>         }
>
> +       /*
> +        * If the previous CPU is not overutilized, prefer it for cache locality.
> +        * This prevents migration away from a cache-hot CPU that can still
> +        * handle the task without causing an overload.
> +        */
> +       if (sched_energy_enabled() && !cpu_overutilized(prev))

From your previous answer on v1, I don't think that you use a
heterogeneous system, so EAS will not be enabled in your case. And
even when EAS is in use, find_energy_efficient_cpu() will be called
first.
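
(For context, the wakeup path in select_task_rq_fair() looks roughly
like this -- a simplified sketch, not the exact upstream code:)

	if (sched_energy_enabled()) {
		/* EAS placement is tried first on every wakeup */
		new_cpu = find_energy_efficient_cpu(p, prev_cpu);
		if (new_cpu >= 0)
			return new_cpu;
		/* feec() bailed out (e.g. overutilized): fall back */
		new_cpu = prev_cpu;
	}
	...
	/* only afterwards does the fast path run: */
	new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);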

select_idle_sibling() looks for an idle CPU that shares the cache
with target. Isn't such a migration inside the same LLC good in your
case?

Otherwise you might want to check wake_affine(), where we decide
whether the local CPU or the previous CPU should be the target. This
can have an impact, especially if they are not in the same LLC.
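
(For reference, the shape of that decision -- simplified from
kernel/sched/fair.c, schedstat accounting elided:)

	static int wake_affine(struct sched_domain *sd, struct task_struct *p,
			       int this_cpu, int prev_cpu, int sync)
	{
		int target = nr_cpumask_bits;

		if (sched_feat(WA_IDLE))
			target = wake_affine_idle(this_cpu, prev_cpu, sync);

		if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
			target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

		/* fall back to prev_cpu unless this_cpu was chosen */
		return (target == this_cpu) ? this_cpu : prev_cpu;
	}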

> +               return prev;
> +
>         /*
>          * Allow a per-cpu kthread to stack with the wakee if the
>          * kworker thread and the tasks previous CPUs are the same.
>
> ---
> base-commit: e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6
> change-id: 20251030-b4-follow-up-ff03b4533a2d
>
> Best regards,
> --
> Shubhang Kaushik <shubhang@os.amperecomputing.com>
>
>
Re: [PATCH v2] sched/fair: Prefer cache locality for EAS wakeup
Posted by Madadi Vineeth Reddy 3 months, 1 week ago
Hi Shubhang,

On 31/10/25 00:49, Shubhang Kaushik wrote:
> When Energy Aware Scheduling (EAS) is enabled, a task waking up on a
> sibling CPU might migrate away from its previous CPU even if that CPU
> is not overutilized. This sacrifices cache locality and introduces
> unnecessary migration overhead.
> 
> This patch refines the wakeup heuristic in `select_idle_sibling()`. If
> EAS is active and the task's previous CPU (`prev`) is not overutilized,
> the scheduler will prioritize waking the task on `prev`, avoiding an
> unneeded migration and preserving cache-hotness.
> 
> ---
> v2:
> - Addressed reviewer comments by handling this special case within
>   the selection logic: when EAS is enabled, prefer the previous CPU
>   if it is not overutilized.
> - Link to v1: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
> 
> Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
> ---
>  kernel/sched/fair.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 25970dbbb27959bc130d288d5f80677f75f8db8b..ac94463627778f09522fb5420f67b903a694ad4d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7847,9 +7847,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  	    asym_fits_cpu(task_util, util_min, util_max, target))
>  		return target;
>  
> -	/*
> -	 * If the previous CPU is cache affine and idle, don't be stupid:
> -	 */
> +	/* Reschedule on an idle, cache-sharing sibling to preserve affinity: */
>  	if (prev != target && cpus_share_cache(prev, target) &&
>  	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
>  	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
> @@ -7861,6 +7859,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  		prev_aff = prev;
>  	}
>  
> +	/*
> +	 * If the previous CPU is not overutilized, prefer it for cache locality.
> +	 * This prevents migration away from a cache-hot CPU that can still
> +	 * handle the task without causing an overload.
> +	 */
> +	if (sched_energy_enabled() && !cpu_overutilized(prev))
> +		return prev;
> +

The above !cpu_overutilized(prev) check is placed before the
recent_used_cpu idle check. This means that if prev is busy (but not
overutilized) while recent_used_cpu is completely idle, the task
returns to prev and misses the idle opportunity.
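
(To illustrate the ordering -- pseudocode with abbreviated helper
names, not the literal source:)

	if (idle(target))					/* 1 */
		return target;
	if (idle(prev) && cpus_share_cache(prev, target))	/* 2 */
		return prev;	/* or record prev_aff */
	if (sched_energy_enabled() && !cpu_overutilized(prev))	/* 3: new */
		return prev;	/* prev may be busy! */
	if (idle(recent_used_cpu))		/* 4: unreachable whenever */
		return recent_used_cpu;		/*    check 3 fires */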

Is cache locality prioritized even over idle CPU availability?
Are there measurements showing this trade-off is worthwhile for real workloads?

Thank you,
Madadi Vineeth Reddy

>  	/*
>  	 * Allow a per-cpu kthread to stack with the wakee if the
>  	 * kworker thread and the tasks previous CPUs are the same.
> 
> ---
> base-commit: e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6
> change-id: 20251030-b4-follow-up-ff03b4533a2d
> 
> Best regards,
Re: [PATCH v2] sched/fair: Prefer cache locality for EAS wakeup
Posted by Christian Loehle 3 months, 1 week ago
On 10/30/25 19:19, Shubhang Kaushik via B4 Relay wrote:
> From: Shubhang Kaushik <shubhang@os.amperecomputing.com>
> 
> When Energy Aware Scheduling (EAS) is enabled, a task waking up on a
> sibling CPU might migrate away from its previous CPU even if that CPU
> is not overutilized. This sacrifices cache locality and introduces
> unnecessary migration overhead.
> 
> This patch refines the wakeup heuristic in `select_idle_sibling()`. If
> EAS is active and the task's previous CPU (`prev`) is not overutilized,
> the scheduler will prioritize waking the task on `prev`, avoiding an
> unneeded migration and preserving cache-hotness.
> 
> ---
> v2:
> - Addressed reviewer comments by handling this special case within
>   the selection logic: when EAS is enabled, prefer the previous CPU
>   if it is not overutilized.
> - Link to v1: https://lore.kernel.org/all/20251017-b4-sched-cfs-refactor-propagate-v1-1-1eb0dc5b19b3@os.amperecomputing.com/
> 
> Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
> ---
>  kernel/sched/fair.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 25970dbbb27959bc130d288d5f80677f75f8db8b..ac94463627778f09522fb5420f67b903a694ad4d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7847,9 +7847,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  	    asym_fits_cpu(task_util, util_min, util_max, target))
>  		return target;
>  
> -	/*
> -	 * If the previous CPU is cache affine and idle, don't be stupid:
> -	 */
> +	/* Reschedule on an idle, cache-sharing sibling to preserve affinity: */
>  	if (prev != target && cpus_share_cache(prev, target) &&
>  	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
>  	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
> @@ -7861,6 +7859,14 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>  		prev_aff = prev;
>  	}
>  
> +	/*
> +	 * If the previous CPU is not overutilized, prefer it for cache locality.
> +	 * This prevents migration away from a cache-hot CPU that can still
> +	 * handle the task without causing an overload.
> +	 */
> +	if (sched_energy_enabled() && !cpu_overutilized(prev))
> +		return prev;
> +
>  	/*
>  	 * Allow a per-cpu kthread to stack with the wakee if the
>  	 * kworker thread and the tasks previous CPUs are the same.
> 
> ---
> base-commit: e53642b87a4f4b03a8d7e5f8507fc3cd0c595ea6
> change-id: 20251030-b4-follow-up-ff03b4533a2d
> 
> Best regards,

So if you're actually targeting EAS, I don't get why you would check
overutilized (instead of asym_fits; what about uclamp?). But also,
given that many EAS systems have only one common LLC, I don't quite
get why you would want this anyway.
Do you have a system / workload showing a benefit?
(I find that with EAS relying heavily on wakeups, what we do in the
slow path isn't that important for most workloads...)
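
(On the distinction being drawn here -- paraphrased, not the exact
upstream bodies:)

	/*
	 * cpu_overutilized(cpu): does the rq's *current* CFS utilization
	 * still fit the CPU, given the rq-wide uclamp aggregates? It says
	 * nothing about the waking task itself.
	 */
	return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);

	/*
	 * asym_fits_cpu(): does *this task's* utilization, clamped by the
	 * task's own uclamp min/max, fit @cpu? (Only checked on asymmetric
	 * capacity systems; returns true otherwise.)
	 */
	return util_fits_cpu(task_util, util_min, util_max, cpu);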