The load balancer for the fair class looks at the sched domain and active CPUs
when considering where to spread the load. Mask out the paravirt CPUs so that
tasks don't spread to them.
At wakeup, don't select a paravirt CPU.
Expect minimal impact when the feature is disabled.
Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
kernel/sched/fair.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df8dc389af8e..3dc76525b32c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
if (!is_rd_overutilized(this_rq()->rd)) {
new_cpu = find_energy_efficient_cpu(p, prev_cpu);
if (new_cpu >= 0)
- return new_cpu;
+ goto check_new_cpu;
new_cpu = prev_cpu;
}
@@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
}
rcu_read_unlock();
- return new_cpu;
+ /* If newly found or prev_cpu is a paravirt cpu, use current cpu */
+check_new_cpu:
+ if (is_cpu_paravirt(new_cpu))
+ return cpu;
+ else
+ return new_cpu;
}
/*
@@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
+#ifdef CONFIG_PARAVIRT
+ /* Don't spread load to paravirt CPUs */
+ if (static_branch_unlikely(&cpu_paravirt_push_tasks))
+ cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
+#endif
+
schedstat_inc(sd->lb_count[idle]);
redo:
--
2.47.3
Hello Shrikanth,
On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> if (!is_rd_overutilized(this_rq()->rd)) {
> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
> if (new_cpu >= 0)
> - return new_cpu;
> + goto check_new_cpu;
Should this fall back to the overutilized path if the most
energy-efficient CPU is found to be paravirtualized, or should
find_energy_efficient_cpu() be made aware of it?
> new_cpu = prev_cpu;
> }
>
> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
> }
> rcu_read_unlock();
>
> - return new_cpu;
> + /* If newly found or prev_cpu is a paravirt cpu, use current cpu */
> +check_new_cpu:
> + if (is_cpu_paravirt(new_cpu))
> + return cpu;
> + else
nit. redundant else.
> + return new_cpu;
> }
>
> /*
> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>
> cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>
> +#ifdef CONFIG_PARAVIRT
> + /* Don't spread load to paravirt CPUs */
> + if (static_branch_unlikely(&cpu_paravirt_push_tasks))
> + cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
> +#endif
Can something similar also be done in select_idle_sibling() and
sched_balance_find_dst_cpu() for the wakeup path?
> +
> schedstat_inc(sd->lb_count[idle]);
>
> redo:
--
Thanks and Regards,
Prateek
On 9/11/25 10:53 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
>
> On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
>> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>> if (!is_rd_overutilized(this_rq()->rd)) {
>> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>> if (new_cpu >= 0)
>> - return new_cpu;
>> + goto check_new_cpu;
>
> Should this fall back to the overutilized path if the most
> energy-efficient CPU is found to be paravirtualized, or should
> find_energy_efficient_cpu() be made aware of it?
While thinking about this, is there any such system that has vCPUs,
overcommits, and still has an energy model backing it?
Highly unlikely. So I am planning to put a warning there and see if any
such usage exists.
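Something along these lines, perhaps (untested sketch; the exact placement
and the WARN_ON_ONCE() are only what I have in mind, not settled):

	if (!is_rd_overutilized(this_rq()->rd)) {
		new_cpu = find_energy_efficient_cpu(p, prev_cpu);
		/*
		 * EAS picking a paravirt CPU would mean an overcommitted
		 * system with an energy model; flag it if that ever happens.
		 */
		WARN_ON_ONCE(new_cpu >= 0 && is_cpu_paravirt(new_cpu));
		if (new_cpu >= 0)
			goto check_new_cpu;
		new_cpu = prev_cpu;
	}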
>
>> new_cpu = prev_cpu;
>> }
>>
>> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>> }
>> rcu_read_unlock();
>>
>> - return new_cpu;
>> + /* If newly found or prev_cpu is a paravirt cpu, use current cpu */
>> +check_new_cpu:
>> + if (is_cpu_paravirt(new_cpu))
>> + return cpu;
>> + else
>
> nit. redundant else.
>
>> + return new_cpu;
>> }
>>
>> /*
>> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>>
>> cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>>
>> +#ifdef CONFIG_PARAVIRT
>> + /* Don't spread load to paravirt CPUs */
>> + if (static_branch_unlikely(&cpu_paravirt_push_tasks))
>> + cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
>> +#endif
>
> Can something similar also be done in select_idle_sibling() and
> sched_balance_find_dst_cpu() for the wakeup path?
>
We have this pattern in select_task_rq_fair():

	cpu = smp_processor_id();
	new_cpu = prev_cpu;

The task is waking up after a while, so prev_cpu has likely been marked as
paravirt, and in such cases we should return the current CPU. If the current
CPU is paravirt (unlikely) and prev_cpu is also paravirt, we should still
return the current CPU; at the next sched tick the task will be pushed out.

select_idle_sibling(p, prev_cpu, new_cpu); - new_cpu will remain prev_cpu if
wake_affine() doesn't change it.

Will have to change the prototype to pass the current CPU as well.
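Roughly something like this (untested, just to illustrate the idea; the extra
'cpu' argument and where the check sits are assumptions, not a worked-out
change):

	/* pass the waker's CPU so there is a non-paravirt fallback */
	static int select_idle_sibling(struct task_struct *p, int cpu, int prev, int target)
	{
		/* ... existing idle search over target/prev/recent_used ... */

		/* if the search still ends on a paravirt CPU, fall back */
		if (is_cpu_paravirt(target))
			return cpu;

		return target;
	}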
On 9/11/25 10:53 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
>
> On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
>> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>> if (!is_rd_overutilized(this_rq()->rd)) {
>> new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>> if (new_cpu >= 0)
>> - return new_cpu;
>> + goto check_new_cpu;
>
> Should this fall back to the overutilized path if the most
> energy-efficient CPU is found to be paravirtualized, or should
> find_energy_efficient_cpu() be made aware of it?
>
>> new_cpu = prev_cpu;
>> }
>>
>> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>> }
>> rcu_read_unlock();
>>
>> - return new_cpu;
>> + /* If newly found or prev_cpu is a paravirt cpu, use current cpu */
>> +check_new_cpu:
>> + if (is_cpu_paravirt(new_cpu))
>> + return cpu;
>> + else
>
> nit. redundant else.
>
Do you mean "is_cpu_paravirt(new_cpu) ? cpu; new_cpu"
This needs to return cpu instead of true/false. maybe i not seeing the obvious.
>> + return new_cpu;
>> }
>>
>> /*
>> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>>
>> cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>>
>> +#ifdef CONFIG_PARAVIRT
>> + /* Don't spread load to paravirt CPUs */
>> + if (static_branch_unlikely(&cpu_paravirt_push_tasks))
>> + cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
>> +#endif
>
> Can something similar also be done in select_idle_sibling() and
> sched_balance_find_dst_cpu() for the wakeup path?
That's a good suggestion. Don't make a choice which is a paravirt CPU.
Will explore.
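Something like what sched_balance_rq() does in this patch could be tried
there too, for example in select_idle_cpu() (untested sketch; exact spot
still to be figured out):

	struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_rq_mask);

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
#ifdef CONFIG_PARAVIRT
	/* don't pick a paravirt CPU as a wakeup target either */
	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
#endif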
>
>> +
>> schedstat_inc(sd->lb_count[idle]);
>>
>> redo:
Hello Shrikanth,

On 9/11/2025 9:26 PM, Shrikanth Hegde wrote:
>>> +check_new_cpu:
>>> +	if (is_cpu_paravirt(new_cpu))
>>> +		return cpu;
>>> +	else
>>
>> nit. redundant else.
>>
>
> Do you mean "is_cpu_paravirt(new_cpu) ? cpu; new_cpu"

Sorry for the confusion! I meant we can have:

	if (is_cpu_paravirt(new_cpu))
		return cpu;

	return new_cpu;

Since we return from the if clause, we don't need to specify else.

--
Thanks and Regards,
Prateek