[RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance

Shrikanth Hegde posted 10 patches 5 months ago
[RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance
Posted by Shrikanth Hegde 5 months ago
load balancer for fair class looks at sched domain and active cpus to consider
spreading the load. mask out the paravirt CPUs so that tasks doesn't spread to
those.

At wakeup, don't select a paravirt CPU.

Expect minimal impact when it is disabled.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df8dc389af8e..3dc76525b32c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 		if (!is_rd_overutilized(this_rq()->rd)) {
 			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
 			if (new_cpu >= 0)
-				return new_cpu;
+				goto check_new_cpu;
 			new_cpu = prev_cpu;
 		}
 
@@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
 	}
 	rcu_read_unlock();
 
-	return new_cpu;
+	/* If newly found or prev_cpu is a paravirt cpu, use current cpu */
+check_new_cpu:
+	if (is_cpu_paravirt(new_cpu))
+		return cpu;
+	else
+		return new_cpu;
 }
 
 /*
@@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 
 	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
 
+#ifdef CONFIG_PARAVIRT
+	/* Don't spread load to paravirt CPUs */
+	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
+		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
+#endif
+
 	schedstat_inc(sd->lb_count[idle]);
 
 redo:
-- 
2.47.3
Re: [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance
Posted by K Prateek Nayak 5 months ago
Hello Shrikanth,

On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>  		if (!is_rd_overutilized(this_rq()->rd)) {
>  			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>  			if (new_cpu >= 0)
> -				return new_cpu;
> +				goto check_new_cpu;

Should this fallback to the overutilized path if the most energy
efficient CPU is found to be paravirtualized or should
find_energy_efficient_cpu() be made aware of it?

>  			new_cpu = prev_cpu;
>  		}
>  
> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>  	}
>  	rcu_read_unlock();
>  
> -	return new_cpu;
> +	/* If newly found or prev_cpu is a paravirt cpu, use current cpu */
> +check_new_cpu:
> +	if (is_cpu_paravirt(new_cpu))
> +		return cpu;
> +	else

nit. redundant else.

> +		return new_cpu;
>  }
>  
>  /*
> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>  
>  	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>  
> +#ifdef CONFIG_PARAVIRT
> +	/* Don't spread load to paravirt CPUs */
> +	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
> +		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
> +#endif

Can something similar be also be done in select_idle_sibling() and
sched_balance_find_dst_cpu() for wakeup path?

> +
>  	schedstat_inc(sd->lb_count[idle]);
>  
>  redo:
-- 
Thanks and Regards,
Prateek
Re: [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance
Posted by Shrikanth Hegde 3 months ago

On 9/11/25 10:53 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
> 
> On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
>> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>   		if (!is_rd_overutilized(this_rq()->rd)) {
>>   			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>>   			if (new_cpu >= 0)
>> -				return new_cpu;
>> +				goto check_new_cpu;
> 
> Should this fallback to the overutilized path if the most energy
> efficient CPU is found to be paravirtualized or should
> find_energy_efficient_cpu() be made aware of it?


While thinking about this, are there any such system which has vCPUs and
overcommits and still has energy model backing it?

Highly unlikely. So, I am planning to put a warning there and see if any
such usage exists there.

> 
>>   			new_cpu = prev_cpu;
>>   		}
>>   
>> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>   	}
>>   	rcu_read_unlock();
>>   
>> -	return new_cpu;
>> +	/* If newly found or prev_cpu is a paravirt cpu, use current cpu */
>> +check_new_cpu:
>> +	if (is_cpu_paravirt(new_cpu))
>> +		return cpu;
>> +	else
> 
> nit. redundant else.
> 
>> +		return new_cpu;
>>   }
>>   
>>   /*
>> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>>   
>>   	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>>   
>> +#ifdef CONFIG_PARAVIRT
>> +	/* Don't spread load to paravirt CPUs */
>> +	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
>> +		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
>> +#endif
> 
> Can something similar be also be done in select_idle_sibling() and
> sched_balance_find_dst_cpu() for wakeup path?
> 

We have this pattern in select_task_rq_fair

cpu = smp_processor_id();
new_cpu = prev_cpu;

task is waking up after a while, so likely prev_cpu is marked as paravirt
and in such cases we should return current cpu. if current cpu is paravirt(unlikely),
and prev_cpu is also paravirt, then should return current cpu.
In next sched tick it will be pushed out.

select_idle_sibling(p, prev_cpu, new_cpu); - (new_cpu will remain prev_cpu if wake_affine doesn't change it)
Will have to change the prototype to send current cpu as well.
Re: [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance
Posted by Shrikanth Hegde 5 months ago

On 9/11/25 10:53 AM, K Prateek Nayak wrote:
> Hello Shrikanth,
> 
> On 9/10/2025 11:12 PM, Shrikanth Hegde wrote:
>> @@ -8563,7 +8563,7 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>   		if (!is_rd_overutilized(this_rq()->rd)) {
>>   			new_cpu = find_energy_efficient_cpu(p, prev_cpu);
>>   			if (new_cpu >= 0)
>> -				return new_cpu;
>> +				goto check_new_cpu;
> 
> Should this fallback to the overutilized path if the most energy
> efficient CPU is found to be paravirtualized or should
> find_energy_efficient_cpu() be made aware of it?
> 
>>   			new_cpu = prev_cpu;
>>   		}
>>   
>> @@ -8605,7 +8605,12 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int wake_flags)
>>   	}
>>   	rcu_read_unlock();
>>   
>> -	return new_cpu;
>> +	/* If newly found or prev_cpu is a paravirt cpu, use current cpu */
>> +check_new_cpu:
>> +	if (is_cpu_paravirt(new_cpu))
>> +		return cpu;
>> +	else
> 
> nit. redundant else.
> 

Do you mean "is_cpu_paravirt(new_cpu) ? cpu; new_cpu"

This needs to return cpu instead of true/false. maybe i not seeing the obvious.

>> +		return new_cpu;
>>   }
>>   
>>   /*
>> @@ -11734,6 +11739,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
>>   
>>   	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
>>   
>> +#ifdef CONFIG_PARAVIRT
>> +	/* Don't spread load to paravirt CPUs */
>> +	if (static_branch_unlikely(&cpu_paravirt_push_tasks))
>> +		cpumask_andnot(cpus, cpus, cpu_paravirt_mask);
>> +#endif
> 
> Can something similar be also be done in select_idle_sibling() and
> sched_balance_find_dst_cpu() for wakeup path?

That's a good suggestion. don't make a choice which is a paravirt CPU.
Will explore.

> 
>> +
>>   	schedstat_inc(sd->lb_count[idle]);
>>   
>>   redo:
Re: [RFC PATCH v3 05/10] sched/fair: Don't consider paravirt CPUs for wakeup and load balance
Posted by K Prateek Nayak 5 months ago
Hello Shrikanth,

On 9/11/2025 9:26 PM, Shrikanth Hegde wrote:
>>> +check_new_cpu:
>>> +    if (is_cpu_paravirt(new_cpu))
>>> +        return cpu;
>>> +    else
>>
>> nit. redundant else.
>>
> 
> Do you mean "is_cpu_paravirt(new_cpu) ? cpu; new_cpu"

Sorry for the confusion! I meant we can have:

	if (is_cpu_paravirt(new_cpu))
		return cpu;

	return new_cpu;

Since we return from the if clause, we don't need to specify else.

-- 
Thanks and Regards,
Prateek