[PATCH v2 2/3] sched/fair: Ignore isolated cpus in update_numa_stat

Chuyi Zhou posted 3 patches 1 year, 1 month ago
There is a newer version of this series
Posted by Chuyi Zhou 1 year, 1 month ago
update_numa_stats() currently iterates over every CPU in a node to gather
load information for the node and to find an idle CPU as a candidate
best_cpu within the node.

update_numa_stats() should take the scheduling domain into account, because
the "isolcpus" kernel command line option and cpuset isolated partitions
can remove CPUs from load balancing. As with task wakeup and periodic load
balancing, isolated CPUs should not take part in NUMA balancing. When
gathering load information for a node, ignore the load of isolated CPUs.
This change also avoids selecting an isolated CPU as the idle_cpu.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
 kernel/sched/fair.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f544012b9320..a0139659fe7a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2125,6 +2125,11 @@ static void update_numa_stats(struct task_numa_env *env,
 	for_each_cpu(cpu, cpumask_of_node(nid)) {
 		struct rq *rq = cpu_rq(cpu);
 
+		/* skip isolated cpus' load */
+		if (!rcu_dereference(rq->sd))
+			continue;
+
+		ns->weight++;
 		ns->load += cpu_load(rq);
 		ns->runnable += cpu_runnable(rq);
 		ns->util += cpu_util_cfs(cpu);
@@ -2144,8 +2149,6 @@ static void update_numa_stats(struct task_numa_env *env,
 	}
 	rcu_read_unlock();
 
-	ns->weight = cpumask_weight(cpumask_of_node(nid));
-
 	ns->node_type = numa_classify(env->imbalance_pct, ns);
 
 	if (idle_core >= 0)
-- 
2.20.1
Re: [PATCH v2 2/3] sched/fair: Ignore isolated cpus in update_numa_stat
Posted by Waiman Long 1 year, 1 month ago
On 1/3/25 1:59 AM, Chuyi Zhou wrote:
> Now update_numa_stats() iterates each cpu in a node to gather load
> information for the node and attempts to find the idle cpu as a candidate
> best_cpu within the node.
>
> In update_numa_stats() we should take into account the scheduling domain.
> This is because the "isolcpus" kernel command line option and cpuset
> isolated partitions can remove CPUs from load balance. Similar to task wakeup
> and periodic load balancing, we should not involve isolated CPUs in NUMA
> balancing. When gathering load information for nodes, we need to ignore the
> load of isolated CPUs. This change also avoids selecting an isolated CPU
> as the idle_cpu.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>   kernel/sched/fair.c | 7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f544012b9320..a0139659fe7a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2125,6 +2125,11 @@ static void update_numa_stats(struct task_numa_env *env,
>   	for_each_cpu(cpu, cpumask_of_node(nid)) {
>   		struct rq *rq = cpu_rq(cpu);
>   
> +		/* skip isolated cpus' load */
> +		if (!rcu_dereference(rq->sd))
> +			continue;
> +
> +		ns->weight++;
>   		ns->load += cpu_load(rq);
>   		ns->runnable += cpu_runnable(rq);
>   		ns->util += cpu_util_cfs(cpu);
> @@ -2144,8 +2149,6 @@ static void update_numa_stats(struct task_numa_env *env,
>   	}
>   	rcu_read_unlock();
>   
> -	ns->weight = cpumask_weight(cpumask_of_node(nid));
> -
>   	ns->node_type = numa_classify(env->imbalance_pct, ns);
>   
>   	if (idle_core >= 0)

You should initialize ns->weight to 0 before the iteration to prevent a
pre-existing ns->weight value from corrupting the result.

Cheers,
Longman
Re: [PATCH v2 2/3] sched/fair: Ignore isolated cpus in update_numa_stat
Posted by Chuyi Zhou 1 year, 1 month ago
Hello Waiman,

On 2025/1/8 02:39, Waiman Long wrote:
> 
> On 1/3/25 1:59 AM, Chuyi Zhou wrote:
>> Now update_numa_stats() iterates each cpu in a node to gather load
>> information for the node and attempts to find the idle cpu as a candidate
>> best_cpu within the node.
>>
>> In update_numa_stats() we should take into account the scheduling domain.
>> This is because the "isolcpus" kernel command line option and cpuset
>> isolated partitions can remove CPUs from load balance. Similar to task wakeup
>> and periodic load balancing, we should not involve isolated CPUs in NUMA
>> balancing. When gathering load information for nodes, we need to ignore the
>> load of isolated CPUs. This change also avoids selecting an isolated CPU
>> as the idle_cpu.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   kernel/sched/fair.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index f544012b9320..a0139659fe7a 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2125,6 +2125,11 @@ static void update_numa_stats(struct task_numa_env *env,
>>       for_each_cpu(cpu, cpumask_of_node(nid)) {
>>           struct rq *rq = cpu_rq(cpu);
>> +        /* skip isolated cpus' load */
>> +        if (!rcu_dereference(rq->sd))
>> +            continue;
>> +
>> +        ns->weight++;
>>           ns->load += cpu_load(rq);
>>           ns->runnable += cpu_runnable(rq);
>>           ns->util += cpu_util_cfs(cpu);
>> @@ -2144,8 +2149,6 @@ static void update_numa_stats(struct task_numa_env *env,
>>       }
>>       rcu_read_unlock();
>> -    ns->weight = cpumask_weight(cpumask_of_node(nid));
>> -
>>       ns->node_type = numa_classify(env->imbalance_pct, ns);
>>       if (idle_core >= 0)
> 
> You should initialize ns->weight to 0 before the iteration to prevent a
> pre-existing ns->weight value from corrupting the result.
> 
> Cheers,
> Longman
> 

Thanks for your review.

ns is already zeroed with memset() at the start of update_numa_stats(),
before the loop runs, so I think it should be okay here.
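
To illustrate the point, here is a minimal user-space sketch of the
patched loop. The struct and function names (numa_stats_model, cpu_model,
update_numa_stats_model) are hypothetical stand-ins and not the kernel
definitions; has_sched_domain only models the rq->sd check. Because the
stats are zeroed first, weight ends up counting just the non-isolated
CPUs, whatever value it held before.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct numa_stats_model {
	unsigned long load;
	unsigned int weight;
};

struct cpu_model {
	bool has_sched_domain;	/* models rq->sd != NULL */
	unsigned long load;	/* models cpu_load(rq) */
};

/* Models the patched loop: zero the stats, then count only non-isolated CPUs. */
static void update_numa_stats_model(struct numa_stats_model *ns,
				    const struct cpu_model *cpus,
				    size_t nr_cpus)
{
	size_t i;

	memset(ns, 0, sizeof(*ns));	/* any stale weight/load is cleared here */

	for (i = 0; i < nr_cpus; i++) {
		if (!cpus[i].has_sched_domain)
			continue;	/* skip isolated CPUs */

		ns->weight++;
		ns->load += cpus[i].load;
	}
}
```

With three CPUs of which one is isolated, a caller passing in a struct
that already holds garbage still gets weight == 2 and the load of only
the two non-isolated CPUs.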

Re: [PATCH v2 2/3] sched/fair: Ignore isolated cpus in update_numa_stat
Posted by Madadi Vineeth Reddy 1 year, 1 month ago
On 03/01/25 12:29, Chuyi Zhou wrote:
> Now update_numa_stats() iterates each cpu in a node to gather load
> information for the node and attempts to find the idle cpu as a candidate
> best_cpu within the node.
> 
> In update_numa_stats() we should take into account the scheduling domain.
> This is because the "isolcpus" kernel command line option and cpuset
> isolated partitions can remove CPUs from load balance. Similar to task wakeup
> and periodic load balancing, we should not involve isolated CPUs in NUMA
> balancing. When gathering load information for nodes, we need to ignore the
> load of isolated CPUs. This change also avoids selecting an isolated CPU
> as the idle_cpu.

If possible, would you be able to share any performance metrics or benchmarks
that demonstrate the impact of this patch on NUMA balancing or CPU migration
efficiency?

Thanks,
Madadi Vineeth Reddy

> 
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/sched/fair.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
Re: [PATCH v2 2/3] sched/fair: Ignore isolated cpus in update_numa_stat
Posted by Chuyi Zhou 1 year, 1 month ago
Hello Madadi Vineeth Reddy,

On 2025/1/6 02:52, Madadi Vineeth Reddy wrote:
> On 03/01/25 12:29, Chuyi Zhou wrote:
>> Now update_numa_stats() iterates each cpu in a node to gather load
>> information for the node and attempts to find the idle cpu as a candidate
>> best_cpu within the node.
>>
>> In update_numa_stats() we should take into account the scheduling domain.
>> This is because the "isolcpus" kernel command line option and cpuset
>> isolated partitions can remove CPUs from load balance. Similar to task wakeup
>> and periodic load balancing, we should not involve isolated CPUs in NUMA
>> balancing. When gathering load information for nodes, we need to ignore the
>> load of isolated CPUs. This change also avoids selecting an isolated CPU
>> as the idle_cpu.
> 
> If possible, would you be able to share any performance metrics or benchmarks
> that demonstrate the impact of this patch on NUMA balancing or CPU migration
> efficiency?
> 
> Thanks,
> Madadi Vineeth Reddy
> 

This change does not alter the default behavior of NUMA balancing unless
both NUMA balancing and CPU isolation are enabled. Therefore, under the
default configuration, there should be no performance regression. On the
other hand, involving isolated CPUs in NUMA balancing or load balancing
is inappropriate, and this is the issue that this patch aims to fix.

It might be worth setting up a test scenario for cases where NUMA 
balancing and isolated CPUs are both enabled. Perhaps this testing will 
be done later on.
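
As a starting point for such a test, a small user-space check could
confirm that both features are active before running a benchmark. The
sketch below is only an assumption about how one might do this: the
helper names are hypothetical, any non-zero value in
/proc/sys/kernel/numa_balancing is treated as "enabled", and
/sys/devices/system/cpu/isolated is assumed to list only CPUs isolated at
boot with isolcpus=, so cpuset isolated partitions would need a separate
check.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count CPUs in a cpulist string such as "2-5,8"; an empty list gives 0. */
static int count_cpus_in_list(const char *list)
{
	int count = 0;
	const char *p = list;

	while (*p) {
		char *end;
		long first, last;

		first = strtol(p, &end, 10);
		if (end == p)
			break;		/* no more numbers, e.g. trailing newline */
		last = first;
		if (*end == '-') {
			p = end + 1;
			last = strtol(p, &end, 10);
			if (end == p)
				break;	/* malformed range */
		}
		if (last >= first)
			count += (int)(last - first + 1);
		p = (*end == ',') ? end + 1 : end;
	}
	return count;
}

/* Read the first line of a file into buf; returns 0 on success, -1 on failure. */
static int read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f))
		buf[0] = '\0';		/* empty file: treat as an empty string */
	fclose(f);
	return 0;
}

/* Returns 1 if both features look enabled, 0 if not, -1 if a file is unreadable. */
static int numa_isolation_scenario_active(void)
{
	char buf[4096];
	long numa_balancing;

	if (read_first_line("/proc/sys/kernel/numa_balancing", buf, sizeof(buf)))
		return -1;
	numa_balancing = strtol(buf, NULL, 10);

	if (read_first_line("/sys/devices/system/cpu/isolated", buf, sizeof(buf)))
		return -1;

	return numa_balancing != 0 && count_cpus_in_list(buf) > 0;
}
```

A benchmark wrapper could call numa_isolation_scenario_active() first and
skip the run unless it returns 1, so that results are only collected in
the configuration this patch affects.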

Thanks.
Re: [PATCH v2 2/3] sched/fair: Ignore isolated cpus in update_numa_stat
Posted by Madadi Vineeth Reddy 1 year, 1 month ago
On 07/01/25 18:46, Chuyi Zhou wrote:
> Hello Madadi Vineeth Reddy,
> 
> On 2025/1/6 02:52, Madadi Vineeth Reddy wrote:
>> On 03/01/25 12:29, Chuyi Zhou wrote:
>>> Now update_numa_stats() iterates each cpu in a node to gather load
>>> information for the node and attempts to find the idle cpu as a candidate
>>> best_cpu within the node.
>>>
>>> In update_numa_stats() we should take into account the scheduling domain.
>>> This is because the "isolcpus" kernel command line option and cpuset
>>> isolated partitions can remove CPUs from load balance. Similar to task wakeup
>>> and periodic load balancing, we should not involve isolated CPUs in NUMA
>>> balancing. When gathering load information for nodes, we need to ignore the
>>> load of isolated CPUs. This change also avoids selecting an isolated CPU
>>> as the idle_cpu.
>>
>> If possible, would you be able to share any performance metrics or benchmarks
>> that demonstrate the impact of this patch on NUMA balancing or CPU migration
>> efficiency?
>>
>> Thanks,
>> Madadi Vineeth Reddy
>>
> 
> This change does not alter the default behavior of NUMA balancing unless both NUMA balancing and CPU isolation are enabled. Therefore, under the default configuration, there should be no performance regression. On the other hand, involving isolated CPUs in NUMA balancing or load balancing is inappropriate, and this is the issue that this patch aims to fix.
> 
> It might be worth setting up a test scenario for cases where NUMA balancing and isolated CPUs are both enabled. Perhaps this testing will be done later on.

Right, by benchmark, I meant the scenario where both NUMA balancing and isolated
CPUs are involved. It would make for an interesting test case.

Thanks,
Madadi Vineeth Reddy

> 
> Thanks.