[PATCH] sched: update the rq->avg_idle when a task is moved to an idle CPU

Huang Shijie posted 1 patch 1 week ago
There is a newer version of this series
kernel/sched/core.c | 29 ++++++++++++++++++-----------
1 file changed, 18 insertions(+), 11 deletions(-)
[PATCH] sched: update the rq->avg_idle when a task is moved to an idle CPU
Posted by Huang Shijie 1 week ago
In the newidle balance, the rq->idle_stamp may set to a non-zero value
if it cannot pull any task.

In the wakeup, it will detect the rq->idle_stamp, and updates
the rq->avg_idle, then ends the CPU idle status by setting rq->idle_stamp
to zero.

Besides the wakeup, current code does not end the CPU idle status
when a task is moved to the idle CPU, such as fork/clone, execve,
or other cases.

This patch introduces a helper: update_rq_avg_idle().
And update the rq->avg_idle when a task is moved to an idle CPU at:
   -- wakeup
   -- fork/clone
   -- execve
   -- other cases

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 kernel/sched/core.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9f10cfbdc228..732c6f708afc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2412,6 +2412,21 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 	return cpu_online(cpu);
 }
 
+static void update_rq_avg_idle(struct rq *rq)
+{
+	if (rq->idle_stamp) {
+		u64 delta = rq_clock(rq) - rq->idle_stamp;
+		u64 max = 2*rq->max_idle_balance_cost;
+
+		update_avg(&rq->avg_idle, delta);
+
+		if (rq->avg_idle > max)
+			rq->avg_idle = max;
+
+		rq->idle_stamp = 0;
+	}
+}
+
 /*
  * This is how migration works:
  *
@@ -2446,6 +2461,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
 	WARN_ON_ONCE(task_cpu(p) != new_cpu);
 	activate_task(rq, p, 0);
 	wakeup_preempt(rq, p, 0);
+	update_rq_avg_idle(rq);
 
 	return rq;
 }
@@ -3646,17 +3662,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		rq_repin_lock(rq, rf);
 	}
 
-	if (rq->idle_stamp) {
-		u64 delta = rq_clock(rq) - rq->idle_stamp;
-		u64 max = 2*rq->max_idle_balance_cost;
-
-		update_avg(&rq->avg_idle, delta);
-
-		if (rq->avg_idle > max)
-			rq->avg_idle = max;
-
-		rq->idle_stamp = 0;
-	}
+	update_rq_avg_idle(rq);
 }
 
 /*
@@ -4773,6 +4779,7 @@ void wake_up_new_task(struct task_struct *p)
 		p->sched_class->task_woken(rq, p);
 		rq_repin_lock(rq, &rf);
 	}
+	update_rq_avg_idle(rq);
 	task_rq_unlock(rq, p, &rf);
 }
 
-- 
2.40.1
Re: [PATCH] sched: update the rq->avg_idle when a task is moved to an idle CPU
Posted by Madadi Vineeth Reddy 6 days, 17 hours ago
Hi Huang,

On 24/11/25 08:07, Huang Shijie wrote:
> In the newidle balance, the rq->idle_stamp may set to a non-zero value
> if it cannot pull any task.
> 
> In the wakeup, it will detect the rq->idle_stamp, and updates
> the rq->avg_idle, then ends the CPU idle status by setting rq->idle_stamp
> to zero.
> 
> Besides the wakeup, current code does not end the CPU idle status
> when a task is moved to the idle CPU, such as fork/clone, execve,
> or other cases.
> 
> This patch introduces a helper: update_rq_avg_idle().
> And update the rq->avg_idle when a task is moved to an idle CPU at:
>    -- wakeup
>    -- fork/clone
>    -- execve
>    -- other cases
> 
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> ---
>  kernel/sched/core.c | 29 ++++++++++++++++++-----------
>  1 file changed, 18 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9f10cfbdc228..732c6f708afc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2412,6 +2412,21 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
>  	return cpu_online(cpu);
>  }
>  
> +static void update_rq_avg_idle(struct rq *rq)
> +{
> +	if (rq->idle_stamp) {
> +		u64 delta = rq_clock(rq) - rq->idle_stamp;
> +		u64 max = 2*rq->max_idle_balance_cost;
> +
> +		update_avg(&rq->avg_idle, delta);
> +
> +		if (rq->avg_idle > max)
> +			rq->avg_idle = max;
> +
> +		rq->idle_stamp = 0;
> +	}
> +}
> +
>  /*
>   * This is how migration works:
>   *
> @@ -2446,6 +2461,7 @@ static struct rq *move_queued_task(struct rq *rq, struct rq_flags *rf,
>  	WARN_ON_ONCE(task_cpu(p) != new_cpu);
>  	activate_task(rq, p, 0);
>  	wakeup_preempt(rq, p, 0);
> +	update_rq_avg_idle(rq);
>  
>  	return rq;
>  }
> @@ -3646,17 +3662,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
>  		rq_repin_lock(rq, rf);
>  	}
>  
> -	if (rq->idle_stamp) {
> -		u64 delta = rq_clock(rq) - rq->idle_stamp;
> -		u64 max = 2*rq->max_idle_balance_cost;
> -
> -		update_avg(&rq->avg_idle, delta);
> -
> -		if (rq->avg_idle > max)
> -			rq->avg_idle = max;
> -
> -		rq->idle_stamp = 0;
> -	}
> +	update_rq_avg_idle(rq);
>  }
>  
>  /*
> @@ -4773,6 +4779,7 @@ void wake_up_new_task(struct task_struct *p)
>  		p->sched_class->task_woken(rq, p);
>  		rq_repin_lock(rq, &rf);
>  	}
> +	update_rq_avg_idle(rq);
>  	task_rq_unlock(rq, p, &rf);
>  }
>  

I traced the activate_task() call paths and found that load balancing migrations
through attach_task() in kernel/sched/fair.c may not be covered.

activate_task <- attach_task <- attach_tasks <- sched_balance_rq 
activate_task <- attach_task <- attach_one_task <- active_load_balance_cpu_stop

These paths are called in periodic load balancing and when tasks are pulled
towards an idle CPU via attach_task(), it doesn't update rq->avg_idle or clear
idle_stamp.

Should attach_task() in kernel/sched/fair.c also call update_rq_avg_idle()
after activation?

Also, can update_rq_avg_idle() be placed inside activate_task() to avoid
all these?

Thanks,
Vineeth
Re: [PATCH] sched: update the rq->avg_idle when a task is moved to an idle CPU
Posted by Shijie Huang 6 days, 15 hours ago
On 25/11/2025 15:08, Madadi Vineeth Reddy wrote:
> I traced the activate_task() call paths and found that load balancing migrations
> through attach_task() in kernel/sched/fair.c may not be covered.
thanks for pointing this, I did not notice them.
>
> activate_task <- attach_task <- attach_tasks <- sched_balance_rq
> activate_task <- attach_task <- attach_one_task <- active_load_balance_cpu_stop
>
> These paths are called in periodic load balancing and when tasks are pulled
> towards an idle CPU via attach_task(), it doesn't update rq->avg_idle or clear
> idle_stamp.

Yes, we should update the rq->avg_idle for them.

I will add it in version 2.

>
> Should attach_task() in kernel/sched/fair.c also call update_rq_avg_idle()
> after activation?

IMHO,  we should not call the update_rq_avg_idle() directly in 
attach_task().

In the current attach_task(), there is no information for the 
context(newidle, idle, busy).

The attach_task() is also called in the newidle code path.

But we can call the update_rq_avg_idle() in attach_tasks() with a 
condition check.

>
> Also, can update_rq_avg_idle() be placed inside activate_task() to avoid
> all these?

The same reason as above.


Thanks

Huang Shijie

Re: [PATCH] sched: update the rq->avg_idle when a task is moved to an idle CPU
Posted by Madadi Vineeth Reddy 6 days, 14 hours ago
On 25/11/25 14:29, Shijie Huang wrote:
> 
> On 25/11/2025 15:08, Madadi Vineeth Reddy wrote:
>> I traced the activate_task() call paths and found that load balancing migrations
>> through attach_task() in kernel/sched/fair.c may not be covered.
> thanks for pointing this, I did not notice them.
>>
>> activate_task <- attach_task <- attach_tasks <- sched_balance_rq
>> activate_task <- attach_task <- attach_one_task <- active_load_balance_cpu_stop
>>
>> These paths are called in periodic load balancing and when tasks are pulled
>> towards an idle CPU via attach_task(), it doesn't update rq->avg_idle or clear
>> idle_stamp.
> 
> Yes, we should update the rq->avg_idle for them.
> 
> I will add it in version 2.
> 
>>
>> Should attach_task() in kernel/sched/fair.c also call update_rq_avg_idle()
>> after activation?
> 
> IMHO,  we should not call the update_rq_avg_idle() directly in attach_task().
> 
> In the current attach_task(), there is no information for the context(newidle, idle, busy).
> 
> The attach_task() is also called in the newidle code path.
> 
> But we can call the update_rq_avg_idle() in attach_tasks() with a condition check.
> 

IIUC, update_rq_avg_idle() already checks if (rq->idle_stamp) internally and
in attach_task() we have rq available and the guard in update_rq_avg_idle()
ensures we only update when the CPU was actually idle. Whether it's called
during newidle, idle, or busy balancing shouldn't matter.

Let me know if I am missing something.

Thanks,
Vineeth

>>
>> Also, can update_rq_avg_idle() be placed inside activate_task() to avoid
>> all these?
> 
> The same reason as above.
> 
> 
> Thanks
> 
> Huang Shijie
> 

Re: [PATCH] sched: update the rq->avg_idle when a task is moved to an idle CPU
Posted by Shijie Huang 5 days, 22 hours ago
On 25/11/2025 18:02, Madadi Vineeth Reddy wrote:
> IIUC, update_rq_avg_idle() already checks if (rq->idle_stamp) internally and
> in attach_task() we have rq available and the guard in update_rq_avg_idle()
> ensures we only update when the CPU was actually idle. Whether it's called
> during newidle, idle, or busy balancing shouldn't matter.

In the newidle function:

    sched_balance_newidle():

             ...............................

             this_rq->idle_stamp = rq_clock(this_rq);        // step 1

              sched_balance_rq() ---> ... --> attach_task()        // step 2

             if (pulled_task)

                     this_rq->idle_stamp = 0;                // step 3

              .............................................


          If update_rq_avg_idle() is called in attach_task() in "step 
2", it will mess up the "step 1".

The update_rq_avg_idle() should be called after "step 3".

Thanks

Huang Shijie