[PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle

Huang Shijie posted 2 patches 3 days, 13 hours ago
[PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle
Posted by Huang Shijie 3 days, 13 hours ago
Save the idle_stamp at the beginning of sched_balance_newidle(),
if it cannot pull any task, set it for rq->idle_stamp.

This patch does not change the logic of rq->idle_stamp.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 kernel/sched/fair.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7990df..c1a8fa043156 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12862,6 +12862,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	u64 t0, t1, curr_cost = 0;
 	struct sched_domain *sd;
 	int pulled_task = 0;
+	u64 idle_stamp;
 
 	update_misfit_status(NULL, this_rq);
 
@@ -12877,7 +12878,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	 * for CPU_NEWLY_IDLE, such that we measure the this duration
 	 * as idle time.
 	 */
-	this_rq->idle_stamp = rq_clock(this_rq);
+	idle_stamp = rq_clock(this_rq);
+
+	this_rq->idle_stamp = 0;
 
 	/*
 	 * Do not pull tasks towards !active CPUs...
@@ -12989,10 +12992,11 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	if (time_after(this_rq->next_balance, next_balance))
 		this_rq->next_balance = next_balance;
 
-	if (pulled_task)
-		this_rq->idle_stamp = 0;
-	else
+	if (!pulled_task) {
+		/* Set it here on purpose. */
+		this_rq->idle_stamp = idle_stamp;
 		nohz_newidle_balance(this_rq);
+	}
 
 	rq_repin_lock(this_rq, rf);
 
-- 
2.40.1
Re: [PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle
Posted by Madadi Vineeth Reddy 3 days, 11 hours ago
Hi Huang,

On 28/11/25 13:24, Huang Shijie wrote:
> Save the idle_stamp at the beginning of sched_balance_newidle(),
> if it cannot pull any task, set it for rq->idle_stamp.
> 
> This patch does not change the logic of rq->idle_stamp.
> 
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> ---
>  kernel/sched/fair.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 769d7b7990df..c1a8fa043156 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12862,6 +12862,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>  	u64 t0, t1, curr_cost = 0;
>  	struct sched_domain *sd;
>  	int pulled_task = 0;
> +	u64 idle_stamp;
>  
>  	update_misfit_status(NULL, this_rq);
>  
> @@ -12877,7 +12878,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>  	 * for CPU_NEWLY_IDLE, such that we measure the this duration
>  	 * as idle time.
>  	 */
> -	this_rq->idle_stamp = rq_clock(this_rq);
> +	idle_stamp = rq_clock(this_rq);
> +
> +	this_rq->idle_stamp = 0;

IIUC, by setting this_rq->idle_stamp = 0 at the beginning, any call to update_rq_avg_idle() during
load balancing when tasks are pulled will fail the if (rq->idle_stamp) check, preventing the average
idle time from being updated.

Thanks,
Vineeth

>  
>  	/*
>  	 * Do not pull tasks towards !active CPUs...
> @@ -12989,10 +12992,11 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
>  	if (time_after(this_rq->next_balance, next_balance))
>  		this_rq->next_balance = next_balance;
>  
> -	if (pulled_task)
> -		this_rq->idle_stamp = 0;
> -	else
> +	if (!pulled_task) {
> +		/* Set it here on purpose. */
> +		this_rq->idle_stamp = idle_stamp;
>  		nohz_newidle_balance(this_rq);
> +	}
>  
>  	rq_repin_lock(this_rq, rf);
>
Re: [PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle
Posted by Shijie Huang 3 days, 8 hours ago
On 28/11/2025 18:07, Madadi Vineeth Reddy wrote:
>>   	 */
>> -	this_rq->idle_stamp = rq_clock(this_rq);
>> +	idle_stamp = rq_clock(this_rq);
>> +
>> +	this_rq->idle_stamp = 0;
> IIUC, by setting this_rq->idle_stamp = 0 at the beginning, any call to update_rq_avg_idle() during
> load balancing when tasks are pulled will fail the if (rq->idle_stamp) check, preventing the average
> idle time from being updated.

  1.) For the newidle balance, it is okay to prevent to update the 
rq->avg_idle.

  2.) For the idle balance, the this_rq->idle_stamp is not zero, and it 
can update the rq->avg_idle.

             Can the idle balance and newidle balance run at the same time?


  3.) For the busy balance, the  this_rq->idle_stamp should be zero, no 
need to update the rq->avg_idle.


Thanks

Huang Shijie



Re: [PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle
Posted by Shijie Huang 3 days, 8 hours ago
On 28/11/2025 20:54, Shijie Huang wrote:
> On 28/11/2025 18:07, Madadi Vineeth Reddy wrote:
>>>        */
>>> -    this_rq->idle_stamp = rq_clock(this_rq);
>>> +    idle_stamp = rq_clock(this_rq);
>>> +
>>> +    this_rq->idle_stamp = 0;
>> IIUC, by setting this_rq->idle_stamp = 0 at the beginning, any call 
>> to update_rq_avg_idle() during
>> load balancing when tasks are pulled will fail the if 
>> (rq->idle_stamp) check, preventing the average
>> idle time from being updated.
>
>  1.) For the newidle balance, it is okay to prevent to update the 
> rq->avg_idle.
>
>  2.) For the idle balance, the this_rq->idle_stamp is not zero, and it 
> can update the rq->avg_idle.
>
>             Can the idle balance and newidle balance run at the same 
> time?

I mean on the same CPU.


>
>
>  3.) For the busy balance, the  this_rq->idle_stamp should be zero, no 
> need to update the rq->avg_idle.
>
Re: [PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle
Posted by Peter Zijlstra 3 days, 11 hours ago
On Fri, Nov 28, 2025 at 03:54:53PM +0800, Huang Shijie wrote:
> Save the idle_stamp at the beginning of sched_balance_newidle(),
> if it cannot pull any task, set it for rq->idle_stamp.
> 

This changelog tells me what the patch does, something I can see from
reading the patch itself. However, it completely fails at the purpose of
a changelog, which is to elucidate the reader as to the reasons for
doing so.
Re: [PATCH v4 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle
Posted by Shijie Huang 18 hours ago
On 28/11/2025 17:31, Peter Zijlstra wrote:
> This changelog tells me what the patch does, something I can see from
> reading the patch itself. However, it completely fails at the purpose of
> a changelog, which is to elucidate the reader as to the reasons for
> doing so.

How about to change the changelog as following:

"

In current newidle balance, the rq->idle_stamp may set to a non-zero value
if it cannot pull any task.

In the wakeup, it will detect the rq->idle_stamp, and updates
the rq->avg_idle, then ends the CPU idle status by setting rq->idle_stamp
to zero.

Besides the wakeup, current code does not end the CPU idle status
when a task is moved to the idle CPU, such as fork/clone, execve,
or other cases.

In order to fix this issue, we want to add a hook(update_rq_avg_idle())
in the enqueue_task(). With this hook, if a task is moved to the idle CPU,
it will update the rq->avg_idle. Unfortunately, this hook is also called
in the newidle balance:
    sched_balance_newidle() --> sched_balance_rq() --> ... --> 
enqueue_task()

If we still set rq->idle_stamp at the beginning of sched_balance_newidle(),
the rq->avg_idle will not be updated correctly.

In order to make it work correctly, save the idle_stamp at the beginning
of sched_balance_newidle(). If newidle balance cannot pull any task,
set the saved value for rq->idle_stamp. With this method,
the newidle balance still work correctly, and the hook in enqueue_task()
also works correctly.

"

Thanks

Huang Shijie