Save the idle_stamp at the beginning of sched_balance_newidle(),
if it cannot pull any task, set it for rq->idle_stamp.
This patch does not change the logic of rq->idle_stamp.
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
kernel/sched/fair.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7990df..c1a8fa043156 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12862,6 +12862,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
u64 t0, t1, curr_cost = 0;
struct sched_domain *sd;
int pulled_task = 0;
+ u64 idle_stamp;
update_misfit_status(NULL, this_rq);
@@ -12877,7 +12878,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
* for CPU_NEWLY_IDLE, such that we measure the this duration
* as idle time.
*/
- this_rq->idle_stamp = rq_clock(this_rq);
+ idle_stamp = rq_clock(this_rq);
+
+ this_rq->idle_stamp = 0;
/*
* Do not pull tasks towards !active CPUs...
@@ -12989,10 +12992,11 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
if (time_after(this_rq->next_balance, next_balance))
this_rq->next_balance = next_balance;
- if (pulled_task)
- this_rq->idle_stamp = 0;
- else
+ if (!pulled_task) {
+ /* Set it here on purpose. */
+ this_rq->idle_stamp = idle_stamp;
nohz_newidle_balance(this_rq);
+ }
rq_repin_lock(this_rq, rf);
--
2.40.1
Hi Huang,
On 28/11/25 13:24, Huang Shijie wrote:
> Save the idle_stamp at the beginning of sched_balance_newidle(),
> if it cannot pull any task, set it for rq->idle_stamp.
>
> This patch does not change the logic of rq->idle_stamp.
>
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> ---
> kernel/sched/fair.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 769d7b7990df..c1a8fa043156 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12862,6 +12862,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> u64 t0, t1, curr_cost = 0;
> struct sched_domain *sd;
> int pulled_task = 0;
> + u64 idle_stamp;
>
> update_misfit_status(NULL, this_rq);
>
> @@ -12877,7 +12878,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> * for CPU_NEWLY_IDLE, such that we measure the this duration
> * as idle time.
> */
> - this_rq->idle_stamp = rq_clock(this_rq);
> + idle_stamp = rq_clock(this_rq);
> +
> + this_rq->idle_stamp = 0;
IIUC, by setting this_rq->idle_stamp = 0 at the beginning, any call to update_rq_avg_idle() during
load balancing when tasks are pulled will fail the if (rq->idle_stamp) check, preventing the average
idle time from being updated.
Thanks,
Vineeth
>
> /*
> * Do not pull tasks towards !active CPUs...
> @@ -12989,10 +12992,11 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
> if (time_after(this_rq->next_balance, next_balance))
> this_rq->next_balance = next_balance;
>
> - if (pulled_task)
> - this_rq->idle_stamp = 0;
> - else
> + if (!pulled_task) {
> + /* Set it here on purpose. */
> + this_rq->idle_stamp = idle_stamp;
> nohz_newidle_balance(this_rq);
> + }
>
> rq_repin_lock(this_rq, rf);
>
On 28/11/2025 18:07, Madadi Vineeth Reddy wrote: >> */ >> - this_rq->idle_stamp = rq_clock(this_rq); >> + idle_stamp = rq_clock(this_rq); >> + >> + this_rq->idle_stamp = 0; > IIUC, by setting this_rq->idle_stamp = 0 at the beginning, any call to update_rq_avg_idle() during > load balancing when tasks are pulled will fail the if (rq->idle_stamp) check, preventing the average > idle time from being updated. 1.) For the newidle balance, it is okay to prevent to update the rq->avg_idle. 2.) For the idle balance, the this_rq->idle_stamp is not zero, and it can update the rq->avg_idle. Can the idle balance and newidle balance run at the same time? 3.) For the busy balance, the this_rq->idle_stamp should be zero, no need to update the rq->avg_idle. Thanks Huang Shijie
On 28/11/2025 20:54, Shijie Huang wrote: > On 28/11/2025 18:07, Madadi Vineeth Reddy wrote: >>> */ >>> - this_rq->idle_stamp = rq_clock(this_rq); >>> + idle_stamp = rq_clock(this_rq); >>> + >>> + this_rq->idle_stamp = 0; >> IIUC, by setting this_rq->idle_stamp = 0 at the beginning, any call >> to update_rq_avg_idle() during >> load balancing when tasks are pulled will fail the if >> (rq->idle_stamp) check, preventing the average >> idle time from being updated. > > 1.) For the newidle balance, it is okay to prevent to update the > rq->avg_idle. > > 2.) For the idle balance, the this_rq->idle_stamp is not zero, and it > can update the rq->avg_idle. > > Can the idle balance and newidle balance run at the same > time? I mean on the same CPU. > > > 3.) For the busy balance, the this_rq->idle_stamp should be zero, no > need to update the rq->avg_idle. >
On Fri, Nov 28, 2025 at 03:54:53PM +0800, Huang Shijie wrote: > Save the idle_stamp at the beginning of sched_balance_newidle(), > if it cannot pull any task, set it for rq->idle_stamp. > This changelog tells me what the patch does, something I can see from reading the patch itself. However, it completely fails at the purpose of a changelog, which is to elucidate the reader as to the reasons for doing so.
On 28/11/2025 17:31, Peter Zijlstra wrote: > This changelog tells me what the patch does, something I can see from > reading the patch itself. However, it completely fails at the purpose of > a changelog, which is to elucidate the reader as to the reasons for > doing so. How about to change the changelog as following: " In current newidle balance, the rq->idle_stamp may set to a non-zero value if it cannot pull any task. In the wakeup, it will detect the rq->idle_stamp, and updates the rq->avg_idle, then ends the CPU idle status by setting rq->idle_stamp to zero. Besides the wakeup, current code does not end the CPU idle status when a task is moved to the idle CPU, such as fork/clone, execve, or other cases. In order to fix this issue, we want to add a hook(update_rq_avg_idle()) in the enqueue_task(). With this hook, if a task is moved to the idle CPU, it will update the rq->avg_idle. Unfortunately, this hook is also called in the newidle balance: sched_balance_newidle() --> sched_balance_rq() --> ... --> enqueue_task() If we still set rq->idle_stamp at the beginning of sched_balance_newidle(), the rq->avg_idle will not be updated correctly. In order to make it work correctly, save the idle_stamp at the beginning of sched_balance_newidle(). If newidle balance cannot pull any task, set the saved value for rq->idle_stamp. With this method, the newidle balance still work correctly, and the hook in enqueue_task() also works correctly. " Thanks Huang Shijie
© 2016 - 2025 Red Hat, Inc.