[PATCH v6 1/2] sched/fair: set rq->idle_stamp at the end of the sched_balance_newidle

Huang Shijie posted 2 patches 1 week ago
Posted by Huang Shijie 1 week ago
In the current newidle balance, rq->idle_stamp may be left set to a
non-zero value if no task can be pulled.

On wakeup, the scheduler checks rq->idle_stamp, updates rq->avg_idle,
and then ends the CPU idle state by setting rq->idle_stamp to zero.

Apart from wakeup, the current code does not end the CPU idle state
when a task is moved to the idle CPU, for example by fork/clone, execve,
or other paths.

To fix this, we want to add a hook, update_rq_avg_idle(), to
enqueue_task(). With this hook, rq->avg_idle is updated whenever a task
is moved to the idle CPU. Unfortunately, the hook is also reached from
the newidle balance:
   sched_balance_newidle() --> sched_balance_rq() --> ... --> enqueue_task()

If we still set rq->idle_stamp at the beginning of sched_balance_newidle(),
rq->avg_idle will not be updated correctly: the hook would find the stamp
that was just set and treat the time spent balancing as an idle period.

To make this work, save the timestamp in a local variable at the
beginning of sched_balance_newidle(). If the newidle balance cannot pull
any task, write the saved value to rq->idle_stamp. This way the newidle
balance still works correctly, and so does the hook in enqueue_task().

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 kernel/sched/fair.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1855975b8248..c3b4895f8e50 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12865,6 +12865,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	u64 t0, t1, curr_cost = 0;
 	struct sched_domain *sd;
 	int pulled_task = 0;
+	u64 idle_stamp;
 
 	update_misfit_status(NULL, this_rq);
 
@@ -12880,7 +12881,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	 * for CPU_NEWLY_IDLE, such that we measure the this duration
 	 * as idle time.
 	 */
-	this_rq->idle_stamp = rq_clock(this_rq);
+	idle_stamp = rq_clock(this_rq);
 
 	/*
 	 * Do not pull tasks towards !active CPUs...
@@ -12992,10 +12993,13 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 	if (time_after(this_rq->next_balance, next_balance))
 		this_rq->next_balance = next_balance;
 
-	if (pulled_task)
+	if (pulled_task) {
 		this_rq->idle_stamp = 0;
-	else
+	} else {
+		/* Set only now, so enqueue_task() never sees it mid-balance. */
+		this_rq->idle_stamp = idle_stamp;
 		nohz_newidle_balance(this_rq);
+	}
 
 	rq_repin_lock(this_rq, rf);
 
-- 
2.40.1