During newidle balance, rq->idle_stamp may be set to a non-zero value
if no task can be pulled.

On wakeup, the code detects the non-zero rq->idle_stamp, updates
rq->avg_idle, and then ends the CPU idle status by resetting
rq->idle_stamp to zero.

Apart from wakeup, the current code does not end the CPU idle status
when a task is moved to the idle CPU, for example on fork/clone, execve,
or in other cases.

This patch introduces a helper, update_rq_avg_idle(), and calls it from
enqueue_task(), so that rq->avg_idle is updated whenever a task is
moved to an idle CPU, including:
-- wakeup
-- fork/clone
-- execve
-- idle balance
-- delayed dequeue task
-- other cases
Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
kernel/sched/core.c | 36 ++++++++++++++++++++++++------------
1 file changed, 24 insertions(+), 12 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0c4ff93eeb78..8531ef68ce76 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2078,8 +2078,25 @@ unsigned long get_wchan(struct task_struct *p)
 	return ip;
 }

+static void update_rq_avg_idle(struct rq *rq)
+{
+	if (rq->idle_stamp) {
+		u64 delta = rq_clock(rq) - rq->idle_stamp;
+		u64 max = 2*rq->max_idle_balance_cost;
+
+		update_avg(&rq->avg_idle, delta);
+
+		if (rq->avg_idle > max)
+			rq->avg_idle = max;
+
+		rq->idle_stamp = 0;
+	}
+}
+
 void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+	int delayed = p->se.sched_delayed;
+
 	if (!(flags & ENQUEUE_NOCLOCK))
 		update_rq_clock(rq);

@@ -2100,6 +2117,13 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)

 	if (sched_core_enabled(rq))
 		sched_core_enqueue(rq, p);
+
+	if (delayed) {
+		if (entity_eligible(cfs_rq_of(&p->se), &p->se))
+			update_rq_avg_idle(rq);
+	} else {
+		update_rq_avg_idle(rq);
+	}
 }

 /*
@@ -3645,18 +3669,6 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
 		p->sched_class->task_woken(rq, p);
 		rq_repin_lock(rq, rf);
 	}
-
-	if (rq->idle_stamp) {
-		u64 delta = rq_clock(rq) - rq->idle_stamp;
-		u64 max = 2*rq->max_idle_balance_cost;
-
-		update_avg(&rq->avg_idle, delta);
-
-		if (rq->avg_idle > max)
-			rq->avg_idle = max;
-
-		rq->idle_stamp = 0;
-	}
 }

 /*
--
2.40.1
Hello Huang Shijie,
On 11/27/2025 2:44 PM, Huang Shijie wrote:
>  void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
>  {
> +	int delayed = p->se.sched_delayed;
> +
>  	if (!(flags & ENQUEUE_NOCLOCK))
>  		update_rq_clock(rq);
>
> @@ -2100,6 +2117,13 @@ void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
>
>  	if (sched_core_enabled(rq))
>  		sched_core_enqueue(rq, p);
> +
> +	if (delayed) {
> +		if (entity_eligible(cfs_rq_of(&p->se), &p->se))
> +			update_rq_avg_idle(rq);
Question: Why do we want to treat the delayed case like this?
If entity is not eligible, we want to consider that it hasn't
even gone through a wakeup? Wouldn't this lead to the next
wakeup seeing rq->idle_stamp to be non-zero and inaccurately
account more idle time?
Also if we've done newidle balance and the rq->idle_stamp is
set, we cannot have delayed tasks since pick_next_task() would
have dequeued all delayed tasks before reaching newidle
balance.
Just doing a update_rq_avg_idle() unconditionally should be
fine.
> +	} else {
> +		update_rq_avg_idle(rq);
> +	}
>  }
>
>  /*
--
Thanks and Regards,
Prateek
On 27/11/2025 18:12, K Prateek Nayak wrote:
> Also if we've done newidle balance and the rq->idle_stamp is
> set, we cannot have delayed tasks since pick_next_task() would
> have dequeued all delayed tasks before reaching newidle
> balance.

Yes, you are right.

> Just doing a update_rq_avg_idle() unconditionally should be
> fine.

okay.

Thanks
Huang Shijie
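
For readers following the thread, the accounting that update_rq_avg_idle()
performs can be modelled in userspace. The sketch below is illustrative only
and is not kernel code: struct rq_model and its fields are hypothetical
stand-ins for the struct rq members the patch touches, the clock is set by
hand instead of being read with rq_clock(), and update_avg() is assumed to
move the average one eighth of the way toward each new sample.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct rq fields used by the patch. */
struct rq_model {
	uint64_t clock;                 /* models rq_clock(rq) */
	uint64_t idle_stamp;            /* non-zero while the CPU counts as idle */
	uint64_t avg_idle;              /* running average of idle durations */
	uint64_t max_idle_balance_cost; /* avg_idle is capped at twice this */
};

/* Assumed averaging rule: move avg 1/8 of the way toward the sample. */
static void update_avg(uint64_t *avg, uint64_t sample)
{
	int64_t diff = (int64_t)(sample - *avg);

	*avg += diff / 8;
}

/* Userspace model of the helper introduced by the patch. */
static void update_rq_avg_idle(struct rq_model *rq)
{
	uint64_t delta, max;

	if (!rq->idle_stamp)
		return; /* CPU was not marked idle: nothing to account */

	/* Model-only sanity check: the clock must not run backwards. */
	assert(rq->clock >= rq->idle_stamp);

	delta = rq->clock - rq->idle_stamp;
	max = 2 * rq->max_idle_balance_cost;

	update_avg(&rq->avg_idle, delta);
	if (rq->avg_idle > max)
		rq->avg_idle = max;

	rq->idle_stamp = 0; /* end the idle period */
}
```

Under these assumptions, with avg_idle = 1000, idle_stamp = 2000 and the
clock at 10000, one call accounts delta = 8000, giving
avg_idle = 1000 + 7000/8 = 1875, and clears idle_stamp, so a second call is
a no-op. If the update were skipped at that point and the stamp stayed set
until a later enqueue at clock 50000, the accounted delta would be 48000
instead, which is the inflated idle time the review comment warns about.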