Utilization invariance can cause big delays. When tasks are running,
accumulate the non-invariant version of utilization to help tasks
settle down to their new util_avg values faster.

Keep track of delta_exec while runnable across activations to help
update util_est for a long running task accurately. util_est should
still behave the same at enqueue/dequeue.
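For reference, a back-of-envelope model of the ramp-up this enables. It
assumes approximate_util_avg() (presumably added earlier in this series;
the diff below feeds it delta_exec in microseconds) follows the standard
PELT geometric ramp with a 32ms halflife. The kernel does this in
fixed point via __accumulate_pelt_segments(); the float math below is
only a userspace sketch, with ms instead of us for readability:

	#include <math.h>
	#include <stdio.h>

	/*
	 * Ramp 'util' towards 1024 as if the task ran flat out for
	 * 'delta_ms' of wall-clock (non-invariant) time. Each 1ms PELT
	 * period decays history by y = 0.5^(1/32).
	 */
	static double approx_util_avg(double util, double delta_ms)
	{
		double decay = pow(0.5, delta_ms / 32.0);

		return util * decay + 1024.0 * (1.0 - decay);
	}

	int main(void)
	{
		double util = 0.0;

		/* A task turning 100% busy, updated at every 4ms tick
		 * (assuming HZ=250) */
		for (int ms = 4; ms <= 64; ms += 4) {
			util = approx_util_avg(util, 4.0);
			printf("after %2dms running: util ~= %4.0f\n", ms, util);
		}
		return 0;
	}

Fed with wall-clock delta_exec, the estimate crosses ~512 after one
32ms halflife of running regardless of the current frequency, which is
what lets a ramping task escape a capacity-capped little CPU quickly.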
Before this patch a busy task ramping up would experience the following
transitions, running on an M1 Mac Mini:
rampup-6338 util_avg running
┌─────────────────────────────────────────────────────────────────────────┐
986.0┤ ▄▄▄▄▄▟▀▀▀▀│
│ ▗▄▄▟▀▀▀▘ │
│ ▗▄▟▀▀ │
│ ▄▟▀▀ │
739.5┤ ▄▟▀▘ │
│ ▗▄▛▘ │
│ ▗▟▀ │
493.0┤ ▗▛▀ │
│ ▗▄▛▀ │
│ ▄▟▀ │
│ ▄▛▘ │
246.5┤ ▗▟▀▘ │
│ ▄▟▀▀ │
│ ▗▄▄▛▘ │
│ ▗▄▄▄▟▀ │
0.0┤ ▗ ▗▄▄▟▀▀ │
└┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┬┘
1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000
───────────────── rampup-6338 util_avg running residency (ms) ──────────────────
0.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.5
15.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 7.9
36.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 8.0
57.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 8.0
78.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 7.9
98.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.0
117.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.0
137.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.0
156.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 4.0
176.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
191.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 4.0
211.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 4.0
230.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
248.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
266.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
277.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
294.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.6
311.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.4
327.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
340.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
358.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
371.0 ▇▇▇▇▇▇▇▇▇ 1.0
377.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
389.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
401.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
413.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
431.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
442.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
456.0 ▇▇▇▇▇▇▇▇▇ 1.0
───────────────────────── Sum Time Running on CPU (ms) ─────────────────────────
CPU0.0 ▇▇▇▇▇ 90.39
CPU4.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 1156.93
6338 rampup CPU0.0 Frequency
┌──────────────────────────────────────────────────────────────────────────┐
2.06┤ ▛▀▀ │
│ ▌ │
│ ▌ │
│ ▌ │
1.70┤ ▛▀▀▘ │
│ ▌ │
│ ▌ │
1.33┤ ▗▄▄▄▌ │
│ ▐ │
│ ▐ │
│ ▐ │
0.97┤ ▗▄▄▄▟ │
│ ▐ │
│ ▐ │
│ ▐ │
0.60┤ ▗ ▗▄▄▄▄▄▄▄▄▟ │
└┬───────┬───────┬───────┬───────┬────────┬───────┬───────┬───────┬───────┬┘
1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000
6338 rampup CPU4.0 Frequency
┌──────────────────────────────────────────────────────────────────────────┐
3.20┤ ▐▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀│
│ ▐ │
│ ▛▀▀ │
│ ▌ │
2.78┤ ▐▀▀▘ │
│ ▗▄▟ │
│ ▌ │
2.35┤ ▗▄▄▌ │
│ ▐ │
│ ▄▄▟ │
│ ▌ │
1.93┤ ▗▄▄▌ │
│ ▐ │
│ ▐ │
│ ▐ │
1.50┤ ▗▄▄▟ │
└┬───────┬───────┬───────┬───────┬────────┬───────┬───────┬───────┬───────┬┘
1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000
───────────────── 6338 rampup CPU0.0 Frequency residency (ms) ──────────────────
0.6   ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 37.3
0.972 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 15.0
1.332 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 15.0
1.704 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 11.0
2.064 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 12.1
───────────────── 6338 rampup CPU4.0 Frequency residency (ms) ──────────────────
1.5 ▇▇▇▇▇▇▇▇▇▇ 11.9
1.956 ▇▇▇▇▇▇▇▇ 10.0
2.184 ▇▇▇▇▇▇▇▇ 10.0
2.388 ▇▇▇▇▇▇▇▇▇ 11.0
2.592 ▇▇▇▇▇▇▇▇ 10.0
2.772 ▇▇▇▇▇▇▇▇ 10.0
2.988 ▇▇▇▇▇▇▇▇ 10.0
3.204 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 85.3
After the patch the response is improved: the task ramps up to higher
frequencies faster and migrates off the little CPU sooner:
rampup-2234 util_avg running
┌───────────────────────────────────────────────────────────────────────────┐
984┤ ▗▄▄▄▄▄▛▀▀▀▀│
│ ▄▄▟▀▀▀▀ │
│ ▄▄▟▀▀ │
│ ▄▟▀▘ │
738┤ ▄▟▀▘ │
│ ▗▟▀▘ │
│ ▗▟▀ │
492┤ ▗▟▀ │
│ ▗▟▀ │
│ ▟▀ │
│ ▄▛▘ │
246┤ ▗▟▘ │
│ ▗▟▀ │
│ ▗▟▀ │
│ ▗▟▀ │
0┤ ▄▄▄▛▀ │
└┬───────┬───────┬────────┬───────┬───────┬───────┬────────┬───────┬───────┬┘
1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000
───────────────── rampup-2234 util_avg running residency (ms) ──────────────────
0.0   ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.6
15.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 8.0
39.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.0
61.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 4.0
85.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
99.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
120.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.0
144.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
160.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
176.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
192.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
210.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
228.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
246.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
263.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
282.0 ▇▇▇▇▇▇▇ 1.0
291.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
309.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
327.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
344.0 ▇▇▇▇▇▇▇ 1.0
354.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
373.0 ▇▇▇▇▇▇▇ 1.0
382.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
400.0 ▇▇▇▇▇▇▇ 1.0
408.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
425.0 ▇▇▇▇▇▇▇ 1.0
434.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 2.0
452.0 ▇▇▇▇▇▇▇ 1.0
2234 rampup CPU1.0 Frequency
┌──────────────────────────────────────────────────────────────────────────┐
2.06┤ ▐▀ │
│ ▐ │
│ ▐ │
│ ▐ │
1.70┤ ▛▀ │
│ ▌ │
│ ▌ │
1.33┤ ▄▌ │
│ ▌ │
│ ▌ │
│ ▌ │
0.97┤ ▗▄▌ │
│ ▐ │
│ ▐ │
│ ▐ │
0.60┤ ▗▄▄▟ │
└┬───────┬───────┬───────┬───────┬────────┬───────┬───────┬───────┬───────┬┘
1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000
2234 rampup CPU4.0 Frequency
┌──────────────────────────────────────────────────────────────────────────┐
3.10┤ ▐▀▀▀▀▀▀▀▀▀▀▀▀▀│
│ ▛▀▀▀▀▀▀▀▀▀▀▀ │
│ ▌ │
│ ▐▀▀▀▀▘ │
2.70┤ ▐ │
│ ▐▀▀▀▀ │
│ ▐ │
2.30┤ ▛▀▀ │
│ ▌ │
│ ▐▀▀▘ │
│ ▐ │
1.90┤ ▐▀▀ │
│ ▐ │
│ ▗▄▟ │
│ ▐ │
1.50┤ ▗▟ │
└┬───────┬───────┬───────┬───────┬────────┬───────┬───────┬───────┬───────┬┘
1.700 1.733 1.767 1.800 1.833 1.867 1.900 1.933 1.967 2.000
───────────────────────── Sum Time Running on CPU (ms) ─────────────────────────
CPU1.0 ▇▇▇▇ 32.53
CPU4.0 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 540.3
───────────────── 2234 rampup CPU1.0 Frequency residency (ms) ──────────────────
0.6 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 12.1
0.972 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 6.5
1.332 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 3.7
1.704 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 5.5
2.064 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 4.8
───────────────── 2234 rampup CPU4.0 Frequency residency (ms) ──────────────────
1.5 ▇▇▇▇▇ 4.0
1.728 ▇▇▇▇▇▇▇▇▇▇ 8.0
1.956 ▇▇▇▇▇▇▇▇▇▇▇▇ 9.0
2.184 ▇▇▇▇▇▇▇▇▇▇▇▇ 9.0
2.388 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 11.0
2.592 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 16.0
2.772 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 18.0
2.988 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 47.0
3.096 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 53.4
Signed-off-by: Qais Yousef <qyousef@layalina.io>
---
include/linux/sched.h | 1 +
kernel/sched/core.c | 1 +
kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++------------
3 files changed, 33 insertions(+), 12 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 90691d99027e..8db8f4085d84 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -544,6 +544,7 @@ struct sched_entity {
unsigned int on_rq;
u64 exec_start;
+ u64 delta_exec;
u64 sum_exec_runtime;
u64 prev_sum_exec_runtime;
u64 vruntime;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7099e40cc8bd..e2b4b87ec2b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4318,6 +4318,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
p->se.on_rq = 0;
p->se.exec_start = 0;
+ p->se.delta_exec = 0;
p->se.sum_exec_runtime = 0;
p->se.prev_sum_exec_runtime = 0;
p->se.nr_migrations = 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e5e986af18dc..a6421e4032c0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1118,6 +1118,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
curr->exec_start = now;
curr->sum_exec_runtime += delta_exec;
+ curr->delta_exec = delta_exec;
if (schedstat_enabled()) {
struct sched_statistics *stats;
@@ -1126,7 +1127,6 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *curr)
__schedstat_set(stats->exec_max,
max(delta_exec, stats->exec_max));
}
-
return delta_exec;
}
@@ -4890,16 +4890,20 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
if (!sched_feat(UTIL_EST))
return;
- /*
- * Skip update of task's estimated utilization when the task has not
- * yet completed an activation, e.g. being migrated.
- */
- if (!task_sleep)
- return;
-
/* Get current estimate of utilization */
ewma = READ_ONCE(p->se.avg.util_est);
+ /*
+ * If a task is running, update util_est ignoring utilization
+ * invariance so that if the task suddenly becomes busy we will rampup
+ * quickly to settle down to our new util_avg.
+ */
+ if (!task_sleep) {
+ ewma &= ~UTIL_AVG_UNCHANGED;
+ ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
+ goto done;
+ }
+
/*
* If the PELT values haven't changed since enqueue time,
* skip the util_est update.
@@ -4968,6 +4972,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
trace_sched_util_est_se_tp(&p->se);
}
+static inline void util_est_update_running(struct cfs_rq *cfs_rq,
+ struct task_struct *p)
+{
+ util_est_dequeue(cfs_rq, p);
+ util_est_update(cfs_rq, p, false);
+ util_est_enqueue(cfs_rq, p);
+}
+
static inline unsigned long get_actual_cpu_capacity(int cpu)
{
unsigned long capacity = arch_scale_cpu_capacity(cpu);
@@ -5164,13 +5176,13 @@ static inline int sched_balance_newidle(struct rq *rq, struct rq_flags *rf)
static inline void
util_est_enqueue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
-
static inline void
util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
-
static inline void
-util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p,
- bool task_sleep) {}
+util_est_update(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep) {}
+static inline void
+util_est_update_running(struct cfs_rq *cfs_rq, struct task_struct *p) {}
+
static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
#endif /* CONFIG_SMP */
@@ -6906,6 +6918,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
rq->next_balance = jiffies;
dequeue_throttle:
+ if (task_sleep)
+ p->se.delta_exec = 0;
util_est_update(&rq->cfs, p, task_sleep);
hrtick_update(rq);
}
@@ -8546,6 +8560,9 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
set_next_entity(cfs_rq, se);
}
+ if (prev->on_rq)
+ util_est_update_running(&rq->cfs, prev);
+
goto done;
simple:
#endif
@@ -12710,6 +12727,8 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
entity_tick(cfs_rq, se, queued);
}
+ util_est_update_running(&rq->cfs, curr);
+
if (static_branch_unlikely(&sched_numa_balancing))
task_tick_numa(rq, curr);
--
2.34.1
On 8/20/24 17:35, Qais Yousef wrote:
> Utilization invariance can cause big delays. When tasks are running,
> accumulate the non-invariant version of utilization to help tasks
> settle down to their new util_avg values faster.
>
> Keep track of delta_exec while runnable across activations to help
> update util_est for a long running task accurately. util_est should
> still behave the same at enqueue/dequeue.

For periodic tasks that have longer slices (~tick) this overestimates
util_est by a lot. AFAICS this also breaks util_est for co-scheduled
tasks of different slice lengths.

I'm testing with HZ=1000, but it should show up for any HZ. On a
RK3399, all tasks pinned to one big CPU: task A has a 10ms period with
20% util (running for 2ms when scheduled) and tasks B+ have a 1ms
period and 1% util.

I guess 9/16 tries to work around that somewhat, but without any leeway
that doesn't work. Even rt-app tasks will vary slightly in their
util_est values:

Task A only:
  mainline:
    A:        util_avg: 192  util_est: 204
  9/16 "sched/fair: util_est: Take into account periodic tasks":
    A:        util_avg: 185  util_est: 423

8 tasks:
  mainline:
    A:        util_avg: 229  util_est: 229
    The rest: util_avg:  12  util_est:  24
  9/16 "sched/fair: util_est: Take into account periodic tasks":
    A:        util_avg: 242  util_est: 643
    The rest: util_avg:  12  util_est:  50
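To see roughly where the inflation comes from, a back-of-envelope model
(only illustrative: it ignores util_est's ramp-up shortcut and the
UTIL_AVG_UNCHANGED handling, and pins util_avg at its steady state):

	#include <math.h>
	#include <stdio.h>

	/* Non-invariant PELT ramp over 'ms' milliseconds (32ms halflife) */
	static double pelt_ramp(double util, double ms)
	{
		double decay = pow(0.5, ms / 32.0);

		return util * decay + 1024.0 * (1.0 - decay);
	}

	int main(void)
	{
		/*
		 * Task A above: 2ms running per 10ms period at HZ=1000,
		 * i.e. two 1ms ticks per activation.
		 */
		double util_avg = 200.0, ewma = 200.0;

		for (int period = 0; period < 200; period++) {
			/* patched !task_sleep path: one bump per tick */
			for (int tick = 0; tick < 2; tick++)
				ewma = pelt_ramp(ewma, 1.0);
			/* dequeue: EWMA moves 1/4 towards dequeued util */
			ewma += (util_avg - ewma) * 0.25;
		}
		printf("util_est settles at ~%.0f vs util_avg ~%.0f\n",
		       ewma, util_avg);
		return 0;
	}

The two tick-time bumps add roughly 30-35 per period while the dequeue
update only pulls a quarter of the gap back, so util_est settles around
~290 in this crude model, well above the ~200 util_avg of a 20% task --
qualitatively the direction of the numbers measured above.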
On 20/08/2024 18:35, Qais Yousef wrote:
> Utilization invariance can cause big delays. When tasks are running,
> accumulate the non-invariant version of utilization to help tasks
> settle down to their new util_avg values faster.
>
> Keep track of delta_exec while runnable across activations to help
> update util_est for a long running task accurately. util_est should
> still behave the same at enqueue/dequeue.
>
> Before this patch a busy task ramping up would experience the
> following transitions, running on an M1 Mac Mini:
[...]
> @@ -4890,16 +4890,20 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> if (!sched_feat(UTIL_EST))
> return;
>
> - /*
> - * Skip update of task's estimated utilization when the task has not
> - * yet completed an activation, e.g. being migrated.
> - */
> - if (!task_sleep)
> - return;
> -
> /* Get current estimate of utilization */
> ewma = READ_ONCE(p->se.avg.util_est);
>
> + /*
> + * If a task is running, update util_est ignoring utilization
> + * invariance so that if the task suddenly becomes busy we will rampup
> + * quickly to settle down to our new util_avg.
> + */
> + if (!task_sleep) {
> + ewma &= ~UTIL_AVG_UNCHANGED;
> + ewma = approximate_util_avg(ewma, p->se.delta_exec / 1000);
> + goto done;
> + }
> +
Can you not use the UTIL_EST_FASTER idea for that? I.e. speed up
ramp-up on little CPUs for genuinely ramping-up tasks to fight the
influence of the invariant util_avg->util_est here.
https://lkml.kernel.org/r/Y2kLA8x40IiBEPYg@hirez.programming.kicks-ass.net
I do understand that runnable_avg boosting wont help here since we're
not fighting contention.
It uses the sum of all activations since wake-up so it should be faster
than just using the last activation.
It uses existing infrastructure: __accumulate_pelt_segments()
If you use it inside the task- and/or cpu-util functions, you don't
need to make util_est state handling more complicated (distinguishing
periodic and ramp-up tasks, including PATCH 09/16).
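For illustration, the core of that idea looked roughly like this
(reconstructed from the linked thread rather than copied from it, so
treat the helper name and scaling as approximate; it assumes access to
the file-local __accumulate_pelt_segments() and PELT_MIN_DIVIDER from
kernel/sched/pelt.{c,h}):

	/*
	 * Sketch only: ramp a zero-history PELT sum over 'delta' (us of
	 * runtime in the current activation) and scale the contribution
	 * to capacity units. The proposal then takes
	 * max(util_est, faster_est_approx(delta)) in the task/cpu
	 * utilization helpers while the task is running.
	 */
	static unsigned long faster_est_approx(u64 delta)
	{
		unsigned long contrib = (unsigned long)delta; /* p == 0 */
		u64 periods = delta / 1024;

		if (periods) {
			delta %= 1024;
			contrib = __accumulate_pelt_segments(periods, 1024, delta);
		}

		return (contrib << SCHED_CAPACITY_SHIFT) / PELT_MIN_DIVIDER;
	}

Because it accumulates over the whole activation's runtime rather than
the last delta_exec, it ramps at least as fast without any per-task
state beyond what the scheduler already tracks.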
From your workload analysis, do you have examples of Android tasks which
are clearly ramp-up tasks and maybe also affine to the little CPUs
(thanks to Android BACKGROUND group) which would require this correction
of the invariant util_avg->util_est signals?
[...]