V is the weighted average of the queued entities' vruntimes. Adding tasks
with positive lag or removing tasks with negative lag can cause V to move
backward. This results in unfair task scheduling: previously eligible
tasks become ineligible, run for shorter times, and switch more often.
For example, when tasks a, x, and b are added in that order, where a and b
have zero lag and x has positive lag, adding x pulls V backward, so task b
(added later) is placed at a lower vruntime and may be scheduled before
task a.
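As an illustration only, a minimal userspace sketch (made-up weights and
vruntimes, not the kernel's avg_vruntime()) showing the weighted average
moving backward when an entity with positive lag is added:

#include <stdio.h>

struct entity { long long vruntime; unsigned long weight; };

static long long weighted_avg(const struct entity *e, int n)
{
	long long sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum  += (long long)e[i].weight * e[i].vruntime;
		load += e[i].weight;
	}
	return load ? sum / load : 0;
}

int main(void)
{
	/* a and b start with zero lag: vruntime == V == 1000 (made-up numbers) */
	struct entity rq[3] = { { 1000, 1024 }, { 1000, 1024 } };
	long long V = weighted_avg(rq, 2);

	printf("V before adding x: %lld\n", V);	/* 1000 */

	/* x has positive lag: its vruntime lies below V */
	rq[2] = (struct entity){ 400, 1024 };
	V = weighted_avg(rq, 3);

	printf("V after adding x:  %lld\n", V);	/* 800: V moved backward */
	return 0;
}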
Allowing V to only move forward resolves such issues and simplifies the
placement code for tasks with positive lag.
hackbench tests show that with this patch, execution time is significantly
reduced due to fewer task switches.
-------------------------------------------------
hackbench test            base    patch   change
-------------------------------------------------
process       1 group:   0.141    0.100   -29.3%
process       4 group:   0.375    0.295   -21.2%
process      16 group:   1.495    1.204   -19.5%
thread        1 group:   0.090    0.068   -25.1%
thread        4 group:   0.244    0.211   -13.4%
thread       16 group:   0.860    0.795    -7.6%
pipe process  1 group:   0.124    0.090   -27.8%
pipe process  4 group:   0.340    0.289   -15.2%
pipe process 16 group:   1.401    1.144   -18.3%
pipe thread   1 group:   0.081    0.071   -11.7%
pipe thread   4 group:   0.241    0.181   -24.7%
pipe thread  16 group:   0.787    0.706   -10.2%
Signed-off-by: wangtao <tao.wangtao@honor.com>
---
kernel/sched/fair.c | 16 ++++++++++++----
kernel/sched/sched.h | 1 +
2 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5b752324270b..889ee8d4c9bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -671,7 +671,11 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
avg = div_s64(avg, load);
}
- return cfs_rq->min_vruntime + avg;
+ avg += cfs_rq->min_vruntime;
+ if ((s64)(cfs_rq->forward_avg_vruntime - avg) < 0)
+ cfs_rq->forward_avg_vruntime = avg;
+
+ return cfs_rq->forward_avg_vruntime;
}
/*
@@ -725,6 +729,9 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
s64 avg = cfs_rq->avg_vruntime;
long load = cfs_rq->avg_load;
+ if ((s64)(cfs_rq->forward_avg_vruntime - vruntime) >= 0)
+ return 1;
+
if (curr && curr->on_rq) {
unsigned long weight = scale_load_down(curr->load.weight);
@@ -5139,12 +5146,13 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
*
* EEVDF: placement strategy #1 / #2
*/
- if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag) {
+ if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag)
+ lag = se->vlag;
+ /* positive lag does not evaporate with forward_avg_vruntime */
+ if (lag < 0) {
struct sched_entity *curr = cfs_rq->curr;
unsigned long load;
- lag = se->vlag;
-
/*
* If we want to place a task and preserve lag, we have to
* consider the effect of the new entity on the weighted
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index adfb6e3409d7..2691d5e8a0ab 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -681,6 +681,7 @@ struct cfs_rq {
s64 avg_vruntime;
u64 avg_load;
+ u64 forward_avg_vruntime;
u64 min_vruntime;
#ifdef CONFIG_SCHED_CORE
--
2.17.1
On Fri, Nov 28, 2025 at 04:11:18PM +0800, wangtao wrote:
> V is the weighted average of the queued entities' vruntimes. Adding tasks
> with positive lag or removing tasks with negative lag can cause V to move
> backward. This results in unfair task scheduling:

Have you actually read the paper? Why do you think this breaks fairness?

> previously eligible tasks become ineligible, run for shorter times, and
> switch more often.

None of that is a fairness issue. Those are issues related to when, rather
than how much time is given.

> Allowing V to only move forward resolves such issues and simplifies the
> placement code for tasks with positive lag.

It breaks a metric ton of math. Which you don't provide updates for.

Yes, the paper is light on dynamic behaviour, but please don't disregard
the math like this. Either stay inside the constraints laid out, or
provide coherent alternatives.

Notably EEVDF is in the same class of scheduling functions as WF2Q and
both provide better lag bounds than the simpler WFQ class of schedulers.

The 'zero-lag point is the weighted average of the entities' is a fairly
core tenet of EEVDF. Mucking with this *will* mess with the lag bounds.

The delayed dequeue feature tries to address some of these concerns by
keeping non-eligible (negative lag) tasks on the runqueue until such time
that they become eligible (approximated by getting picked again), at which
point they get removed (and any positive lag gets truncated, as if they
were removed at zero-lag).

As a consequence you will have much less removal of negative lag;
additionally, such tasks will be eligible the moment they come back.

Also, there is the small matter that your patch simply does not apply.
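
For completeness, a sketch of the lag bookkeeping referred to above
(standard EEVDF accounting as I understand it, not part of the patch):

	V      = \Sum(w_i * v_i) / \Sum(w_i)
	lag_i  = w_i * (V - v_i)
	\Sum lag_i = 0

	If V is replaced by a clamped value V' > V, then
	\Sum w_i * (V' - v_i) = (\Sum w_i) * (V' - V) > 0,

i.e. lags measured against the clamped value no longer sum to zero, which
is the accounting the reply says the patch disturbs.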