Typically, the default priority for client tasks is nice0, and reducing
the conversion of virtual runtime to real time for nice0 tasks can
significantly reduce unnecessary computations.
Signed-off-by: Li kunyu <likunyu10@163.com>
---
kernel/sched/fair.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69361c63353a..74d1c77a8bcf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7033,7 +7033,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 		resched_curr(rq);
 		return;
 	}
-	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
+	if (unlikely(se->load.weight != NICE_0_LOAD))
+		delta = (se->load.weight * vdelta) / NICE_0_LOAD;
+	else
+		delta = vdelta;
 
 	/*
 	 * Correct for instantaneous load of other classes.
--
2.47.3
On 5/29/2026 10:34 AM, Li kunyu wrote:
> Typically, the default priority for client tasks is nice0, and reducing
> the conversion of virtual runtime to real time for nice0 tasks can
> significantly reduce unnecessary computations.
>
> Signed-off-by: Li kunyu <likunyu10@163.com>
> ---
>  kernel/sched/fair.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69361c63353a..74d1c77a8bcf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7033,7 +7033,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>  		resched_curr(rq);
>  		return;
>  	}
> -	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> +	if (unlikely(se->load.weight != NICE_0_LOAD))
> +		delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> +	else
> +		delta = vdelta;
>
>  	/*
>  	 * Correct for instantaneous load of other classes.

Given NICE_0_LOAD is a nice power-of-two which compiles down to just a
bit shift, it seems interesting that you would find the multiplication
to be 'significant unnecessary computations'. Do you have any data to
support this?

Also, unless my eyes are failing, it looks like you had a patch just a
couple of days ago at the exact same location

https://lore.kernel.org/all/20260527072113.359604-1-likunyu10@163.com/

in which Peter pointed out you might have made a mistake. Is this
somehow connected to that one?

Hongyan
On Fri, May 29, 2026 at 07:37:07AM +0000, Hongyan Xia wrote:
> On 5/29/2026 10:34 AM, Li kunyu wrote:
> > Typically, the default priority for client tasks is nice0, and reducing
> > the conversion of virtual runtime to real time for nice0 tasks can
> > significantly reduce unnecessary computations.
> >
> > Signed-off-by: Li kunyu <likunyu10@163.com>
> > ---
> >  kernel/sched/fair.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 69361c63353a..74d1c77a8bcf 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7033,7 +7033,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >  		resched_curr(rq);
> >  		return;
> >  	}
> > -	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> > +	if (unlikely(se->load.weight != NICE_0_LOAD))
> > +		delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> > +	else
> > +		delta = vdelta;
> >
> >  	/*
> >  	 * Correct for instantaneous load of other classes.
>
> Given NICE_0_LOAD is a nice power-of-two which compiles down to just a
> bit shift, it seems interesting that you would find the multiplication
> to be 'significant unnecessary computations'. Do you have any data to
> support this?

Notably, branches can be many times more expensive than a mult on modern
deeply pipelined machines. Divisions are a bit of a mixed bag, but mult
is generally dirt cheap.

According to Gemini we have something like so:

+---------------------------------------------------------------------------------------------------+
| ARCHITECTURE      | BRANCH MISPREDICT PENALTY  | 64-BIT INTEGER MULTIPLY | 64-BIT INTEGER DIVIDE  |
| (Modern Cores)    | (Clock Cycles)             | (Latency / Throughput)  | (Latency / Throughput) |
+-------------------+----------------------------+-------------------------+------------------------+
|                   |                            |                         |                        |
| Apple M-Series    | 16 to 20 cycles            | 3 to 4 cycles           | 7 to 9 cycles          |
| (M1 through M5)   |                            | 0.5 cycle thr. (2/clk)  | 2 cycles throughput    |
|                   |                            |                         |                        |
+-------------------+----------------------------+-------------------------+------------------------+
|                   |                            |                         |                        |
| Intel Core        | 14 to 15 cycles *          | 3 cycles                | 18 to 25 cycles        |
| (Panther / Arrow) |                            | 1 cycle thr. (1/clk)    | 10 to 15 cycles thr.   |
|                   |                            |                         |                        |
+-------------------+----------------------------+-------------------------+------------------------+
|                   |                            |                         |                        |
| AMD Zen           | 17 to 20 cycles            | 3 cycles                | 12 to 14 cycles        |
| (Zen 4 / Zen 5)   |                            | 1 cycle thr. (1/clk)    | 3 to 4 cycles thr.     |
|                   |                            |                         |                        |
+-------------------+----------------------------+-------------------------+------------------------+

So the branch in calc_delta_fair() might still be justified, esp. if it
is predicted well. But like Hongyan noted, not in this case.
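
For readers following along, here is a minimal userspace sketch of the
point made above: dividing by a power-of-two constant is equivalent to a
right shift, so the patched branch only skips a multiply and a shift.
This is not kernel code. It assumes a 64-bit configuration where
NICE_0_LOAD is 1 << 20, uses unsigned 64-bit values throughout, and the
function names (delta_plain, delta_shift, delta_branch,
check_equivalence) are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit configuration, where NICE_0_LOAD is 1 << 20. */
#define NICE_0_LOAD_SHIFT 20
#define NICE_0_LOAD (UINT64_C(1) << NICE_0_LOAD_SHIFT)

/* Original form: multiply, then divide by a power-of-two constant. */
static uint64_t delta_plain(uint64_t weight, uint64_t vdelta)
{
	return (weight * vdelta) / NICE_0_LOAD;
}

/* The same computation with the division written as an explicit shift. */
static uint64_t delta_shift(uint64_t weight, uint64_t vdelta)
{
	return (weight * vdelta) >> NICE_0_LOAD_SHIFT;
}

/* Patched form: skip the arithmetic when the weight is the nice-0 weight. */
static uint64_t delta_branch(uint64_t weight, uint64_t vdelta)
{
	if (weight != NICE_0_LOAD)
		return (weight * vdelta) / NICE_0_LOAD;
	return vdelta;
}

/* Check that all three forms agree for one (weight, vdelta) pair. */
static void check_equivalence(uint64_t weight, uint64_t vdelta)
{
	assert(delta_plain(weight, vdelta) == delta_shift(weight, vdelta));
	assert(delta_plain(weight, vdelta) == delta_branch(weight, vdelta));
}
```

All three helpers return the same value for any inputs whose product
fits in 64 bits, so the patch changes no results. Whether the branch is
cheaper than the multiply and shift can only be settled by measurement
on the target hardware.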
Hello maintainer:

Thank you very much for replying to my email.

In the previous patch, the conditional was added to check for nice0; the
change to the conversion of virtual runtime to actual time was a
careless mistake on my part.

https://lore.kernel.org/all/20260527072113.359604-1-likunyu10@163.com/

As for the conditional in the newly submitted patch, I have checked
multiple production machines and my own development machine, and more
than half of the tasks run at nice0 priority.

I still believe this modification brings some benefit, and it keeps the
code consistent with the conversion of actual runtime to virtual
runtime.

Thanks