kernel/sched/fair: Fix to not require calculation for the weight nice0

[PATCH] kernel/sched/fair: Fix to not require calculation for the weight nice0

Posted by Li kunyu 1 week, 3 days ago

Typically, the default priority for client tasks is nice0, and reducing
the conversion of virtual runtime to real time for nice0 tasks can
significantly reduce unnecessary computations.

Signed-off-by: Li kunyu <likunyu10@163.com>
---
 kernel/sched/fair.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69361c63353a..74d1c77a8bcf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7033,7 +7033,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
 			resched_curr(rq);
 		return;
 	}
-	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
+	if (unlikely(se->load.weight != NICE_0_LOAD))
+		delta = (se->load.weight * vdelta) / NICE_0_LOAD;
+	else
+		delta = vdelta;
 
 	/*
 	 * Correct for instantaneous load of other classes.
-- 
2.47.3

Re: [PATCH] kernel/sched/fair: Fix to not require calculation for the weight nice0

Posted by Hongyan Xia 1 week, 3 days ago

On 5/29/2026 10:34 AM, Li kunyu wrote:
> Typically, the default priority for client tasks is nice0, and reducing
> the conversion of virtual runtime to real time for nice0 tasks can
> significantly reduce unnecessary computations.
> 
> Signed-off-by: Li kunyu <likunyu10@163.com>
> ---
>   kernel/sched/fair.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69361c63353a..74d1c77a8bcf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7033,7 +7033,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
>   			resched_curr(rq);
>   		return;
>   	}
> -	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> +	if (unlikely(se->load.weight != NICE_0_LOAD))
> +		delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> +	else
> +		delta = vdelta;
>   
>   	/*
>   	 * Correct for instantaneous load of other classes.

Given NICE_0_LOAD is a nice power-of-two which compiles down to just a 
bit shift, it seems interesting that you would find the multiplication 
to be 'significant unnecessary computations'. Do you have any data to 
support this?

Also, unless my eyes are failing, it looks like you had a patch just a 
couple of days ago at the exact same location

https://lore.kernel.org/all/20260527072113.359604-1-likunyu10@163.com/

in which Peter pointed out you might have made a mistake. Is this
somehow connected to that one?

Hongyan

Re: [PATCH] kernel/sched/fair: Fix to not require calculation for the weight nice0

Posted by Peter Zijlstra 1 week, 3 days ago

On Fri, May 29, 2026 at 07:37:07AM +0000, Hongyan Xia wrote:
> On 5/29/2026 10:34 AM, Li kunyu wrote:
> > Typically, the default priority for client tasks is nice0, and reducing
> > the conversion of virtual runtime to real time for nice0 tasks can
> > significantly reduce unnecessary computations.
> > 
> > Signed-off-by: Li kunyu <likunyu10@163.com>
> > ---
> >   kernel/sched/fair.c | 5 ++++-
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 69361c63353a..74d1c77a8bcf 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7033,7 +7033,10 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
> >   			resched_curr(rq);
> >   		return;
> >   	}
> > -	delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> > +	if (unlikely(se->load.weight != NICE_0_LOAD))
> > +		delta = (se->load.weight * vdelta) / NICE_0_LOAD;
> > +	else
> > +		delta = vdelta;
> >   
> >   	/*
> >   	 * Correct for instantaneous load of other classes.
> 
> Given NICE_0_LOAD is a nice power-of-two which compiles down to just a 
> bit shift, it seems interesting that you would find the multiplication 
> to be 'significant unnecessary computations'. Do you have any data to 
> support this?

Notably, branches can be many times more expensive than a mult on modern
deeply pipelined machines. Divisions are a bit of a mixed bag, but mult
is generally dirt cheap.

According to Gemini we have something like so:

+-------------------+----------------------------+-------------------------+------------------------+
| ARCHITECTURE      | BRANCH MISPREDICT PENALTY  | 64-BIT INTEGER MULTIPLY | 64-BIT INTEGER DIVIDE  |
| (Modern Cores)    | (Clock Cycles)             | (Latency / Throughput)  | (Latency / Throughput) |
+-------------------+----------------------------+-------------------------+------------------------+
| Apple M-Series    | 16 to 20 cycles            | 3 to 4 cycles           | 7 to 9 cycles          |
| (M1 through M5)   |                            | 0.5 cycle thr. (2/clk)  | 2 cycles throughput    |
+-------------------+----------------------------+-------------------------+------------------------+
| Intel Core        | 14 to 15 cycles *          | 3 cycles                | 18 to 25 cycles        |
| (Panther / Arrow) |                            | 1 cycle thr. (1/clk)    | 10 to 15 cycles thr.   |
+-------------------+----------------------------+-------------------------+------------------------+
| AMD Zen           | 17 to 20 cycles            | 3 cycles                | 12 to 14 cycles        |
| (Zen 4 / Zen 5)   |                            | 1 cycle thr. (1/clk)    | 3 to 4 cycles thr.     |
+-------------------+----------------------------+-------------------------+------------------------+


So the branch in calc_delta_fair() might still be justified, esp. if it
is predicted well. But like Hongyan noted, not in this case.

Re: [PATCH] kernel/sched/fair: Fix to not require calculation for the weight nice0

Posted by Li kunyu 1 week, 3 days ago

Hello maintainer:
  I am very happy that you could reply to my email.

  The conditional in the previous patch was added to check for nice0,
while the change it made to the conversion of virtual runtime to actual
time was a careless mistake.

  https://lore.kernel.org/all/20260527072113.359604-1-likunyu10@163.com/

  Regarding the conditional in the newly submitted patch, I have
verified it on multiple business machines and on my own development
machine. Over half of the tasks run at the nice0 priority level.

  I still believe that this change will bring some benefit, and it keeps
the code consistent with the conversion of actual runtime to virtual
runtime.

  Thanks