sched/fair: set the se->vlag strictly following the paper

[PATCH] sched/fair: set the se->vlag strictly following the paper

Posted by Huang Shijie 7 months, 3 weeks ago

From the paper, the lag should follow the limit:
     -r_max < lag < max(r_max, q)

But current code makes the lag follow the limit:
     -max(r_max, q) < lag < max(r_max, q)

This patch introduces limit_hi/limit_lo/r_max, and
make the lag follow the paper strictly.

Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
---
 kernel/sched/fair.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83157de5b808..102d263df330 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -694,14 +694,17 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
  */
 static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	s64 vlag, limit;
+	s64 vlag, limit_hi, limit_lo, r_max;
 
 	WARN_ON_ONCE(!se->on_rq);
 
 	vlag = avg_vruntime(cfs_rq) - se->vruntime;
-	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
+	r_max = 2 * se->slice;
 
-	se->vlag = clamp(vlag, -limit, limit);
+	limit_hi = calc_delta_fair(max_t(u64, r_max, TICK_NSEC), se);
+	limit_lo = calc_delta_fair(r_max, se);
+
+	se->vlag = clamp(vlag, -limit_lo, limit_hi);
 }
 
 /*
-- 
2.40.1

Re: [PATCH] sched/fair: set the se->vlag strictly following the paper

Posted by Vincent Guittot 7 months, 3 weeks ago

On Thu, 19 Jun 2025 at 05:20, Huang Shijie
<shijie@os.amperecomputing.com> wrote:
>
> From the paper, the lag should follow the limit:
>      -r_max < lag < max(r_max, q)
>
> But current code makes the lag follow the limit:
>      -max(r_max, q) < lag < max(r_max, q)
>
> This patch introduces limit_hi/limit_lo/r_max, and
> make the lag follow the paper strictly.

We don't strictly follow the paper. Typically, paper assumes that a
task will not run more than its slice r before deciding which task is
the next to run. But this is not our case as we must wait for a sched
event like the tick before picking next  task which can be longer than
the slice r

Side note, we don't have a fix definition of the quantum q which is
something between 0 and a tick (and even more currently with run to
parity) as we wait for the next the tick to pick another task

This means that a task can run a full tick period even if its slice is
shorter than the tick period

>
> Signed-off-by: Huang Shijie <shijie@os.amperecomputing.com>
> ---
>  kernel/sched/fair.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 83157de5b808..102d263df330 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -694,14 +694,17 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
>   */
>  static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
>  {
> -       s64 vlag, limit;
> +       s64 vlag, limit_hi, limit_lo, r_max;
>
>         WARN_ON_ONCE(!se->on_rq);
>
>         vlag = avg_vruntime(cfs_rq) - se->vruntime;
> -       limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> +       r_max = 2 * se->slice;
>
> -       se->vlag = clamp(vlag, -limit, limit);
> +       limit_hi = calc_delta_fair(max_t(u64, r_max, TICK_NSEC), se);
> +       limit_lo = calc_delta_fair(r_max, se);
> +
> +       se->vlag = clamp(vlag, -limit_lo, limit_hi);
>  }
>
>  /*
> --
> 2.40.1
>

Re: [PATCH] sched/fair: set the se->vlag strictly following the paper

Posted by Shijie Huang 7 months, 3 weeks ago

On 2025/6/19 21:53, Vincent Guittot wrote:
> On Thu, 19 Jun 2025 at 05:20, Huang Shijie
> <shijie@os.amperecomputing.com> wrote:
>>  From the paper, the lag should follow the limit:
>>       -r_max < lag < max(r_max, q)
>>
>> But current code makes the lag follow the limit:
>>       -max(r_max, q) < lag < max(r_max, q)
>>
>> This patch introduces limit_hi/limit_lo/r_max, and
>> make the lag follow the paper strictly.
> We don't strictly follow the paper. Typically, paper assumes that a
> task will not run more than its slice r before deciding which task is
> the next to run. But this is not our case as we must wait for a sched
> event like the tick before picking next  task which can be longer than
> the slice r
>
> Side note, we don't have a fix definition of the quantum q which is
> something between 0 and a tick (and even more currently with run to
> parity) as we wait for the next the tick to pick another task
>
> This means that a task can run a full tick period even if its slice is
> shorter than the tick period

Thanks for the explanations.

But if we enable the HRTICK, the task will run to match its slice.


Thanks

Huang Shijie

Re: [PATCH] sched/fair: set the se->vlag strictly following the paper

Posted by Vincent Guittot 7 months, 3 weeks ago

On Fri, 20 Jun 2025 at 05:01, Shijie Huang
<shijie@amperemail.onmicrosoft.com> wrote:
>
>
> On 2025/6/19 21:53, Vincent Guittot wrote:
> > On Thu, 19 Jun 2025 at 05:20, Huang Shijie
> > <shijie@os.amperecomputing.com> wrote:
> >>  From the paper, the lag should follow the limit:
> >>       -r_max < lag < max(r_max, q)
> >>
> >> But current code makes the lag follow the limit:
> >>       -max(r_max, q) < lag < max(r_max, q)
> >>
> >> This patch introduces limit_hi/limit_lo/r_max, and
> >> make the lag follow the paper strictly.
> > We don't strictly follow the paper. Typically, paper assumes that a
> > task will not run more than its slice r before deciding which task is
> > the next to run. But this is not our case as we must wait for a sched
> > event like the tick before picking next  task which can be longer than
> > the slice r
> >
> > Side note, we don't have a fix definition of the quantum q which is
> > something between 0 and a tick (and even more currently with run to
> > parity) as we wait for the next the tick to pick another task
> >
> > This means that a task can run a full tick period even if its slice is
> > shorter than the tick period
>
> Thanks for the explanations.
>
> But if we enable the HRTICK, the task will run to match its slice.

Yes, but I'm curious to see the impact of irq time accounting on this
as the time will not be fully accounted as the slice

>
>
> Thanks
>
> Huang Shijie
>