Run to parity ensures that current will get a chance to run its full
slice in one go, but this can create large latency for an entity with a
shorter slice that has already exhausted its slice and is waiting to run
the next one.
Clamp the run to parity duration to the shortest slice of all enqueued
entities.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 479b38dc307a..d8345219dfd4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -917,23 +917,32 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
}
/*
- * HACK, stash a copy of deadline at the point of pick in vlag,
- * which isn't used until dequeue.
+ * HACK, Set the vruntime, up to which the entity can run before picking
+ * another one, in vlag, which isn't used until dequeue.
+ * In case of run to parity, we use the shortest slice of the enqueued
+ * entities.
*/
static inline void set_protect_slice(struct sched_entity *se)
{
- se->vlag = se->deadline;
+ u64 min_slice;
+
+ min_slice = cfs_rq_min_slice(cfs_rq_of(se));
+
+ if (min_slice != se->slice)
+ se->vlag = min(se->deadline, se->vruntime + calc_delta_fair(min_slice, se));
+ else
+ se->vlag = se->deadline;
}
static inline bool protect_slice(struct sched_entity *se)
{
- return se->vlag == se->deadline;
+ return ((s64)(se->vlag - se->vruntime) > 0);
}
static inline void cancel_protect_slice(struct sched_entity *se)
{
if (protect_slice(se))
- se->vlag = se->deadline + 1;
+ se->vlag = se->vruntime;
}
/*
--
2.43.0
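[Editor's illustration, not part of the patch: a minimal userspace sketch of
the new set_protect_slice() arithmetic. The simplified calc_delta_fair()
(delta * NICE_0_WEIGHT / weight; the kernel uses a fixed-point inverse
weight) and the example values are assumptions for illustration only.]

#include <stdio.h>
#include <stdint.h>

#define NICE_0_WEIGHT 1024ULL

/*
 * Simplified stand-in for the kernel's calc_delta_fair(): scale a
 * wall-time delta into vruntime by NICE_0_WEIGHT / weight.
 */
static uint64_t calc_delta_fair_sim(uint64_t delta, uint64_t weight)
{
	return delta * NICE_0_WEIGHT / weight;
}

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* current: 3ms slice at nice-0; another runnable entity has a 0.7ms slice */
	uint64_t vruntime  = 100000000ULL;            /* ns */
	uint64_t deadline  = 100000000ULL + 3000000ULL;
	uint64_t slice     = 3000000ULL;              /* current's slice */
	uint64_t min_slice = 700000ULL;               /* shortest enqueued slice */
	uint64_t weight    = NICE_0_WEIGHT;

	uint64_t vlag = (min_slice != slice)
		? min_u64(deadline, vruntime + calc_delta_fair_sim(min_slice, weight))
		: deadline;

	/* protection now ends 0.7ms of vruntime past the pick, not at the 3ms deadline */
	printf("protected window: %llu ns\n",
	       (unsigned long long)(vlag - vruntime));
	return 0;
}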
On Friday, June 13th, 2025 at 7:16 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:

> Run to parity ensures that current will get a chance to run its full
> slice in one go but this can create large latency for entity with shorter
> slice that has alreasy exausted its slice and wait to run the next one.

"already exhausted"

> Clamp the run to parity duration to the shortest slice of all enqueued
> entities.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>
[...]
> + * HACK, Set the vruntime, up to which the entity can run before picking
> + * another one, in vlag, which isn't used until dequeue.
> + * In case of run to parity, we use the shortest slice of the enqueued
> + * entities.
>   */

I am going to admit - I don't have a good intuitive sense on how this
will affect the functionality. Maybe you can help me think of a test
case to explicitly write out this assumption in behavior?

Dhaval

[...]
On Sat, 14 Jun 2025 at 00:53, <dhaval@gianis.ca> wrote:
>
> On Friday, June 13th, 2025 at 7:16 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> > Run to parity ensures that current will get a chance to run its full
> > slice in one go but this can create large latency for entity with shorter
> > slice that has alreasy exausted its slice and wait to run the next one.
>
> "already exhausted"
>
[...]
> I am going to admit - I don't have a good intuitive sense on how this
> will affect the functionality. Maybe you can help me think of a test
> case to explicitly write out this assumption in behavior?

Run to parity minimizes the number of context switches to improve
throughput by letting an entity run its full slice before picking
another entity. When all entities have the same and default
sysctl_sched_base_slice, the latter can be assumed to also be the
quantum q (although this is not really true, as the entity can be
preempted during its quantum in our case). In that case, we still
comply with the theorem:

    -r_max < lag_k(d) < max(r_max, q)

r_max being the max slice request of task k.

When entities have different slice durations, we break this rule,
which becomes:

    -r_max < lag_k(d) < max(max of r, q)

'max of r' being the maximum slice of all entities.

In order to come back to the 1st version, we can't wait for the end of
the current task's slice but must align with the shorter slice.

When run to parity is disabled, we can face a similar problem because
we don't enforce a resched periodically. In this case (patch 5), we
use the 0.7ms value as the quantum q.

So I would say that checking

    -r_max < lag_k(d) < max(r_max, q)

when a task is dequeued should be a good test. We might need to use

    -(r_max + tick period) < lag_k(d) < max(r_max, q) + tick period

because of the way we trigger resched.

> Dhaval
[...]
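[Editor's illustration of the dequeue-time check Vincent suggests: a
minimal userspace sketch. The 1ms tick period, the harness supplying lag
and r_max, and the example values are assumptions, not from the thread.]

#include <stdio.h>
#include <stdint.h>

#define TICK_NSEC     1000000LL   /* assumed 1ms tick (HZ=1000) */
#define QUANTUM_NSEC   700000LL   /* the 0.7ms quantum q from patch 5 */

static int64_t max_s64(int64_t a, int64_t b)
{
	return a > b ? a : b;
}

/*
 * Check -(r_max + tick) < lag_k(d) < max(r_max, q) + tick for one entity
 * at dequeue time; lag and r_max (its maximum slice request) would come
 * from a tracing harness.
 */
static int lag_in_bounds(int64_t lag, int64_t r_max)
{
	return lag > -(r_max + TICK_NSEC) &&
	       lag < max_s64(r_max, QUANTUM_NSEC) + TICK_NSEC;
}

int main(void)
{
	int64_t r_max = 3000000;  /* 3ms max slice request */

	printf("lag = -2ms in bounds? %d\n", lag_in_bounds(-2000000, r_max));
	printf("lag = -5ms in bounds? %d\n", lag_in_bounds(-5000000, r_max));
	return 0;
}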