If a task yields, the scheduler may decide to pick it again. The task in
turn may decide to yield immediately or shortly after, leading to a tight
loop of yields.
If there's another runnable task at this point, the deadline will be
increased by the slice at each loop. This can cause the deadline to run
away pretty quickly, leading to elevated run delays later on because the
task doesn't get picked again. The reason the scheduler can pick the same
task again and again despite its deadline increasing is that it may be the
only eligible task at that point.
Fix this by updating the deadline only to one slice ahead.
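Purely as an illustration of the difference (not part of the patch), here is
a small userspace model where a constant stands in for
calc_delta_fair(se->slice, se) and vruntime stays put because the task
yields again before accumulating any meaningful runtime:

#include <stdio.h>

/*
 * Toy model of the yield loop, not kernel code: SLICE stands in for
 * calc_delta_fair(se->slice, se), and vruntime does not advance because
 * the task yields again almost immediately after being picked.
 */
int main(void)
{
	const unsigned long long SLICE = 3000000ULL;	/* ~3ms, arbitrary */
	unsigned long long vruntime = 0, deadline = SLICE;

	for (int i = 0; i < 5; i++) {
		deadline += SLICE;			/* current behaviour: runs away */
		/* deadline = vruntime + SLICE; */	/* fixed behaviour: stays bounded */
		printf("yield %d: deadline - vruntime = %llu\n",
		       i, deadline - vruntime);
	}
	return 0;
}

With the += form the gap between vruntime and deadline grows by one slice
per yield; with the fixed form it never exceeds one slice.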
Note, we might want to consider iterating on the implementation of yield as
a follow-up:
* the yielding task could be forfeiting its remaining slice by
  incrementing its vruntime correspondingly (see the sketch below)
* in case of yield_to, the yielding task could be donating its remaining
  slice to the target task
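For what it's worth, one possible reading of the first bullet, in the same
toy model as above (a sketch of the idea only, not kernel code or a tested
change; the values are arbitrary):

#include <stdio.h>

/*
 * Sketch of the "forfeit remaining slice" idea: before setting the new
 * deadline one slice ahead, advance vruntime to the old deadline so the
 * yielder is charged for the part of its slice it gave up.  Illustrative
 * only.
 */
int main(void)
{
	unsigned long long slice = 3000000ULL;
	unsigned long long vruntime = 1000000ULL, deadline = 4000000ULL;

	vruntime = deadline;		/* forfeit the unused part of the slice */
	deadline = vruntime + slice;	/* then one slice ahead, as in the fix */

	printf("vruntime=%llu deadline=%llu\n", vruntime, deadline);
	return 0;
}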
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e43993a4e580..c1eff68d8ffc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9024,7 +9024,7 @@ static void yield_task_fair(struct rq *rq)
*/
rq_clock_skip_update(rq);
- se->deadline += calc_delta_fair(se->slice, se);
+ se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
}
static bool yield_to_task_fair(struct rq *rq, struct task_struct *p)
--
2.47.1
On 01.04.25 14:36, Fernand Sieber wrote:
> If a task yields, the scheduler may decide to pick it again. The task in
> turn may decide to yield immediately or shortly after, leading to a tight
> loop of yields.
>
> If there's another runnable task at this point, the deadline will be
> increased by the slice at each loop. This can cause the deadline to run
> away pretty quickly, leading to elevated run delays later on because the
> task doesn't get picked again. The reason the scheduler can pick the same
> task again and again despite its deadline increasing is that it may be the
> only eligible task at that point.
>
> Fix this by updating the deadline only to one slice ahead.
>
> Note, we might want to consider iterating on the implementation of yield as
> a follow-up:
> * the yielding task could be forfeiting its remaining slice by
> incrementing its vruntime correspondingly
> * in case of yield_to the yielding task could be donating its remaining
> slice to the target task
>
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>
IMHO it's worth noting that this is not a theoretical issue. We have
seen this in real life: a KVM virtual machine's vCPU that runs into a
busy guest spin lock calls kvm_vcpu_yield_to(), which eventually ends up
in yield_task_fair(). We have seen such spin locks due to guest
contention rather than host overcommit, which means we go into a loop of
vCPU execution and spin-loop exit, resulting in an undesirable increase
in the vCPU thread's deadline.
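As an aside, the shape of this loop can be approximated from userspace
without KVM; the sketch below is only an approximation of the yield
pattern, not the actual vCPU path, and the CPU number and iteration count
are arbitrary:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/*
 * Rough userspace approximation of the tight yield loop (not the KVM
 * path): pin the task to one CPU and call sched_yield() repeatedly.
 * To see the effect, run another CPU-bound task pinned to the same CPU
 * so there is a second runnable (but possibly ineligible) task.
 */
int main(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(1, &set);		/* CPU 1 chosen arbitrarily */
	if (sched_setaffinity(0, sizeof(set), &set))
		perror("sched_setaffinity");

	for (long i = 0; i < 1000000; i++)
		sched_yield();

	puts("done yielding");
	return 0;
}

Before the fix, the yielder's deadline keeps moving out by one slice per
iteration for as long as it remains the only eligible task on that CPU.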
Given this impacts real workloads and is a bug present since the
introduction of EEVDF, I would say it warrants a
Fixes: 147f3efaa24182 ("sched/fair: Implement an EEVDF-like scheduling
policy")
tag.
Alex
Picking up this dead thread again. Has this patch been applied to mainline
or any other branch? Thanks.
Hi Tao,

The patch hasn't been merged yet. I have resent this patch with the Fixes
tag and additional maintainers in Cc. Please review on the following thread:

Link: https://lore.kernel.org/lkml/20250911095113.203439-1-sieberf@amazon.com

Thanks,
Fernand
>> On 01/04/25 18:06, Fernand Sieber wrote:
>> [patch description and diff snipped; quoted in full above]
>
> [Alex's description of the KVM vCPU yield loop and the suggested Fixes tag
> snipped; quoted in full above]
Actually, as Alex described, we encountered the same issue in the following
testing scenario: start qemu, bind its cores to a cpuset group, set
cpuset.cpus=1-3, and run taskset -c 1-3 ./stress-ng -c 20 for stress
testing in qemu. qemu then freezes and reports a soft lockup. After
applying this patch, the problem was resolved.

Do we have plans to merge this patch into mainline?
Hi Fernand,

On 01/04/25 18:06, Fernand Sieber wrote:
> [patch description snipped; quoted in full above]
>
> -	se->deadline += calc_delta_fair(se->slice, se);
> +	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
>  }

Makes sense. The deadline would be rapidly incremented in the scenario
where the other tasks are not eligible and the current task calls yield
repeatedly after a very short run time.

Reviewed-by: Madadi Vineeth Reddy <vineethr@linux.ibm.com>

Thanks,
Madadi Vineeth Reddy