[PATCH] sched/fair: Clear rel_deadline when initializing forked entities

Zicheng Qu posted 1 patch 1 month, 3 weeks ago
There is a newer version of this series
Posted by Zicheng Qu 1 month, 3 weeks ago
A yield-triggered crash can happen when a newly forked sched_entity
enters the fair class with se->rel_deadline unexpectedly set.

The failing sequence is:

  1. A task is forked while se->rel_deadline is still set.
  2. __sched_fork() initializes vruntime, vlag and other sched_entity
     state, but does not clear rel_deadline.
  3. On the first enqueue, enqueue_entity() calls place_entity().
  4. Because se->rel_deadline is set, place_entity() treats se->deadline
     as a relative deadline and converts it to an absolute deadline by
     adding the current vruntime.
  5. However, the forked entity's deadline is not a valid inherited
     relative deadline for this new scheduling instance, so the conversion
     produces an abnormally large deadline.
  6. If the task later calls sched_yield(), yield_task_fair() advances
     se->vruntime to se->deadline.
  7. The inflated vruntime is then used by the following enqueue path,
     where the vruntime-derived key can overflow when multiplied by the
     entity weight.
  8. This corrupts cfs_rq->sum_w_vruntime, breaks EEVDF eligibility
     calculation, and can eventually make all entities appear ineligible.
     pick_next_entity() may then return NULL unexpectedly, leading to a
     later NULL dereference.
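
Steps 3 to 6 above can be modeled in user space. This is only a sketch,
not the kernel code: struct toy_se and the toy_* functions are
hypothetical stand-ins, lag and weight scaling are left out, and the
numbers are made up. It also assumes the forked entity's deadline holds
an absolute-scale value at the moment the stale flag is seen.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for the few sched_entity fields involved (hypothetical). */
struct toy_se {
	uint64_t vruntime;
	uint64_t deadline;
	uint64_t slice;			/* assumed already in virtual time */
	unsigned int rel_deadline;	/* 1: deadline is relative to vruntime */
};

/* Steps 3-5: placement turns a "relative" deadline into an absolute one. */
void toy_place(struct toy_se *se, uint64_t vruntime)
{
	se->vruntime = vruntime;

	if (se->rel_deadline) {
		/* Only valid if deadline really holds a relative value. */
		se->deadline += se->vruntime;
		se->rel_deadline = 0;
		return;
	}

	/* Normal case: fresh deadline one slice past vruntime. */
	se->deadline = se->vruntime + se->slice;
}

/* Step 6: yield moves vruntime up to the deadline, then pushes the deadline. */
void toy_yield(struct toy_se *se)
{
	se->vruntime = se->deadline;
	se->deadline += se->slice;
}

/* Place and yield a forked child, with or without the stale flag. */
uint64_t toy_vruntime_after_yield(unsigned int stale_flag)
{
	struct toy_se se = {
		.deadline = 1000500,	/* absolute-scale value left over from the parent */
		.slice = 2800,
		.rel_deadline = stale_flag,
	};

	toy_place(&se, 1000000);
	toy_yield(&se);
	return se.vruntime;
}
```

With the stale flag set, the model's post-yield vruntime lands near twice
the placement vruntime. With the flag cleared, it lands one slice past it.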

A captured trace shows the effect clearly. Before yield, the entity's
vruntime was around:

  9834017729983308

After yield_task_fair() executed:

  se->vruntime = se->deadline

the vruntime jumped to:

  19668035460670230

and the deadline was later advanced further to:

  19668035463470230

This shows that the deadline had already become abnormally large before
yield_task_fair() copied it into vruntime.
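
The arithmetic behind the trace can be checked with a small user-space
sketch. The three constants are copied from the trace above; everything
else is an assumption made for illustration: the toy_* helpers are
hypothetical, the overflow check takes the pre-yield vruntime as the
base of the key, and 1024 is used as a nice-0 weight.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the trace in the commit message. */
#define TRACE_VRUNTIME_BEFORE	9834017729983308ULL
#define TRACE_VRUNTIME_AFTER	19668035460670230ULL
#define TRACE_DEADLINE_AFTER	19668035463470230ULL

/* How far the yield moved vruntime forward. */
uint64_t toy_yield_jump(void)
{
	return TRACE_VRUNTIME_AFTER - TRACE_VRUNTIME_BEFORE;
}

/* Distance between the post-yield vruntime and twice the old vruntime. */
uint64_t toy_excess_over_double(void)
{
	return TRACE_VRUNTIME_AFTER - 2 * TRACE_VRUNTIME_BEFORE;
}

/* How far the deadline was advanced past the post-yield vruntime. */
uint64_t toy_deadline_advance(void)
{
	return TRACE_DEADLINE_AFTER - TRACE_VRUNTIME_AFTER;
}

/*
 * Step 7: would key * weight exceed a signed 64-bit value?
 * Assumes key >= 0 and weight > 0.
 */
int toy_key_overflows(int64_t key, int64_t weight)
{
	return key > INT64_MAX / weight;
}
```

The jump is 9834017730686922, so the post-yield vruntime is within
703614 of twice the pre-yield value. That is what one would expect if an
absolute-scale deadline had been added to vruntime a second time. Under
the stated assumptions, a key of that size multiplied by 1024 no longer
fits in a signed 64-bit integer.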

rel_deadline is only meaningful when se->deadline really carries a
relative deadline that still needs to be placed against vruntime. A
freshly forked sched_entity should not inherit or retain this state.
Clear se->rel_deadline in __sched_fork(), together with the other
sched_entity runtime state, so that the first enqueue does not interpret
the new entity's deadline as a stale relative deadline.

Fixes: 82e9d0456e06 ("sched/fair: Avoid re-setting virtual deadline on 'migrations'")
Analyzed-by: Hui Tang <tanghui20@huawei.com>
Analyzed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da20fb6ea25a..b8871449d3c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4458,6 +4458,7 @@ static void __sched_fork(u64 clone_flags, struct task_struct *p)
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
 	p->se.vlag			= 0;
+	p->se.rel_deadline		= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 	/* A delayed task cannot be in clone(). */
-- 
2.34.1
Re: [PATCH] sched/fair: Clear rel_deadline when initializing forked entities
Posted by K Prateek Nayak 1 month, 3 weeks ago
Hello Zicheng,

On 4/24/2026 12:41 PM, Zicheng Qu wrote:
> A yield-triggered crash can happen when a newly forked sched_entity
> enters the fair class with se->rel_deadline unexpectedly set.
> 
> The failing sequence is:
> 
>   1. A task is forked while se->rel_deadline is still set.

I think a bit more information on how this happens would be nice:

  "rel_deadline" is meant to be an internal indicator used during
  reweight and migration, but a reweight of the parent from a remote
  CPU can race with a fork() where the child inherits the
  "rel_deadline" during copy_process(), since fork() does not grab
  the pi_lock of the parent.

On a side note, should we grab the pi_lock when inheriting the
sched attributes of the parent?

I don't think it is strictly necessary since we reconstruct the state
but it does seem racy to my eyes against a setscheduler on parent
unless I'm missing something.
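
The inheritance described above can be pictured with a single-threaded
model. This is a sketch under assumptions, not the kernel's fork path:
the toy_* types and functions are hypothetical, the remote reweight is
reduced to setting and clearing one flag, and the race window is
simulated by taking the copy between those two steps.

```c
#include <assert.h>

/* Hypothetical stand-ins for the scheduler state copied at fork. */
struct toy_se {
	unsigned long long vruntime;
	long long vlag;
	unsigned char rel_deadline;
};

struct toy_task {
	struct toy_se se;
};

/* Fork starts from a wholesale copy of the parent's task state. */
void toy_dup_task(struct toy_task *child, const struct toy_task *parent)
{
	*child = *parent;
}

/* Per-entity reset at fork; 'clear_flag' selects the patched behavior. */
void toy_sched_fork(struct toy_task *p, int clear_flag)
{
	p->se.vruntime = 0;
	p->se.vlag = 0;
	if (clear_flag)
		p->se.rel_deadline = 0;
}

/* Returns the child's rel_deadline when the copy lands inside the window. */
int toy_child_flag_after_fork(int clear_flag)
{
	struct toy_task parent = { .se = { .vruntime = 1000 } };
	struct toy_task child;

	parent.se.rel_deadline = 1;	/* remote CPU: reweight in progress */
	toy_dup_task(&child, &parent);	/* copy taken without the parent's lock */
	parent.se.rel_deadline = 0;	/* remote CPU: reweight finished */

	toy_sched_fork(&child, clear_flag);
	return child.se.rel_deadline;
}
```

In this model the parent ends up consistent either way; only the child
keeps the transient flag, unless the reset at fork clears it.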

>   2. __sched_fork() initializes vruntime, vlag and other sched_entity
>      state, but does not clear rel_deadline.
>   3. On the first enqueue, enqueue_entity() calls place_entity().
>   4. Because se->rel_deadline is set, place_entity() treats se->deadline
>      as a relative deadline and converts it to an absolute deadline by
>      adding the current vruntime.
>   5. However, the forked entity's deadline is not a valid inherited
>      relative deadline for this new scheduling instance, so the conversion
>      produces an abnormally large deadline.
>   6. If the task later calls sched_yield(), yield_task_fair() advances
>      se->vruntime to se->deadline.
>   7. The inflated vruntime is then used by the following enqueue path,
>      where the vruntime-derived key can overflow when multiplied by the
>      entity weight.
>   8. This corrupts cfs_rq->sum_w_vruntime, breaks EEVDF eligibility
>      calculation, and can eventually make all entities appear ineligible.
>      pick_next_entity() may then return NULL unexpectedly, leading to a
>      later NULL dereference.
> 
> A captured trace shows the effect clearly. Before yield, the entity's
> vruntime was around:
> 
>   9834017729983308
> 
> After yield_task_fair() executed:
> 
>   se->vruntime = se->deadline
> 
> the vruntime jumped to:
> 
>   19668035460670230
> 
> and the deadline was later advanced further to:
> 
>   19668035463470230
> 
> This shows that the deadline had already become abnormally large before
> yield_task_fair() copied it into vruntime.

Although I can hit this very easily, I haven't yet been able to crash a
system from this, or see the vruntime drift apart when stressing - in my
case, the deadline seems to be pretty tame for the most part, but that is
probably because I don't have the weights right and it is a probability
game.

Either way, the fix does make sense to me.

> 
> rel_deadline is only meaningful when se->deadline really carries a
> relative deadline that still needs to be placed against vruntime. A
> freshly forked sched_entity should not inherit or retain this state.
> Clear se->rel_deadline in __sched_fork(), together with the other
> sched_entity runtime state, so that the first enqueue does not interpret
> the new entity's deadline as a stale relative deadline.
> 
> Fixes: 82e9d0456e06 ("sched/fair: Avoid re-setting virtual deadline on 'migrations'")
> Analyzed-by: Hui Tang <tanghui20@huawei.com>
> Analyzed-by: Zhang Qiao <zhangqiao22@huawei.com>
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>

Feel free to include:

Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>

> ---
>  kernel/sched/core.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da20fb6ea25a..b8871449d3c6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4458,6 +4458,7 @@ static void __sched_fork(u64 clone_flags, struct task_struct *p)
>  	p->se.nr_migrations		= 0;
>  	p->se.vruntime			= 0;
>  	p->se.vlag			= 0;
> +	p->se.rel_deadline		= 0;
>  	INIT_LIST_HEAD(&p->se.group_node);
>  
>  	/* A delayed task cannot be in clone(). */

-- 
Thanks and Regards,
Prateek
[tip: sched/urgent] sched/fair: Clear rel_deadline when initializing forked entities
Posted by tip-bot2 for Zicheng Qu 1 month, 2 weeks ago
The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     3da56dc063cd77b9c0b40add930767fab4e389f3
Gitweb:        https://git.kernel.org/tip/3da56dc063cd77b9c0b40add930767fab4e389f3
Author:        Zicheng Qu <quzicheng@huawei.com>
AuthorDate:    Fri, 24 Apr 2026 07:11:13 
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 28 Apr 2026 09:19:54 +02:00

sched/fair: Clear rel_deadline when initializing forked entities

A yield-triggered crash can happen when a newly forked sched_entity
enters the fair class with se->rel_deadline unexpectedly set.

The failing sequence is:

  1. A task is forked while se->rel_deadline is still set.
  2. __sched_fork() initializes vruntime, vlag and other sched_entity
     state, but does not clear rel_deadline.
  3. On the first enqueue, enqueue_entity() calls place_entity().
  4. Because se->rel_deadline is set, place_entity() treats se->deadline
     as a relative deadline and converts it to an absolute deadline by
     adding the current vruntime.
  5. However, the forked entity's deadline is not a valid inherited
     relative deadline for this new scheduling instance, so the conversion
     produces an abnormally large deadline.
  6. If the task later calls sched_yield(), yield_task_fair() advances
     se->vruntime to se->deadline.
  7. The inflated vruntime is then used by the following enqueue path,
     where the vruntime-derived key can overflow when multiplied by the
     entity weight.
  8. This corrupts cfs_rq->sum_w_vruntime, breaks EEVDF eligibility
     calculation, and can eventually make all entities appear ineligible.
     pick_next_entity() may then return NULL unexpectedly, leading to a
     later NULL dereference.

A captured trace shows the effect clearly. Before yield, the entity's
vruntime was around:

  9834017729983308

After yield_task_fair() executed:

  se->vruntime = se->deadline

the vruntime jumped to:

  19668035460670230

and the deadline was later advanced further to:

  19668035463470230

This shows that the deadline had already become abnormally large before
yield_task_fair() copied it into vruntime.

rel_deadline is only meaningful when se->deadline really carries a
relative deadline that still needs to be placed against vruntime. A
freshly forked sched_entity should not inherit or retain this state.
Clear se->rel_deadline in __sched_fork(), together with the other
sched_entity runtime state, so that the first enqueue does not interpret
the new entity's deadline as a stale relative deadline.

Fixes: 82e9d0456e06 ("sched/fair: Avoid re-setting virtual deadline on 'migrations'")
Analyzed-by: Hui Tang <tanghui20@huawei.com>
Analyzed-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260424071113.1199600-1-quzicheng@huawei.com
---
 kernel/sched/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da20fb6..b887144 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4458,6 +4458,7 @@ static void __sched_fork(u64 clone_flags, struct task_struct *p)
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
 	p->se.vlag			= 0;
+	p->se.rel_deadline		= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 	/* A delayed task cannot be in clone(). */