When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reasons are as follows:

1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg, leaving a stale contribution behind (see the
sketch below).
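To make the two steps concrete, here is a small stand-alone userspace
model of the failure sequence. This is an illustrative sketch only: the
struct fields and helper names merely mimic kernel/sched/fair.c, the
numbers are invented, and the real functions contain more checks than
shown here. (The stale value matters because, as far as I understand it,
tg->load_avg feeds the per-CPU group share calculation, so an inflated
value leaves the group's entities under-weighted.)

/*
 * Illustrative userspace model only -- not kernel code.  It mimics the
 * 1ms ratelimit in update_tg_load_avg() and the leaf-list trimming done
 * from __update_blocked_fair(), to show how a non-zero
 * tg_load_avg_contrib can be stranded in tg->load_avg.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define NSEC_PER_MSEC 1000000ULL

struct tg { long load_avg; };

struct cfs_rq {
	struct tg *tg;
	long avg_load_avg;			/* models cfs_rq->avg.load_avg */
	long tg_load_avg_contrib;
	unsigned long long last_update_tg_load_avg;
	bool on_leaf_list;
};

/* Models the 1ms ratelimit: the delta is folded into tg->load_avg at most
 * once per millisecond (case 1 above). */
static void update_tg_load_avg(struct cfs_rq *cfs_rq, unsigned long long now)
{
	long delta;

	if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
		return;				/* ratelimited: nothing folded in */

	delta = cfs_rq->avg_load_avg - cfs_rq->tg_load_avg_contrib;
	if (labs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
		cfs_rq->tg->load_avg += delta;
		cfs_rq->tg_load_avg_contrib = cfs_rq->avg_load_avg;
		cfs_rq->last_update_tg_load_avg = now;
	}
}

/* Models the check used when trimming leaf_cfs_rq_list; without the fix it
 * ignores tg_load_avg_contrib (case 2 above). */
static bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq, bool with_fix)
{
	if (cfs_rq->avg_load_avg)
		return false;
	if (with_fix && cfs_rq->tg_load_avg_contrib)
		return false;
	return true;
}

int main(void)
{
	struct tg tg = { 0 };
	struct cfs_rq cfs_rq = { .tg = &tg, .on_leaf_list = true };
	unsigned long long now = 10 * NSEC_PER_MSEC;

	/* A task has been running, so its load got folded into tg->load_avg. */
	cfs_rq.avg_load_avg = 512;
	update_tg_load_avg(&cfs_rq, now);

	/* The last task migrates out well inside the 1ms window (case 1). */
	now += NSEC_PER_MSEC / 2;
	cfs_rq.avg_load_avg = 0;
	update_tg_load_avg(&cfs_rq, now);	/* ratelimited: tg keeps 512 */

	/* The blocked-load pass then drops the cfs_rq from the list (case 2),
	 * so no later pass ever folds the stale 512 back out. */
	if (cfs_rq_is_decayed(&cfs_rq, /* with_fix= */ false))
		cfs_rq.on_leaf_list = false;

	printf("tg->load_avg=%ld contrib=%ld on_leaf_list=%d\n",
	       tg.load_avg, cfs_rq.tg_load_avg_contrib, cfs_rq.on_leaf_list);
	return 0;
}

With with_fix set to true, the cfs_rq stays on leaf_cfs_rq_list, so a
later blocked-load pass that falls outside the 1ms window can still fold
the remaining 512 back out of tg->load_avg; that is the behaviour the
hunk below restores.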
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: xupengbo <xupengbo@oppo.com>
---
Changes:
v1 -> v2:
- Another option to fix the bug. Check cfs_rq->tg_load_avg_contrib in
cfs_rq_is_decayed() to avoid early removal from the leaf_cfs_rq_list.
- Link to v1 : https://lore.kernel.org/cgroups/20250804130326.57523-1-xupengbo@oppo.com/
v2 -> v3:
- Check if cfs_rq->tg_load_avg_contrib is 0 directly.
- Link to v2 : https://lore.kernel.org/cgroups/20250805144121.14871-1-xupengbo@oppo.com/
v3 -> v4:
- Fix typo
- Link to v3 : https://lore.kernel.org/cgroups/20250826075743.19106-1-xupengbo@oppo.com/
v4 -> v5:
- Amend the commit message
- Link to v4 : https://lore.kernel.org/cgroups/20250826084854.25956-1-xupengbo@oppo.com/
After some preliminary discussion and analysis, I think it is feasible to
directly check whether cfs_rq->tg_load_avg_contrib is 0 in
cfs_rq_is_decayed(), so patch v3 was submitted.
Please send emails to a different address, <xupengbo1029@163.com>, after
September 3, 2025; <xupengbo@oppo.com> will expire after that date for
personal reasons.
Thanks,
Xu Pengbo
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b173a059315c..81b7df87f1ce 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4062,6 +4062,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
base-commit: fab1beda7597fac1cecc01707d55eadb6bbe773c
--
2.43.0
Hello,
On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> When a task is migrated out, there is a probability that the tg->load_avg
> value will become abnormal. The reason is as follows.
>
> 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> is a possibility that the reduced load_avg is not updated to tg->load_avg
> when a task migrates out.
> 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> function cfs_rq_is_decayed() does not check whether
> cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> updated to tg->load_avg.
>
> Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> which fixes the case (2.) mentioned above.
>
> Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> Signed-off-by: xupengbo <xupengbo@oppo.com>
I wonder if there are any more concerns about this patch? If not, I hope
this fix can be merged. It's a rare case, but it does happen for some
specific setups.

Sorry if this is bad timing, but I just hit an oncall where this exact
problem occurred, so I suppose it's worth a ping :)
Best regards,
Aaron
On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> Hello,
>
> On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > When a task is migrated out, there is a probability that the tg->load_avg
> > value will become abnormal. The reason is as follows.
> >
> > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > when a task migrates out.
> > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > function cfs_rq_is_decayed() does not check whether
> > cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > updated to tg->load_avg.
> >
> > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > which fixes the case (2.) mentioned above.
> >
> > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> > Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Signed-off-by: xupengbo <xupengbo@oppo.com>
>
> I wonder if there are any more concerns about this patch? If no, I hope
> this fix can be merged. It's a rare case but it does happen for some
> specific setup.
>
> Sorry if this is a bad timing, but I just hit an oncall where this exact
> problem occurred so I suppose it's worth a ping :)
Totally missed it. Seems okay, let me go queue the thing.
On Fri, Nov 28, 2025 at 02:40:17PM +0100, Peter Zijlstra wrote:
> On Fri, Nov 28, 2025 at 07:54:45PM +0800, Aaron Lu wrote:
> > Hello,
> >
> > On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> > > When a task is migrated out, there is a probability that the tg->load_avg
> > > value will become abnormal. The reason is as follows.
> > >
> > > 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
> > > is a possibility that the reduced load_avg is not updated to tg->load_avg
> > > when a task migrates out.
> > > 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
> > > calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
> > > function cfs_rq_is_decayed() does not check whether
> > > cfs->tg_load_avg_contrib is null. Consequently, in some cases,
> > > __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
> > > updated to tg->load_avg.
> > >
> > > Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> > > which fixes the case (2.) mentioned above.
> > >
> > > Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> > > Tested-by: Aaron Lu <ziqianlu@bytedance.com>
> > > Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
> > > Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > Signed-off-by: xupengbo <xupengbo@oppo.com>
> >
> > I wonder if there are any more concerns about this patch? If no, I hope
> > this fix can be merged. It's a rare case but it does happen for some
> > specific setup.
> >
> > Sorry if this is a bad timing, but I just hit an oncall where this exact
> > problem occurred so I suppose it's worth a ping :)
>
> Totally missed it. Seems okay, let me go queue the thing.
Thanks Peter!
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: ca125231dd29fc0678dd3622e9cdea80a51dffe4
Gitweb: https://git.kernel.org/tip/ca125231dd29fc0678dd3622e9cdea80a51dffe4
Author: xupengbo <xupengbo@oppo.com>
AuthorDate: Wed, 27 Aug 2025 10:22:07 +08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Dec 2025 10:03:13 +01:00
sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7..da46c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 3dc7ae575aa1a32971565d9aaf784e6050dae959
Gitweb: https://git.kernel.org/tip/3dc7ae575aa1a32971565d9aaf784e6050dae959
Author: xupengbo <xupengbo@oppo.com>
AuthorDate: Wed, 27 Aug 2025 10:22:07 +08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 03 Dec 2025 19:26:22 +01:00
sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 00a32c9..a31d88e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 36c26a1f1f510b23ab81db176c90921305fae669
Gitweb: https://git.kernel.org/tip/36c26a1f1f510b23ab81db176c90921305fae669
Author: xupengbo <xupengbo@oppo.com>
AuthorDate: Wed, 27 Aug 2025 10:22:07 +08:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 02 Dec 2025 15:25:00 +01:00
sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out
When a task is migrated out, there is a probability that the tg->load_avg
value will become abnormal. The reason is as follows:
1. Due to the 1ms update period limitation in update_tg_load_avg(), there
is a possibility that the reduced load_avg is not updated to tg->load_avg
when a task migrates out.
2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the key
function cfs_rq_is_decayed() does not check whether
cfs_rq->tg_load_avg_contrib is zero. Consequently, in some cases,
__update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
updated to tg->load_avg.
Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
which fixes the case (2.) mentioned above.
Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
Signed-off-by: xupengbo <xupengbo@oppo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20250827022208.14487-1-xupengbo@oppo.com
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 769d7b7..da46c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4034,6 +4034,9 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	if (child_cfs_rq_on_list(cfs_rq))
 		return false;
 
+	if (cfs_rq->tg_load_avg_contrib)
+		return false;
+
 	return true;
 }