child_cfs_rq_on_list attempts to convert a 'prev' pointer to a cfs_rq.
This 'prev' pointer can originate from struct rq's leaf_cfs_rq_list,
making the conversion invalid and potentially leading to memory
corruption. Depending on the relative positions of leaf_cfs_rq_list and
the task group (tg) pointer within the struct, this can cause a memory
fault or access garbage data.
The issue arises in list_add_leaf_cfs_rq, where both
cfs_rq->leaf_cfs_rq_list and rq->leaf_cfs_rq_list are added to the same
leaf list. Also, rq->tmp_alone_branch can be set to rq->leaf_cfs_rq_list.
This adds a check `if (prev == &rq->leaf_cfs_rq_list)` after the main
conditional in child_cfs_rq_on_list. This ensures that the container_of
operation will convert a correct cfs_rq struct.
This check is sufficient because only cfs_rqs on the same CPU are added
to the list, so verifying the 'prev' pointer against the current rq's list
head is enough.
This fixes a potential memory corruption issue that, due to the current
struct layout, might not manifest as a crash but could lead to
unpredictable behavior when the layout changes.
Signed-off-by: Zecheng Li <zecheng@google.com>
---
kernel/sched/fair.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 857808da23d8..9dafb374d76d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4061,15 +4061,17 @@ static inline bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq)
 {
 	struct cfs_rq *prev_cfs_rq;
 	struct list_head *prev;
+	struct rq *rq = rq_of(cfs_rq);
 
 	if (cfs_rq->on_list) {
 		prev = cfs_rq->leaf_cfs_rq_list.prev;
 	} else {
-		struct rq *rq = rq_of(cfs_rq);
-
 		prev = rq->tmp_alone_branch;
 	}
 
+	if (prev == &rq->leaf_cfs_rq_list)
+		return false;
+
 	prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list);
 
 	return (prev_cfs_rq->tg->parent == cfs_rq->tg);
base-commit: 7ab02bd36eb444654183ad6c5b15211ddfa32a8f
--
2.48.1
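
The guard added by the patch above can be exercised outside the kernel.
What follows is a minimal userspace sketch, not the kernel code: the
struct layouts, the 'rq' back-pointer inside struct cfs_rq (standing in
for rq_of()), and the two demo_* helpers are assumptions made only for
illustration. It shows the patched logic returning false when 'prev' is
the rq's own list head, and still following container_of() when 'prev'
is embedded in a real cfs_rq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; field layout is hypothetical. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct task_group { struct task_group *parent; };

struct rq {
	struct list_head leaf_cfs_rq_list;	/* list head, not inside a cfs_rq */
	struct list_head *tmp_alone_branch;
};

struct cfs_rq {
	struct task_group *tg;
	int on_list;
	struct list_head leaf_cfs_rq_list;
	struct rq *rq;				/* assumption: replaces rq_of() */
};

/* Same control flow as the patched function in the diff above. */
static bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq)
{
	struct cfs_rq *prev_cfs_rq;
	struct list_head *prev;
	struct rq *rq = cfs_rq->rq;

	if (cfs_rq->on_list)
		prev = cfs_rq->leaf_cfs_rq_list.prev;
	else
		prev = rq->tmp_alone_branch;

	/* The list head is not embedded in a cfs_rq: do not convert it. */
	if (prev == &rq->leaf_cfs_rq_list)
		return false;

	prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list);

	return prev_cfs_rq->tg->parent == cfs_rq->tg;
}

/* Scenario 1: empty leaf list, tmp_alone_branch points at the list head. */
static bool demo_prev_is_list_head(void)
{
	struct task_group root = { .parent = NULL };
	struct rq rq;
	struct cfs_rq cfs = { .tg = &root, .on_list = 0, .rq = &rq };

	rq.leaf_cfs_rq_list.next = &rq.leaf_cfs_rq_list;
	rq.leaf_cfs_rq_list.prev = &rq.leaf_cfs_rq_list;
	rq.tmp_alone_branch = &rq.leaf_cfs_rq_list;

	return child_cfs_rq_on_list(&cfs);
}

/* Scenario 2: tmp_alone_branch points at a child cfs_rq of our task group. */
static bool demo_prev_is_child(void)
{
	struct task_group parent_tg = { .parent = NULL };
	struct task_group child_tg = { .parent = &parent_tg };
	struct rq rq;
	struct cfs_rq child = { .tg = &child_tg, .on_list = 1, .rq = &rq };
	struct cfs_rq parent = { .tg = &parent_tg, .on_list = 0, .rq = &rq };

	/* Circular list: head <-> child */
	rq.leaf_cfs_rq_list.next = &child.leaf_cfs_rq_list;
	rq.leaf_cfs_rq_list.prev = &child.leaf_cfs_rq_list;
	child.leaf_cfs_rq_list.next = &rq.leaf_cfs_rq_list;
	child.leaf_cfs_rq_list.prev = &rq.leaf_cfs_rq_list;
	rq.tmp_alone_branch = &child.leaf_cfs_rq_list;

	return child_cfs_rq_on_list(&parent);
}
```

In this sketch demo_prev_is_list_head() returns false and
demo_prev_is_child() returns true. Without the guard, the first scenario
would apply container_of() to a list_head that lives in struct rq.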
On Tue, 4 Mar 2025 at 22:40, Zecheng Li <zecheng@google.com> wrote:
>
> child_cfs_rq_on_list attempts to convert a 'prev' pointer to a cfs_rq.
> This 'prev' pointer can originate from struct rq's leaf_cfs_rq_list,
> making the conversion invalid and potentially leading to memory
> corruption. Depending on the relative positions of leaf_cfs_rq_list and
> the task group (tg) pointer within the struct, this can cause a memory
> fault or access garbage data.
>
> The issue arises in list_add_leaf_cfs_rq, where both
> cfs_rq->leaf_cfs_rq_list and rq->leaf_cfs_rq_list are added to the same
> leaf list. Also, rq->tmp_alone_branch can be set to rq->leaf_cfs_rq_list.
>
> This adds a check `if (prev == &rq->leaf_cfs_rq_list)` after the main
> conditional in child_cfs_rq_on_list. This ensures that the container_of
> operation will convert a correct cfs_rq struct.
>
> This check is sufficient because only cfs_rqs on the same CPU are added
> to the list, so verifying the 'prev' pointer against the current rq's list
> head is enough.
>
> Fixes a potential memory corruption issue that due to current struct
> layout might not be manifesting as a crash but could lead to unpredictable
> behavior when the layout changes.
It would be good to add a Fixes tag:
Fixes: fdaba61ef8a2 ("sched/fair: Ensure that the CFS parent is added after unthrottling")
>
> Signed-off-by: Zecheng Li <zecheng@google.com>
> ---
> kernel/sched/fair.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 857808da23d8..9dafb374d76d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4061,15 +4061,17 @@ static inline bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq)
> {
> struct cfs_rq *prev_cfs_rq;
> struct list_head *prev;
> + struct rq *rq = rq_of(cfs_rq);
>
> if (cfs_rq->on_list) {
> prev = cfs_rq->leaf_cfs_rq_list.prev;
> } else {
> - struct rq *rq = rq_of(cfs_rq);
> -
> prev = rq->tmp_alone_branch;
> }
>
> + if (prev == &rq->leaf_cfs_rq_list)
> + return false;
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> +
> prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list);
>
> return (prev_cfs_rq->tg->parent == cfs_rq->tg);
>
> base-commit: 7ab02bd36eb444654183ad6c5b15211ddfa32a8f
> --
> 2.48.1
>
Hello Li,
On 3/5/2025 3:10 AM, Zecheng Li wrote:
> child_cfs_rq_on_list attempts to convert a 'prev' pointer to a cfs_rq.
> This 'prev' pointer can originate from struct rq's leaf_cfs_rq_list,
> making the conversion invalid and potentially leading to memory
> corruption. Depending on the relative positions of leaf_cfs_rq_list and
> the task group (tg) pointer within the struct, this can cause a memory
> fault or access garbage data.
>
> The issue arises in list_add_leaf_cfs_rq, where both
> cfs_rq->leaf_cfs_rq_list and rq->leaf_cfs_rq_list are added to the same
> leaf list. Also, rq->tmp_alone_branch can be set to rq->leaf_cfs_rq_list.
>
> This adds a check `if (prev == &rq->leaf_cfs_rq_list)` after the main
> conditional in child_cfs_rq_on_list. This ensures that the container_of
> operation will convert a correct cfs_rq struct.
>
> This check is sufficient because only cfs_rqs on the same CPU are added
> to the list, so verifying the 'prev' pointer against the current rq's list
> head is enough.
>
> Fixes a potential memory corruption issue that due to current struct
> layout might not be manifesting as a crash but could lead to unpredictable
> behavior when the layout changes.
>
> Signed-off-by: Zecheng Li <zecheng@google.com>
> ---
> kernel/sched/fair.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 857808da23d8..9dafb374d76d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4061,15 +4061,17 @@ static inline bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq)
> {
> struct cfs_rq *prev_cfs_rq;
> struct list_head *prev;
> + struct rq *rq = rq_of(cfs_rq);
>
> if (cfs_rq->on_list) {
> prev = cfs_rq->leaf_cfs_rq_list.prev;
> } else {
> - struct rq *rq = rq_of(cfs_rq);
> -
> prev = rq->tmp_alone_branch;
> }
A "SCHED_WARN_ON(prev == &rq->leaf_cfs_rq_list)" here is easily tripped
during early boot on my setup before this fix.
One nit: the return could perhaps go into the else clause above, since the
"cfs_rq->on_list" case will guarantee a "leaf_cfs_rq_list" pointer that
is embedded in a valid cfs_rq struct, but I have no strong feelings.
Feel free to add:
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
--
Thanks and Regards,
Prateek
>
> + if (prev == &rq->leaf_cfs_rq_list)
> + return false;
> +
> prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list);
>
> return (prev_cfs_rq->tg->parent == cfs_rq->tg);
>
> base-commit: 7ab02bd36eb444654183ad6c5b15211ddfa32a8f
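
To make concrete why the list-head case that the warning above catches
depends on struct layout, the sketch below computes where the 'tg' read
would land if &rq->leaf_cfs_rq_list were fed to container_of() as a
cfs_rq member. The two struct layouts and both helper functions are
hypothetical and deliberately tiny; they are not the kernel's real
definitions. Only offsets are computed; no invalid pointer is formed or
dereferenced.

```c
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct task_group;

/* Hypothetical layouts for illustration only, NOT the kernel's. */
struct rq {
	unsigned long nr_running;
	struct list_head leaf_cfs_rq_list;
	struct list_head *tmp_alone_branch;
};

struct cfs_rq {
	unsigned long load;
	struct task_group *tg;
	struct list_head leaf_cfs_rq_list;
};

/*
 * Offset, relative to the start of struct rq, of the bytes that would be
 * read as 'tg' after a bogus container_of() on the rq's list head:
 *   bogus cfs_rq base = &rq->leaf_cfs_rq_list - offsetof(cfs_rq, list)
 *   bogus tg address  = bogus base + offsetof(cfs_rq, tg)
 */
static ptrdiff_t bogus_tg_offset_in_rq(void)
{
	ptrdiff_t head = (ptrdiff_t)offsetof(struct rq, leaf_cfs_rq_list);
	ptrdiff_t member = (ptrdiff_t)offsetof(struct cfs_rq, leaf_cfs_rq_list);
	ptrdiff_t tg = (ptrdiff_t)offsetof(struct cfs_rq, tg);

	return head - member + tg;
}

/* Nonzero if the bogus 'tg' read stays inside struct rq (garbage data). */
static int bogus_tg_inside_rq(void)
{
	ptrdiff_t off = bogus_tg_offset_in_rq();

	return off >= 0 &&
	       (size_t)off + sizeof(struct task_group *) <= sizeof(struct rq);
}
```

With these made-up layouts the bogus 'tg' read lands on rq->nr_running,
so it returns garbage rather than faulting. A different field order could
push the read outside struct rq altogether, which matches the "memory
fault or access garbage data" wording in the commit message.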
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 3b4035ddbfc8e4521f85569998a7569668cccf51
Gitweb: https://git.kernel.org/tip/3b4035ddbfc8e4521f85569998a7569668cccf51
Author: Zecheng Li <zecheng@google.com>
AuthorDate: Tue, 04 Mar 2025 21:40:31
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 05 Mar 2025 17:30:54 +01:00
sched/fair: Fix potential memory corruption in child_cfs_rq_on_list
child_cfs_rq_on_list attempts to convert a 'prev' pointer to a cfs_rq.
This 'prev' pointer can originate from struct rq's leaf_cfs_rq_list,
making the conversion invalid and potentially leading to memory
corruption. Depending on the relative positions of leaf_cfs_rq_list and
the task group (tg) pointer within the struct, this can cause a memory
fault or access garbage data.
The issue arises in list_add_leaf_cfs_rq, where both
cfs_rq->leaf_cfs_rq_list and rq->leaf_cfs_rq_list are added to the same
leaf list. Also, rq->tmp_alone_branch can be set to rq->leaf_cfs_rq_list.
This adds a check `if (prev == &rq->leaf_cfs_rq_list)` after the main
conditional in child_cfs_rq_on_list. This ensures that the container_of
operation will convert a correct cfs_rq struct.
This check is sufficient because only cfs_rqs on the same CPU are added
to the list, so verifying the 'prev' pointer against the current rq's list
head is enough.
Fixes a potential memory corruption issue that due to current struct
layout might not be manifesting as a crash but could lead to unpredictable
behavior when the layout changes.
Fixes: fdaba61ef8a2 ("sched/fair: Ensure that the CFS parent is added after unthrottling")
Signed-off-by: Zecheng Li <zecheng@google.com>
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20250304214031.2882646-1-zecheng@google.com
---
kernel/sched/fair.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c0ef43..c798d27 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4045,15 +4045,17 @@ static inline bool child_cfs_rq_on_list(struct cfs_rq *cfs_rq)
 {
 	struct cfs_rq *prev_cfs_rq;
 	struct list_head *prev;
+	struct rq *rq = rq_of(cfs_rq);
 
 	if (cfs_rq->on_list) {
 		prev = cfs_rq->leaf_cfs_rq_list.prev;
 	} else {
-		struct rq *rq = rq_of(cfs_rq);
-
 		prev = rq->tmp_alone_branch;
 	}
 
+	if (prev == &rq->leaf_cfs_rq_list)
+		return false;
+
 	prev_cfs_rq = container_of(prev, struct cfs_rq, leaf_cfs_rq_list);
 
 	return (prev_cfs_rq->tg->parent == cfs_rq->tg);