When a task is enqueued and its parent cgroup se is already on_rq, this
parent cgroup se will not be enqueued again, and hence root->min_slice
remains unchanged. The same issue happens when a task is dequeued while
its parent cgroup se still has other runnable entities: the parent
cgroup se will not be dequeued.

Force propagation of min_slice when the se does not need to be enqueued
or dequeued, so that the se hierarchy always gets the latest min_slice.
Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
---
v3:
Reworded some descriptions in the commit log and rebased onto the latest
tip branch. The old version of the patch can be found in [1].

The original patchset wanted to add a feature. As the 2nd patch may be
hard to accept, I think at least this bugfix should be applied.

The issue addressed by this patch is described in detail in [2].

[1] https://lore.kernel.org/all/20241031094822.30531-1-dtcccc@linux.alibaba.com/
[2] https://lore.kernel.org/all/a903d0dc-1d88-4ae7-ac81-3eed0445654d@linux.alibaba.com/
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e78caa21436..0d479b92633a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6970,6 +6970,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable += h_nr_runnable;
@@ -7099,6 +7101,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
--
2.39.3
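As background for the discussion below: the staleness this patch fixes can be modeled in plain user-space C. The following is a hypothetical sketch, not the kernel's data structures (the kernel tracks the minimum via an augmented rb-tree, and all names here are made up). Each level of a small hierarchy caches the minimum slice of its subtree, and a leaf's slice change must be walked up to the root even when no intermediate level is enqueued or dequeued.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Toy model of a cgroup se hierarchy: each node caches min_slice,
 * the minimum of its own slice and its children's cached min_slice. */
struct node {
	struct node *parent;
	struct node *child[MAX_CHILDREN];
	int nr_children;
	uint64_t slice;      /* the entity's own slice */
	uint64_t min_slice;  /* cached minimum over the subtree */
};

/* Recompute n->min_slice; return 1 if it was already up to date. */
static int update_min_slice(struct node *n)
{
	uint64_t min = n->slice;

	for (int i = 0; i < n->nr_children; i++)
		if (n->child[i]->min_slice < min)
			min = n->child[i]->min_slice;
	if (n->min_slice == min)
		return 1;	/* unchanged: propagation may stop */
	n->min_slice = min;
	return 0;
}

/* After a node's slice changes, push the new minimum toward the root.
 * This is the step that is skipped when a parent se stays on_rq. */
static void propagate_up(struct node *n)
{
	for (; n; n = n->parent)
		if (update_min_slice(n))
			break;
}
```

Without the `propagate_up()` walk, the root's cached minimum keeps its stale value whenever only a leaf changed, which is exactly the stale root->min_slice described in the commit message.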
Hello Tianchen,
On 2/11/2025 12:06 PM, Tianchen Ding wrote:
> [...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1e78caa21436..0d479b92633a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6970,6 +6970,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>  		update_cfs_group(se);
>  
>  		se->slice = slice;
> +		if (se != cfs_rq->curr)
> +			min_vruntime_cb_propagate(&se->run_node, NULL);
Should we check if old slice matches with the new slice before
propagation to avoid any unnecessary propagate call? Something like:
	if (se->slice != slice) {
		se->slice = slice;
		if (se != cfs_rq->curr)
			min_vruntime_cb_propagate(&se->run_node, NULL);
	}
Thoughts?
Other than that, the fix looks good. Feel free to add:
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
--
Thanks and Regards,
Prateek
Hi. Sorry for the late reply due to the weekend.
On 2/14/25 11:42 PM, K Prateek Nayak wrote:
[...]
>
> Should we check if old slice matches with the new slice before
> propagation to avoid any unnecessary propagate call? Something like:
>
> 	if (se->slice != slice) {
> 		se->slice = slice;
> 		if (se != cfs_rq->curr)
> 			min_vruntime_cb_propagate(&se->run_node, NULL);
> 	}
>
> Thoughts?
>
This optimization makes sense to me. But the code would be a bit ugly :-/
Maybe we should wrap it in a helper. Something like:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e78caa21436..ccceb67004a4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -844,6 +844,16 @@ static inline bool min_vruntime_update(struct sched_entity *se, bool exit)
 RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
 		     run_node, min_vruntime, min_vruntime_update);
 
+static inline void propagate_slice(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 slice)
+{
+	if (se->slice == slice)
+		return;
+
+	se->slice = slice;
+	if (se != cfs_rq->curr)
+		min_vruntime_cb_propagate(&se->run_node, NULL);
+}
+
 /*
  * Enqueue an entity into the rb-tree:
  */
@@ -6969,7 +6979,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		se_update_runnable(se);
 		update_cfs_group(se);
 
-		se->slice = slice;
+		propagate_slice(cfs_rq, se, slice);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable += h_nr_runnable;
@@ -7098,7 +7108,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		se_update_runnable(se);
 		update_cfs_group(se);
 
-		se->slice = slice;
+		propagate_slice(cfs_rq, se, slice);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
Since the patch has been accepted, I'm not sure whether I should send a
new version. The current version does introduce an extra function call
when se->slice == slice, but the loop will run only once and exit because
RBCOMPUTE() will return true. So maybe the cost is insignificant?
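The early-exit argument above can be checked with a small instrumented sketch (hypothetical names, one child per level for simplicity, not the kernel's rb-tree machinery): when a level's cached minimum is already correct, the upward walk stops after a single recompute, analogous to RBCOMPUTE() returning true and terminating the augmented rb-tree propagation loop.

```c
#include <assert.h>
#include <stdint.h>

/* One-child-per-level chain: min caches min(own, child's min). */
struct snode {
	struct snode *parent;
	uint64_t own;	/* the entity's own slice */
	uint64_t min;	/* cached minimum over the subtree */
};

static int steps;	/* counts how many levels the walk visited */

/* Walk toward the root carrying the child's cached minimum. Stop as
 * soon as a level's cached minimum is already correct, mirroring how
 * RBCOMPUTE() returning true stops the kernel's propagation loop. */
static void propagate(struct snode *n, uint64_t child_min)
{
	for (; n; child_min = n->min, n = n->parent) {
		uint64_t m = n->own < child_min ? n->own : child_min;

		steps++;
		if (n->min == m)
			return;	/* unchanged: loop exits after one level */
		n->min = m;
	}
}
```

In the redundant case (the child's minimum did not change), the walk touches exactly one level before bailing out, which is why the extra call when se->slice == slice costs a single recompute rather than a full walk to the root.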
Hello Tianchen,

On 2/17/2025 8:23 AM, Tianchen Ding wrote:
> Since the patch has been accepted, I'm not sure whether I should send a
> next version. The current version does introduce an extra function call
> when se->slice == slice, but the loop will run only once and exit because
> RBCOMPUTE() will return true. So maybe the cost is insignificant?

Yup, the cost is likely insignificant! I have no strong feelings, and
since this has landed, I don't think there is a need to micro-optimize
this.

--
Thanks and Regards,
Prateek
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 563bc2161b94571ea425bbe2cf69fd38e24cdedf
Gitweb: https://git.kernel.org/tip/563bc2161b94571ea425bbe2cf69fd38e24cdedf
Author: Tianchen Ding <dtcccc@linux.alibaba.com>
AuthorDate: Tue, 11 Feb 2025 14:36:59 +08:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 14 Feb 2025 10:32:01 +01:00
sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks
When a task is enqueued and its parent cgroup se is already on_rq, this
parent cgroup se will not be enqueued again, and hence root->min_slice
remains unchanged. The same issue happens when a task is dequeued while
its parent cgroup se still has other runnable entities: the parent
cgroup se will not be dequeued.

Force propagation of min_slice when the se does not need to be enqueued
or dequeued, so that the se hierarchy always gets the latest min_slice.
Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250211063659.7180-1-dtcccc@linux.alibaba.com
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1784752..9279bfb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7002,6 +7002,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable += h_nr_runnable;
@@ -7131,6 +7133,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;