[PATCH v3] sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks

Tianchen Ding posted 1 patch 10 months, 1 week ago
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)
[PATCH v3] sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks
Posted by Tianchen Ding 10 months, 1 week ago
When a task is enqueued and its parent cgroup se is already on_rq, this
parent cgroup se will not be enqueued again, and hence the root->min_slice
leaves unchanged. The same issue happens when a task is dequeued and its
parent cgroup se has other runnable entities, and the parent cgroup se
will not be dequeued.

Force propagating min_slice when se doesn't need to be enqueued or
dequeued. Ensure the se hierarchy always get the latest min_slice.

Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
---
v3:
I modified some descriptions in commit log, and rebased to the latest
tip branch. The old version of patch can be found in [1].

The original patchset wants to add a feature. As the 2nd patch may be
hard to be accepted, I think at least the bugfix should be applied.

The issue about this patch was described detailly in [2].

[1]https://lore.kernel.org/all/20241031094822.30531-1-dtcccc@linux.alibaba.com/
[2]https://lore.kernel.org/all/a903d0dc-1d88-4ae7-ac81-3eed0445654d@linux.alibaba.com/
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e78caa21436..0d479b92633a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6970,6 +6970,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable += h_nr_runnable;
@@ -7099,6 +7101,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
-- 
2.39.3
Re: [PATCH v3] sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks
Posted by K Prateek Nayak 10 months ago
Hello Tianchen,

On 2/11/2025 12:06 PM, Tianchen Ding wrote:
> When a task is enqueued and its parent cgroup se is already on_rq, this
> parent cgroup se will not be enqueued again, and hence the root->min_slice
> leaves unchanged. The same issue happens when a task is dequeued and its
> parent cgroup se has other runnable entities, and the parent cgroup se
> will not be dequeued.
> 
> Force propagating min_slice when se doesn't need to be enqueued or
> dequeued. Ensure the se hierarchy always get the latest min_slice.
> 
> Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
> Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
> ---
> v3:
> I modified some descriptions in commit log, and rebased to the latest
> tip branch. The old version of patch can be found in [1].
> 
> The original patchset wants to add a feature. As the 2nd patch may be
> hard to be accepted, I think at least the bugfix should be applied.
> 
> The issue about this patch was described detailly in [2].
> 
> [1]https://lore.kernel.org/all/20241031094822.30531-1-dtcccc@linux.alibaba.com/
> [2]https://lore.kernel.org/all/a903d0dc-1d88-4ae7-ac81-3eed0445654d@linux.alibaba.com/
> ---
>   kernel/sched/fair.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1e78caa21436..0d479b92633a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6970,6 +6970,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
>   		update_cfs_group(se);
>   
>   		se->slice = slice;
> +		if (se != cfs_rq->curr)
> +			min_vruntime_cb_propagate(&se->run_node, NULL);

Should we check if old slice matches with the new slice before
propagation to avoid any unnecessary propagate call? Something like:

		if (se->slice != slice) {
			se->slice = slice;
			if (se != cfs_rq->curr)
				min_vruntime_cb_propagate(&se->run_node, NULL);
		}

Thoughts?

Other than that, the fix looks good. Feel free to add:

Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

-- 
Thanks and Regards,
Prateek

>   		slice = cfs_rq_min_slice(cfs_rq);
>   
>   		cfs_rq->h_nr_runnable += h_nr_runnable;
> @@ -7099,6 +7101,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
>   		update_cfs_group(se);
>   
>   		se->slice = slice;
> +		if (se != cfs_rq->curr)
> +			min_vruntime_cb_propagate(&se->run_node, NULL);
>   		slice = cfs_rq_min_slice(cfs_rq);
>   
>   		cfs_rq->h_nr_runnable -= h_nr_runnable;
Re: [PATCH v3] sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks
Posted by Tianchen Ding 10 months ago
Hi. Sorry for replying late due to weekend.

On 2/14/25 11:42 PM, K Prateek Nayak wrote:

[...]

> 
> Should we check if old slice matches with the new slice before
> propagation to avoid any unnecessary propagate call? Something like:
> 
>          if (se->slice != slice) {
>              se->slice = slice;
>              if (se != cfs_rq->curr)
>                  min_vruntime_cb_propagate(&se->run_node, NULL);
>          }
> 
> Thoughts?
> 

This optimization makes sense to me. But the code would be a bit ugly :-/

Maybe we should wrap it in a helper. Something like:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1e78caa21436..ccceb67004a4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -844,6 +844,16 @@ static inline bool min_vruntime_update(struct sched_entity *se, bool exit)
  RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
  		     run_node, min_vruntime, min_vruntime_update);
  
+static inline void propagate_slice(struct cfs_rq *cfs_rq, struct sched_entity *se, u64 slice)
+{
+	if (se->slice == slice)
+		return;
+
+	se->slice = slice;
+	if (se != cfs_rq->curr)
+		min_vruntime_cb_propagate(&se->run_node, NULL);
+}
+
  /*
   * Enqueue an entity into the rb-tree:
   */
@@ -6969,7 +6979,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
  		se_update_runnable(se);
  		update_cfs_group(se);
  
-		se->slice = slice;
+		propagate_slice(cfs_rq, se, slice);
  		slice = cfs_rq_min_slice(cfs_rq);
  
  		cfs_rq->h_nr_runnable += h_nr_runnable;
@@ -7098,7 +7108,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
  		se_update_runnable(se);
  		update_cfs_group(se);
  
-		se->slice = slice;
+		propagate_slice(cfs_rq, se, slice);
  		slice = cfs_rq_min_slice(cfs_rq);
  
  		cfs_rq->h_nr_runnable -= h_nr_runnable;
-- 

Since the patch has been accepted, I'm not sure whether I should send a
next version. The current version does introduce an extra function call
when se->slice == slice, but the loop will run only once and exit because
RBCOMPUTE() will return true. So maybe the cost is insignificant?
Re: [PATCH v3] sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks
Posted by K Prateek Nayak 10 months ago
Hello Tianchen,

On 2/17/2025 8:23 AM, Tianchen Ding wrote:
> 
> Since the patch has been accepted, I'm not sure whether I should send a
> next version. The current version does introduce an extra function call
> when se->slice == slice, but the loop will run only once and exit because
> RBCOMPUTE() will return true. So maybe the cost is insignificant?


Yup the cost is likely insignificant! I have no strong feelings and since
this has landed, I don't think there is a need to micro optimize this.

-- 
Thanks and Regards,
Prateek
[tip: sched/core] sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks
Posted by tip-bot2 for Tianchen Ding 10 months ago
The following commit has been merged into the sched/core branch of tip:

Commit-ID:     563bc2161b94571ea425bbe2cf69fd38e24cdedf
Gitweb:        https://git.kernel.org/tip/563bc2161b94571ea425bbe2cf69fd38e24cdedf
Author:        Tianchen Ding <dtcccc@linux.alibaba.com>
AuthorDate:    Tue, 11 Feb 2025 14:36:59 +08:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 14 Feb 2025 10:32:01 +01:00

sched/eevdf: Force propagating min_slice of cfs_rq when {en,de}queue tasks

When a task is enqueued and its parent cgroup se is already on_rq, this
parent cgroup se will not be enqueued again, and hence the root->min_slice
leaves unchanged. The same issue happens when a task is dequeued and its
parent cgroup se has other runnable entities, and the parent cgroup se
will not be dequeued.

Force propagating min_slice when se doesn't need to be enqueued or
dequeued. Ensure the se hierarchy always get the latest min_slice.

Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250211063659.7180-1-dtcccc@linux.alibaba.com
---
 kernel/sched/fair.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1784752..9279bfb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7002,6 +7002,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable += h_nr_runnable;
@@ -7131,6 +7133,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		update_cfs_group(se);
 
 		se->slice = slice;
+		if (se != cfs_rq->curr)
+			min_vruntime_cb_propagate(&se->run_node, NULL);
 		slice = cfs_rq_min_slice(cfs_rq);
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;