When creating a new task, we initialize the vruntime of the new task in
sched_cgroup_fork(). However, this is too early and the result may not
be accurate.

This is because the vruntime is initialized against the current CPU, but
the new task actually runs on the CPU that is assigned in
wake_up_new_task().

To improve this, pass the ENQUEUE_INITIAL flag to activate_task() in
wake_up_new_task(). This way, the vruntime of the new task is initialized
when place_entity() is called in enqueue_entity().

In addition, place_entity() in task_fork_fair() was introduced for two
reasons:

1. Previously, __enqueue_entity() was called in task_new_fair(). To
   provide a vruntime for enqueueing the new task, the assignment
   "se->vruntime = cfs_rq->min_vruntime" was introduced by
   commit e9acbff6484d ("sched: introduce se->vruntime"). This was the
   initial form of place_entity().

2. Commit 4d78e7b656aa ("sched: new task placement for vruntime") added
   the child_runs_first task placement feature, which is based on
   vruntime and therefore also requires the new task's vruntime.

Now that child_runs_first and enqueue_entity() have been removed from
task_fork_fair(), this place_entity() call no longer makes sense, so
remove it as well.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
---
v2:
Improve comments and commit log.
v1: https://lore.kernel.org/lkml/20240606121133.2218723-1-zhangqiao22@huawei.com/
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 15 ---------------
2 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bcf2c4cc0522..b4ff595a2dc8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4897,7 +4897,7 @@ void wake_up_new_task(struct task_struct *p)
 	update_rq_clock(rq);
 	post_init_entity_util_avg(p);
 
-	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
 	trace_sched_wakeup_new(p);
 	wakeup_preempt(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..bb5f376fd51e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12702,22 +12702,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
  */
 static void task_fork_fair(struct task_struct *p)
 {
-	struct sched_entity *se = &p->se, *curr;
-	struct cfs_rq *cfs_rq;
-	struct rq *rq = this_rq();
-	struct rq_flags rf;
-
-	rq_lock(rq, &rf);
-	update_rq_clock(rq);
-
 	set_task_max_allowed_capacity(p);
-
-	cfs_rq = task_cfs_rq(current);
-	curr = cfs_rq->curr;
-	if (curr)
-		update_curr(cfs_rq);
-	place_entity(cfs_rq, se, ENQUEUE_INITIAL);
-	rq_unlock(rq, &rf);
 }
 
 /*
--
2.18.0.huawei.25
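
The ordering problem described in the commit log can be illustrated with a toy model. This is only a sketch under loud assumptions: the toy_* names and flag values are hypothetical, the placement rule is the historical "se->vruntime = cfs_rq->min_vruntime" quoted in the log, and a non-initial enqueue is assumed to leave vruntime untouched. The real place_entity() and enqueue path are considerably more involved.

```c
#include <stdint.h>

/* Toy flag values for this sketch only; they are not the kernel's definitions. */
#define TOY_ENQUEUE_NOCLOCK 0x1u
#define TOY_ENQUEUE_INITIAL 0x2u

struct toy_rq { uint64_t min_vruntime; };
struct toy_task { uint64_t vruntime; };

/* Toy placement rule quoted in the commit log: vruntime = min_vruntime. */
void toy_place(const struct toy_rq *rq, struct toy_task *t)
{
	t->vruntime = rq->min_vruntime;
}

/* Toy enqueue: does the initial placement only when the flag asks for it. */
void toy_enqueue(const struct toy_rq *rq, struct toy_task *t, unsigned int flags)
{
	if (flags & TOY_ENQUEUE_INITIAL)
		toy_place(rq, t);
	/* Assumption: a non-initial enqueue leaves vruntime as it was. */
}

/* Signed distance of the task's vruntime from a run queue's min_vruntime. */
int64_t toy_offset(const struct toy_rq *rq, const struct toy_task *t)
{
	return (int64_t)(t->vruntime - rq->min_vruntime);
}

/* Old ordering: place at fork time on the parent's run queue, enqueue elsewhere. */
int64_t toy_old_path(const struct toy_rq *parent_rq, const struct toy_rq *target_rq)
{
	struct toy_task t = { 0 };

	toy_place(parent_rq, &t);
	toy_enqueue(target_rq, &t, TOY_ENQUEUE_NOCLOCK);
	return toy_offset(target_rq, &t);
}

/* New ordering: place when first enqueued on the run queue the task will use. */
int64_t toy_new_path(const struct toy_rq *target_rq)
{
	struct toy_task t = { 0 };

	toy_enqueue(target_rq, &t, TOY_ENQUEUE_NOCLOCK | TOY_ENQUEUE_INITIAL);
	return toy_offset(target_rq, &t);
}
```

With a parent run queue at min_vruntime 1000 and a target run queue at 5000, toy_old_path() returns -4000 while toy_new_path() returns 0: in this model, only the enqueue-time placement is relative to the run queue the task actually joins.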
On Thu, Jun 27, 2024 at 09:33:59PM +0800, Zhang Qiao wrote:
> [...]

Thanks, I'll queue this for sched/urgent once -rc1 rolls around.
Ping 🙂
On 2024/6/27 21:33, Zhang Qiao wrote:
> [...]
On Thu, 27 Jun 2024 21:33:59 +0800, Zhang Qiao wrote:
> When creating a new task, we initialize the vruntime of the new task in
> sched_cgroup_fork(). However, this is too early and the result may not
> be accurate.
>
> This is because the vruntime is initialized against the current CPU, but
> the new task actually runs on the CPU that is assigned in
> wake_up_new_task().
>
> [...]

I tried this on a 104-CPU ppc64le Power9 system.

The experiment was to stress the CPUs with 416 threads (4 * nproc) and
then run small commands like date, time, ls, etc.

With bpf I collected the time it took for the tasks to go from waking up
to switching on to the CPU. I collected 3600 samples for each.

The results are as follows:

# With patch

@: count 3601, average 4609830, total 16599997845
@c: 3709
@latency:
[4K, 8K)               1 |                                                    |
[8K, 16K)              4 |                                                    |
[16K, 32K)          2270 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32K, 64K)            45 |@                                                   |
[64K, 128K)            3 |                                                    |
[128K, 256K)         141 |@@@                                                 |
[256K, 512K)          24 |                                                    |
[512K, 1M)             7 |                                                    |
[1M, 2M)               8 |                                                    |
[2M, 4M)              38 |                                                    |
[4M, 8M)             226 |@@@@@                                               |
[8M, 16M)            321 |@@@@@@@                                             |
[16M, 32M)           501 |@@@@@@@@@@@                                         |
[32M, 64M)            12 |                                                    |

# Without patch

@: count 3600, average 12786599, total 46031757085
@c: 3608
@latency:
[4K, 8K)               1 |                                                    |
[8K, 16K)              6 |                                                    |
[16K, 32K)           297 |@@@@@@@@@@@                                         |
[32K, 64K)            25 |                                                    |
[64K, 128K)            4 |                                                    |
[128K, 256K)         149 |@@@@@                                               |
[256K, 512K)          21 |                                                    |
[512K, 1M)             5 |                                                    |
[1M, 2M)              14 |                                                    |
[2M, 4M)              45 |@                                                   |
[4M, 8M)             857 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                   |
[8M, 16M)            812 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                     |
[16M, 32M)          1325 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[32M, 64M)            31 |@                                                   |

The patch leads to a significant decrease in the latency of a newly woken
task.

Tested-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com>
Reviewed-by: Mukesh Kumar Chaurasiya <mchauras@linux.ibm.com>
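
The two summary lines in the report above can be cross-checked with a few lines of C. This is a minimal sketch: the lat_summary type and the function names are hypothetical, and the unit of the values is not stated in the message, so the code treats them as opaque counts.

```c
#include <stdint.h>

/* One "@: count ..., total ..." line from the report; unit is whatever the tracer recorded. */
struct lat_summary {
	uint64_t count;
	uint64_t total;
};

/* Integer mean, truncated the same way as the "average" field in the report. */
uint64_t lat_average(struct lat_summary s)
{
	return s.count ? s.total / s.count : 0;
}

/* Percentage by which the mean drops from 'before' to 'after'. */
double lat_reduction_pct(struct lat_summary before, struct lat_summary after)
{
	double b, a;

	if (!before.count || !after.count)
		return 0.0;

	b = (double)before.total / (double)before.count;
	a = (double)after.total / (double)after.count;
	return 100.0 * (b - a) / b;
}
```

Fed the reported counts and totals, lat_average() reproduces the two printed averages (4609830 with the patch, 12786599 without), and lat_reduction_pct() gives a drop of roughly 64% in mean wakeup-to-run latency for this workload.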
The following commit has been merged into the sched/core branch of tip:
Commit-ID: c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
Gitweb: https://git.kernel.org/tip/c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
Author: Zhang Qiao <zhangqiao22@huawei.com>
AuthorDate: Thu, 27 Jun 2024 21:33:59 +08:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 29 Jul 2024 12:22:34 +02:00
sched: Initialize the vruntime of a new task when it is first enqueued
When creating a new task, we initialize the vruntime of the new task in
sched_cgroup_fork(). However, this is too early and the result may not
be accurate.

This is because the vruntime is initialized against the current CPU, but
the new task actually runs on the CPU that is assigned in
wake_up_new_task().

To improve this, pass the ENQUEUE_INITIAL flag to activate_task() in
wake_up_new_task(). This way, the vruntime of the new task is initialized
when place_entity() is called in enqueue_entity().

In addition, place_entity() in task_fork_fair() was introduced for two
reasons:

1. Previously, __enqueue_entity() was called in task_new_fair(). To
   provide a vruntime for enqueueing the new task, the assignment
   "se->vruntime = cfs_rq->min_vruntime" was introduced by
   commit e9acbff6484d ("sched: introduce se->vruntime"). This was the
   initial form of place_entity().

2. Commit 4d78e7b656aa ("sched: new task placement for vruntime") added
   the child_runs_first task placement feature, which is based on
   vruntime and therefore also requires the new task's vruntime.

Now that child_runs_first and enqueue_entity() have been removed from
task_fork_fair(), this place_entity() call no longer makes sense, so
remove it as well.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20240627133359.1370598-1-zhangqiao22@huawei.com
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 15 ---------------
2 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f3951e4..2c61b4f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4686,7 +4686,7 @@ void wake_up_new_task(struct task_struct *p)
 	update_rq_clock(rq);
 	post_init_entity_util_avg(p);
 
-	activate_task(rq, p, ENQUEUE_NOCLOCK);
+	activate_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_INITIAL);
 	trace_sched_wakeup_new(p);
 	wakeup_preempt(rq, p, WF_FORK);
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9057584..e8cdfeb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12702,22 +12702,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
  */
 static void task_fork_fair(struct task_struct *p)
 {
-	struct sched_entity *se = &p->se, *curr;
-	struct cfs_rq *cfs_rq;
-	struct rq *rq = this_rq();
-	struct rq_flags rf;
-
-	rq_lock(rq, &rf);
-	update_rq_clock(rq);
-
 	set_task_max_allowed_capacity(p);
-
-	cfs_rq = task_cfs_rq(current);
-	curr = cfs_rq->curr;
-	if (curr)
-		update_curr(cfs_rq);
-	place_entity(cfs_rq, se, ENQUEUE_INITIAL);
-	rq_unlock(rq, &rf);
 }
 
 /*
On Mon, Jul 29, 2024 at 10:34:07AM GMT, tip-bot2 for Zhang Qiao wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
> Gitweb: https://git.kernel.org/tip/c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
> Author: Zhang Qiao <zhangqiao22@huawei.com>
> AuthorDate: Thu, 27 Jun 2024 21:33:59 +08:00
> Committer: Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Mon, 29 Jul 2024 12:22:34 +02:00
>
>
> Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20240627133359.1370598-1-zhangqiao22@huawei.com
> ---

Hi Peter,

I just noticed that my tags were not picked, just wanted to check if it's some
config issue on my end or something on the tipbot side.

Thanks,
Mukesh
On Tue, Jul 30, 2024 at 01:45:41PM +0530, Mukesh Kumar Chaurasiya wrote:
>
> On Mon, Jul 29, 2024 at 10:34:07AM GMT, tip-bot2 for Zhang Qiao wrote:
> > The following commit has been merged into the sched/core branch of tip:
> >
> > Commit-ID: c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
> > Gitweb: https://git.kernel.org/tip/c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
> > Author: Zhang Qiao <zhangqiao22@huawei.com>
> > AuthorDate: Thu, 27 Jun 2024 21:33:59 +08:00
> > Committer: Peter Zijlstra <peterz@infradead.org>
> > CommitterDate: Mon, 29 Jul 2024 12:22:34 +02:00
> >
> >
> > Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Link: https://lkml.kernel.org/r/20240627133359.1370598-1-zhangqiao22@huawei.com
> > ---
> Hi Peter,
>
> I just noticed that my tags were not picked, just wanted to check if it's some
> config issue on my end or something on the tipbot side.

Could be I applied the patch before your email arrived. No harm
intended, and I do appreciate the review effort.
On Tue, Jul 30, 2024 at 10:28:15AM GMT, Peter Zijlstra wrote:
> On Tue, Jul 30, 2024 at 01:45:41PM +0530, Mukesh Kumar Chaurasiya wrote:
> >
> > On Mon, Jul 29, 2024 at 10:34:07AM GMT, tip-bot2 for Zhang Qiao wrote:
> > > The following commit has been merged into the sched/core branch of tip:
> > >
> > > Commit-ID: c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
> > > Gitweb: https://git.kernel.org/tip/c40dd90ac045fa1fdf6acc5bf9109a2315e6c92c
> > > Author: Zhang Qiao <zhangqiao22@huawei.com>
> > > AuthorDate: Thu, 27 Jun 2024 21:33:59 +08:00
> > > Committer: Peter Zijlstra <peterz@infradead.org>
> > > CommitterDate: Mon, 29 Jul 2024 12:22:34 +02:00
> > >
> > >
> > > Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > Link: https://lkml.kernel.org/r/20240627133359.1370598-1-zhangqiao22@huawei.com
> > > ---
> > Hi Peter,
> >
> > I just noticed that my tags were not picked, just wanted to check if it's some
> > config issue on my end or something on the tipbot side.
>
> Could be I applied the patch before your email arrived. No harm
> intended, and I do appreciate the review effort.

Sure Peter,
Just wanted to check if I missed something.

Thanks,
Mukesh