From nobody Sun Apr 19 18:46:17 2026
From: Cruz Zhao
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, bristot@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 1/3] sched/core: Fix the bug that task won't enqueue into core tree when update cookie
Date: Tue, 28 Jun 2022 15:57:23 +0800
Message-Id: <1656403045-100840-2-git-send-email-CruzZhao@linux.alibaba.com>
In-Reply-To: <1656403045-100840-1-git-send-email-CruzZhao@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

In sched_core_update_cookie(), a task is enqueued into the core tree only
if it was already enqueued there before the update. As a result, when an
uncookied task is given a cookie, it is not enqueued into the core tree
until it is enqueued again, which results in unnecessary force idle.

The scenario is as follows. CPU x and CPU y are a pair of SMT siblings.

1. Start task a running on CPU x without sleeping, and task b and task c
   running on CPU y without sleeping.
2. Create a cookie and share it with task a and task b, then create
   another cookie and share it with task c.
3. Sample core_forceidle_sum of task a and task b from /proc/PID/sched.

We find that core_forceidle_sum of task a takes 30% of the sampling
period, which shouldn't happen, as task a and task b have the same cookie.

Then we migrate task a to CPU x' and tasks b and c to CPU y', where CPU x'
and CPU y' are a pair of SMT siblings, and sample again. Now
core_forceidle_sum of task a and task b is almost zero, because the
migration enqueued the tasks again.

To solve this problem, enqueue the task into the core tree if it is on
the rq.

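The effect of the condition described above can be pictured with a small
userspace model. This is only a sketch, not kernel code: struct toy_task and
the toy_* helpers are hypothetical stand-ins for the relevant bits of
task_struct, sched_core_enqueue(), sched_core_dequeue() and
sched_core_update_cookie(), and the use_fix flag is an assumption added
purely to switch between the old test (was the task in the core tree?) and
the new one (is the task on the rq?).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task_struct fields that matter here. */
struct toy_task {
	unsigned long core_cookie;	/* 0 means "no cookie" */
	bool on_rq;			/* models task_on_rq_queued(p) */
	bool in_core_tree;		/* models sched_core_enqueued(p) */
};

/* Models sched_core_enqueue(): uncookied tasks never enter the core tree. */
static void toy_core_enqueue(struct toy_task *p)
{
	if (p->core_cookie)
		p->in_core_tree = true;
}

/* Models sched_core_dequeue(): remove the task from the core tree. */
static void toy_core_dequeue(struct toy_task *p)
{
	p->in_core_tree = false;
}

/*
 * Models the cookie update.
 * use_fix == false: re-enqueue only if the task was in the core tree before.
 * use_fix == true:  re-enqueue whenever the task is on the runqueue.
 */
static void toy_update_cookie(struct toy_task *p, unsigned long cookie,
			      bool use_fix)
{
	bool enqueued = p->in_core_tree;

	if (enqueued)
		toy_core_dequeue(p);

	p->core_cookie = cookie;

	if (use_fix ? p->on_rq : enqueued)
		toy_core_enqueue(p);
}
```

In this model, a running task that starts without a cookie stays outside
the core tree after toy_update_cookie(p, cookie, false), while the same call
with use_fix set to true places it in the tree immediately.
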
Fixes: 6e33cad0af49 ("sched: Trivial core scheduling cookie management")
Signed-off-by: Cruz Zhao
---
 kernel/sched/core_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index 38a2cec..ba2466c 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -75,7 +75,7 @@ static unsigned long sched_core_update_cookie(struct task_struct *p,
 	old_cookie = p->core_cookie;
 	p->core_cookie = cookie;
 
-	if (enqueued)
+	if (task_on_rq_queued(p))
 		sched_core_enqueue(rq, p);
 
 	/*
-- 
1.8.3.1

From nobody Sun Apr 19 18:46:17 2026
From: Cruz Zhao
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 2/3] sched/core: Introduce nr_running percpu for each cookie
Date: Tue, 28 Jun 2022 15:57:24 +0800
Message-Id: <1656403045-100840-3-git-send-email-CruzZhao@linux.alibaba.com>
In-Reply-To: <1656403045-100840-1-git-send-email-CruzZhao@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Introduce a percpu count in struct sched_core_cookie, which indicates how
many tasks with this cookie are in the runqueue of each CPU.

Signed-off-by: Cruz Zhao
---
 kernel/sched/core.c       |  7 +++++++
 kernel/sched/core_sched.c | 16 ++++++++--------
 kernel/sched/sched.h      |  9 +++++++++
 3 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 263d764..9f71042 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -235,21 +235,28 @@ static inline int rb_sched_core_cmp(const void *key, const struct rb_node *node)
 
 void sched_core_enqueue(struct rq *rq, struct task_struct *p)
 {
+	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
+
 	rq->core->core_task_seq++;
 
 	if (!p->core_cookie)
 		return;
 
 	rb_add(&p->core_node, &rq->core_tree, rb_sched_core_less);
+
+	*per_cpu_ptr(ck->nr_running, rq->cpu) += 1;
 }
 
 void sched_core_dequeue(struct rq *rq, struct task_struct *p, int flags)
 {
+	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
+
 	rq->core->core_task_seq++;
 
 	if (sched_core_enqueued(p)) {
 		rb_erase(&p->core_node, &rq->core_tree);
 		RB_CLEAR_NODE(&p->core_node);
+		*per_cpu_ptr(ck->nr_running, rq->cpu) -= 1;
 	}
 
 	/*
diff --git a/kernel/sched/core_sched.c b/kernel/sched/core_sched.c
index ba2466c..65ab9fcb 100644
--- a/kernel/sched/core_sched.c
+++ b/kernel/sched/core_sched.c
@@ -1,20 +1,19 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
-/*
- * A simple wrapper around refcount. An allocated sched_core_cookie's
- * address is used to compute the cookie of the task.
- */
-struct sched_core_cookie {
-	refcount_t refcnt;
-};
-
 static unsigned long sched_core_alloc_cookie(void)
 {
 	struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);
+	int cpu;
+
 	if (!ck)
 		return 0;
 
 	refcount_set(&ck->refcnt, 1);
+
+	ck->nr_running = alloc_percpu(unsigned int);
+	for_each_possible_cpu(cpu)
+		*per_cpu_ptr(ck->nr_running, cpu) = 0;
+
 	sched_core_get();
 
 	return (unsigned long)ck;
@@ -25,6 +24,7 @@ static void sched_core_put_cookie(unsigned long cookie)
 	struct sched_core_cookie *ptr = (void *)cookie;
 
 	if (ptr && refcount_dec_and_test(&ptr->refcnt)) {
+		free_percpu(ptr->nr_running);
 		kfree(ptr);
 		sched_core_put();
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 5b14b6b..d852c67 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1186,6 +1186,15 @@ static inline raw_spinlock_t *__rq_lockp(struct rq *rq)
 bool cfs_prio_less(struct task_struct *a, struct task_struct *b, bool fi);
 
 /*
+ * A simple wrapper around refcount. An allocated sched_core_cookie's
+ * address is used to compute the cookie of the task.
+ */
+struct sched_core_cookie {
+	refcount_t refcnt;
+	unsigned int __percpu *nr_running;
+};
+
+/*
  * Helpers to check if the CPU's core cookie matches with the task's cookie
  * when core scheduling is enabled.
  * A special case is that the task's cookie always matches with CPU's core
-- 
1.8.3.1

From nobody Sun Apr 19 18:46:17 2026
From: Cruz Zhao
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
 vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
 bsegall@google.com, mgorman@suse.de, bristot@redhat.com
Cc: linux-kernel@vger.kernel.org
Subject: [PATCH 3/3] sched/core: Make tasks with the same cookie pairs on SMT siblings
Date: Tue, 28 Jun 2022 15:57:25 +0800
Message-Id: <1656403045-100840-4-git-send-email-CruzZhao@linux.alibaba.com>
In-Reply-To: <1656403045-100840-1-git-send-email-CruzZhao@linux.alibaba.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

If the runqueues of the SMT siblings of a core hold the same number of
tasks, we call the core balanced; otherwise it is unbalanced. If the core
is balanced, every time we pick the next task we can pick a pair of tasks
with the same cookie for each SMT sibling, and force idle is avoided.

 - Migrate the task if the source core and destination core can balance

   If ck->nr_running of src_cpu is the highest in the source core, and
   ck->nr_running of dst_cpu is the lowest in the destination core,
   migrate the task.

 - Select a cookie-matched idle CPU, or an idle CPU with the lowest
   ck->nr_running in the core

   In the fast path of task wakeup, if ck->nr_running of the CPU is the
   lowest in the core, we can select this CPU for the wakeup.

 - Find the cookie-matched idlest CPU, or a cookie-matched CPU with the
   lowest ck->nr_running in the core

   In the slow path of task wakeup, if ck->nr_running of the CPU is the
   lowest in the core, we can select this CPU for the wakeup.

Signed-off-by: Cruz Zhao
---
 kernel/sched/fair.c  |  4 ++--
 kernel/sched/sched.h | 65 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 78795a9..c18a716 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6096,7 +6096,7 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 
-		if (!sched_core_cookie_match(rq, p))
+		if (!sched_core_cookie_match(NULL, rq, p))
 			continue;
 
 		if (sched_idle_cpu(i))
@@ -7681,7 +7681,7 @@ static int task_hot(struct task_struct *p, struct lb_env *env)
 	 * Don't migrate task if the task's cookie does not match
 	 * with the destination CPU's core cookie.
 	 */
-	if (!sched_core_cookie_match(cpu_rq(env->dst_cpu), p))
+	if (!(sched_core_cookie_match(env->src_rq, env->dst_rq, p)))
 		return 1;
 
 	if (sysctl_sched_migration_cost == 0)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d852c67..ee0e558 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1195,6 +1195,56 @@ struct sched_core_cookie {
 };
 
 /*
+ * When tasks with the same cookie can make pairs on SMT siblings, forceidle can be
+ * avoided a lot, so when wake up and load balance, we try to make and keep the pairs
+ * with the same cookie on SMT siblings.
+ */
+static inline bool
+sched_core_make_pair_balance(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
+{
+	struct sched_core_cookie *ck = (struct sched_core_cookie *)p->core_cookie;
+	unsigned int src_cpu, dst_cpu, t;
+	unsigned int src_nr_running, dst_nr_running;
+
+	if (!ck)
+		return true;
+
+	/*
+	 * When load balance, if ck->nr_running on src_cpu is less than that on SMT
+	 * siblings, don't migrate the task.
+	 */
+	if (src_rq) {
+		if (!sched_core_enabled(src_rq))
+			return true;
+		src_cpu = cpu_of(src_rq);
+		src_nr_running = *per_cpu_ptr(ck->nr_running, src_cpu);
+		for_each_cpu(t, cpu_smt_mask(src_cpu)) {
+			if (t == src_cpu)
+				continue;
+			if (*per_cpu_ptr(ck->nr_running, t) >= src_nr_running)
+				return false;
+		}
+
+	}
+
+	/*
+	 * If task p can make pair the cookied task with p->core_cookie on the
+	 * dst core, we can wake up task p on dst_rq, or migrate it to dst_rq.
+	 */
+	dst_cpu = cpu_of(dst_rq);
+	dst_nr_running = *per_cpu_ptr(ck->nr_running, dst_cpu);
+	for_each_cpu(t, cpu_smt_mask(dst_cpu)) {
+		if (t == dst_cpu)
+			continue;
+		if (*per_cpu_ptr(ck->nr_running, t) <= dst_nr_running)
+			return false;
+	}
+
+	return true;
+}
+
+
+/*
  * Helpers to check if the CPU's core cookie matches with the task's cookie
  * when core scheduling is enabled.
  * A special case is that the task's cookie always matches with CPU's core
@@ -1206,19 +1256,21 @@ static inline bool sched_cpu_cookie_match(struct rq *rq, struct task_struct *p)
 	if (!sched_core_enabled(rq))
 		return true;
 
-	return rq->core->core_cookie == p->core_cookie;
+	return rq->core->core_cookie == p->core_cookie ||
+		sched_core_make_pair_balance(NULL, rq, p);
 }
 
-static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
+static inline bool
+sched_core_cookie_match(struct rq *src_rq, struct rq *dst_rq, struct task_struct *p)
 {
 	bool idle_core = true;
 	int cpu;
 
 	/* Ignore cookie match if core scheduler is not enabled on the CPU. */
-	if (!sched_core_enabled(rq))
+	if (!sched_core_enabled(dst_rq))
 		return true;
 
-	for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
+	for_each_cpu(cpu, cpu_smt_mask(cpu_of(dst_rq))) {
 		if (!available_idle_cpu(cpu)) {
 			idle_core = false;
 			break;
@@ -1229,7 +1281,8 @@ static inline bool sched_core_cookie_match(struct rq *rq, struct task_struct *p)
 	 * A CPU in an idle core is always the best choice for tasks with
 	 * cookies.
 	 */
-	return idle_core || rq->core->core_cookie == p->core_cookie;
+	return idle_core || dst_rq->core->core_cookie == p->core_cookie ||
+		sched_core_make_pair_balance(src_rq, dst_rq, p);
 }
 
 static inline bool sched_group_cookie_match(struct rq *rq,
@@ -1243,7 +1296,7 @@ static inline bool sched_group_cookie_match(struct rq *rq,
 		return true;
 
 	for_each_cpu_and(cpu, sched_group_span(group), p->cpus_ptr) {
-		if (sched_core_cookie_match(rq, p))
+		if (sched_core_cookie_match(NULL, rq, p))
 			return true;
 	}
 	return false;
-- 
1.8.3.1
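
The two comparisons made by sched_core_make_pair_balance() can be tried out
in userspace with the sketch below. It is a simplified model under stated
assumptions, not the kernel code: the per-CPU counter of one cookie is
replaced by a plain array indexed by CPU number, cpu_smt_mask() is replaced
by an explicit list of sibling CPU numbers, and the names toy_src_can_give()
and toy_dst_can_take() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Source side: the task may leave src_cpu only if src_cpu holds strictly
 * more tasks with this cookie than every other SMT sibling.
 * nr_running[cpu] models *per_cpu_ptr(ck->nr_running, cpu).
 */
static bool toy_src_can_give(const unsigned int *nr_running,
			     const unsigned int *siblings, size_t n_siblings,
			     unsigned int src_cpu)
{
	unsigned int src_nr = nr_running[src_cpu];

	for (size_t i = 0; i < n_siblings; i++) {
		unsigned int t = siblings[i];

		if (t == src_cpu)
			continue;
		if (nr_running[t] >= src_nr)
			return false;
	}
	return true;
}

/*
 * Destination side: the task may go to dst_cpu only if dst_cpu holds
 * strictly fewer tasks with this cookie than every other SMT sibling.
 */
static bool toy_dst_can_take(const unsigned int *nr_running,
			     const unsigned int *siblings, size_t n_siblings,
			     unsigned int dst_cpu)
{
	unsigned int dst_nr = nr_running[dst_cpu];

	for (size_t i = 0; i < n_siblings; i++) {
		unsigned int t = siblings[i];

		if (t == dst_cpu)
			continue;
		if (nr_running[t] <= dst_nr)
			return false;
	}
	return true;
}
```

With two siblings holding 2 and 1 tasks of the cookie, this model lets the
first CPU give a task and the second take one. With equal counts both
checks fail, so in this model a core that is already balanced for the
cookie neither gives nor takes a task through this path.
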