From: Tobias Huschle <huschle@linux.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
    bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
    linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    sshegde@linux.ibm.com
Subject: [RFC PATCH 1/2] sched/fair: introduce new scheduler group type group_parked
Date: Wed, 4 Dec 2024 12:21:48 +0100
Message-Id: <20241204112149.25872-2-huschle@linux.ibm.com>
In-Reply-To: <20241204112149.25872-1-huschle@linux.ibm.com>
References: <20241204112149.25872-1-huschle@linux.ibm.com>

A parked CPU is flagged as unsuitable for processing workload at the
moment, but may become usable again at any time, depending on the need
for additional compute capacity and/or the available capacity of the
underlying hardware.

A scheduler group is considered parked if it contains only parked CPUs.
A parked scheduler group is considered busier than another parked group
if it runs more tasks.

The indicators for whether a CPU should be parked depend on the
underlying hardware and are therefore architecture dependent, so the
check whether a CPU is parked is architecture specific. For
architectures not relying on this feature, the check is a NOP.

Parking CPUs this way is more efficient and less disruptive than CPU
hotplug in environments where such changes can be necessary on a
frequent basis.
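
For illustration, wiring up the hook on an architecture would look
roughly like this (a sketch only: the generic NOP stub is part of the
diff below, the real s390 implementation is in patch 2/2, and
platform_cpu_should_be_avoided() is a made-up placeholder for the
arch-specific condition):

    /* arch/<arch>/include/asm/topology.h: opt in by providing the hook. */
    #define arch_cpu_parked arch_cpu_parked
    int arch_cpu_parked(int cpu);

    /* arch/<arch>/kernel/topology.c: decide per CPU whether it must be avoided. */
    int arch_cpu_parked(int cpu)
    {
            /* platform_cpu_should_be_avoided() is a placeholder, not an existing API. */
            return platform_cpu_should_be_avoided(cpu);
    }

All other architectures keep using the default stub, which reports every
CPU as not parked, so none of the new checks change their behavior.
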
Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 include/linux/sched/topology.h |  20 ++++++
 kernel/sched/core.c            |  10 ++-
 kernel/sched/fair.c            | 122 ++++++++++++++++++++++++++-------
 3 files changed, 127 insertions(+), 25 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 4237daa5ac7a..cfe3c59bc329 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -270,6 +270,26 @@ unsigned long arch_scale_cpu_capacity(int cpu)
 }
 #endif
 
+#ifndef arch_cpu_parked
+/**
+ * arch_cpu_parked - Check if a given CPU is currently parked.
+ *
+ * A parked CPU cannot run any kind of workload since the underlying
+ * physical CPU should not be used at the moment.
+ *
+ * @cpu: the CPU in question.
+ *
+ * By default, assume that no CPU is parked.
+ *
+ * Return: Parked state of the CPU.
+ */
+static __always_inline
+unsigned long arch_cpu_parked(int cpu)
+{
+	return false;
+}
+#endif
+
 #ifndef arch_scale_hw_pressure
 static __always_inline
 unsigned long arch_scale_hw_pressure(int cpu)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1dee3f5ef940..8f9aeb97c396 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2437,7 +2437,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 
 	/* Non kernel threads are not allowed during either online or offline. */
 	if (!(p->flags & PF_KTHREAD))
-		return cpu_active(cpu);
+		return !arch_cpu_parked(cpu) && cpu_active(cpu);
 
 	/* KTHREAD_IS_PER_CPU is always allowed. */
 	if (kthread_is_per_cpu(p))
@@ -2447,6 +2447,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 	if (cpu_dying(cpu))
 		return false;
 
+	/* The CPU should be avoided at the moment. */
+	if (arch_cpu_parked(cpu))
+		return false;
+
 	/* But are allowed during online. */
 	return cpu_online(cpu);
 }
@@ -3924,6 +3928,10 @@ static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
 	if (task_on_scx(p))
 		return false;
 
+	/* The task should not be queued onto a parked CPU. */
+	if (arch_cpu_parked(cpu))
+		return false;
+
 	/*
	 * Do not complicate things with the async wake_list while the CPU is
	 * in hotplug state.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4283c818bbd1..fa1c19d285de 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7415,6 +7415,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 {
 	int target = nr_cpumask_bits;
 
+	if (arch_cpu_parked(target))
+		return prev_cpu;
+
 	if (sched_feat(WA_IDLE))
 		target = wake_affine_idle(this_cpu, prev_cpu, sync);
 
@@ -7454,6 +7457,9 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
 	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 
+		if (arch_cpu_parked(i))
+			continue;
+
 		if (!sched_core_cookie_match(rq, p))
 			continue;
 
@@ -7546,10 +7552,14 @@ static inline int sched_balance_find_dst_cpu(struct sched_domain *sd, struct tas
 	return new_cpu;
 }
 
+static inline bool is_idle_cpu_allowed(int cpu)
+{
+	return !arch_cpu_parked(cpu) && (available_idle_cpu(cpu) || sched_idle_cpu(cpu));
+}
+
 static inline int __select_idle_cpu(int cpu, struct task_struct *p)
 {
-	if ((available_idle_cpu(cpu) || sched_idle_cpu(cpu)) &&
-	    sched_cpu_cookie_match(cpu_rq(cpu), p))
+	if (is_idle_cpu_allowed(cpu) && sched_cpu_cookie_match(cpu_rq(cpu), p))
 		return cpu;
 
 	return -1;
@@ -7657,7 +7667,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 		 */
 		if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
 			continue;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+		if (is_idle_cpu_allowed(cpu))
 			return cpu;
 	}
 
@@ -7779,7 +7789,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 	for_each_cpu_wrap(cpu, cpus, target) {
 		unsigned long cpu_cap = capacity_of(cpu);
 
-		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
+		if (!is_idle_cpu_allowed(cpu))
 			continue;
 
 		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
@@ -7850,7 +7860,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 */
 	lockdep_assert_irqs_disabled();
 
-	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
+	if (is_idle_cpu_allowed(target) &&
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
 
@@ -7858,7 +7868,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
 	if (prev != target && cpus_share_cache(prev, target) &&
-	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
+	    is_idle_cpu_allowed(prev) &&
 	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
 
 		if (!static_branch_unlikely(&sched_cluster_active) ||
@@ -7890,7 +7900,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+	    is_idle_cpu_allowed(recent_used_cpu) &&
 	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
 	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
 
@@ -9198,7 +9208,12 @@ enum group_type {
 	 * The CPU is overloaded and can't provide expected CPU cycles to all
 	 * tasks.
 	 */
-	group_overloaded
+	group_overloaded,
+	/*
+	 * The CPU should be avoided as it can't provide expected CPU cycles
+	 * even for small amounts of workload.
+	 */
+	group_parked
 };
 
 enum migration_type {
@@ -9498,7 +9513,7 @@ static int detach_tasks(struct lb_env *env)
 	 * Source run queue has been emptied by another CPU, clear
 	 * LBF_ALL_PINNED flag as we will not test any task.
 	 */
-	if (env->src_rq->nr_running <= 1) {
+	if (env->src_rq->nr_running <= 1 && !arch_cpu_parked(env->src_cpu)) {
 		env->flags &= ~LBF_ALL_PINNED;
 		return 0;
 	}
@@ -9511,7 +9526,7 @@ static int detach_tasks(struct lb_env *env)
 		 * We don't want to steal all, otherwise we may be treated likewise,
 		 * which could at worst lead to a livelock crash.
 		 */
-		if (env->idle && env->src_rq->nr_running <= 1)
+		if (env->idle && env->src_rq->nr_running <= 1 && !arch_cpu_parked(env->src_cpu))
 			break;
 
 		env->loop++;
@@ -9870,6 +9885,8 @@ struct sg_lb_stats {
 	unsigned long group_runnable;		/* Total runnable time over the CPUs of the group */
 	unsigned int sum_nr_running;		/* Nr of all tasks running in the group */
 	unsigned int sum_h_nr_running;		/* Nr of CFS tasks running in the group */
+	unsigned int sum_nr_parked;
+	unsigned int parked_cpus;
 	unsigned int idle_cpus;			/* Nr of idle CPUs in the group */
 	unsigned int group_weight;
 	enum group_type group_type;
@@ -10127,6 +10144,9 @@ group_type group_classify(unsigned int imbalance_pct,
 			  struct sched_group *group,
 			  struct sg_lb_stats *sgs)
 {
+	if (sgs->parked_cpus)
+		return group_parked;
+
 	if (group_is_overloaded(imbalance_pct, sgs))
 		return group_overloaded;
 
@@ -10328,10 +10348,15 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		sgs->nr_numa_running += rq->nr_numa_running;
 		sgs->nr_preferred_running += rq->nr_preferred_running;
 #endif
+
+		if (rq->cfs.h_nr_running) {
+			sgs->parked_cpus += arch_cpu_parked(i);
+			sgs->sum_nr_parked += arch_cpu_parked(i) * rq->cfs.h_nr_running;
+		}
 		/*
 		 * No need to call idle_cpu() if nr_running is not 0
 		 */
-		if (!nr_running && idle_cpu(i)) {
+		if (!nr_running && idle_cpu(i) && !arch_cpu_parked(i)) {
 			sgs->idle_cpus++;
 			/* Idle cpu can't have misfit task */
 			continue;
@@ -10355,7 +10380,14 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	sgs->group_capacity = group->sgc->capacity;
 
-	sgs->group_weight = group->group_weight;
+	sgs->group_weight = group->group_weight - sgs->parked_cpus;
+
+	/*
+	 * Only a subset of the group is parked, so the group itself has the
+	 * capability to potentially pull tasks
+	 */
+	if (sgs->parked_cpus < group->group_weight)
+		sgs->parked_cpus = 0;
 
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->idle && sgs->sum_h_nr_running &&
@@ -10422,6 +10454,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	 */
 
 	switch (sgs->group_type) {
+	case group_parked:
+		return sgs->sum_nr_parked > busiest->sum_nr_parked;
 	case group_overloaded:
 		/* Select the overloaded group with highest avg_load. */
 		return sgs->avg_load > busiest->avg_load;
@@ -10633,6 +10667,9 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 		nr_running = rq->nr_running - local;
 		sgs->sum_nr_running += nr_running;
 
+		sgs->parked_cpus += arch_cpu_parked(i);
+		sgs->sum_nr_parked += arch_cpu_parked(i) * rq->cfs.h_nr_running;
+
 		/*
 		 * No need to call idle_cpu_without() if nr_running is not 0
 		 */
@@ -10649,7 +10686,14 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 
 	sgs->group_capacity = group->sgc->capacity;
 
-	sgs->group_weight = group->group_weight;
+	sgs->group_weight = group->group_weight - sgs->parked_cpus;
+
+	/*
+	 * Only a subset of the group is parked, so the group itself has the
+	 * capability to potentially pull tasks
+	 */
+	if (sgs->parked_cpus < group->group_weight)
+		sgs->parked_cpus = 0;
 
 	sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
 
@@ -10680,6 +10724,8 @@ static bool update_pick_idlest(struct sched_group *idlest,
 	 */
 
 	switch (sgs->group_type) {
+	case group_parked:
+		return false;
 	case group_overloaded:
 	case group_fully_busy:
 		/* Select the group with lowest avg_load. */
@@ -10730,7 +10776,7 @@ sched_balance_find_dst_group(struct sched_domain *sd, struct task_struct *p, int
 	unsigned long imbalance;
 	struct sg_lb_stats idlest_sgs = {
 			.avg_load = UINT_MAX,
-			.group_type = group_overloaded,
+			.group_type = group_parked,
 	};
 
 	do {
@@ -10788,6 +10834,8 @@ sched_balance_find_dst_group(struct sched_domain *sd, struct task_struct *p, int
 		return idlest;
 
 	switch (local_sgs.group_type) {
+	case group_parked:
+		return idlest;
 	case group_overloaded:
 	case group_fully_busy:
 
@@ -11039,6 +11087,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	local = &sds->local_stat;
 	busiest = &sds->busiest_stat;
 
+	if (busiest->group_type == group_parked) {
+		env->migration_type = migrate_task;
+		env->imbalance = busiest->sum_nr_parked;
+		return;
+	}
+
 	if (busiest->group_type == group_misfit_task) {
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
 			/* Set imbalance to allow misfit tasks to be balanced. */
@@ -11207,13 +11261,14 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 /*
  * Decision matrix according to the local and busiest group type:
  *
- * busiest \ local has_spare fully_busy misfit asym imbalanced overloaded
- * has_spare        nr_idle   balanced   N/A    N/A  balanced   balanced
- * fully_busy       nr_idle   nr_idle    N/A    N/A  balanced   balanced
- * misfit_task      force     N/A        N/A    N/A  N/A        N/A
- * asym_packing     force     force      N/A    N/A  force      force
- * imbalanced       force     force      N/A    N/A  force      force
- * overloaded       force     force      N/A    N/A  force      avg_load
+ * busiest \ local has_spare fully_busy misfit asym imbalanced overloaded parked
+ * has_spare        nr_idle   balanced   N/A    N/A  balanced   balanced   balanced
+ * fully_busy       nr_idle   nr_idle    N/A    N/A  balanced   balanced   balanced
+ * misfit_task      force     N/A        N/A    N/A  N/A        N/A        N/A
+ * asym_packing     force     force      N/A    N/A  force      force      balanced
+ * imbalanced       force     force      N/A    N/A  force      force      balanced
+ * overloaded       force     force      N/A    N/A  force      avg_load   balanced
+ * parked           force     force      N/A    N/A  force      force      nr_tasks
 *
 * N/A :      Not Applicable because already filtered while updating
 *            statistics.
@@ -11222,6 +11277,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 * avg_load : Only if imbalance is significant enough.
 * nr_idle :  dst_cpu is not busy and the number of idle CPUs is quite
 *            different in groups.
+ * nr_task :  balancing can go either way depending on the number of running tasks
+ *            per group
 */
 
/**
@@ -11252,6 +11309,13 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
 		goto out_balanced;
 
 	busiest = &sds.busiest_stat;
+	local = &sds.local_stat;
+
+	if (local->group_type == group_parked)
+		goto out_balanced;
+
+	if (busiest->group_type == group_parked)
+		goto force_balance;
 
 	/* Misfit tasks should be dealt with regardless of the avg load */
 	if (busiest->group_type == group_misfit_task)
 		goto force_balance;
@@ -11273,7 +11337,6 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
 	if (busiest->group_type == group_imbalanced)
 		goto force_balance;
 
-	local = &sds.local_stat;
 	/*
 	 * If the local group is busier than the selected busiest group
 	 * don't try and pull any tasks.
@@ -11386,6 +11449,8 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 		enum fbq_type rt;
 
 		rq = cpu_rq(i);
+		if (arch_cpu_parked(i) && rq->cfs.h_nr_running)
+			return rq;
 		rt = fbq_classify_rq(rq);
 
 		/*
@@ -11556,6 +11621,9 @@ static int need_active_balance(struct lb_env *env)
 {
 	struct sched_domain *sd = env->sd;
 
+	if (arch_cpu_parked(env->src_cpu) && !idle_cpu(env->src_cpu))
+		return 1;
+
 	if (asym_active_balance(env))
 		return 1;
 
@@ -11589,6 +11657,9 @@ static int should_we_balance(struct lb_env *env)
 	struct sched_group *sg = env->sd->groups;
 	int cpu, idle_smt = -1;
 
+	if (arch_cpu_parked(env->dst_cpu))
+		return 0;
+
 	/*
 	 * Ensure the balancing environment is consistent; can happen
 	 * when the softirq triggers 'during' hotplug.
@@ -11612,7 +11683,7 @@ static int should_we_balance(struct lb_env *env)
 	cpumask_copy(swb_cpus, group_balance_mask(sg));
 	/* Try to find first idle CPU */
 	for_each_cpu_and(cpu, swb_cpus, env->cpus) {
-		if (!idle_cpu(cpu))
+		if (!idle_cpu(cpu) || arch_cpu_parked(cpu))
 			continue;
 
 		/*
@@ -11707,7 +11778,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 	ld_moved = 0;
 	/* Clear this flag as soon as we find a pullable task */
 	env.flags |= LBF_ALL_PINNED;
-	if (busiest->nr_running > 1) {
+	if (busiest->nr_running > 1 || arch_cpu_parked(busiest->cpu)) {
 		/*
 		 * Attempt to move tasks. If sched_balance_find_src_group has found
 		 * an imbalance but busiest->nr_running <= 1, the group is
@@ -12721,6 +12792,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 
 	update_misfit_status(NULL, this_rq);
 
+	if (arch_cpu_parked(this_cpu))
+		return 0;
+
 	/*
	 * There is a task waiting to run. No need to search for one.
	 * Return 0; the task will be enqueued when switching to idle.
-- 
2.34.1

From: Tobias Huschle <huschle@linux.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
    bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
    linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    sshegde@linux.ibm.com
Subject: [RFC PATCH 2/2] s390/topology: Add initial implementation for selection of parked CPUs
Date: Wed, 4 Dec 2024 12:21:49 +0100
Message-Id: <20241204112149.25872-3-huschle@linux.ibm.com>
In-Reply-To: <20241204112149.25872-1-huschle@linux.ibm.com>
References: <20241204112149.25872-1-huschle@linux.ibm.com>

In this simplified example, vertical low CPUs are parked unconditionally.
This will later be adjusted by making the parked state dependent on the
overall utilization of the underlying hypervisor.

Vertical lows are always bound to the highest CPU IDs. This implies that
the three types of vertically polarized CPUs are always clustered by ID.
This has the following implications:
- There can be scheduler domains consisting only of vertical highs
- There can be scheduler domains consisting only of vertical lows

Signed-off-by: Tobias Huschle <huschle@linux.ibm.com>
Acked-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
 arch/s390/include/asm/topology.h | 3 +++
 arch/s390/kernel/topology.c      | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index cef06bffad80..e86afeccde35 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -99,6 +99,9 @@ static inline int numa_node_id(void)
 
 #endif /* CONFIG_NUMA */
 
+#define arch_cpu_parked cpu_parked
+int cpu_parked(int cpu);
+
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_S390_TOPOLOGY_H */
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 4f9c301a705b..1032b65da574 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -299,6 +299,11 @@ void store_topology(struct sysinfo_15_1_x *info)
 	stsi(info, 15, 1, topology_mnest_limit());
 }
 
+int cpu_parked(int cpu)
+{
+	return smp_cpu_get_polarization(cpu) == POLARIZATION_VL;
+}
+
 static void __arch_update_dedicated_flag(void *arg)
 {
 	if (topology_cpu_dedicated(smp_processor_id()))
-- 
2.34.1