From: Tobias Huschle
To: linux-kernel@vger.kernel.org
Cc: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org,
    bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
    linux-s390@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    sshegde@linux.ibm.com
Subject: [RFC PATCH 1/2] sched/fair: introduce new scheduler group type group_parked
Date: Wed, 4 Dec 2024 12:21:48 +0100
Message-Id: <20241204112149.25872-2-huschle@linux.ibm.com>
In-Reply-To: <20241204112149.25872-1-huschle@linux.ibm.com>
References: <20241204112149.25872-1-huschle@linux.ibm.com>

A parked CPU is flagged as unsuitable to process workload at the moment,
but might become usable at any time, depending on the need for additional
compute power and/or the available capacity of the underlying hardware.

A scheduler group is considered parked if it contains only parked CPUs.
A parked scheduler group is considered busier than another parked group
if it runs more tasks.

The indicators for whether a CPU should be parked depend on the underlying
hardware and are therefore architecture dependent. Hence, the check whether
a CPU is parked is architecture specific; for architectures not relying on
this feature, the check is a NOP.

Parking is more efficient and less disruptive than CPU hotplug in
environments where such changes can be necessary on a frequent basis.
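
For illustration only -- the following sketch is not part of this patch.
An architecture that wants to use the hook would override the default by
defining arch_cpu_parked() in its own asm/topology.h, so that the generic
fallback in include/linux/sched/topology.h is not used. The cpumask name,
file paths, and update helper below are made up for the example; a real
implementation would be driven by whatever mechanism the platform uses to
signal that CPUs should be vacated (e.g. a hypervisor notification).

  /* arch/xyz/include/asm/topology.h (hypothetical) */
  #include <linux/cpumask.h>

  /* CPUs the platform currently wants vacated (illustrative only). */
  extern struct cpumask cpu_parked_mask;

  #define arch_cpu_parked arch_cpu_parked
  static __always_inline unsigned long arch_cpu_parked(int cpu)
  {
  	/* A CPU counts as parked while platform code keeps it in the mask. */
  	return cpumask_test_cpu(cpu, &cpu_parked_mask);
  }

  /* arch/xyz/kernel/parking.c (hypothetical) */
  struct cpumask cpu_parked_mask;

  /* Platform code calls this when the set of usable CPUs changes. */
  void xyz_update_parked_cpus(const struct cpumask *should_park)
  {
  	cpumask_copy(&cpu_parked_mask, should_park);
  }

The generic code below only ever uses the return value as a boolean:
parked CPUs are skipped during task placement and treated as preferred
migration sources during load balancing.
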
Signed-off-by: Tobias Huschle
---
 include/linux/sched/topology.h |  20 ++++++
 kernel/sched/core.c            |  10 ++-
 kernel/sched/fair.c            | 122 ++++++++++++++++++++++++++-------
 3 files changed, 127 insertions(+), 25 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 4237daa5ac7a..cfe3c59bc329 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -270,6 +270,26 @@ unsigned long arch_scale_cpu_capacity(int cpu)
 }
 #endif
 
+#ifndef arch_cpu_parked
+/**
+ * arch_cpu_parked - Check if a given CPU is currently parked.
+ *
+ * A parked CPU cannot run any kind of workload since the underlying
+ * physical CPU should not be used at the moment.
+ *
+ * @cpu: the CPU in question.
+ *
+ * By default assume the CPU is not parked.
+ *
+ * Return: Parked state of the CPU.
+ */
+static __always_inline
+unsigned long arch_cpu_parked(int cpu)
+{
+	return false;
+}
+#endif
+
 #ifndef arch_scale_hw_pressure
 static __always_inline
 unsigned long arch_scale_hw_pressure(int cpu)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1dee3f5ef940..8f9aeb97c396 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2437,7 +2437,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 
 	/* Non kernel threads are not allowed during either online or offline. */
 	if (!(p->flags & PF_KTHREAD))
-		return cpu_active(cpu);
+		return !arch_cpu_parked(cpu) && cpu_active(cpu);
 
 	/* KTHREAD_IS_PER_CPU is always allowed. */
 	if (kthread_is_per_cpu(p))
@@ -2447,6 +2447,10 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 	if (cpu_dying(cpu))
 		return false;
 
+	/* CPU should be avoided at the moment */
+	if (arch_cpu_parked(cpu))
+		return false;
+
 	/* But are allowed during online. */
 	return cpu_online(cpu);
 }
@@ -3924,6 +3928,10 @@ static inline bool ttwu_queue_cond(struct task_struct *p, int cpu)
 	if (task_on_scx(p))
 		return false;
 
+	/* The task should not be queued onto a parked CPU. */
+	if (arch_cpu_parked(cpu))
+		return false;
+
 	/*
	 * Do not complicate things with the async wake_list while the CPU is
	 * in hotplug state.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4283c818bbd1..fa1c19d285de 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7415,6 +7415,9 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 {
 	int target = nr_cpumask_bits;
 
+	if (arch_cpu_parked(target))
+		return prev_cpu;
+
 	if (sched_feat(WA_IDLE))
 		target = wake_affine_idle(this_cpu, prev_cpu, sync);
 
@@ -7454,6 +7457,9 @@ sched_balance_find_dst_group_cpu(struct sched_group *group, struct task_struct *
 	for_each_cpu_and(i, sched_group_span(group), p->cpus_ptr) {
 		struct rq *rq = cpu_rq(i);
 
+		if (arch_cpu_parked(i))
+			continue;
+
 		if (!sched_core_cookie_match(rq, p))
 			continue;
 
@@ -7546,10 +7552,14 @@ static inline int sched_balance_find_dst_cpu(struct sched_domain *sd, struct tas
 	return new_cpu;
 }
 
+static inline bool is_idle_cpu_allowed(int cpu)
+{
+	return !arch_cpu_parked(cpu) && (available_idle_cpu(cpu) || sched_idle_cpu(cpu));
+}
+
 static inline int __select_idle_cpu(int cpu, struct task_struct *p)
 {
-	if ((available_idle_cpu(cpu) || sched_idle_cpu(cpu)) &&
-	    sched_cpu_cookie_match(cpu_rq(cpu), p))
+	if (is_idle_cpu_allowed(cpu) && sched_cpu_cookie_match(cpu_rq(cpu), p))
 		return cpu;
 
 	return -1;
@@ -7657,7 +7667,7 @@ static int select_idle_smt(struct task_struct *p, struct sched_domain *sd, int t
 		 */
 		if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
 			continue;
-		if (available_idle_cpu(cpu) || sched_idle_cpu(cpu))
+		if (is_idle_cpu_allowed(cpu))
 			return cpu;
 	}
 
@@ -7779,7 +7789,7 @@ select_idle_capacity(struct task_struct *p, struct sched_domain *sd, int target)
 	for_each_cpu_wrap(cpu, cpus, target) {
 		unsigned long cpu_cap = capacity_of(cpu);
 
-		if (!available_idle_cpu(cpu) && !sched_idle_cpu(cpu))
+		if (!is_idle_cpu_allowed(cpu))
 			continue;
 
 		fits = util_fits_cpu(task_util, util_min, util_max, cpu);
@@ -7850,7 +7860,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 */
 	lockdep_assert_irqs_disabled();
 
-	if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
+	if (is_idle_cpu_allowed(target) &&
 	    asym_fits_cpu(task_util, util_min, util_max, target))
 		return target;
 
@@ -7858,7 +7868,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
 	if (prev != target && cpus_share_cache(prev, target) &&
-	    (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
+	    is_idle_cpu_allowed(prev) &&
 	    asym_fits_cpu(task_util, util_min, util_max, prev)) {
 
 		if (!static_branch_unlikely(&sched_cluster_active) ||
@@ -7890,7 +7900,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
+	    is_idle_cpu_allowed(recent_used_cpu) &&
 	    cpumask_test_cpu(recent_used_cpu, p->cpus_ptr) &&
 	    asym_fits_cpu(task_util, util_min, util_max, recent_used_cpu)) {
 
@@ -9198,7 +9208,12 @@ enum group_type {
 	 * The CPU is overloaded and can't provide expected CPU cycles to all
 	 * tasks.
 	 */
-	group_overloaded
+	group_overloaded,
+	/*
+	 * The CPU should be avoided as it can't provide expected CPU cycles
+	 * even for small amounts of workload.
+	 */
+	group_parked
 };
 
 enum migration_type {
@@ -9498,7 +9513,7 @@ static int detach_tasks(struct lb_env *env)
 		 * Source run queue has been emptied by another CPU, clear
 		 * LBF_ALL_PINNED flag as we will not test any task.
 		 */
-		if (env->src_rq->nr_running <= 1) {
+		if (env->src_rq->nr_running <= 1 && !arch_cpu_parked(env->src_cpu)) {
 			env->flags &= ~LBF_ALL_PINNED;
 			return 0;
 		}
@@ -9511,7 +9526,7 @@ static int detach_tasks(struct lb_env *env)
 		 * We don't want to steal all, otherwise we may be treated likewise,
 		 * which could at worst lead to a livelock crash.
 		 */
-		if (env->idle && env->src_rq->nr_running <= 1)
+		if (env->idle && env->src_rq->nr_running <= 1 && !arch_cpu_parked(env->src_cpu))
 			break;
 
 		env->loop++;
@@ -9870,6 +9885,8 @@ struct sg_lb_stats {
 	unsigned long group_runnable; /* Total runnable time over the CPUs of the group */
 	unsigned int sum_nr_running; /* Nr of all tasks running in the group */
 	unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */
+	unsigned int sum_nr_parked; /* Nr of tasks running on parked CPUs in the group */
+	unsigned int parked_cpus; /* Nr of parked CPUs in the group */
 	unsigned int idle_cpus; /* Nr of idle CPUs in the group */
 	unsigned int group_weight;
 	enum group_type group_type;
@@ -10127,6 +10144,9 @@ group_type group_classify(unsigned int imbalance_pct,
 			  struct sched_group *group,
 			  struct sg_lb_stats *sgs)
 {
+	if (sgs->parked_cpus)
+		return group_parked;
+
 	if (group_is_overloaded(imbalance_pct, sgs))
 		return group_overloaded;
 
@@ -10328,10 +10348,15 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		sgs->nr_numa_running += rq->nr_numa_running;
 		sgs->nr_preferred_running += rq->nr_preferred_running;
 #endif
+
+		if (rq->cfs.h_nr_running) {
+			sgs->parked_cpus += arch_cpu_parked(i);
+			sgs->sum_nr_parked += arch_cpu_parked(i) * rq->cfs.h_nr_running;
+		}
 		/*
 		 * No need to call idle_cpu() if nr_running is not 0
 		 */
-		if (!nr_running && idle_cpu(i)) {
+		if (!nr_running && idle_cpu(i) && !arch_cpu_parked(i)) {
 			sgs->idle_cpus++;
 			/* Idle cpu can't have misfit task */
 			continue;
@@ -10355,7 +10380,14 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 
 	sgs->group_capacity = group->sgc->capacity;
 
-	sgs->group_weight = group->group_weight;
+	sgs->group_weight = group->group_weight - sgs->parked_cpus;
+
+	/*
+	 * Only a subset of the group is parked, so the group itself has the
+	 * capability to potentially pull tasks
+	 */
+	if (sgs->parked_cpus < group->group_weight)
+		sgs->parked_cpus = 0;
 
 	/* Check if dst CPU is idle and preferred to this group */
 	if (!local_group && env->idle && sgs->sum_h_nr_running &&
@@ -10422,6 +10454,8 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	 */
 
 	switch (sgs->group_type) {
+	case group_parked:
+		return sgs->sum_nr_parked > busiest->sum_nr_parked;
 	case group_overloaded:
 		/* Select the overloaded group with highest avg_load. */
 		return sgs->avg_load > busiest->avg_load;
@@ -10633,6 +10667,9 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 		nr_running = rq->nr_running - local;
 		sgs->sum_nr_running += nr_running;
 
+		sgs->parked_cpus += arch_cpu_parked(i);
+		sgs->sum_nr_parked += arch_cpu_parked(i) * rq->cfs.h_nr_running;
+
 		/*
 		 * No need to call idle_cpu_without() if nr_running is not 0
 		 */
@@ -10649,7 +10686,14 @@ static inline void update_sg_wakeup_stats(struct sched_domain *sd,
 
 	sgs->group_capacity = group->sgc->capacity;
 
-	sgs->group_weight = group->group_weight;
+	sgs->group_weight = group->group_weight - sgs->parked_cpus;
+
+	/*
+	 * Only a subset of the group is parked, so the group itself has the
+	 * capability to potentially pull tasks
+	 */
+	if (sgs->parked_cpus < group->group_weight)
+		sgs->parked_cpus = 0;
 
 	sgs->group_type = group_classify(sd->imbalance_pct, group, sgs);
 
@@ -10680,6 +10724,8 @@ static bool update_pick_idlest(struct sched_group *idlest,
 	 */
 
 	switch (sgs->group_type) {
+	case group_parked:
+		return false;
 	case group_overloaded:
 	case group_fully_busy:
 		/* Select the group with lowest avg_load. */
@@ -10730,7 +10776,7 @@ sched_balance_find_dst_group(struct sched_domain *sd, struct task_struct *p, int
 	unsigned long imbalance;
 	struct sg_lb_stats idlest_sgs = {
 			.avg_load = UINT_MAX,
-			.group_type = group_overloaded,
+			.group_type = group_parked,
 	};
 
 	do {
@@ -10788,6 +10834,8 @@ sched_balance_find_dst_group(struct sched_domain *sd, struct task_struct *p, int
 		return idlest;
 
 	switch (local_sgs.group_type) {
+	case group_parked:
+		return idlest;
 	case group_overloaded:
 	case group_fully_busy:
 
@@ -11039,6 +11087,12 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	local = &sds->local_stat;
 	busiest = &sds->busiest_stat;
 
+	if (busiest->group_type == group_parked) {
+		env->migration_type = migrate_task;
+		env->imbalance = busiest->sum_nr_parked;
+		return;
+	}
+
 	if (busiest->group_type == group_misfit_task) {
 		if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
 			/* Set imbalance to allow misfit tasks to be balanced. */
@@ -11207,13 +11261,14 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 /*
  * Decision matrix according to the local and busiest group type:
  *
- * busiest \ local  has_spare  fully_busy  misfit  asym  imbalanced  overloaded
- * has_spare        nr_idle    balanced    N/A     N/A   balanced    balanced
- * fully_busy       nr_idle    nr_idle     N/A     N/A   balanced    balanced
- * misfit_task      force      N/A         N/A     N/A   N/A         N/A
- * asym_packing     force      force       N/A     N/A   force       force
- * imbalanced       force      force       N/A     N/A   force       force
- * overloaded       force      force       N/A     N/A   force       avg_load
+ * busiest \ local  has_spare  fully_busy  misfit  asym  imbalanced  overloaded  parked
+ * has_spare        nr_idle    balanced    N/A     N/A   balanced    balanced    balanced
+ * fully_busy       nr_idle    nr_idle     N/A     N/A   balanced    balanced    balanced
+ * misfit_task      force      N/A         N/A     N/A   N/A         N/A         N/A
+ * asym_packing     force      force       N/A     N/A   force       force       balanced
+ * imbalanced       force      force       N/A     N/A   force       force       balanced
+ * overloaded       force      force       N/A     N/A   force       avg_load    balanced
+ * parked           force      force       N/A     N/A   force       force       nr_tasks
  *
  * N/A :       Not Applicable because already filtered while updating
  *             statistics.
@@ -11222,6 +11277,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
  * avg_load :  Only if imbalance is significant enough.
 * nr_idle :   dst_cpu is not busy and the number of idle CPUs is quite
 *             different in groups.
+ * nr_tasks :  balancing can go either way depending on the number of
+ *             running tasks per group
  */
 
 /**
@@ -11252,6 +11309,13 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
 		goto out_balanced;
 
 	busiest = &sds.busiest_stat;
+	local = &sds.local_stat;
+
+	if (local->group_type == group_parked)
+		goto out_balanced;
+
+	if (busiest->group_type == group_parked)
+		goto force_balance;
 
 	/* Misfit tasks should be dealt with regardless of the avg load */
 	if (busiest->group_type == group_misfit_task)
@@ -11273,7 +11337,6 @@ static struct sched_group *sched_balance_find_src_group(struct lb_env *env)
 	if (busiest->group_type == group_imbalanced)
 		goto force_balance;
 
-	local = &sds.local_stat;
 	/*
 	 * If the local group is busier than the selected busiest group
 	 * don't try and pull any tasks.
@@ -11386,6 +11449,8 @@ static struct rq *sched_balance_find_src_rq(struct lb_env *env,
 		enum fbq_type rt;
 
 		rq = cpu_rq(i);
+		if (arch_cpu_parked(i) && rq->cfs.h_nr_running)
+			return rq;
 		rt = fbq_classify_rq(rq);
 
 		/*
@@ -11556,6 +11621,9 @@ static int need_active_balance(struct lb_env *env)
 {
 	struct sched_domain *sd = env->sd;
 
+	if (arch_cpu_parked(env->src_cpu) && !idle_cpu(env->src_cpu))
+		return 1;
+
 	if (asym_active_balance(env))
 		return 1;
 
@@ -11589,6 +11657,9 @@ static int should_we_balance(struct lb_env *env)
 	struct sched_group *sg = env->sd->groups;
 	int cpu, idle_smt = -1;
 
+	if (arch_cpu_parked(env->dst_cpu))
+		return 0;
+
 	/*
 	 * Ensure the balancing environment is consistent; can happen
 	 * when the softirq triggers 'during' hotplug.
@@ -11612,7 +11683,7 @@ static int should_we_balance(struct lb_env *env)
 	cpumask_copy(swb_cpus, group_balance_mask(sg));
 	/* Try to find first idle CPU */
 	for_each_cpu_and(cpu, swb_cpus, env->cpus) {
-		if (!idle_cpu(cpu))
+		if (!idle_cpu(cpu) || arch_cpu_parked(cpu))
 			continue;
 
 		/*
@@ -11707,7 +11778,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 	ld_moved = 0;
 	/* Clear this flag as soon as we find a pullable task */
 	env.flags |= LBF_ALL_PINNED;
-	if (busiest->nr_running > 1) {
+	if (busiest->nr_running > 1 || arch_cpu_parked(busiest->cpu)) {
 		/*
 		 * Attempt to move tasks. If sched_balance_find_src_group has found
 		 * an imbalance but busiest->nr_running <= 1, the group is
@@ -12721,6 +12792,9 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
 
 	update_misfit_status(NULL, this_rq);
 
+	if (arch_cpu_parked(this_cpu))
+		return 0;
+
 	/*
 	 * There is a task waiting to run. No need to search for one.
	 * Return 0; the task will be enqueued when switching to idle.
-- 
2.34.1