From nobody Mon Feb  9 01:56:07 2026
Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com
 [209.85.128.50])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 701C01F8934
	for <linux-kernel@vger.kernel.org>; Tue, 17 Dec 2024 16:07:32 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.128.50
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734451654; cv=none;
 b=YEfdvaXUT2RdLn28CQnVYwFaAu2cA6ky+ehjfUXzKo63nvZRSBmk6aeu26vBkeSy3KqXGAmZS5hS/7LaQFpVIgizYQRNrN2j8bV5VW1h/9g5uG3fCh45jSezClFueff95YYs3nUkJBlVWQD1d67L/bWqLSIq8NuhKWF+NigsMyk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734451654; c=relaxed/simple;
	bh=ChB9rkCJalnir+1cQs2OVh2xTgbBQH3OQ1gRFUyeu5w=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=YhKv7in6FCiPeN7D1Rlw1Bj//D45GC10QsYhiL67iHlxpxnGRJPNXQU+pxt2uGF92kxxx0NH+ImuqixNEMoB0arDHZZFE4+bZf7b3TQnWP1rC6TH+216UlGup+77YT2gGzHvsz86JMKSXDrrLq9Wd0BE7UnD4kpsNNp5fnl9Q6U=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org;
 spf=pass smtp.mailfrom=linaro.org;
 dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b=tuuY8fuc; arc=none smtp.client-ip=209.85.128.50
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b="tuuY8fuc"
Received: by mail-wm1-f50.google.com with SMTP id
 5b1f17b1804b1-434a766b475so58080695e9.1
        for <linux-kernel@vger.kernel.org>;
 Tue, 17 Dec 2024 08:07:32 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1734451651; x=1735056451;
 darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=JrPmZowSuU/OPzeWslOE/34oeJTgBYavScUT3YyMJo8=;
        b=tuuY8fucclpOSaz8GkWRj9NwwH/PL5pC4oUyQIZvaRzNm/VBot1HpruL6iA4xRvCzV
         vuXYz+unBpsHceiJfustvhSLim8aND//f/QTtpYrUL5pXQP8FzgboCzn9O25Ke3UIY0A
         IXwn8rXdsinCSCScPla4BixYvvT1UbRb9mv3xihnaKe+pSRSoqqUtrLSKVRkxISgTgOB
         ZwgvwX8u2B98jhJYYFzEinOB2h7ahjSRUXOTF/+5a1ovWRtXJUzb6SguQswPbMyvC4X1
         nAEq4tc0MDNMi+4k5rLT5RLv9E50Kik3Dog67CntAOhWUw9s/j0iLHM0XExZTGNR//Iz
         tQ/Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1734451651; x=1735056451;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=JrPmZowSuU/OPzeWslOE/34oeJTgBYavScUT3YyMJo8=;
        b=LMpVrVOJBB2mALzWhdKXu2Hp3ubv3UkOjYG604+yn+LnvQynhtXHR/X+jnsPaRNnC5
         hsNrMmKSm5FUQLoR2hGCKJ5uiR2taaXh6k4D0AQDEm1OFN05xQ03unavfPVoRjxrvQsj
         GmjbusVUGFDETr4+wNqGNyBl0yM0e6LqU8UriICq31mBultgnEKjg2I2BKKZgIt1AWYv
         c7EtdkY0zZX2PtY6YjQ0sSATCLG5PqItDTO4VGNQZFQpdRDj0aS5JA+tCcZ+X+UG0c18
         r944nOuFEfFuCNrdiHDqxB2TCA/3ubv823a9tlBk0iPbK0GvoLD38GOUbhe9cDp4ZXDX
         I+zA==
X-Forwarded-Encrypted: i=1;
 AJvYcCU04aGBiJXkd2f9SwxbQfyQS0MTjAMBTkQP/5MlaWk8UZCAk/1fKSrCbQC+vGz7OwveyD34n+OTyZgeWQ0=@vger.kernel.org
X-Gm-Message-State: AOJu0YzVVtT856Xm4Yt5aqnyYIYIGcIweS8XcQfoCpZ0mNBrjV8UujBR
	ap9GKXo+umESbiEciH23YjpJRPRwashclBgLDYa6uqsMsoEkcDHuBOroH/vGZa4=
X-Gm-Gg: ASbGncurFcvNffgXYuZBd/pt/Uwkhq10Jk2ZkZxZ/KZydcbyp0nsZaeS/Vgey6a8f6C
	SuVbJ8W0zkRVQy18z9bUv9FrW+ixyIOTZyCSMoEW/1RjUKESYxpsuCaFMHea4N+rMTBXjRAf5To
	0FgzlnumcvNc7FTEXiwxiTArVvG/UgfPZxpWUSgqpAcdn+OC0vxmK8jGHCz4yXCufbq2H4vREAk
	q0BCqp9MIGwd7iV0rb7TdlhqyoxZWepU1OquYrEblEYT9owkIKLI46ABBHxR1CDsQ==
X-Google-Smtp-Source: 
 AGHT+IHJFtSwOUMIGtwqFV8KL7qRzDMmi7hU8fXSsk8NvThtMKigae0XsX3vEZ8XECa7t0FCwcDzrQ==
X-Received: by 2002:a05:600c:5023:b0:434:e9ee:c2d with SMTP id
 5b1f17b1804b1-4362aa9789dmr129842055e9.26.1734451650627;
        Tue, 17 Dec 2024 08:07:30 -0800 (PST)
Received: from vingu-cube.. ([2a01:e0a:f:6020:4e5f:e8c8:aade:2d1b])
        by smtp.gmail.com with ESMTPSA id
 5b1f17b1804b1-436257176a4sm176739435e9.38.2024.12.17.08.07.28
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 17 Dec 2024 08:07:29 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 1/7 v2] sched/fair: Filter false overloaded_group case for EAS
Date: Tue, 17 Dec 2024 17:07:14 +0100
Message-ID: <20241217160720.2397239-2-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

With EAS, a group should be set overloaded if at least 1 CPU in the group
is overutilized, but it can happen that a CPU is fully utilized by tasks
because the compute capacity of the CPU is clamped. In such a case, the CPU
is not overutilized and, as a result, should not be set overloaded either.

Because group_overloaded has a higher priority than group_misfit, such a
group can be selected as the busiest group instead of a group with a misfit
task, which prevents load_balance from selecting the CPU with the misfit
task and pulling the latter onto a fitting CPU.
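
For illustration only, the filter can be sketched in user space as shown
here. The names sg_stats_sketch, group_is_overloaded_sketch, eas_enabled and
capacity_exceeded are hypothetical simplifications and not kernel
definitions; capacity_exceeded merely stands in for the capacity checks that
follow the task-count test in the real function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the kernel's struct sg_lb_stats. */
struct sg_stats_sketch {
	unsigned int sum_nr_running;     /* runnable tasks in the group */
	unsigned int group_weight;       /* number of CPUs in the group */
	unsigned int group_overutilized; /* set when any CPU is overutilized */
	bool capacity_exceeded;          /* placeholder for the capacity checks */
};

/* eas_enabled models sched_energy_enabled(); it is a parameter here. */
static bool group_is_overloaded_sketch(bool eas_enabled,
				       const struct sg_stats_sketch *sgs)
{
	assert(sgs);

	/* With EAS, a group without an overutilized CPU is never overloaded. */
	if (eas_enabled && !sgs->group_overutilized)
		return false;

	/* Not more tasks than CPUs: the group is not overloaded. */
	if (sgs->sum_nr_running <= sgs->group_weight)
		return false;

	return sgs->capacity_exceeded;
}
```

A group whose CPUs are busy only because of capacity clamping has
group_overutilized == 0, so with EAS it is no longer reported as overloaded
and cannot outrank a group holding a misfit task.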

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Pierre Gondois <pierre.gondois@arm.com>
---
 kernel/sched/fair.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2c4ebfc82917..893eb6844642 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9916,6 +9916,7 @@ struct sg_lb_stats {
 	unsigned int group_asym_packing;	/* Tasks should be moved to preferred CP=
U */
 	unsigned int group_smt_balance;		/* Task on busy SMT be moved */
 	unsigned long group_misfit_task_load;	/* A CPU has a task too big for its=
 capacity */
+	unsigned int group_overutilized;	/* At least one CPU is overutilized in t=
he group */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
 	unsigned int nr_preferred_running;
@@ -10148,6 +10149,13 @@ group_has_capacity(unsigned int imbalance_pct, str=
uct sg_lb_stats *sgs)
 static inline bool
 group_is_overloaded(unsigned int imbalance_pct, struct sg_lb_stats *sgs)
 {
+	/*
+	 * With EAS and uclamp, 1 CPU in the group must be overutilized to
+	 * consider the group overloaded.
+	 */
+	if (sched_energy_enabled() && !sgs->group_overutilized)
+		return false;
+
 	if (sgs->sum_nr_running <=3D sgs->group_weight)
 		return false;
=20
@@ -10361,8 +10369,10 @@ static inline void update_sg_lb_stats(struct lb_en=
v *env,
 		if (nr_running > 1)
 			*sg_overloaded =3D 1;
=20
-		if (cpu_overutilized(i))
+		if (cpu_overutilized(i)) {
 			*sg_overutilized =3D 1;
+			sgs->group_overutilized =3D 1;
+		}
=20
 #ifdef CONFIG_NUMA_BALANCING
 		sgs->nr_numa_running +=3D rq->nr_numa_running;
--=20
2.43.0
From nobody Mon Feb  9 01:56:07 2026
Received: from mail-wm1-f54.google.com (mail-wm1-f54.google.com
 [209.85.128.54])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90A591F8AD2
	for <linux-kernel@vger.kernel.org>; Tue, 17 Dec 2024 16:07:34 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.128.54
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734451656; cv=none;
 b=IYL2eWxQz+QjD17bXwq5UiJo+wPU2zz9zMzflLS0XWwPnAJC6sIJoMzMXLJoty5OJY9pvXEK63JWFfMqEXFiurn3Zri7P+kT3jlwWagcdrNZOoDvbNwfxrqRTt0ryXJq517tW4AGuleRwNyUZDd+cyoAE7VVULA2BqMrQSS6mAI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734451656; c=relaxed/simple;
	bh=R4MYYzVeN7Cvaqn4APzrKkIp071ZgTGaGp2aCy959UE=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=Mx3+w2qIeplLVOoqkjWVHZVGYXm1TlF9R1NhxWhQb1WUbg7eDyLLMXAeQurSh4FIWFBkOBc2uYG/uJlkB7r7c7oWLE8FhdSrcIGxpAe5l63GAZZ3pOr6sXE3hWIfWByxaBGqPhNYNo6GjAmjqmOIXlIIr2nZnq1i+lATdZBA3Dg=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org;
 spf=pass smtp.mailfrom=linaro.org;
 dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b=qEMGk9tL; arc=none smtp.client-ip=209.85.128.54
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b="qEMGk9tL"
Received: by mail-wm1-f54.google.com with SMTP id
 5b1f17b1804b1-434b3e32e9dso59846975e9.2
        for <linux-kernel@vger.kernel.org>;
 Tue, 17 Dec 2024 08:07:34 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1734451653; x=1735056453;
 darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=U4bmSIu+L/EncSLSMEzPcE+L0bLtnlBnj/TGrULXeHU=;
        b=qEMGk9tLZ3C2EOjlOQ0H/WWZQW7/N2JSGxfptZcgjeVa1GMeEGZFr0m5g4MOC1K8v1
         bThk6ZnHvV05fXr1Dblyh+4+8M4nuLIdcnWn+T52OyZSg/mPSv0rOAU6fH/ryyECm4c0
         2/tWlphNi9150zsGCLgeBFSc9G3azhtfl630NOZry+IeELouLJwUQu3vt2Dx1KNizbar
         ojpDpGqQtXdUtbztnYzeS3JoayAdlZtxnQQ2iyNjZOX0Y5Sy4oMsO12dpGFVS/KIQgd6
         8v6PVTE9KOYccrc0OZz1WOW2zmYxBX5rz3p5Rf9yTa9dbytHYrTA4J0WYEbvHjVWbr71
         n5mw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1734451653; x=1735056453;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=U4bmSIu+L/EncSLSMEzPcE+L0bLtnlBnj/TGrULXeHU=;
        b=VrhmxRqeYZTY/VWzHP1n00EFHZY97DeaxaTAY9GuJDQOPG3v3eJz6ZX9I2h7R/rz3M
         iYq1FSH8wc7wfcJsFdc3l86kHHaqAzkWTwc/AU2EznYh53zpsTGXw0Hw03WtZcUVoNBV
         u74eFtPa4kj/OqwR9Yz5ubD3a31xBHucwYWHqBjcSKF3hfid3NxTJklqex2N9t32/QX6
         BwKc1JdbgppyoDBUh3LDh8E6MsK3cqQ0Q3dvNtvhs111meTPD7DXtwYrws6fBqPDXo7d
         Q8HsdYtGf0CnmIvNY1GPHvmj5UMMYeCH9ms9l1yGoQd4966Y4c4uLbNk0VSPPwCQmdLX
         lOEA==
X-Forwarded-Encrypted: i=1;
 AJvYcCXRDUwJVI5K3cH/j+4gEypZWR7mNZBazhdOTbRtYIIopUimLc9PZo0ngET95woHB0cZRqasS1ztevfkZkU=@vger.kernel.org
X-Gm-Message-State: AOJu0YyPuHpph4z0VNpHksXDmKnqz19K1RbqVTKAmT90JIoBk67tvbeU
	bC+LMXFQaX0JqsDUpFUJn05/VGA0DsDQhCZjU7brECU65UGAEiJau4VMdtKq8IY=
X-Gm-Gg: ASbGncsFeRGCuJ033CoiHo7dnFoLce53DS+RjgmO/GE/ItgfE4O8SuzF8rYDNxV6Et0
	kXaVD18I9ARkatILuDl9COEcmI/Z6xmm0sdjooSwvm+y2AnKCcNwWwlKCjRP7Jo0EG0blk4TM+N
	K8g4PX0R7uZIhvyGBR4PB3GqBAhk6PxCo1LhFTv2Ny+SSF/FLvISaq/x2XZ52eKHJLov3IdCAOy
	NvpGfRzLmabVRODj0nxid22+Q4LXsgsqKb/sOHkw7sbHECNm9r92+0PVcCuJY+ZHw==
X-Google-Smtp-Source: 
 AGHT+IELpAYJXDbMiY5rjliFvVtKAU3HzdADieoHqQn9XM30JARtXqfo8AI0d70QRKRvtNqvTCKDwA==
X-Received: by 2002:a05:600c:154b:b0:435:172:5052 with SMTP id
 5b1f17b1804b1-4362aa113e3mr127928155e9.1.1734451652851;
        Tue, 17 Dec 2024 08:07:32 -0800 (PST)
Received: from vingu-cube.. ([2a01:e0a:f:6020:4e5f:e8c8:aade:2d1b])
        by smtp.gmail.com with ESMTPSA id
 5b1f17b1804b1-436257176a4sm176739435e9.38.2024.12.17.08.07.30
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 17 Dec 2024 08:07:31 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 2/7 v2] energy model: Add a get previous state function
Date: Tue, 17 Dec 2024 17:07:15 +0100
Message-ID: <20241217160720.2397239-3-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Instead of parsing the whole EM table every time, add a function to get the
previous state.

It will be used in the scheduler feec() function.
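
As a rough user-space sketch of the idea, assuming a reduced table type:
starting from an already known index, walk backwards until a usable state
is found instead of rescanning the table from the start. The names
perf_state_sketch and previous_state_sketch, and the plain min_ps and
skip_inefficient parameters, are hypothetical and replace the perf domain
fields and flags used by the real helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct em_perf_state. */
struct perf_state_sketch {
	unsigned long performance; /* compute capacity at this state */
	bool inefficient;          /* models EM_PERF_STATE_INEFFICIENT */
};

/*
 * Return the index of the closest usable state below idx, or -1 if there
 * is none at or above min_ps.
 */
static int previous_state_sketch(const struct perf_state_sketch *table,
				 int min_ps, bool skip_inefficient, int idx)
{
	int i;

	assert(table);

	for (i = idx - 1; i >= min_ps; i--) {
		/* Optionally ignore states flagged as inefficient. */
		if (skip_inefficient && table[i].inefficient)
			continue;
		return i;
	}

	return -1;
}
```

The caller can treat -1 as "no lower state", which is how the lower bound
of the performance range would fall back to 0.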

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/energy_model.h | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 752e0b297582..26d0ff72feac 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -215,6 +215,26 @@ em_pd_get_efficient_state(struct em_perf_state *table,
 	return max_ps;
 }
=20
+static inline int
+em_pd_get_previous_state(struct em_perf_state *table,
+			 struct em_perf_domain *pd, int idx)
+{
+	unsigned long pd_flags =3D pd->flags;
+	int min_ps =3D pd->min_perf_state;
+	struct em_perf_state *ps;
+	int i;
+
+	for (i =3D idx - 1; i >=3D min_ps; i--) {
+		ps =3D &table[i];
+		if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
+		    ps->flags & EM_PERF_STATE_INEFFICIENT)
+			continue;
+		return i;
+	}
+
+	return -1;
+}
+
 /**
  * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
  *		performance domain
@@ -361,6 +381,19 @@ static inline struct em_perf_domain *em_pd_get(struct =
device *dev)
 {
 	return NULL;
 }
+static inline int
+em_pd_get_efficient_state(struct em_perf_state *table,
+			  struct em_perf_domain *pd, unsigned long max_util)
+{
+	return 0;
+}
+
+static inline int
+em_pd_get_previous_state(struct em_perf_state *table,
+			 struct em_perf_domain *pd, int idx)
+{
+	return -1;
+}
 static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 			unsigned long max_util, unsigned long sum_util,
 			unsigned long allowed_cpu_cap)
--=20
2.43.0
From nobody Mon Feb  9 01:56:07 2026
Received: from mail-wm1-f41.google.com (mail-wm1-f41.google.com
 [209.85.128.41])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 047F21F8EE9
	for <linux-kernel@vger.kernel.org>; Tue, 17 Dec 2024 16:07:36 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.128.41
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734451659; cv=none;
 b=tEf326bQ29NNlXd4Jg7zZ4t4kX3cpB9c9wmQzLFdsnAPtdaNrrV6jUqpAAf9RKawKB/cdFHX/8GFkLUepnTvYEpBdXUH0gYvgecNp+2DPS3qjphya/7XAXpr+a7aun5hXBbQiji1EszrASDrw85bPKE8THFnw+sp5YCsukoJflg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734451659; c=relaxed/simple;
	bh=eqSL4gI8AmkIRNy5cxp7axcow2k8p0+vNOKNGFhRzjY=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=s3VyrNf9wDuD6RK5QVWqkOkjUYM7e7rOoQgSOkReiqhA7UZnKsWtYBBSuChl9U1XHa9VP7UjSgopn6zgXu+2m5Qx/+k08cbgcJKDaGKr5LcvSn1pF+t4VW3aRVQdlUwA1sCkHZm01uJpdJo863Klr8FdwuCXCGBoew9gnBq1qfU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org;
 spf=pass smtp.mailfrom=linaro.org;
 dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b=q0v8fd6q; arc=none smtp.client-ip=209.85.128.41
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b="q0v8fd6q"
Received: by mail-wm1-f41.google.com with SMTP id
 5b1f17b1804b1-436202dd730so39621795e9.2
        for <linux-kernel@vger.kernel.org>;
 Tue, 17 Dec 2024 08:07:36 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1734451655; x=1735056455;
 darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=e6jf2/K5umJYQlG0kNTWIgTs86h7wwbE6VMVlXmKt+U=;
        b=q0v8fd6qA9ijrzI5Ry1atRHyzqLeWpc3ht6sjn4ioovQZWVEB6HgdGVxB7L/yjRHSZ
         UauuWoaLroj9g1gH2noPccCtT3PvjqldA7/0cxcCS+CCRu4o2aZ/hLUQbf+Tjt0R2ts+
         Vrb7g+9zJcrl87y9mvLiwjhFUs2GdiOhzbwqyR9QgB16BhCvJDdCcm0npS8Z78CuOqig
         YAUnYDEygYOBsPFiRvYwpyiCAmlN+/aHVZzvEXva8TfIZ94Fd0A9V55IdcsKOMhu5zYB
         +0ACoZy6M6GFRzWF3R2W5JAaS9kugx/JewU8LfW24PT8mi6vehkIvDcXTNyZ5wK49WX/
         P5Uw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1734451655; x=1735056455;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=e6jf2/K5umJYQlG0kNTWIgTs86h7wwbE6VMVlXmKt+U=;
        b=SzTzBnVN9UWeiBvmdka/5YRpsk+vuAWEh9EXrL+jvHMUqGIVQj2EWk4/hVHei+HIey
         cgQgcChxZNZrVIrXV+KJeocWZT0y2hRoKW1JMwYGtGLPNXUBAGQZx5eaYGYMwoCTfGwV
         oNSV0mOgt4r8xvu9959MCXzqolIR8vWgBA48LSd0/xqDBMgsN278Vd+kFjwHi6sIpdpF
         GORetKmDCWjqeHGdm9/uQDTrHwpLyLVMorpmeTbkgTU6kwnBBrmbugF0QN0H7rHptPPi
         RI7IJ0hSiF7OdMUIU5pzwnuJhyEVPVeT5PGUhjNiKpP76c1GNr63UogQiV/fw6zS5Qny
         qoWA==
X-Forwarded-Encrypted: i=1;
 AJvYcCX9DTABleqv9+nm7fDcZjUSht+wp2WwdJKhjb9zxZodiXu+6AsRr0Px/k0V1sAXtk0hHbKJK+2hMjzy19A=@vger.kernel.org
X-Gm-Message-State: AOJu0YzpcWshwrrWOvTma/+7qPpUaPsnOjdPObtzhHkN4Y/t1dkPXXU6
	vd4Ji15vWYGD4B3wWOsZEli5f5GAeKoEVwcq3LGlfIG4iDgSH7mxMhXZQNgBk97pBXa2zPPzuQ9
	2
X-Gm-Gg: ASbGncsJ6UC8TDK5h9ovfZJOTzwN6EidtwBWzoaYKj74GZ4FCOCRvtx4r7m5sWLvkjj
	WJWQxkvkBYsL79YJoMjiX33jI4TQmhQhEjCmMn9JRl1vOYvsltc9tmVloE/Xuyr9VsPWxCSqQne
	8HGWSXMw8ZImp2FaSnqegGaOgjOgNuPY9LX09j/5rYGp2GKOWNqKhLQHGBHw4dBBbuwrS1tXOIo
	JbcBwnC0Q8ViYmLehm/hHMNOIkhocLfYpxBzjDM1UM4PD9MKDolFCszDCefC8Vw7g==
X-Google-Smtp-Source: 
 AGHT+IH5vQ+9mXAbIQ9liuZsjfFS9xg0URvvM2qHVRNv8ijMnkSRXHd5w+j2k6xA4IjOLPcghzlEcw==
X-Received: by 2002:a05:600c:4f01:b0:434:f3a1:b210 with SMTP id
 5b1f17b1804b1-4362aab0faamr143508715e9.32.1734451654962;
        Tue, 17 Dec 2024 08:07:34 -0800 (PST)
Received: from vingu-cube.. ([2a01:e0a:f:6020:4e5f:e8c8:aade:2d1b])
        by smtp.gmail.com with ESMTPSA id
 5b1f17b1804b1-436257176a4sm176739435e9.38.2024.12.17.08.07.32
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 17 Dec 2024 08:07:34 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 3/7 v2] sched/fair: Rework feec() to use cost instead of spare
 capacity
Date: Tue, 17 Dec 2024 17:07:16 +0100
Message-ID: <20241217160720.2397239-4-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

feec() looks for the CPU with the highest spare capacity in a PD, assuming
that it will be the best CPU from an energy efficiency PoV because it will
require the smallest increase of OPP. Although this is generally true, this
policy also filters out other CPUs which would be as efficient because they
use the same OPP.
In fact, we really care about the cost of the new OPP that will be
selected to handle the waking task. In many cases, several CPUs will end
up selecting the same OPP and, as a result, have the same energy cost. In
these cases, we can use other metrics to select the best CPU for the same
energy cost.

Rework feec() to look first for the lowest cost in a PD and then for the
most performant CPU among the CPUs with that cost. The cost of the OPP
remains the only comparison criterion between Performance Domains.
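
The ordering can be pictured with the user-space sketch shown here, given
only as an illustration. cpu_stat_sketch and prefer_candidate are
hypothetical names, and folding the cost comparison into the same helper as
the tie-breaks is an assumption made for brevity. The tie-breaks mirror
select_best_cpu() in this patch: fewer running tasks, then the previous
CPU, then lower contention.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct energy_cpu_stat. */
struct cpu_stat_sketch {
	unsigned long cost;      /* cost of the OPP the CPU would need */
	unsigned long capa;      /* capacity of the CPU */
	unsigned long runnable;  /* runnable signal of the CPU */
	unsigned int nr_running; /* number of running tasks */
	int cpu;                 /* CPU id */
};

/* Return true if cand should replace best as the current choice. */
static bool prefer_candidate(const struct cpu_stat_sketch *cand,
			     const struct cpu_stat_sketch *best,
			     int prev_cpu, unsigned int imbalance_pct)
{
	assert(cand && best);

	/* 1st criterion: the lowest OPP cost wins. */
	if (cand->cost != best->cost)
		return cand->cost < best->cost;

	/* Same cost: prefer the CPU with fewer running tasks. */
	if (cand->nr_running != best->nr_running)
		return cand->nr_running < best->nr_running;

	/* Then favor the previous CPU of the task. */
	if (cand->cpu == prev_cpu)
		return true;
	if (best->cpu == prev_cpu)
		return false;

	/* Finally require clearly lower contention relative to capacity. */
	return cand->runnable * best->capa * imbalance_pct <
	       best->runnable * cand->capa * 100;
}
```

With imbalance_pct above 100, a candidate has to be less contended by a
margin before it displaces the current choice, which avoids flipping
between near-identical CPUs.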

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 463 +++++++++++++++++++++++---------------------
 1 file changed, 241 insertions(+), 222 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 893eb6844642..cd046e8216a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8228,29 +8228,37 @@ unsigned long sched_cpu_util(int cpu)
 }
=20
 /*
- * energy_env - Utilization landscape for energy estimation.
- * @task_busy_time: Utilization contribution by the task for which we test=
 the
- *                  placement. Given by eenv_task_busy_time().
- * @pd_busy_time:   Utilization of the whole perf domain without the task
- *                  contribution. Given by eenv_pd_busy_time().
- * @cpu_cap:        Maximum CPU capacity for the perf domain.
- * @pd_cap:         Entire perf domain capacity. (pd->nr_cpus * cpu_cap).
- */
-struct energy_env {
-	unsigned long task_busy_time;
-	unsigned long pd_busy_time;
-	unsigned long cpu_cap;
-	unsigned long pd_cap;
+ * energy_cpu_stat - Utilization landscape for energy estimation.
+ * @idx :        Index of the OPP in the performance domain
+ * @cost :       Cost of the OPP
+ * @max_perf :   Compute capacity of OPP
+ * @min_perf :   Compute capacity of the previous OPP
+ * @capa :       Capacity of the CPU
+ * @runnable :   runnable_avg of the CPU
+ * @nr_running : number of cfs running task
+ * @fits :       Fits level of the CPU
+ * @cpu :        current best CPU
+ */
+struct energy_cpu_stat {
+	unsigned long idx;
+	unsigned long cost;
+	unsigned long max_perf;
+	unsigned long min_perf;
+	unsigned long capa;
+	unsigned long util;
+	unsigned long runnable;
+	unsigned int nr_running;
+	int fits;
+	int cpu;
 };
=20
 /*
- * Compute the task busy time for compute_energy(). This time cannot be
- * injected directly into effective_cpu_util() because of the IRQ scaling.
+ * Compute the task busy time for computing its energy impact. This time c=
annot
+ * be injected directly into effective_cpu_util() because of the IRQ scali=
ng.
  * The latter only makes sense with the most recent CPUs where the task has
  * run.
  */
-static inline void eenv_task_busy_time(struct energy_env *eenv,
-				       struct task_struct *p, int prev_cpu)
+static inline unsigned long task_busy_time(struct task_struct *p, int prev=
_cpu)
 {
 	unsigned long busy_time, max_cap =3D arch_scale_cpu_capacity(prev_cpu);
 	unsigned long irq =3D cpu_util_irq(cpu_rq(prev_cpu));
@@ -8260,124 +8268,150 @@ static inline void eenv_task_busy_time(struct ene=
rgy_env *eenv,
 	else
 		busy_time =3D scale_irq_capacity(task_util_est(p), irq, max_cap);
=20
-	eenv->task_busy_time =3D busy_time;
+	return busy_time;
 }
=20
-/*
- * Compute the perf_domain (PD) busy time for compute_energy(). Based on t=
he
- * utilization for each @pd_cpus, it however doesn't take into account
- * clamping since the ratio (utilization / cpu_capacity) is already enough=
 to
- * scale the EM reported power consumption at the (eventually clamped)
- * cpu_capacity.
- *
- * The contribution of the task @p for which we want to estimate the
- * energy cost is removed (by cpu_util()) and must be calculated
- * separately (see eenv_task_busy_time). This ensures:
- *
- *   - A stable PD utilization, no matter which CPU of that PD we want to =
place
- *     the task on.
- *
- *   - A fair comparison between CPUs as the task contribution (task_util(=
))
- *     will always be the same no matter which CPU utilization we rely on
- *     (util_avg or util_est).
- *
- * Set @eenv busy time for the PD that spans @pd_cpus. This busy time can't
- * exceed @eenv->pd_cap.
- */
-static inline void eenv_pd_busy_time(struct energy_env *eenv,
-				     struct cpumask *pd_cpus,
-				     struct task_struct *p)
+/* Estimate the utilization of the CPU that is then used to select the OPP=
 */
+static unsigned long find_cpu_max_util(int cpu, struct task_struct *p, int=
 dst_cpu)
 {
-	unsigned long busy_time =3D 0;
-	int cpu;
+	unsigned long util =3D cpu_util(cpu, p, dst_cpu, 1);
+	unsigned long eff_util, min, max;
+
+	/*
+	 * Performance domain frequency: utilization clamping
+	 * must be considered since it affects the selection
+	 * of the performance domain frequency.
+	 */
+	eff_util =3D effective_cpu_util(cpu, util, &min, &max);
=20
-	for_each_cpu(cpu, pd_cpus) {
-		unsigned long util =3D cpu_util(cpu, p, -1, 0);
+	/* Task's uclamp can modify min and max value */
+	if (uclamp_is_used() && cpu =3D=3D dst_cpu) {
+		min =3D max(min, uclamp_eff_value(p, UCLAMP_MIN));
=20
-		busy_time +=3D effective_cpu_util(cpu, util, NULL, NULL);
+		/*
+		 * If there is no active max uclamp constraint,
+		 * directly use task's one, otherwise keep max.
+		 */
+		if (uclamp_rq_is_idle(cpu_rq(cpu)))
+			max =3D uclamp_eff_value(p, UCLAMP_MAX);
+		else
+			max =3D max(max, uclamp_eff_value(p, UCLAMP_MAX));
 	}
=20
-	eenv->pd_busy_time =3D min(eenv->pd_cap, busy_time);
+	eff_util =3D sugov_effective_cpu_perf(cpu, eff_util, min, max);
+	return eff_util;
 }
=20
-/*
- * Compute the maximum utilization for compute_energy() when the task @p
- * is placed on the cpu @dst_cpu.
- *
- * Returns the maximum utilization among @eenv->cpus. This utilization can=
't
- * exceed @eenv->cpu_cap.
- */
-static inline unsigned long
-eenv_pd_max_util(struct energy_env *eenv, struct cpumask *pd_cpus,
-		 struct task_struct *p, int dst_cpu)
+/* Estimate the utilization of the CPU without the task */
+static unsigned long find_cpu_actual_util(int cpu, struct task_struct *p)
 {
-	unsigned long max_util =3D 0;
-	int cpu;
+	unsigned long util =3D cpu_util(cpu, p, -1, 0);
+	unsigned long eff_util;
=20
-	for_each_cpu(cpu, pd_cpus) {
-		struct task_struct *tsk =3D (cpu =3D=3D dst_cpu) ? p : NULL;
-		unsigned long util =3D cpu_util(cpu, p, dst_cpu, 1);
-		unsigned long eff_util, min, max;
+	eff_util =3D effective_cpu_util(cpu, util, NULL, NULL);
=20
-		/*
-		 * Performance domain frequency: utilization clamping
-		 * must be considered since it affects the selection
-		 * of the performance domain frequency.
-		 * NOTE: in case RT tasks are running, by default the min
-		 * utilization can be max OPP.
-		 */
-		eff_util =3D effective_cpu_util(cpu, util, &min, &max);
+	return eff_util;
+}
=20
-		/* Task's uclamp can modify min and max value */
-		if (tsk && uclamp_is_used()) {
-			min =3D max(min, uclamp_eff_value(p, UCLAMP_MIN));
+/* Find the cost of a performance domain for the estimated utilization */
+static inline void find_pd_cost(struct em_perf_domain *pd,
+				unsigned long max_util,
+				struct energy_cpu_stat *stat)
+{
+	struct em_perf_table *em_table;
+	struct em_perf_state *ps;
+	int i;
=20
-			/*
-			 * If there is no active max uclamp constraint,
-			 * directly use task's one, otherwise keep max.
-			 */
-			if (uclamp_rq_is_idle(cpu_rq(cpu)))
-				max =3D uclamp_eff_value(p, UCLAMP_MAX);
-			else
-				max =3D max(max, uclamp_eff_value(p, UCLAMP_MAX));
-		}
+	/*
+	 * Find the lowest performance state of the Energy Model above the
+	 * requested performance.
+	 */
+	em_table =3D rcu_dereference(pd->em_table);
+	i =3D em_pd_get_efficient_state(em_table->state, pd, max_util);
+	ps =3D &em_table->state[i];
=20
-		eff_util =3D sugov_effective_cpu_perf(cpu, eff_util, min, max);
-		max_util =3D max(max_util, eff_util);
+	/* Save the cost and performance range of the OPP */
+	stat->max_perf =3D ps->performance;
+	stat->cost =3D ps->cost;
+	i =3D em_pd_get_previous_state(em_table->state, pd, i);
+	if (i < 0)
+		stat->min_perf =3D 0;
+	else {
+		ps =3D &em_table->state[i];
+		stat->min_perf =3D ps->performance;
 	}
-
-	return min(max_util, eenv->cpu_cap);
 }
=20
-/*
- * compute_energy(): Use the Energy Model to estimate the energy that @pd =
would
- * consume for a given utilization landscape @eenv. When @dst_cpu < 0, the=
 task
- * contribution is ignored.
- */
-static inline unsigned long
-compute_energy(struct energy_env *eenv, struct perf_domain *pd,
-	       struct cpumask *pd_cpus, struct task_struct *p, int dst_cpu)
+/* Check if the CPU can handle the waking task */
+static int check_cpu_with_task(struct task_struct *p, int cpu)
 {
-	unsigned long max_util =3D eenv_pd_max_util(eenv, pd_cpus, p, dst_cpu);
-	unsigned long busy_time =3D eenv->pd_busy_time;
-	unsigned long energy;
+	unsigned long p_util_min =3D uclamp_is_used() ? uclamp_eff_value(p, UCLAM=
P_MIN) : 0;
+	unsigned long p_util_max =3D uclamp_is_used() ? uclamp_eff_value(p, UCLAM=
P_MAX) : 1024;
+	unsigned long util_min =3D p_util_min;
+	unsigned long util_max =3D p_util_max;
+	unsigned long util =3D cpu_util(cpu, p, cpu, 0);
+	struct rq *rq =3D cpu_rq(cpu);
=20
-	if (dst_cpu >=3D 0)
-		busy_time =3D min(eenv->pd_cap, busy_time + eenv->task_busy_time);
+	/*
+	 * Skip CPUs that cannot satisfy the capacity request.
+	 * IOW, placing the task there would make the CPU
+	 * overutilized. Take uclamp into account to see how
+	 * much capacity we can get out of the CPU; this is
+	 * aligned with sched_cpu_util().
+	 */
+	if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) {
+		unsigned long rq_util_min, rq_util_max;
+		/*
+		 * Open code uclamp_rq_util_with() except for
+		 * the clamp() part. I.e.: apply max aggregation
+		 * only. util_fits_cpu() logic requires to
+		 * operate on non clamped util but must use the
+		 * max-aggregated uclamp_{min, max}.
+		 */
+		rq_util_min =3D uclamp_rq_get(rq, UCLAMP_MIN);
+		rq_util_max =3D uclamp_rq_get(rq, UCLAMP_MAX);
+		util_min =3D max(rq_util_min, p_util_min);
+		util_max =3D max(rq_util_max, p_util_max);
+	}
+	return util_fits_cpu(util, util_min, util_max, cpu);
+}
+
+/* For a same cost, select the CPU that will provide best performance for =
the task */
+static bool select_best_cpu(struct energy_cpu_stat *target,
+			    struct energy_cpu_stat *min,
+			    int prev, struct sched_domain *sd)
+{
+	/*  Select the one with the least number of running tasks */
+	if (target->nr_running < min->nr_running)
+		return true;
+	if (target->nr_running > min->nr_running)
+		return false;
=20
-	energy =3D em_cpu_energy(pd->em_pd, max_util, busy_time, eenv->cpu_cap);
+	/* Favor previous CPU otherwise */
+	if (target->cpu =3D=3D prev)
+		return true;
+	if (min->cpu =3D=3D prev)
+		return false;
=20
-	trace_sched_compute_energy_tp(p, dst_cpu, energy, max_util, busy_time);
+	/*
+	 * Choose CPU with lowest contention. One might want to consider load ins=
tead of
+	 * runnable but we are supposed to not be overutilized so there is enough=
 compute
+	 * capacity for everybody.
+	 */
+	if ((target->runnable * min->capa * sd->imbalance_pct) >=3D
+			(min->runnable * target->capa * 100))
+		return false;
=20
-	return energy;
+	return true;
 }
=20
 /*
  * find_energy_efficient_cpu(): Find most energy-efficient target CPU for =
the
- * waking task. find_energy_efficient_cpu() looks for the CPU with maximum
- * spare capacity in each performance domain and uses it as a potential
- * candidate to execute the task. Then, it uses the Energy Model to figure
- * out which of the CPU candidates is the most energy-efficient.
+ * waking task. find_energy_efficient_cpu() looks for the CPU with the low=
est
+ * power cost (usually with maximum spare capacity but not always) in each
+ * performance domain and uses it as a potential candidate to execute the =
task.
+ * Then, it uses the Energy Model to figure out which of the CPU candidate=
s is
+ * the most energy-efficient.
  *
  * The rationale for this heuristic is as follows. In a performance domain,
  * all the most energy efficient CPU candidates (according to the Energy
@@ -8414,17 +8448,14 @@ compute_energy(struct energy_env *eenv, struct perf=
_domain *pd,
 static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
 	struct cpumask *cpus =3D this_cpu_cpumask_var_ptr(select_rq_mask);
-	unsigned long prev_delta =3D ULONG_MAX, best_delta =3D ULONG_MAX;
-	unsigned long p_util_min =3D uclamp_is_used() ? uclamp_eff_value(p, UCLAM=
P_MIN) : 0;
-	unsigned long p_util_max =3D uclamp_is_used() ? uclamp_eff_value(p, UCLAM=
P_MAX) : 1024;
 	struct root_domain *rd =3D this_rq()->rd;
-	int cpu, best_energy_cpu, target =3D -1;
-	int prev_fits =3D -1, best_fits =3D -1;
-	unsigned long best_actual_cap =3D 0;
-	unsigned long prev_actual_cap =3D 0;
+	unsigned long best_nrg =3D ULONG_MAX;
+	unsigned long task_util;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
-	struct energy_env eenv;
+	int cpu, target =3D -1;
+	int best_fits =3D -1;
+	int best_cpu =3D -1;
=20
 	rcu_read_lock();
 	pd =3D rcu_dereference(rd->pd);
@@ -8444,19 +8475,19 @@ static int find_energy_efficient_cpu(struct task_st=
ruct *p, int prev_cpu)
 	target =3D prev_cpu;
=20
 	sync_entity_load_avg(&p->se);
-	if (!task_util_est(p) && p_util_min =3D=3D 0)
-		goto unlock;
-
-	eenv_task_busy_time(&eenv, p, prev_cpu);
+	task_util =3D task_busy_time(p, prev_cpu);
=20
 	for (; pd; pd =3D pd->next) {
-		unsigned long util_min =3D p_util_min, util_max =3D p_util_max;
-		unsigned long cpu_cap, cpu_actual_cap, util;
-		long prev_spare_cap =3D -1, max_spare_cap =3D -1;
-		unsigned long rq_util_min, rq_util_max;
-		unsigned long cur_delta, base_energy;
-		int max_spare_cap_cpu =3D -1;
-		int fits, max_fits =3D -1;
+		unsigned long pd_actual_util =3D 0, delta_nrg =3D 0;
+		unsigned long cpu_actual_cap, max_cost =3D 0;
+		struct energy_cpu_stat target_stat;
+		struct energy_cpu_stat min_stat =3D {
+			.cost =3D ULONG_MAX,
+			.max_perf =3D ULONG_MAX,
+			.min_perf =3D ULONG_MAX,
+			.fits =3D -2,
+			.cpu =3D -1,
+		};
=20
 		cpumask_and(cpus, perf_domain_span(pd), cpu_online_mask);
=20
@@ -8467,13 +8498,9 @@ static int find_energy_efficient_cpu(struct task_str=
uct *p, int prev_cpu)
 		cpu =3D cpumask_first(cpus);
 		cpu_actual_cap =3D get_actual_cpu_capacity(cpu);
=20
-		eenv.cpu_cap =3D cpu_actual_cap;
-		eenv.pd_cap =3D 0;
-
+		/* In a PD, the CPU with the lowest cost will be the most efficient */
 		for_each_cpu(cpu, cpus) {
-			struct rq *rq =3D cpu_rq(cpu);
-
-			eenv.pd_cap +=3D cpu_actual_cap;
+			unsigned long target_perf;
=20
 			if (!cpumask_test_cpu(cpu, sched_domain_span(sd)))
 				continue;
@@ -8481,120 +8508,112 @@ static int find_energy_efficient_cpu(struct task_=
struct *p, int prev_cpu)
 			if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 				continue;
=20
-			util =3D cpu_util(cpu, p, cpu, 0);
-			cpu_cap =3D capacity_of(cpu);
+			target_stat.fits =3D check_cpu_with_task(p, cpu);
+
+			if (!target_stat.fits)
+				continue;
+
+			/* 1st select the CPU that fits best */
+			if (target_stat.fits < min_stat.fits)
+				continue;
+
+			/* Then select the CPU with lowest cost */
+
+			/* Get the performance of the CPU w/ waking task. */
+			target_perf =3D find_cpu_max_util(cpu, p, cpu);
+			target_perf =3D min(target_perf, cpu_actual_cap);
+
+			/* Needing a higher OPP means a higher cost */
+			if (target_perf > min_stat.max_perf)
+				continue;
=20
 			/*
-			 * Skip CPUs that cannot satisfy the capacity request.
-			 * IOW, placing the task there would make the CPU
-			 * overutilized. Take uclamp into account to see how
-			 * much capacity we can get out of the CPU; this is
-			 * aligned with sched_cpu_util().
+			 * At this point, target's cost can be either equal or
+			 * lower than the current minimum cost.
 			 */
-			if (uclamp_is_used() && !uclamp_rq_is_idle(rq)) {
-				/*
-				 * Open code uclamp_rq_util_with() except for
-				 * the clamp() part. I.e.: apply max aggregation
-				 * only. util_fits_cpu() logic requires to
-				 * operate on non clamped util but must use the
-				 * max-aggregated uclamp_{min, max}.
-				 */
-				rq_util_min =3D uclamp_rq_get(rq, UCLAMP_MIN);
-				rq_util_max =3D uclamp_rq_get(rq, UCLAMP_MAX);
=20
-				util_min =3D max(rq_util_min, p_util_min);
-				util_max =3D max(rq_util_max, p_util_max);
-			}
+			/* Gather more statistics */
+			target_stat.cpu =3D cpu;
+			target_stat.runnable =3D cpu_runnable(cpu_rq(cpu));
+			target_stat.capa =3D capacity_of(cpu);
+			target_stat.nr_running =3D cpu_rq(cpu)->cfs.h_nr_runnable;
=20
-			fits =3D util_fits_cpu(util, util_min, util_max, cpu);
-			if (!fits)
+			/* If the target needs a lower OPP, then look up
+			 * the corresponding OPP and its associated cost.
+			 * Otherwise at same cost level, select the CPU which
+			 * provides best performance.
+			 */
+			if (target_perf < min_stat.min_perf)
+				find_pd_cost(pd->em_pd, target_perf, &target_stat);
+			else if (!select_best_cpu(&target_stat, &min_stat, prev_cpu, sd))
 				continue;
=20
-			lsub_positive(&cpu_cap, util);
-
-			if (cpu =3D=3D prev_cpu) {
-				/* Always use prev_cpu as a candidate. */
-				prev_spare_cap =3D cpu_cap;
-				prev_fits =3D fits;
-			} else if ((fits > max_fits) ||
-				   ((fits =3D=3D max_fits) && ((long)cpu_cap > max_spare_cap))) {
-				/*
-				 * Find the CPU with the maximum spare capacity
-				 * among the remaining CPUs in the performance
-				 * domain.
-				 */
-				max_spare_cap =3D cpu_cap;
-				max_spare_cap_cpu =3D cpu;
-				max_fits =3D fits;
-			}
+			/* Save the new most efficient CPU of the PD */
+			min_stat =3D target_stat;
 		}
=20
-		if (max_spare_cap_cpu < 0 && prev_spare_cap < 0)
+		if (min_stat.cpu =3D=3D -1)
 			continue;
=20
-		eenv_pd_busy_time(&eenv, cpus, p);
-		/* Compute the 'base' energy of the pd, without @p */
-		base_energy =3D compute_energy(&eenv, pd, cpus, p, -1);
+		if (min_stat.fits < best_fits)
+			continue;
=20
-		/* Evaluate the energy impact of using prev_cpu. */
-		if (prev_spare_cap > -1) {
-			prev_delta =3D compute_energy(&eenv, pd, cpus, p,
-						    prev_cpu);
-			/* CPU utilization has changed */
-			if (prev_delta < base_energy)
-				goto unlock;
-			prev_delta -=3D base_energy;
-			prev_actual_cap =3D cpu_actual_cap;
-			best_delta =3D min(best_delta, prev_delta);
-		}
+		/* Idle system costs nothing */
+		target_stat.max_perf =3D 0;
+		target_stat.cost =3D 0;
=20
-		/* Evaluate the energy impact of using max_spare_cap_cpu. */
-		if (max_spare_cap_cpu >=3D 0 && max_spare_cap > prev_spare_cap) {
-			/* Current best energy cpu fits better */
-			if (max_fits < best_fits)
-				continue;
+		/* Estimate utilization and cost without p */
+		for_each_cpu(cpu, cpus) {
+			unsigned long target_util;
=20
-			/*
-			 * Both don't fit performance hint (i.e. uclamp_min)
-			 * but best energy cpu has better capacity.
-			 */
-			if ((max_fits < 0) &&
-			    (cpu_actual_cap <=3D best_actual_cap))
-				continue;
+			/* Accumulate actual utilization w/o task p */
+			pd_actual_util +=3D find_cpu_actual_util(cpu, p);
=20
-			cur_delta =3D compute_energy(&eenv, pd, cpus, p,
-						   max_spare_cap_cpu);
-			/* CPU utilization has changed */
-			if (cur_delta < base_energy)
-				goto unlock;
-			cur_delta -=3D base_energy;
+			/* Get the max utilization of the CPU w/o task p */
+			target_util =3D find_cpu_max_util(cpu, p, -1);
+			target_util =3D min(target_util, cpu_actual_cap);
=20
-			/*
-			 * Both fit for the task but best energy cpu has lower
-			 * energy impact.
-			 */
-			if ((max_fits > 0) && (best_fits > 0) &&
-			    (cur_delta >=3D best_delta))
+			/* Current OPP is enough */
+			if (target_util <=3D target_stat.max_perf)
 				continue;
=20
-			best_delta =3D cur_delta;
-			best_energy_cpu =3D max_spare_cap_cpu;
-			best_fits =3D max_fits;
-			best_actual_cap =3D cpu_actual_cap;
+			/* Compute and save the cost of the OPP */
+			find_pd_cost(pd->em_pd, target_util, &target_stat);
+			max_cost =3D target_stat.cost;
 		}
-	}
-	rcu_read_unlock();
=20
-	if ((best_fits > prev_fits) ||
-	    ((best_fits > 0) && (best_delta < prev_delta)) ||
-	    ((best_fits < 0) && (best_actual_cap > prev_actual_cap)))
-		target =3D best_energy_cpu;
+		/* Add the NRG cost of p */
+		delta_nrg =3D task_util * min_stat.cost;
=20
-	return target;
+		/* Compute the NRG cost of others running at higher OPP because of p */
+		if (min_stat.cost > max_cost)
+			delta_nrg +=3D pd_actual_util * (min_stat.cost - max_cost);
+
+		/* nrg with p */
+		trace_sched_compute_energy_tp(p, min_stat.cpu, delta_nrg,
+				min_stat.max_perf, pd_actual_util + task_util);
+
+		/*
+		 * The probability that delta NRGs are equal is almost zero. PDs being =
sorted
+		 * by max capacity, keep the one with highest max capacity if this
+		 * happens.
+		 * TODO: add a margin in nrg cost and take into account other stats
+		 */
+		if ((min_stat.fits =3D=3D best_fits) &&
+		    (delta_nrg >=3D best_nrg))
+			continue;
+
+		best_fits =3D min_stat.fits;
+		best_nrg =3D delta_nrg;
+		best_cpu =3D min_stat.cpu;
+	}
=20
 unlock:
 	rcu_read_unlock();
=20
+	if (best_cpu >=3D 0)
+		target =3D best_cpu;
+
 	return target;
 }
=20
--=20
2.43.0
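
For readers following the new cost computation in
find_energy_efficient_cpu(), the per-performance-domain energy delta can
be modeled in plain userspace C. This is only a sketch: struct
pd_estimate and pd_delta_nrg() are hypothetical names standing in for the
locals task_util, pd_actual_util, min_stat.cost and max_cost used in the
patch, and any values fed to it are made up.

```c
/* Hypothetical inputs for one performance domain (PD). */
struct pd_estimate {
	unsigned long task_util;      /* busy time of the waking task */
	unsigned long pd_actual_util; /* summed PD utilization without it */
	unsigned long cost_with_task; /* cost of the OPP needed with it */
	unsigned long cost_without;   /* cost of the OPP needed without it */
};

/* Energy delta of placing the task in this performance domain. */
static unsigned long pd_delta_nrg(const struct pd_estimate *e)
{
	/* The task itself runs at the OPP selected with the task. */
	unsigned long delta = e->task_util * e->cost_with_task;

	/* The other tasks pay extra only if the task raises the OPP. */
	if (e->cost_with_task > e->cost_without)
		delta += e->pd_actual_util *
			 (e->cost_with_task - e->cost_without);

	return delta;
}
```

With task_util of 100, pd_actual_util of 300, a cost of 5 with the task
and 3 without, the sketch yields 100 * 5 + 300 * (5 - 3), i.e. 1100. When
the task does not raise the OPP, only the first term remains.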
From nobody Mon Feb  9 01:56:08 2026
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 4/7 v2] energy model: Remove unused em_cpu_energy()
Date: Tue, 17 Dec 2024 17:07:17 +0100
Message-ID: <20241217160720.2397239-5-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

em_cpu_energy() has no remaining callers now that
find_energy_efficient_cpu() has been reworked. Remove it.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/energy_model.h | 99 ------------------------------------
 1 file changed, 99 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 26d0ff72feac..c766642dc541 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -235,99 +235,6 @@ em_pd_get_previous_state(struct em_perf_state *table,
 	return -1;
 }
=20
-/**
- * em_cpu_energy() - Estimates the energy consumed by the CPUs of a
- *		performance domain
- * @pd		: performance domain for which energy has to be estimated
- * @max_util	: highest utilization among CPUs of the domain
- * @sum_util	: sum of the utilization of all CPUs in the domain
- * @allowed_cpu_cap	: maximum allowed CPU capacity for the @pd, which
- *			  might reflect reduced frequency (due to thermal)
- *
- * This function must be used only for CPU devices. There is no validation,
- * i.e. if the EM is a CPU type and has cpumask allocated. It is called fr=
om
- * the scheduler code quite frequently and that is why there is not checks.
- *
- * Return: the sum of the energy consumed by the CPUs of the domain assumi=
ng
- * a capacity state satisfying the max utilization of the domain.
- */
-static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
-				unsigned long max_util, unsigned long sum_util,
-				unsigned long allowed_cpu_cap)
-{
-	struct em_perf_table *em_table;
-	struct em_perf_state *ps;
-	int i;
-
-#ifdef CONFIG_SCHED_DEBUG
-	WARN_ONCE(!rcu_read_lock_held(), "EM: rcu read lock needed\n");
-#endif
-
-	if (!sum_util)
-		return 0;
-
-	/*
-	 * In order to predict the performance state, map the utilization of
-	 * the most utilized CPU of the performance domain to a requested
-	 * performance, like schedutil. Take also into account that the real
-	 * performance might be set lower (due to thermal capping). Thus, clamp
-	 * max utilization to the allowed CPU capacity before calculating
-	 * effective performance.
-	 */
-	max_util =3D min(max_util, allowed_cpu_cap);
-
-	/*
-	 * Find the lowest performance state of the Energy Model above the
-	 * requested performance.
-	 */
-	em_table =3D rcu_dereference(pd->em_table);
-	i =3D em_pd_get_efficient_state(em_table->state, pd, max_util);
-	ps =3D &em_table->state[i];
-
-	/*
-	 * The performance (capacity) of a CPU in the domain at the performance
-	 * state (ps) can be computed as:
-	 *
-	 *                     ps->freq * scale_cpu
-	 *   ps->performance =3D --------------------                  (1)
-	 *                         cpu_max_freq
-	 *
-	 * So, ignoring the costs of idle states (which are not available in
-	 * the EM), the energy consumed by this CPU at that performance state
-	 * is estimated as:
-	 *
-	 *             ps->power * cpu_util
-	 *   cpu_nrg =3D --------------------                          (2)
-	 *               ps->performance
-	 *
-	 * since 'cpu_util / ps->performance' represents its percentage of busy
-	 * time.
-	 *
-	 *   NOTE: Although the result of this computation actually is in
-	 *         units of power, it can be manipulated as an energy value
-	 *         over a scheduling period, since it is assumed to be
-	 *         constant during that interval.
-	 *
-	 * By injecting (1) in (2), 'cpu_nrg' can be re-expressed as a product
-	 * of two terms:
-	 *
-	 *             ps->power * cpu_max_freq
-	 *   cpu_nrg =3D ------------------------ * cpu_util           (3)
-	 *               ps->freq * scale_cpu
-	 *
-	 * The first term is static, and is stored in the em_perf_state struct
-	 * as 'ps->cost'.
-	 *
-	 * Since all CPUs of the domain have the same micro-architecture, they
-	 * share the same 'ps->cost', and the same CPU capacity. Hence, the
-	 * total energy of the domain (which is the simple sum of the energy of
-	 * all of its CPUs) can be factorized as:
-	 *
-	 *   pd_nrg =3D ps->cost * \Sum cpu_util                       (4)
-	 */
-	return ps->cost * sum_util;
-}
-
 /**
  * em_pd_nr_perf_states() - Get the number of performance states of a perf.
  *				domain
@@ -394,12 +301,6 @@ em_pd_get_previous_state(struct em_perf_state *table, =
int nr_perf_states,
 {
 	return -1;
 }
-static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
-			unsigned long max_util, unsigned long sum_util,
-			unsigned long allowed_cpu_cap)
-{
-	return 0;
-}
 static inline int em_pd_nr_perf_states(struct em_perf_domain *pd)
 {
 	return 0;
--=20
2.43.0
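
For reference, the relation documented in the removed kernel-doc comment
can be checked numerically. The sketch below is userspace C with
hypothetical helper names (ps_performance(), cpu_nrg(), pd_nrg()) and uses
double arithmetic for readability, which the kernel does not. It only
illustrates that summing equation (2) over the CPUs of a domain gives the
same value as the factorized form (4).

```c
#include <stddef.h>

/* (1): capacity of a CPU running at frequency freq. */
static double ps_performance(double freq, double scale_cpu,
			     double cpu_max_freq)
{
	return freq * scale_cpu / cpu_max_freq;
}

/* (2): energy of one CPU, i.e. power scaled by its share of busy time. */
static double cpu_nrg(double power, double performance, double cpu_util)
{
	return power * cpu_util / performance;
}

/* (4): energy of the domain, with cost = power / performance as in (3). */
static double pd_nrg(double power, double performance,
		     const double *cpu_util, size_t nr_cpus)
{
	double cost = power / performance;
	double sum_util = 0.0;
	size_t i;

	for (i = 0; i < nr_cpus; i++)
		sum_util += cpu_util[i];

	return cost * sum_util;
}
```

Because the cost term depends only on the performance state, it can be
computed once per domain and multiplied by the summed utilization, which
is what the removed function returned.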
From nobody Mon Feb  9 01:56:08 2026
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 5/7 v2] sched/fair: Add push task callback for EAS
Date: Tue, 17 Dec 2024 17:07:18 +0100
Message-ID: <20241217160720.2397239-6-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

EAS relies on wakeup events to place tasks efficiently on the system, but
there are cases where a task no longer has wakeup events, or has them at
far too low a rate. In such situations, we can take advantage of the task
being put back in the enqueued list to check whether it should be migrated
to another CPU.

Wakeup events remain the main way to migrate tasks, but we now detect
situations where a task is stuck on a CPU by checking whether its
utilization is larger than the max available compute capacity (max CPU
capacity or uclamp max setting).

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
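
The detection described above boils down to one comparison. A minimal
userspace sketch, assuming plain integers in place of the kernel helpers
(task_util_est(), task_runnable(), get_actual_cpu_capacity() and
uclamp_eff_value() in the patch); task_misfit() and its parameter names
are hypothetical:

```c
#include <stdbool.h>

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * A task is considered stuck when its demand exceeds what the CPU can
 * deliver, the latter being capped by the task's uclamp max setting.
 */
static bool task_misfit(unsigned long util_est, unsigned long runnable,
			unsigned long cpu_capacity, unsigned long uclamp_max)
{
	unsigned long max_capa = min_ul(cpu_capacity, uclamp_max);
	unsigned long util = max_ul(util_est, runnable);

	return util > max_capa;
}
```

Note that the comparison is strict: a task whose demand exactly matches
the available capacity is not treated as misfit in this sketch.
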
 kernel/sched/fair.c  | 206 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |   2 +
 2 files changed, 208 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd046e8216a9..2affc063da55 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7088,6 +7088,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *=
p, int flags)
 	hrtick_update(rq);
 }
=20
+static void dequeue_pushable_task(struct rq *rq, struct task_struct *p);
 static void set_next_buddy(struct sched_entity *se);
=20
 /*
@@ -7118,6 +7119,9 @@ static int dequeue_entities(struct rq *rq, struct sch=
ed_entity *se, int flags)
 		h_nr_idle =3D task_has_idle_policy(p);
 		if (task_sleep || task_delayed || !se->sched_delayed)
 			h_nr_runnable =3D 1;
+
+		if (task_sleep || task_on_rq_migrating(p))
+			dequeue_pushable_task(rq, p);
 	} else {
 		cfs_rq =3D group_cfs_rq(se);
 		slice =3D cfs_rq_min_slice(cfs_rq);
@@ -8617,6 +8621,182 @@ static int find_energy_efficient_cpu(struct task_st=
ruct *p, int prev_cpu)
 	return target;
 }
=20
+static inline bool task_misfit_cpu(struct task_struct *p, int cpu)
+{
+	unsigned long max_capa =3D get_actual_cpu_capacity(cpu);
+	unsigned long util =3D task_util_est(p);
+
+	max_capa =3D min(max_capa, uclamp_eff_value(p, UCLAMP_MAX));
+	util =3D max(util, task_runnable(p));
+
+	/*
+	 * Return true only if the task might not sleep/wakeup because of a low
+	 * compute capacity. Tasks that wake up regularly will be handled by
+	 * feec().
+	 */
+	return (util > max_capa);
+}
+
+static int active_load_balance_cpu_stop(void *data);
+
+static inline void migrate_misfit_task(struct task_struct *p, struct rq *r=
q)
+{
+	int new_cpu, cpu =3D cpu_of(rq);
+
+	if (!sched_energy_enabled() || is_rd_overutilized(rq->rd))
+		return;
+
+	if (WARN_ON(!p))
+		return;
+
+	if (WARN_ON(p !=3D rq->curr))
+		return;
+
+	if (is_migration_disabled(p))
+		return;
+
+	if ((rq->nr_running > 1) || (p->nr_cpus_allowed =3D=3D 1))
+		return;
+
+	if (!task_misfit_cpu(p, cpu))
+		return;
+
+	new_cpu =3D find_energy_efficient_cpu(p, cpu);
+
+	if (new_cpu =3D=3D cpu)
+		return;
+
+	/*
+	 * ->active_balance synchronizes accesses to
+	 * ->active_balance_work.  Once set, it's cleared
+	 * only after active load balance is finished.
+	 */
+	if (!rq->active_balance) {
+		rq->active_balance =3D 1;
+		rq->push_cpu =3D new_cpu;
+	} else
+		return;
+
+	raw_spin_rq_unlock(rq);
+	stop_one_cpu_nowait(cpu,
+		active_load_balance_cpu_stop, rq,
+		&rq->active_balance_work);
+	raw_spin_rq_lock(rq);
+}
+
+static inline int has_pushable_tasks(struct rq *rq)
+{
+	return !plist_head_empty(&rq->cfs.pushable_tasks);
+}
+
+static struct task_struct *pick_next_pushable_fair_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_tasks(rq))
+		return NULL;
+
+	p =3D plist_first_entry(&rq->cfs.pushable_tasks,
+			      struct task_struct, pushable_tasks);
+
+	WARN_ON_ONCE(rq->cpu !=3D task_cpu(p));
+	WARN_ON_ONCE(task_current(rq, p));
+	WARN_ON_ONCE(p->nr_cpus_allowed <=3D 1);
+	WARN_ON_ONCE(!task_on_rq_queued(p));
+
+	/*
+	 * Remove the task from the pushable list, as we try only once after
+	 * the task has been put back in the enqueued list.
+	 */
+	plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+
+	return p;
+}
+
+/*
+ * See if the non-running fair tasks on this rq can be sent to other CPUs
+ * that better fit their profile.
+ */
+static bool push_fair_task(struct rq *rq)
+{
+	struct task_struct *next_task;
+	int prev_cpu, new_cpu;
+	struct rq *new_rq;
+
+	next_task =3D pick_next_pushable_fair_task(rq);
+	if (!next_task)
+		return false;
+
+	if (is_migration_disabled(next_task))
+		return true;
+
+	/* We might release rq lock */
+	get_task_struct(next_task);
+
+	prev_cpu =3D rq->cpu;
+
+	new_cpu =3D find_energy_efficient_cpu(next_task, prev_cpu);
+
+	if (new_cpu =3D=3D prev_cpu)
+		goto out;
+
+	new_rq =3D cpu_rq(new_cpu);
+
+	if (double_lock_balance(rq, new_rq)) {
+		/* The task has already migrated in between */
+		if (task_cpu(next_task) !=3D rq->cpu) {
+			double_unlock_balance(rq, new_rq);
+			goto out;
+		}
+
+		deactivate_task(rq, next_task, 0);
+		set_task_cpu(next_task, new_cpu);
+		activate_task(new_rq, next_task, 0);
+
+		resched_curr(new_rq);
+
+		double_unlock_balance(rq, new_rq);
+	}
+
+out:
+	put_task_struct(next_task);
+
+	return true;
+}
+
+static void push_fair_tasks(struct rq *rq)
+{
+	/* push_fair_task() returns true as long as it picked a pushable task */
+	while (push_fair_task(rq))
+		;
+}
+
+static DEFINE_PER_CPU(struct balance_callback, fair_push_head);
+
+static inline void fair_queue_push_tasks(struct rq *rq)
+{
+	if (!sched_energy_enabled() || !has_pushable_tasks(rq))
+		return;
+
+	queue_balance_callback(rq, &per_cpu(fair_push_head, rq->cpu), push_fair_t=
asks);
+}
+static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	if (sched_energy_enabled())
+		plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+}
+
+static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	if (sched_energy_enabled() && task_on_rq_queued(p) && !p->se.sched_delaye=
d) {
+		if (!is_rd_overutilized(rq->rd) && task_misfit_cpu(p, rq->cpu)) {
+			plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+			plist_node_init(&p->pushable_tasks, p->prio);
+			plist_add(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+		}
+	}
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in doma=
ins
  * that have the relevant SD flag set. In practice, this is SD_BALANCE_WAK=
E,
@@ -8786,6 +8966,10 @@ balance_fair(struct rq *rq, struct task_struct *prev=
, struct rq_flags *rf)
 	return sched_balance_newidle(rq, rf) !=3D 0;
 }
 #else
+static inline void migrate_misfit_task(struct task_struct *p, struct rq *r=
q) {}
+static inline void fair_queue_push_tasks(struct rq *rq) {}
+static inline void dequeue_pushable_task(struct rq *rq, struct task_struct=
 *p) {}
+static inline void enqueue_pushable_task(struct rq *rq, struct task_struct=
 *p) {}
 static inline void set_task_max_allowed_capacity(struct task_struct *p) {}
 #endif /* CONFIG_SMP */
=20
@@ -8968,6 +9152,12 @@ pick_next_task_fair(struct rq *rq, struct task_struc=
t *prev, struct rq_flags *rf
 		put_prev_entity(cfs_rq, pse);
 		set_next_entity(cfs_rq, se);
=20
+		/*
+		 * The previous task might be eligible to be pushed to
+		 * another CPU if it is still runnable.
+		 */
+		enqueue_pushable_task(rq, prev);
+
 		__set_next_task_fair(rq, p, true);
 	}
=20
@@ -9040,6 +9230,13 @@ static void put_prev_task_fair(struct rq *rq, struct=
 task_struct *prev, struct t
 		cfs_rq =3D cfs_rq_of(se);
 		put_prev_entity(cfs_rq, se);
 	}
+
+	/*
+	 * The previous task might be eligible to be pushed to
+	 * another CPU if it is still runnable.
+	 */
+	enqueue_pushable_task(rq, prev);
+
 }
=20
 /*
@@ -13102,6 +13299,7 @@ static void task_tick_fair(struct rq *rq, struct ta=
sk_struct *curr, int queued)
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
=20
+	migrate_misfit_task(curr, rq);
 	update_misfit_status(curr, rq);
 	check_update_overutilized_status(task_rq(curr));
=20
@@ -13254,6 +13452,8 @@ static void __set_next_task_fair(struct rq *rq, str=
uct task_struct *p, bool firs
 {
 	struct sched_entity *se =3D &p->se;
=20
+	dequeue_pushable_task(rq, p);
+
 #ifdef CONFIG_SMP
 	if (task_on_rq_queued(p)) {
 		/*
@@ -13271,6 +13471,11 @@ static void __set_next_task_fair(struct rq *rq, st=
ruct task_struct *p, bool firs
 	if (hrtick_enabled_fair(rq))
 		hrtick_start_fair(rq, p);
=20
+	/*
+	 * Try to push the prev task before checking misfit for the next task,
+	 * as migrating prev can make next fit the CPU.
+	 */
+	fair_queue_push_tasks(rq);
 	update_misfit_status(p, rq);
 	sched_fair_update_stop_tick(rq, p);
 }
@@ -13301,6 +13506,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 	cfs_rq->tasks_timeline =3D RB_ROOT_CACHED;
 	cfs_rq->min_vruntime =3D (u64)(-(1LL << 20));
 #ifdef CONFIG_SMP
+	plist_head_init(&cfs_rq->pushable_tasks);
 	raw_spin_lock_init(&cfs_rq->removed.lock);
 #endif
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index aef716c41edb..c9875cd4c986 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -717,6 +717,8 @@ struct cfs_rq {
 	struct list_head	leaf_cfs_rq_list;
 	struct task_group	*tg;	/* group that "owns" this runqueue */
=20
+	struct plist_head	pushable_tasks;
+
 	/* Locally cached copy of our task_group's idle value */
 	int			idle;
=20
--=20
2.43.0
From nobody Mon Feb  9 01:56:08 2026
Received: from mail-wm1-f47.google.com (mail-wm1-f47.google.com
 [209.85.128.47])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 758E81F940C
	for <linux-kernel@vger.kernel.org>; Tue, 17 Dec 2024 16:07:41 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.128.47
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734451663; cv=none;
 b=rGX4d4qG2uuuOao32OpQ2ParG00zpNEGK2faNsMJ5rJ1A7WPQpcdZ3qAClBq3i/nl7rRtyGBZGa4PtpEdeKTpeQ1HZV17h0JWNsw8Untl0rC3H7CCgzlvIMO8eHRzX50MGa0hT4SaO3HlwSVqs/w+OsG0dqfMRVqmqq37j6ymm0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734451663; c=relaxed/simple;
	bh=dXS/ShcNcHU276KRefmpAmXb27O3K8SKe18Wp8y2ku0=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=pj7qrEvy5M19s5h1RWHD/3EV/QcFphM7V4u2czreA75biusokvgaGAUzIcTseIE+M54FAN8spHgrzrBqHp2WqYnPfmf5C1Jv7II0bQBFjWMHhw/7BDoY38toTfSniEUUH/N+VtCzJdkCHZbUHbNYbQWUHIo2omOCW+psuYAS5rc=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org;
 spf=pass smtp.mailfrom=linaro.org;
 dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b=vHlMX8dl; arc=none smtp.client-ip=209.85.128.47
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b="vHlMX8dl"
Received: by mail-wm1-f47.google.com with SMTP id
 5b1f17b1804b1-43624b2d453so61409115e9.2
        for <linux-kernel@vger.kernel.org>;
 Tue, 17 Dec 2024 08:07:41 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1734451660; x=1735056460;
 darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=qUSb30CiTrhpZ1NM6rfSGm6C56RYHphix50za6j+27U=;
        b=vHlMX8dlEubXt/6nxevgt3kQ5H0dUH1mn6P9x0MVNA+hwDIFbnh2SACFzVCb8LxVDX
         061cQFMFQjMIDuS5KtVx97ulzYS+jMGRGedXNYXq223dIpih5fxa+YkNFDD303hf2jqO
         e46Ll9jAesnh++RhGvPDvjL09K2jJ8Edc2f+wBeRrZuG95rFLeK4vNNQiYuKl+sPEWDf
         flI9HMKBtngG5/ISw92gTqcctKxridizsLbRbMPUpA9/NpX1wAH8s4e+wDVaF6bqOPmj
         HBTm7sfLq8aFYrNyiKeoxy3nd53cXwiht0oT3IxbjePVYwdzUl5rVtK4ksMRbJMiTN1L
         rRzw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1734451660; x=1735056460;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=qUSb30CiTrhpZ1NM6rfSGm6C56RYHphix50za6j+27U=;
        b=YKdCSicB+/W8XQbOGUUc1ugnmhIh9o6XuUPRilvheCjkUFxF5t0I9vfXmsSm3zFxXl
         DTAkseyUMiVI4ye4o4kl7zpq7CQOcVD22V6ydUEKI/cnZj8TnfMlHArlO1/f+v47sveZ
         i+RPXO3+R2Cg3j278qUJDTxca1nJoc/fW7srems67nPmh7fgxebHXl+ddom4uWBZ0Prz
         PcLAuzpzuPX3T9wYgnoQX8iAu0lG/u/+DF9vnOZ26eWqUmAw48IC4heCc4aIddNjkNG0
         2O86OoyDdOaPp0tqXxIjaCCxw8BioaDBujYsDkjsKYGWxgDul+dhdZrbpOe2WAPM9VBb
         aSuA==
X-Forwarded-Encrypted: i=1;
 AJvYcCUpzP67O37VLCkQeIhqHI8TFZ/Cn9qEJHP2lkAu47SpnFalyLngBQ2yVB1JFqbJKaahI8cBYg6H2YG3KC8=@vger.kernel.org
X-Gm-Message-State: AOJu0YyrCNORCc4mbZTaCa4nRm7xGS87kRrOVH7PdwnNbCtGCubcsORg
	h+ecqwMEosf4jkMg993kBb3Vsv5jl0a2r8zaiHUCIWGbR/Httam967MB5q5xHMU=
X-Gm-Gg: ASbGnctB8Z6Gk4/LDTJaMAJqnG9ZY8hF4E4YEtlnYYPjNoa6euXj4zaKZFn96asYLqC
	N+TC+Skz46p53T1au7zu2sYr/rrMOVi5d14zucybx4LLeGx0bTZRL6bwrII/oEPkaSMdWGdXcOu
	EULU4GAKPrR5kflZN/2qxbGV3K7OJoI6lSgnN7rL20sNIfcnRBPsLuDFZ1k+rK2MhrMnvCJ36L+
	mDn/fzI2AbBdDoz2U5TeyNzq4+64mf/suM1s5RhGIuItq+kER0pW6qvhrC5Z5ZrQA==
X-Google-Smtp-Source: 
 AGHT+IE7yhM6n2qT4bX3AO5rYQwaGYYd/DtDCrDHq1aDPS9UWpDNflA38svKPTs5KwZ6j4hcosRHxA==
X-Received: by 2002:a05:600c:cc7:b0:434:f4fa:83c4 with SMTP id
 5b1f17b1804b1-4362aaa65a2mr167340045e9.29.1734451659707;
        Tue, 17 Dec 2024 08:07:39 -0800 (PST)
Received: from vingu-cube.. ([2a01:e0a:f:6020:4e5f:e8c8:aade:2d1b])
        by smtp.gmail.com with ESMTPSA id
 5b1f17b1804b1-436257176a4sm176739435e9.38.2024.12.17.08.07.37
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 17 Dec 2024 08:07:39 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 6/7 v2] sched/fair: Add misfit case to push task callback for
 EAS
Date: Tue, 17 Dec 2024 17:07:19 +0100
Message-ID: <20241217160720.2397239-7-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Some task misfit cases can be handled directly by the push callback
instead of triggering an idle load balance to pull the task to a better
CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
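Note, not part of the commit: the order of the checks in the patched
task_misfit_cpu() can be modelled in user space. This is a minimal
sketch under assumptions, not the kernel implementation. struct
misfit_model, min_ul(), max_ul() and the model_* functions are
hypothetical names, each field stands in for the kernel helper named in
its comment, and the capacity values are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs; these are not kernel structures. */
struct misfit_model {
	int nr_cpus_allowed;            /* p->nr_cpus_allowed */
	unsigned long util_est;         /* task_util_est(p) */
	unsigned long runnable;         /* task_runnable(p) */
	unsigned long uclamp_max;       /* uclamp_eff_value(p, UCLAMP_MAX) */
	unsigned long actual_capa;      /* get_actual_cpu_capacity(cpu) */
	unsigned long arch_capa;        /* arch_scale_cpu_capacity(cpu) */
	unsigned long max_allowed_capa; /* p->max_allowed_capacity */
	bool fits_cpu;                  /* task_fits_cpu(p, cpu) */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Mirrors the order of the checks in the patched task_misfit_cpu(). */
bool model_task_misfit_cpu(const struct misfit_model *m)
{
	unsigned long max_capa, util;

	/* A task pinned to a single CPU can never be pushed elsewhere. */
	if (m->nr_cpus_allowed == 1)
		return false;

	max_capa = min_ul(m->actual_capa, m->uclamp_max);
	util = max_ul(m->util_est, m->runnable);

	/* Case 1: the task needs more than this CPU can deliver. */
	if (util > max_capa)
		return true;

	/* Case 2: a bigger CPU is allowed and the task no longer fits. */
	if (m->arch_capa < m->max_allowed_capa && !m->fits_cpu)
		return true;

	return false;
}

/* Small self-check with made-up capacity values. */
void model_task_misfit_demo(void)
{
	struct misfit_model m = { 4, 300, 250, 1024, 400, 446, 1024, true };

	assert(!model_task_misfit_cpu(&m));	/* fits on the little CPU */
	m.fits_cpu = false;
	assert(model_task_misfit_cpu(&m));	/* case 2 */
	m.nr_cpus_allowed = 1;
	assert(!model_task_misfit_cpu(&m));	/* pinned: never misfit */
}
```

Moving the nr_cpus_allowed test into the predicate lets both callers,
the tick path and enqueue_pushable_task(), share the pinned-task
filter instead of repeating it.
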
 kernel/sched/fair.c | 53 +++++++++++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2affc063da55..9bddb094ee21 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8541,6 +8541,8 @@ static int find_energy_efficient_cpu(struct task_stru=
ct *p, int prev_cpu)
 			target_stat.runnable =3D cpu_runnable(cpu_rq(cpu));
 			target_stat.capa =3D capacity_of(cpu);
 			target_stat.nr_running =3D cpu_rq(cpu)->cfs.h_nr_runnable;
+			if ((p->on_rq) && (!p->se.sched_delayed) && (cpu =3D=3D prev_cpu))
+				target_stat.nr_running--;
=20
 			/* If the target needs a lower OPP, then look up for
 			 * the corresponding OPP and its associated cost.
@@ -8623,48 +8625,58 @@ static int find_energy_efficient_cpu(struct task_st=
ruct *p, int prev_cpu)
=20
 static inline bool task_misfit_cpu(struct task_struct *p, int cpu)
 {
-	unsigned long max_capa =3D get_actual_cpu_capacity(cpu);
-	unsigned long util =3D task_util_est(p);
+	unsigned long max_capa, util;
+
+	if (p->nr_cpus_allowed =3D=3D 1)
+		return false;
=20
-	max_capa =3D min(max_capa, uclamp_eff_value(p, UCLAMP_MAX));
-	util =3D max(util, task_runnable(p));
+	max_capa =3D min(get_actual_cpu_capacity(cpu),
+		       uclamp_eff_value(p, UCLAMP_MAX));
+	util =3D max(task_util_est(p), task_runnable(p));
=20
 	/*
 	 * Return true only if the task might not sleep/wakeup because of a low
 	 * compute capacity. Tasks, which wake up regularly, will be handled by
 	 * feec().
 	 */
-	return (util > max_capa);
+	if (util > max_capa)
+		return true;
+
+	/* Return true if the task no longer fits on the CPU */
+	if ((arch_scale_cpu_capacity(cpu) < p->max_allowed_capacity) && !task_fit=
s_cpu(p, cpu))
+		return true;
+
+	return false;
 }
=20
 static int active_load_balance_cpu_stop(void *data);
=20
-static inline void migrate_misfit_task(struct task_struct *p, struct rq *r=
q)
+static inline bool migrate_misfit_task(struct task_struct *p, struct rq *r=
q)
 {
 	int new_cpu, cpu =3D cpu_of(rq);
=20
 	if (!sched_energy_enabled() || is_rd_overutilized(rq->rd))
-		return;
+		return false;
=20
 	if (WARN_ON(!p))
-		return;
+		return false;
=20
-	if (WARN_ON(p !=3D rq->curr))
-		return;
+	if (WARN_ON(!task_current(rq, p)))
+		return false;
=20
 	if (is_migration_disabled(p))
-		return;
+		return false;
=20
-	if ((rq->nr_running > 1) || (p->nr_cpus_allowed =3D=3D 1))
-		return;
+	if (rq->nr_running > 1)
+		return false;
=20
 	if (!task_misfit_cpu(p, cpu))
-		return;
+		return false;
=20
 	new_cpu =3D find_energy_efficient_cpu(p, cpu);
=20
 	if (new_cpu =3D=3D cpu)
-		return;
+		return false;
=20
 	/*
 	 * ->active_balance synchronizes accesses to
@@ -8675,13 +8687,15 @@ static inline void migrate_misfit_task(struct task_=
struct *p, struct rq *rq)
 		rq->active_balance =3D 1;
 		rq->push_cpu =3D new_cpu;
 	} else
-		return;
+		return false;
=20
 	raw_spin_rq_unlock(rq);
 	stop_one_cpu_nowait(cpu,
 		active_load_balance_cpu_stop, rq,
 		&rq->active_balance_work);
 	raw_spin_rq_lock(rq);
+
+	return true;
 }
=20
 static inline int has_pushable_tasks(struct rq *rq)
@@ -13299,9 +13313,10 @@ static void task_tick_fair(struct rq *rq, struct t=
ask_struct *curr, int queued)
 	if (static_branch_unlikely(&sched_numa_balancing))
 		task_tick_numa(rq, curr);
=20
-	migrate_misfit_task(curr, rq);
-	update_misfit_status(curr, rq);
-	check_update_overutilized_status(task_rq(curr));
+	if (!migrate_misfit_task(curr, rq)) {
+		update_misfit_status(curr, rq);
+		check_update_overutilized_status(task_rq(curr));
+	}
=20
 	task_tick_core(rq, curr);
 }
--=20
2.43.0
From nobody Mon Feb  9 01:56:08 2026
Received: from mail-wm1-f48.google.com (mail-wm1-f48.google.com
 [209.85.128.48])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id E23FE1F9428
	for <linux-kernel@vger.kernel.org>; Tue, 17 Dec 2024 16:07:42 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=209.85.128.48
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1734451664; cv=none;
 b=svG/b0XVaAXtsQtFzKRG4/iN+srAAJJullF8gAzSF1jCha8drDndqP8NERiwcFeD6EjaaU+COwfk10HeoDmZNcQIZyYi0DvsRjlq10pQsQn9oC4+wuHkVfuLlusOY208eeEVRkiwNUWKdnECN9w0yy1gJTSP8a1uFkjeQspdG1w=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1734451664; c=relaxed/simple;
	bh=ahlxghrgd0S6AbzHWBxPPhV3sFgWCGSBYKhL5Ip9ApA=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version;
 b=ddzGsB0+1kkNS2tthAoGE9Ls5XvN5qHgQr/h+PFgAHa3UA8FpEAOQUcnyCpMoHjDgGoD+lBsKPc1AJUxN3gzT1Kb4OWZ8bBhOJYw6M/fc+w8InriGV/BWrIdUcOaPUnbJ34wDgFhV6FWWuf6M2fELTg1IJbJTtq5DKzuHDJNOFk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org;
 spf=pass smtp.mailfrom=linaro.org;
 dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b=TvzS/Oym; arc=none smtp.client-ip=209.85.128.48
Authentication-Results: smtp.subspace.kernel.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
 spf=pass smtp.mailfrom=linaro.org
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=linaro.org header.i=@linaro.org
 header.b="TvzS/Oym"
Received: by mail-wm1-f48.google.com with SMTP id
 5b1f17b1804b1-436341f575fso42514365e9.1
        for <linux-kernel@vger.kernel.org>;
 Tue, 17 Dec 2024 08:07:42 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=linaro.org; s=google; t=1734451661; x=1735056461;
 darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=25XBrM0NJ6pePI9T2sJJh5JraqVj1fYsT/JFIPhqnSU=;
        b=TvzS/Oymv9LgVjmZmSp/INsrmwIy4JjMDy7fukwVLDuajFlvDy6O1KDEHYLUJ6TmMV
         McVt+BLmJTNb6cq+lFPVSwXDt5oJVkmAyXYR3Cvp5ZT0QIXPq77K8GG7e1WGJAIe+drq
         /8fp7vGhzQFgujLBQp+Amh7OXJoNVmnhlA/OOZBqUHP6+Y3bK/ZvE5PIgO9Tnknjqo2S
         oZHT4MbyUjf1T2QC9EupmGvKjvIRbBbppRkDlSpsLkVpX9AtPvNQeipAqHky9EPZ5Q8d
         w7SPynzPDp9NOmLx48BaUiIBh9t4UVnlWEYNjOVIx67iWnP2IJ0+m8zNLp1REFTLFi+Q
         S0Hw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1734451661; x=1735056461;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=25XBrM0NJ6pePI9T2sJJh5JraqVj1fYsT/JFIPhqnSU=;
        b=NNTrJYZ8XUieOARCLawzp/B9jzUWluGPLHvLPiWAZa4y5G3/wsWVgHp5uGGnWcZBr9
         bAcGK+Pwpt9R2jyiwNfiRph1GUIWF+Owvr5C2J/BOnrjXXqu6YmD2jpfVKQx+bDzuQ5F
         vUk4bDZFRfPCXWEmevccFEa8gGh8A8rFrgQW/Oja8xM4F+PuINyQwuOOKQIac58Qd43M
         YUj5W86MYcE4HS3soiai0o93P0vttUiQjQP59vEBiBJx15gCWWLKbZYUK9kNmbxpFqO7
         Nkk6WEZQK9TpuPlJUl3/dCSqdYwEKtHqDAv5GXHLMimFXWZ/xV+GyO13Pye052GqS8Vl
         YlKA==
X-Forwarded-Encrypted: i=1;
 AJvYcCXpBPXhCyYTnnLi0ROQQoazCCCh76gN0r5/VVNXJ8Y/H2uL1KdRN+KT65rbG8nZBx2U8cb0O/YvwFXt2Ik=@vger.kernel.org
X-Gm-Message-State: AOJu0YwzH+itfF/GbAuZia1XA+/HAhCm+xOqGy5eKfYRM9obkb+Q/ogx
	OOmDUN1/FwhsmnzY57r4Mta/4wYmPMe9RcfNtZ5yi8PDpxIZHeC9iT9IYmJ0Yeg=
X-Gm-Gg: ASbGncuAZ5xBGHujmvs/L7lXf/btcJgboaSqPkkEQ8Z2ybjPwg3zIhkdX2DSXlYf6cu
	q9UeBwsE0wGc0T8Nkr5en4pgF7NoQVjyhB1rfX0x7T3Cua3ac0oY+f64LeXQlEKMuZnlpDBZRiN
	CiAQBbL+quG3Ev47pha9SU9jj0xeP+ns9+Vs/OYG/PwWNivEwIOhR6kPN8dhxmZP00Fd8Ia2eYR
	/MjP9AFEV7iHFlWKZhOKo/mzckbM6QXw2REsKcOwSF6oDBZc1yO0fo52R6bo5FYIw==
X-Google-Smtp-Source: 
 AGHT+IGUdRAlaDf5RiyF6d3FjeSmlk/HeVhnRFJ2Zl4R4rz0gubeAGYhivS1TKe43q5VHpNM6Rcm5A==
X-Received: by 2002:a05:600c:1e19:b0:434:a734:d279 with SMTP id
 5b1f17b1804b1-4362aa5005fmr191362625e9.16.1734451661133;
        Tue, 17 Dec 2024 08:07:41 -0800 (PST)
Received: from vingu-cube.. ([2a01:e0a:f:6020:4e5f:e8c8:aade:2d1b])
        by smtp.gmail.com with ESMTPSA id
 5b1f17b1804b1-436257176a4sm176739435e9.38.2024.12.17.08.07.39
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 17 Dec 2024 08:07:40 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com,
	peterz@infradead.org,
	juri.lelli@redhat.com,
	dietmar.eggemann@arm.com,
	rostedt@goodmis.org,
	bsegall@google.com,
	mgorman@suse.de,
	vschneid@redhat.com,
	lukasz.luba@arm.com,
	rafael.j.wysocki@intel.com,
	linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io,
	hongyan.xia2@arm.com,
	pierre.gondois@arm.com,
	christian.loehle@arm.com,
	qperret@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH 7/7 v2] sched/fair: Update overutilized detection
Date: Tue, 17 Dec 2024 17:07:20 +0100
Message-ID: <20241217160720.2397239-8-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20241217160720.2397239-1-vincent.guittot@linaro.org>
References: <20241217160720.2397239-1-vincent.guittot@linaro.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Checking uclamp_min is useless and counterproductive for the overutilized
state, as misfit can now happen without being in the overutilized state.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
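Note, not part of the commit: a simplified user-space model of why the
uclamp_min argument does not change the boolean returned by
cpu_overutilized(). It assumes the three-valued convention of
util_fits_cpu() (1 fits, 0 does not fit, -1 fits but uclamp_min cannot
be met) and leaves out the uclamp_max handling. All model_* names are
hypothetical and the numbers are made up.

```c
#include <assert.h>
#include <stdbool.h>

/* Same shape as the kernel's fits_capacity(): keep ~20% headroom. */
static bool model_fits_capacity(unsigned long util, unsigned long capa)
{
	return util * 1280 < capa * 1024;
}

/*
 * Hypothetical three-valued fit check: 1 fits, 0 does not fit,
 * -1 fits but the uclamp_min request cannot be honoured.
 * The uclamp_max handling of the real helper is left out on purpose.
 */
static int model_util_fits(unsigned long util, unsigned long uclamp_min,
			   unsigned long capa)
{
	bool fits = model_fits_capacity(util, capa);

	if (fits && util < uclamp_min && uclamp_min > capa)
		return -1;

	return fits;
}

/* Overutilized means "does not fit"; both 1 and -1 negate to false. */
bool model_cpu_overutilized(unsigned long util, unsigned long uclamp_min,
			    unsigned long capa)
{
	return !model_util_fits(util, uclamp_min, capa);
}

void model_overutilized_demo(void)
{
	/* Low utilization, unsatisfiable boost: still not overutilized. */
	assert(!model_cpu_overutilized(100, 800, 400));
	assert(!model_cpu_overutilized(100, 0, 400));

	/* Utilization above the headroom: overutilized either way. */
	assert(model_cpu_overutilized(390, 800, 400));
	assert(model_cpu_overutilized(390, 0, 400));
}
```

In this model the uclamp_min value only selects between 1 and -1, so
passing 0 gives the same negated result for every input.
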
 kernel/sched/fair.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9bddb094ee21..9eb4c4946ddc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6870,16 +6870,15 @@ static inline void hrtick_update(struct rq *rq)
 #ifdef CONFIG_SMP
 static inline bool cpu_overutilized(int cpu)
 {
-	unsigned long  rq_util_min, rq_util_max;
+	unsigned long rq_util_max;
=20
 	if (!sched_energy_enabled())
 		return false;
=20
-	rq_util_min =3D uclamp_rq_get(cpu_rq(cpu), UCLAMP_MIN);
 	rq_util_max =3D uclamp_rq_get(cpu_rq(cpu), UCLAMP_MAX);
=20
 	/* Return true only if the utilization doesn't fit CPU's capacity */
-	return !util_fits_cpu(cpu_util_cfs(cpu), rq_util_min, rq_util_max, cpu);
+	return !util_fits_cpu(cpu_util_cfs(cpu), 0, rq_util_max, cpu);
 }
=20
 /*
--=20
2.43.0