From: Hongyan Xia
To: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Juri Lelli, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider
Cc: Morten Rasmussen, Lukasz Luba, Christian Loehle, Pierre Gondois, linux-kernel@vger.kernel.org
Subject: [PATCH v2 8/8] sched/uclamp: Solve under-utilization problem
Date: Tue, 4 Mar 2025 14:23:15 +0000
Message-Id: <94048802c665752e92d1d354fdc38dd95ffe4a03.1741091349.git.hongyan.xia2@arm.com>

With sum aggregation, a task heavily throttled by uclamp_max may throttle
the whole rq, resulting in a low OPP. For example, consider two
always-running tasks with the same priority. One task has no uclamp
values, but the other has a uclamp_max of 1. Under sum aggregation, the
CPU will run at an OPP of 512 + 1 = 513 (the unclamped task's utilization
plus the other task's uclamp_max), which means the task without
uclamp_max only gets 513 / 2 = 256 utilization, even though the CPU
could still run faster.

With this patch, a uclamp_max task is no longer throttled so hard that
it impacts other tasks. This is done by tracking the highest
uclamp_factor on the rq; no uclamp_max task can be throttled below what
this factor allows.

Signed-off-by: Hongyan Xia
---
 kernel/sched/fair.c  | 12 ++++++++++++
 kernel/sched/pelt.c  | 33 +++++++++++++++++++++++++++++----
 kernel/sched/sched.h |  2 ++
 3 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 944953b90297..966ca63da3fa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7159,6 +7159,18 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	if (p) {
 		util_bias_dequeue(rq, p);
 		propagate_negative_bias(p);
+		if (p->pid == rq->max_uclamp_factor_pid) {
+			/*
+			 * If the task with the highest uclamp_factor gets
+			 * dequeued, the correct thing to do is to set pid and
+			 * factor to the second highest. However, the overhead
+			 * isn't really necessary because the second highest
+			 * will set these fields the next time it gets updated
+			 * anyway.
+			 */
+			rq->max_uclamp_factor_pid = -1;
+			rq->max_uclamp_factor = 0;
+		}
 	}
 
 	if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index f38abe6f0b8b..e96ca045af2e 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -271,8 +271,8 @@ ___update_load_avg(struct sched_avg *sa, unsigned long load)
 static void util_bias_update(struct task_struct *p)
 {
 	unsigned int util, uclamp_min, uclamp_max;
-	struct rq *rq;
-	int old, new;
+	struct rq *rq = task_rq(p);
+	int old, new, clamped_util, prio = p->prio - MAX_RT_PRIO;
 
 	util = READ_ONCE(p->se.avg.util_avg);
 	uclamp_min = uclamp_eff_value(p, UCLAMP_MIN);
@@ -284,12 +284,37 @@ static void util_bias_update(struct task_struct *p)
 	if (uclamp_max == SCHED_CAPACITY_SCALE)
 		uclamp_max = UINT_MAX;
 	old = READ_ONCE(p->se.avg.util_avg_bias);
-	new = (int)clamp(util, uclamp_min, uclamp_max) - (int)util;
+	clamped_util = (int)clamp(util, uclamp_min, uclamp_max);
+	if (p->se.on_rq && prio >= 0) {
+		/* We only do this for fair class priorities. */
+		u64 uclamp_factor = sched_prio_to_wmult[prio];
+
+		/* This has to be a 64-bit multiplication. */
+		uclamp_factor *= clamped_util;
+		if (rq->max_uclamp_factor_pid == p->pid) {
+			rq->max_uclamp_factor = uclamp_factor;
+		} else if (uclamp_factor > rq->max_uclamp_factor) {
+			rq->max_uclamp_factor = uclamp_factor;
+			rq->max_uclamp_factor_pid = p->pid;
+		} else {
+			u32 weight = sched_prio_to_weight[prio];
+
+			/*
+			 * We cannot throttle too much if some other task is
+			 * running at high utilization. We should prioritize
+			 * giving that task enough utilization and respect
+			 * task priority, before enforcing uclamp_max.
+			 */
+			uclamp_max = max(uclamp_max,
+				(rq->max_uclamp_factor * weight) >> 32);
+			clamped_util = (int)clamp(util, uclamp_min, uclamp_max);
+		}
+	}
+	new = clamped_util - (int)util;
 
 	WRITE_ONCE(p->se.avg.util_avg_bias, new);
 	if (!p->se.on_rq)
 		return;
-	rq = task_rq(p);
 	WRITE_ONCE(rq->cfs.avg.util_avg_bias,
 		   READ_ONCE(rq->cfs.avg.util_avg_bias) + new - old);
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 654eede62979..0dc90208ad73 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1086,6 +1086,8 @@ struct rq {
 	u64 nr_switches;
 
 #ifdef CONFIG_UCLAMP_TASK
+	u64 max_uclamp_factor;
+	pid_t max_uclamp_factor_pid;
 #endif
 
 	struct cfs_rq cfs;
-- 
2.34.1