From: Chen Jinghuang <chenjinghuang2@huawei.com>
Subject: [RFC PATCH v5 4/9] sched/fair: Dynamically update cfs_overload_cpus
Date: Fri, 20 Mar 2026 05:59:15 +0000
Message-ID: <20260320055920.2518389-5-chenjinghuang2@huawei.com>
In-Reply-To: <20260320055920.2518389-1-chenjinghuang2@huawei.com>
References: <20260320055920.2518389-1-chenjinghuang2@huawei.com>
X-Mailer: git-send-email 2.34.1
X-Mailing-List: linux-kernel@vger.kernel.org

From: Steve Sistare

An overloaded CPU has more than 1 runnable task.  When a CFS task wakes
on a CPU, if h_nr_runnable transitions from 1 to more, then set the CPU
in the cfs_overload_cpus bitmap.  When a CFS task sleeps, if
h_nr_runnable transitions from 2 to less, then clear the CPU in
cfs_overload_cpus.

Signed-off-by: Steve Sistare
Signed-off-by: Chen Jinghuang
---
v5: Rename h_nr_running to h_nr_runnable and reposition
    overload_set/overload_clear to fix overload detection for delay
    dequeue.
v4: Detect CPU overload via changes in h_nr_running.
---
 kernel/sched/fair.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eea99ec01a3f..92c3bcff5b6b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -55,6 +55,7 @@
 #include

 #include "sched.h"
+#include "sparsemask.h"
 #include "stats.h"
 #include "autogroup.h"

@@ -5076,6 +5077,33 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
 	rq->misfit_task_load = max_t(unsigned long, task_h_load(p), 1);
 }

+#ifdef CONFIG_SMP
+static void overload_clear(struct rq *rq)
+{
+	struct sparsemask *overload_cpus;
+
+	rcu_read_lock();
+	overload_cpus = rcu_dereference(rq->cfs_overload_cpus);
+	if (overload_cpus)
+		sparsemask_clear_elem(overload_cpus, rq->cpu);
+	rcu_read_unlock();
+}
+
+static void overload_set(struct rq *rq)
+{
+	struct sparsemask *overload_cpus;
+
+	rcu_read_lock();
+	overload_cpus = rcu_dereference(rq->cfs_overload_cpus);
+	if (overload_cpus)
+		sparsemask_set_elem(overload_cpus, rq->cpu);
+	rcu_read_unlock();
+}
+#else /* CONFIG_SMP */
+static inline void overload_clear(struct rq *rq) {}
+static inline void overload_set(struct rq *rq) {}
+#endif
+
 void __setparam_fair(struct task_struct *p, const struct sched_attr *attr)
 {
 	struct sched_entity *se = &p->se;
@@ -5955,6 +5983,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	if (!dequeue)
 		return false;  /* Throttle no longer required. */

+	/* freeze hierarchy runnable averages while throttled */
 	rcu_read_lock();
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
@@ -6875,6 +6904,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	int h_nr_idle = task_has_idle_policy(p);
 	int h_nr_runnable = 1;
 	int task_new = !(flags & ENQUEUE_WAKEUP);
+	unsigned int prev_nr = rq->cfs.h_nr_runnable;
 	int rq_h_nr_queued = rq->cfs.h_nr_queued;
 	u64 slice = 0;

@@ -6892,6 +6922,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)

 	if (flags & ENQUEUE_DELAYED) {
 		requeue_delayed_entity(se);
+
+		if (prev_nr <= 1 && rq->cfs.h_nr_runnable >= 2)
+			overload_set(rq);
+
 		return;
 	}

@@ -6961,6 +6995,8 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)

 	/* At this point se is NULL and we are at root level*/
 	add_nr_running(rq, 1);
+	if (prev_nr <= 1 && rq->cfs.h_nr_runnable >= 2)
+		overload_set(rq);

 	/*
 	 * Since new tasks are assigned an initial util_avg equal to
@@ -7003,6 +7039,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	int h_nr_idle = 0;
 	int h_nr_queued = 0;
 	int h_nr_runnable = 0;
+	unsigned int prev_nr = rq->cfs.h_nr_runnable;
 	struct cfs_rq *cfs_rq;
 	u64 slice = 0;

@@ -7018,8 +7055,12 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		cfs_rq = cfs_rq_of(se);

 		if (!dequeue_entity(cfs_rq, se, flags)) {
-			if (p && &p->se == se)
+			if (p && &p->se == se) {
+				if (prev_nr >= 2 && rq->cfs.h_nr_runnable <= 1)
+					overload_clear(rq);
+
 				return -1;
+			}

 			slice = cfs_rq_min_slice(cfs_rq);
 			break;
@@ -7077,6 +7118,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	}

 	sub_nr_running(rq, h_nr_queued);
+	if (prev_nr >= 2 && rq->cfs.h_nr_runnable <= 1)
+		overload_clear(rq);

 	/* balance early to pull high priority tasks */
 	if (unlikely(!was_sched_idle && sched_idle_rq(rq)))
-- 
2.34.1