From: "liukai (Y)"
To: "mingo@redhat.com", "peterz@infradead.org", "juri.lelli@redhat.com", "vincent.guittot@linaro.org"
CC: "linux-kernel@vger.kernel.org", "tanghui (C)", "Zhangqiao (2012 lab)", "Chenhui (Judy)", "'weiliang.qwl@antgroup.com'", "'henry.hj@antgroup.com'", "'yanyan.yan@antgroup.com'", "'libang.li@antgroup.com'", "liwei (JK)"
Subject: Trade-off between load_balance frequency and CPU utilization under high load
Date: Thu, 26 Dec 2024 02:09:33 +0000
In-Reply-To: <0aab22639ee0476a9a942bc4b06ebbce@huawei.com>

In our performance experiments, we gradually increased the CPU load and
observed that under high load, the CPU utilization (node CPU) on kernel
6.6 is up to 4% higher than on 4.19. Flame graph data shows that the
total execution time of load_balance on kernel 6.6 is 18% longer than
on 4.19.
Benchmark: specjbb // kernel 6.6

index  target QPS  actual QPS    RT  pod CPU  node CPU
1           60000       60004  0.73     5.06     14.97
2          120000      120074  0.79    10.22     29.67
3          180000      180866  0.87    16.00     45.91
4          240000      240091  0.92    21.69     62.94

Benchmark: specjbb // kernel 4.19

index  target QPS  actual QPS    RT  pod CPU  node CPU
1           60000       60004  0.72     4.86     14.81
2          120000      120074  0.79     9.69     29.52
3          180000      180870  0.83    14.57     42.72
4          240000      240074  0.90    19.55     58.59

We found that on kernel 6.6 a single execution of load_balance is less
costly, so sd->max_newidle_lb_cost stays small. Even under high load,
the skip condition this_rq->avg_idle < sd->max_newidle_lb_cost is
therefore rarely satisfied and the early exit below is not taken. As a
result, compared to kernel 4.19, load_balance is executed more
frequently on 6.6, leading to higher CPU utilization.

	if (!READ_ONCE(this_rq->rd->overload) ||
	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {

		if (sd)
			update_next_balance(sd, &next_balance);
		rcu_read_unlock();

		goto out;
	}

We identified that this behavior was introduced by this patch:
https://lore.kernel.org/all/20211021095219.GG3891@suse.de/

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10895,8 +10895,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)

 	rcu_read_lock();
 	sd = rcu_dereference_check_sched_domain(this_rq->sd);

-	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
-	    !READ_ONCE(this_rq->rd->overload) ||
+	if (!READ_ONCE(this_rq->rd->overload) ||
 	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
 		if (sd)

The removed check this_rq->avg_idle < sysctl_sched_migration_cost used
to reduce the frequency of load_balance under high load. Is there any
way to dynamically adjust the execution of load_balance in high-load
scenarios, in order to strike a balance between maintaining good CPU
utilization and avoiding unnecessary load_balance executions?