From nobody Sat Feb 7 21:24:31 2026
From: Shrikanth Hegde
To: mingo@kernel.org, peterz@infradead.org, vincent.guittot@linaro.org,
	linux-kernel@vger.kernel.org
Cc: sshegde@linux.ibm.com, kprateek.nayak@amd.com, juri.lelli@redhat.com,
	vschneid@redhat.com, tglx@kernel.org, dietmar.eggemann@arm.com,
	anna-maria@linutronix.de, frederic@kernel.org, wangyang.guo@intel.com
Subject: [PATCH v5 1/3] sched/fair: Move checking for nohz cpus after time check
Date: Thu, 15 Jan 2026 13:05:22 +0530
Message-ID: <20260115073524.376643-2-sshegde@linux.ibm.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260115073524.376643-1-sshegde@linux.ibm.com>
References: <20260115073524.376643-1-sshegde@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The current code does the following:
- Read nohz.nr_cpus
- Check whether the time has passed to do NOHZ idle balance

Instead, do this:
- Check whether the time has passed to do NOHZ idle balance
- Read nohz.nr_cpus

This skips the read most of the time in normal system usage, i.e. when
nohz.nr_cpus is non-zero (the system is not 100% busy).

Note that when there are no idle CPUs (100% busy), even if the flags get
set to NOHZ_STATS_KICK | NOHZ_NEXT_KICK, find_new_ilb() will fail and
there will be no NOHZ idle balance. In such cases there is a very narrow
window in which kick_ilb() is called unnecessarily. However, the current
functionality is still retained.

Note: This patch doesn't address any cacheline overheads. There is no
performance improvement apart from saving the few cycles spent reading
nohz.nr_cpus.

Reviewed-and-tested-by: K Prateek Nayak
Signed-off-by: Shrikanth Hegde
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c921cdc6c3ed..a4910658c5d6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12445,20 +12445,29 @@ static void nohz_balancer_kick(struct rq *rq)
 	 */
 	nohz_balance_exit_idle(rq);
 
-	/*
-	 * None are in tickless mode and hence no need for NOHZ idle load
-	 * balancing:
-	 */
-	if (likely(!atomic_read(&nohz.nr_cpus)))
-		return;
-
 	if (READ_ONCE(nohz.has_blocked_load) &&
 	    time_after(now, READ_ONCE(nohz.next_blocked)))
 		flags = NOHZ_STATS_KICK;
 
+	/*
+	 * Most of the time system is not 100% busy. i.e nohz.nr_cpus > 0
+	 * Skip the read if time is not due.
+	 *
+	 * If none are in tickless mode, there maybe a narrow window
+	 * (28 jiffies, HZ=1000) where flags maybe set and kick_ilb called.
+	 * But idle load balancing is not done as find_new_ilb fails.
+	 * That's very rare. So read nohz.nr_cpus only if time is due.
+	 */
 	if (time_before(now, nohz.next_balance))
 		goto out;
 
+	/*
+	 * None are in tickless mode and hence no need for NOHZ idle load
+	 * balancing:
+	 */
+	if (likely(!atomic_read(&nohz.nr_cpus)))
+		return;
+
 	if (rq->nr_running >= 2) {
 		flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 		goto out;
-- 
2.51.0

From nobody Sat Feb 7 21:24:31 2026
From: Shrikanth Hegde
To: mingo@kernel.org, peterz@infradead.org, vincent.guittot@linaro.org,
	linux-kernel@vger.kernel.org
Cc: sshegde@linux.ibm.com, kprateek.nayak@amd.com, juri.lelli@redhat.com,
	vschneid@redhat.com, tglx@kernel.org, dietmar.eggemann@arm.com,
	anna-maria@linutronix.de, frederic@kernel.org, wangyang.guo@intel.com
Subject: [PATCH v5 2/3] sched/fair: Change likelyhood of nohz.nr_cpus
Date: Thu, 15 Jan 2026 13:05:23 +0530
Message-ID: <20260115073524.376643-3-sshegde@linux.ibm.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260115073524.376643-1-sshegde@linux.ibm.com>
References: <20260115073524.376643-1-sshegde@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

These days most systems have multiple cores, so the likelihood of at
least one CPU being in nohz (idle) state is high. Give an accurate hint
to the branch predictor.

Reviewed-and-tested-by: K Prateek Nayak
Signed-off-by: Shrikanth Hegde
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a4910658c5d6..3d843d1396ec 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12463,9 +12463,9 @@ static void nohz_balancer_kick(struct rq *rq)
 
 	/*
 	 * None are in tickless mode and hence no need for NOHZ idle load
-	 * balancing:
+	 * balancing
 	 */
-	if (likely(!atomic_read(&nohz.nr_cpus)))
+	if (unlikely(!atomic_read(&nohz.nr_cpus)))
 		return;
 
 	if (rq->nr_running >= 2) {
-- 
2.51.0

From nobody Sat Feb 7 21:24:31 2026
From: Shrikanth Hegde
To: mingo@kernel.org, peterz@infradead.org, vincent.guittot@linaro.org,
	linux-kernel@vger.kernel.org
Cc: sshegde@linux.ibm.com, kprateek.nayak@amd.com, juri.lelli@redhat.com,
	vschneid@redhat.com, tglx@kernel.org, dietmar.eggemann@arm.com,
	anna-maria@linutronix.de, frederic@kernel.org, wangyang.guo@intel.com
Subject: [PATCH v5 3/3] sched/fair: Remove nohz.nr_cpus and use weight of cpumask instead
Date: Thu, 15 Jan 2026 13:05:24 +0530
Message-ID: <20260115073524.376643-4-sshegde@linux.ibm.com>
X-Mailer: git-send-email 2.51.0
In-Reply-To: <20260115073524.376643-1-sshegde@linux.ibm.com>
References: <20260115073524.376643-1-sshegde@linux.ibm.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

nohz.nr_cpus was observed to be a contended cacheline when running an
enterprise workload on large systems.

The fundamental scalability challenge with nohz.idle_cpus_mask and
nohz.nr_cpus is the following:

 (1) nohz_balancer_kick() observes (reads) nohz.nr_cpus
     (or nohz.idle_cpus_mask) and nohz.has_blocked_load to see whether
     there's any nohz balancing work to do, in every scheduler tick.

 (2) nohz_balance_enter_idle() and nohz_balance_exit_idle()
     (through nohz_balancer_kick() via sched_tick()) modify (write)
     nohz.nr_cpus (and/or nohz.idle_cpus_mask) and nohz.has_blocked_load.

The characteristic frequencies are the following:

 (1) nohz_balancer_kick() happens at scheduler (busy) tick frequency
     on every CPU that has not gone idle. This is a relatively constant
     frequency in the ~1 kHz range or lower.

 (2) happens at idle enter/exit frequency on every CPU that goes to
     idle. This is workload dependent, but can easily be hundreds of kHz
     for IO-bound loads and high CPU counts, i.e. it can be orders of
     magnitude higher than (1), in which case a cache miss at every
     invocation of (1) is almost inevitable. Idle exit will trigger (1)
     on the CPU which is coming out of idle.

There are two types of costs from these functions:

 (A) scheduler tick cost via (1): this happens on busy CPUs too, and is
     thus a primary scalability cost. But the rate here is constant and
     typically much lower than (B), hence the absolute benefit to
     workload scalability will be lower as well.

 (B) idle cost via (2): going-to-idle and coming-from-idle costs are
     secondary concerns, because they impact power efficiency more than
     they impact scalability. But in terms of absolute cost this scales
     up with nr_cpus as well, and at a much faster rate, and thus may
     also approach and negatively impact system limits like memory
     bus/fabric bandwidth.

Note that nohz.idle_cpus_mask and nohz.nr_cpus may appear to reside in
the same cacheline; however, under CONFIG_CPUMASK_OFFSTACK=y the backing
storage for nohz.idle_cpus_mask will be elsewhere. With
CONFIG_CPUMASK_OFFSTACK=n, nohz.idle_cpus_mask and the rest of the nohz
fields are in different cachelines under typical NR_CPUS=512/2048. This
implies two separate cachelines being dirtied upon idle entry/exit.

nohz.nr_cpus can be derived from the mask itself, and its usage doesn't
require an exact value. Removing it means one less cacheline being
dirtied in the idle entry/exit path, which helps to save some bus
bandwidth w.r.t. those nohz functions (approx. 50%). This in turn helps
to improve enterprise workload throughput.

On a system with 480 CPUs, running "hackbench 40 process 10000 loops"
(avg of 3 runs):

baseline:
     0.81%  hackbench  [k] nohz_balance_exit_idle
     0.21%  hackbench  [k] nohz_balancer_kick
     0.09%  swapper    [k] nohz_run_idle_balance

With patch:
     0.35%  hackbench  [k] nohz_balance_exit_idle
     0.09%  hackbench  [k] nohz_balancer_kick
     0.07%  swapper    [k] nohz_run_idle_balance

[Ingo Molnar: scalability analysis changelog]
Reviewed-by: Valentin Schneider
Reviewed-and-tested-by: K Prateek Nayak
Signed-off-by: Shrikanth Hegde
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3d843d1396ec..46ed16466be4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7141,7 +7141,6 @@ static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
 
 static struct {
 	cpumask_var_t idle_cpus_mask;
-	atomic_t nr_cpus;
 	int has_blocked_load;		/* Idle CPUS has blocked load */
 	int needs_update;		/* Newly idle CPUs need their next_balance collated */
 	unsigned long next_balance;     /* in jiffy units */
@@ -12465,7 +12464,7 @@ static void nohz_balancer_kick(struct rq *rq)
 	 * None are in tickless mode and hence no need for NOHZ idle load
 	 * balancing
 	 */
-	if (unlikely(!atomic_read(&nohz.nr_cpus)))
+	if (unlikely(cpumask_empty(nohz.idle_cpus_mask)))
 		return;
 
 	if (rq->nr_running >= 2) {
@@ -12578,7 +12577,6 @@ void nohz_balance_exit_idle(struct rq *rq)
 
 	rq->nohz_tick_stopped = 0;
 	cpumask_clear_cpu(rq->cpu, nohz.idle_cpus_mask);
-	atomic_dec(&nohz.nr_cpus);
 
 	set_cpu_sd_state_busy(rq->cpu);
 }
@@ -12636,7 +12634,6 @@ void nohz_balance_enter_idle(int cpu)
 	rq->nohz_tick_stopped = 1;
 
 	cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
-	atomic_inc(&nohz.nr_cpus);
 
 	/*
 	 * Ensures that if nohz_idle_balance() fails to observe our
-- 
2.51.0
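
Illustration (not part of the series): the idea behind patch 3/3 can be
modelled in userspace. The following is only a minimal sketch under
stated assumptions, not the kernel implementation: MODEL_NR_CPUS,
idle_mask and the model_*() helpers are hypothetical names, and C11
atomics stand in for the kernel's cpumask helpers. It shows that the
"is any CPU idle?" answer can be derived from the mask alone, so idle
entry/exit writes to one shared location instead of two.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 128	/* hypothetical CPU count for the model */
#define BITS_PER_WORD (8 * sizeof(unsigned long))
#define MASK_WORDS ((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Stand-in for nohz.idle_cpus_mask; there is no separate counter. */
static _Atomic unsigned long idle_mask[MASK_WORDS];

/* Idle entry: a single atomic write that sets this CPU's bit. */
static void model_enter_idle(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	atomic_fetch_or(&idle_mask[cpu / BITS_PER_WORD],
			1UL << (cpu % BITS_PER_WORD));
}

/* Idle exit: a single atomic write that clears this CPU's bit. */
static void model_exit_idle(unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	atomic_fetch_and(&idle_mask[cpu / BITS_PER_WORD],
			 ~(1UL << (cpu % BITS_PER_WORD)));
}

/*
 * Tick-side check: derive "any idle CPU?" from the mask itself.
 * The scan is not an atomic snapshot of all words, which is fine
 * for a hint that does not need an exact value.
 */
static bool model_any_idle(void)
{
	for (unsigned int i = 0; i < MASK_WORDS; i++)
		if (atomic_load_explicit(&idle_mask[i], memory_order_relaxed))
			return true;
	return false;
}
```

In this model the reader pays for a scan of the mask words instead of a
single counter load, while each idle transition performs one
read-modify-write instead of two. That mirrors the trade-off described
in the changelog above, where the write side runs far more often than
the tick-side read.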