From nobody Sat Feb 7 18:29:13 2026 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BFD3E31076D for ; Fri, 2 Jan 2026 12:48:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.156.1 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767358096; cv=none; b=rCNQRcggFhqfiWrBYRmvHQY+u/ycrsmDSNJVJaNLwCiEz91S3WJvn70t+9rBq8NIPL2k2oUx80odgJJpwgobk8ZdPmAdI663URN3QVgCcY+UT2bn5QUb1aqQVAqEvy/Kki2ncocINmIoyxC7PvP27Ad1SYjVb+7nwyTsOjVjkZQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767358096; c=relaxed/simple; bh=jdEUJMntGEFd7cE5QiX9Yfn2HykLvrnNHgPvBZSr33U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=g2BPPzXwLzfr3nvGkxAR6dsC0zXaSHysBj0LXCKvswudhNWKOMR74bbRCZ81MfPjUneXW1aoXWJEENFBItWQwUhmLWxc5oAHYArp5yhIVPCHOPbLSEXk5zHfq2ojvEUCLB6wmILSCFJIZl7DA2cSygMej041IPp1CDhZmGp0Bm4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=Y3nWxRbN; arc=none smtp.client-ip=148.163.156.1 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="Y3nWxRbN" Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 6020wsj0031634; Fri, 2 Jan 2026 12:47:57 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id 
:mime-version:references:subject:to; s=pp1; bh=+duluWjJAt4b/4NKg oMFJseTODiaLhPv14SNw+knMyE=; b=Y3nWxRbNSRkpKwslI6gts9coeV+lDuCZk mTgunsCtBwYNDfU5GC0N7YJrcQuv0NPgGlUHkpaAsUaguDdKdbv11yy9+5AmIP82 O/XIjo8lr6HYvoiYE8uUyW63E6bv0vaBurjhD5C6910cxrFql0rS9Ile/ahYyrak Lv1p2yemVG/0LGDB6ZxyCt3gUUTRu0DlSiWYWpjWCcRtZ7xOK8lk+d117fh3v8f8 MqwzfNfrvKL7rzfXjBC9MCaLAd7u1kIhbE07P1JsW7j7s7Vbs2dSFrVGjA4FDR08 6Iy6jqmPlQ2PayKhY7Aiv52OxqbGg2acPuYWQ9fWXXcFMq+0h+QAQ== Received: from ppma12.dal12v.mail.ibm.com (dc.9e.1632.ip4.static.sl-reverse.com [50.22.158.220]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4ba7657b66-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Jan 2026 12:47:57 +0000 (GMT) Received: from pps.filterd (ppma12.dal12v.mail.ibm.com [127.0.0.1]) by ppma12.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 6029wLS3012856; Fri, 2 Jan 2026 12:47:56 GMT Received: from smtprelay01.fra02v.mail.ibm.com ([9.218.2.227]) by ppma12.dal12v.mail.ibm.com (PPS) with ESMTPS id 4basst6n8p-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Jan 2026 12:47:56 +0000 Received: from smtpav05.fra02v.mail.ibm.com (smtpav05.fra02v.mail.ibm.com [10.20.54.104]) by smtprelay01.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 602CltZS50921898 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 2 Jan 2026 12:47:55 GMT Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E28222004B; Fri, 2 Jan 2026 12:47:54 +0000 (GMT) Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id E97D820040; Fri, 2 Jan 2026 12:47:51 +0000 (GMT) Received: from li-7bb28a4c-2dab-11b2-a85c-887b5c60d769.ibm.com.com (unknown [9.124.213.170]) by smtpav05.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 2 Jan 2026 12:47:51 +0000 (GMT) From: Shrikanth Hegde To: mingo@kernel.org, peterz@infradead.org, vincent.guittot@linaro.org Cc: 
sshegde@linux.ibm.com, linux-kernel@vger.kernel.org, kprateek.nayak@amd.com, juri.lelli@redhat.com, vschneid@redhat.com, tglx@linutronix.de, dietmar.eggemann@arm.com, anna-maria@linutronix.de, frederic@kernel.org, wangyang.guo@intel.com Subject: [PATCH v2 1/3] sched/fair: Move checking for nohz cpus after time check Date: Fri, 2 Jan 2026 18:17:42 +0530 Message-ID: <20260102124744.360872-2-sshegde@linux.ibm.com> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260102124744.360872-1-sshegde@linux.ibm.com> References: <20260102124744.360872-1-sshegde@linux.ibm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMTAyMDExMyBTYWx0ZWRfX5aPJdHu/SNSX MGiWocBUuCfSIiVAhGruf76PfTTL3qbYNND0TaCjDXzoHGCtenkP20dHP5HBYa2mx9KrDQVwGq1 ggCMQoB/h6pO2SU2By7OUXFlUvG4i44YnqG5obn9pADIWelwam097pXwvBul63i1VdVfTK4bLlZ F/bN10Wtw9eNIK/lBWIeMa0JJjseW541gi50ha6cg0H/fhqr1YKiGviDcu9OL3WmBf3Q1Cz8lJt fK0b3u0z550JB8eJyq1JVDBCbyC/04Fv6FR4jjTqjF6xKbF51TtWXKp5qC02IS1RNvhKuJgZXOf n1aFQsvXhKfzko/NICfFlBQNxrKx1gM0BbV137lmcshxGX6hMbV1JB6r0JJnLRcmN/jrHoyPY7K lV7d0gIcL+XqTNj3Sim+MZcRptiLPoLFWqKe5zXETzJh7cNyY3a+Ds48SvZJx2CwN4XEYjhSE0u Ni7x6YfWrR1hQ/JFekg== X-Proofpoint-GUID: b2e3wZK2R6wuldKPVHTbuVO8owIL30ax X-Authority-Analysis: v=2.4 cv=B4+0EetM c=1 sm=1 tr=0 ts=6957be7d cx=c_pps a=bLidbwmWQ0KltjZqbj+ezA==:117 a=bLidbwmWQ0KltjZqbj+ezA==:17 a=vUbySO9Y5rIA:10 a=VkNPw1HP01LnGYTKEx00:22 a=VnNF1IyMAAAA:8 a=JNbE4A9j0n_sWetgYuoA:9 X-Proofpoint-ORIG-GUID: b2e3wZK2R6wuldKPVHTbuVO8owIL30ax X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.9,FMLib:17.12.100.49 definitions=2026-01-02_01,2025-12-31_01,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 adultscore=0 phishscore=0 bulkscore=0 clxscore=1015 malwarescore=0 impostorscore=0 spamscore=0 
lowpriorityscore=0 priorityscore=1501 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2512120000 definitions=main-2601020113 Content-Type: text/plain; charset="utf-8"

The idle load balancer is kicked only after the time check passes, so
move the atomic read of nohz.nr_cpus after the time check to avoid that
overhead when the time check fails.

If there are no nohz CPUs and next_blocked has passed, there will be one
additional stats-based load balance, which sets has_blocked_load to 0.
That should not make a functional difference.

Signed-off-by: Shrikanth Hegde
---
 kernel/sched/fair.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7377f9117501..cd1c78d2c272 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12447,13 +12447,6 @@ static void nohz_balancer_kick(struct rq *rq)
 	 */
 	nohz_balance_exit_idle(rq);
 
-	/*
-	 * None are in tickless mode and hence no need for NOHZ idle load
-	 * balancing:
-	 */
-	if (likely(!atomic_read(&nohz.nr_cpus)))
-		return;
-
 	if (READ_ONCE(nohz.has_blocked_load) &&
 	    time_after(now, READ_ONCE(nohz.next_blocked)))
 		flags = NOHZ_STATS_KICK;
@@ -12461,6 +12454,13 @@ static void nohz_balancer_kick(struct rq *rq)
 	if (time_before(now, nohz.next_balance))
 		goto out;
 
+	/*
+	 * None are in tickless mode and hence no need for NOHZ idle load
+	 * balancing:
+	 */
+	if (likely(!atomic_read(&nohz.nr_cpus)))
+		return;
+
 	if (rq->nr_running >= 2) {
 		flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
 		goto out;
-- 
2.47.3

From nobody Sat Feb 7 18:29:13 2026 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55932310762 for ; Fri, 2 Jan 2026 12:48:15 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.156.1 ARC-Seal: i=1;
a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767358096; cv=none; b=dyWAtSAqGyYiTutAxuFUGrZGnrVfihMZce75c4FLFdWS7+dOf5B7z2/qAuDQc4oRqg726RGeK85C/K2cWJnV50yF9OUJxasqPayNw3VW2E7Y/HdUPqohRwxExyZm2xTLFiugQ5nivlzVqABEitiFx6AYS+NY2haEypSlflHqxW8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767358096; c=relaxed/simple; bh=2RHYRjtC7ZDQELdhC5Nqr+RwRgo1QWzwOtoBRiVIlf8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Mqp/ypznDXav1vRLj/pQ+h4XhY1Tr7mKcUe1+iue9+knUmmNyIUt64BW4zjLHeOTaCzYz5PqSQS8272YfnVP0CcXfBdeECplbDSk6BPeC3ob2FVQs/udVb4mpZMwSYcRY9pW8TftUk5zEEA1QhR9qjqqjjPY17nIXiygRStFptE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=Zf4JiNEx; arc=none smtp.client-ip=148.163.156.1 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="Zf4JiNEx" Received: from pps.filterd (m0356517.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 6021hMij011094; Fri, 2 Jan 2026 12:48:01 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=1rhIXFhdOdNBWX0sV RA4xvWHSOImkpvGiUQCwzBjKic=; b=Zf4JiNExJn9Ez2iJpoT5HSvqkd0sApEFX cOmFdZ7DY4d+AFvsOqjedoU4mnMnJZp6WW/LfUaLfGhjmkaoWKbh7bb0KsW5cGwM WSt75ZIKoB4ia4qgqmPgsKdfnWgnX4CmYy0OEw6XJsnxw+5g2idEwgPQAFCzNGxE E+RzQnr3QxGcludWLexSHhUn7ujr7u0PINK7Phv8kFtYmHARhfC+OSjUSRMW15o+ pQ7+tHsckGLXipd+xrfZvqtXU0BRCRRGAFUWKZMzqyl3usE1qNE+FU/1WKGFCeI0 2Hvue16rMhxRxhVvbfr9znb1+GY4YDQBAjtEPEDOJnri3bF+v9f3w== 
Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4ba7657b6b-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Jan 2026 12:48:00 +0000 (GMT) Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 602BSY5M008030; Fri, 2 Jan 2026 12:47:59 GMT Received: from smtprelay04.fra02v.mail.ibm.com ([9.218.2.228]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 4bav0k6a52-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Jan 2026 12:47:59 +0000 Received: from smtpav05.fra02v.mail.ibm.com (smtpav05.fra02v.mail.ibm.com [10.20.54.104]) by smtprelay04.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 602ClwJZ28508708 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 2 Jan 2026 12:47:58 GMT Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 1913220043; Fri, 2 Jan 2026 12:47:58 +0000 (GMT) Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 5198520040; Fri, 2 Jan 2026 12:47:55 +0000 (GMT) Received: from li-7bb28a4c-2dab-11b2-a85c-887b5c60d769.ibm.com.com (unknown [9.124.213.170]) by smtpav05.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 2 Jan 2026 12:47:55 +0000 (GMT) From: Shrikanth Hegde To: mingo@kernel.org, peterz@infradead.org, vincent.guittot@linaro.org Cc: sshegde@linux.ibm.com, linux-kernel@vger.kernel.org, kprateek.nayak@amd.com, juri.lelli@redhat.com, vschneid@redhat.com, tglx@linutronix.de, dietmar.eggemann@arm.com, anna-maria@linutronix.de, frederic@kernel.org, wangyang.guo@intel.com Subject: [PATCH v2 2/3] sched/fair: Change likelihood of nohz.nr_cpus and do stats update if it's due Date: Fri, 2 Jan 2026 18:17:43 +0530 Message-ID: <20260102124744.360872-3-sshegde@linux.ibm.com> X-Mailer: 
git-send-email 2.51.0 In-Reply-To: <20260102124744.360872-1-sshegde@linux.ibm.com> References: <20260102124744.360872-1-sshegde@linux.ibm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMTAyMDExMyBTYWx0ZWRfX6POhJHEuY58Z pOhkhmnnRzeP1iIABjjZ/WLWuuLVANos/hQ+pgrz+rvzHp/hzY9oTdKvxBQEY3iT+zwFkpiOCHf EQC1EofIyMZzsqKpaLlWaU42Ql0sCLPiLPalRvBQUsezxdSxG5nbJB0V+wb68nkLXbj2n5Xlb2o S4M24FUlJFU7HsfyYOc1hv5/OreZM8CotMPdnje+6T/EqyOJBUBjoiyCfp6LwtzEuF9Pd82yYOT GzNsYfGh9N71bU87F4Sxpf4FaHct6iL8bgJ+pu80iEMS0LU4htQskm7iNEoIPvEGOYdfukiOWzg PkAEYSmJELW5tp7N+YHFCcigfnMtRewEM/BuIPRjpZ3YO/KHatVcPImsFWyRhB4N/gVMiYXa41g PrFMpedKm/jwgkQEPz9ht959xiBKJh2e1vLMqvoo+wg7K8zcgpy8Am6NLaApB9rhWjDGf6sRzT0 tdp8EvNwlfSuY/pR2Ew== X-Proofpoint-GUID: fh5I1w6DEDKnlf2b-mdHUTQzORvMCVrL X-Authority-Analysis: v=2.4 cv=B4+0EetM c=1 sm=1 tr=0 ts=6957be80 cx=c_pps a=AfN7/Ok6k8XGzOShvHwTGQ==:117 a=AfN7/Ok6k8XGzOShvHwTGQ==:17 a=eJTnxkHzwNBm_rMR:21 a=vUbySO9Y5rIA:10 a=VkNPw1HP01LnGYTKEx00:22 a=VnNF1IyMAAAA:8 a=L2DUigS4eapgOTKWHN4A:9 X-Proofpoint-ORIG-GUID: fh5I1w6DEDKnlf2b-mdHUTQzORvMCVrL X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.9,FMLib:17.12.100.49 definitions=2026-01-02_01,2025-12-31_01,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 suspectscore=0 adultscore=0 phishscore=0 bulkscore=0 clxscore=1015 malwarescore=0 impostorscore=0 spamscore=0 lowpriorityscore=0 priorityscore=1501 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2512120000 definitions=main-2601020113 Content-Type: text/plain; charset="utf-8"

These days most systems have multiple cores, so it is more likely than
not that at least one CPU is in nohz (idle) state. Change the likely()
annotation on the "no nohz CPUs" check to unlikely().

Also allow a pending stats update to go ahead when there are no nohz
CPUs: the nr_cpus check now happens after the time checks, so jump to
the out label instead of returning. This may do one additional
stats-based load balance, which resets has_blocked_load.

The code also reads better without that uncharacteristic return in the
middle of the function.

Signed-off-by: Shrikanth Hegde
---
 kernel/sched/fair.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cd1c78d2c272..5ceb9126d441 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12456,10 +12456,10 @@ static void nohz_balancer_kick(struct rq *rq)
 
 	/*
 	 * None are in tickless mode and hence no need for NOHZ idle load
-	 * balancing:
+	 * balancing, do stats update if its due
 	 */
-	if (likely(!atomic_read(&nohz.nr_cpus)))
-		return;
+	if (unlikely(!atomic_read(&nohz.nr_cpus)))
+		goto out;
 
 	if (rq->nr_running >= 2) {
 		flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK;
-- 
2.47.3

From nobody Sat Feb 7 18:29:13 2026 Received: from mx0b-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A514311599 for ; Fri, 2 Jan 2026 12:48:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=148.163.158.5 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767358112; cv=none; b=QY4D0G+ewMdcxvinFfECJKVyTBwbh1dDPuxHL22HnpxeIkZdOPOzuISTBkn5t3o8tkTdvHfZsKOaAsjvxaFuENeMO4u3bL6hFYj+CLO6szfxDLe0rWGqsVA2JTpfIgykQXF19PuWuT1lJI7rw7qCkgplTIp/GhALIDj/f8vmylk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1767358112; c=relaxed/simple; bh=l58KatdJInUNY4yDx7oNMEnHGKSXHzj0hZttI48NpaE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=qsZIyIvjBCcB8g7m5DgqdghhptT6DmBuKJB2KG0xaAhI8Dmh+wdIXgJ2K9CS2DlcgK3r8vwpnM7myLY91CSqIhQfsL41JdqHC0NPCUxBMCaMWafFFu98B9dJdL9l6ucH6GdHKq5XMEZisnZOQQBXdGbX5GB19ASpgZymd1pqwm4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com; spf=pass smtp.mailfrom=linux.ibm.com; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b=Fxr5L0p3; arc=none smtp.client-ip=148.163.158.5 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.ibm.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=ibm.com header.i=@ibm.com header.b="Fxr5L0p3" Received: from pps.filterd (m0360072.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.18.1.2/8.18.1.2) with ESMTP id 601LPOnd000349; Fri, 2 Jan 2026 12:48:04 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=cc :content-transfer-encoding:date:from:in-reply-to:message-id :mime-version:references:subject:to; s=pp1; bh=/w1ChqA0vJZg5Kzo7 bv8QZSjxEr5H743v0hrC7mknCg=; b=Fxr5L0p3QtOU00DAboFaaXlyoMrLNxC8U e+xe5rK6df26ti1RX8pAwvVGPSrDzgYaL6JAzw5BXcjQS2LvrY/Pd3xL7D5PHdkW 8FDJGfaEmC1zVOQu2s3Fe4Gyi4uR9t46sh/IZvesjYmlqN0DDAenD3iDnCIMYmhG wpmbLM0alEv82yljuy+XTdQ0YmDMd1Bg4V0cXWsSt/LRS/jAFlmcQkEDh5iUaQDp a/xKOB/DDvPHhDMW6NIxscP3uq9KjDetv837KdI0vwRWAqs+o7U+fBdEpU2GdTBH dhxEZ4aNFATqveUernPIWHY5AVPDXpIg6tviT7kx/mVsdX6YSkI7Q== Received: from ppma13.dal12v.mail.ibm.com (dd.9e.1632.ip4.static.sl-reverse.com [50.22.158.221]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 4ba73w4f3v-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Jan 2026 12:48:03 +0000 (GMT) Received: from pps.filterd (ppma13.dal12v.mail.ibm.com [127.0.0.1]) by ppma13.dal12v.mail.ibm.com (8.18.1.2/8.18.1.2) with ESMTP id 602BbPgP008042; Fri, 2 Jan 2026 12:48:02 GMT Received: from 
smtprelay01.fra02v.mail.ibm.com ([9.218.2.227]) by ppma13.dal12v.mail.ibm.com (PPS) with ESMTPS id 4bav0k6a58-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Fri, 02 Jan 2026 12:48:02 +0000 Received: from smtpav05.fra02v.mail.ibm.com (smtpav05.fra02v.mail.ibm.com [10.20.54.104]) by smtprelay01.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 602Cm1Jt50921922 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 2 Jan 2026 12:48:01 GMT Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 121AE20043; Fri, 2 Jan 2026 12:48:01 +0000 (GMT) Received: from smtpav05.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 7C96020040; Fri, 2 Jan 2026 12:47:58 +0000 (GMT) Received: from li-7bb28a4c-2dab-11b2-a85c-887b5c60d769.ibm.com.com (unknown [9.124.213.170]) by smtpav05.fra02v.mail.ibm.com (Postfix) with ESMTP; Fri, 2 Jan 2026 12:47:58 +0000 (GMT) From: Shrikanth Hegde To: mingo@kernel.org, peterz@infradead.org, vincent.guittot@linaro.org Cc: sshegde@linux.ibm.com, linux-kernel@vger.kernel.org, kprateek.nayak@amd.com, juri.lelli@redhat.com, vschneid@redhat.com, tglx@linutronix.de, dietmar.eggemann@arm.com, anna-maria@linutronix.de, frederic@kernel.org, wangyang.guo@intel.com Subject: [PATCH v2 3/3] sched/fair: Remove nohz.nr_cpus and use weight of cpumask instead Date: Fri, 2 Jan 2026 18:17:44 +0530 Message-ID: <20260102124744.360872-4-sshegde@linux.ibm.com> X-Mailer: git-send-email 2.51.0 In-Reply-To: <20260102124744.360872-1-sshegde@linux.ibm.com> References: <20260102124744.360872-1-sshegde@linux.ibm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: k-4hdVuhHl09A5Tq1AtBZtzCxDQm1-9r X-Proofpoint-Spam-Details-Enc: AW1haW4tMjYwMTAyMDExMyBTYWx0ZWRfX60pvNhmaXkmw 
tJukO1ir+gHy+lYZQwh+0ysYvQCsBMODb0mkRvNiGU/O8DlnLVhoTUFo+QZRjWPYa8Hh4c7g+fi Nu0NPEnAPFXrBEsuK80TgHQXOdtDrwV/E9cIqr0iCq9QdFS6soR5LOdIbJvftBHTslay4aN7Abh jKYHsrWiqiHm4HCK3e3D96EN+XyMpltSu0iRPZ2kJFSmq1hMNTBlVaaJYMfXxnGK1nKzYTOWz8m 8olCxIOg6OZC1NPNYCBOLODiIS+V/kBXUegCUPkAOg2lAk2dRpk3srkyhALHC/qFUlkRJC1N1oY 5tSN+PsvrD+x1hsN+37cPqURRItzbEzrY+Z1bPvVBmw9GnNIyLNwULg9cd/+SmriIpM+Nblr+r/ WRnh8IWCBi4nPjdedoQq5cvePKicMuaip03Qy/7TXrhtkmGGwZJmi9dr/kGWGqUp9TjPFgFCWUZ U/nT6rWEi5zQ8KXSFuw== X-Authority-Analysis: v=2.4 cv=fobRpV4f c=1 sm=1 tr=0 ts=6957be83 cx=c_pps a=AfN7/Ok6k8XGzOShvHwTGQ==:117 a=AfN7/Ok6k8XGzOShvHwTGQ==:17 a=vUbySO9Y5rIA:10 a=VkNPw1HP01LnGYTKEx00:22 a=VnNF1IyMAAAA:8 a=T4Os_ZD8T57gc3T8qpkA:9 X-Proofpoint-GUID: k-4hdVuhHl09A5Tq1AtBZtzCxDQm1-9r X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.293,Aquarius:18.0.1121,Hydra:6.1.9,FMLib:17.12.100.49 definitions=2026-01-02_01,2025-12-31_01,2025-10-01_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 lowpriorityscore=0 suspectscore=0 phishscore=0 adultscore=0 malwarescore=0 spamscore=0 bulkscore=0 impostorscore=0 priorityscore=1501 clxscore=1015 classifier=typeunknown authscore=0 authtc= authcc= route=outbound adjust=0 reason=mlx scancount=1 engine=8.19.0-2512120000 definitions=main-2601020113 Content-Type: text/plain; charset="utf-8"

nohz.nr_cpus was observed to be a contended cacheline when running an
enterprise workload on large systems.

The fundamental scalability challenge with nohz.idle_cpus_mask and
nohz.nr_cpus is the following:

 (1) nohz_balancer_kick() observes (reads) nohz.nr_cpus (or
     nohz.idle_cpus_mask) and nohz.has_blocked_load on every scheduler
     tick, to see whether there is any nohz balancing work to do.

 (2) nohz_balance_enter_idle() and nohz_balance_exit_idle() (the latter
     through nohz_balancer_kick() via sched_tick()) modify (write)
     nohz.nr_cpus (and/or nohz.idle_cpus_mask) and
     nohz.has_blocked_load.

The characteristic frequencies are the following:

 (1) nohz_balancer_kick() runs at scheduler (busy) tick frequency on
     every CPU that has not gone idle. This is a relatively constant
     frequency, in the ~1 kHz range or lower.

 (2) happens at idle enter/exit frequency on every CPU that goes idle.
     This is workload dependent, but can easily be hundreds of kHz for
     IO-bound loads and high CPU counts, i.e. orders of magnitude higher
     than (1), in which case a cache miss at every invocation of (1) is
     almost inevitable. Idle exit also triggers (1) on the CPU that is
     coming out of idle.

There are two types of costs from these functions:

 (A) Scheduler tick cost via (1): this happens on busy CPUs too, and is
     thus a primary scalability cost. But the rate here is constant and
     typically much lower than (B), hence the absolute benefit to
     workload scalability will be lower as well.

 (B) Idle cost via (2): going-to-idle and coming-from-idle costs are
     secondary concerns, because they impact power efficiency more than
     they impact scalability. But in absolute terms this cost scales up
     with the number of CPUs as well, and at a much faster rate, and
     thus may also approach and negatively impact system limits like
     memory bus/fabric bandwidth.

The fundamental scalability challenge described above remains true for
nohz.idle_cpus_mask even after this patch. But nr_cpus can be derived
from the mask itself, and its usage does not require an exact value:
the read can race, and at worst an additional load balance may be
attempted. So derive the value from idle_cpus_mask.

This saves some bus bandwidth on that nohz cacheline (approx. 50%),
which in turn helps improve enterprise workload throughput.

This theory holds true for CPUMASK_OFFSTACK=y and mostly true for
CPUMASK_OFFSTACK=n (the last few bits, depending on NR_CPUS, could be in
the same cacheline as nr_cpus).

On a system with 480 CPUs, running "hackbench 40 process 10000 loops"
(avg of 3 runs):

baseline:
     0.81%  hackbench  [k] nohz_balance_exit_idle
     0.21%  hackbench  [k] nohz_balancer_kick
     0.09%  swapper    [k] nohz_run_idle_balance

With patch:
     0.35%  hackbench  [k] nohz_balance_exit_idle
     0.09%  hackbench  [k] nohz_balancer_kick
     0.07%  swapper    [k] nohz_run_idle_balance

[Ingo Molnar: scalability analysis changelog]
Signed-off-by: Shrikanth Hegde
---
 kernel/sched/fair.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5ceb9126d441..805b53d9709e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7141,7 +7141,6 @@ static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
 
 static struct {
 	cpumask_var_t idle_cpus_mask;
-	atomic_t nr_cpus;
 	int has_blocked_load; /* Idle CPUS has blocked load */
 	int needs_update; /* Newly idle CPUs need their next_balance collated */
 	unsigned long next_balance; /* in jiffy units */
@@ -12458,7 +12457,7 @@ static void nohz_balancer_kick(struct rq *rq)
 	 * None are in tickless mode and hence no need for NOHZ idle load
 	 * balancing, do stats update if its due
 	 */
-	if (unlikely(!atomic_read(&nohz.nr_cpus)))
+	if (unlikely(cpumask_empty(nohz.idle_cpus_mask)))
 		goto out;
 
 	if (rq->nr_running >= 2) {
@@ -12571,7 +12570,6 @@ void nohz_balance_exit_idle(struct rq *rq)
 
 	rq->nohz_tick_stopped = 0;
 	cpumask_clear_cpu(rq->cpu, nohz.idle_cpus_mask);
-	atomic_dec(&nohz.nr_cpus);
 
 	set_cpu_sd_state_busy(rq->cpu);
 }
@@ -12629,7 +12627,6 @@ void nohz_balance_enter_idle(int cpu)
 	rq->nohz_tick_stopped = 1;
 
 	cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
-	atomic_inc(&nohz.nr_cpus);
 
 	/*
 	 * Ensures that if nohz_idle_balance() fails to observe our
-- 
2.47.3
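
For readers who want to experiment with the idea outside the kernel, the
following is a minimal userspace sketch of the check ordering this series
ends up with: time check first, then "is any CPU tickless-idle?" derived
from the mask rather than from a separate counter. It is not the kernel
code. Every name in it (idle_mask, model_enter_idle(), model_exit_idle(),
model_mask_empty(), model_kick_flags(), the KICK_* values, MODEL_NR_CPUS)
is a hypothetical stand-in, C11 atomics replace the kernel cpumask helpers,
and the conditions that follow the nr_running check in
nohz_balancer_kick() are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical userspace stand-ins, not kernel APIs. */
#define MODEL_NR_CPUS	480
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define MASK_WORDS	((MODEL_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

enum { KICK_NONE = 0, KICK_STATS = 1, KICK_BALANCE = 2 };

/* Stand-in for nohz.idle_cpus_mask: one bit per tickless-idle CPU. */
static _Atomic unsigned long idle_mask[MASK_WORDS];

/* Idle entry only sets a bit; there is no shared counter to increment. */
void model_enter_idle(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	atomic_fetch_or_explicit(&idle_mask[(size_t)cpu / BITS_PER_WORD],
				 1UL << ((size_t)cpu % BITS_PER_WORD),
				 memory_order_relaxed);
}

/* Idle exit only clears the bit; again no counter to decrement. */
void model_exit_idle(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	atomic_fetch_and_explicit(&idle_mask[(size_t)cpu / BITS_PER_WORD],
				  ~(1UL << ((size_t)cpu % BITS_PER_WORD)),
				  memory_order_relaxed);
}

/*
 * Derive "no idle CPUs" from the mask. The scan is not atomic as a
 * whole, so the answer can be stale by the time it is used.
 */
bool model_mask_empty(void)
{
	for (size_t i = 0; i < MASK_WORDS; i++)
		if (atomic_load_explicit(&idle_mask[i], memory_order_relaxed))
			return false;
	return true;
}

/* Simplified decision flow: time check first, mask check second. */
unsigned int model_kick_flags(unsigned long now, unsigned long next_balance,
			      bool stats_due, unsigned int nr_running)
{
	unsigned int flags = stats_due ? KICK_STATS : KICK_NONE;

	/* Wraparound-safe "now is before next_balance" comparison. */
	if ((long)(now - next_balance) < 0)
		return flags;

	/* Nobody is tickless-idle: only a due stats update remains. */
	if (model_mask_empty())
		return flags;

	if (nr_running >= 2)
		flags = KICK_STATS | KICK_BALANCE;

	return flags;
}
```

Because model_mask_empty() reads the words one at a time, a concurrent
model_enter_idle() or model_exit_idle() can make the result stale. In this
model the worst outcome is one kick decision based on old state, which
mirrors the argument in the changelog that an exact value is not needed.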