From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, K Prateek Nayak, "Gautham R. Shenoy",
    Swapnil Sapkal, Shrikanth Hegde, Chen Yu
Subject: [RESEND RFC PATCH v2 24/29] sched/fair: Optimize global "nohz.nr_cpus" tracking
Date: Mon, 8 Dec 2025 09:27:10 +0000
Message-ID: <20251208092744.32737-24-kprateek.nayak@amd.com>
In-Reply-To: <20251208083602.31898-1-kprateek.nayak@amd.com>
References: <20251208083602.31898-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Replace the global "nohz.nr_cpus" counter with "nohz.nr_doms", which
tracks the number of "sd_nohz->shared" nodes with a non-zero
"nr_idle_cpus" count. The global counter is now updated only at the
boundary where "sd_nohz->shared->nr_idle_cpus" goes from 0 -> 1 and
back from 1 -> 0.

This also introduces a chance of double accounting when a nohz idle
entry or the tick races with hotplug or cpuset, as described in
__nohz_exit_idle_tracking().

__nohz_exit_idle_tracking() is called when the sched_domain_shared
nodes tracking idle CPUs are freed. It corrects any such double
accounting, which could otherwise trigger unnecessary nohz idle
balances even when all the CPUs have the tick enabled.

Signed-off-by: K Prateek Nayak
---
 kernel/sched/fair.c     | 63 ++++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c |  1 +
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 46cb88e88b31..f622104d54d7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7143,7 +7143,7 @@ static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
 #ifdef CONFIG_NO_HZ_COMMON
 
 static struct {
-	atomic_t nr_cpus;
+	atomic_t nr_doms;
 	int has_blocked;		/* Idle CPUS has blocked load */
 	int needs_update;		/* Newly idle CPUs need their next_balance collated */
 	unsigned long next_balance;	/* in jiffy units */
@@ -12512,7 +12512,7 @@ static void nohz_balancer_kick(struct rq *rq)
 	 * None are in tickless mode and hence no need for NOHZ idle load
 	 * balancing:
 	 */
-	if (likely(!atomic_read(&nohz.nr_cpus)))
+	if (likely(!atomic_read(&nohz.nr_doms)))
 		return;
 
 	if (READ_ONCE(nohz.has_blocked) &&
@@ -12609,7 +12609,8 @@ static void set_cpu_sd_state_busy(int cpu)
 		return;
 
 	cpumask_clear_cpu(cpu, sd->shared->nohz_idle_cpus_mask);
-	atomic_dec(&sd->shared->nr_idle_cpus);
+	if (!atomic_dec_return(&sd->shared->nr_idle_cpus))
+		atomic_dec(&nohz.nr_doms);
 }
 
 void nohz_balance_exit_idle(struct rq *rq)
@@ -12620,7 +12621,6 @@ void nohz_balance_exit_idle(struct rq *rq)
 		return;
 
 	WRITE_ONCE(rq->nohz_tick_stopped, 0);
-	atomic_dec(&nohz.nr_cpus);
 
 	set_cpu_sd_state_busy(rq->cpu);
 }
@@ -12639,7 +12639,58 @@ static void set_cpu_sd_state_idle(int cpu)
 		return;
 
 	cpumask_set_cpu(cpu, sd->shared->nohz_idle_cpus_mask);
-	atomic_inc(&sd->shared->nr_idle_cpus);
+	if (!atomic_fetch_inc(&sd->shared->nr_idle_cpus))
+		atomic_inc(&nohz.nr_doms);
+}
+
+/*
+ * Correct nohz.nr_doms if sd_nohz->shared was found to have non-zero
+ * nr_idle_cpus when freeing. No local references to sds remain at
+ * this point and the only reference possible via "nohz_shared_list"
+ * will be dropped after the grace period.
+ */
+void __nohz_exit_idle_tracking(struct sched_domain_shared *sds)
+{
+
+	/*
+	 * It is possible for an idle entry to race with sched domain rebuild like:
+	 *
+	 *	CPU0 (hotplug)			CPU1 (nohz idle)
+	 *
+	 * rq->offline(CPU1)
+	 *   set_cpu_sd_state_busy()
+	 * rq->sd = sdd;			# Processes IPI, re-enters nohz idle
+	 * ...					# For old sd_nohz
+	 * ...					atomic_fetch_inc(&sd_nohz->shared->nr_idle_cpus);
+	 * ...					atomic_inc(&nohz.nr_doms); # XXX: Accounted once
+	 * update_top_cache_domains()
+	 * rq->online(CPU1)
+	 *   # rq->nohz_tick_stopped is true
+	 *   set_cpu_sd_state_idle()
+	 *     # For new sd_nohz
+	 *     atomic_fetch_inc(&sd_nohz->shared->nr_idle_cpus);
+	 *     atomic_inc(&nohz.nr_doms); # XXX: Accounted twice
+	 * ...
+	 *
+	 * "nohz.nr_doms" is used as an entry criterion in nohz_balancer_kick()
+	 * and this double accounting can lead to wasted idle balancing
+	 * triggers. Use this path to correct the accounting:
+	 *
+	 * # In sds_delayed_free()
+	 * __nohz_exit_idle_tracking(sds)
+	 *   # sd->shared->nr_idle_cpus is != 0
+	 *   atomic_dec(&nohz.nr_doms); # XXX: Fixes nohz.nr_doms
+	 */
+	if (atomic_read(&sds->nr_idle_cpus)) {
+		/*
+		 * Reset the "nr_idle_cpus" indicator to prevent
+		 * existing readers from traversing the idle mask
+		 * to reduce chances of traversing the same CPU
+		 * twice.
+		 */
+		atomic_set(&sds->nr_idle_cpus, 0);
+		atomic_dec(&nohz.nr_doms);
+	}
 }
 
 static void cpu_sd_exit_nohz_balance(struct rq *rq)
@@ -12691,8 +12742,6 @@ void nohz_balance_enter_idle(int cpu)
 
 	WRITE_ONCE(rq->nohz_tick_stopped, 1);
 
-	atomic_inc(&nohz.nr_cpus);
-
 	set_cpu_sd_state_idle(cpu);
 
 	/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 159c24981ead..3433de20a249 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3266,6 +3266,7 @@ extern void cfs_bandwidth_usage_dec(void);
 DECLARE_PER_CPU(struct sched_domain __rcu *, sd_nohz);
 extern struct list_head nohz_shared_list;
 
+extern void __nohz_exit_idle_tracking(struct sched_domain_shared *sds);
 extern void nohz_balance_exit_idle(struct rq *rq);
 #else /* !CONFIG_NO_HZ_COMMON: */
 static inline void nohz_balance_exit_idle(struct rq *rq) { }
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index af4ce9451cd0..ec549fb7d7fc 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -615,6 +615,7 @@ static int sds_delayed_free(struct sched_domain_shared *sds)
 	scoped_guard(raw_spinlock_irqsave, &nohz_shared_list_lock)
 		list_del_rcu(&sds->nohz_list_node);
 
+	__nohz_exit_idle_tracking(sds);
 	call_rcu(&sds->rcu, destroy_sched_domain_shared_rcu);
 	return 1;
 }
-- 
2.43.0
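
For readers who want to experiment with the 0 -> 1 / 1 -> 0 boundary
accounting outside the kernel, here is a minimal userspace sketch. It
assumes C11 <stdatomic.h> as a stand-in for the kernel's atomic_t
helpers. "struct dom", "dom_enter_idle()", "dom_exit_idle()",
"dom_free_fixup()" and "nr_doms_demo()" are hypothetical names used only
for illustration and are not part of the patch. The sketch is single
threaded and does not model the hotplug race itself.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for one sched_domain_shared node. */
struct dom {
	atomic_int nr_idle_cpus;
};

/* Hypothetical stand-in for the global nohz.nr_doms counter. */
static atomic_int nr_doms;

static void dom_enter_idle(struct dom *d)
{
	/* fetch_add returns the old value; 0 means this is the 0 -> 1 edge. */
	if (atomic_fetch_add(&d->nr_idle_cpus, 1) == 0)
		atomic_fetch_add(&nr_doms, 1);
}

static void dom_exit_idle(struct dom *d)
{
	/* fetch_sub returns the old value; 1 means this is the 1 -> 0 edge. */
	if (atomic_fetch_sub(&d->nr_idle_cpus, 1) == 1)
		atomic_fetch_sub(&nr_doms, 1);
}

/* On free, drop any contribution the node still holds in nr_doms. */
static void dom_free_fixup(struct dom *d)
{
	if (atomic_load(&d->nr_idle_cpus)) {
		atomic_store(&d->nr_idle_cpus, 0);
		atomic_fetch_sub(&nr_doms, 1);
	}
}

/* Walks two domains through idle entry/exit and returns the final count. */
int nr_doms_demo(void)
{
	struct dom a, b;

	atomic_init(&a.nr_idle_cpus, 0);
	atomic_init(&b.nr_idle_cpus, 0);
	atomic_store(&nr_doms, 0);

	dom_enter_idle(&a);	/* a: 0 -> 1, global counter bumped */
	dom_enter_idle(&a);	/* a: 1 -> 2, global counter untouched */
	dom_enter_idle(&b);	/* b: 0 -> 1, global counter bumped */
	assert(atomic_load(&nr_doms) == 2);

	dom_exit_idle(&a);	/* a: 2 -> 1, global counter untouched */
	dom_exit_idle(&a);	/* a: 1 -> 0, global counter dropped */
	dom_free_fixup(&b);	/* b freed while still idle, fix up */

	return atomic_load(&nr_doms);
}
```

The point of the design is that the shared global counter is written
only on edge transitions of a per-domain counter, so CPUs entering and
leaving idle within an already-idle domain never touch it.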