From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, K Prateek Nayak, "Gautham R. Shenoy", Swapnil Sapkal, Shrikanth Hegde, Chen Yu
Subject: [RESEND RFC PATCH v2 14/29] sched/topology: Introduce fallback sd->shared assignment
Date: Mon, 8 Dec 2025 09:27:00 +0000
Message-ID: <20251208092744.32737-14-kprateek.nayak@amd.com>
In-Reply-To: <20251208083602.31898-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Going forward, tying nohz balancing to "sd->shared" will require each
CPU's hierarchy to have at least one "sd->shared" object tracking its
idle status. If the lowest domain of the hierarchy after degeneration
does not have the SD_SHARE_LLC flag set, assign a per-node fallback
shared object to that domain on CONFIG_NO_HZ_COMMON kernels.
!CONFIG_NO_HZ_COMMON kernels always keep the tick enabled on idle CPUs
and do not require nohz idle tracking.

An example scenario where the fallback shared object is used: consider
a cpuset with 17 CPUs, where 16 CPUs are from the same LLC and a single
CPU is from another LLC:

  CPU0:
    domain0: MC {0-15}
      groups: {0} {1} ... {15}
    domain1: PKG {0-16}
      groups: {0-15} {16}
  ...
  CPU15:
    domain0: MC {0-15}
      groups: {15} {0} {1} ... {14}
    domain1: PKG {0-16}
      groups: {0-15} {16}

  CPU16:
    # MC is degenerated since {16} is the only CPU in the domain
    domain0: PKG {0-16}
      groups: {16} {0-15}
    # Assign sd[PKG]->shared = fallback_nohz_sds[cpu_to_node(16)]

If the lowest domain is an SD_OVERLAP domain, "sd->shared" is shared
only by the CPUs on the same node, not by the entire domain. This is
acceptable since the fallback shared object only tracks the CPU's idle
status, unlike sd_llc_shared, which also tracks "has_idle_cores" and
"nr_idle_scan".
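The lifecycle described above can be modeled in userspace C: allocate one object per node, attach it to a CPU's lowest domain when that domain lacks SD_SHARE_LLC, claim every object that picked up a reference, and let the common free path release only the unclaimed ones. This is a minimal sketch under stated assumptions, not the patch itself: `struct shared_obj` and `struct domain` are hypothetical stand-ins for `struct sched_domain_shared` and `struct sched_domain`, the refcount is a plain `int` rather than `atomic_t`, `calloc()`/`free()` stand in for `kcalloc()`/`kzalloc_node()`/`kfree()`, the sketch allocates for every node instead of only the nodes covered by cpu_map, and the layout (CPUs 0-15 on node 0, CPU 16 on node 1) is assumed purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_NODES 2 /* hypothetical: two NUMA nodes */
#define NR_CPUS 17 /* the 17-CPU cpuset from the example */

/* Hypothetical stand-in for struct sched_domain_shared. */
struct shared_obj {
	int ref; /* plain int instead of atomic_t: single-threaded model */
};

/* Hypothetical stand-in for a CPU's lowest sched_domain. */
struct domain {
	bool share_llc; /* models (sd->flags & SD_SHARE_LLC) */
	struct shared_obj *shared;
};

/* Models d->fallback_nohz_sds: one slot per node. */
static struct shared_obj *fallback[NR_NODES];

/* Assumed layout: CPUs 0-15 on node 0, CPU 16 on node 1. */
static int model_cpu_to_node(int cpu)
{
	return cpu < 16 ? 0 : 1;
}

/* Like __fallback_sds_alloc(), but for every node. */
static int fallback_alloc(void)
{
	for (int n = 0; n < NR_NODES; n++) {
		fallback[n] = calloc(1, sizeof(*fallback[n]));
		if (!fallback[n])
			return -1; /* caller cleans up with fallback_free() */
	}
	return 0;
}

/* Like the new check in cpu_attach_domain() plus assign_fallback_sds(). */
static void attach(struct domain *sd, int cpu)
{
	if (sd && !sd->share_llc) {
		struct shared_obj *sds = fallback[model_cpu_to_node(cpu)];

		assert(sds); /* every covered node must have an object */
		sd->shared = sds;
		sds->ref++;
	}
}

/* Like claim_fallback_sds(): referenced objects now belong to domains. */
static void fallback_claim(void)
{
	for (int n = 0; n < NR_NODES; n++)
		if (fallback[n] && fallback[n]->ref)
			fallback[n] = NULL;
}

/* Like __fallback_sds_free(): frees only the objects left unclaimed. */
static void fallback_free(void)
{
	for (int n = 0; n < NR_NODES; n++) {
		free(fallback[n]); /* free(NULL) is a no-op, as is kfree(NULL) */
		fallback[n] = NULL;
	}
}

/*
 * Runs the 17-CPU example: CPUs 0-15 keep MC (SD_SHARE_LLC) as their
 * lowest domain, CPU 16 is left with PKG only. LLC-shared objects are
 * not modeled, so only the fallback assignment is visible in doms[].
 */
static int build_example(struct domain doms[NR_CPUS])
{
	if (fallback_alloc()) {
		fallback_free();
		return -1;
	}

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		doms[cpu].share_llc = cpu < 16;
		doms[cpu].shared = NULL;
		attach(&doms[cpu], cpu);
	}

	fallback_claim();
	fallback_free();
	return 0;
}
```

Calling build_example() on a 17-element array leaves only CPU 16 with a non-NULL `shared` pointer, holding one reference to the node 1 object; the unused node 0 object is freed. In the kernel the claimed object is later released through its refcount when the domain is destroyed; in this sketch the caller simply calls free() on it.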
Signed-off-by: K Prateek Nayak
---
 kernel/sched/topology.c | 123 ++++++++++++++++++++++++++++++++++++++--
 1 file changed, 117 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6b14c7db3e35..3a0740be9fcd 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -455,6 +455,96 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
 static void free_pd(struct perf_domain *pd) { }
 #endif /* !(CONFIG_ENERGY_MODEL && CONFIG_CPU_FREQ_GOV_SCHEDUTIL) */
 
+struct s_data {
+#ifdef CONFIG_NO_HZ_COMMON
+	struct sched_domain_shared **fallback_nohz_sds;
+#endif
+	struct sched_domain_shared * __percpu *sds;
+	struct sched_domain * __percpu *sd;
+	struct root_domain *rd;
+};
+
+#ifdef CONFIG_NO_HZ_COMMON
+
+static int __fallback_sds_alloc(struct s_data *d, unsigned long *visited_nodes)
+{
+	int j;
+
+	d->fallback_nohz_sds = kcalloc(nr_node_ids,
+			sizeof(*d->fallback_nohz_sds), GFP_KERNEL);
+	if (!d->fallback_nohz_sds)
+		return -ENOMEM;
+
+	/*
+	 * Allocate a fallback sd->shared object
+	 * for each node covered by the cpu_map.
+	 */
+	for_each_set_bit(j, visited_nodes, nr_node_ids) {
+		struct sched_domain_shared *sds;
+
+		sds = kzalloc_node(sizeof(struct sched_domain_shared),
+				GFP_KERNEL, j);
+		if (!sds)
+			return -ENOMEM;
+
+		d->fallback_nohz_sds[j] = sds;
+	}
+
+	return 0;
+}
+
+static void __fallback_sds_free(struct s_data *d)
+{
+	int j;
+
+	if (!d->fallback_nohz_sds)
+		return;
+
+	for (j = 0; j < nr_node_ids; ++j)
+		kfree(d->fallback_nohz_sds[j]);
+
+	kfree(d->fallback_nohz_sds);
+	d->fallback_nohz_sds = NULL;
+}
+
+static void assign_fallback_sds(struct s_data *d, struct sched_domain *sd, int cpu)
+{
+	struct sched_domain_shared *sds;
+
+	sds = d->fallback_nohz_sds[cpu_to_node(cpu)];
+	sd->shared = sds;
+	atomic_inc(&sd->shared->ref);
+}
+
+static void claim_fallback_sds(struct s_data *d)
+{
+	int j;
+
+	/*
+	 * Claim allocations for the fallback shared objects
+	 * if they were assigned during cpu_attach_domain().
+	 */
+	for (j = 0; j < nr_node_ids; ++j) {
+		struct sched_domain_shared *sds = d->fallback_nohz_sds[j];
+
+		if (sds && atomic_read(&sds->ref))
+			d->fallback_nohz_sds[j] = NULL;
+	}
+}
+
+#else /* !CONFIG_NO_HZ_COMMON */
+
+static inline int __fallback_sds_alloc(struct s_data *d, unsigned long *visited_nodes)
+{
+	return 0;
+}
+
+static inline void __fallback_sds_free(struct s_data *d) { }
+static inline void assign_fallback_sds(struct s_data *d, struct sched_domain *sd, int cpu) { }
+static inline void claim_fallback_sds(struct s_data *d) { }
+
+#endif /* CONFIG_NO_HZ_COMMON */
+
 static void free_rootdomain(struct rcu_head *rcu)
 {
 	struct root_domain *rd = container_of(rcu, struct root_domain, rcu);
@@ -716,12 +806,6 @@ static void update_top_cache_domain(int cpu)
 	rcu_assign_pointer(per_cpu(sd_asym_cpucapacity, cpu), sd);
 }
 
-struct s_data {
-	struct sched_domain_shared * __percpu *sds;
-	struct sched_domain * __percpu *sd;
-	struct root_domain *rd;
-};
-
 /*
  * Attach the domain 'sd' to 'cpu' as its base domain. Callers must
  * hold the hotplug lock.
@@ -790,6 +874,14 @@ cpu_attach_domain(struct s_data *d, int cpu)
 		}
 	}
 
+	/*
+	 * Ensure there is at least one domain in the
+	 * hierarchy with sd->shared attached to
+	 * ensure participation in nohz balancing.
+	 */
+	if (sd && !(sd->flags & SD_SHARE_LLC))
+		assign_fallback_sds(d, sd, cpu);
+
 	sched_domain_debug(sd, cpu);
 
 	tmp = rq->sd;
@@ -2462,12 +2554,19 @@ static void __sdt_free(const struct cpumask *cpu_map)
 
 static int __sds_alloc(struct s_data *d, const struct cpumask *cpu_map)
 {
+	unsigned long *visited_nodes;
 	int j;
 
+	visited_nodes = bitmap_alloc(nr_node_ids, GFP_KERNEL);
+	if (!visited_nodes)
+		return -ENOMEM;
+
 	d->sds = alloc_percpu(struct sched_domain_shared *);
 	if (!d->sds)
 		return -ENOMEM;
 
+	bitmap_zero(visited_nodes, nr_node_ids);
+
 	for_each_cpu(j, cpu_map) {
 		struct sched_domain_shared *sds;
 
@@ -2476,9 +2575,13 @@ static int __sds_alloc(struct s_data *d, const struct cpumask *cpu_map)
 		if (!sds)
 			return -ENOMEM;
 
+		bitmap_set(visited_nodes, cpu_to_node(j), 1);
 		*per_cpu_ptr(d->sds, j) = sds;
 	}
 
+	if (__fallback_sds_alloc(d, visited_nodes))
+		return -ENOMEM;
+
 	return 0;
 }
 
@@ -2492,6 +2595,8 @@ static void __sds_free(struct s_data *d, const struct cpumask *cpu_map)
 	for_each_cpu(j, cpu_map)
 		kfree(*per_cpu_ptr(d->sds, j));
 
+	__fallback_sds_free(d);
+
 	free_percpu(d->sds);
 	d->sds = NULL;
 }
@@ -2730,6 +2835,12 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 		if (lowest_flag_domain(i, SD_CLUSTER))
 			has_cluster = true;
 	}
+
+	/*
+	 * Claim allocations for the fallback shared objects
+	 * if they were assigned during cpu_attach_domain().
+	 */
+	claim_fallback_sds(&d);
 	rcu_read_unlock();
 
 	if (has_asym)
-- 
2.43.0
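`__sds_alloc()` in the diff above marks every node that has a CPU in `cpu_map` in a temporary bitmap and passes it to `__fallback_sds_alloc()`, so fallback objects exist only for covered nodes. A minimal userspace sketch of that node-coverage step follows. The `cpu_node[]` array (standing in for `cpu_to_node()`), the open-coded `mark_node()`/`node_marked()` helpers (standing in for `bitmap_set()` and `for_each_set_bit()`), and `alloc_covered_nodes()` are hypothetical names used only for illustration, not kernel APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Bits per word of the open-coded bitmap. */
#define LONG_BITS (CHAR_BIT * sizeof(unsigned long))

/* Hypothetical stand-in for struct sched_domain_shared. */
struct shared_obj {
	int ref;
};

/* Set the bit for @node; stands in for bitmap_set(map, node, 1). */
static void mark_node(unsigned long *map, int node)
{
	map[node / LONG_BITS] |= 1UL << (node % LONG_BITS);
}

/* Test the bit for @node; used here instead of for_each_set_bit(). */
static bool node_marked(const unsigned long *map, int node)
{
	return map[node / LONG_BITS] & (1UL << (node % LONG_BITS));
}

/*
 * Allocate one zeroed object in fb[n] for every node n that has at least
 * one CPU listed in cpus[]; all other slots are left NULL. cpu_node[] is
 * a hypothetical cpu -> node map. Returns the number of objects
 * allocated, or -1 on allocation failure, leaving nothing allocated.
 */
static int alloc_covered_nodes(const int *cpus, int nr_cpus,
			       const int *cpu_node, int nr_nodes,
			       struct shared_obj **fb)
{
	size_t words;
	unsigned long *visited;
	int count = 0;

	for (int n = 0; n < nr_nodes; n++)
		fb[n] = NULL;
	if (nr_nodes <= 0)
		return 0;

	words = ((size_t)nr_nodes + LONG_BITS - 1) / LONG_BITS;
	visited = calloc(words, sizeof(*visited)); /* zeroed, like bitmap_zalloc() */
	if (!visited)
		return -1;

	for (int i = 0; i < nr_cpus; i++) {
		int node = cpu_node[cpus[i]];

		assert(node >= 0 && node < nr_nodes);
		mark_node(visited, node);
	}

	for (int n = 0; n < nr_nodes; n++) {
		if (!node_marked(visited, n))
			continue;
		fb[n] = calloc(1, sizeof(*fb[n]));
		if (!fb[n]) {
			/* Undo earlier allocations; free(NULL) is a no-op. */
			for (int m = 0; m < n; m++) {
				free(fb[m]);
				fb[m] = NULL;
			}
			free(visited);
			return -1;
		}
		count++;
	}

	/* The bitmap only selects nodes, so release it on every path. */
	free(visited);
	return count;
}
```

Because the bitmap is only needed to choose which nodes receive an object, the sketch frees it before returning on both the success and the failure path. With `cpus = {0, 5, 16}` on a three-node layout where CPUs 0-15 sit on node 0 and CPU 16 on node 2, it allocates objects for nodes 0 and 2 only and leaves `fb[1]` NULL.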