From nobody Fri Oct  3 21:07:55 2025
From: K Prateek Nayak
To: Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin,
	Christophe Leroy, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, "H. Peter Anvin", Peter Zijlstra,
	Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, Li Chen, Bibo Mao, Mete Durlu,
	Tobias Huschle, Easwar Hariharan, Guo Weikang, "Rafael J. Wysocki",
	Brian Gerst, Patryk Wlazlyn, Swapnil Sapkal, "Yury Norov [NVIDIA]",
	Sudeep Holla, Jonathan Cameron, Andrea Righi, Yicong Yang,
	Ricardo Neri, Tim Chen, Vinicius Costa Gomes
Subject: [PATCH v7 1/8] sched/fair: Get rid of sched_domains_curr_level hack for tl->cpumask()
Date: Tue, 26 Aug 2025 04:13:12 +0000
Message-ID: <20250826041319.1284-2-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20250826041319.1284-1-kprateek.nayak@amd.com>
References: <20250826041319.1284-1-kprateek.nayak@amd.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Peter Zijlstra

Leon [1] and Vinicius [2] noted a topology_span_sane() warning during
their testing starting from v6.16-rc1. The debugging that followed
pointed to tl->mask() for the NODE domain being incorrectly resolved
to that of the highest NUMA domain.

tl->mask() for NODE is set to sd_numa_mask(), which depends on the
global "sched_domains_curr_level" hack. "sched_domains_curr_level" is
set to "tl->numa_level" during the tl traversal in
build_sched_domains() calling sd_init(), but is not reset before
topology_span_sane(). Since "sched_domains_curr_level" still reflects
the old value from build_sched_domains(), topology_span_sane() for the
NODE domain trips when the span of the last NUMA domain overlaps.

Instead of replicating the "sched_domains_curr_level" hack, get rid of
it entirely: pass the whole "sched_domain_topology_level" object to
the tl->cpumask() function to prevent such mishaps in the future.
sd_numa_mask() now directly references "tl->numa_level" instead of
relying on the global "sched_domains_curr_level" hack to index into
sched_domains_numa_masks[].
The original warning was reproducible on the following NUMA topology
reported by Leon:

  $ sudo numactl -H
  available: 5 nodes (0-4)
  node 0 cpus: 0 1
  node 0 size: 2927 MB
  node 0 free: 1603 MB
  node 1 cpus: 2 3
  node 1 size: 3023 MB
  node 1 free: 3008 MB
  node 2 cpus: 4 5
  node 2 size: 3023 MB
  node 2 free: 3007 MB
  node 3 cpus: 6 7
  node 3 size: 3023 MB
  node 3 free: 3002 MB
  node 4 cpus: 8 9
  node 4 size: 3022 MB
  node 4 free: 2718 MB
  node distances:
  node   0   1   2   3   4
    0:  10  39  38  37  36
    1:  39  10  38  37  36
    2:  38  38  10  37  36
    3:  37  37  37  10  36
    4:  36  36  36  36  10

The above topology can be mimicked using the following QEMU cmd that
was used to reproduce the warning and test the fix:

  sudo qemu-system-x86_64 -enable-kvm -cpu host \
  -m 20G -smp cpus=10,sockets=10 -machine q35 \
  -object memory-backend-ram,size=4G,id=m0 \
  -object memory-backend-ram,size=4G,id=m1 \
  -object memory-backend-ram,size=4G,id=m2 \
  -object memory-backend-ram,size=4G,id=m3 \
  -object memory-backend-ram,size=4G,id=m4 \
  -numa node,cpus=0-1,memdev=m0,nodeid=0 \
  -numa node,cpus=2-3,memdev=m1,nodeid=1 \
  -numa node,cpus=4-5,memdev=m2,nodeid=2 \
  -numa node,cpus=6-7,memdev=m3,nodeid=3 \
  -numa node,cpus=8-9,memdev=m4,nodeid=4 \
  -numa dist,src=0,dst=1,val=39 \
  -numa dist,src=0,dst=2,val=38 \
  -numa dist,src=0,dst=3,val=37 \
  -numa dist,src=0,dst=4,val=36 \
  -numa dist,src=1,dst=0,val=39 \
  -numa dist,src=1,dst=2,val=38 \
  -numa dist,src=1,dst=3,val=37 \
  -numa dist,src=1,dst=4,val=36 \
  -numa dist,src=2,dst=0,val=38 \
  -numa dist,src=2,dst=1,val=38 \
  -numa dist,src=2,dst=3,val=37 \
  -numa dist,src=2,dst=4,val=36 \
  -numa dist,src=3,dst=0,val=37 \
  -numa dist,src=3,dst=1,val=37 \
  -numa dist,src=3,dst=2,val=37 \
  -numa dist,src=3,dst=4,val=36 \
  -numa dist,src=4,dst=0,val=36 \
  -numa dist,src=4,dst=1,val=36 \
  -numa dist,src=4,dst=2,val=36 \
  -numa dist,src=4,dst=3,val=36 \
  ...
[ prateek: Fixed build issues on s390 and ppc, put everything behind
  the respective CONFIG_SCHED_* ]

Reported-by: Leon Romanovsky
Closes: https://lore.kernel.org/lkml/20250610110701.GA256154@unreal/ [1]
Fixes: ccf74128d66c ("sched/topology: Assert non-NUMA topology masks don't (partially) overlap") # ce29a7da84cd, f55dac1dafb3
Link: https://lore.kernel.org/lkml/a3de98387abad28592e6ab591f3ff6107fe01dc1.1755893468.git.tim.c.chen@linux.intel.com/ [2]
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: K Prateek Nayak
---
 arch/powerpc/kernel/smp.c      | 26 +++++++++++-----
 arch/s390/kernel/topology.c    | 20 +++++++++----
 arch/x86/kernel/smpboot.c      | 30 ++++++++++++++++---
 include/linux/sched/topology.h |  4 ++-
 include/linux/topology.h       |  2 +-
 kernel/sched/topology.c        | 54 ++++++++++++++++++++++------------
 6 files changed, 99 insertions(+), 37 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index f59e4b9cc207..862f50c09539 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1028,16 +1028,21 @@ static int powerpc_shared_proc_flags(void)
  * We can't just pass cpu_l2_cache_mask() directly because
  * returns a non-const pointer and the compiler barfs on that.
  */
-static const struct cpumask *shared_cache_mask(int cpu)
+static const struct cpumask *shared_cache_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return per_cpu(cpu_l2_cache_map, cpu);
 }
 
 #ifdef CONFIG_SCHED_SMT
-static const struct cpumask *smallcore_smt_mask(int cpu)
+static const struct cpumask *smallcore_smt_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_smallcore_mask(cpu);
 }
+
+static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
 #endif
 
 static struct cpumask *cpu_coregroup_mask(int cpu)
@@ -1054,11 +1059,16 @@ static bool has_coregroup_support(void)
 	return coregroup_enabled;
 }
 
-static const struct cpumask *cpu_mc_mask(int cpu)
+static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return cpu_coregroup_mask(cpu);
 }
 
+static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 static int __init init_big_cores(void)
 {
 	int cpu;
@@ -1448,7 +1458,7 @@ static bool update_mask_by_l2(int cpu, cpumask_var_t *mask)
 		return false;
 	}
 
-	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update l2-cache mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
@@ -1538,7 +1548,7 @@ static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 		return;
 	}
 
-	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_node_mask(cpu));
 
 	/* Update coregroup mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
@@ -1601,7 +1611,7 @@ static void add_cpu_to_masks(int cpu)
 
 	/* If chip_id is -1; limit the cpu_core_mask to within PKG */
 	if (chip_id == -1)
-		cpumask_and(mask, mask, cpu_cpu_mask(cpu));
+		cpumask_and(mask, mask, cpu_node_mask(cpu));
 
 	for_each_cpu(i, mask) {
 		if (chip_id == cpu_to_chip_id(i)) {
@@ -1703,7 +1713,7 @@ static void __init build_sched_topology(void)
 		powerpc_topology[i++] = SDTL_INIT(smallcore_smt_mask, powerpc_smt_flags, SMT);
 	} else {
-		powerpc_topology[i++] = SDTL_INIT(cpu_smt_mask, powerpc_smt_flags, SMT);
+		powerpc_topology[i++] = SDTL_INIT(tl_smt_mask, powerpc_smt_flags, SMT);
 	}
 #endif
 	if (shared_caches) {
@@ -1716,7 +1726,7 @@ static void __init build_sched_topology(void)
 		SDTL_INIT(cpu_mc_mask, powerpc_shared_proc_flags, MC);
 	}
 
-	powerpc_topology[i++] = SDTL_INIT(cpu_cpu_mask, powerpc_shared_proc_flags, PKG);
+	powerpc_topology[i++] = SDTL_INIT(cpu_pkg_mask, powerpc_shared_proc_flags, PKG);
 
 	/* There must be one trailing NULL entry left. */
 	BUG_ON(i >= ARRAY_SIZE(powerpc_topology) - 1);
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 46569b8e47dd..5129e3ffa7f5 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -509,7 +509,7 @@ int topology_cpu_init(struct cpu *cpu)
 	return rc;
 }
 
-static const struct cpumask *cpu_thread_mask(int cpu)
+static const struct cpumask *cpu_thread_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].thread_mask;
 }
@@ -520,22 +520,32 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return &cpu_topology[cpu].core_mask;
 }
 
-static const struct cpumask *cpu_book_mask(int cpu)
+static const struct cpumask *cpu_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return &cpu_topology[cpu].core_mask;
+}
+
+static const struct cpumask *cpu_book_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].book_mask;
 }
 
-static const struct cpumask *cpu_drawer_mask(int cpu)
+static const struct cpumask *cpu_drawer_mask(struct sched_domain_topology_level *tl, int cpu)
 {
 	return &cpu_topology[cpu].drawer_mask;
 }
 
+static const struct cpumask *cpu_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 static struct sched_domain_topology_level s390_topology[] = {
 	SDTL_INIT(cpu_thread_mask, cpu_smt_flags, SMT),
-	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
+	SDTL_INIT(cpu_mc_mask, cpu_core_flags, MC),
 	SDTL_INIT(cpu_book_mask, NULL, BOOK),
 	SDTL_INIT(cpu_drawer_mask, NULL, DRAWER),
-	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
+	SDTL_INIT(cpu_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33e166f6ab12..4cd3d69741cf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -463,14 +463,36 @@ static int x86_core_flags(void)
 {
 	return cpu_core_flags() | x86_sched_itmt_flags();
 }
+
+static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
 #endif
+
 #ifdef CONFIG_SCHED_CLUSTER
 static int x86_cluster_flags(void)
 {
 	return cpu_cluster_flags() | x86_sched_itmt_flags();
 }
+static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_clustergroup_mask(cpu);
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
 #endif
 
+static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 /*
  * Set if a package/die has multiple NUMA nodes inside.
  * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
@@ -479,14 +501,14 @@ static int x86_cluster_flags(void)
 static bool x86_has_numa_in_package;
 
 static struct sched_domain_topology_level x86_topology[] = {
-	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #ifdef CONFIG_SCHED_CLUSTER
-	SDTL_INIT(cpu_clustergroup_mask, x86_cluster_flags, CLS),
+	SDTL_INIT(tl_cls_mask, x86_cluster_flags, CLS),
 #endif
 #ifdef CONFIG_SCHED_MC
-	SDTL_INIT(cpu_coregroup_mask, x86_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, x86_core_flags, MC),
 #endif
-	SDTL_INIT(cpu_cpu_mask, x86_sched_itmt_flags, PKG),
+	SDTL_INIT(tl_pkg_mask, x86_sched_itmt_flags, PKG),
 	{ NULL },
 };
 
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 5263746b63e8..602508130c8a 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -30,6 +30,8 @@ struct sd_flag_debug {
 };
 extern const struct sd_flag_debug sd_flag_debug[];
 
+struct sched_domain_topology_level;
+
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
@@ -172,7 +174,7 @@ bool cpus_equal_capacity(int this_cpu, int that_cpu);
 bool cpus_share_cache(int this_cpu, int that_cpu);
 bool cpus_share_resources(int this_cpu, int that_cpu);
 
-typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
+typedef const struct cpumask *(*sched_domain_mask_f)(struct sched_domain_topology_level *tl, int cpu);
 typedef int (*sched_domain_flags_f)(void);
 
 struct sd_data {
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 33b7fda97d39..6575af39fd10 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -260,7 +260,7 @@ static inline bool topology_is_primary_thread(unsigned int cpu)
 
 #endif
 
-static inline const struct cpumask *cpu_cpu_mask(int cpu)
+static inline const struct cpumask *cpu_node_mask(int cpu)
 {
 	return cpumask_of_node(cpu_to_node(cpu));
 }
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 977e133bb8a4..dfc754e0668c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1591,7 +1591,6 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 enum numa_topology_type sched_numa_topology_type;
 
 static int sched_domains_numa_levels;
-static int sched_domains_curr_level;
 
 int sched_max_numa_distance;
 static int *sched_domains_numa_distance;
@@ -1632,14 +1631,7 @@ sd_init(struct sched_domain_topology_level *tl,
 	int sd_id, sd_weight, sd_flags = 0;
 	struct cpumask *sd_span;
 
-#ifdef CONFIG_NUMA
-	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
-	 */
-	sched_domains_curr_level = tl->numa_level;
-#endif
-
-	sd_weight = cpumask_weight(tl->mask(cpu));
+	sd_weight = cpumask_weight(tl->mask(tl, cpu));
 
 	if (tl->sd_flags)
 		sd_flags = (*tl->sd_flags)();
@@ -1677,7 +1669,7 @@ sd_init(struct sched_domain_topology_level *tl,
 	};
 
 	sd_span = sched_domain_span(sd);
-	cpumask_and(sd_span, cpu_map, tl->mask(cpu));
+	cpumask_and(sd_span, cpu_map, tl->mask(tl, cpu));
 	sd_id = cpumask_first(sd_span);
 
 	sd->flags |= asym_cpu_capacity_classify(sd_span, cpu_map);
@@ -1732,22 +1724,48 @@ sd_init(struct sched_domain_topology_level *tl,
 	return sd;
 }
 
+#ifdef CONFIG_SCHED_SMT
+static const struct cpumask *tl_smt_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_smt_mask(cpu);
+}
+#endif
+
+#ifdef CONFIG_SCHED_CLUSTER
+static const struct cpumask *tl_cls_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_clustergroup_mask(cpu);
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static const struct cpumask *tl_mc_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_coregroup_mask(cpu);
+}
+#endif
+
+static const struct cpumask *tl_pkg_mask(struct sched_domain_topology_level *tl, int cpu)
+{
+	return cpu_node_mask(cpu);
+}
+
 /*
  * Topology list, bottom-up.
  */
 static struct sched_domain_topology_level default_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	SDTL_INIT(cpu_smt_mask, cpu_smt_flags, SMT),
+	SDTL_INIT(tl_smt_mask, cpu_smt_flags, SMT),
 #endif
 
 #ifdef CONFIG_SCHED_CLUSTER
-	SDTL_INIT(cpu_clustergroup_mask, cpu_cluster_flags, CLS),
+	SDTL_INIT(tl_cls_mask, cpu_cluster_flags, CLS),
 #endif
 
 #ifdef CONFIG_SCHED_MC
-	SDTL_INIT(cpu_coregroup_mask, cpu_core_flags, MC),
+	SDTL_INIT(tl_mc_mask, cpu_core_flags, MC),
 #endif
-	SDTL_INIT(cpu_cpu_mask, NULL, PKG),
+	SDTL_INIT(tl_pkg_mask, NULL, PKG),
 	{ NULL, },
 };
 
@@ -1769,9 +1787,9 @@ void __init set_sched_topology(struct sched_domain_topology_level *tl)
 
 #ifdef CONFIG_NUMA
 
-static const struct cpumask *sd_numa_mask(int cpu)
+static const struct cpumask *sd_numa_mask(struct sched_domain_topology_level *tl, int cpu)
 {
-	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
+	return sched_domains_numa_masks[tl->numa_level][cpu_to_node(cpu)];
 }
 
 static void sched_numa_warn(const char *str)
@@ -2411,7 +2429,7 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
 	 * breaks the linking done for an earlier span.
 	 */
 	for_each_cpu(cpu, cpu_map) {
-		const struct cpumask *tl_cpu_mask = tl->mask(cpu);
+		const struct cpumask *tl_cpu_mask = tl->mask(tl, cpu);
 		int id;
 
 		/* lowest bit set in this mask is used as a unique id */
@@ -2419,7 +2437,7 @@ static bool topology_span_sane(const struct cpumask *cpu_map)
 
 		if (cpumask_test_cpu(id, id_seen)) {
 			/* First CPU has already been seen, ensure identical spans */
-			if (!cpumask_equal(tl->mask(id), tl_cpu_mask))
+			if (!cpumask_equal(tl->mask(tl, id), tl_cpu_mask))
 				return false;
 		} else {
 			/* First CPU hasn't been seen before, ensure it's a completely new span */
-- 
2.34.1