From nobody Fri Dec 19 13:09:33 2025
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Valentin Schneider, K Prateek Nayak, "Gautham R. Shenoy",
	Swapnil Sapkal, Shrikanth Hegde, Chen Yu
Subject: [RESEND RFC PATCH v2 09/29] sched/fair: Rotate the CPU responsible for busy load balancing
Date: Mon, 8 Dec 2025 09:26:55 +0000
Message-ID: <20251208092744.32737-9-kprateek.nayak@amd.com>
In-Reply-To: <20251208083602.31898-1-kprateek.nayak@amd.com>
References: <20251208083602.31898-1-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

group_balance_cpu() currently always returns the first CPU from
group_balance_mask(). This puts the burden of busy balancing on the same
set of CPUs when the system is under heavy load.

Rotate the CPU responsible for busy load balancing across all the CPUs
in group_balance_mask(). The "busy_balance_cpu" field in "sg->sgc"
tracks the CPU currently responsible for busy balancing in the group.
Since "sg->sgc" is shared by all the CPUs in group_balance_mask(), all
CPUs of the group see the same "busy_balance_cpu". The current
"busy_balance_cpu" is responsible for updating the shared variable with
the next CPU on the mask once it is done attempting balancing.

Although there is an unlikely chance of the current "busy_balance_cpu"
being unable to perform load balancing in a timely manner if it is
running with softirqs disabled, this is no worse than the current
scenario where the first CPU of group_balance_mask() could also be
unavailable for a long time to perform load balancing.

Any hotplug / cpuset operation will rebuild the sched domain hierarchy,
which resets the "busy_balance_cpu" to the first CPU on the updated
group_balance_mask().

Signed-off-by: K Prateek Nayak
---
 kernel/sched/fair.c     | 24 +++++++++++++++++++++++-
 kernel/sched/sched.h    |  1 +
 kernel/sched/topology.c |  5 ++++-
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8f5745495974..e3935903d9c5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11747,7 +11747,7 @@ static int should_we_balance(struct lb_env *env)
 	if (idle_smt != -1)
 		return idle_smt == env->dst_cpu;
 
-	/* Are we the first CPU of this group ? */
+	/* Are we the busy load balancing CPU of this group ? */
 	return group_balance_cpu(sg) == env->dst_cpu;
 }
 
@@ -11773,6 +11773,22 @@ static void update_lb_imbalance_stat(struct lb_env *env, struct sched_domain *sd
 	}
 }
 
+static void update_busy_balance_cpu(int this_cpu, struct lb_env *env)
+{
+	struct sched_group *group = env->sd->groups;
+	int balance_cpu = group_balance_cpu(group);
+
+	/*
+	 * Only the current CPU responsible for busy load balancing
+	 * should update the "busy_balance_cpu" for next instance.
+	 */
+	if (this_cpu != balance_cpu)
+		return;
+
+	balance_cpu = cpumask_next_wrap(balance_cpu, group_balance_mask(group));
+	WRITE_ONCE(group->sgc->busy_balance_cpu, balance_cpu);
+}
+
 /*
  * This flag serializes load-balancing passes over large domains
  * (above the NODE topology level) - only one load-balancing instance
@@ -12075,6 +12091,12 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 out:
 	if (need_unlock)
 		atomic_set_release(&sched_balance_running, 0);
+	/*
+	 * If this was a successful busy balancing attempt,
+	 * update the "busy_balance_cpu" of the group.
+	 */
+	if (!idle && continue_balancing)
+		update_busy_balance_cpu(this_cpu, &env);
 
 	return ld_moved;
 }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b419a4d98461..659e712f348f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2100,6 +2100,7 @@ struct sched_group_capacity {
 	unsigned long max_capacity;	/* Max per-CPU capacity in group */
 	unsigned long next_update;
 	int imbalance;			/* XXX unrelated to capacity but shared group state */
+	int busy_balance_cpu;
 
 	int id;
 
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 14be90af9761..8870b38d4072 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -810,7 +810,7 @@ enum s_alloc {
  */
int group_balance_cpu(struct sched_group *sg)
 {
-	return cpumask_first(group_balance_mask(sg));
+	return READ_ONCE(sg->sgc->busy_balance_cpu);
 }
 
 
@@ -992,6 +992,8 @@ static void init_overlap_sched_group(struct sched_domain *sd,
 	cpu = cpumask_first(mask);
 
 	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
+	sg->sgc->busy_balance_cpu = cpu;
+
 	if (atomic_inc_return(&sg->sgc->ref) == 1)
 		cpumask_copy(group_balance_mask(sg), mask);
 	else
@@ -1211,6 +1213,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
 
 	sg = *per_cpu_ptr(sdd->sg, cpu);
 	sg->sgc = *per_cpu_ptr(sdd->sgc, cpu);
+	sg->sgc->busy_balance_cpu = cpu;
 
 	/* Increase refcounts for claim_allocations: */
 	already_visited = atomic_inc_return(&sg->ref) > 1;
-- 
2.43.0