From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Valentin Schneider, K Prateek Nayak, "Gautham R. Shenoy",
 Swapnil Sapkal, Shrikanth Hegde, Chen Yu
Subject: [RESEND RFC PATCH v2 27/29] [EXPERIMENTAL] sched/fair: Proactive idle balance using push mechanism
Date: Mon, 8 Dec 2025 09:27:13 +0000
Message-ID: <20251208092744.32737-27-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251208083602.31898-1-kprateek.nayak@amd.com>
References: <20251208083602.31898-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Proactively try to push tasks to one of the CPUs in the sd_nohz domain
if the "nr_idle_cpus" indicator shows that idle CPUs are present.

pick_next_pushable_fair_task() is taken from Vincent's series [1] as is,
but the locking rules in push_fair_task() have been replaced with an IPI
based __ttwu_queue_wakelist(). A few additional checks have been added
to catch any corner cases with proxy execution where neither the current
nor the donor is pushable.

For the sake of this PoC, the __ttwu_queue_wakelist() based mechanism is
being used as is. If folks are in agreement with wider use of the
wakelist based migration, we can work on making this wakelist based
activation path more generic to be used for migrations too.

Although it is logical to traverse only
"sd_nohz->shared->nohz_idle_cpus", testing found that traversing the
entire span was more beneficial.

Link: https://lore.kernel.org/all/20250302210539.1563190-6-vincent.guittot@linaro.org/ [1]
Signed-off-by: K Prateek Nayak
---
 kernel/sched/core.c  |  2 +-
 kernel/sched/fair.c  | 93 +++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |  1 +
 3 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 35cb640b7266..388805c4436c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3769,7 +3769,7 @@ bool call_function_single_prep_ipi(int cpu)
  * via sched_ttwu_wakeup() for activation so the wakee incurs the cost
  * of the wakeup instead of the waker.
  */
-static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
+void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
 {
 	struct rq *rq = cpu_rq(cpu);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e6ba7bb09a61..34aeb8e58e0b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13079,12 +13079,103 @@ static inline int has_pushable_tasks(struct rq *rq)
 	return !plist_head_empty(&rq->cfs.pushable_tasks);
 }
 
+static struct task_struct *pick_next_pushable_fair_task(struct rq *rq)
+{
+	struct task_struct *p;
+
+	if (!has_pushable_tasks(rq))
+		return NULL;
+
+	p = plist_first_entry(&rq->cfs.pushable_tasks,
+			      struct task_struct, pushable_tasks);
+
+	WARN_ON_ONCE(rq->cpu != task_cpu(p));
+	WARN_ON_ONCE(task_current(rq, p));
+	WARN_ON_ONCE(task_current_donor(rq, p));
+	WARN_ON_ONCE(p->nr_cpus_allowed <= 1);
+	WARN_ON_ONCE(!task_on_rq_queued(p));
+
+	/*
+	 * Remove task from the pushable list as we try only once after that
+	 * the task has been put back in enqueued list.
+	 */
+	plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+
+	return p;
+}
+
+static inline bool should_push_tasks(struct rq *rq)
+{
+	struct sched_domain_shared *sds;
+	struct sched_domain *sd;
+	int cpu = cpu_of(rq);
+
+	/* TODO: Add a CPU local failure counter. */
+
+	/* CPU doesn't have any fair task to push. */
+	if (!has_pushable_tasks(rq))
+		return false;
+
+	/* CPU is overloaded! Do not waste cycles pushing tasks. */
+	if (!fits_capacity(cpu_util_cfs(cpu), capacity_of(cpu)))
+		return false;
+
+	guard(rcu)();
+
+	sd = rcu_dereference(per_cpu(sd_nohz, cpu));
+	if (!sd)
+		return false;
+
+	/*
+	 * We may not be able to find a push target.
+	 * Skip for this tick and depend on the periodic
+	 * balance to pull the queued tasks.
+	 */
+	sds = sd->shared;
+	if (!sds || !atomic_read(&sds->nr_idle_cpus))
+		return false;
+
+	return true;
+}
+
 /*
  * See if the non running fair tasks on this rq can be sent on other CPUs
  * that fits better with their profile.
  */
 static bool push_fair_task(struct rq *rq)
 {
+	struct task_struct *p = pick_next_pushable_fair_task(rq);
+	struct sched_domain_shared *sds;
+	int cpu, this_cpu = cpu_of(rq);
+	struct sched_domain *sd;
+
+	if (!p)
+		return false;
+
+	guard(rcu)();
+
+	sd = rcu_dereference(per_cpu(sd_nohz, this_cpu));
+	if (!sd)
+		return false;
+
+	/*
+	 * It is possible to have idle CPUs with ticks enabled. To maximize the chance
+	 * of pulling a task, traverse the entire sched_domain_span() instead of just
+	 * the sd->shared->nohz_idle_cpus.
+	 */
+	for_each_cpu_and_wrap(cpu, p->cpus_ptr, sched_domain_span(sd), this_cpu + 1) {
+		struct rq *target_rq;
+
+		if (!idle_cpu(cpu))
+			continue;
+
+		target_rq = cpu_rq(cpu);
+		deactivate_task(rq, p, 0);
+		set_task_cpu(p, cpu);
+		__ttwu_queue_wakelist(p, cpu, 0);
+		return true;
+	}
+
 	return false;
 }
 
@@ -13099,7 +13190,7 @@ static DEFINE_PER_CPU(struct balance_callback, fair_push_head);
 
 static inline void fair_queue_pushable_tasks(struct rq *rq)
 {
-	if (!has_pushable_tasks(rq))
+	if (!should_push_tasks(rq))
 		return;
 
 	queue_balance_callback(rq, &per_cpu(fair_push_head, rq->cpu), push_fair_tasks);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 91928a371588..451666753c2a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2939,6 +2939,7 @@ static inline void __block_task(struct rq *rq, struct task_struct *p)
 
 extern void activate_task(struct rq *rq, struct task_struct *p, int flags);
 extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags);
+void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags);
 
 extern void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags);
 
-- 
2.43.0
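
For readers following the target selection in push_fair_task(), the scan
can be modelled in userspace. This is only a minimal sketch, assuming
every cpumask fits in one 64-bit word; cpu_mask_t, mask_test() and
find_push_target() are hypothetical names used for illustration and are
not kernel APIs. The scan starts at this_cpu + 1, wraps around, and
returns the first idle CPU that is both in the task's affinity mask and
in the domain span.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a cpumask: bit N set means CPU N. */
typedef uint64_t cpu_mask_t;

static bool mask_test(cpu_mask_t mask, int cpu)
{
	return (mask >> cpu) & 1u;
}

/*
 * Illustrative model of the wrap-around scan: visit CPUs starting at
 * this_cpu + 1, skip those outside (allowed & span), and return the
 * first one marked idle. Returns -1 when no push target exists.
 */
static int find_push_target(cpu_mask_t allowed, cpu_mask_t span,
			    cpu_mask_t idle, int this_cpu, int nr_cpus)
{
	cpu_mask_t candidates = allowed & span;

	assert(nr_cpus > 0 && nr_cpus <= 64);
	assert(this_cpu >= 0 && this_cpu < nr_cpus);

	for (int i = 0; i < nr_cpus; i++) {
		int cpu = (this_cpu + 1 + i) % nr_cpus;

		if (!mask_test(candidates, cpu))
			continue;
		if (!mask_test(idle, cpu))
			continue;

		return cpu;
	}

	return -1;
}
```

With eight CPUs, CPUs 1 and 5 idle and no affinity restriction,
find_push_target(0xFF, 0xFF, 0x22, 2, 8) visits CPUs 3, 4 and 5 and
returns 5. Starting the scan past the source CPU spreads pushes from
different CPUs across different targets instead of always favouring the
lowest-numbered idle CPU.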