From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Anna-Maria Behnsen, Frederic Weisbecker, Thomas Gleixner
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, K Prateek Nayak, "Gautham R. Shenoy", Swapnil Sapkal, Shrikanth Hegde, Chen Yu
Subject: [RESEND RFC PATCH v2 26/29] [EXPERIMENTAL] sched/fair: Add push task framework
Date: Mon, 8 Dec 2025 09:27:12 +0000
Message-ID: <20251208092744.32737-26-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20251208083602.31898-1-kprateek.nayak@amd.com>
References: <20251208083602.31898-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Vincent Guittot

Add the skeleton for push task infrastructure. The empty
push_fair_task() prototype will be used to implement proactive idle
balancing in subsequent commits.

  [ prateek: Broke off relevant bits from [1] ]

Link: https://lore.kernel.org/all/20250302210539.1563190-6-vincent.guittot@linaro.org/ [1]
Signed-off-by: Vincent Guittot
Signed-off-by: K Prateek Nayak
---
Peter, the plist is still being used since a plist node already exists
in the task_struct which can be reused. Depending on the collective push
effort, we can either settle on the reuse of the plist_node or add a new
list_head for fair tasks.
---
 kernel/sched/fair.c  | 102 +++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |   2 +
 2 files changed, 104 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f622104d54d7..e6ba7bb09a61 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6865,6 +6865,9 @@ requeue_delayed_entity(struct sched_entity *se)
 	clear_delayed(se);
 }
 
+static void fair_remove_pushable_task(struct rq *rq, struct task_struct *p);
+static void fair_add_pushable_task(struct rq *rq, struct task_struct *p);
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -7015,6 +7018,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		h_nr_idle = task_has_idle_policy(p);
 		if (task_sleep || task_delayed || !se->sched_delayed)
 			h_nr_runnable = 1;
+		fair_remove_pushable_task(rq, p);
 	}
 
 	for_each_sched_entity(se) {
@@ -8954,6 +8958,12 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		put_prev_entity(cfs_rq, pse);
 		set_next_entity(cfs_rq, se);
 
+		/*
+		 * The previous task might be eligible for being pushed on
+		 * another cpu if it is still active.
+		 */
+		fair_add_pushable_task(rq, prev);
+
 		__set_next_task_fair(rq, p, true);
 	}
 
@@ -9017,6 +9027,13 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev, struct t
 		cfs_rq = cfs_rq_of(se);
 		put_prev_entity(cfs_rq, se);
 	}
+
+	/*
+	 * The previous task might be eligible for being pushed on another cpu
+	 * if it is still active.
+	 */
+	fair_add_pushable_task(rq, prev);
+
 }
 
 /*
@@ -13028,6 +13045,79 @@ static void nohz_newidle_balance(struct rq *this_rq)
 	atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));
 }
 
+/*
+ * Push based load balancing: It may take several ticks before a nohz idle CPU
+ * is selected for load balancing which is less than ideal for latency
+ * sensitive tasks stuck on overloaded CPUs.
+ *
+ * If a fair task is preempted, opportunistically try pushing to an idle CPU if
+ * the indicators say it is favourable. Since a busy CPU is handling the push,
+ * this is a time sensitive operation.
+ */
+static inline bool fair_push_task(struct rq *rq, struct task_struct *p)
+{
+	if (!task_on_rq_queued(p))
+		return false;
+
+	if (p->se.sched_delayed)
+		return false;
+
+	if (p->nr_cpus_allowed == 1)
+		return false;
+
+	if (task_current_donor(rq, p))
+		return false;
+
+	if (task_current(rq, p))
+		return false;
+
+	return true;
+}
+
+static inline int has_pushable_tasks(struct rq *rq)
+{
+	return !plist_head_empty(&rq->cfs.pushable_tasks);
+}
+
+/*
+ * See if the non running fair tasks on this rq can be sent on other CPUs
+ * that fit better with their profile.
+ */
+static bool push_fair_task(struct rq *rq)
+{
+	return false;
+}
+
+static void push_fair_tasks(struct rq *rq)
+{
+	/* push_fair_task() will return true if it moved a fair task */
+	while (push_fair_task(rq))
+		;
+}
+
+static DEFINE_PER_CPU(struct balance_callback, fair_push_head);
+
+static inline void fair_queue_pushable_tasks(struct rq *rq)
+{
+	if (!has_pushable_tasks(rq))
+		return;
+
+	queue_balance_callback(rq, &per_cpu(fair_push_head, rq->cpu), push_fair_tasks);
+}
+static void fair_remove_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+}
+
+static void fair_add_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	if (fair_push_task(rq, p)) {
+		plist_del(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+		/* Place the task with the greatest chance to be pushed first. */
+		plist_node_init(&p->pushable_tasks, p->prio);
+		plist_add(&p->pushable_tasks, &rq->cfs.pushable_tasks);
+	}
+}
 #else /* !CONFIG_NO_HZ_COMMON: */
 static inline void cpu_sd_exit_nohz_balance(struct rq *rq) { }
 static inline void cpu_sd_reenter_nohz_balance(struct rq *rq) { }
@@ -13039,6 +13129,10 @@ static inline bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle
 }
 
 static inline void nohz_newidle_balance(struct rq *this_rq) { }
+
+static inline void fair_remove_pushable_task(struct rq *rq, struct task_struct *p) { }
+static inline void fair_add_pushable_task(struct rq *rq, struct task_struct *p) { }
+static inline void fair_queue_pushable_tasks(struct rq *rq) { }
 #endif /* !CONFIG_NO_HZ_COMMON */
 
 /*
@@ -13738,6 +13832,8 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 {
 	struct sched_entity *se = &p->se;
 
+	fair_remove_pushable_task(rq, p);
+
 	if (task_on_rq_queued(p)) {
 		/*
 		 * Move the next running task to the front of the list, so our
@@ -13753,6 +13849,11 @@ static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool firs
 	if (hrtick_enabled_fair(rq))
 		hrtick_start_fair(rq, p);
 
+	/*
+	 * Try to push prev task before checking misfit for next task as
+	 * the migration of prev can make next fit the CPU
+	 */
+	fair_queue_pushable_tasks(rq);
 	update_misfit_status(p, rq);
 	sched_fair_update_stop_tick(rq, p);
 }
@@ -13782,6 +13883,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	cfs_rq->tasks_timeline = RB_ROOT_CACHED;
 	cfs_rq->zero_vruntime = (u64)(-(1LL << 20));
+	plist_head_init(&cfs_rq->pushable_tasks);
 	raw_spin_lock_init(&cfs_rq->removed.lock);
 }
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3433de20a249..91928a371588 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -743,6 +743,8 @@ struct cfs_rq {
 	struct list_head	leaf_cfs_rq_list;
 	struct task_group	*tg;	/* group that "owns" this runqueue */
 
+	struct plist_head	pushable_tasks;
+
 	/* Locally cached copy of our task_group's idle value */
 	int			idle;
 
-- 
2.43.0