From nobody Mon Jun 8 04:25:14 2026
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Aaron Lu, Josh Don, K Prateek Nayak
Subject: [PATCH v2 1/5] sched/fair: Convert cfs bandwidth throttling to use guards
Date: Tue, 2 Jun 2026 05:00:01 +0000
Message-ID: <20260602050005.11160-2-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260602050005.11160-1-kprateek.nayak@amd.com>
References: <20260602050005.11160-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Routine conversion of rcu_read_lock(), spin_lock*, and rq_lock usage
within the cfs bandwidth controller to use class guards. The only
notable changes are:

 - Checking for "cfs_rq->runtime_remaining <= 0" instead of the inverse
   to spot a throttle and break early. This also saves the need for
   extra indentation in the unthrottle case.

 - Reordering of list_del_rcu() against the throttled_clock indicator
   update in unthrottle_cfs_rq(). Both are done with "cfs_b->lock" held
   after "cfs_rq->throttled" is cleared, which makes the reordering
   safe against concurrent list modifications.

No functional changes intended.

Reviewed-by: Ben Segall
Signed-off-by: K Prateek Nayak
Tested-by: Aaron Lu
---
v1..v2:

o Addressed comments to keep the call to distribute_cfs_runtime() in
  do_sched_cfs_slack_timer() and used scoped_guard() instead.
  (Ben)

o Instead of adding back "else { throttled = true; }" in
  distribute_cfs_runtime(), invert the condition to
  "cfs_rq->runtime_remaining <= 0" and break early after setting
  "throttled = true". (Indirectly addresses Ben's comment to keep it
  clear as to which condition leads to "throttled" being set to true.)

o Collected tags from Ben and Aaron. (Thanks a ton!)
---
 kernel/sched/fair.c | 193 +++++++++++++++++++++-----------------------
 1 file changed, 90 insertions(+), 103 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d030fccfa4b4..766cd4395e6c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5006,13 +5006,13 @@ static void __maybe_unused clear_tg_offline_cfs_rqs(struct rq *rq)
 	 */
 	rq_clock_start_loop_update(rq);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(tg, &task_groups, list) {
 		struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 
 		clear_tg_load_avg(cfs_rq);
 	}
-	rcu_read_unlock();
 
 	rq_clock_stop_loop_update(rq);
 }
@@ -6511,13 +6511,10 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b,
 static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	int ret;
 
-	raw_spin_lock(&cfs_b->lock);
-	ret = __assign_cfs_rq_runtime(cfs_b, cfs_rq, sched_cfs_bandwidth_slice());
-	raw_spin_unlock(&cfs_b->lock);
+	guard(raw_spinlock)(&cfs_b->lock);
 
-	return ret;
+	return __assign_cfs_rq_runtime(cfs_b, cfs_rq, sched_cfs_bandwidth_slice());
 }
 
 static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
@@ -6806,33 +6803,32 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	int dequeue = 1;
 
-	raw_spin_lock(&cfs_b->lock);
-	/* This will start the period timer if necessary */
-	if (__assign_cfs_rq_runtime(cfs_b, cfs_rq, 1)) {
+	scoped_guard(raw_spinlock, &cfs_b->lock) {
 		/*
-		 * We have raced with bandwidth becoming available, and if we
-		 * actually throttled the timer might not unthrottle us for an
-		 * entire period. We additionally needed to make sure that any
-		 * subsequent check_cfs_rq_runtime calls agree not to throttle
-		 * us, as we may commit to do cfs put_prev+pick_next, so we ask
-		 * for 1ns of runtime rather than just check cfs_b.
+		 * Check if we have raced with bandwidth becoming available. If
+		 * we actually throttled the timer might not unthrottle us for
+		 * an entire period. We additionally needed to make sure that
+		 * any subsequent check_cfs_rq_runtime calls agree not to
+		 * throttle us, as we may commit to do cfs put_prev+pick_next,
+		 * so we ask for 1ns of runtime rather than just check cfs_b.
+		 *
+		 * This will start the period timer if necessary.
+		 */
+		if (__assign_cfs_rq_runtime(cfs_b, cfs_rq, 1))
+			return false;
+
+		/*
+		 * No bandwidth available; add ourselves on the list to be
+		 * unthrottled later.
 		 */
-		dequeue = 0;
-	} else {
 		list_add_tail_rcu(&cfs_rq->throttled_list,
 				  &cfs_b->throttled_cfs_rq);
 	}
-	raw_spin_unlock(&cfs_b->lock);
-
-	if (!dequeue)
-		return false;  /* Throttle no longer required. */
 
 	/* freeze hierarchy runnable averages while throttled */
-	rcu_read_lock();
-	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
-	rcu_read_unlock();
+	scoped_guard(rcu)
+		walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 
 	/*
 	 * Note: distribution will already see us throttled via the
@@ -6865,13 +6861,15 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 
 	update_rq_clock(rq);
 
-	raw_spin_lock(&cfs_b->lock);
-	if (cfs_rq->throttled_clock) {
+	scoped_guard(raw_spinlock, &cfs_b->lock) {
+		list_del_rcu(&cfs_rq->throttled_list);
+
+		if (!cfs_rq->throttled_clock)
+			break;
+
 		cfs_b->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
 		cfs_rq->throttled_clock = 0;
 	}
-	list_del_rcu(&cfs_rq->throttled_list);
-	raw_spin_unlock(&cfs_b->lock);
 
 	/* update hierarchical throttle state */
 	walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);
@@ -6900,9 +6898,8 @@ static void __cfsb_csd_unthrottle(void *arg)
 {
 	struct cfs_rq *cursor, *tmp;
 	struct rq *rq = arg;
-	struct rq_flags rf;
 
-	rq_lock(rq, &rf);
+	guard(rq_lock)(rq);
 
 	/*
 	 * Iterating over the list can trigger several call to
@@ -6919,7 +6916,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 	 * race with group being freed in the window between removing it
 	 * from the list and advancing to the next entry in the list.
 	 */
-	rcu_read_lock();
+	guard(rcu)();
 
 	list_for_each_entry_safe(cursor, tmp, &rq->cfsb_csd_list,
 				 throttled_csd_list) {
@@ -6929,10 +6926,7 @@ static void __cfsb_csd_unthrottle(void *arg)
 			unthrottle_cfs_rq(cursor);
 	}
 
-	rcu_read_unlock();
-
 	rq_clock_stop_loop_update(rq);
-	rq_unlock(rq, &rf);
 }
 
 static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
@@ -6972,11 +6966,11 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 	u64 runtime, remaining = 1;
 	bool throttled = false;
 	struct cfs_rq *cfs_rq, *tmp;
-	struct rq_flags rf;
 	struct rq *rq;
 	LIST_HEAD(local_unthrottle);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(cfs_rq, &cfs_b->throttled_cfs_rq,
 				throttled_list) {
 		rq = rq_of(cfs_rq);
@@ -6986,65 +6980,63 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 			break;
 		}
 
-		rq_lock_irqsave(rq, &rf);
+		guard(rq_lock_irqsave)(rq);
+
 		if (!cfs_rq_throttled(cfs_rq))
-			goto next;
+			continue;
 
 		/* Already queued for async unthrottle */
 		if (!list_empty(&cfs_rq->throttled_csd_list))
-			goto next;
+			continue;
 
 		/* By the above checks, this should never be true */
 		WARN_ON_ONCE(cfs_rq->runtime_remaining > 0);
 
-		raw_spin_lock(&cfs_b->lock);
-		runtime = -cfs_rq->runtime_remaining + 1;
-		if (runtime > cfs_b->runtime)
-			runtime = cfs_b->runtime;
-		cfs_b->runtime -= runtime;
-		remaining = cfs_b->runtime;
-		raw_spin_unlock(&cfs_b->lock);
+		scoped_guard(raw_spinlock, &cfs_b->lock) {
+			runtime = -cfs_rq->runtime_remaining + 1;
+			if (runtime > cfs_b->runtime)
+				runtime = cfs_b->runtime;
+			cfs_b->runtime -= runtime;
+			remaining = cfs_b->runtime;
+		}
 
 		cfs_rq->runtime_remaining += runtime;
 
-		/* we check whether we're throttled above */
-		if (cfs_rq->runtime_remaining > 0) {
-			if (cpu_of(rq) != this_cpu) {
-				unthrottle_cfs_rq_async(cfs_rq);
-			} else {
-				/*
-				 * We currently only expect to be unthrottling
-				 * a single cfs_rq locally.
-				 */
-				WARN_ON_ONCE(!list_empty(&local_unthrottle));
-				list_add_tail(&cfs_rq->throttled_csd_list,
-					      &local_unthrottle);
-			}
-		} else {
+		/*
+		 * Ran out of bandwidth during distribution!
+		 * Indicate throttled entities and break early.
+		 */
+		if (cfs_rq->runtime_remaining <= 0) {
 			throttled = true;
+			break;
 		}
 
-next:
-		rq_unlock_irqrestore(rq, &rf);
+		/* we check whether we're throttled above */
+		if (cpu_of(rq) != this_cpu) {
+			unthrottle_cfs_rq_async(cfs_rq);
+			continue;
+		}
+
+		/*
+		 * We currently only expect to be unthrottling
+		 * a single cfs_rq locally.
+		 */
+		WARN_ON_ONCE(!list_empty(&local_unthrottle));
+		list_add_tail(&cfs_rq->throttled_csd_list, &local_unthrottle);
 	}
 
 	list_for_each_entry_safe(cfs_rq, tmp, &local_unthrottle,
 				 throttled_csd_list) {
 		struct rq *rq = rq_of(cfs_rq);
 
-		rq_lock_irqsave(rq, &rf);
+		guard(rq_lock_irqsave)(rq);
 
 		list_del_init(&cfs_rq->throttled_csd_list);
-
 		if (cfs_rq_throttled(cfs_rq))
 			unthrottle_cfs_rq(cfs_rq);
-
-		rq_unlock_irqrestore(rq, &rf);
 	}
 	WARN_ON_ONCE(!list_empty(&local_unthrottle));
 
-	rcu_read_unlock();
-
 	return throttled;
 }
 
@@ -7167,7 +7159,8 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	if (slack_runtime <= 0)
 		return;
 
-	raw_spin_lock(&cfs_b->lock);
+	guard(raw_spinlock)(&cfs_b->lock);
+
 	if (cfs_b->quota != RUNTIME_INF) {
 		cfs_b->runtime += slack_runtime;
 
@@ -7176,7 +7169,6 @@ static void __return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 		    !list_empty(&cfs_b->throttled_cfs_rq))
 			start_cfs_slack_bandwidth(cfs_b);
 	}
-	raw_spin_unlock(&cfs_b->lock);
 
 	/* even if it's not valid for return we don't want to try again */
 	cfs_rq->runtime_remaining -= slack_runtime;
@@ -7199,25 +7191,21 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
  */
 static void do_sched_cfs_slack_timer(struct cfs_bandwidth *cfs_b)
 {
-	u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
-	unsigned long flags;
-
 	/* confirm we're still not at a refresh boundary */
-	raw_spin_lock_irqsave(&cfs_b->lock, flags);
-	cfs_b->slack_started = false;
+	scoped_guard(raw_spinlock_irqsave, &cfs_b->lock) {
+		u64 runtime = 0, slice = sched_cfs_bandwidth_slice();
 
-	if (runtime_refresh_within(cfs_b, min_bandwidth_expiration)) {
-		raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
-		return;
-	}
+		cfs_b->slack_started = false;
 
-	if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
-		runtime = cfs_b->runtime;
+		if (runtime_refresh_within(cfs_b, min_bandwidth_expiration))
+			return;
 
-	raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
+		if (cfs_b->quota != RUNTIME_INF && cfs_b->runtime > slice)
+			runtime = cfs_b->runtime;
 
-	if (!runtime)
-		return;
+		if (!runtime)
+			return;
+	}
 
 	distribute_cfs_runtime(cfs_b);
 }
@@ -7306,18 +7294,18 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
 		container_of(timer, struct cfs_bandwidth, period_timer);
-	unsigned long flags;
 	int overrun;
 	int idle = 0;
 	int count = 0;
 
-	raw_spin_lock_irqsave(&cfs_b->lock, flags);
+	CLASS(raw_spinlock_irqsave, cfsb_guard)(&cfs_b->lock);
+
 	for (;;) {
 		overrun = hrtimer_forward_now(timer, cfs_b->period);
 		if (!overrun)
 			break;
 
-		idle = do_sched_cfs_period_timer(cfs_b, overrun, flags);
+		idle = do_sched_cfs_period_timer(cfs_b, overrun, cfsb_guard.flags);
 
 		if (++count > 3) {
 			u64 new, old = ktime_to_ns(cfs_b->period);
@@ -7350,11 +7338,13 @@ static enum hrtimer_restart sched_cfs_period_timer(struct hrtimer *timer)
 			count = 0;
 		}
 	}
-	if (idle)
+
+	if (idle) {
 		cfs_b->period_active = 0;
-	raw_spin_unlock_irqrestore(&cfs_b->lock, flags);
+		return HRTIMER_NORESTART;
+	}
 
-	return idle ? HRTIMER_NORESTART : HRTIMER_RESTART;
+	return HRTIMER_RESTART;
 }
 
 void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent)
@@ -7421,14 +7411,12 @@ static void destroy_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
 	 */
 	for_each_possible_cpu(i) {
 		struct rq *rq = cpu_rq(i);
-		unsigned long flags;
 
 		if (list_empty(&rq->cfsb_csd_list))
 			continue;
 
-		local_irq_save(flags);
-		__cfsb_csd_unthrottle(rq);
-		local_irq_restore(flags);
+		scoped_guard(irqsave)
+			__cfsb_csd_unthrottle(rq);
 	}
 }
 
@@ -7446,16 +7434,15 @@ static void __maybe_unused update_runtime_enabled(struct rq *rq)
 
 	lockdep_assert_rq_held(rq);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(tg, &task_groups, list) {
 		struct cfs_bandwidth *cfs_b = &tg->cfs_bandwidth;
 		struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 
-		raw_spin_lock(&cfs_b->lock);
-		cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
-		raw_spin_unlock(&cfs_b->lock);
+		scoped_guard(raw_spinlock, &cfs_b->lock)
+			cfs_rq->runtime_enabled = cfs_b->quota != RUNTIME_INF;
 	}
-	rcu_read_unlock();
 }
 
 /* cpu offline callback */
@@ -7476,7 +7463,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 	 */
 	rq_clock_start_loop_update(rq);
 
-	rcu_read_lock();
+	guard(rcu)();
+
 	list_for_each_entry_rcu(tg, &task_groups, list) {
 		struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 
@@ -7499,7 +7487,6 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
 		cfs_rq->runtime_remaining = 1;
 		unthrottle_cfs_rq(cfs_rq);
 	}
-	rcu_read_unlock();
 
 	rq_clock_stop_loop_update(rq);
 }
-- 
2.34.1

From nobody Mon Jun 8 04:25:14 2026
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Aaron Lu, Josh Don, K Prateek Nayak
Subject: [PATCH v2 2/5] sched/fair: Use throttled_csd_list for local unthrottle
Date: Tue, 2 Jun 2026 05:00:02 +0000
Message-ID: <20260602050005.11160-3-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260602050005.11160-1-kprateek.nayak@amd.com>
References: <20260602050005.11160-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When distribute_cfs_runtime() encounters a local cfs_rq, it adds it to a
local list and unthrottles it at the end, when it is done unthrottling
other cfs_rq(s) on cfs_b->throttled_cfs_rq or the bandwidth runs out.

Instead of using a local list, reuse the local CPU's rq->cfsb_csd_list
(cfs_rq(s) are linked on it via their throttled_csd_list node) and the
__cfsb_csd_unthrottle() path for unthrottle. If this is the first cfs_rq
to be queued on the list, remote CPUs that are performing an async
unthrottle themselves no longer need to interrupt this local CPU. If
this is not the first cfs_rq on the list, there is an async unthrottle
operation pending on this local CPU and the unthrottle can be batched
together.

No functional changes intended.

Reviewed-by: Benjamin Segall
Signed-off-by: K Prateek Nayak
Tested-by: Aaron Lu
---
changelog v1..v2:

o Collected tags from Ben and Aaron. (Thanks a ton!)
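
For readers of the notes: the "first one to queue does the flush" rule
described above can be sketched in plain userspace C. This is only a
single-threaded model under stated assumptions, not the kernel code.
struct node, struct cpu_list and every function below are hypothetical
stand-ins (for a throttled cfs_rq, for one CPU's csd list, and for the
queue and flush paths), locking and IRQ state are left out entirely, and
the assumption that the remote async path raises an IPI only when it
finds the list empty is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a throttled cfs_rq; not a kernel type. */
struct node {
	struct node *next;
	bool throttled;
};

/* Hypothetical stand-in for one CPU's list of pending unthrottles. */
struct cpu_list {
	struct node *head;
	struct node **tail;
	int ipis;	/* IPIs a remote CPU would have had to send */
};

static void cpu_list_init(struct cpu_list *l)
{
	l->head = NULL;
	l->tail = &l->head;
	l->ipis = 0;
}

/* Append @n and report whether the list was empty beforehand. */
static bool cpu_list_add(struct cpu_list *l, struct node *n)
{
	bool first = (l->head == NULL);

	n->next = NULL;
	*l->tail = n;
	l->tail = &n->next;
	return first;
}

/* Remote path (assumed): only the first enqueuer kicks the target CPU. */
static void remote_queue(struct cpu_list *l, struct node *n)
{
	if (cpu_list_add(l, n))
		l->ipis++;
}

/* Models the flush: unthrottle everything queued, return the count. */
static int cpu_list_flush(struct cpu_list *l)
{
	int drained = 0;
	struct node *n = l->head;

	while (n) {
		struct node *next = n->next;

		n->next = NULL;
		n->throttled = false;
		drained++;
		n = next;
	}
	l->head = NULL;
	l->tail = &l->head;
	return drained;
}

/* Local path queues first: no IPI is raised and one flush drains both. */
int demo_local_first(void)
{
	struct cpu_list l;
	struct node a = { NULL, true }, b = { NULL, true };
	bool unthrottle_local;
	int drained = 0;

	cpu_list_init(&l);
	unthrottle_local = cpu_list_add(&l, &a);	/* local path */
	remote_queue(&l, &b);				/* remote path, batched */
	if (unthrottle_local)
		drained = cpu_list_flush(&l);

	assert(drained == 2 && !a.throttled && !b.throttled);
	return l.ipis;
}

/* Remote path queues first: its IPI does the flush, local only adds. */
int demo_remote_first(void)
{
	struct cpu_list l;
	struct node a = { NULL, true }, b = { NULL, true };
	bool unthrottle_local;
	int drained;

	cpu_list_init(&l);
	remote_queue(&l, &b);
	unthrottle_local = cpu_list_add(&l, &a);
	assert(!unthrottle_local);

	drained = cpu_list_flush(&l);	/* stands in for the IPI handler */
	assert(drained == 2 && !a.throttled && !b.throttled);
	return l.ipis;
}
```

In this model demo_local_first() finishes without raising an IPI, while
demo_remote_first() raises exactly one and the local entry rides along
with that pending flush, which is the batching the description refers to.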
---
 kernel/sched/fair.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 766cd4395e6c..825393d3f039 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6962,12 +6962,11 @@ static void unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 
 static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 {
+	bool throttled = false, unthrottle_local = false;
 	int this_cpu = smp_processor_id();
 	u64 runtime, remaining = 1;
-	bool throttled = false;
-	struct cfs_rq *cfs_rq, *tmp;
+	struct cfs_rq *cfs_rq;
 	struct rq *rq;
-	LIST_HEAD(local_unthrottle);
 
 	guard(rcu)();
 
@@ -7018,24 +7017,23 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 		}
 
 		/*
-		 * We currently only expect to be unthrottling
-		 * a single cfs_rq locally.
+		 * Allow a parallel async unthrottle to unthrottle
+		 * this cfs_rq too via __cfsb_csd_unthrottle().
+		 * If we are first, do it ourselves at the end and
+		 * save on an IPI from remote CPUs.
 		 */
-		WARN_ON_ONCE(!list_empty(&local_unthrottle));
-		list_add_tail(&cfs_rq->throttled_csd_list, &local_unthrottle);
+		unthrottle_local = list_empty(&rq->cfsb_csd_list);
+		list_add_tail(&cfs_rq->throttled_csd_list, &rq->cfsb_csd_list);
 	}
 
-	list_for_each_entry_safe(cfs_rq, tmp, &local_unthrottle,
-				 throttled_csd_list) {
-		struct rq *rq = rq_of(cfs_rq);
-
-		guard(rq_lock_irqsave)(rq);
-
-		list_del_init(&cfs_rq->throttled_csd_list);
-		if (cfs_rq_throttled(cfs_rq))
-			unthrottle_cfs_rq(cfs_rq);
+	if (unthrottle_local) {
+		/*
+		 * Protect against an IPI that is also trying to flush
+		 * the unthrottled cfs_rq(s) from this CPU's csd_list.
+		 */
+		scoped_guard(irqsave)
+			__cfsb_csd_unthrottle(cpu_rq(this_cpu));
 	}
-	WARN_ON_ONCE(!list_empty(&local_unthrottle));
 
 	return throttled;
 }
-- 
2.34.1

From nobody Mon Jun 8 04:25:14 2026
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Aaron Lu, Josh Don, K Prateek Nayak
Subject: [PATCH v2 3/5] sched/fair: Call update_curr() before unthrottling the hierarchy
Date: Tue, 2 Jun 2026 05:25:29 +0000
Message-ID: <20260602052531.11450-1-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260602050005.11160-1-kprateek.nayak@amd.com>
References: <20260602050005.11160-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Subsequent commits will allow update_curr() to throttle the hierarchy
when the runtime accounting exceeds allocated quota.

Call update_curr() before the unthrottle event, and in
tg_unthrottle_up() to catch up on any remaining runtime and stabilize
the "runtime_remaining" and "throttle_count" for that cfs_rq. Doing an
update_curr() early ensures the cfs_rq is not throttled right back up
again when the unthrottle is in progress.

Since all callers of unthrottle_cfs_rq(), except two, already update the
rq_clock and call rq_clock_start_loop_update(), move the
update_rq_clock() from unthrottle_cfs_rq() to the callers that don't
update the rq_clock.

Reviewed-by: Benjamin Segall
Signed-off-by: K Prateek Nayak
Tested-by: Aaron Lu
---
changelog v1..v2:

o Added the missing update_rq_clock() in tg_set_cfs_bandwidth() - well,
  Peter added it and I just moved it just before the unthrottle and made
  the robots happy with the build. (Aaron, Peter, Intel Test Robot)

o Added an update_curr() before replenish in distribute_cfs_runtime()
  which helps performance a little bit. Since this was trivial addition,
  I've retained the tags.

o Collected tags from Ben, and Aaron (Thanks a ton!)
---
 kernel/sched/core.c |  5 ++++-
 kernel/sched/fair.c | 21 +++++++++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd031410ab1a..e745c58671ed 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9859,11 +9859,14 @@ static int tg_set_cfs_bandwidth(struct task_group *tg,
 		struct rq *rq = cfs_rq->rq;
 
 		guard(rq_lock_irq)(rq);
+
 		cfs_rq->runtime_enabled = runtime_enabled;
 		cfs_rq->runtime_remaining = 1;
 
-		if (cfs_rq->throttled)
+		if (cfs_rq->throttled) {
+			update_rq_clock(rq);
 			unthrottle_cfs_rq(cfs_rq);
+		}
 	}
 
 	if (runtime_was_enabled && !runtime_enabled)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 825393d3f039..16e0ccaffe6e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6711,6 +6711,15 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 	struct cfs_rq *cfs_rq = tg_cfs_rq(tg, cpu_of(rq));
 	struct task_struct *p, *tmp;
 
+	/*
+	 * If cfs_rq->curr is set, the cfs_rq might not have caught up
+	 * since the last clock update. Do it now before we begin
+	 * queueing task onto it to save the need for unnecessarily
+	 * unthrottle the hierarchy for this cfs_rq to be throttled
+	 * right back again.
+	 */
+	update_curr(cfs_rq);
+
 	if (--cfs_rq->throttle_count)
 		return 0;
 
@@ -6853,14 +6862,16 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	 * We can't unthrottle this cfs_rq without any runtime remaining because
 	 * any enqueue in tg_unthrottle_up() will immediately trigger a throttle,
 	 * which is not supposed to happen on unthrottle path.
+	 *
+	 * Catch up on the remaining runtime since last clock update before
+	 * checking runtime remaining.
 	 */
+	update_curr(cfs_rq);
 	if (cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0)
 		return;
 
 	cfs_rq->throttled = 0;
 
-	update_rq_clock(rq);
-
 	scoped_guard(raw_spinlock, &cfs_b->lock) {
 		list_del_rcu(&cfs_rq->throttled_list);
 
@@ -6935,6 +6946,7 @@ static inline void __unthrottle_cfs_rq_async(struct cfs_rq *cfs_rq)
 	bool first;
 
 	if (rq == this_rq()) {
+		update_rq_clock(rq);
 		unthrottle_cfs_rq(cfs_rq);
 		return;
 	}
@@ -6988,6 +7000,11 @@ static bool distribute_cfs_runtime(struct cfs_bandwidth *cfs_b)
 		if (!list_empty(&cfs_rq->throttled_csd_list))
 			continue;
 
+		if (cfs_rq->curr) {
+			update_rq_clock(rq);
+			update_curr(cfs_rq);
+		}
+
 		/* By the above checks, this should never be true */
 		WARN_ON_ONCE(cfs_rq->runtime_remaining > 0);
 
-- 
2.34.1

From nobody Mon Jun 8 04:25:14 2026
From: K Prateek Nayak
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot
CC: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider, Aaron Lu, Josh Don, K Prateek Nayak
Subject: [PATCH v2 4/5] sched/fair: Move the throttled tasks to a local list in tg_unthrottle_up()
Date: Tue, 2 Jun 2026 05:25:30 +0000
Message-ID: <20260602052531.11450-2-kprateek.nayak@amd.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <20260602050005.11160-1-kprateek.nayak@amd.com>
References: <20260602050005.11160-1-kprateek.nayak@amd.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

An update_curr() during the enqueue of a throttled task will start
throttling the hierarchy from a subsequent commit. This can lead to
tg_throttle_down() seeing a non-empty throttled_limbo_list for a cfs_rq
that is still attaching tasks from its throttled_limbo_list one by one.
For example:

        R
        |
        A
       / \
     *B   C
          |
       rq->curr

*B is throttled with tasks on the limbo list. When the tasks are
unthrottled via tg_unthrottle_up() and the entity of group B is placed
onto A, update_curr() is called to catch up the vruntime and it may
throttle group A, causing the subsequent tg_throttle_down() to see the
pending tasks on B's limbo list.

  tg_unthrottle_up()
    /* --cfs_rq->throttle_count == 0 */
    list_for_each_entry_safe(p, cfs_rq->throttled_limbo_list)
      enqueue_task_fair()
        enqueue_entity(se /* B->se */)
          update_curr(cfs_rq /* A->gcfs_rq */)
            account_cfs_rq_runtime(cfs_rq)
              throttle_cfs_rq(cfs_rq /* A->gcfs_rq */)
                tg_throttle_down()
                  /* Reaches B->cfs_rq with throttle_count == 0 */
                  !!! !list_empty(&cfs_rq->throttled_limbo_list) !!!

Move the tasks from throttled_limbo_list onto a local list before
starting the unthrottle to prevent the splat described above. If the
hierarchy is throttled again in the middle of an unthrottle, put the
pending tasks back onto the limbo list to prevent running them
unnecessarily.

Reviewed-by: Benjamin Segall
Signed-off-by: K Prateek Nayak
Tested-by: Aaron Lu
---
changelog v1..v2:

o Collected tags from Ben, and Aaron (Thanks a ton!)
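The splice, iterate, splice-back pattern from the commit message can be
sketched in plain userspace C as below. This is a toy model under loud
assumptions: the limbo list is an array, the toy_* names and the
"rethrottle_after" knob are made up for illustration, and toy_enqueue()
merely imitates an enqueue whose update_curr() throttles the hierarchy
again.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_TASKS 8

/* Hypothetical stand-in for a cfs_rq with a limbo list of task ids. */
struct toy_cfs_rq {
	int throttle_count;        /* non-zero means throttled */
	int limbo[TOY_MAX_TASKS];  /* tasks parked while throttled */
	int nr_limbo;
	int nr_enqueued;           /* tasks that made it back onto the rq */
	int rethrottle_after;      /* test knob: throttle again after N enqueues, < 0 for never */
};

/* Imitates an enqueue that may throttle the hierarchy as a side effect. */
static void toy_enqueue(struct toy_cfs_rq *cfs_rq, int task)
{
	(void)task;
	cfs_rq->nr_enqueued++;

	if (cfs_rq->rethrottle_after >= 0 &&
	    cfs_rq->nr_enqueued == cfs_rq->rethrottle_after) {
		/* The throttle path expects to find the limbo list empty. */
		assert(cfs_rq->nr_limbo == 0);
		cfs_rq->throttle_count = 1;
	}
}

static void toy_unthrottle(struct toy_cfs_rq *cfs_rq)
{
	int local[TOY_MAX_TASKS];
	int nr_local, i;

	cfs_rq->throttle_count = 0;

	/* Move everything to a local list so the shared one reads as empty. */
	nr_local = cfs_rq->nr_limbo;
	memcpy(local, cfs_rq->limbo, nr_local * sizeof(local[0]));
	cfs_rq->nr_limbo = 0;

	for (i = 0; i < nr_local; i++) {
		/* Throttled again mid-way: stop and keep the rest parked. */
		if (cfs_rq->throttle_count)
			break;
		toy_enqueue(cfs_rq, local[i]);
	}

	/* Put whatever was not enqueued back onto the limbo list. */
	cfs_rq->nr_limbo = nr_local - i;
	memcpy(cfs_rq->limbo, &local[i], cfs_rq->nr_limbo * sizeof(local[0]));
}
```

Without the move to a local list, the assert in toy_enqueue() would
trip as soon as the re-throttle happens with tasks still pending.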
--- kernel/sched/fair.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 16e0ccaffe6e..8169bcedcfc8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6710,6 +6710,7 @@ static int tg_unthrottle_up(struct task_group *tg, vo= id *data) struct rq *rq =3D data; struct cfs_rq *cfs_rq =3D tg_cfs_rq(tg, cpu_of(rq)); struct task_struct *p, *tmp; + LIST_HEAD(throttled_tasks); =20 /* * If cfs_rq->curr is set, the cfs_rq might not have caught up @@ -6740,13 +6741,31 @@ static int tg_unthrottle_up(struct task_group *tg, = void *data) cfs_rq->throttled_clock_self_time +=3D delta; } =20 + /* + * Move the tasks to a local list since an update_curr() during + * enqueue_task_fair() can throttle a higher cfs_rq, and it can + * see the "throttled_limbo_list" being non-empty in + * tg_throttle_down() if throttle_count turned 0 above. + */ + list_splice_init(&cfs_rq->throttled_limbo_list, &throttled_tasks); + /* Re-enqueue the tasks that have been throttled at this level. */ - list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_= node) { + list_for_each_entry_safe(p, tmp, &throttled_tasks, throttle_node) { + /* + * Back to being throttled! Break out and put the remaining + * tasks back onto the limbo_list to prevent running them + * unnecessarily. 
+ */ + if (cfs_rq->throttle_count) + break; + list_del_init(&p->throttle_node); p->throttled =3D false; - enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP); + enqueue_task_fair(rq, p, ENQUEUE_WAKEUP); } =20 + list_splice(&throttled_tasks, &cfs_rq->throttled_limbo_list); + /* Add cfs_rq with load or one or more already running entities to the li= st */ if (!cfs_rq_is_decayed(cfs_rq)) list_add_leaf_cfs_rq(cfs_rq); --=20 2.34.1 From nobody Mon Jun 8 04:25:14 2026 Received: from BN1PR04CU002.outbound.protection.outlook.com (mail-eastus2azon11010036.outbound.protection.outlook.com [52.101.56.36]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5695C2AE7A for ; Tue, 2 Jun 2026 05:26:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.56.36 ARC-Seal: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780377994; cv=fail; b=C1Az2+9ZdqKRydlDOGF6pteT8Jk1NXuLw78R0Zy1+xF1CywozWqz9yIBqbqPZSHUKpwSrGFEp4UDY1G1VRL3/iJobwxASTpaFiOJMzKPNoWanVFBWK9XcD04iAn0ORshYtlD2C06MA2ADO2Y/eTFX1XL/WLGf/KnpQThHGnByj8= ARC-Message-Signature: i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780377994; c=relaxed/simple; bh=wHajlFojkKJrtMWosKpRFbUkLkoQrb12v67+H4W7ucQ=; h=From:To:CC:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=FZ24cI/mCCJyhltbuO83ENxabUIOhUg7C70gJ6mTDpSM7Wp79jk7pXy5k7uDTwVD3myn9YMU9e4DvOz05yNT0WM+yRfBG8k0DNVF3+ZXtN0LarZI2nih4AlRN3JyOI5FFwc7aQ10VPxMScaWbIpu5Oo+GechhEHVJRSklHJObxY= ARC-Authentication-Results: i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=j6W5P+rT; arc=fail smtp.client-ip=52.101.56.36 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com Authentication-Results: 
smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="j6W5P+rT" ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=X1/RcUXhrx9N8vEFELiWcbECHgAi5KcLXRFDfUK4MzAIW4KSlhbg2E23V3adQorM8ev5folDZFsfkByHBiVNTcqzvZv+4SQhTEI7sZGfw+bqkKcJ5I9Uyac6q1N8CBjirIwld5zNP6ntCgOmHiPyzKF6NFCq6oaw27NHIJnrF3c/c40w+HeLMaQGSBNpWiDuatGVsnuLhGn//Ez+XBKizp2+zuonfzGjU1BdHdhkud0X3zB1GGQRNZdpzMwHTOw5AsLC4qlEd6sUYcR6e/bZ4XYS2fG8niySour5nCa4HSZoPlmUh696+C6XuuvtvwcpojSbVlEwictI+bpOjcAe1w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=QMhkIoyvVPigGjZ9jafZqbwOyR5cSUyQz69+fpeLwWc=; b=izbHu8rcnPI4ZOjshmbI8B3xcQ11mhTz3Du7wHx/P60NxEb1Lcrpy9bx0uW6KUqxs2Sifn75cGEMi/3KPFgEsMchyFyNTPU7klgaR7qdRfJRX0/YhyizoCJAjDlybNnTJwXNSIWP7BzrBFf6ISyNxb/GFij25swvGE0PMPCEGA0PODV5V6h1toLLjfDdzKJVhT79qv1RNvmbMJBvvjX0skSgyZe8IzINWSuRCltqXE4ccF8RrYhT5cih3LG/2KTQSKG7n2BJudSwDDueUlpNtwtjg0OYdlcDRoaHXGfH80Zx5ajZtUiamObQPtQX++GjBf+lIHP+DPczwrRx1QAFGA== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=redhat.com smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=QMhkIoyvVPigGjZ9jafZqbwOyR5cSUyQz69+fpeLwWc=; b=j6W5P+rTZICBO0Yl+vzZfd5Uvl+L5UgHxVyHoJFMgFHjMdATUBqMahmuMYhusZCZIJn+kEPW2otbrm+UE8UAxCHi2szcrhJ2EWzuHN5hwEpguDRDXMg2EW09xQzwTZquOwGuf/ybZHLOFnkKLxvVtDBQV87ikqg0TIHIXRiGA60= Received: from SJ0PR03CA0019.namprd03.prod.outlook.com 
(2603:10b6:a03:33a::24) by CYYPR12MB8938.namprd12.prod.outlook.com (2603:10b6:930:c7::11) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.21.71.16; Tue, 2 Jun 2026 05:26:24 +0000 Received: from CO1PEPF00012E84.namprd03.prod.outlook.com (2603:10b6:a03:33a:cafe::4e) by SJ0PR03CA0019.outlook.office365.com (2603:10b6:a03:33a::24) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.21.71.17 via Frontend Transport; Tue, 2 Jun 2026 05:26:24 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=satlexmb07.amd.com; pr=C Received: from satlexmb07.amd.com (165.204.84.17) by CO1PEPF00012E84.mail.protection.outlook.com (10.167.249.59) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.21.92.5 via Frontend Transport; Tue, 2 Jun 2026 05:26:24 +0000 Received: from BLRKPRNAYAK.amd.com (10.180.168.240) by satlexmb07.amd.com (10.181.42.216) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.41; Tue, 2 Jun 2026 00:26:19 -0500 From: K Prateek Nayak To: Ingo Molnar , Peter Zijlstra , Juri Lelli , Vincent Guittot CC: Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Valentin Schneider , Aaron Lu , Josh Don , K Prateek Nayak , Subject: [PATCH v2 5/5] sched/fair: Unify cfs_rq throttling via account_cfs_rq_runtime() Date: Tue, 2 Jun 2026 05:25:31 +0000 Message-ID: <20260602052531.11450-3-kprateek.nayak@amd.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20260602050005.11160-1-kprateek.nayak@amd.com> References: <20260602050005.11160-1-kprateek.nayak@amd.com> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: satlexmb07.amd.com (10.181.42.216) To satlexmb07.amd.com (10.181.42.216) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1PEPF00012E84:EE_|CYYPR12MB8938:EE_ X-MS-Office365-Filtering-Correlation-Id: 701e699a-47d9-4132-de52-08dec0677d11 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0;ARA:13230040|7416014|36860700016|1800799024|82310400026|376014|22082099003|18002099003|56012099006|11063799006; X-Microsoft-Antispam-Message-Info: DkGilhtMyBpEiJseZDToqdRz+U3uADFfsLewPHsvZDnPEB71EJob/nT2GYGB3KJ2MbUEQFIlzlFwSAQpcCoRRz7DlyUwLpD75ny1CQCKxujX/UjCZij/Ws0u3xB57sfYYQRTStxXNOFHODXY87V1EmrpHDr1X2vHE6wYxR4xPDHfKloyhrhP1vjolpW3ptuVOIHvf94FLXkYA95JNKdDRrgoUCAvefsrvviVUefhGLj99JSh+ZXHA08bj0aIULqav+0r/JlQe/yuRyfrxnb1ZweSXw4E7Y3bdPvNrRdkZXeNlyRJafi8zqH1JCcBZ8lJb73ep0sYr2/3m2NATjVk3SV3oJh4wNWfHuT2VSuQhY2QBXsvMkeMLMJsjev9qscIEAqamTiZL4CafifVEgrs4wCqfKKbyNMxlQtcly7iD4BrZ1VJrUdrDLMmGimCuKjjhny6TemFzwCglRxBsRD3gEKN1qbihrw+NkrEBqnkdMeCtBMoWDbIK57wzr46tIqczSE0YlSVY3dqW2z4Okc/abiJAUapgvd437Xohi9VUregPgt1TL+X/rHEPHsFMdspFijXk1dfcj42cvK239JEWm1z3KcaA2KV/2BcPQJh31hgSosWuhvqjRnirqqAVYUba6Wn3YsZ4XSROrXlBGqqvD/Oew1iRlMpiyIu5two4fGIrbYd5zTLkykVulrYQiZ+qMQI52IJ37Oi13rYfkbRZoaoBY3sg3n0fW85vCtbeKU= X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:satlexmb07.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230040)(7416014)(36860700016)(1800799024)(82310400026)(376014)(22082099003)(18002099003)(56012099006)(11063799006);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
6VO53X8ZDJqgkggQKPMmmVBlLYGeGCbvB/YLCTvB2K9nAM6qNioXQ5EYGxE9ovKWM93pMKgopTWrksQeWq33sUkK/m/eRkgGJwhvRYwexq21/iVrE7FWDtlRkavMLd8gUh4pmG1x+XrckgdDMxmxTsMM1ZCZMlGrKLkmS2OL9Hi2/ZS/MFKz4rFmPCNMJifyTPGZWd4Nnh6UdgElENfvYtRQ6nBaP75L3Zm35EDFIAWNAb5pMEEWN7CayFQIaIIGKL68/ajWg9LE3QxaqcLf+ET0qzDQzttSXQRFZw7ChmdZwZt0XUUSB2gmVLbOWvlPh96sCOVdHYWOnLaV59zDGgNZoPxNxsI7OBvFoEHbzZrK8DPcaOHHmPEm86nht9t+OyYa+yOKFuNqMuDr08/FYq/N4i0+0CtDASGSEOoZMta7oTGtbolht7d4A0BflvFd X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 02 Jun 2026 05:26:24.4344 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 701e699a-47d9-4132-de52-08dec0677d11 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[satlexmb07.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1PEPF00012E84.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: CYYPR12MB8938 Content-Type: text/plain; charset="utf-8" From: Peter Zijlstra assign_cfs_rq_runtime() during update_curr() sets the resched indicator and relies on check_cfs_rq_runtime() during pick_next_task() / put_prev_entity() to throttle the hierarchy once current task is preempted / blocks. Per-task throttle, on the other hand, uses throttle_cfs_rq() to simply propagate the throttle signals, and then relies on task work to individually throttle the runnable tasks on their way out to the userspace. Remove check_cfs_rq_runtime() and unify throttling into account_cfs_rq_runtime() which only sets the cfs_rq->throttled, cfs_rq->throttle_count indicators via throttle_cfs_rq() and optionally adds the task work to the current task (donor) it is on the throttled hierarchy. 
throttle_cfs_rq() requests sched_cfs_bandwidth_slice() worth of
bandwidth for the current hierarchy, which enables it to continue
running uninterrupted when selected. For the rest, it requests a bare
minimum of "1" to ensure some bandwidth is available and that the
"runtime_remaining > 0" checks pass once selected.

For SCHED_PROXY_EXEC, a mutex holder cannot exit to userspace without
dropping the mutex first, and mutex_unlock() ensures the proxy is
stopped before the mutex handoff. This preserves the current semantics
of running a throttled task until it exits to userspace, even if it
acts as a donor.

[ prateek: rebased on tip, comments, commit message. ]

Reviewed-by: Benjamin Segall
Signed-off-by: Peter Zijlstra
Signed-off-by: K Prateek Nayak
Tested-by: Aaron Lu
---
changelog v1..v2:

o Added S-o-b from Peter matching queue:sched/core.

o Do account_cfs_rq_runtime() selectively in set_next_task_fair() on
  "!first", since pick would have ensured sufficient bandwidth is
  available if set_next_task_fair() is called after pick, which is more
  common after the recent pick_next_task removal. (Peter)

o Collected tags from Ben and Aaron (Thanks a ton!)
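
For readers who want to poke at the unified flow outside the kernel, the
accounting path described above can be approximated by a small userspace
model. This is only a sketch under stated assumptions: every model_*
name and MODEL_SLICE_NS is hypothetical, locking, the period timer and
the hierarchy walk are left out, and model_assign() is a simplified
stand-in for __assign_cfs_rq_runtime(), not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for sched_cfs_bandwidth_slice(); value assumed. */
#define MODEL_SLICE_NS 5000000LL

/* Simplified model of the group-wide pool (struct cfs_bandwidth). */
struct model_pool {
	int64_t runtime;
};

/* Simplified model of the per-CPU cfs_rq state touched by the patch. */
struct model_rq {
	int64_t runtime_remaining;
	bool throttled;
	bool curr_runnable;	/* models "curr && curr->on_rq" */
	bool task_work_queued;	/* models task_throttle_setup_work() */
};

/* Top up rq towards target from the pool; true if runtime is left. */
static bool model_assign(struct model_pool *pool, struct model_rq *rq,
			 int64_t target)
{
	int64_t want = target - rq->runtime_remaining;
	int64_t grant = want < pool->runtime ? want : pool->runtime;

	assert(target > 0);
	if (grant > 0) {
		pool->runtime -= grant;
		rq->runtime_remaining += grant;
	}
	return rq->runtime_remaining > 0;
}

/* Try to extend runtime first; only mark throttled if that fails. */
static bool model_throttle(struct model_pool *pool, struct model_rq *rq)
{
	/* A full slice for the running hierarchy, a bare "1" otherwise. */
	int64_t target = rq->curr_runnable ? MODEL_SLICE_NS : 1;

	if (model_assign(pool, rq, target))
		return false;

	rq->throttled = true;
	if (rq->curr_runnable)
		rq->task_work_queued = true;
	return true;
}

/* Single entry point: returns true if rq is (now) throttled. */
static bool model_account(struct model_pool *pool, struct model_rq *rq,
			  int64_t delta_exec)
{
	rq->runtime_remaining -= delta_exec;
	if (rq->runtime_remaining > 0)
		return false;
	if (rq->throttled)
		return true;
	return model_throttle(pool, rq);
}
```

Calling model_account() with a delta of 0 on a queue whose current
entity is not runnable corresponds to the check_enqueue_throttle() and
"!first" set_next_task_fair() cases, where only the minimum of "1" is
requested.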
---
 kernel/sched/fair.c | 116 +++++++++++++++++++++++---------------------
 1 file changed, 60 insertions(+), 56 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8169bcedcfc8..ce5cf494b934 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -525,7 +525,7 @@ static int se_is_idle(struct sched_entity *se)
 #endif /* !CONFIG_FAIR_GROUP_SCHED */
 
 static __always_inline
-void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec);
+bool account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec);
 
 /**************************************************************
  * Scheduling class tree data structure manipulation methods:
@@ -6359,8 +6359,6 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq, bool protect)
 	return se;
 }
 
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
-
 static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 {
 	/*
@@ -6370,9 +6368,6 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 	if (prev->on_rq)
 		update_curr(cfs_rq);
 
-	/* throttle cfs_rqs exceeding runtime */
-	check_cfs_rq_runtime(cfs_rq);
-
 	if (prev->on_rq) {
 		update_stats_wait_start_fair(cfs_rq, prev);
 		/* Put 'current' back into the tree. */
@@ -6507,41 +6502,32 @@ static int __assign_cfs_rq_runtime(struct cfs_bandwidth *cfs_b,
 	return cfs_rq->runtime_remaining > 0;
 }
 
-/* returns 0 on failure to allocate runtime */
-static int assign_cfs_rq_runtime(struct cfs_rq *cfs_rq)
-{
-	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-
-	guard(raw_spinlock)(&cfs_b->lock);
-
-	return __assign_cfs_rq_runtime(cfs_b, cfs_rq, sched_cfs_bandwidth_slice());
-}
+static bool throttle_cfs_rq(struct cfs_rq *cfs_rq);
 
-static void __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
+static bool __account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
 {
 	/* dock delta_exec before expiring quota (as it could span periods) */
 	cfs_rq->runtime_remaining -= delta_exec;
 
 	if (likely(cfs_rq->runtime_remaining > 0))
-		return;
+		return false;
 
 	if (cfs_rq->throttled)
-		return;
+		return true;
 	/*
-	 * if we're unable to extend our runtime we resched so that the active
-	 * hierarchy can be throttled
+	 * throttle_cfs_rq() will try to extend the runtime first
+	 * before throttling the hierarchy.
 	 */
-	if (!assign_cfs_rq_runtime(cfs_rq) && likely(cfs_rq->curr))
-		resched_curr(rq_of(cfs_rq));
+	return throttle_cfs_rq(cfs_rq);
 }
 
 static __always_inline
-void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
+bool account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec)
 {
 	if (!cfs_bandwidth_used() || !cfs_rq->runtime_enabled)
-		return;
+		return false;
 
-	__account_cfs_rq_runtime(cfs_rq, delta_exec);
+	return __account_cfs_rq_runtime(cfs_rq, delta_exec);
 }
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
@@ -6829,10 +6815,24 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 
 static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
-	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
+	struct sched_entity *curr = cfs_rq->curr;
+	struct rq *rq = rq_of(cfs_rq);
 
 	scoped_guard(raw_spinlock, &cfs_b->lock) {
+		u64 target_runtime = 1;
+
+		/*
+		 * If cfs_rq->curr is still runnable, we are here from an
+		 * update_curr(). Request sysctl_sched_cfs_bandwidth_slice
+		 * worth of bandwidth to continue running.
+		 *
+		 * If the curr is not runnable, just request enough bandwidth
+		 * to be runnable next time the pick selects this cfs_rq.
+		 */
+		if (curr && curr->on_rq)
+			target_runtime = sched_cfs_bandwidth_slice();
+
 		/*
 		 * Check if We have raced with bandwidth becoming available. If
 		 * we actually throttled the timer might not unthrottle us for
@@ -6843,7 +6843,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		 *
 		 * This will start the period timer if necessary.
 		 */
-		if (__assign_cfs_rq_runtime(cfs_b, cfs_rq, 1))
+		if (__assign_cfs_rq_runtime(cfs_b, cfs_rq, target_runtime))
 			return false;
 
 		/*
@@ -6864,6 +6864,17 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	 */
 	cfs_rq->throttled = 1;
 	WARN_ON_ONCE(cfs_rq->throttled_clock);
+
+	/*
+	 * If current hierarchy was throttled, add throttle work to the
+	 * current donor. In case of proxy-execution, the execution
+	 * context cannot exit to the userspace while holding a mutex
+	 * and the rule of throttle deferral to only throttle the
+	 * throttled context at exit to userspace is still preserved.
+	 */
+	if (curr && curr->on_rq)
+		task_throttle_setup_work(rq->donor);
+
 	return true;
 }
 
@@ -7254,7 +7265,7 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
 	if (!cfs_bandwidth_used())
 		return;
 
-	/* an active group must be handled by the update_curr()->put() path */
+	/* an active group must be handled by the update_curr() path */
 	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
 		return;
 
@@ -7264,8 +7275,6 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
 
 	/* update runtime allocation */
 	account_cfs_rq_runtime(cfs_rq, 0);
-	if (cfs_rq->runtime_remaining <= 0)
-		throttle_cfs_rq(cfs_rq);
 }
 
 static void sync_throttle(struct task_group *tg, int cpu)
@@ -7295,25 +7304,6 @@ static void sync_throttle(struct task_group *tg, int cpu)
 		cfs_rq->pelt_clock_throttled = 1;
 }
 
-/* conditionally throttle active cfs_rq's from put_prev_entity() */
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
-{
-	if (!cfs_bandwidth_used())
-		return false;
-
-	if (likely(!cfs_rq->runtime_enabled || cfs_rq->runtime_remaining > 0))
-		return false;
-
-	/*
-	 * it's possible for a throttled entity to be forced into a running
-	 * state (e.g. set_curr_task), in this case we're finished.
-	 */
-	if (cfs_rq_throttled(cfs_rq))
-		return true;
-
-	return throttle_cfs_rq(cfs_rq);
-}
-
 static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
 {
 	struct cfs_bandwidth *cfs_b =
@@ -7567,8 +7557,7 @@ static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
 
 #else /* !CONFIG_CFS_BANDWIDTH: */
 
-static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) {}
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq) { return false; }
+static bool account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) { return false; }
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
@@ -9903,8 +9892,15 @@ struct task_struct *pick_task_fair(struct rq *rq, struct rq_flags *rf)
 		/* Might not have done put_prev_entity() */
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);
-
-		throttled |= check_cfs_rq_runtime(cfs_rq);
+		/*
+		 * For the current hierarchy, update_curr() above would
+		 * have set the throttle indicators if the cfs_rq has
+		 * run out of bandwidth. For others, enqueue / last
+		 * update_curr() for the cfs_rq would have ensured the
+		 * throttle indicators are set if bandwidth was not
+		 * available.
+		 */
+		throttled |= cfs_rq_throttled(cfs_rq);
 
 		se = pick_next_entity(rq, cfs_rq, true);
 		if (!se)
@@ -14823,8 +14819,8 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr) {}
  */
 static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 {
-	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &curr->se;
+	struct cfs_rq *cfs_rq;
 
 	for_each_sched_entity(se) {
 		cfs_rq = cfs_rq_of(se);
@@ -15006,6 +15002,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
 static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
 {
 	struct sched_entity *se = &p->se;
+	bool throttled = false;
 
 	for_each_sched_entity(se) {
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
@@ -15015,10 +15012,17 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
 			break;
 
 		set_next_entity(cfs_rq, se, first);
-		/* ensure bandwidth has been allocated on our new cfs_rq */
-		account_cfs_rq_runtime(cfs_rq, 0);
+		/*
+		 * Ensure bandwidth has been allocated on our new cfs_rq
+		 * if we've reached here for reasons other than pick.
+		 */
+		if (!first)
+			throttled |= account_cfs_rq_runtime(cfs_rq, 0);
 	}
 
+	if (throttled)
+		task_throttle_setup_work(p);
+
 	se = &p->se;
 
 	if (task_on_rq_queued(p)) {
-- 
2.34.1