From nobody Fri Dec 19 12:47:19 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka, Florian Bezdeka
Subject: [PATCH 1/7] sched/fair: Add related data structure for task based throttle
Date: Tue, 20 May 2025 18:41:04 +0800
Message-Id: <20250520104110.3673059-2-ziqianlu@bytedance.com>
In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com>
References: <20250520104110.3673059-1-ziqianlu@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Valentin Schneider

Add the related data structures for this new throttle functionality.
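The diff below uses a self-pointing `->next` as a "not queued" sentinel for the per-task throttle work. As a userspace sketch of that idiom (names mirror the patch, but the list and `task_work_add()` machinery is modelled here, not the real kernel API):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative model of the sentinel idiom used for
 * p->sched_throttle_work: when the callback_head's ->next points at
 * itself, the work is known to be "not queued", which protects against
 * a double add.
 */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

static void work_init(struct callback_head *work,
		      void (*func)(struct callback_head *))
{
	work->func = func;
	work->next = work;	/* sentinel: marks "not queued" */
}

static int work_is_queued(const struct callback_head *work)
{
	return work->next != work;
}

/* Stand-in for task_work_add(): push onto a singly linked queue. */
static void work_add(struct callback_head *work, struct callback_head *head)
{
	if (work_is_queued(work))
		return;		/* double-add protection */
	work->next = head->next;
	head->next = work;
}
```

Queuing the same work twice is then a harmless no-op, which is exactly what the "Protect against double add" comment in the diff relies on.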
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
Reviewed-by: Chengming Zhou
---
 include/linux/sched.h |  4 ++++
 kernel/sched/core.c   |  3 +++
 kernel/sched/fair.c   | 12 ++++++++++++
 kernel/sched/sched.h  |  2 ++
 4 files changed, 21 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b98195991031c..055f3782eeaee 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -879,6 +879,10 @@ struct task_struct {

 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group		*sched_task_group;
+#ifdef CONFIG_CFS_BANDWIDTH
+	struct callback_head		sched_throttle_work;
+	struct list_head		throttle_node;
+#endif
 #endif

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index bece0ba6f5b3a..b7ca7cefee54e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4499,6 +4499,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)

 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = NULL;
+#ifdef CONFIG_CFS_BANDWIDTH
+	init_cfs_throttle_work(p);
+#endif
 #endif

 #ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eb5a2572b4f8b..75bf6186a5137 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5825,6 +5825,18 @@ static inline int throttled_lb_pair(struct task_group *tg,
 		throttled_hierarchy(dest_cfs_rq);
 }

+static void throttle_cfs_rq_work(struct callback_head *work)
+{
+}
+
+void init_cfs_throttle_work(struct task_struct *p)
+{
+	init_task_work(&p->sched_throttle_work, throttle_cfs_rq_work);
+	/* Protect against double add, see throttle_cfs_rq() and throttle_cfs_rq_work() */
+	p->sched_throttle_work.next = &p->sched_throttle_work;
+	INIT_LIST_HEAD(&p->throttle_node);
+}
+
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c5a6a503eb6de..921527327f107 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2703,6 +2703,8 @@ extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);

 extern void init_dl_entity(struct sched_dl_entity *dl_se);

+extern void init_cfs_throttle_work(struct task_struct *p);
+
 #define BW_SHIFT		20
 #define BW_UNIT			(1 << BW_SHIFT)
 #define RATIO_SHIFT		8
-- 
2.39.5

From nobody Fri Dec 19 12:47:19 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka, Florian Bezdeka
Subject: [PATCH 2/7] sched/fair: prepare throttle path for task based throttle
Date: Tue, 20 May 2025 18:41:05 +0800
Message-Id: <20250520104110.3673059-3-ziqianlu@bytedance.com>
In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com>
References: <20250520104110.3673059-1-ziqianlu@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Valentin Schneider

In the current throttle model, when a cfs_rq is throttled, its entity is
dequeued from the cpu's rq so that tasks attached to it cannot run, thus
achieving the throttle target. This has a drawback though: suppose a
task is a waiting reader of a percpu_rwsem. When it is woken up, it
cannot run until its task group's next period comes, which can be a
relatively long time. The waiting writer has to wait even longer because
of this, further readers build up behind it, and eventually a task hung
is triggered.
To improve this situation, change the throttle model to a task based
one, i.e. when a cfs_rq is throttled, record its throttled status but do
not remove it from the cpu's rq. Instead, for tasks that belong to this
cfs_rq, add a task work to them when they get picked, so that they are
dequeued when they return to userspace. This way, throttled tasks do not
hold any kernel resources.

To avoid breaking bisect, preserve the current throttle behavior by
still dequeuing the throttled hierarchy from the rq; because of this, no
task can have the throttle task work added yet. The throttle model will
switch to task based in a later patch.

Suggested-by: Chengming Zhou # tag on pick
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c  | 88 +++++++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h |  1 +
 2 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 75bf6186a5137..e87ceb0a2d37f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5825,8 +5825,47 @@ static inline int throttled_lb_pair(struct task_group *tg,
 		throttled_hierarchy(dest_cfs_rq);
 }

+static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
+	struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
+	struct sched_entity *se;
+	struct cfs_rq *cfs_rq;
+	struct rq *rq;
+
+	WARN_ON_ONCE(p != current);
+	p->sched_throttle_work.next = &p->sched_throttle_work;
+
+	/*
+	 * If task is exiting, then there won't be a return to userspace, so we
+	 * don't have to bother with any of this.
+	 */
+	if ((p->flags & PF_EXITING))
+		return;
+
+	scoped_guard(task_rq_lock, p) {
+		se = &p->se;
+		cfs_rq = cfs_rq_of(se);
+
+		/* Raced, forget */
+		if (p->sched_class != &fair_sched_class)
+			return;
+
+		/*
+		 * If not in limbo, then either replenish has happened or this
+		 * task got migrated out of the throttled cfs_rq, move along.
+		 */
+		if (!cfs_rq->throttle_count)
+			return;
+
+		rq = scope.rq;
+		update_rq_clock(rq);
+		WARN_ON_ONCE(!list_empty(&p->throttle_node));
+		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+		resched_curr(rq);
+	}
+
+	cond_resched_tasks_rcu_qs();
 }

 void init_cfs_throttle_work(struct task_struct *p)
@@ -5866,21 +5905,42 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 	return 0;
 }

+static inline bool task_has_throttle_work(struct task_struct *p)
+{
+	return p->sched_throttle_work.next != &p->sched_throttle_work;
+}
+
+static inline void task_throttle_setup_work(struct task_struct *p)
+{
+	if (task_has_throttle_work(p))
+		return;
+
+	/*
+	 * Kthreads and exiting tasks don't return to userspace, so adding the
+	 * work is pointless
+	 */
+	if ((p->flags & (PF_EXITING | PF_KTHREAD)))
+		return;
+
+	task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
+}
+
 static int tg_throttle_down(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];

+	cfs_rq->throttle_count++;
+	if (cfs_rq->throttle_count > 1)
+		return 0;
+
 	/* group is entering throttled state, stop time */
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
-		list_del_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
+	list_del_leaf_cfs_rq(cfs_rq);

-		WARN_ON_ONCE(cfs_rq->throttled_clock_self);
-		if (cfs_rq->nr_queued)
-			cfs_rq->throttled_clock_self = rq_clock(rq);
-	}
-	cfs_rq->throttle_count++;
+	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
+	if (cfs_rq->nr_queued)
+		cfs_rq->throttled_clock_self = rq_clock(rq);

 	return 0;
 }
@@ -6575,6 +6635,7 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	cfs_rq->runtime_enabled = 0;
 	INIT_LIST_HEAD(&cfs_rq->throttled_list);
 	INIT_LIST_HEAD(&cfs_rq->throttled_csd_list);
+	INIT_LIST_HEAD(&cfs_rq->throttled_limbo_list);
 }

 void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
@@ -6744,6 +6805,7 @@ static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq) { return false; }
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
+static void task_throttle_setup_work(struct task_struct *p) {}

 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -8851,6 +8913,7 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 {
 	struct sched_entity *se;
 	struct cfs_rq *cfs_rq;
+	struct task_struct *p;

 again:
 	cfs_rq = &rq->cfs;
@@ -8871,7 +8934,14 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 		cfs_rq = group_cfs_rq(se);
 	} while (cfs_rq);

-	return task_of(se);
+	p = task_of(se);
+	if (throttled_hierarchy(cfs_rq_of(se))) {
+		/* Should not happen for now */
+		WARN_ON_ONCE(1);
+		task_throttle_setup_work(p);
+	}
+
+	return p;
 }

 static void __set_next_task_fair(struct rq *rq, struct task_struct *p, bool first);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 921527327f107..83f16fc44884f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -736,6 +736,7 @@ struct cfs_rq {
 	int			throttle_count;
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
+	struct list_head	throttled_limbo_list;
 #endif /* CONFIG_CFS_BANDWIDTH */
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 };
-- 
2.39.5

From nobody Fri Dec 19 12:47:19 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka, Florian Bezdeka
Subject: [PATCH 3/7] sched/fair: prepare unthrottle path for task based throttle
Date: Tue, 20 May 2025 18:41:06 +0800
Message-Id: <20250520104110.3673059-4-ziqianlu@bytedance.com>
In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com>
References: <20250520104110.3673059-1-ziqianlu@bytedance.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Valentin Schneider

During unthrottle, enqueue throttled tasks back so that they can resume
execution. This patch only sets up the enqueue mechanism; it does not
take effect yet, because no tasks in the throttled hierarchy are placed
on the limbo list as of now.
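The drain in tg_unthrottle_up() walks throttled_limbo_list with list_for_each_entry_safe() because each task is deleted from the list as it is re-enqueued. A minimal userspace model of that pattern (the list primitives and the "task" type below mimic the kernel's but are reimplemented here for illustration; enqueueing is modelled as setting a flag):

```c
#include <assert.h>
#include <stddef.h>

/* Tiny circular doubly linked list, in the style of the kernel's list_head. */
struct list_head { struct list_head *prev, *next; };

static void INIT_LIST_HEAD(struct list_head *h) { h->prev = h->next = h; }

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}

static int list_empty(const struct list_head *h) { return h->next == h; }

struct task { struct list_head throttle_node; int enqueued; };

/*
 * Model of the limbo-list drain: remember the next node before deleting
 * the current one, so deletion during iteration is safe ("_safe" walk).
 * Returns the number of tasks re-enqueued.
 */
static int unthrottle_drain(struct list_head *limbo)
{
	struct list_head *pos = limbo->next, *tmp;
	int n = 0;

	while (pos != limbo) {
		struct task *p = (struct task *)((char *)pos -
				offsetof(struct task, throttle_node));
		tmp = pos->next;		/* saved before the delete */
		list_del_init(&p->throttle_node);
		p->enqueued = 1;		/* stands in for enqueue_task_fair() */
		n++;
		pos = tmp;
	}
	return n;
}
```

After the drain, each task's throttle_node is self-linked again, which is also why list_empty(&p->throttle_node) can serve as the "is this task throttled" test in the diff above.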
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 55 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e87ceb0a2d37f..74bc320cbc238 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5825,6 +5825,11 @@ static inline int throttled_lb_pair(struct task_group *tg,
 		throttled_hierarchy(dest_cfs_rq);
 }

+static inline bool task_is_throttled(struct task_struct *p)
+{
+	return !list_empty(&p->throttle_node);
+}
+
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
@@ -5876,32 +5881,41 @@ void init_cfs_throttle_work(struct task_struct *p)
 	INIT_LIST_HEAD(&p->throttle_node);
 }

+static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+	struct task_struct *p, *tmp;

 	cfs_rq->throttle_count--;
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
-			cfs_rq->throttled_clock_pelt;
+	if (cfs_rq->throttle_count)
+		return 0;

-		/* Add cfs_rq with load or one or more already running entities to the list */
-		if (!cfs_rq_is_decayed(cfs_rq))
-			list_add_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
+		cfs_rq->throttled_clock_pelt;

-		if (cfs_rq->throttled_clock_self) {
-			u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
+	if (cfs_rq->throttled_clock_self) {
+		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;

-			cfs_rq->throttled_clock_self = 0;
+		cfs_rq->throttled_clock_self = 0;

-			if (WARN_ON_ONCE((s64)delta < 0))
-				delta = 0;
+		if (WARN_ON_ONCE((s64)delta < 0))
+			delta = 0;

-			cfs_rq->throttled_clock_self_time += delta;
-		}
+		cfs_rq->throttled_clock_self_time += delta;
+	}
+
+	/* Re-enqueue the tasks that have been throttled at this level. */
+	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
+		list_del_init(&p->throttle_node);
+		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
 	}

+	/* Add cfs_rq with load or one or more already running entities to the list */
+	if (!cfs_rq_is_decayed(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
+
 	return 0;
 }

@@ -6059,6 +6073,19 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	long queued_delta, runnable_delta, idle_delta;
 	long rq_h_nr_queued = rq->cfs.h_nr_queued;

+	/*
+	 * It's possible we are called with !runtime_remaining due to things
+	 * like the user changing the quota setting (see tg_set_cfs_bandwidth())
+	 * or an async unthrottle that left us a positive runtime_remaining
+	 * which other still running entities consumed before we got here.
+	 *
+	 * Either way, we can't unthrottle this cfs_rq without runtime
+	 * remaining, because any enqueue in tg_unthrottle_up() would
+	 * immediately trigger a throttle, which is not supposed to happen
+	 * on the unthrottle path.
+	 */
+	if (cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0)
+		return;
+
 	se = cfs_rq->tg->se[cpu_of(rq)];

 	cfs_rq->throttled = 0;
@@ -6806,6 +6833,7 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
+static bool task_is_throttled(struct task_struct *p) { return false; }

 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -7014,6 +7042,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	util_est_enqueue(&rq->cfs, p);

 	if (flags & ENQUEUE_DELAYED) {
+		WARN_ON_ONCE(task_is_throttled(p));
 		requeue_delayed_entity(se);
 		return;
 	}
-- 
2.39.5

From nobody Fri Dec 19 12:47:19 2025
(p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=U5qMx1CD; arc=none smtp.client-ip=209.85.215.169 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="U5qMx1CD" Received: by mail-pg1-f169.google.com with SMTP id 41be03b00d2f7-b0b2d0b2843so4193583a12.2 for ; Tue, 20 May 2025 03:41:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1747737718; x=1748342518; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=o/EhzKMaxIvvDCcy/f4feKI9fqwkuL11s7nWdOGRixo=; b=U5qMx1CDhFxQ2AbPAyPfVLaEQQOIMtN5iSLgBUaZTEMy49KWN+0ZiNJyiGwiDVkbbY KTyLDiRC5oAfn4n2sG6OxBHCSvYISwl97l35paU7KwqxiEJvsNDu4p4RSWjjwkCCUw2v L6nzp50qRyhkso9a27CzU2MnsMg1FwUposHld9loTQl/dSCg59QFqxKE7F/rfb/uM0ad 0mHVSOghIUTbgJMo8kcmr+YwopSyvpcSB4nS1lf+BUcE+L2mh3y3GGbpjpcRyDdPXOnt t7oJ9cKDRFTqdZgDcMdTN/pvU8UavNHjxPsTEcbN0LPAT3qhcVTJdsaj6EBYdqjZF55y 3XQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747737718; x=1748342518; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=o/EhzKMaxIvvDCcy/f4feKI9fqwkuL11s7nWdOGRixo=; b=XKDtgPWgjlsuMKbK4Lbkt4acJkohSgtN8DK5H2nR/ws23BWBTneiWFAoPjWQkYdIjq 6H64lYMrMCFPQtVQDTEE8EceZLpPbuRSth9TNimNvYGNNmOzlKM4L6Mt+/Ej2qs/LtDX D6Hnq+18uaiOrfVnpouLln1n23ZN4xYcydZmyNR2WvtNfU5rAO0i7mTJtCSngKQhSnGd K2PwaBTpHUeY38kW6XussB0hR/DrQgTvtp67Loq0PnLBrW6avw/ghJ2Vt23u3LqjnSz9 
X/HZZwngEND3bkXuzmH4VvmrlD6cuzkuQ00AyOPqky3AJ5dSKLdi9/QAWMQBEatqdlR7 zM9w== X-Gm-Message-State: AOJu0YxiIYYt+7e6Z3slSyAcR2KGInRbiWRDjw9HAQw07b8FYHas+e3b rIlWOc5U3ujfbdtLHjr4fV5+AEwnX+qxdk6khtALXSVfUvdJ6/1fTYht3DuJhAzXVQ== X-Gm-Gg: ASbGnctQR5tIZAw5I59Ch1ED3GIg0JFr19CDLHu2r3OA7u15Lf/8SWG8/MtynUFz1OP XvIrHX79CxxbnZBI6e2vXQXjlYI0GeOBG9cK2GoEPwpnXOdTeyodrSJdQ2OYPnlR2PwhTVk7PWA 3JlzlBEfeQjcT30r6f/vXQpVzb+BgSIolZTW2HYts7E2Z6IGXn2MLMOaymxr3l8IwR3fJobBFDK EXNgDEQAnTVP+HWnDKybmJ7Q8vmfJoKloxp+x0gDD5imhHrdEmOLxwKDebMGBKopeLvg8XD7ksW yfAdzOpN0fYX+10UKA7e2rYx9K6izKiJeY2eWtpQ+ns0bwIlroyfyGNRuMjFZ6GNuQnro1vgmAw ryw== X-Google-Smtp-Source: AGHT+IHKvJUSPKPdHp3qQ5+Ubc2geQwAfPn/IejVahVN8wD99rI82HgnXp5cQMlWsXUPZMovKC/fmQ== X-Received: by 2002:a17:90a:e706:b0:30e:5c7f:5d26 with SMTP id 98e67ed59e1d1-30e7d5aca67mr25983405a91.24.1747737718476; Tue, 20 May 2025 03:41:58 -0700 (PDT) Received: from n37-107-136.byted.org ([115.190.40.14]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-30f365e5d31sm1359431a91.38.2025.05.20.03.41.52 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 20 May 2025 03:41:58 -0700 (PDT) From: Aaron Lu To: Valentin Schneider , Ben Segall , K Prateek Nayak , Peter Zijlstra , Josh Don , Ingo Molnar , Vincent Guittot , Xi Wang Cc: linux-kernel@vger.kernel.org, Juri Lelli , Dietmar Eggemann , Steven Rostedt , Mel Gorman , Chengming Zhou , Chuyi Zhou , Jan Kiszka , Florian Bezdeka Subject: [PATCH 4/7] sched/fair: Take care of group/affinity/sched_class change for throttled task Date: Tue, 20 May 2025 18:41:07 +0800 Message-Id: <20250520104110.3673059-5-ziqianlu@bytedance.com> X-Mailer: git-send-email 2.39.5 In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com> References: <20250520104110.3673059-1-ziqianlu@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" On task group change, 
for a task whose on_rq equals TASK_ON_RQ_QUEUED, the core will dequeue
it and then re-enqueue it. A throttled task is still considered queued
by the core because its p->on_rq is still set, so the core will dequeue
it; but since the task was already dequeued on throttle in fair, handle
this case properly. Affinity and sched class changes are similar.

Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 74bc320cbc238..4c66fd8d24389 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5866,6 +5866,10 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 	update_rq_clock(rq);
 	WARN_ON_ONCE(!list_empty(&p->throttle_node));
 	dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+	/*
+	 * Must not add it to limbo list before dequeue or dequeue will
+	 * mistakenly regard this task as an already throttled one.
+	 */
 	list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 	resched_curr(rq);
 }
@@ -5881,6 +5885,20 @@ void init_cfs_throttle_work(struct task_struct *p)
 	INIT_LIST_HEAD(&p->throttle_node);
 }
 
+static void dequeue_throttled_task(struct task_struct *p, int flags)
+{
+	/*
+	 * Task is throttled and someone wants to dequeue it again:
+	 * it must be sched/core when core needs to do things like
+	 * task affinity change, task group change, task sched class
+	 * change etc.
+	 */
+	WARN_ON_ONCE(p->se.on_rq);
+	WARN_ON_ONCE(flags & DEQUEUE_SLEEP);
+
+	list_del_init(&p->throttle_node);
+}
+
 static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
@@ -6834,6 +6852,7 @@ static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
 static bool task_is_throttled(struct task_struct *p) { return false; }
+static void dequeue_throttled_task(struct task_struct *p, int flags) {}
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -7281,6 +7300,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
  */
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
+	if (unlikely(task_is_throttled(p))) {
+		dequeue_throttled_task(p, flags);
+		return true;
+	}
+
 	if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & DEQUEUE_SAVE))))
 		util_est_dequeue(&rq->cfs, p);
 
-- 
2.39.5

From nobody Fri Dec 19 12:47:19 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka, Florian Bezdeka
Subject: [PATCH 5/7] sched/fair: switch to task based throttle model
Date: Tue, 20 May 2025 18:41:08 +0800
Message-Id: <20250520104110.3673059-6-ziqianlu@bytedance.com>
In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com>
References: <20250520104110.3673059-1-ziqianlu@bytedance.com>

From: Valentin Schneider

Now with all the preparatory work in place, switch to the task based
throttle model:

- on throttle, do not remove the hierarchy from the runqueue; instead,
  rely on the task work added in pick_task_fair() to do the actual
  throttle/dequeue in the task's return-to-user path;
- on unthrottle, enqueue back those throttled tasks from the limbo list.

Since throttle_cfs_rq() no longer removes the hierarchy from the rq, its
return value is no longer needed. The same goes for check_cfs_rq_runtime().

A throttled cfs_rq's leaf_cfs_rq_list is handled differently now: since
a task can be enqueued to a throttled cfs_rq and get to run, to not
break the assert_list_leaf_cfs_rq() in enqueue_task_fair(), always add a
cfs_rq to the leaf cfs_rq list when its first entity is enqueued and
delete it from the leaf cfs_rq list when it has no tasks enqueued.
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 188 ++++++--------------------------------
 1 file changed, 24 insertions(+), 164 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4c66fd8d24389..a968d334e8730 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5359,18 +5359,17 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	if (cfs_rq->nr_queued == 1) {
 		check_enqueue_throttle(cfs_rq);
-		if (!throttled_hierarchy(cfs_rq)) {
-			list_add_leaf_cfs_rq(cfs_rq);
-		} else {
+		list_add_leaf_cfs_rq(cfs_rq);
 #ifdef CONFIG_CFS_BANDWIDTH
+		if (throttled_hierarchy(cfs_rq)) {
 			struct rq *rq = rq_of(cfs_rq);
 
 			if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
 				cfs_rq->throttled_clock = rq_clock(rq);
 			if (!cfs_rq->throttled_clock_self)
 				cfs_rq->throttled_clock_self = rq_clock(rq);
-#endif
 		}
+#endif
 	}
 }
 
@@ -5409,8 +5408,6 @@ static void set_delayed(struct sched_entity *se)
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 		cfs_rq->h_nr_runnable--;
-		if (cfs_rq_throttled(cfs_rq))
-			break;
 	}
 }
 
@@ -5431,8 +5428,6 @@ static void clear_delayed(struct sched_entity *se)
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 		cfs_rq->h_nr_runnable++;
-		if (cfs_rq_throttled(cfs_rq))
-			break;
 	}
 }
 
@@ -5518,8 +5513,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (flags & DEQUEUE_DELAYED)
 		finish_delayed_dequeue_entity(se);
 
-	if (cfs_rq->nr_queued == 0)
+	if (cfs_rq->nr_queued == 0) {
 		update_idle_cfs_rq_clock_pelt(cfs_rq);
+		if (throttled_hierarchy(cfs_rq))
+			list_del_leaf_cfs_rq(cfs_rq);
+	}
 
 	return true;
 }
@@ -5600,7 +5598,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	return se;
 }
 
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 
 static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 {
@@ -5968,22 +5966,22 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 
 	/* group is entering throttled state, stop time */
 	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
-	list_del_leaf_cfs_rq(cfs_rq);
 
 	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock_self = rq_clock(rq);
+	else
+		list_del_leaf_cfs_rq(cfs_rq);
 
+	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
 	return 0;
 }
 
-static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
+static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta, dequeue = 1;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
+	int dequeue = 1;
 
 	raw_spin_lock(&cfs_b->lock);
 	/* This will start the period timer if necessary */
@@ -6004,74 +6002,13 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	raw_spin_unlock(&cfs_b->lock);
 
 	if (!dequeue)
-		return false;  /* Throttle no longer required. */
-
-	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+		return;  /* Throttle no longer required. */
 
 	/* freeze hierarchy runnable averages while throttled */
 	rcu_read_lock();
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 	rcu_read_unlock();
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		int flags;
-
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		/*
-		 * Abuse SPECIAL to avoid delayed dequeue in this instance.
-		 * This avoids teaching dequeue_entities() about throttled
-		 * entities and keeps things relatively simple.
-		 */
-		flags = DEQUEUE_SLEEP | DEQUEUE_SPECIAL;
-		if (se->sched_delayed)
-			flags |= DEQUEUE_DELAYED;
-		dequeue_entity(qcfs_rq, se, flags);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-
-		if (qcfs_rq->load.weight) {
-			/* Avoid re-evaluating load for this entity: */
-			se = parent_entity(se);
-			break;
-		}
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		update_load_avg(qcfs_rq, se, 0);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-	}
-
-	/* At this point se is NULL and we are at root level*/
-	sub_nr_running(rq, queued_delta);
-
-	/* Stop the fair server if throttling resulted in no runnable tasks */
-	if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
-		dl_server_stop(&rq->fair_server);
-done:
 	/*
 	 * Note: distribution will already see us throttled via the
 	 * throttled-list.  rq->lock protects completion.
@@ -6080,16 +6017,14 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	WARN_ON_ONCE(cfs_rq->throttled_clock);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock = rq_clock(rq);
-	return true;
+	return;
 }
 
 void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
+	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
 
 	/*
 	 * It's possible we are called with !runtime_remaining due to things
@@ -6132,62 +6067,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 			if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
 				break;
 		}
-		goto unthrottle_throttle;
 	}
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		/* Handle any unfinished DELAY_DEQUEUE business first. */
-		if (se->sched_delayed) {
-			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
-
-			dequeue_entity(qcfs_rq, se, flags);
-		} else if (se->on_rq)
-			break;
-		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		update_load_avg(qcfs_rq, se, UPDATE_TG);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	/* Start the fair server if un-throttling resulted in new runnable tasks */
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
-		dl_server_start(&rq->fair_server);
-
-	/* At this point se is NULL and we are at root level*/
-	add_nr_running(rq, queued_delta);
-
-unthrottle_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	/* Determine whether we need to wake up potentially idle CPU: */
@@ -6569,22 +6450,22 @@ static void sync_throttle(struct task_group *tg, int cpu)
 }
 
 /* conditionally throttle active cfs_rq's from put_prev_entity() */
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	if (!cfs_bandwidth_used())
-		return false;
+		return;
 
 	if (likely(!cfs_rq->runtime_enabled || cfs_rq->runtime_remaining > 0))
-		return false;
+		return;
 
 	/*
	 * it's possible for a throttled entity to be forced into a running
	 * state (e.g. set_curr_task), in this case we're finished.
	 */
 	if (cfs_rq_throttled(cfs_rq))
-		return true;
+		return;
 
-	return throttle_cfs_rq(cfs_rq);
+	throttle_cfs_rq(cfs_rq);
 }
 
 static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
@@ -6846,7 +6727,7 @@ static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
 #else /* CONFIG_CFS_BANDWIDTH */
 
 static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) {}
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq) { return false; }
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
@@ -7104,10 +6985,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
-
 		flags = ENQUEUE_WAKEUP;
 	}
 
@@ -7129,10 +7006,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
 	}
 
 	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
@@ -7162,7 +7035,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!task_new)
 		check_update_overutilized_status(rq);
 
-enqueue_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	hrtick_update(rq);
@@ -7220,10 +7092,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
-
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
 			slice = cfs_rq_min_slice(cfs_rq);
@@ -7260,10 +7128,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
 	}
 
 	sub_nr_running(rq, h_nr_queued);
@@ -8978,8 +8842,7 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);
 
-		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
-			goto again;
+		check_cfs_rq_runtime(cfs_rq);
 
 		se = pick_next_entity(rq, cfs_rq);
 		if (!se)
@@ -8988,11 +8851,8 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 	} while (cfs_rq);
 
 	p = task_of(se);
-	if (throttled_hierarchy(cfs_rq_of(se))) {
-		/* Shuold not happen for now */
-		WARN_ON_ONCE(1);
+	if (throttled_hierarchy(cfs_rq_of(se)))
 		task_throttle_setup_work(p);
-	}
 
 	return p;
 }
-- 
2.39.5

From nobody Fri Dec 19 12:47:19 2025
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka, Florian Bezdeka
Subject: [PATCH 6/7] sched/fair: task based throttle time accounting
Date: Tue, 20 May 2025 18:41:09 +0800
Message-Id: <20250520104110.3673059-7-ziqianlu@bytedance.com>
In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com>
References: <20250520104110.3673059-1-ziqianlu@bytedance.com>
With the task based throttle model, the previous approach of checking a
cfs_rq's nr_queued to decide whether throttled time should be accounted
no longer works as expected: e.g. when a cfs_rq is throttled its
nr_queued == 1, but that task could then block in kernel mode instead of
being dequeued to the limbo list, and accounting that as throttled time
is not accurate.

Rework throttle time accounting for a cfs_rq as follows:
- start accounting when the first task gets throttled in its hierarchy;
- stop accounting on unthrottle.

Suggested-by: Chengming Zhou # accounting mechanism
Co-developed-by: K Prateek Nayak # simplify implementation
Signed-off-by: K Prateek Nayak
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c  | 41 +++++++++++++++++++++++------------------
 kernel/sched/sched.h |  1 +
 2 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a968d334e8730..4646d4f8b878d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5360,16 +5360,6 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (cfs_rq->nr_queued == 1) {
 		check_enqueue_throttle(cfs_rq);
 		list_add_leaf_cfs_rq(cfs_rq);
-#ifdef CONFIG_CFS_BANDWIDTH
-		if (throttled_hierarchy(cfs_rq)) {
-			struct rq *rq = rq_of(cfs_rq);
-
-			if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
-				cfs_rq->throttled_clock = rq_clock(rq);
-			if (!cfs_rq->throttled_clock_self)
-				cfs_rq->throttled_clock_self = rq_clock(rq);
-		}
-#endif
 	}
 }
 
@@ -5455,7 +5445,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
	 * DELAY_DEQUEUE relies on spurious wakeups, special task
	 * states must not suffer spurious wakeups, excempt them.
	 */
-	if (flags & DEQUEUE_SPECIAL)
+	if (flags & (DEQUEUE_SPECIAL | DEQUEUE_THROTTLE))
 		delay = false;
 
 	WARN_ON_ONCE(delay && se->sched_delayed);
@@ -5863,7 +5853,7 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 	rq = scope.rq;
 	update_rq_clock(rq);
 	WARN_ON_ONCE(!list_empty(&p->throttle_node));
-	dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+	dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_THROTTLE);
 	/*
 	 * Must not add it to limbo list before dequeue or dequeue will
 	 * mistakenly regard this task as an already throttled one.
@@ -5955,6 +5945,17 @@ static inline void task_throttle_setup_work(struct task_struct *p)
 	task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
 }
 
+static void record_throttle_clock(struct cfs_rq *cfs_rq)
+{
+	struct rq *rq = rq_of(cfs_rq);
+
+	if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
+		cfs_rq->throttled_clock = rq_clock(rq);
+
+	if (!cfs_rq->throttled_clock_self)
+		cfs_rq->throttled_clock_self = rq_clock(rq);
+}
+
 static int tg_throttle_down(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
@@ -5967,12 +5968,10 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 	/* group is entering throttled state, stop time */
 	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
 
-	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
-	if (cfs_rq->nr_queued)
-		cfs_rq->throttled_clock_self = rq_clock(rq);
-	else
+	if (!cfs_rq->nr_queued)
 		list_del_leaf_cfs_rq(cfs_rq);
 
+	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
 	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
 	return 0;
 }
@@ -6015,8 +6014,6 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
	 */
 	cfs_rq->throttled = 1;
 	WARN_ON_ONCE(cfs_rq->throttled_clock);
-	if (cfs_rq->nr_queued)
-		cfs_rq->throttled_clock = rq_clock(rq);
 	return;
 }
 
@@ -6734,6 +6731,7 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
 static bool task_is_throttled(struct task_struct *p) { return false; }
 static void dequeue_throttled_task(struct task_struct *p, int flags) {}
+static void record_throttle_clock(struct cfs_rq *cfs_rq) {}
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -7057,6 +7055,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	int rq_h_nr_queued = rq->cfs.h_nr_queued;
 	bool task_sleep = flags & DEQUEUE_SLEEP;
 	bool task_delayed = flags & DEQUEUE_DELAYED;
+	bool task_throttled = flags & DEQUEUE_THROTTLE;
 	struct task_struct *p = NULL;
 	int h_nr_idle = 0;
 	int h_nr_queued = 0;
@@ -7092,6 +7091,9 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
 
+		if (throttled_hierarchy(cfs_rq) && task_throttled)
+			record_throttle_clock(cfs_rq);
+
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
 			slice = cfs_rq_min_slice(cfs_rq);
@@ -7128,6 +7130,9 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
+
+		if (throttled_hierarchy(cfs_rq) && task_throttled)
+			record_throttle_clock(cfs_rq);
 	}
 
 	sub_nr_running(rq, h_nr_queued);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 83f16fc44884f..9ba4b8f988ebf 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2360,6 +2360,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define DEQUEUE_SPECIAL		0x10
 #define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
 #define DEQUEUE_DELAYED		0x200 /* Matches ENQUEUE_DELAYED */
+#define DEQUEUE_THROTTLE	0x800
 
 #define ENQUEUE_WAKEUP		0x01
 #define ENQUEUE_RESTORE		0x02
-- 
2.39.5

From nobody Fri Dec 19 12:47:19 2025
b=ZIKWR9Jbp2FJi2pqt0phaEhQzHlAVKXf/pDNlPMKMWv3QXUN2IAjKWlEPI1DOjNRDg WNcIQfsvtm+yD0ujKzYsfMgHHWNkerT31jPpjtaKNbOSQkQ9fQjXEz5In4dukVYfGezq szXWyM4WMZ1EPEHoevbYUxinsA7pgd1y7Cf52fAgbyY4Yt3Hx1k2xwDBb2z2vXTaCoBu DG23D5NZlgYS2D0vzGR6x6FSUbtLxJcOvKP2nDeS+dtpJSlvTMpyzWR6U3UNkapE23lo 3j8j+JiRDf219Qwh07DWubbeVU5XUKB03GJaxFs36ay1tX0u6qkXx7hBvS5XZOsFUlQu ngcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747737737; x=1748342537; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CaS0HedpcaRBMMa9Dux7MJUqBml94jvopVU0/1WL8p4=; b=kJOZJ9lGsiwQS9jNHfKji6geYZ68e/YZ7WzZElAubGJ9t44DD3clfHCNAt97en5GHY 1+/BcRnmpBKqe8DMOGjGfWTjugMmcylZJpiAKpvzRykuix67rlfy3iXcWF5h7Eah7/yG 4cKKkSsqWsOOUpLTfzI4yi4DQ5Co+vr6NHW1SB/LTjvEjB2a1uPzVGMGUqL+Qxa0+0xR uTmlGgQgaHQUd/DL6JfdIKIqMhmbm8HJvLnNZfh6wuVEkLeL88XOKTVk955LA5x8H3eQ cmvi2oX6NNERk2Wr4GHVM1SBI04sBcKvK7OYWF79rtB0W45oKmhc5b4UC/DDCx89zI2s b9iQ== X-Gm-Message-State: AOJu0YwIWkGIwKND4BrMvp/qJ3Z9myhYeI4isBJiPWr5B1wxHFVrVb+9 phDZpJq27HXvH2dBbR8y3NTyq0FvRlc+OXDbOpCSQ4UWYTj8dIp4r3ZwbVtkos5ugA== X-Gm-Gg: ASbGncue6fBGVDh8RX7HpjIBQHbl7Ye6urpPPjq17rf1IWHdzTTl4QEsBbYt1OAD+g6 g5bOmyY9I+RJNrCgygrqRuQdZkfPcQoGwdDkgOi9P8/+qVLIywk0DwEX+GS2gyinwG4IKtDN9fh e0I0/hCKUvt5eth3vJDxIJrSrvXIVLZq35z/0sKSWQvhoNi0FltgJAMV2Jo4h8gXTY1ZhgKZEfu CTe6RVcTC2PmlijieHX5dDHabJBToPDOqAVkIzwLyPaGQDSElvqEPxLxnXYzJuhaFyDqf5aizjR nZ833ojoNfZXVkCQZdzYV3gt9rhNdd/n871mzY5FI5tutNHAcfIGIbdEU58YD4iGctw= X-Google-Smtp-Source: AGHT+IHpBDXGYWl/RrcXTJSh17Hti9EqQPY0w9/SiV2MgnsTxwyXp7rWcAnLd8AJrVPHipv79AC9Gg== X-Received: by 2002:a17:90b:3c8f:b0:30c:540b:9ba with SMTP id 98e67ed59e1d1-30e7d51fa7cmr25284471a91.10.1747737736680; Tue, 20 May 2025 03:42:16 -0700 (PDT) Received: from n37-107-136.byted.org ([115.190.40.14]) by smtp.gmail.com with ESMTPSA id 98e67ed59e1d1-30f365e5d31sm1359431a91.38.2025.05.20.03.42.11 (version=TLS1_3 
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka, Florian Bezdeka
Subject: [PATCH 7/7] sched/fair: get rid of throttled_lb_pair()
Date: Tue, 20 May 2025 18:41:10 +0800
Message-Id: <20250520104110.3673059-8-ziqianlu@bytedance.com>
In-Reply-To: <20250520104110.3673059-1-ziqianlu@bytedance.com>
References: <20250520104110.3673059-1-ziqianlu@bytedance.com>

Now that throttled tasks are dequeued and cannot remain on the rq's
cfs_tasks list, load balance no longer needs to treat throttled tasks
specially.

Suggested-by: K Prateek Nayak
Signed-off-by: Aaron Lu
---
 kernel/sched/fair.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4646d4f8b878d..89afa472299b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5796,23 +5796,6 @@ static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 	return cfs_bandwidth_used() && cfs_rq->throttle_count;
 }
 
-/*
- * Ensure that neither of the group entities corresponding to src_cpu or
- * dest_cpu are members of a throttled hierarchy when performing group
- * load-balance operations.
- */
-static inline int throttled_lb_pair(struct task_group *tg,
-				    int src_cpu, int dest_cpu)
-{
-	struct cfs_rq *src_cfs_rq, *dest_cfs_rq;
-
-	src_cfs_rq = tg->cfs_rq[src_cpu];
-	dest_cfs_rq = tg->cfs_rq[dest_cpu];
-
-	return throttled_hierarchy(src_cfs_rq) ||
-	       throttled_hierarchy(dest_cfs_rq);
-}
-
 static inline bool task_is_throttled(struct task_struct *p)
 {
 	return !list_empty(&p->throttle_node);
@@ -6743,12 +6726,6 @@ static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 	return 0;
 }
 
-static inline int throttled_lb_pair(struct task_group *tg,
-				    int src_cpu, int dest_cpu)
-{
-	return 0;
-}
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent) {}
 static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
@@ -9387,17 +9364,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	/*
 	 * We do not migrate tasks that are:
 	 * 1) delayed dequeued unless we migrate load, or
-	 * 2) throttled_lb_pair, or
-	 * 3) cannot be migrated to this CPU due to cpus_ptr, or
-	 * 4) running (obviously), or
-	 * 5) are cache-hot on their current CPU.
+	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
+	 * 3) running (obviously), or
+	 * 4) are cache-hot on their current CPU.
	 */
 	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
 		return 0;
 
-	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
-		return 0;
-
 	/*
 	 * We want to prioritize the migration of eligible tasks.
 	 * For ineligible tasks we soft-limit them and only allow
-- 
2.39.5