From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 1/7] sched/fair: Add related data structure for task based throttle
Date: Wed, 9 Apr 2025 20:07:40 +0800
Message-Id: <20250409120746.635476-2-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

From: Valentin Schneider

Add the data structures used by the new task based throttle
functionality.

Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 include/linux/sched.h |  4 ++++
 kernel/sched/core.c   |  3 +++
 kernel/sched/fair.c   | 12 ++++++++++++
 kernel/sched/sched.h  |  2 ++
 4 files changed, 21 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f96ac19828934..0b55c79fee209 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -880,6 +880,10 @@ struct task_struct {
 
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group		*sched_task_group;
+#ifdef CONFIG_CFS_BANDWIDTH
+	struct callback_head		sched_throttle_work;
+	struct list_head		throttle_node;
+#endif
 #endif
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 79692f85643fe..3b8735bc527da 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4492,6 +4492,9 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = NULL;
+#ifdef CONFIG_CFS_BANDWIDTH
+	init_cfs_throttle_work(p);
+#endif
 #endif
 
 #ifdef CONFIG_SCHEDSTATS
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0c19459c80422..894202d232efd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5823,6 +5823,18 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	       throttled_hierarchy(dest_cfs_rq);
 }
 
+static void throttle_cfs_rq_work(struct callback_head *work)
+{
+}
+
+void init_cfs_throttle_work(struct task_struct *p)
+{
+	init_task_work(&p->sched_throttle_work, throttle_cfs_rq_work);
+	/* Protect against double add, see throttle_cfs_rq() and throttle_cfs_rq_work() */
+	p->sched_throttle_work.next = &p->sched_throttle_work;
+	INIT_LIST_HEAD(&p->throttle_node);
+}
+
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c5a6a503eb6de..921527327f107 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2703,6 +2703,8 @@ extern bool sched_rt_bandwidth_account(struct rt_rq *rt_rq);
 
 extern void init_dl_entity(struct sched_dl_entity *dl_se);
 
+extern void init_cfs_throttle_work(struct task_struct *p);
+
 #define BW_SHIFT		20
 #define BW_UNIT			(1 << BW_SHIFT)
 #define RATIO_SHIFT		8
-- 
2.39.5
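A note on the sentinel this patch sets up: throttle_cfs_rq_work() stays
empty here and is filled in by patch 2, but the convention that a
callback_head whose next field points at itself means "no throttle work
pending" is already established. A minimal user-space sketch of that
convention; the types and helper names below are simplified stand-ins,
not the kernel's task_work machinery:

#include <stdbool.h>
#include <stdio.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *);
};

/* "next points at itself" encodes "no work pending anywhere". */
static void work_mark_idle(struct callback_head *work)
{
	work->next = work;
}

static bool work_is_pending(const struct callback_head *work)
{
	return work->next != work;
}

int main(void)
{
	struct callback_head w = { 0 };

	work_mark_idle(&w);	/* what init_cfs_throttle_work() does */
	printf("pending: %d\n", work_is_pending(&w));	/* prints 0 */
	return 0;
}

Patch 2's task_has_throttle_work() is exactly the work_is_pending()
test, which is why no separate "queued" flag is needed.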
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 2/7] sched/fair: Handle throttle path for task based throttle
Date: Wed, 9 Apr 2025 20:07:41 +0800
Message-Id: <20250409120746.635476-3-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

From: Valentin Schneider

In the current throttle model, when a cfs_rq is throttled, its entity
is dequeued from the cpu's rq, so none of the tasks attached to it can
run, which achieves the throttle target. This has a drawback though:
suppose a task is a waiting reader of a percpu_rwsem. When it is woken
up, it cannot run until its task group's next period arrives, which can
be a relatively long time. The waiting writer has to wait that much
longer, more readers pile up behind it, and eventually a task hung is
triggered.

To improve this situation, change the throttle model to be task based:
when a cfs_rq is throttled, record its throttled status but do not
remove it from the cpu's rq. Instead, for each task belonging to this
cfs_rq, add a task work when the task is picked, so that the task is
dequeued when it returns to user space. This way, throttled tasks do
not hold any kernel resources.
Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 kernel/sched/fair.c  | 185 +++++++++++++++++++++----------------------
 kernel/sched/sched.h |   1 +
 2 files changed, 93 insertions(+), 93 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 894202d232efd..c566a5a90d065 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5516,8 +5516,11 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (flags & DEQUEUE_DELAYED)
 		finish_delayed_dequeue_entity(se);
 
-	if (cfs_rq->nr_queued == 0)
+	if (cfs_rq->nr_queued == 0) {
 		update_idle_cfs_rq_clock_pelt(cfs_rq);
+		if (throttled_hierarchy(cfs_rq))
+			list_del_leaf_cfs_rq(cfs_rq);
+	}
 
 	return true;
 }
@@ -5598,7 +5601,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
 	return se;
 }
 
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
 
 static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 {
@@ -5823,8 +5826,48 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	       throttled_hierarchy(dest_cfs_rq);
 }
 
+static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
+	struct task_struct *p = container_of(work, struct task_struct, sched_throttle_work);
+	struct sched_entity *se;
+	struct cfs_rq *cfs_rq;
+	struct rq *rq;
+
+	WARN_ON_ONCE(p != current);
+	p->sched_throttle_work.next = &p->sched_throttle_work;
+
+	/*
+	 * If task is exiting, then there won't be a return to userspace, so we
+	 * don't have to bother with any of this.
+	 */
+	if ((p->flags & PF_EXITING))
+		return;
+
+	scoped_guard(task_rq_lock, p) {
+		se = &p->se;
+		cfs_rq = cfs_rq_of(se);
+
+		/* Raced, forget */
+		if (p->sched_class != &fair_sched_class)
+			return;
+
+		/*
+		 * If not in limbo, then either replenish has happened or this
+		 * task got migrated out of the throttled cfs_rq, move along.
+		 */
+		if (!cfs_rq->throttle_count)
+			return;
+
+		rq = scope.rq;
+		update_rq_clock(rq);
+		WARN_ON_ONCE(!list_empty(&p->throttle_node));
+		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
+		resched_curr(rq);
+	}
+
+	cond_resched_tasks_rcu_qs();
 }
 
 void init_cfs_throttle_work(struct task_struct *p)
@@ -5864,32 +5907,53 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 	return 0;
 }
 
+static inline bool task_has_throttle_work(struct task_struct *p)
+{
+	return p->sched_throttle_work.next != &p->sched_throttle_work;
+}
+
+static inline void task_throttle_setup_work(struct task_struct *p)
+{
+	if (task_has_throttle_work(p))
+		return;
+
+	/*
+	 * Kthreads and exiting tasks don't return to userspace, so adding the
+	 * work is pointless
+	 */
+	if ((p->flags & (PF_EXITING | PF_KTHREAD)))
+		return;
+
+	task_work_add(p, &p->sched_throttle_work, TWA_RESUME);
+}
+
 static int tg_throttle_down(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
 
+	cfs_rq->throttle_count++;
+	if (cfs_rq->throttle_count > 1)
+		return 0;
+
 	/* group is entering throttled state, stop time */
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
-		list_del_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
 
-		WARN_ON_ONCE(cfs_rq->throttled_clock_self);
-		if (cfs_rq->nr_queued)
-			cfs_rq->throttled_clock_self = rq_clock(rq);
-	}
-	cfs_rq->throttle_count++;
+	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
+	if (cfs_rq->nr_queued)
+		cfs_rq->throttled_clock_self = rq_clock(rq);
+	else
+		list_del_leaf_cfs_rq(cfs_rq);
 
+	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
 	return 0;
 }
 
-static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
+static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta, dequeue = 1;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
+	int dequeue = 1;
 
 	raw_spin_lock(&cfs_b->lock);
 	/* This will start the period timer if necessary */
@@ -5910,74 +5974,13 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	raw_spin_unlock(&cfs_b->lock);
 
 	if (!dequeue)
-		return false;  /* Throttle no longer required. */
-
-	se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+		return;  /* Throttle no longer required. */
 
 	/* freeze hierarchy runnable averages while throttled */
 	rcu_read_lock();
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 	rcu_read_unlock();
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		int flags;
-
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		/*
-		 * Abuse SPECIAL to avoid delayed dequeue in this instance.
-		 * This avoids teaching dequeue_entities() about throttled
-		 * entities and keeps things relatively simple.
-		 */
-		flags = DEQUEUE_SLEEP | DEQUEUE_SPECIAL;
-		if (se->sched_delayed)
-			flags |= DEQUEUE_DELAYED;
-		dequeue_entity(qcfs_rq, se, flags);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-
-		if (qcfs_rq->load.weight) {
-			/* Avoid re-evaluating load for this entity: */
-			se = parent_entity(se);
-			break;
-		}
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-		/* throttled entity or throttle-on-deactivate */
-		if (!se->on_rq)
-			goto done;
-
-		update_load_avg(qcfs_rq, se, 0);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued -= queued_delta;
-		qcfs_rq->h_nr_runnable -= runnable_delta;
-		qcfs_rq->h_nr_idle -= idle_delta;
-	}
-
-	/* At this point se is NULL and we are at root level*/
-	sub_nr_running(rq, queued_delta);
-
-	/* Stop the fair server if throttling resulted in no runnable tasks */
-	if (rq_h_nr_queued && !rq->cfs.h_nr_queued)
-		dl_server_stop(&rq->fair_server);
-done:
 	/*
 	 * Note: distribution will already see us throttled via the
 	 * throttled-list.  rq->lock protects completion.
@@ -5986,7 +5989,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	WARN_ON_ONCE(cfs_rq->throttled_clock);
 	if (cfs_rq->nr_queued)
 		cfs_rq->throttled_clock = rq_clock(rq);
-	return true;
+	return;
 }
 
 void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
@@ -6462,22 +6465,22 @@ static void sync_throttle(struct task_group *tg, int cpu)
 }
 
 /* conditionally throttle active cfs_rq's from put_prev_entity() */
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 {
 	if (!cfs_bandwidth_used())
-		return false;
+		return;
 
 	if (likely(!cfs_rq->runtime_enabled || cfs_rq->runtime_remaining > 0))
-		return false;
+		return;
 
 	/*
	 * it's possible for a throttled entity to be forced into a running
	 * state (e.g. set_curr_task), in this case we're finished.
	 */
 	if (cfs_rq_throttled(cfs_rq))
-		return true;
+		return;
 
-	return throttle_cfs_rq(cfs_rq);
+	throttle_cfs_rq(cfs_rq);
 }
 
 static enum hrtimer_restart sched_cfs_slack_timer(struct hrtimer *timer)
@@ -6573,6 +6576,7 @@ static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq)
 	cfs_rq->runtime_enabled = 0;
 	INIT_LIST_HEAD(&cfs_rq->throttled_list);
 	INIT_LIST_HEAD(&cfs_rq->throttled_csd_list);
+	INIT_LIST_HEAD(&cfs_rq->throttled_limbo_list);
 }
 
 void start_cfs_bandwidth(struct cfs_bandwidth *cfs_b)
@@ -6738,10 +6742,11 @@ static void sched_fair_update_stop_tick(struct rq *rq, struct task_struct *p)
 #else /* CONFIG_CFS_BANDWIDTH */
 
 static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec) {}
-static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq) { return false; }
+static void check_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
+static void task_throttle_setup_work(struct task_struct *p) {}
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -7108,10 +7113,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
-
 		/* Don't dequeue parent if it has other entities besides us */
 		if (cfs_rq->load.weight) {
 			slice = cfs_rq_min_slice(cfs_rq);
@@ -7148,10 +7149,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			return 0;
 	}
 
 	sub_nr_running(rq, h_nr_queued);
@@ -8860,8 +8857,7 @@ static struct task_struct *pick_task_fair(struct rq *rq)
 		if (cfs_rq->curr && cfs_rq->curr->on_rq)
 			update_curr(cfs_rq);
 
-		if (unlikely(check_cfs_rq_runtime(cfs_rq)))
-			goto again;
+		check_cfs_rq_runtime(cfs_rq);
 
 		se = pick_next_entity(rq, cfs_rq);
 		if (!se)
@@ -8888,6 +8884,9 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 		goto idle;
 	se = &p->se;
 
+	if (throttled_hierarchy(cfs_rq_of(se)))
+		task_throttle_setup_work(p);
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (prev->sched_class != &fair_sched_class)
 		goto simple;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 921527327f107..97be6a6f53b9c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -736,6 +736,7 @@ struct cfs_rq {
 	int			throttle_count;
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
+	struct list_head	throttled_limbo_list;
 #endif /* CONFIG_CFS_BANDWIDTH */
 #endif /* CONFIG_FAIR_GROUP_SCHED */
 };
-- 
2.39.5
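The throttled_limbo_list introduced above gives a very simple
definition of "throttled task": it is linked on its cfs_rq's limbo
list (patch 3 makes this explicit with task_is_throttled()). A
self-contained toy model of that convention; the list helpers mirror
the kernel's list_head API but are re-implemented here, so nothing
below is kernel code:

#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly-linked list, modeled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next; n->prev = h;
	h->next->prev = n; h->next = n;
}
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next; n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

struct task { struct list_head throttle_node; };

/* On the limbo list <=> throttled (patch 3's task_is_throttled()). */
static bool task_is_throttled(struct task *p)
{
	return !list_empty(&p->throttle_node);
}

int main(void)
{
	struct list_head limbo; struct task p;

	INIT_LIST_HEAD(&limbo); INIT_LIST_HEAD(&p.throttle_node);
	list_add(&p.throttle_node, &limbo);	/* throttle_cfs_rq_work() */
	printf("throttled: %d\n", task_is_throttled(&p));	/* 1 */
	list_del_init(&p.throttle_node);	/* unthrottle / migration */
	printf("throttled: %d\n", task_is_throttled(&p));	/* 0 */
	return 0;
}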
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 3/7] sched/fair: Handle unthrottle path for task based throttle
Date: Wed, 9 Apr 2025 20:07:42 +0800
Message-Id: <20250409120746.635476-4-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

From: Valentin Schneider

On unthrottle, enqueue throttled tasks back so they can continue to
run. Note that with task based throttling, the only place a task is
throttled is on its return to user space, so as long as a task is
enqueued, it is allowed to run until it reaches that throttle point,
whether its cfs_rq is throttled or not.

The leaf_cfs_rq list is handled differently now: whenever a task is
enqueued on a cfs_rq, throttled or not, that cfs_rq is added to the
list; when a cfs_rq is throttled and all of its tasks have been
dequeued, it is removed from the list. This keeps the rule easy to
reason about, hence the choice.

Signed-off-by: Valentin Schneider
Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 kernel/sched/fair.c | 129 ++++++++++++++++----------------------------
 1 file changed, 45 insertions(+), 84 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c566a5a90d065..4152088fc0546 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5357,18 +5357,17 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	if (cfs_rq->nr_queued == 1) {
 		check_enqueue_throttle(cfs_rq);
-		if (!throttled_hierarchy(cfs_rq)) {
-			list_add_leaf_cfs_rq(cfs_rq);
-		} else {
+		list_add_leaf_cfs_rq(cfs_rq);
 #ifdef CONFIG_CFS_BANDWIDTH
+		if (throttled_hierarchy(cfs_rq)) {
 			struct rq *rq = rq_of(cfs_rq);
 
 			if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
 				cfs_rq->throttled_clock = rq_clock(rq);
 			if (!cfs_rq->throttled_clock_self)
 				cfs_rq->throttled_clock_self = rq_clock(rq);
-#endif
 		}
+#endif
 	}
 }
 
@@ -5826,6 +5825,11 @@ static inline int throttled_lb_pair(struct task_group *tg,
 	       throttled_hierarchy(dest_cfs_rq);
 }
 
+static inline bool task_is_throttled(struct task_struct *p)
+{
+	return !list_empty(&p->throttle_node);
+}
+
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
@@ -5878,32 +5882,41 @@ void init_cfs_throttle_work(struct task_struct *p)
 	INIT_LIST_HEAD(&p->throttle_node);
 }
 
+static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
 	struct rq *rq = data;
 	struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
+	struct task_struct *p, *tmp;
 
 	cfs_rq->throttle_count--;
-	if (!cfs_rq->throttle_count) {
-		cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
-					     cfs_rq->throttled_clock_pelt;
+	if (cfs_rq->throttle_count)
+		return 0;
 
-		/* Add cfs_rq with load or one or more already running entities to the list */
-		if (!cfs_rq_is_decayed(cfs_rq))
-			list_add_leaf_cfs_rq(cfs_rq);
+	cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
+		cfs_rq->throttled_clock_pelt;
 
-		if (cfs_rq->throttled_clock_self) {
-			u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
+	if (cfs_rq->throttled_clock_self) {
+		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
 
-			cfs_rq->throttled_clock_self = 0;
+		cfs_rq->throttled_clock_self = 0;
 
-			if (WARN_ON_ONCE((s64)delta < 0))
-				delta = 0;
+		if (WARN_ON_ONCE((s64)delta < 0))
+			delta = 0;
 
-			cfs_rq->throttled_clock_self_time += delta;
-		}
+		cfs_rq->throttled_clock_self_time += delta;
+	}
+
+	/* Re-enqueue the tasks that have been throttled at this level. */
+	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
+		list_del_init(&p->throttle_node);
+		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
 	}
 
+	/* Add cfs_rq with load or one or more already running entities to the list */
+	if (!cfs_rq_is_decayed(cfs_rq))
+		list_add_leaf_cfs_rq(cfs_rq);
+
 	return 0;
 }
 
@@ -5996,11 +6009,20 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
-	struct sched_entity *se;
-	long queued_delta, runnable_delta, idle_delta;
-	long rq_h_nr_queued = rq->cfs.h_nr_queued;
+	struct sched_entity *se = cfs_rq->tg->se[cpu_of(rq)];
 
-	se = cfs_rq->tg->se[cpu_of(rq)];
+	/*
+	 * It's possible we are called with !runtime_remaining due to things
+	 * like user changed quota setting(see tg_set_cfs_bandwidth()) or async
+	 * unthrottled us with a positive runtime_remaining but other still
+	 * running entities consumed those runtime before we reach here.
+	 *
+	 * Anyway, we can't unthrottle this cfs_rq without any runtime remaining
+	 * because any enqueue below will immediately trigger a throttle, which
+	 * is not supposed to happen on unthrottle path.
+	 */
+	if (cfs_rq->runtime_enabled && cfs_rq->runtime_remaining <= 0)
+		return;
 
 	cfs_rq->throttled = 0;
 
@@ -6028,62 +6050,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 			if (list_add_leaf_cfs_rq(cfs_rq_of(se)))
 				break;
 		}
-		goto unthrottle_throttle;
 	}
 
-	queued_delta = cfs_rq->h_nr_queued;
-	runnable_delta = cfs_rq->h_nr_runnable;
-	idle_delta = cfs_rq->h_nr_idle;
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		/* Handle any unfinished DELAY_DEQUEUE business first. */
-		if (se->sched_delayed) {
-			int flags = DEQUEUE_SLEEP | DEQUEUE_DELAYED;
-
-			dequeue_entity(qcfs_rq, se, flags);
-		} else if (se->on_rq)
-			break;
-		enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	for_each_sched_entity(se) {
-		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
-
-		update_load_avg(qcfs_rq, se, UPDATE_TG);
-		se_update_runnable(se);
-
-		if (cfs_rq_is_idle(group_cfs_rq(se)))
-			idle_delta = cfs_rq->h_nr_queued;
-
-		qcfs_rq->h_nr_queued += queued_delta;
-		qcfs_rq->h_nr_runnable += runnable_delta;
-		qcfs_rq->h_nr_idle += idle_delta;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(qcfs_rq))
-			goto unthrottle_throttle;
-	}
-
-	/* Start the fair server if un-throttling resulted in new runnable tasks */
-	if (!rq_h_nr_queued && rq->cfs.h_nr_queued)
-		dl_server_start(&rq->fair_server);
-
-	/* At this point se is NULL and we are at root level*/
-	add_nr_running(rq, queued_delta);
-
-unthrottle_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	/* Determine whether we need to wake up potentially idle CPU: */
@@ -6747,6 +6715,7 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq) {}
 static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
+static bool task_is_throttled(struct task_struct *p) { return false; }
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -6955,6 +6924,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	util_est_enqueue(&rq->cfs, p);
 
 	if (flags & ENQUEUE_DELAYED) {
+		WARN_ON_ONCE(task_is_throttled(p));
 		requeue_delayed_entity(se);
 		return;
 	}
@@ -6997,10 +6967,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
 
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
-
 		flags = ENQUEUE_WAKEUP;
 	}
 
@@ -7022,10 +6988,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
-
-		/* end evaluation on encountering a throttled cfs_rq */
-		if (cfs_rq_throttled(cfs_rq))
-			goto enqueue_throttle;
 	}
 
 	if (!rq_h_nr_queued && rq->cfs.h_nr_queued) {
@@ -7055,7 +7017,6 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	if (!task_new)
 		check_update_overutilized_status(rq);
 
-enqueue_throttle:
 	assert_list_leaf_cfs_rq(rq);
 
 	hrtick_update(rq);
-- 
2.39.5
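One detail worth calling out: tg_unthrottle_up() walks the limbo list
with list_for_each_entry_safe() because each iteration unlinks the
current task before re-enqueueing it. A stand-alone toy showing why the
successor must be saved before the body runs (singly linked for
brevity, where the kernel list is doubly linked; nothing here is kernel
code):

#include <stdio.h>
#include <stddef.h>

struct node { int id; struct node *next; };

int main(void)
{
	struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
	struct node *pos = &a, *tmp;

	while (pos) {
		tmp = pos->next;	/* grab the successor first */
		pos->next = NULL;	/* unlink, like list_del_init() */
		printf("re-enqueued task %d\n", pos->id);
		pos = tmp;		/* safe: pos was already unlinked */
	}
	return 0;
}

Without the saved tmp, the iterator would have to read pos->next after
the unlink, which in the doubly-linked kernel variant points back at
the node itself once list_del_init() has reinitialized it.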
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 4/7] sched/fair: Take care of group/affinity/sched_class change for throttled task
Date: Wed, 9 Apr 2025 20:07:43 +0800
Message-Id: <20250409120746.635476-5-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

On task group change, core dequeues and then re-enqueues any task whose
on_rq equals TASK_ON_RQ_QUEUED. A throttled task is still considered
queued by core because p->on_rq is still set, so core will try to
dequeue it; since fair has already dequeued the task on throttle,
handle this case properly. Affinity and sched class changes are
similar.

Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 kernel/sched/fair.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4152088fc0546..76b8a5ffcbdd2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5882,6 +5882,20 @@ void init_cfs_throttle_work(struct task_struct *p)
 	INIT_LIST_HEAD(&p->throttle_node);
 }
 
+static void dequeue_throttled_task(struct task_struct *p, int flags)
+{
+	/*
+	 * Task is throttled and someone wants to dequeue it again:
+	 * it must be sched/core when core needs to do things like
+	 * task affinity change, task group change, task sched class
+	 * change etc.
+	 */
+	WARN_ON_ONCE(p->se.on_rq);
+	WARN_ON_ONCE(flags & DEQUEUE_SLEEP);
+
+	list_del_init(&p->throttle_node);
+}
+
 static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static int tg_unthrottle_up(struct task_group *tg, void *data)
 {
@@ -6716,6 +6730,7 @@ static inline void sync_throttle(struct task_group *tg, int cpu) {}
 static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
 static bool task_is_throttled(struct task_struct *p) { return false; }
+static void dequeue_throttled_task(struct task_struct *p, int flags) {}
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -7146,6 +7161,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
  */
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 {
+	if (unlikely(task_is_throttled(p))) {
+		dequeue_throttled_task(p, flags);
+		return true;
+	}
+
 	if (!(p->se.sched_delayed && (task_on_rq_migrating(p) || (flags & DEQUEUE_SAVE))))
 		util_est_dequeue(&rq->cfs, p);
 
-- 
2.39.5
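The situation this patch handles can be modeled with two flags: core's
p->on_rq still says "queued" while fair's p->se.on_rq has been cleared
by the throttle dequeue. A hedged, user-space model of that state
mismatch and of the special-cased dequeue; the names echo the patch but
none of this is kernel code:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct task {
	bool core_on_rq;	/* p->on_rq: core still counts the task as queued */
	bool se_on_rq;		/* p->se.on_rq: fair already dequeued it */
	bool on_limbo;		/* !list_empty(&p->throttle_node) */
};

/* Models throttle_cfs_rq_work(): fair dequeues, core's view unchanged. */
static void throttle(struct task *p)
{
	p->se_on_rq = false;
	p->on_limbo = true;
}

/* Models dequeue_task_fair() with this patch applied. */
static void dequeue_for_group_change(struct task *p)
{
	if (p->on_limbo) {		/* dequeue_throttled_task() */
		assert(!p->se_on_rq);	/* WARN_ON_ONCE(p->se.on_rq) */
		p->on_limbo = false;	/* list_del_init(&p->throttle_node) */
		return;
	}
	p->se_on_rq = false;		/* normal dequeue path */
}

int main(void)
{
	struct task p = { true, true, false };

	throttle(&p);
	dequeue_for_group_change(&p);	/* core-initiated, e.g. cgroup move */
	printf("on_limbo: %d\n", p.on_limbo);	/* 0: safely unlinked */
	return 0;
}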
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 5/7] sched/fair: get rid of throttled_lb_pair()
Date: Wed, 9 Apr 2025 20:07:44 +0800
Message-Id: <20250409120746.635476-6-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

Now that throttled tasks are dequeued and cannot stay on the rq's
cfs_tasks list, load balance no longer needs to take special care of
these throttled tasks.

Suggested-by: K Prateek Nayak
Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 kernel/sched/fair.c | 33 +++------------------------------
 1 file changed, 3 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 76b8a5ffcbdd2..ff4252995d677 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5808,23 +5808,6 @@ static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 	return cfs_bandwidth_used() && cfs_rq->throttle_count;
 }
 
-/*
- * Ensure that neither of the group entities corresponding to src_cpu or
- * dest_cpu are members of a throttled hierarchy when performing group
- * load-balance operations.
- */
-static inline int throttled_lb_pair(struct task_group *tg,
-				    int src_cpu, int dest_cpu)
-{
-	struct cfs_rq *src_cfs_rq, *dest_cfs_rq;
-
-	src_cfs_rq = tg->cfs_rq[src_cpu];
-	dest_cfs_rq = tg->cfs_rq[dest_cpu];
-
-	return throttled_hierarchy(src_cfs_rq) ||
-	       throttled_hierarchy(dest_cfs_rq);
-}
-
 static inline bool task_is_throttled(struct task_struct *p)
 {
 	return !list_empty(&p->throttle_node);
@@ -6742,12 +6725,6 @@ static inline int throttled_hierarchy(struct cfs_rq *cfs_rq)
 	return 0;
 }
 
-static inline int throttled_lb_pair(struct task_group *tg,
-				    int src_cpu, int dest_cpu)
-{
-	return 0;
-}
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b, struct cfs_bandwidth *parent) {}
 static void init_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
@@ -9377,17 +9354,13 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	/*
 	 * We do not migrate tasks that are:
 	 * 1) delayed dequeued unless we migrate load, or
-	 * 2) throttled_lb_pair, or
-	 * 3) cannot be migrated to this CPU due to cpus_ptr, or
-	 * 4) running (obviously), or
-	 * 5) are cache-hot on their current CPU.
+	 * 2) cannot be migrated to this CPU due to cpus_ptr, or
+	 * 3) running (obviously), or
+	 * 4) are cache-hot on their current CPU.
	 */
 	if ((p->se.sched_delayed) && (env->migration_type != migrate_load))
 		return 0;
 
-	if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
-		return 0;
-
 	/*
 	 * We want to prioritize the migration of eligible tasks.
 	 * For ineligible tasks we soft-limit them and only allow
-- 
2.39.5
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 6/7] sched/fair: fix h_nr_runnable accounting with per-task throttle
Date: Wed, 9 Apr 2025 20:07:45 +0800
Message-Id: <20250409120746.635476-7-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

Task based throttle no longer adjusts a cfs_rq's h_nr_runnable on
throttle but relies on the standard en/dequeue_entity() paths, so
delayed dequeue operations no longer need to take special care of
h_nr_runnable.
Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 kernel/sched/fair.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ff4252995d677..20471a3aa35e6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5406,8 +5406,6 @@ static void set_delayed(struct sched_entity *se)
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 		cfs_rq->h_nr_runnable--;
-		if (cfs_rq_throttled(cfs_rq))
-			break;
 	}
 }
 
@@ -5428,8 +5426,6 @@ static void clear_delayed(struct sched_entity *se)
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
 		cfs_rq->h_nr_runnable++;
-		if (cfs_rq_throttled(cfs_rq))
-			break;
 	}
 }
 
-- 
2.39.5
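The deleted break means a delayed-dequeue adjustment now propagates
through every level of the task-group hierarchy: with task based
throttle, upper levels keep throttled cfs_rqs enqueued, so their
counters must stay in sync. A toy model of the walk, where three array
slots stand in for the cfs_rqs visited by for_each_sched_entity()
(purely illustrative, not kernel code):

#include <stdio.h>

int main(void)
{
	int h_nr_runnable[3] = { 1, 1, 1 };	/* leaf, middle, root */

	for (int lvl = 0; lvl < 3; lvl++) {
		h_nr_runnable[lvl]--;
		/* old model: if (throttled[lvl]) break; -- now removed */
	}

	for (int lvl = 0; lvl < 3; lvl++)
		printf("level %d: h_nr_runnable = %d\n", lvl, h_nr_runnable[lvl]);
	return 0;
}

Under the old model the break at a throttled level left the root count
one too high once throttled entities stopped being dequeued, which is
the accounting drift this patch removes.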
From: Aaron Lu
To: Valentin Schneider, Ben Segall, K Prateek Nayak, Peter Zijlstra, Josh Don, Ingo Molnar, Vincent Guittot, Xi Wang
Cc: linux-kernel@vger.kernel.org, Juri Lelli, Dietmar Eggemann, Steven Rostedt, Mel Gorman, Chengming Zhou, Chuyi Zhou, Jan Kiszka
Subject: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting throttle time
Date: Wed, 9 Apr 2025 20:07:46 +0800
Message-Id: <20250409120746.635476-8-ziqianlu@bytedance.com>
In-Reply-To: <20250409120746.635476-1-ziqianlu@bytedance.com>
References: <20250409120746.635476-1-ziqianlu@bytedance.com>

Implement an alternative way of accounting cfs_rq throttle time which:
- starts accounting when a throttled cfs_rq has no tasks enqueued and
  its throttled limbo list is not empty;
- stops accounting when this cfs_rq gets unthrottled or a task gets
  enqueued.

This way, throttle time is only accounted for periods during which the
cfs_rq has no tasks enqueued at all while some of its tasks are
throttled.
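These rules boil down to a small start/stop state machine. A sketch
only, not part of the diff: the helper names throttled_time_start() and
throttled_time_stop() are invented here; the real logic lives inline in
dequeue_entity(), enqueue_entity() and unthrottle_cfs_rq() below, keyed
off the new h_nr_throttled count and throttled_time accumulator:

	/* start: the last enqueued task just left a throttled hierarchy */
	static void throttled_time_start(struct cfs_rq *cfs_rq, struct rq *rq)
	{
		if (!cfs_rq->nr_queued && cfs_rq->h_nr_throttled &&
		    cfs_rq_throttled(cfs_rq))
			cfs_rq->throttled_clock = rq_clock(rq);
	}

	/* stop: a task gets enqueued again, or the cfs_rq is unthrottled */
	static void throttled_time_stop(struct cfs_rq *cfs_rq, struct rq *rq)
	{
		if (cfs_rq->throttled_clock) {
			cfs_rq->throttled_time +=
				rq_clock(rq) - cfs_rq->throttled_clock;
			cfs_rq->throttled_clock = 0;
		}
	}

The per-cfs_rq throttled_time accumulated this way is folded into
cfs_b->throttled_time when the cfs_rq is unthrottled.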
Signed-off-by: Aaron Lu
Tested-by: K Prateek Nayak
---
 kernel/sched/fair.c  | 112 ++++++++++++++++++++++++++++++++-----------
 kernel/sched/sched.h |   4 ++
 2 files changed, 89 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 20471a3aa35e6..70f7de82d1d9d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5300,6 +5300,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
+static void account_cfs_rq_throttle_self(struct cfs_rq *cfs_rq);
 
 static void requeue_delayed_entity(struct sched_entity *se);
@@ -5362,10 +5363,14 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (throttled_hierarchy(cfs_rq)) {
 		struct rq *rq = rq_of(cfs_rq);
 
-		if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
-			cfs_rq->throttled_clock = rq_clock(rq);
-		if (!cfs_rq->throttled_clock_self)
-			cfs_rq->throttled_clock_self = rq_clock(rq);
+		if (cfs_rq->throttled_clock) {
+			cfs_rq->throttled_time +=
+				rq_clock(rq) - cfs_rq->throttled_clock;
+			cfs_rq->throttled_clock = 0;
+		}
+
+		if (cfs_rq->throttled_clock_self)
+			account_cfs_rq_throttle_self(cfs_rq);
 	}
 #endif
 }
@@ -5453,7 +5458,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * DELAY_DEQUEUE relies on spurious wakeups, special task
 	 * states must not suffer spurious wakeups, excempt them.
 	 */
-	if (flags & DEQUEUE_SPECIAL)
+	if (flags & (DEQUEUE_SPECIAL | DEQUEUE_THROTTLE))
 		delay = false;
 
 	WARN_ON_ONCE(delay && se->sched_delayed);
@@ -5513,8 +5518,24 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	if (cfs_rq->nr_queued == 0) {
 		update_idle_cfs_rq_clock_pelt(cfs_rq);
-		if (throttled_hierarchy(cfs_rq))
+
+#ifdef CONFIG_CFS_BANDWIDTH
+		if (throttled_hierarchy(cfs_rq)) {
 			list_del_leaf_cfs_rq(cfs_rq);
+
+			if (cfs_rq->h_nr_throttled) {
+				struct rq *rq = rq_of(cfs_rq);
+
+				WARN_ON_ONCE(cfs_rq->throttled_clock_self);
+				cfs_rq->throttled_clock_self = rq_clock(rq);
+
+				if (cfs_rq_throttled(cfs_rq)) {
+					WARN_ON_ONCE(cfs_rq->throttled_clock);
+					cfs_rq->throttled_clock = rq_clock(rq);
+				}
+			}
+		}
+#endif
 	}
 
 	return true;
@@ -5809,6 +5830,18 @@ static inline bool task_is_throttled(struct task_struct *p)
 	return !list_empty(&p->throttle_node);
 }
 
+static inline void
+cfs_rq_inc_h_nr_throttled(struct cfs_rq *cfs_rq, unsigned int nr)
+{
+	cfs_rq->h_nr_throttled += nr;
+}
+
+static inline void
+cfs_rq_dec_h_nr_throttled(struct cfs_rq *cfs_rq, unsigned int nr)
+{
+	cfs_rq->h_nr_throttled -= nr;
+}
+
 static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags);
 static void throttle_cfs_rq_work(struct callback_head *work)
 {
@@ -5845,7 +5878,7 @@ static void throttle_cfs_rq_work(struct callback_head *work)
 		rq = scope.rq;
 		update_rq_clock(rq);
 		WARN_ON_ONCE(!list_empty(&p->throttle_node));
-		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_SPECIAL);
+		dequeue_task_fair(rq, p, DEQUEUE_SLEEP | DEQUEUE_THROTTLE);
 		list_add(&p->throttle_node, &cfs_rq->throttled_limbo_list);
 		resched_curr(rq);
 	}
@@ -5863,16 +5896,37 @@ void init_cfs_throttle_work(struct task_struct *p)
 
 static void dequeue_throttled_task(struct task_struct *p, int flags)
 {
+	struct sched_entity *se = &p->se;
+
 	/*
 	 * Task is throttled and someone wants to dequeue it again:
 	 * it must be sched/core when core needs to do things like
 	 * task affinity change, task group change, task sched class
 	 * change etc.
 	 */
-	WARN_ON_ONCE(p->se.on_rq);
-	WARN_ON_ONCE(flags & DEQUEUE_SLEEP);
+	WARN_ON_ONCE(se->on_rq);
+	WARN_ON_ONCE(flags & DEQUEUE_THROTTLE);
 
 	list_del_init(&p->throttle_node);
+
+	for_each_sched_entity(se) {
+		struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+		cfs_rq->h_nr_throttled--;
+	}
+}
+
+static void account_cfs_rq_throttle_self(struct cfs_rq *cfs_rq)
+{
+	/* account self time */
+	u64 delta = rq_clock(rq_of(cfs_rq)) - cfs_rq->throttled_clock_self;
+
+	cfs_rq->throttled_clock_self = 0;
+
+	if (WARN_ON_ONCE((s64)delta < 0))
+		delta = 0;
+
+	cfs_rq->throttled_clock_self_time += delta;
 }
 
 static void enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags);
@@ -5889,27 +5943,21 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
 		cfs_rq->throttled_clock_pelt_time += rq_clock_pelt(rq) -
 			cfs_rq->throttled_clock_pelt;
 
-	if (cfs_rq->throttled_clock_self) {
-		u64 delta = rq_clock(rq) - cfs_rq->throttled_clock_self;
-
-		cfs_rq->throttled_clock_self = 0;
-
-		if (WARN_ON_ONCE((s64)delta < 0))
-			delta = 0;
-
-		cfs_rq->throttled_clock_self_time += delta;
-	}
+	if (cfs_rq->throttled_clock_self)
+		account_cfs_rq_throttle_self(cfs_rq);
 
 	/* Re-enqueue the tasks that have been throttled at this level. */
 	list_for_each_entry_safe(p, tmp, &cfs_rq->throttled_limbo_list, throttle_node) {
 		list_del_init(&p->throttle_node);
-		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP);
+		enqueue_task_fair(rq_of(cfs_rq), p, ENQUEUE_WAKEUP | ENQUEUE_THROTTLE);
 	}
 
 	/* Add cfs_rq with load or one or more already running entities to the list */
 	if (!cfs_rq_is_decayed(cfs_rq))
 		list_add_leaf_cfs_rq(cfs_rq);
 
+	WARN_ON_ONCE(cfs_rq->h_nr_throttled);
+
 	return 0;
 }
 
@@ -5945,10 +5993,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
 	/* group is entering throttled state, stop time */
 	cfs_rq->throttled_clock_pelt = rq_clock_pelt(rq);
 
-	WARN_ON_ONCE(cfs_rq->throttled_clock_self);
-	if (cfs_rq->nr_queued)
-		cfs_rq->throttled_clock_self = rq_clock(rq);
-	else
+	if (!cfs_rq->nr_queued)
 		list_del_leaf_cfs_rq(cfs_rq);
 
 	WARN_ON_ONCE(!list_empty(&cfs_rq->throttled_limbo_list));
@@ -5992,9 +6037,6 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	 * throttled-list. rq->lock protects completion.
 	 */
 	cfs_rq->throttled = 1;
-	WARN_ON_ONCE(cfs_rq->throttled_clock);
-	if (cfs_rq->nr_queued)
-		cfs_rq->throttled_clock = rq_clock(rq);
 	return;
 }
 
@@ -6026,6 +6068,10 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		cfs_b->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
 		cfs_rq->throttled_clock = 0;
 	}
+	if (cfs_rq->throttled_time) {
+		cfs_b->throttled_time += cfs_rq->throttled_time;
+		cfs_rq->throttled_time = 0;
+	}
 	list_del_rcu(&cfs_rq->throttled_list);
 	raw_spin_unlock(&cfs_b->lock);
 
@@ -6710,6 +6756,8 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq) {}
 static void task_throttle_setup_work(struct task_struct *p) {}
 static bool task_is_throttled(struct task_struct *p) { return false; }
 static void dequeue_throttled_task(struct task_struct *p, int flags) {}
+static void cfs_rq_inc_h_nr_throttled(struct cfs_rq *cfs_rq, unsigned int nr) {}
+static void cfs_rq_dec_h_nr_throttled(struct cfs_rq *cfs_rq, unsigned int nr) {}
 
 static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq)
 {
@@ -6898,6 +6946,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	struct sched_entity *se = &p->se;
 	int h_nr_idle = task_has_idle_policy(p);
 	int h_nr_runnable = 1;
+	int h_nr_throttled = (flags & ENQUEUE_THROTTLE) ? 1 : 0;
 	int task_new = !(flags & ENQUEUE_WAKEUP);
 	int rq_h_nr_queued = rq->cfs.h_nr_queued;
 	u64 slice = 0;
@@ -6951,6 +7000,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		cfs_rq->h_nr_runnable += h_nr_runnable;
 		cfs_rq->h_nr_queued++;
 		cfs_rq->h_nr_idle += h_nr_idle;
+		cfs_rq_dec_h_nr_throttled(cfs_rq, h_nr_throttled);
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
@@ -6973,6 +7023,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		cfs_rq->h_nr_runnable += h_nr_runnable;
 		cfs_rq->h_nr_queued++;
 		cfs_rq->h_nr_idle += h_nr_idle;
+		cfs_rq_dec_h_nr_throttled(cfs_rq, h_nr_throttled);
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = 1;
@@ -7027,10 +7078,12 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 	int rq_h_nr_queued = rq->cfs.h_nr_queued;
 	bool task_sleep = flags & DEQUEUE_SLEEP;
 	bool task_delayed = flags & DEQUEUE_DELAYED;
+	bool task_throttle = flags & DEQUEUE_THROTTLE;
 	struct task_struct *p = NULL;
 	int h_nr_idle = 0;
 	int h_nr_queued = 0;
 	int h_nr_runnable = 0;
+	int h_nr_throttled = 0;
 	struct cfs_rq *cfs_rq;
 	u64 slice = 0;
 
@@ -7040,6 +7093,9 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		h_nr_idle = task_has_idle_policy(p);
 		if (task_sleep || task_delayed || !se->sched_delayed)
 			h_nr_runnable = 1;
+
+		if (task_throttle)
+			h_nr_throttled = 1;
 	} else {
 		cfs_rq = group_cfs_rq(se);
 		slice = cfs_rq_min_slice(cfs_rq);
@@ -7058,6 +7114,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
 		cfs_rq->h_nr_queued -= h_nr_queued;
 		cfs_rq->h_nr_idle -= h_nr_idle;
+		cfs_rq_inc_h_nr_throttled(cfs_rq, h_nr_throttled);
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
@@ -7095,6 +7152,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
 		cfs_rq->h_nr_queued -= h_nr_queued;
 		cfs_rq->h_nr_idle -= h_nr_idle;
+		cfs_rq_inc_h_nr_throttled(cfs_rq, h_nr_throttled);
 
 		if (cfs_rq_is_idle(cfs_rq))
 			h_nr_idle = h_nr_queued;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 97be6a6f53b9c..54cdec21aa5c2 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -721,6 +721,7 @@ struct cfs_rq {
 
 #ifdef CONFIG_CFS_BANDWIDTH
 	int			runtime_enabled;
+	unsigned int		h_nr_throttled;
 	s64			runtime_remaining;
 
 	u64			throttled_pelt_idle;
@@ -732,6 +733,7 @@ struct cfs_rq {
 	u64			throttled_clock_pelt_time;
 	u64			throttled_clock_self;
 	u64			throttled_clock_self_time;
+	u64			throttled_time;
 	int			throttled;
 	int			throttle_count;
 	struct list_head	throttled_list;
@@ -2360,6 +2362,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define DEQUEUE_SPECIAL		0x10
 #define DEQUEUE_MIGRATING	0x100 /* Matches ENQUEUE_MIGRATING */
 #define DEQUEUE_DELAYED		0x200 /* Matches ENQUEUE_DELAYED */
+#define DEQUEUE_THROTTLE	0x800 /* Matches ENQUEUE_THROTTLE */
 
 #define ENQUEUE_WAKEUP		0x01
 #define ENQUEUE_RESTORE		0x02
@@ -2377,6 +2380,7 @@ extern const u32 sched_prio_to_wmult[40];
 #define ENQUEUE_MIGRATING	0x100
 #define ENQUEUE_DELAYED		0x200
 #define ENQUEUE_RQ_SELECTED	0x400
+#define ENQUEUE_THROTTLE	0x800
 
 #define RETRY_TASK		((void *)-1UL)
 
-- 
2.39.5