From: Valentin Schneider <vschneid@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
 Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Phil Auld, Clark Williams, Tomas Glozar
Subject: [RFC PATCH v2 4/5] sched/fair: Track count of tasks running in userspace
Date: Fri, 2 Feb 2024 09:09:19 +0100
Message-ID: <20240202080920.3337862-5-vschneid@redhat.com>
In-Reply-To: <20240202080920.3337862-1-vschneid@redhat.com>
References: <20240202080920.3337862-1-vschneid@redhat.com>

While having a second tree to pick from solves the throttling aspect of
things, it also requires modification of the task count at the cfs_rq level.

.h_nr_running is used throughout load_balance(), and it needs to accurately
reflect the number of pickable tasks: a cfs_rq with .throttle_pending=1 may
have many tasks in userspace (thus effectively throttled), and this "excess"
of tasks shouldn't cause find_busiest_group() / find_busiest_queue() to pick
that cfs_rq's CPU to pull load from when there are other CPUs with more
pickable tasks to pull.

The approach taken here is to track both the count of tasks in kernelspace
and the count of tasks in userspace (technically
tasks-just-about-to-enter-userspace).

When a cfs_rq runs out of runtime, it gets marked as .throttle_pending=1.
From this point on, only tasks executing in kernelspace are pickable, and
this is reflected up the hierarchy by removing that cfs_rq's .h_user_running
from its parents' .h_nr_running.

To aid in validating the proper behaviour of the implementation, we assert
the following invariants:

o For any cfs_rq with .throttle_pending == 0:
    .h_kernel_running + .h_user_running == .h_nr_running

o For any cfs_rq with .throttle_pending == 1:
    .h_kernel_running == .h_nr_running

This means the .h_user_running also needs to be updated as cfs_rq's become
.throttle_pending=1. When a cfs_rq becomes .throttle_pending=1, its own
.h_user_running remains untouched, but it is subtracted from its parents'
.h_user_running.

Another way to look at it is that the .h_user_running is "stored" at the
level of the .throttle_pending cfs_rq, and restored to the upper part of the
hierarchy at unthrottle.
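For illustration, a checker enforcing those two invariants could look
roughly like the sketch below. This is only a sketch: the actual
assert_cfs_rq_counts() helper referenced in the diff may be implemented
differently (e.g. compiled out on builds without debugging enabled).

  /* Illustrative sketch only; the real helper may differ. */
  static inline void assert_cfs_rq_counts(struct cfs_rq *cfs_rq)
  {
  	if (cfs_rq->throttle_pending)
  		/* User tasks have been discounted from h_nr_running */
  		WARN_ON_ONCE(cfs_rq->h_kernel_running != cfs_rq->h_nr_running);
  	else
  		WARN_ON_ONCE(cfs_rq->h_kernel_running + cfs_rq->h_user_running !=
  			     cfs_rq->h_nr_running);
  }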
An overview of the count logic is:

Consider:
  cfs_rq.kernel := count of kernel *tasks* enqueued on this cfs_rq
  cfs_rq.user   := count of user *tasks* enqueued on this cfs_rq

Then, the following logic is implemented:
  cfs_rq.h_kernel_running = Sum(child.kernel) for all child cfs_rq
  cfs_rq.h_user_running   = Sum(child.user) for all child cfs_rq with !child.throttle_pending
  cfs_rq.h_nr_running     = Sum(child.kernel) for all child cfs_rq
                          + Sum(child.user) for all child cfs_rq with !child.throttle_pending

An application of that logic to an A/B/C cgroup hierarchy:

Initial condition, no throttling

  +------+  .h_kernel_running = C.kernel + B.kernel + A.kernel
A |cfs_rq|  .h_user_running   = C.user + B.user + A.user
  +------+  .h_nr_running     = C.{kernel+user} + B.{kernel+user} + A.{kernel+user}
     ^      .throttle_pending = 0
     |
     | parent
     |
  +------+  .h_kernel_running = C.kernel + B.kernel
B |cfs_rq|  .h_user_running   = C.user + B.user
  +------+  .h_nr_running     = C.{kernel+user} + B.{kernel+user}
     ^      .throttle_pending = 0
     |
     | parent
     |
  +------+  .h_kernel_running = C.kernel
C |cfs_rq|  .h_user_running   = C.user
  +------+  .h_nr_running     = C.{kernel+user}
            .throttle_pending = 0

C becomes .throttle_pending

  +------+  .h_kernel_running = C.kernel + B.kernel + A.kernel                 <- Untouched
A |cfs_rq|  .h_user_running   = B.user + A.user                                <- Decremented by C.user
  +------+  .h_nr_running     = C.kernel + B.{kernel+user} + A.{kernel+user}   <- Decremented by C.user
     ^      .throttle_pending = 0
     |
     | parent
     |
  +------+  .h_kernel_running = C.kernel + B.kernel                            <- Untouched
B |cfs_rq|  .h_user_running   = B.user                                         <- Decremented by C.user
  +------+  .h_nr_running     = C.kernel + B.{kernel+user}                     <- Decremented by C.user
     ^      .throttle_pending = 0
     |
     | parent
     |
  +------+  .h_kernel_running = C.kernel
C |cfs_rq|  .h_user_running   = C.user   <- Untouched, the count is "stored" at this level
  +------+  .h_nr_running     = C.kernel <- Decremented by C.user
            .throttle_pending = 1

C becomes throttled

  +------+  .h_kernel_running = B.kernel + A.kernel                  <- Decremented by C.kernel
A |cfs_rq|  .h_user_running   = B.user + A.user
  +------+  .h_nr_running     = B.{kernel+user} + A.{kernel+user}    <- Decremented by C.kernel
     ^      .throttle_pending = 0
     |
     | parent
     |
  +------+  .h_kernel_running = B.kernel                             <- Decremented by C.kernel
B |cfs_rq|  .h_user_running   = B.user
  +------+  .h_nr_running     = B.{kernel+user}                      <- Decremented by C.kernel
     ^      .throttle_pending = 0
     |
     | parent
     |
  +------+  .h_kernel_running = C.kernel
C |cfs_rq|  .h_user_running   = C.user
  +------+  .h_nr_running     = C.{kernel+user}                      <- Incremented by C.user
            .throttle_pending = 0

Could we get away with just one count, e.g. the user count and not the
kernel count? Technically yes, we could follow this scheme:

  if (throttle_pending) => kernel count := h_nr_running
  else                  => kernel count := h_nr_running - h_user_running

This however prevents any sort of assertion or sanity checking on the
counts, which I am not the biggest fan of - CFS group scheduling is enough
of a headache as it is.
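Expressed as code, that rejected single-count scheme would amount to
something like the sketch below (illustration only, not part of this patch;
the helper name is made up):

  /*
   * Hypothetical helper, not part of this patch: derive the kernel task
   * count instead of tracking it in cfs_rq->h_kernel_running.
   */
  static inline long cfs_rq_kernel_count(struct cfs_rq *cfs_rq)
  {
  	/*
  	 * Once throttle_pending, user tasks have already been discounted
  	 * from h_nr_running, so whatever remains is in kernelspace.
  	 */
  	if (cfs_rq->throttle_pending)
  		return cfs_rq->h_nr_running;

  	return cfs_rq->h_nr_running - cfs_rq->h_user_running;
  }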
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
---
 kernel/sched/fair.c  | 174 ++++++++++++++++++++++++++++++++++++-------
 kernel/sched/sched.h |   2 +
 2 files changed, 151 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 60778afbff207..2b54d3813d18d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5785,17 +5785,48 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
-	long task_delta, idle_task_delta, kernel_delta, dequeue = 1;
+	long task_delta, idle_task_delta, kernel_delta, user_delta, dequeue = 1;
+	bool was_pending;
 
 	/*
-	 * We don't actually throttle, though account() will have made sure to
-	 * resched us so that we pick into a kernel task.
+	 * We don't actually throttle just yet, though account_cfs_rq_runtime()
+	 * will have made sure to resched us so that we pick into a kernel task.
 	 */
 	if (cfs_rq->h_kernel_running) {
+		if (cfs_rq->throttle_pending)
+			return false;
+
+		/*
+		 * From now on we're only going to pick tasks that are in the
+		 * second tree. Reflect this by discounting tasks that aren't going
+		 * to be pickable from the ->h_nr_running counts.
+		 */
 		cfs_rq->throttle_pending = true;
+
+		se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];
+		user_delta = cfs_rq->h_user_running;
+		cfs_rq->h_nr_running -= user_delta;
+
+		for_each_sched_entity(se) {
+			struct cfs_rq *qcfs_rq = cfs_rq_of(se);
+
+			if (!se->on_rq)
+				goto done;
+
+			qcfs_rq->h_nr_running -= user_delta;
+			qcfs_rq->h_user_running -= user_delta;
+
+			assert_cfs_rq_counts(qcfs_rq);
+		}
 		return false;
 	}
 
+	/*
+	 * Unlikely as it may be, we may only have user tasks as we hit the
+	 * throttle, in which case we won't have discounted them from the
+	 * h_nr_running, and we need to be aware of that.
+	 */
+	was_pending = cfs_rq->throttle_pending;
 	cfs_rq->throttle_pending = false;
 
 	raw_spin_lock(&cfs_b->lock);
@@ -5826,9 +5857,27 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 	walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
 	rcu_read_unlock();
 
-	task_delta = cfs_rq->h_nr_running;
+	/*
+	 * At this point, h_nr_running == h_kernel_running. We add back the
+	 * h_user_running to the throttled cfs_rq, and only remove the difference
+	 * to the upper cfs_rq's.
+	 */
+	if (was_pending) {
+		WARN_ON_ONCE(cfs_rq->h_nr_running != cfs_rq->h_kernel_running);
+		cfs_rq->h_nr_running += cfs_rq->h_user_running;
+	} else {
+		WARN_ON_ONCE(cfs_rq->h_nr_running != cfs_rq->h_user_running);
+	}
+
+	/*
+	 * We always discount user tasks from h_nr_running when throttle_pending
+	 * so only h_kernel_running remains to be removed
+	 */
+	task_delta = was_pending ? cfs_rq->h_kernel_running : cfs_rq->h_nr_running;
 	idle_task_delta = cfs_rq->idle_h_nr_running;
 	kernel_delta = cfs_rq->h_kernel_running;
+	user_delta = was_pending ? 0 : cfs_rq->h_user_running;
+
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 		/* throttled entity or throttle-on-deactivate */
@@ -5843,6 +5892,8 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		qcfs_rq->h_nr_running -= task_delta;
 		qcfs_rq->idle_h_nr_running -= idle_task_delta;
 		dequeue_kernel(qcfs_rq, se, kernel_delta);
+		qcfs_rq->h_user_running -= user_delta;
+
 
 		if (qcfs_rq->load.weight) {
 			/* Avoid re-evaluating load for this entity: */
@@ -5866,6 +5917,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
 		qcfs_rq->h_nr_running -= task_delta;
 		qcfs_rq->idle_h_nr_running -= idle_task_delta;
 		dequeue_kernel(qcfs_rq, se, kernel_delta);
+		qcfs_rq->h_user_running -= user_delta;
 	}
 
 	/* At this point se is NULL and we are at root level*/
@@ -5888,7 +5940,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	struct rq *rq = rq_of(cfs_rq);
 	struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
 	struct sched_entity *se;
-	long task_delta, idle_task_delta, kernel_delta;
+	long task_delta, idle_task_delta, kernel_delta, user_delta;
 
 	se = cfs_rq->tg->se[cpu_of(rq)];
 
@@ -5924,6 +5976,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 	task_delta = cfs_rq->h_nr_running;
 	idle_task_delta = cfs_rq->idle_h_nr_running;
 	kernel_delta = cfs_rq->h_kernel_running;
+	user_delta = cfs_rq->h_user_running;
 	for_each_sched_entity(se) {
 		struct cfs_rq *qcfs_rq = cfs_rq_of(se);
 
@@ -5937,6 +5990,9 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		qcfs_rq->h_nr_running += task_delta;
 		qcfs_rq->idle_h_nr_running += idle_task_delta;
 		enqueue_kernel(qcfs_rq, se, kernel_delta);
+		qcfs_rq->h_user_running += user_delta;
+
+		assert_cfs_rq_counts(qcfs_rq);
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(qcfs_rq))
@@ -5955,6 +6011,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
 		qcfs_rq->h_nr_running += task_delta;
 		qcfs_rq->idle_h_nr_running += idle_task_delta;
 		enqueue_kernel(qcfs_rq, se, kernel_delta);
+		qcfs_rq->h_user_running += user_delta;
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(qcfs_rq))
@@ -6855,6 +6912,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	int idle_h_nr_running = task_has_idle_policy(p);
 	int task_new = !(flags & ENQUEUE_WAKEUP);
 	bool kernel_task = is_kernel_task(p);
+	bool throttle_pending = false;
 
 	/*
 	 * The code below (indirectly) updates schedutil which looks at
@@ -6878,13 +6936,20 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		cfs_rq = cfs_rq_of(se);
 		enqueue_entity(cfs_rq, se, flags);
 
-		cfs_rq->h_nr_running++;
-		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
-		if (cfs_rq_is_idle(cfs_rq))
-			idle_h_nr_running = 1;
+		if (kernel_task || (!throttle_pending && !cfs_rq->throttle_pending))
+			cfs_rq->h_nr_running++;
 		if (kernel_task)
 			enqueue_kernel(cfs_rq, se, 1);
+		else if (!throttle_pending)
+			cfs_rq->h_user_running++;
+
+		throttle_pending |= cfs_rq->throttle_pending;
+
+		cfs_rq->idle_h_nr_running += idle_h_nr_running;
+		if (cfs_rq_is_idle(cfs_rq))
+			idle_h_nr_running = 1;
+
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
@@ -6900,13 +6965,20 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		se_update_runnable(se);
 		update_cfs_group(se);
 
-		cfs_rq->h_nr_running++;
-		cfs_rq->idle_h_nr_running += idle_h_nr_running;
 
-		if (cfs_rq_is_idle(cfs_rq))
-			idle_h_nr_running = 1;
+		if (kernel_task || (!throttle_pending && !cfs_rq->throttle_pending))
+			cfs_rq->h_nr_running++;
 		if (kernel_task)
 			enqueue_kernel(cfs_rq, se, 1);
+		else if (!throttle_pending)
+			cfs_rq->h_user_running++;
+
+		throttle_pending |= cfs_rq->throttle_pending;
+
+		cfs_rq->idle_h_nr_running += idle_h_nr_running;
+		if (cfs_rq_is_idle(cfs_rq))
+			idle_h_nr_running = 1;
+
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
@@ -6957,6 +7029,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 	int idle_h_nr_running = task_has_idle_policy(p);
 	bool was_sched_idle = sched_idle_rq(rq);
 	bool kernel_task = !list_empty(&p->se.kernel_node);
+	bool throttle_pending = false;
 
 	util_est_dequeue(&rq->cfs, p);
 
@@ -6964,13 +7037,20 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		cfs_rq = cfs_rq_of(se);
 		dequeue_entity(cfs_rq, se, flags);
 
-		cfs_rq->h_nr_running--;
-		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
-		if (cfs_rq_is_idle(cfs_rq))
-			idle_h_nr_running = 1;
+		if (kernel_task || (!throttle_pending && !cfs_rq->throttle_pending))
+			cfs_rq->h_nr_running--;
 		if (kernel_task)
 			dequeue_kernel(cfs_rq, se, 1);
+		else if (!throttle_pending)
+			cfs_rq->h_user_running--;
+
+		throttle_pending |= cfs_rq->throttle_pending;
+
+		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+		if (cfs_rq_is_idle(cfs_rq))
+			idle_h_nr_running = 1;
+
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
@@ -6998,13 +7078,20 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 		se_update_runnable(se);
 		update_cfs_group(se);
 
-		cfs_rq->h_nr_running--;
-		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
 
-		if (cfs_rq_is_idle(cfs_rq))
-			idle_h_nr_running = 1;
+		if (kernel_task || (!throttle_pending && !cfs_rq->throttle_pending))
+			cfs_rq->h_nr_running--;
 		if (kernel_task)
 			dequeue_kernel(cfs_rq, se, 1);
+		else if (!throttle_pending)
+			cfs_rq->h_user_running--;
+
+		throttle_pending |= cfs_rq->throttle_pending;
+
+		cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+		if (cfs_rq_is_idle(cfs_rq))
+			idle_h_nr_running = 1;
+
 
 		/* end evaluation on encountering a throttled cfs_rq */
 		if (cfs_rq_throttled(cfs_rq))
@@ -8503,28 +8590,65 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 		resched_curr(rq);
 }
 
+/*
+ * Consider:
+ *   cfs_rq.kernel := count of kernel *tasks* enqueued on this cfs_rq
+ *   cfs_rq.user   := count of user *tasks* enqueued on this cfs_rq
+ *
+ * Then, the following logic is implemented:
+ *   cfs_rq.h_kernel_running = Sum(child.kernel) for all child cfs_rq
+ *   cfs_rq.h_user_running   = Sum(child.user) for all child cfs_rq with !child.throttle_pending
+ *   cfs_rq.h_nr_running     = Sum(child.kernel) for all child cfs_rq
+ *                           + Sum(child.user) for all child cfs_rq with !child.throttle_pending
+ *
+ * IOW, count of kernel tasks is always propagated up the hierarchy, and count
+ * of user tasks is only propagated up if the cfs_rq isn't .throttle_pending.
+ */
 static void handle_kernel_task_prev(struct task_struct *prev)
 {
 #ifdef CONFIG_CFS_BANDWIDTH
 	struct sched_entity *se = &prev->se;
 	bool p_in_kernel = is_kernel_task(prev);
 	bool p_in_kernel_tree = !list_empty(&se->kernel_node);
+	bool throttle_pending = false;
 	/*
 	 * These extra loops are bad and against the whole point of the merged
 	 * PNT, but it's a pain to merge, particularly since we want it to occur
 	 * before check_cfs_runtime().
 	 */
 	if (p_in_kernel_tree && !p_in_kernel) {
+		/* Switch from KERNEL -> USER */
 		WARN_ON_ONCE(!se->on_rq); /* dequeue should have removed us */
+
 		for_each_sched_entity(se) {
-			dequeue_kernel(cfs_rq_of(se), se, 1);
-			if (cfs_rq_throttled(cfs_rq_of(se)))
+			struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+			if (throttle_pending || cfs_rq->throttle_pending)
+				cfs_rq->h_nr_running--;
+			dequeue_kernel(cfs_rq, se, 1);
+			if (!throttle_pending)
+				cfs_rq->h_user_running++;
+
+			throttle_pending |= cfs_rq->throttle_pending;
+
+			if (cfs_rq_throttled(cfs_rq))
 				break;
 		}
 	} else if (!p_in_kernel_tree && p_in_kernel && se->on_rq) {
+		/* Switch from USER -> KERNEL */
+
 		for_each_sched_entity(se) {
-			enqueue_kernel(cfs_rq_of(se), se, 1);
-			if (cfs_rq_throttled(cfs_rq_of(se)))
+			struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+			if (throttle_pending || cfs_rq->throttle_pending)
+				cfs_rq->h_nr_running++;
+			enqueue_kernel(cfs_rq, se, 1);
+			if (!throttle_pending)
+				cfs_rq->h_user_running--;
+
+			throttle_pending |= cfs_rq->throttle_pending;
+
+			if (cfs_rq_throttled(cfs_rq))
 				break;
 		}
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0b33ce2e60555..e8860e0d6fbc7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -660,6 +660,8 @@ struct cfs_rq {
 	int			throttled;
 	int			throttle_count;
 	int			h_kernel_running;
+	int			h_user_running;
+	int			throttle_pending;
 	struct list_head	throttled_list;
 	struct list_head	throttled_csd_list;
 	struct list_head	kernel_children;
-- 
2.43.0