From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com,
	linux-kernel@vger.kernel.org, parth@linux.ibm.com,
	lizefan.x@bytedance.com, hannes@cmpxchg.org, cgroups@vger.kernel.org,
	corbet@lwn.net, linux-doc@vger.kernel.org
Cc: tj@kernel.org, qyousef@layalina.io, chris.hyser@oracle.com,
	patrick.bellasi@matbug.net, David.Laight@aculab.com, pjt@google.com,
	pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com,
	joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com,
	yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v11 8/8] sched/fair: Add latency list
Date: Thu, 23 Feb 2023 20:10:41 +0100
Message-Id: <20230223191041.577305-9-vincent.guittot@linaro.org>
In-Reply-To: <20230223191041.577305-1-vincent.guittot@linaro.org>
References: <20230223191041.577305-1-vincent.guittot@linaro.org>

Add an rb-tree for latency-sensitive entities so that the most sensitive
one can be scheduled first, even when it failed to preempt the current
task at wakeup or was quickly preempted afterwards by an entity of higher
priority.

To preserve fairness, the latency offset is used once at wakeup to get a
minimum slice, and not during the following scheduling slices; this
prevents a long-running entity from getting more runtime than its nice
priority allows. The rb-tree covers the remaining corner case, where a
latency-sensitive entity cannot get scheduled quickly after wakeup.
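For illustration (not part of the patch): a minimal user-space sketch of
the ordering key used by the latency rb-tree. Entities sort by
vruntime + latency_offset, so a negative latency_offset (i.e. a negative
latency-nice value) moves an entity toward the left of the tree. The
struct and the sample values below are invented for the example.

/*
 * Standalone sketch of the latency rb-tree ordering key. 'fake_se'
 * stands in for the kernel's struct sched_entity; values are invented.
 */
#include <stdio.h>
#include <stdint.h>

struct fake_se {
	const char *name;
	uint64_t vruntime;	/* weighted runtime, in ns */
	int64_t latency_offset;	/* negative => latency sensitive */
};

/* Mirrors latency_before(): signed compare of the shifted keys */
static int latency_before(const struct fake_se *a, const struct fake_se *b)
{
	return (int64_t)(a->vruntime + a->latency_offset -
			 b->vruntime - b->latency_offset) < 0;
}

int main(void)
{
	struct fake_se a = { "A", 1000000, -500000 };	/* latency-nice < 0 */
	struct fake_se b = { "B", 1000000, 0 };

	printf("A before B: %d\n", latency_before(&a, &b));	/* prints 1 */
	printf("B before A: %d\n", latency_before(&b, &a));	/* prints 0 */
	return 0;
}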
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
 include/linux/sched.h |   1 +
 kernel/sched/core.c   |   1 +
 kernel/sched/fair.c   | 108 ++++++++++++++++++++++++++++++++++++++++--
 kernel/sched/sched.h  |   1 +
 4 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 38decae3e156..41bb92be5ecc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -548,6 +548,7 @@ struct sched_entity {
 	/* For load-balancing: */
 	struct load_weight		load;
 	struct rb_node			run_node;
+	struct rb_node			latency_node;
 	struct list_head		group_node;
 	unsigned int			on_rq;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 093cc1af73dc..752fd364216c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4434,6 +4434,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
+	RB_CLEAR_NODE(&p->se.latency_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq			= NULL;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 125a6ff53378..9fb0461197bd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -680,7 +680,85 @@ struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 
 	return __node_2_se(last);
 }
+#endif
 
+/**************************************************************
+ * Scheduling class tree data structure manipulation methods:
+ * for latency
+ */
+
+static inline bool latency_before(struct sched_entity *a,
+				  struct sched_entity *b)
+{
+	return (s64)(a->vruntime + a->latency_offset - b->vruntime - b->latency_offset) < 0;
+}
+
+#define __latency_node_2_se(node) \
+	rb_entry((node), struct sched_entity, latency_node)
+
+static inline bool __latency_less(struct rb_node *a, const struct rb_node *b)
+{
+	return latency_before(__latency_node_2_se(a), __latency_node_2_se(b));
+}
+
+/*
+ * Enqueue an entity into the latency rb-tree:
+ */
+static void __enqueue_latency(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+{
+
+	/* Only a latency-sensitive entity can be added to the list */
+	if (se->latency_offset >= 0)
+		return;
+
+	if (!RB_EMPTY_NODE(&se->latency_node))
+		return;
+
+	/*
+	 * The entity is always added to the latency list at wakeup.
+	 * Otherwise, an entity that is put back in the list after an
+	 * execution time of less than sysctl_sched_min_granularity has
+	 * been preempted by a higher sched class or by an entity with a
+	 * higher latency constraint. In this case, the entity is also put
+	 * back in the latency list so it gets a chance to run first
+	 * during the next slice.
+	 */
+	if (!(flags & ENQUEUE_WAKEUP)) {
+		u64 delta_exec = se->sum_exec_runtime - se->prev_sum_exec_runtime;
+
+		if (delta_exec >= sysctl_sched_min_granularity)
+			return;
+	}
+
+	rb_add_cached(&se->latency_node, &cfs_rq->latency_timeline, __latency_less);
+}
+
+/*
+ * Dequeue an entity from the latency rb-tree and return true if it was
+ * really part of the rb-tree:
+ */
+static bool __dequeue_latency(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	if (!RB_EMPTY_NODE(&se->latency_node)) {
+		rb_erase_cached(&se->latency_node, &cfs_rq->latency_timeline);
+		RB_CLEAR_NODE(&se->latency_node);
+		return true;
+	}
+
+	return false;
+}
+
+static struct sched_entity *__pick_first_latency(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *left = rb_first_cached(&cfs_rq->latency_timeline);
+
+	if (!left)
+		return NULL;
+
+	return __latency_node_2_se(left);
+}
+
+#ifdef CONFIG_SCHED_DEBUG
 /**************************************************************
  * Scheduling class statistics methods:
  */
@@ -4758,8 +4836,10 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	check_schedstat_required();
 	update_stats_enqueue_fair(cfs_rq, se, flags);
 	check_spread(cfs_rq, se);
-	if (!curr)
+	if (!curr) {
 		__enqueue_entity(cfs_rq, se);
+		__enqueue_latency(cfs_rq, se, flags);
+	}
 	se->on_rq = 1;
 
 	if (cfs_rq->nr_running == 1) {
@@ -4845,8 +4925,10 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 
 	clear_buddies(cfs_rq, se);
 
-	if (se != cfs_rq->curr)
+	if (se != cfs_rq->curr) {
 		__dequeue_entity(cfs_rq, se);
+		__dequeue_latency(cfs_rq, se);
+	}
 	se->on_rq = 0;
 	account_entity_dequeue(cfs_rq, se);
 
@@ -4941,6 +5023,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		 */
 		update_stats_wait_end_fair(cfs_rq, se);
 		__dequeue_entity(cfs_rq, se);
+		__dequeue_latency(cfs_rq, se);
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 	}
 
@@ -4979,7 +5062,7 @@ static struct sched_entity *
 pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
 	struct sched_entity *left = __pick_first_entity(cfs_rq);
-	struct sched_entity *se;
+	struct sched_entity *latency, *se;
 
 	/*
 	 * If curr is set we have to see if its left of the leftmost entity
@@ -5021,6 +5104,12 @@ pick_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 		se = cfs_rq->last;
 	}
 
+	/* Check for a latency-sensitive entity waiting to run */
+	latency = __pick_first_latency(cfs_rq);
+	if (latency && (latency != se) &&
+	    wakeup_preempt_entity(latency, se) < 1)
+		se = latency;
+
 	return se;
 }
 
@@ -5044,6 +5133,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 		update_stats_wait_start_fair(cfs_rq, prev);
 		/* Put 'current' back into the tree. */
 		__enqueue_entity(cfs_rq, prev);
+		__enqueue_latency(cfs_rq, prev, 0);
 		/* in !on_rq case, update occurred at dequeue */
 		update_load_avg(cfs_rq, prev, 0);
 	}
@@ -12222,6 +12312,7 @@ static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
 void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
 	cfs_rq->tasks_timeline = RB_ROOT_CACHED;
+	cfs_rq->latency_timeline = RB_ROOT_CACHED;
 	u64_u32_store(cfs_rq->min_vruntime, (u64)(-(1LL << 20)));
#ifdef CONFIG_SMP
 	raw_spin_lock_init(&cfs_rq->removed.lock);
@@ -12529,8 +12620,19 @@ int sched_group_set_latency(struct task_group *tg, int prio)
 
 	for_each_possible_cpu(i) {
 		struct sched_entity *se = tg->se[i];
+		struct rq *rq = cpu_rq(i);
+		struct rq_flags rf;
+		bool queued;
+
+		rq_lock_irqsave(rq, &rf);
 
+		queued = __dequeue_latency(se->cfs_rq, se);
 		WRITE_ONCE(se->latency_offset, latency_offset);
+		if (queued)
+			__enqueue_latency(se->cfs_rq, se, ENQUEUE_WAKEUP);
+
+
+		rq_unlock_irqrestore(rq, &rf);
 	}
 
 	mutex_unlock(&shares_mutex);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 9a2e71231083..21dd309e98a9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -570,6 +570,7 @@ struct cfs_rq {
 #endif
 
 	struct rb_root_cached	tasks_timeline;
+	struct rb_root_cached	latency_timeline;
 
 	/*
 	 * 'curr' points to currently running entity on this cfs_rq.
-- 
2.34.1
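For illustration (not part of the patch): a user-space sketch of the
fairness bound applied by the pick_next_entity() change above. The
leftmost latency-sensitive entity replaces the default CFS pick only
while its vruntime does not lead the pick's by more than the wakeup
granularity; the fixed GRAN_NS below is an invented stand-in for the
kernel's wakeup_gran() computation.

/*
 * Sketch of the wakeup_preempt_entity() verdict used at pick time:
 * a return value < 1 means the latency entity may be picked.
 */
#include <stdio.h>
#include <stdint.h>

#define GRAN_NS 1000000		/* stand-in for wakeup_gran(): 1 ms */

static int preempt_verdict(int64_t latency_vruntime, int64_t pick_vruntime)
{
	int64_t vdiff = latency_vruntime - pick_vruntime;

	if (vdiff <= 0)
		return -1;	/* latency entity is not ahead: always OK */
	if (vdiff > GRAN_NS)
		return 1;	/* too far ahead: picking it would be unfair */
	return 0;		/* ahead, but within the granularity: OK */
}

int main(void)
{
	/* slightly ahead of the default pick: still chosen (verdict 0) */
	printf("%d\n", preempt_verdict(2000000, 1500000));
	/* far ahead: the default pick keeps the CPU (verdict 1) */
	printf("%d\n", preempt_verdict(5000000, 1500000));
	return 0;
}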