From: Abel Wu <wuyun.abel@bytedance.com>
To: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
    Valentin Schneider
Cc: Barry Song <21cnbao@gmail.com>, Benjamin Segall, Chen Yu,
    Daniel Jordan, "Gautham R. Shenoy", Joel Fernandes, K Prateek Nayak,
    Mike Galbraith, Qais Yousef, Tim Chen, Yicong Yang, Youssef Esmat,
    linux-kernel@vger.kernel.org, Abel Wu
Subject: [PATCH v2 1/4] sched/eevdf: Fix vruntime adjustment on reweight
Date: Wed, 15 Nov 2023 11:36:44 +0800
Message-Id: <20231115033647.80785-2-wuyun.abel@bytedance.com>
In-Reply-To: <20231115033647.80785-1-wuyun.abel@bytedance.com>
References: <20231115033647.80785-1-wuyun.abel@bytedance.com>

The vruntime of an entity that is on_rq at a non-zero-lag point needs to
be adjusted when it gets re-weighted, and the calculation can be
simplified based on the fact that a re-weight does not change the
weighted average vruntime of all the entities. Please check the proofs
in the comments.

But adjusting vruntime can also change the entity's position in the
RB-tree, hence requiring a re-queue to fix it up, which might be costly.
This could be avoided by deferring the adjustment to the time the entity
actually leaves the tree (dequeue/pick), but that would negatively
affect task selection and is probably not good enough either.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 kernel/sched/fair.c | 151 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 128 insertions(+), 23 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2048138ce54b..025d90925bf6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3666,41 +3666,140 @@ static inline void dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { }
 #endif
 
+static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
+			   unsigned long weight)
+{
+	unsigned long old_weight = se->load.weight;
+	u64 avruntime = avg_vruntime(cfs_rq);
+	s64 vlag, vslice;
+
+	/*
+	 * VRUNTIME
+	 * ========
+	 *
+	 * COROLLARY #1: The virtual runtime of the entity needs to be
+	 * adjusted if re-weight at !0-lag point.
+	 *
+	 * Proof: For contradiction assume this is not true, so we can
+	 * re-weight without changing vruntime at !0-lag point.
+	 *
+	 *             Weight   VRuntime   Avg-VRuntime
+	 *     before    w          v            V
+	 *      after    w'         v'           V'
+	 *
+	 * Since lag needs to be preserved through re-weight:
+	 *
+	 *	lag = (V - v)*w = (V'- v')*w', where v = v'
+	 *	==>	V' = (V - v)*w/w' + v			(1)
+	 *
+	 * Let W be the total weight of the entities before reweight,
+	 * since V' is the new weighted average of entities:
+	 *
+	 *	V' = (WV + w'v - wv) / (W + w' - w)		(2)
+	 *
+	 * by using (1) & (2) we obtain:
+	 *
+	 *	(WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v
+	 *	==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v
+	 *	==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v
+	 *	==> (V - v)*W/(W + w' - w) = (V - v)*w/w'	(3)
+	 *
+	 * Since we are doing at !0-lag point which means V != v, we
+	 * can simplify (3):
+	 *
+	 *	==> W / (W + w' - w) = w / w'
+	 *	==> Ww' = Ww + ww' - ww
+	 *	==> W * (w' - w) = w * (w' - w)
+	 *	==> W = w	(re-weight indicates w' != w)
+	 *
+	 * So the cfs_rq contains only one entity, hence vruntime of
+	 * the entity @v should always equal to the cfs_rq's weighted
+	 * average vruntime @V, which means we will always re-weight
+	 * at 0-lag point, thus breach assumption. Proof completed.
+	 *
+	 *
+	 * COROLLARY #2: Re-weight does NOT affect weighted average
+	 * vruntime of all the entities.
+	 *
+	 * Proof: According to corollary #1, Eq. (1) should be:
+	 *
+	 *	(V - v)*w = (V' - v')*w'
+	 *	==>	v' = V' - (V - v)*w/w'			(4)
+	 *
+	 * According to the weighted average formula, we have:
+	 *
+	 *	V' = (WV - wv + w'v') / (W - w + w')
+	 *	   = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w')
+	 *	   = (WV - wv + w'V' - Vw + wv) / (W - w + w')
+	 *	   = (WV + w'V' - Vw) / (W - w + w')
+	 *
+	 *	==>	V'*(W - w + w') = WV + w'V' - Vw
+	 *	==>	V' * (W - w) = (W - w) * V		(5)
+	 *
+	 * If the entity is the only one in the cfs_rq, then reweight
+	 * always occurs at 0-lag point, so V won't change. Or else
+	 * there are other entities, hence W != w, then Eq. (5) turns
+	 * into V' = V. So V won't change in either case, proof done.
+	 *
+	 *
+	 * So according to corollary #1 & #2, the effect of re-weight
+	 * on vruntime should be:
+	 *
+	 *	v' = V' - (V - v) * w / w'		(4)
+	 *	   = V  - (V - v) * w / w'
+	 *	   = V  - vl * w / w'
+	 *	   = V  - vl'
+	 */
+	if (avruntime != se->vruntime) {
+		vlag = (s64)(avruntime - se->vruntime);
+		vlag = div_s64(vlag * old_weight, weight);
+		se->vruntime = avruntime - vlag;
+	}
+
+	/*
+	 * DEADLINE
+	 * ========
+	 *
+	 * When the weight changes, the virtual time slope changes and
+	 * we should adjust the relative virtual deadline accordingly.
+	 *
+	 *	d' = v' + (d - v)*w/w'
+	 *	   = V' - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  + (d - V)*w/w'
+	 */
+	vslice = (s64)(se->deadline - avruntime);
+	vslice = div_s64(vslice * old_weight, weight);
+	se->deadline = avruntime + vslice;
+}
+
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
-	unsigned long old_weight = se->load.weight;
+	bool curr = cfs_rq->curr == se;
 
 	if (se->on_rq) {
 		/* commit outstanding execution time */
-		if (cfs_rq->curr == se)
+		if (curr)
 			update_curr(cfs_rq);
 		else
-			avg_vruntime_sub(cfs_rq, se);
+			__dequeue_entity(cfs_rq, se);
 		update_load_sub(&cfs_rq->load, se->load.weight);
 	}
 	dequeue_load_avg(cfs_rq, se);
 
-	update_load_set(&se->load, weight);
-
 	if (!se->on_rq) {
 		/*
 		 * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i),
 		 * we need to scale se->vlag when w_i changes.
 		 */
-		se->vlag = div_s64(se->vlag * old_weight, weight);
+		se->vlag = div_s64(se->vlag * se->load.weight, weight);
 	} else {
-		s64 deadline = se->deadline - se->vruntime;
-		/*
-		 * When the weight changes, the virtual time slope changes and
-		 * we should adjust the relative virtual deadline accordingly.
-		 */
-		deadline = div_s64(deadline * old_weight, weight);
-		se->deadline = se->vruntime + deadline;
-		if (se != cfs_rq->curr)
-			min_deadline_cb_propagate(&se->run_node, NULL);
+		reweight_eevdf(cfs_rq, se, weight);
 	}
 
+	update_load_set(&se->load, weight);
+
 #ifdef CONFIG_SMP
 	do {
 		u32 divider = get_pelt_divider(&se->avg);
@@ -3712,8 +3811,17 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 	enqueue_load_avg(cfs_rq, se);
 	if (se->on_rq) {
 		update_load_add(&cfs_rq->load, se->load.weight);
-		if (cfs_rq->curr != se)
-			avg_vruntime_add(cfs_rq, se);
+		if (!curr) {
+			/*
+			 * The entity's vruntime has been adjusted, so let's check
+			 * whether the rq-wide min_vruntime needs updated too. Since
+			 * the calculations above require stable min_vruntime rather
+			 * than up-to-date one, we do the update at the end of the
+			 * reweight process.
+			 */
+			__enqueue_entity(cfs_rq, se);
+			update_min_vruntime(cfs_rq);
+		}
 	}
 }
 
@@ -3857,14 +3965,11 @@ static void update_cfs_group(struct sched_entity *se)
 
 #ifndef CONFIG_SMP
 	shares = READ_ONCE(gcfs_rq->tg->shares);
-
-	if (likely(se->load.weight == shares))
-		return;
 #else
-	shares = calc_group_shares(gcfs_rq);
+	shares = calc_group_shares(gcfs_rq);
 #endif
-
-	reweight_entity(cfs_rq_of(se), se, shares);
+	if (unlikely(se->load.weight != shares))
+		reweight_entity(cfs_rq_of(se), se, shares);
 }
 
 #else /* CONFIG_FAIR_GROUP_SCHED */
-- 
2.37.3
From: Abel Wu <wuyun.abel@bytedance.com>
Subject: [PATCH v2 2/4] sched/eevdf: Sort the rbtree by virtual deadline
Date: Wed, 15 Nov 2023 11:36:45 +0800
Message-Id: <20231115033647.80785-3-wuyun.abel@bytedance.com>
In-Reply-To: <20231115033647.80785-1-wuyun.abel@bytedance.com>

Sort the task timeline by virtual deadline and keep the min_vruntime in
the augmented tree, so we can avoid doubling the worst-case cost and
make full use of the cached leftmost node to enable the O(1) fastpath
picking in the next patch.

Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
---
 include/linux/sched.h |   2 +-
 kernel/sched/debug.c  |  11 ++-
 kernel/sched/fair.c   | 168 +++++++++++++++++-------------------------
 kernel/sched/sched.h  |   1 +
 4 files changed, 77 insertions(+), 105 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..cd56d4018527 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -553,7 +553,7 @@ struct sched_entity {
 	struct load_weight		load;
 	struct rb_node			run_node;
 	u64				deadline;
-	u64				min_deadline;
+	u64				min_vruntime;
 
 	struct list_head		group_node;
 	unsigned int			on_rq;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 4580a450700e..168eecc209b4 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -628,8 +628,8 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
-	s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, spread;
-	struct sched_entity *last, *first;
+	s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+	struct sched_entity *last, *first, *root;
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
@@ -644,15 +644,20 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			SPLIT_NS(cfs_rq->exec_clock));
 
 	raw_spin_rq_lock_irqsave(rq, flags);
+	root = __pick_root_entity(cfs_rq);
+	if (root)
+		left_vruntime = root->min_vruntime;
 	first = __pick_first_entity(cfs_rq);
 	if (first)
-		left_vruntime = first->vruntime;
+		left_deadline = first->deadline;
 	last = __pick_last_entity(cfs_rq);
 	if (last)
 		right_vruntime = last->vruntime;
 	min_vruntime = cfs_rq->min_vruntime;
 	raw_spin_rq_unlock_irqrestore(rq, flags);
 
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "left_deadline",
+			SPLIT_NS(left_deadline));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "left_vruntime",
 			SPLIT_NS(left_vruntime));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 025d90925bf6..e1d686196528 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -551,7 +551,11 @@ static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
 static inline bool entity_before(const struct sched_entity *a,
 				 const struct sched_entity *b)
 {
-	return (s64)(a->vruntime - b->vruntime) < 0;
+	/*
+	 * Tiebreak on vruntime seems unnecessary since it can
+	 * hardly happen.
+	 */
+	return (s64)(a->deadline - b->deadline) < 0;
 }
 
 static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -720,7 +724,7 @@ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
  * Note: using 'avg_vruntime() > se->vruntime' is inacurate due
  * to the loss in precision caused by the division.
 */
-int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
 {
 	struct sched_entity *curr = cfs_rq->curr;
 	s64 avg = cfs_rq->avg_vruntime;
@@ -733,7 +737,12 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		load += weight;
 	}
 
-	return avg >= entity_key(cfs_rq, se) * load;
+	return avg >= (s64)(vruntime - cfs_rq->min_vruntime) * load;
+}
+
+int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	return vruntime_eligible(cfs_rq, se->vruntime);
 }
 
 static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
@@ -752,9 +761,8 @@ static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
 
 static void update_min_vruntime(struct cfs_rq *cfs_rq)
 {
-	struct sched_entity *se = __pick_first_entity(cfs_rq);
+	struct sched_entity *se = __pick_root_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
-
 	u64 vruntime = cfs_rq->min_vruntime;
 
 	if (curr) {
@@ -766,9 +774,9 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 
 	if (se) {
 		if (!curr)
-			vruntime = se->vruntime;
+			vruntime = se->min_vruntime;
 		else
-			vruntime = min_vruntime(vruntime, se->vruntime);
+			vruntime = min_vruntime(vruntime, se->min_vruntime);
 	}
 
 	/* ensure we never gain time by being placed backwards. */
@@ -781,34 +789,34 @@ static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
 	return entity_before(__node_2_se(a), __node_2_se(b));
 }
 
-#define deadline_gt(field, lse, rse) ({ (s64)((lse)->field - (rse)->field) > 0; })
+#define vruntime_gt(field, lse, rse) ({ (s64)((lse)->field - (rse)->field) > 0; })
 
-static inline void __update_min_deadline(struct sched_entity *se, struct rb_node *node)
+static inline void __min_vruntime_update(struct sched_entity *se, struct rb_node *node)
 {
 	if (node) {
 		struct sched_entity *rse = __node_2_se(node);
-		if (deadline_gt(min_deadline, se, rse))
-			se->min_deadline = rse->min_deadline;
+		if (vruntime_gt(min_vruntime, se, rse))
+			se->min_vruntime = rse->min_vruntime;
 	}
 }
 
 /*
- * se->min_deadline = min(se->deadline, left->min_deadline, right->min_deadline)
+ * se->min_vruntime = min(se->vruntime, {left,right}->min_vruntime)
  */
-static inline bool min_deadline_update(struct sched_entity *se, bool exit)
+static inline bool min_vruntime_update(struct sched_entity *se, bool exit)
 {
-	u64 old_min_deadline = se->min_deadline;
+	u64 old_min_vruntime = se->min_vruntime;
 	struct rb_node *node = &se->run_node;
 
-	se->min_deadline = se->deadline;
-	__update_min_deadline(se, node->rb_right);
-	__update_min_deadline(se, node->rb_left);
+	se->min_vruntime = se->vruntime;
+	__min_vruntime_update(se, node->rb_right);
+	__min_vruntime_update(se, node->rb_left);
 
-	return se->min_deadline == old_min_deadline;
+	return se->min_vruntime == old_min_vruntime;
 }
 
-RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity,
-		     run_node, min_deadline, min_deadline_update);
+RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
+		     run_node, min_vruntime, min_vruntime_update);
 
 /*
  * Enqueue an entity into the rb-tree:
@@ -816,18 +824,28 @@ RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity,
 static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	avg_vruntime_add(cfs_rq, se);
-	se->min_deadline = se->deadline;
+	se->min_vruntime = se->vruntime;
 	rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
-				__entity_less, &min_deadline_cb);
+				__entity_less, &min_vruntime_cb);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
-				  &min_deadline_cb);
+				  &min_vruntime_cb);
 	avg_vruntime_sub(cfs_rq, se);
 }
 
+struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *root = cfs_rq->tasks_timeline.rb_root.rb_node;
+
+	if (!root)
+		return NULL;
+
+	return __node_2_se(root);
+}
+
 struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
@@ -850,23 +868,28 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  * with the earliest virtual deadline.
  *
  * We can do this in O(log n) time due to an augmented RB-tree. The
- * tree keeps the entries sorted on service, but also functions as a
- * heap based on the deadline by keeping:
+ * tree keeps the entries sorted on deadline, but also functions as a
+ * heap based on the vruntime by keeping:
 *
- * se->min_deadline = min(se->deadline, se->{left,right}->min_deadline)
+ * se->min_vruntime = min(se->vruntime, se->{left,right}->min_vruntime)
 *
- * Which allows an EDF like search on (sub)trees.
+ * Which allows tree pruning through eligibility.
 */
-static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
-	struct sched_entity *best_left = NULL;
+
+	/*
+	 * We can safely skip eligibility check if there is only one entity
+	 * in this cfs_rq, saving some cycles.
+	 */
+	if (cfs_rq->nr_running == 1)
+		return curr && curr->on_rq ? curr : __node_2_se(node);
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
-	best = curr;
 
 	/*
 	 * Once selected, run a task until it either becomes non-eligible or
@@ -875,95 +898,38 @@ static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
 	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
 		return curr;
 
+	/* Heap search for the EEVD entity */
 	while (node) {
 		struct sched_entity *se = __node_2_se(node);
+		struct rb_node *left = node->rb_left;
 
 		/*
-		 * If this entity is not eligible, try the left subtree.
+		 * Eligible entities in left subtree are always better
+		 * choices, since they have earlier deadlines.
 		 */
-		if (!entity_eligible(cfs_rq, se)) {
-			node = node->rb_left;
+		if (left && vruntime_eligible(cfs_rq,
+					__node_2_se(left)->min_vruntime)) {
+			node = left;
 			continue;
 		}
 
 		/*
-		 * Now we heap search eligible trees for the best (min_)deadline
+		 * The left subtree either is empty or has no eligible
+		 * entity, so check the current node since it is the one
+		 * with earliest deadline that might be eligible.
 		 */
-		if (!best || deadline_gt(deadline, best, se))
+		if (entity_eligible(cfs_rq, se)) {
 			best = se;
-
-		/*
-		 * Every se in a left branch is eligible, keep track of the
-		 * branch with the best min_deadline
-		 */
-		if (node->rb_left) {
-			struct sched_entity *left = __node_2_se(node->rb_left);
-
-			if (!best_left || deadline_gt(min_deadline, best_left, left))
-				best_left = left;
-
-			/*
-			 * min_deadline is in the left branch. rb_left and all
-			 * descendants are eligible, so immediately switch to the second
-			 * loop.
-			 */
-			if (left->min_deadline == se->min_deadline)
-				break;
-		}
-
-		/* min_deadline is at this node, no need to look right */
-		if (se->deadline == se->min_deadline)
 			break;
-
-		/* else min_deadline is in the right branch. */
-		node = node->rb_right;
-	}
-
-	/*
-	 * We ran into an eligible node which is itself the best.
-	 * (Or nr_running == 0 and both are NULL)
-	 */
-	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
-		return best;
-
-	/*
-	 * Now best_left and all of its children are eligible, and we are just
-	 * looking for deadline == min_deadline
-	 */
-	node = &best_left->run_node;
-	while (node) {
-		struct sched_entity *se = __node_2_se(node);
-
-		/* min_deadline is the current node */
-		if (se->deadline == se->min_deadline)
-			return se;
-
-		/* min_deadline is in the left branch */
-		if (node->rb_left &&
-		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
-			node = node->rb_left;
-			continue;
 		}
 
-		/* else min_deadline is in the right branch */
 		node = node->rb_right;
 	}
-	return NULL;
-}
 
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
-{
-	struct sched_entity *se = __pick_eevdf(cfs_rq);
-
-	if (!se) {
-		struct sched_entity *left = __pick_first_entity(cfs_rq);
-		if (left) {
-			pr_err("EEVDF scheduling fail, picking leftmost\n");
-			return left;
-		}
-	}
+	if (!best || (curr && entity_before(curr, best)))
+		best = curr;
 
-	return se;
+	return best;
 }
 
 #ifdef CONFIG_SCHED_DEBUG
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..539c7e763f15 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2822,6 +2822,7 @@ DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq,
 		    double_rq_lock(_T->lock, _T->lock2),
 		    double_rq_unlock(_T->lock, _T->lock2))
 
+extern struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq);
 
-- 
2.37.3
(vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EB9B4C07548 for ; Wed, 15 Nov 2023 03:38:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234460AbjKODiG (ORCPT ); Tue, 14 Nov 2023 22:38:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234506AbjKODiC (ORCPT ); Tue, 14 Nov 2023 22:38:02 -0500 Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A766FF7 for ; Tue, 14 Nov 2023 19:37:35 -0800 (PST) Received: by mail-pg1-x532.google.com with SMTP id 41be03b00d2f7-5bd6ac9833fso4047195a12.0 for ; Tue, 14 Nov 2023 19:37:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1700019455; x=1700624255; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=782xl0ixEcaMsR3OW1SEfy7sX5L5c4L0uPuoo+x1uDU=; b=hzxXhYCtzc7NPYbisVjdcNMdyUQlerZTIH7pf8OJrf9FrSXK2Phhq1mepNY2iAPAlj J+ehnbo23wqh2WUmv5SK+GvkrxiCI2zpKOvfAafgbav0qscQdMM9Pg2r2GLpcZvo+tmI Cz5ijif4Zwut4WPSLrMZBd+NlkxL5PpcqnBZmw8U8a+HetFVMNWk/ESSxTnoTtVerwNM 1C5p03v86E5bFd7Yr5mPBe0H0m0gliI47h0WMOiBn2tjTTjSB0TJgJhkQjX9buFZOU1s q3qF/qVPUjaFHmoVxu3qg26bpGl/pTh3fJtQHZ23UQykFkBc+6lNLvRKajVXTTC5o2YJ OWuQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1700019455; x=1700624255; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=782xl0ixEcaMsR3OW1SEfy7sX5L5c4L0uPuoo+x1uDU=; b=TlFeBwN0aFnhoxZ5ossmWXNcjzsFXcqHLnxMewTAPpRYdNJkIXao90uPuuDTU769pp /PDsGG/wwyDgITusmXsYnxhc03p6vQCMZg/+ue2KO09ww14/EPVtQwKRhS3etKn0O7WD 
LOy/3EMTLmFQxDbCSstiYdNZLLSPvs/7KGnUxVNktIYA/4H/HWs2OsMwJDDLzQtmT2TY z7PIm1JKXVDTdcX931SqBFmH1cxiEmnvP22iG7Q6bc05xLkhkqYGVT5himjaDV82+L3+ 85vJmMP05jAKJSvz2hnC/3Ik+OaoKdnR3ZipWHP+pAdcJnKUbjNT2nWsR49eHxUCWhg3 CYtA== X-Gm-Message-State: AOJu0YxgIVVKcIfwBHrH26OAqdfUFVQf4PEkymicC+CtlsJ0AssR3NF9 fAe2K1PZkotcSjLbYZWQU0fzvw== X-Google-Smtp-Source: AGHT+IHzwmgpedAeYs7JSP8A1zsAz92kv65lWse3717KkOZyjf5mMtzYAcDWqJ35y2JIgt7tnmj3+w== X-Received: by 2002:a05:6a20:d429:b0:16c:b5ce:50f with SMTP id il41-20020a056a20d42900b0016cb5ce050fmr9230882pzb.32.1700019455159; Tue, 14 Nov 2023 19:37:35 -0800 (PST) Received: from C02DV8HUMD6R.bytedance.net ([139.177.225.251]) by smtp.gmail.com with ESMTPSA id l19-20020a170902d35300b001b9da42cd7dsm6419529plk.279.2023.11.14.19.37.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Nov 2023 19:37:34 -0800 (PST) From: Abel Wu To: Peter Zijlstra , Ingo Molnar , Vincent Guittot , Dietmar Eggemann , Valentin Schneider Cc: Barry Song <21cnbao@gmail.com>, Benjamin Segall , Chen Yu , Daniel Jordan , "Gautham R . Shenoy" , Joel Fernandes , K Prateek Nayak , Mike Galbraith , Qais Yousef , Tim Chen , Yicong Yang , Youssef Esmat , linux-kernel@vger.kernel.org, Abel Wu Subject: [PATCH v2 3/4] sched/eevdf: O(1) fastpath for task selection Date: Wed, 15 Nov 2023 11:36:46 +0800 Message-Id: <20231115033647.80785-4-wuyun.abel@bytedance.com> X-Mailer: git-send-email 2.37.3 In-Reply-To: <20231115033647.80785-1-wuyun.abel@bytedance.com> References: <20231115033647.80785-1-wuyun.abel@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Since the RB-tree is now sorted by deadline, let's first try the leftmost entity which has the earliest virtual deadline. I've done some benchmarks to see its effectiveness. 
All the benchmarks were run inside a normal cpu cgroup in a clean environment with cpu turbo disabled, on a dual-socket Intel Xeon(R) Platinum 8260 with 2 NUMA nodes, each of which has 24C/48T.

  hackbench: process/thread + pipe/socket + 1/2/4/8 groups
  netperf:   TCP/UDP + STREAM/RR + 24/48/72/96/192 threads
  tbench:    loopback 24/48/72/96/192 threads
  schbench:  1/2/4/8 mthreads

  direct: cfs_rq has only one entity
  parity: RUN_TO_PARITY
  fast:   O(1) fastpath
  slow:   heap search

  (%)         direct   parity     fast     slow
  hackbench    92.95     2.02     4.91     0.12
  netperf      68.08     6.60    24.18     1.14
  tbench       67.55    11.22    20.61     0.62
  schbench     69.91     2.65    25.73     1.71

The above results indicate that this fastpath really makes task selection more efficient.

Signed-off-by: Abel Wu
---
 kernel/sched/fair.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e1d686196528..4197258b76ab 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -878,6 +878,7 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *se = __pick_first_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
 
@@ -886,7 +887,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	 * in this cfs_rq, saving some cycles.
 	 */
 	if (cfs_rq->nr_running == 1)
-		return curr && curr->on_rq ? curr : __node_2_se(node);
+		return curr && curr->on_rq ? curr : se;
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
@@ -898,9 +899,14 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
 		return curr;
 
+	/* Pick the leftmost entity if it's eligible */
+	if (se && entity_eligible(cfs_rq, se)) {
+		best = se;
+		goto found;
+	}
+
 	/* Heap search for the EEVD entity */
 	while (node) {
-		struct sched_entity *se = __node_2_se(node);
 		struct rb_node *left = node->rb_left;
 
 		/*
@@ -913,6 +919,8 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 			continue;
 		}
 
+		se = __node_2_se(node);
+
 		/*
 		 * The left subtree either is empty or has no eligible
 		 * entity, so check the current node since it is the one
@@ -925,7 +933,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 
 		node = node->rb_right;
 	}
-
+found:
 	if (!best || (curr && entity_before(curr, best)))
 		best = curr;
 
-- 
2.37.3
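The selection order the patch establishes can be sketched as a simplified userspace model. This is NOT the kernel code: the names (`struct entity`, `entity_eligible_simple`, `pick`), the `vruntime <= avg_vruntime` eligibility rule, and the use of a deadline-sorted array in place of the rbtree are all illustrative assumptions. The point is the shape of the logic: check the leftmost (earliest-deadline) entity first, and fall back to a search only when it is ineligible.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a sched_entity: only the two fields the
 * selection logic needs. */
struct entity {
	long deadline;	/* virtual deadline */
	long vruntime;	/* virtual runtime */
};

/* Rough eligibility model: an entity is eligible if it has not run
 * ahead of the (weighted) average virtual runtime. */
static int entity_eligible_simple(const struct entity *se, long avg_vruntime)
{
	return se->vruntime <= avg_vruntime;
}

/* entities[] is sorted by ascending deadline, standing in for the
 * deadline-ordered rbtree; entities[0] is the leftmost node. */
static const struct entity *pick(const struct entity *entities, size_t n,
				 long avg_vruntime)
{
	size_t i;

	if (n == 0)
		return NULL;

	/* O(1) fastpath: the earliest-deadline entity wins outright
	 * if it is eligible. */
	if (entity_eligible_simple(&entities[0], avg_vruntime))
		return &entities[0];

	/* Slowpath: linear stand-in for the heap search -- the earliest
	 * deadline among the remaining eligible entities. */
	for (i = 1; i < n; i++) {
		if (entity_eligible_simple(&entities[i], avg_vruntime))
			return &entities[i];
	}
	return NULL;
}
```

The benchmark table above suggests why this pays off: across all four workloads the fastpath plus the single-entity case covers the vast majority of picks, so the heap search runs only a fraction of the time.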
From: Abel Wu
Subject: [PATCH v2 4/4] sched/stats: Add statistics for pick_eevdf()
Date: Wed, 15 Nov 2023 11:36:47 +0800
Message-Id: <20231115033647.80785-5-wuyun.abel@bytedance.com>
In-Reply-To: <20231115033647.80785-1-wuyun.abel@bytedance.com>

For statistical purposes only; not intended for upstream.

Signed-off-by: Abel Wu
---
 kernel/sched/fair.c  | 12 ++++++++++--
 kernel/sched/sched.h |  5 +++++
 kernel/sched/stats.c |  6 ++++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4197258b76ab..94d9318ac484 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -881,13 +881,16 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	struct sched_entity *se = __pick_first_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
+	struct rq *rq = rq_of(cfs_rq);
 
 	/*
 	 * We can safely skip eligibility check if there is only one entity
 	 * in this cfs_rq, saving some cycles.
 	 */
-	if (cfs_rq->nr_running == 1)
+	if (cfs_rq->nr_running == 1) {
+		schedstat_inc(rq->pick_direct);
 		return curr && curr->on_rq ? curr : se;
+	}
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
@@ -896,15 +899,20 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	 * Once selected, run a task until it either becomes non-eligible or
 	 * until it gets a new slice. See the HACK in set_next_entity().
 	 */
-	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
+	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline) {
+		schedstat_inc(rq->pick_parity);
 		return curr;
+	}
 
 	/* Pick the leftmost entity if it's eligible */
 	if (se && entity_eligible(cfs_rq, se)) {
+		schedstat_inc(rq->pick_fast);
 		best = se;
 		goto found;
 	}
 
+	schedstat_inc(rq->pick_slow);
+
 	/* Heap search for the EEVD entity */
 	while (node) {
 		struct rb_node *left = node->rb_left;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 539c7e763f15..85a79990a698 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1105,6 +1105,11 @@ struct rq {
 	/* try_to_wake_up() stats */
 	unsigned int		ttwu_count;
 	unsigned int		ttwu_local;
+
+	unsigned int		pick_direct;
+	unsigned int		pick_parity;
+	unsigned int		pick_fast;
+	unsigned int		pick_slow;
 #endif
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..4b862c798989 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -133,12 +133,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
 		    cpu, rq->yld_count,
 		    rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
-		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+		    rq->pick_direct, rq->pick_parity,
+		    rq->pick_fast, rq->pick_slow);
 
 		seq_printf(seq, "\n");
 
-- 
2.37.3
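With this debug patch applied, the four new counters appear as the last four fields of each per-cpu line in /proc/schedstat. A hypothetical userspace reader matching the seq_printf() format in stats.c might look like this; `struct pick_stats` and `parse_rq_line` are invented names, and the parser assumes exactly the field layout produced by the patch (which does not bump the schedstat version, since it is not meant for upstream).

```c
#include <assert.h>
#include <stdio.h>

/* Holds the four pick_eevdf() counters appended by the patch. */
struct pick_stats {
	unsigned int direct;	/* single-entity shortcut */
	unsigned int parity;	/* RUN_TO_PARITY */
	unsigned int fast;	/* O(1) leftmost fastpath */
	unsigned int slow;	/* heap search */
};

/* Parse one "cpu%d ..." line from /proc/schedstat as emitted by the
 * patched show_schedstat(); returns 1 on success, 0 on a short or
 * malformed line. The earlier fields are scanned but discarded. */
static int parse_rq_line(const char *line, struct pick_stats *ps)
{
	unsigned int yld, sched, goidle, ttwu, ttwu_local;
	unsigned long long cpu_time, run_delay;
	unsigned long pcount;
	int cpu;

	return sscanf(line, "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
		      &cpu, &yld, &sched, &goidle, &ttwu, &ttwu_local,
		      &cpu_time, &run_delay, &pcount,
		      &ps->direct, &ps->parity, &ps->fast, &ps->slow) == 13;
}
```

Summing each counter across all cpus and dividing by the total number of picks yields per-path percentages like the table in patch 3/4.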