From nobody Wed Dec 31 05:24:05 2025
From: Abel Wu
To: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Valentin Schneider
Cc: Barry Song <21cnbao@gmail.com>, Benjamin Segall, Chen Yu, Daniel Jordan, "Gautham R. Shenoy", Joel Fernandes, K Prateek Nayak, Mike Galbraith, Qais Yousef, Tim Chen, Yicong Yang, Youssef Esmat, linux-kernel@vger.kernel.org, Abel Wu
Subject: [PATCH 1/4] sched/eevdf: Fix vruntime adjustment on reweight
Date: Tue, 7 Nov 2023 17:05:07 +0800
Message-Id: <20231107090510.71322-2-wuyun.abel@bytedance.com>
In-Reply-To: <20231107090510.71322-1-wuyun.abel@bytedance.com>
References: <20231107090510.71322-1-wuyun.abel@bytedance.com>

The vruntime of an (on_rq && !0-lag) entity needs to be adjusted when
it gets re-weighted, and the calculations can be simplified based on
the fact that a re-weight won't change the weighted average vruntime
of all the entities. Please check the proofs in the comments.

But adjusting vruntime can also change the entity's position in the
RB-tree, hence requiring a re-queue to fix up, which might be costly.
This could be avoided by deferring the adjustment to the time the
entity actually leaves the tree (dequeue/pick), but that would
negatively affect task selection and is probably not good enough
either.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Abel Wu
---
 kernel/sched/fair.c | 151 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 128 insertions(+), 23 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8767988242ee..b00d09a9b601 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3666,41 +3666,140 @@ static inline void dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { }
 #endif
 
+static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
+			   unsigned long weight)
+{
+	unsigned long old_weight = se->load.weight;
+	u64 avruntime = avg_vruntime(cfs_rq);
+	s64 vlag, vslice;
+
+	/*
+	 * VRUNTIME
+	 * ========
+	 *
+	 * COROLLARY #1: The virtual runtime of the entity needs to be
+	 * adjusted if re-weight at !0-lag point.
+	 *
+	 * Proof: For contradiction assume this is not true, so we can
+	 * re-weight without changing vruntime at !0-lag point.
+	 *
+	 *		Weight	VRuntime	Avg-VRuntime
+	 *	before	  w	   v		    V
+	 *	after	  w'	   v'		    V'
+	 *
+	 * Since lag needs to be preserved through re-weight:
+	 *
+	 *	lag = (V - v)*w = (V'- v')*w', where v = v'
+	 *	==>	V' = (V - v)*w/w' + v		(1)
+	 *
+	 * Let W be the total weight of the entities before reweight,
+	 * since V' is the new weighted average of entities:
+	 *
+	 *	V' = (WV + w'v - wv) / (W + w' - w)	(2)
+	 *
+	 * by using (1) & (2) we obtain:
+	 *
+	 *	(WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v
+	 *	==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v
+	 *	==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v
+	 *	==> (V - v)*W/(W + w' - w) = (V - v)*w/w'	(3)
+	 *
+	 * Since we are doing at !0-lag point which means V != v, we
+	 * can simplify (3):
+	 *
+	 *	==> W / (W + w' - w) = w / w'
+	 *	==> Ww' = Ww + ww' - ww
+	 *	==> W * (w' - w) = w * (w' - w)
+	 *	==> W = w	(re-weight indicates w' != w)
+	 *
+	 * So the cfs_rq contains only one entity, hence vruntime of
+	 * the entity @v should always equal to the cfs_rq's weighted
+	 * average vruntime @V, which means we will always re-weight
+	 * at 0-lag point, thus breach assumption. Proof completed.
+	 *
+	 *
+	 * COROLLARY #2: Re-weight does NOT affect weighted average
+	 * vruntime of all the entities.
+	 *
+	 * Proof: According to corollary #1, Eq. (1) should be:
+	 *
+	 *	(V - v)*w = (V' - v')*w'
+	 *	==>	v' = V' - (V - v)*w/w'		(4)
+	 *
+	 * According to the weighted average formula, we have:
+	 *
+	 *	V' = (WV - wv + w'v') / (W - w + w')
+	 *	   = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w')
+	 *	   = (WV - wv + w'V' - Vw + wv) / (W - w + w')
+	 *	   = (WV + w'V' - Vw) / (W - w + w')
+	 *
+	 *	==>	V'*(W - w + w') = WV + w'V' - Vw
+	 *	==>	V' * (W - w) = (W - w) * V	(5)
+	 *
+	 * If the entity is the only one in the cfs_rq, then reweight
+	 * always occurs at 0-lag point, so V won't change. Or else
+	 * there are other entities, hence W != w, then Eq. (5) turns
+	 * into V' = V. So V won't change in either case, proof done.
+	 *
+	 *
+	 * So according to corollary #1 & #2, the effect of re-weight
+	 * on vruntime should be:
+	 *
+	 *	v' = V' - (V - v) * w / w'		(4)
+	 *	   = V  - (V - v) * w / w'
+	 *	   = V  - vl * w / w'
+	 *	   = V  - vl'
+	 */
+	if (avruntime != se->vruntime) {
+		vlag = (s64)(avruntime - se->vruntime);
+		vlag = div_s64(vlag * old_weight, weight);
+		se->vruntime = avruntime - vlag;
+	}
+
+	/*
+	 * DEADLINE
+	 * ========
+	 *
+	 * When the weight changes, the virtual time slope changes and
+	 * we should adjust the relative virtual deadline accordingly.
+	 *
+	 *	d' = v' + (d - v)*w/w'
+	 *	   = V' - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  + (d - V)*w/w'
+	 */
+	vslice = (s64)(se->deadline - avruntime);
+	vslice = div_s64(vslice * old_weight, weight);
+	se->deadline = avruntime + vslice;
+}
+
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
-	unsigned long old_weight = se->load.weight;
+	bool curr = cfs_rq->curr == se;
 
 	if (se->on_rq) {
 		/* commit outstanding execution time */
-		if (cfs_rq->curr == se)
+		if (curr)
 			update_curr(cfs_rq);
 		else
-			avg_vruntime_sub(cfs_rq, se);
+			__dequeue_entity(cfs_rq, se);
 		update_load_sub(&cfs_rq->load, se->load.weight);
 	}
 	dequeue_load_avg(cfs_rq, se);
 
-	update_load_set(&se->load, weight);
-
 	if (!se->on_rq) {
 		/*
 		 * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i),
 		 * we need to scale se->vlag when w_i changes.
 		 */
-		se->vlag = div_s64(se->vlag * old_weight, weight);
+		se->vlag = div_s64(se->vlag * se->load.weight, weight);
 	} else {
-		s64 deadline = se->deadline - se->vruntime;
-		/*
-		 * When the weight changes, the virtual time slope changes and
-		 * we should adjust the relative virtual deadline accordingly.
-		 */
-		deadline = div_s64(deadline * old_weight, weight);
-		se->deadline = se->vruntime + deadline;
-		if (se != cfs_rq->curr)
-			min_deadline_cb_propagate(&se->run_node, NULL);
+		reweight_eevdf(cfs_rq, se, weight);
 	}
 
+	update_load_set(&se->load, weight);
+
 #ifdef CONFIG_SMP
 	do {
 		u32 divider = get_pelt_divider(&se->avg);
@@ -3712,8 +3811,17 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 	enqueue_load_avg(cfs_rq, se);
 	if (se->on_rq) {
 		update_load_add(&cfs_rq->load, se->load.weight);
-		if (cfs_rq->curr != se)
-			avg_vruntime_add(cfs_rq, se);
+		if (!curr) {
+			/*
+			 * The entity's vruntime has been adjusted, so let's check
+			 * whether the rq-wide min_vruntime needs updated too. Since
+			 * the calculations above require stable min_vruntime rather
+			 * than up-to-date one, we do the update at the end of the
+			 * reweight process.
+			 */
+			__enqueue_entity(cfs_rq, se);
+			update_min_vruntime(cfs_rq);
+		}
 	}
 }
 
@@ -3857,14 +3965,11 @@ static void update_cfs_group(struct sched_entity *se)
 
 #ifndef CONFIG_SMP
 	shares = READ_ONCE(gcfs_rq->tg->shares);
-
-	if (likely(se->load.weight == shares))
-		return;
 #else
-	shares   = calc_group_shares(gcfs_rq);
+	shares = calc_group_shares(gcfs_rq);
 #endif
-
-	reweight_entity(cfs_rq_of(se), se, shares);
+	if (unlikely(se->load.weight != shares))
+		reweight_entity(cfs_rq_of(se), se, shares);
 }
 
 #else /* CONFIG_FAIR_GROUP_SCHED */
-- 
2.37.3

From nobody Wed Dec 31 05:24:05 2025
From: Abel Wu
To: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Valentin Schneider
Cc: Barry Song <21cnbao@gmail.com>, Benjamin Segall, Chen Yu, Daniel Jordan, "Gautham R. Shenoy", Joel Fernandes, K Prateek Nayak, Mike Galbraith, Qais Yousef, Tim Chen, Yicong Yang, Youssef Esmat, linux-kernel@vger.kernel.org, Abel Wu
Subject: [PATCH 2/4] sched/eevdf: Sort the rbtree by virtual deadline
Date: Tue, 7 Nov 2023 17:05:08 +0800
Message-Id: <20231107090510.71322-3-wuyun.abel@bytedance.com>
In-Reply-To: <20231107090510.71322-1-wuyun.abel@bytedance.com>
References: <20231107090510.71322-1-wuyun.abel@bytedance.com>

Sort the task timeline by virtual deadline and keep the min_vruntime
in the augmented tree, so we can avoid doubling the worst-case cost
and make full use of the cached leftmost node to enable an O(1)
fastpath for picking in the next patch.

This patch also cleans up the now-unused max_vruntime() and adjusts
the position of some functions.

Signed-off-by: Abel Wu
---
 include/linux/sched.h |   2 +-
 kernel/sched/debug.c  |  11 +-
 kernel/sched/fair.c   | 239 +++++++++++++++++------------------------
 kernel/sched/sched.h  |   1 +
 4 files changed, 107 insertions(+), 146 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 556ad71b532b..815832a0e7eb 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -553,7 +553,7 @@ struct sched_entity {
 	struct load_weight		load;
 	struct rb_node			run_node;
 	u64				deadline;
-	u64				min_deadline;
+	u64				min_vruntime;
 
 	struct list_head		group_node;
 	unsigned int			on_rq;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 4580a450700e..168eecc209b4 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -628,8 +628,8 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
-	s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, spread;
-	struct sched_entity *last, *first;
+	s64 left_vruntime = -1, min_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+	struct sched_entity *last, *first, *root;
 	struct rq *rq = cpu_rq(cpu);
 	unsigned long flags;
 
@@ -644,15 +644,20 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			SPLIT_NS(cfs_rq->exec_clock));
 
 	raw_spin_rq_lock_irqsave(rq, flags);
+	root = __pick_root_entity(cfs_rq);
+	if (root)
+		left_vruntime = root->min_vruntime;
 	first = __pick_first_entity(cfs_rq);
 	if (first)
-		left_vruntime = first->vruntime;
+		left_deadline = first->deadline;
 	last = __pick_last_entity(cfs_rq);
 	if (last)
 		right_vruntime = last->vruntime;
 	min_vruntime = cfs_rq->min_vruntime;
 	raw_spin_rq_unlock_irqrestore(rq, flags);
 
+	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "left_deadline",
+			SPLIT_NS(left_deadline));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "left_vruntime",
 			SPLIT_NS(left_vruntime));
 	SEQ_printf(m, "  .%-30s: %Ld.%06ld\n", "min_vruntime",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b00d09a9b601..459487bf8824 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -530,15 +530,6 @@ void account_cfs_rq_runtime(struct cfs_rq *cfs_rq, u64 delta_exec);
  * Scheduling class tree data structure manipulation methods:
  */
 
-static inline u64 max_vruntime(u64 max_vruntime, u64 vruntime)
-{
-	s64 delta = (s64)(vruntime - max_vruntime);
-	if (delta > 0)
-		max_vruntime = vruntime;
-
-	return max_vruntime;
-}
-
 static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
 {
 	s64 delta = (s64)(vruntime - min_vruntime);
@@ -551,7 +542,11 @@ static inline u64 min_vruntime(u64 min_vruntime, u64 vruntime)
 static inline bool entity_before(const struct sched_entity *a,
 				 const struct sched_entity *b)
 {
-	return (s64)(a->vruntime - b->vruntime) < 0;
+	/*
+	 * Tiebreak on vruntime seems unnecessary since it can
+	 * hardly happen.
+	 */
+	return (s64)(a->deadline - b->deadline) < 0;
 }
 
 static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -720,7 +715,7 @@ static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
  * Note: using 'avg_vruntime() > se->vruntime' is inacurate due
  * to the loss in precision caused by the division.
  */
-int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
 {
 	struct sched_entity *curr = cfs_rq->curr;
 	s64 avg = cfs_rq->avg_vruntime;
@@ -733,16 +728,19 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		load += weight;
 	}
 
-	return avg >= entity_key(cfs_rq, se) * load;
+	return avg >= (s64)(vruntime - cfs_rq->min_vruntime) * load;
+}
+
+int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	return vruntime_eligible(cfs_rq, se->vruntime);
 }
 
 static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
 {
 	u64 min_vruntime = cfs_rq->min_vruntime;
-	/*
-	 * open coded max_vruntime() to allow updating avg_vruntime
-	 */
 	s64 delta = (s64)(vruntime - min_vruntime);
+
 	if (delta > 0) {
 		avg_vruntime_update(cfs_rq, delta);
 		min_vruntime = vruntime;
@@ -752,9 +750,8 @@ static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
 
 static void update_min_vruntime(struct cfs_rq *cfs_rq)
 {
-	struct sched_entity *se = __pick_first_entity(cfs_rq);
+	struct sched_entity *se = __pick_root_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
-
 	u64 vruntime = cfs_rq->min_vruntime;
 
 	if (curr) {
@@ -766,9 +763,9 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 
 	if (se) {
 		if (!curr)
-			vruntime = se->vruntime;
+			vruntime = se->min_vruntime;
 		else
-			vruntime = min_vruntime(vruntime, se->vruntime);
+			vruntime = min_vruntime(vruntime, se->min_vruntime);
 	}
 
 	/* ensure we never gain time by being placed backwards. */
@@ -781,34 +778,34 @@ static inline bool __entity_less(struct rb_node *a, const struct rb_node *b)
 	return entity_before(__node_2_se(a), __node_2_se(b));
 }
 
-#define deadline_gt(field, lse, rse) ({ (s64)((lse)->field - (rse)->field) > 0; })
+#define vruntime_gt(field, lse, rse) ({ (s64)((lse)->field - (rse)->field) > 0; })
 
-static inline void __update_min_deadline(struct sched_entity *se, struct rb_node *node)
+static inline void __min_vruntime_update(struct sched_entity *se, struct rb_node *node)
 {
 	if (node) {
 		struct sched_entity *rse = __node_2_se(node);
-		if (deadline_gt(min_deadline, se, rse))
-			se->min_deadline = rse->min_deadline;
+		if (vruntime_gt(min_vruntime, se, rse))
+			se->min_vruntime = rse->min_vruntime;
 	}
 }
 
 /*
- * se->min_deadline = min(se->deadline, left->min_deadline, right->min_deadline)
+ * se->min_vruntime = min(se->vruntime, {left,right}->min_vruntime)
  */
-static inline bool min_deadline_update(struct sched_entity *se, bool exit)
+static inline bool min_vruntime_update(struct sched_entity *se, bool exit)
 {
-	u64 old_min_deadline = se->min_deadline;
+	u64 old_min_vruntime = se->min_vruntime;
 	struct rb_node *node = &se->run_node;
 
-	se->min_deadline = se->deadline;
-	__update_min_deadline(se, node->rb_right);
-	__update_min_deadline(se, node->rb_left);
+	se->min_vruntime = se->vruntime;
+	__min_vruntime_update(se, node->rb_right);
+	__min_vruntime_update(se, node->rb_left);
 
-	return se->min_deadline == old_min_deadline;
+	return se->min_vruntime == old_min_vruntime;
 }
 
-RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity,
-		     run_node, min_deadline, min_deadline_update);
+RB_DECLARE_CALLBACKS(static, min_vruntime_cb, struct sched_entity,
+		     run_node, min_vruntime, min_vruntime_update);
 
 /*
  * Enqueue an entity into the rb-tree:
@@ -816,18 +813,28 @@ RB_DECLARE_CALLBACKS(static, min_deadline_cb, struct sched_entity,
 static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	avg_vruntime_add(cfs_rq, se);
-	se->min_deadline = se->deadline;
+	se->min_vruntime = se->vruntime;
 	rb_add_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
-				__entity_less, &min_deadline_cb);
+				__entity_less, &min_vruntime_cb);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	rb_erase_augmented_cached(&se->run_node, &cfs_rq->tasks_timeline,
-				  &min_deadline_cb);
+				  &min_vruntime_cb);
 	avg_vruntime_sub(cfs_rq, se);
 }
 
+struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *root = cfs_rq->tasks_timeline.rb_root.rb_node;
+
+	if (!root)
+		return NULL;
+
+	return __node_2_se(root);
+}
+
 struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
@@ -838,6 +845,35 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 	return __node_2_se(left);
 }
 
+#ifdef CONFIG_SCHED_DEBUG
+struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
+{
+	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
+
+	if (!last)
+		return NULL;
+
+	return __node_2_se(last);
+}
+
+/**************************************************************
+ * Scheduling class statistics methods:
+ */
+#ifdef CONFIG_SMP
+int sched_update_scaling(void)
+{
+	unsigned int factor = get_update_sysctl_factor();
+
+#define WRT_SYSCTL(name) \
+	(normalized_sysctl_##name = sysctl_##name / (factor))
+	WRT_SYSCTL(sched_base_slice);
+#undef WRT_SYSCTL
+
+	return 0;
+}
+#endif
+#endif
+
 /*
  * Earliest Eligible Virtual Deadline First
  *
@@ -850,23 +886,28 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  * with the earliest virtual deadline.
  *
  * We can do this in O(log n) time due to an augmented RB-tree. The
- * tree keeps the entries sorted on service, but also functions as a
- * heap based on the deadline by keeping:
+ * tree keeps the entries sorted on deadline, but also functions as a
+ * heap based on the vruntime by keeping:
  *
- * se->min_deadline = min(se->deadline, se->{left,right}->min_deadline)
+ * se->min_vruntime = min(se->vruntime, se->{left,right}->min_vruntime)
  *
- * Which allows an EDF like search on (sub)trees.
+ * Which allows tree pruning through eligibility.
 */
-static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
-	struct sched_entity *best_left = NULL;
+
+	/*
+	 * We can safely skip eligibility check if there is only one entity
+	 * in this cfs_rq, saving some cycles.
+	 */
+	if (cfs_rq->nr_running == 1)
+		return curr && curr->on_rq ? curr : __node_2_se(node);
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
-	best = curr;
 
 	/*
 	 * Once selected, run a task until it either becomes non-eligible or
@@ -875,126 +916,40 @@ static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
 	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
 		return curr;
 
+	/* Heap search for the EEVD entity */
 	while (node) {
 		struct sched_entity *se = __node_2_se(node);
+		struct rb_node *left = node->rb_left;
 
 		/*
-		 * If this entity is not eligible, try the left subtree.
+		 * Eligible entities in left subtree are always better
+		 * choices, since they have earlier deadlines.
 		 */
-		if (!entity_eligible(cfs_rq, se)) {
-			node = node->rb_left;
+		if (left && vruntime_eligible(cfs_rq,
+					__node_2_se(left)->min_vruntime)) {
+			node = left;
 			continue;
 		}
 
 		/*
-		 * Now we heap search eligible trees for the best (min_)deadline
+		 * The left subtree either is empty or has no eligible
+		 * entity, so check the current node since it is the one
+		 * with earliest deadline that might be eligible.
 		 */
-		if (!best || deadline_gt(deadline, best, se))
+		if (entity_eligible(cfs_rq, se)) {
 			best = se;
-
-		/*
-		 * Every se in a left branch is eligible, keep track of the
-		 * branch with the best min_deadline
-		 */
-		if (node->rb_left) {
-			struct sched_entity *left = __node_2_se(node->rb_left);
-
-			if (!best_left || deadline_gt(min_deadline, best_left, left))
-				best_left = left;
-
-			/*
-			 * min_deadline is in the left branch. rb_left and all
-			 * descendants are eligible, so immediately switch to the second
-			 * loop.
-			 */
-			if (left->min_deadline == se->min_deadline)
-				break;
-		}
-
-		/* min_deadline is at this node, no need to look right */
-		if (se->deadline == se->min_deadline)
 			break;
-
-		/* else min_deadline is in the right branch. */
-		node = node->rb_right;
-	}
-
-	/*
-	 * We ran into an eligible node which is itself the best.
-	 * (Or nr_running == 0 and both are NULL)
-	 */
-	if (!best_left || (s64)(best_left->min_deadline - best->deadline) > 0)
-		return best;
-
-	/*
-	 * Now best_left and all of its children are eligible, and we are just
-	 * looking for deadline == min_deadline
-	 */
-	node = &best_left->run_node;
-	while (node) {
-		struct sched_entity *se = __node_2_se(node);
-
-		/* min_deadline is the current node */
-		if (se->deadline == se->min_deadline)
-			return se;
-
-		/* min_deadline is in the left branch */
-		if (node->rb_left &&
-		    __node_2_se(node->rb_left)->min_deadline == se->min_deadline) {
-			node = node->rb_left;
-			continue;
 		}
 
-		/* else min_deadline is in the right branch */
 		node = node->rb_right;
 	}
-	return NULL;
-}
 
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
-{
-	struct sched_entity *se = __pick_eevdf(cfs_rq);
+	if (!best || (curr && entity_before(curr, best)))
+		best = curr;
 
-	if (!se) {
-		struct sched_entity *left = __pick_first_entity(cfs_rq);
-		if (left) {
-			pr_err("EEVDF scheduling fail, picking leftmost\n");
-			return left;
-		}
-	}
-
-	return se;
+	return best;
 }
 
-#ifdef CONFIG_SCHED_DEBUG
-struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
-{
-	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
-
-	if (!last)
-		return NULL;
-
-	return __node_2_se(last);
-}
-
-/**************************************************************
- * Scheduling class statistics methods:
- */
-#ifdef CONFIG_SMP
-int sched_update_scaling(void)
-{
-	unsigned int factor = get_update_sysctl_factor();
-
-#define WRT_SYSCTL(name) \
-	(normalized_sysctl_##name = sysctl_##name / (factor))
-	WRT_SYSCTL(sched_base_slice);
-#undef WRT_SYSCTL
-
-	return 0;
-}
-#endif
-#endif
-
 static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
 
 /*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..539c7e763f15 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2822,6 +2822,7 @@ DEFINE_LOCK_GUARD_2(double_rq_lock, struct rq,
 		    double_rq_lock(_T->lock, _T->lock2),
 		    double_rq_unlock(_T->lock, _T->lock2))
 
+extern struct sched_entity *__pick_root_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq);
 extern struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq);
 
-- 
2.37.3

From nobody Wed Dec 31 05:24:05 2025
From: Abel Wu
To: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Dietmar Eggemann, Valentin Schneider
Cc: Barry Song <21cnbao@gmail.com>, Benjamin Segall, Chen Yu, Daniel Jordan, "Gautham R. Shenoy", Joel Fernandes, K Prateek Nayak, Mike Galbraith, Qais Yousef, Tim Chen, Yicong Yang, Youssef Esmat, linux-kernel@vger.kernel.org, Abel Wu
Subject: [PATCH 3/4] sched/eevdf: O(1) fastpath for task selection
Date: Tue, 7 Nov 2023 17:05:09 +0800
Message-Id: <20231107090510.71322-4-wuyun.abel@bytedance.com>
In-Reply-To: <20231107090510.71322-1-wuyun.abel@bytedance.com>
References: <20231107090510.71322-1-wuyun.abel@bytedance.com>

Since the RB-tree is now sorted by virtual deadline, let's first try
the leftmost entity, which has the earliest virtual deadline.

I have run some benchmarks to see its effectiveness. All benchmarks
were done inside a normal cpu cgroup in a clean environment with cpu
turbo disabled, on a dual-CPU Intel Xeon(R) Platinum 8260 machine with
2 NUMA nodes, each of which has 24C/48T.

  hackbench: process/thread + pipe/socket + 1/2/4/8 groups
  netperf:   TCP/UDP + STREAM/RR + 24/48/72/96/192 threads
  tbench:    loopback 24/48/72/96/192 threads
  schbench:  1/2/4/8 mthreads

  direct: cfs_rq has only one entity
  parity: RUN_TO_PARITY
  fast:   O(1) fastpath
  slow:   heap search

      (%)      direct  parity    fast   slow
  hackbench     92.95    2.02    4.91   0.12
  netperf       68.08    6.60   24.18   1.14
  tbench        67.55   11.22   20.61   0.62
  schbench      69.91    2.65   25.73   1.71

The above results indicate that this fastpath really makes task
selection more efficient.
Signed-off-by: Abel Wu
---
 kernel/sched/fair.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 459487bf8824..a1fdd0c7a051 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -896,6 +896,7 @@ int sched_update_scaling(void)
 static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+	struct sched_entity *se = __pick_first_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
 
@@ -904,7 +905,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	 * in this cfs_rq, saving some cycles.
 	 */
 	if (cfs_rq->nr_running == 1)
-		return curr && curr->on_rq ? curr : __node_2_se(node);
+		return curr && curr->on_rq ? curr : se;
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
@@ -916,9 +917,14 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
 		return curr;
 
+	/* Pick the leftmost entity if it's eligible */
+	if (se && entity_eligible(cfs_rq, se)) {
+		best = se;
+		goto found;
+	}
+
 	/* Heap search for the EEVD entity */
 	while (node) {
-		struct sched_entity *se = __node_2_se(node);
 		struct rb_node *left = node->rb_left;
 
 		/*
@@ -931,6 +937,8 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 			continue;
 		}
 
+		se = __node_2_se(node);
+
 		/*
 		 * The left subtree either is empty or has no eligible
 		 * entity, so check the current node since it is the one
@@ -943,7 +951,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 
 		node = node->rb_right;
 	}
-
+found:
 	if (!best || (curr && entity_before(curr, best)))
 		best = curr;
 
-- 
2.37.3

From nobody Wed Dec 31 05:24:05 2025
From: Abel Wu
To: Peter Zijlstra, Ingo Molnar, Vincent Guittot, Dietmar Eggemann,
	Valentin Schneider
Cc: Barry Song <21cnbao@gmail.com>, Benjamin Segall, Chen Yu,
	Daniel Jordan, "Gautham R. Shenoy", Joel Fernandes,
	K Prateek Nayak, Mike Galbraith, Qais Yousef, Tim Chen,
	Yicong Yang, Youssef Esmat, linux-kernel@vger.kernel.org, Abel Wu
Subject: [PATCH 4/4] sched/stats: branch statistics for pick_eevdf
Date: Tue, 7 Nov 2023 17:05:10 +0800
Message-Id: <20231107090510.71322-5-wuyun.abel@bytedance.com>
In-Reply-To: <20231107090510.71322-1-wuyun.abel@bytedance.com>
References: <20231107090510.71322-1-wuyun.abel@bytedance.com>

For tracing purposes only; not intended for upstream.
Signed-off-by: Abel Wu
---
 kernel/sched/fair.c  | 12 ++++++++++--
 kernel/sched/sched.h |  5 +++++
 kernel/sched/stats.c |  6 ++++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a1fdd0c7a051..9b07e9437526 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -899,13 +899,16 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	struct sched_entity *se = __pick_first_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
+	struct rq *rq = rq_of(cfs_rq);
 
 	/*
 	 * We can safely skip eligibility check if there is only one entity
 	 * in this cfs_rq, saving some cycles.
 	 */
-	if (cfs_rq->nr_running == 1)
+	if (cfs_rq->nr_running == 1) {
+		schedstat_inc(rq->pick_direct);
 		return curr && curr->on_rq ? curr : se;
+	}
 
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
@@ -914,15 +917,20 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	 * Once selected, run a task until it either becomes non-eligible or
 	 * until it gets a new slice. See the HACK in set_next_entity().
 	 */
-	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
+	if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline) {
+		schedstat_inc(rq->pick_parity);
 		return curr;
+	}
 
 	/* Pick the leftmost entity if it's eligible */
 	if (se && entity_eligible(cfs_rq, se)) {
+		schedstat_inc(rq->pick_fast);
 		best = se;
 		goto found;
 	}
 
+	schedstat_inc(rq->pick_slow);
+
 	/* Heap search for the EEVD entity */
 	while (node) {
 		struct rb_node *left = node->rb_left;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 539c7e763f15..85a79990a698 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1105,6 +1105,11 @@ struct rq {
 	/* try_to_wake_up() stats */
 	unsigned int		ttwu_count;
 	unsigned int		ttwu_local;
+
+	unsigned int		pick_direct;
+	unsigned int		pick_parity;
+	unsigned int		pick_fast;
+	unsigned int		pick_slow;
 #endif
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..4b862c798989 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -133,12 +133,14 @@ static int show_schedstat(struct seq_file *seq, void *v)
 
 		/* runqueue-specific stats */
 		seq_printf(seq,
-		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+		    "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u %u",
 		    cpu, rq->yld_count,
 		    rq->sched_count, rq->sched_goidle,
 		    rq->ttwu_count, rq->ttwu_local,
 		    rq->rq_cpu_time,
-		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+		    rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+		    rq->pick_direct, rq->pick_parity,
+		    rq->pick_fast, rq->pick_slow);
 
 		seq_printf(seq, "\n");
 
-- 
2.37.3