Date: Tue, 14 Nov 2023 21:57:13 -0000
From: "tip-bot2 for Abel Wu"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/urgent] sched/eevdf: Fix vruntime adjustment on reweight
Cc: Abel Wu, "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20231107090510.71322-2-wuyun.abel@bytedance.com>
References: <20231107090510.71322-2-wuyun.abel@bytedance.com>
MIME-Version: 1.0
Message-ID: <169999903332.391.13801155435294949562.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     eab03c23c2a162085b13200d7942fc5a00b5ccc8
Gitweb:        https://git.kernel.org/tip/eab03c23c2a162085b13200d7942fc5a00b5ccc8
Author:        Abel Wu
AuthorDate:    Tue, 07 Nov 2023 17:05:07 +08:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 14 Nov 2023 22:27:00 +01:00

sched/eevdf: Fix vruntime adjustment on reweight

vruntime of the (on_rq && !0-lag) entity needs to be adjusted when
it gets re-weighted, and the calculations can be simplified based
on the fact that re-weight won't change the w-average of all the
entities. Please check the proofs in comments.

But adjusting vruntime can also cause position change in RB-tree
hence require re-queue to fix up which might be costly. This might
be avoided by deferring adjustment to the time the entity actually
leaves tree (dequeue/pick), but that will negatively affect task
selection and probably not good enough either.

Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Abel Wu
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20231107090510.71322-2-wuyun.abel@bytedance.com
---
 kernel/sched/fair.c | 151 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 128 insertions(+), 23 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2048138..025d909 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3666,41 +3666,140 @@ static inline void
 dequeue_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) { }
 #endif
 
+static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
+			   unsigned long weight)
+{
+	unsigned long old_weight = se->load.weight;
+	u64 avruntime = avg_vruntime(cfs_rq);
+	s64 vlag, vslice;
+
+	/*
+	 * VRUNTIME
+	 * ========
+	 *
+	 * COROLLARY #1: The virtual runtime of the entity needs to be
+	 * adjusted if re-weight at !0-lag point.
+	 *
+	 * Proof: For contradiction assume this is not true, so we can
+	 * re-weight without changing vruntime at !0-lag point.
+	 *
+	 *             Weight	VRuntime   Avg-VRuntime
+	 *     before    w	  v            V
+	 *      after    w'	  v'           V'
+	 *
+	 * Since lag needs to be preserved through re-weight:
+	 *
+	 *	lag = (V - v)*w = (V'- v')*w', where v = v'
+	 *	==>	V' = (V - v)*w/w' + v		(1)
+	 *
+	 * Let W be the total weight of the entities before reweight,
+	 * since V' is the new weighted average of entities:
+	 *
+	 *	V' = (WV + w'v - wv) / (W + w' - w)	(2)
+	 *
+	 * by using (1) & (2) we obtain:
+	 *
+	 *	(WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v
+	 *	==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v
+	 *	==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v
+	 *	==>	(V - v)*W/(W + w' - w) = (V - v)*w/w' (3)
+	 *
+	 * Since we are doing at !0-lag point which means V != v, we
+	 * can simplify (3):
+	 *
+	 *	==>	W / (W + w' - w) = w / w'
+	 *	==>	Ww' = Ww + ww' - ww
+	 *	==>	W * (w' - w) = w * (w' - w)
+	 *	==>	W = w	(re-weight indicates w' != w)
+	 *
+	 * So the cfs_rq contains only one entity, hence vruntime of
+	 * the entity @v should always equal to the cfs_rq's weighted
+	 * average vruntime @V, which means we will always re-weight
+	 * at 0-lag point, thus breach assumption. Proof completed.
+	 *
+	 *
+	 * COROLLARY #2: Re-weight does NOT affect weighted average
+	 * vruntime of all the entities.
+	 *
+	 * Proof: According to corollary #1, Eq. (1) should be:
+	 *
+	 *	(V - v)*w = (V' - v')*w'
+	 *	==>	v' = V' - (V - v)*w/w'		(4)
+	 *
+	 * According to the weighted average formula, we have:
+	 *
+	 *	V' = (WV - wv + w'v') / (W - w + w')
+	 *	   = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w')
+	 *	   = (WV - wv + w'V' - Vw + wv) / (W - w + w')
+	 *	   = (WV + w'V' - Vw) / (W - w + w')
+	 *
+	 *	==>	V'*(W - w + w') = WV + w'V' - Vw
+	 *	==>	V' * (W - w) = (W - w) * V	(5)
+	 *
+	 * If the entity is the only one in the cfs_rq, then reweight
+	 * always occurs at 0-lag point, so V won't change. Or else
+	 * there are other entities, hence W != w, then Eq. (5) turns
+	 * into V' = V. So V won't change in either case, proof done.
+	 *
+	 *
+	 * So according to corollary #1 & #2, the effect of re-weight
+	 * on vruntime should be:
+	 *
+	 *	v' = V' - (V - v) * w / w'		(4)
+	 *	   = V  - (V - v) * w / w'
+	 *	   = V  - vl * w / w'
+	 *	   = V  - vl'
+	 */
+	if (avruntime != se->vruntime) {
+		vlag = (s64)(avruntime - se->vruntime);
+		vlag = div_s64(vlag * old_weight, weight);
+		se->vruntime = avruntime - vlag;
+	}
+
+	/*
+	 * DEADLINE
+	 * ========
+	 *
+	 * When the weight changes, the virtual time slope changes and
+	 * we should adjust the relative virtual deadline accordingly.
+	 *
+	 *	d' = v' + (d - v)*w/w'
+	 *	   = V' - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  + (d - V)*w/w'
+	 */
+	vslice = (s64)(se->deadline - avruntime);
+	vslice = div_s64(vslice * old_weight, weight);
+	se->deadline = avruntime + vslice;
+}
+
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
-	unsigned long old_weight = se->load.weight;
+	bool curr = cfs_rq->curr == se;
 
 	if (se->on_rq) {
 		/* commit outstanding execution time */
-		if (cfs_rq->curr == se)
+		if (curr)
 			update_curr(cfs_rq);
 		else
-			avg_vruntime_sub(cfs_rq, se);
+			__dequeue_entity(cfs_rq, se);
 		update_load_sub(&cfs_rq->load, se->load.weight);
 	}
 	dequeue_load_avg(cfs_rq, se);
 
-	update_load_set(&se->load, weight);
-
 	if (!se->on_rq) {
 		/*
 		 * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i),
 		 * we need to scale se->vlag when w_i changes.
 		 */
-		se->vlag = div_s64(se->vlag * old_weight, weight);
+		se->vlag = div_s64(se->vlag * se->load.weight, weight);
 	} else {
-		s64 deadline = se->deadline - se->vruntime;
-		/*
-		 * When the weight changes, the virtual time slope changes and
-		 * we should adjust the relative virtual deadline accordingly.
-		 */
-		deadline = div_s64(deadline * old_weight, weight);
-		se->deadline = se->vruntime + deadline;
-		if (se != cfs_rq->curr)
-			min_deadline_cb_propagate(&se->run_node, NULL);
+		reweight_eevdf(cfs_rq, se, weight);
 	}
 
+	update_load_set(&se->load, weight);
+
 #ifdef CONFIG_SMP
 	do {
 		u32 divider = get_pelt_divider(&se->avg);
@@ -3712,8 +3811,17 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 	enqueue_load_avg(cfs_rq, se);
 	if (se->on_rq) {
 		update_load_add(&cfs_rq->load, se->load.weight);
-		if (cfs_rq->curr != se)
-			avg_vruntime_add(cfs_rq, se);
+		if (!curr) {
+			/*
+			 * The entity's vruntime has been adjusted, so let's check
+			 * whether the rq-wide min_vruntime needs updated too. Since
+			 * the calculations above require stable min_vruntime rather
+			 * than up-to-date one, we do the update at the end of the
+			 * reweight process.
+			 */
+			__enqueue_entity(cfs_rq, se);
+			update_min_vruntime(cfs_rq);
+		}
 	}
 }
 
@@ -3857,14 +3965,11 @@ static void update_cfs_group(struct sched_entity *se)
 
 #ifndef CONFIG_SMP
 	shares = READ_ONCE(gcfs_rq->tg->shares);
-
-	if (likely(se->load.weight == shares))
-		return;
 #else
-	shares   = calc_group_shares(gcfs_rq);
+	shares = calc_group_shares(gcfs_rq);
 #endif
-
-	reweight_entity(cfs_rq_of(se), se, shares);
+	if (unlikely(se->load.weight != shares))
+		reweight_entity(cfs_rq_of(se), se, shares);
 }
 
 #else /* CONFIG_FAIR_GROUP_SCHED */
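
For readers who want to check the arithmetic in the reweight_eevdf()
comment outside the kernel, the following is a minimal userspace sketch,
not the kernel code. struct toy_entity, toy_avg_vruntime() and
toy_reweight() are hypothetical names made up for this illustration. The
sketch assumes absolute vruntime values and plain integer division,
whereas the kernel tracks the average relative to min_vruntime and uses
div_s64().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sched_entity fields used by the proof. */
struct toy_entity {
	int64_t vruntime;	/* v */
	int64_t deadline;	/* d */
	int64_t weight;		/* w */
};

/*
 * Weighted average vruntime: V = sum(w_i * v_i) / sum(w_i).
 * Simplification: absolute values, truncating integer division.
 */
static int64_t toy_avg_vruntime(const struct toy_entity *es, int n)
{
	int64_t sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum += es[i].weight * es[i].vruntime;
		load += es[i].weight;
	}
	assert(load > 0);
	return sum / load;
}

/*
 * Apply the two results from the comment, given the current average V:
 *
 *	v' = V - (V - v)*w/w'
 *	d' = V + (d - V)*w/w'
 *
 * and only then store the new weight w'.
 */
static void toy_reweight(struct toy_entity *e, int64_t V, int64_t new_weight)
{
	assert(new_weight > 0);

	int64_t vlag = (V - e->vruntime) * e->weight / new_weight;
	int64_t vslice = (e->deadline - V) * e->weight / new_weight;

	e->vruntime = V - vlag;
	e->deadline = V + vslice;
	e->weight = new_weight;
}
```

As a worked case under these assumptions: take two entities of weight 2
with vruntime 100 and 200, so V = 150, and give the first a deadline of
160. Reweighting the first entity to 4 yields v' = 150 - 50*2/4 = 125 and
d' = 150 + 10*2/4 = 155. Its lag stays at (150 - 100)*2 = (150 - 125)*4 =
100, and the new average is (4*125 + 2*200)/6 = 150, which agrees with
corollary #2.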