From nobody Sat Feb 7 09:30:18 2026
Message-ID: <20260130094607.945348633@infradead.org>
User-Agent: quilt/0.68
Date: Fri, 30 Jan 2026 10:34:40 +0100
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org,
	wangtao554@huawei.com, quzicheng@huawei.com, kprateek.nayak@amd.com,
	wuyun.abel@bytedance.com, dsmythies@telus.net
Subject: [PATCH 1/4] sched/fair: Only set slice protection at pick time
References: <20260130093439.803225718@infradead.org>

We should not (re)set slice protection in the sched_change pattern
which calls put_prev_task() / set_next_task().

Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5420,7 +5420,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, st
 }
 
 static void
-set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
+set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, bool first)
 {
 	clear_buddies(cfs_rq, se);
 
@@ -5435,7 +5435,8 @@ set_next_entity(struct cfs_rq *cfs_rq, s
 		__dequeue_entity(cfs_rq, se);
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 
-		set_protect_slice(cfs_rq, se);
+		if (first)
+			set_protect_slice(cfs_rq, se);
 	}
 
 	update_stats_curr_start(cfs_rq, se);
@@ -8958,13 +8959,13 @@ pick_next_task_fair(struct rq *rq, struc
 				pse = parent_entity(pse);
 			}
 			if (se_depth >= pse_depth) {
-				set_next_entity(cfs_rq_of(se), se);
+				set_next_entity(cfs_rq_of(se), se, true);
 				se = parent_entity(se);
 			}
 		}
 
 		put_prev_entity(cfs_rq, pse);
-		set_next_entity(cfs_rq, se);
+		set_next_entity(cfs_rq, se, true);
 
 		__set_next_task_fair(rq, p, true);
 	}
@@ -13578,7 +13579,7 @@ static void set_next_task_fair(struct rq
 	for_each_sched_entity(se) {
 		struct cfs_rq *cfs_rq = cfs_rq_of(se);
 
-		set_next_entity(cfs_rq, se);
+		set_next_entity(cfs_rq, se, first);
 		/* ensure bandwidth has been allocated on our new cfs_rq */
 		account_cfs_rq_runtime(cfs_rq, 0);
 	}

From nobody Sat Feb 7 09:30:18 2026
Message-ID: <20260130094608.058761221@infradead.org>
User-Agent: quilt/0.68
Date: Fri, 30 Jan 2026 10:34:41 +0100
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org,
	wangtao554@huawei.com, quzicheng@huawei.com, kprateek.nayak@amd.com,
	wuyun.abel@bytedance.com, dsmythies@telus.net, Zhang Qiao
Subject: [PATCH 2/4] sched/eevdf: Update se->vprot in reweight_entity()
References: <20260130093439.803225718@infradead.org>

From: Wang Tao

In the EEVDF framework with Run-to-Parity protection, `se->vprot` is an
independent variable defining the virtual protection timestamp.

When `reweight_entity()` is called (e.g., via nice/renice), it performs
the following actions to preserve Lag consistency:

 1. Scales `se->vlag` based on the new weight.
 2. Calls `place_entity()`, which recalculates `se->vruntime` based on
    the new weight and scaled lag.

However, the current implementation fails to update `se->vprot`, leading
to mismatches between the task's actual runtime and its expected duration.

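The missing update can be pictured with a small user-space sketch. This is
not the kernel code: `struct toy_se` and `toy_reweight()` are hypothetical
names, and the `new_vruntime` argument merely stands in for whatever
`place_entity()` would compute. It only illustrates the idea of making the
protection relative, scaling it by w/w', and making it absolute again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few sched_entity fields involved. */
struct toy_se {
	uint64_t vruntime;	/* current virtual runtime */
	uint64_t vprot;		/* virtual time up to which the slice is protected */
	unsigned long weight;	/* load weight */
};

/*
 * Re-weight @se and carry the slice protection across the change.
 * @new_vruntime is an assumption: it models the result of placement.
 */
static void toy_reweight(struct toy_se *se, unsigned long new_weight,
			 uint64_t new_vruntime)
{
	int64_t rel;

	assert(new_weight > 0);

	/* Make the protection relative to the entity's own vruntime. */
	rel = (int64_t)(se->vprot - se->vruntime);

	/* Virtual time advances at a rate of 1/weight, so scale by w/w'. */
	rel = rel * (int64_t)se->weight / (int64_t)new_weight;

	se->weight = new_weight;
	se->vruntime = new_vruntime;

	/* Turn the scaled protection back into an absolute timestamp. */
	se->vprot = se->vruntime + (uint64_t)rel;
}
```

With weight 1024 and 3000 units of protected virtual time left, doubling
the weight to 2048 leaves 1500 units, which should correspond to roughly
the same amount of wall-clock protection as before.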
Fixes: 63304558ba5d ("sched/eevdf: Curb wakeup-preemption")
Suggested-by: Zhang Qiao
Signed-off-by: Wang Tao
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260120123113.3518950-1-wangtao554@huawei.com
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3790,6 +3790,8 @@ static void reweight_entity(struct cfs_r
 			    unsigned long weight)
 {
 	bool curr = cfs_rq->curr == se;
+	bool rel_vprot = false;
+	u64 vprot;
 
 	if (se->on_rq) {
 		/* commit outstanding execution time */
@@ -3797,6 +3799,11 @@ static void reweight_entity(struct cfs_r
 		update_entity_lag(cfs_rq, se);
 		se->deadline -= se->vruntime;
 		se->rel_deadline = 1;
+		if (curr && protect_slice(se)) {
+			vprot = se->vprot - se->vruntime;
+			rel_vprot = true;
+		}
+
 		cfs_rq->nr_queued--;
 		if (!curr)
 			__dequeue_entity(cfs_rq, se);
@@ -3812,6 +3819,9 @@ static void reweight_entity(struct cfs_r
 	if (se->rel_deadline)
 		se->deadline = div_s64(se->deadline * se->load.weight, weight);
 
+	if (rel_vprot)
+		vprot = div_s64(vprot * se->load.weight, weight);
+
 	update_load_set(&se->load, weight);
 
 	do {
@@ -3823,6 +3833,8 @@ static void reweight_entity(struct cfs_r
 	enqueue_load_avg(cfs_rq, se);
 	if (se->on_rq) {
 		place_entity(cfs_rq, se, 0);
+		if (rel_vprot)
+			se->vprot = se->vruntime + vprot;
 		update_load_add(&cfs_rq->load, se->load.weight);
 		if (!curr)
 			__enqueue_entity(cfs_rq, se);

From nobody Sat Feb 7 09:30:18 2026
Message-ID: <20260130094608.190020747@infradead.org>
User-Agent: quilt/0.68
Date: Fri, 30 Jan 2026 10:34:42 +0100
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org,
	wangtao554@huawei.com, quzicheng@huawei.com, kprateek.nayak@amd.com,
	wuyun.abel@bytedance.com, dsmythies@telus.net
Subject: [PATCH 3/4] sched/fair: Increase weight bits for avg_vruntime
References: <20260130093439.803225718@infradead.org>

Due to the zero_vruntime patch, the deltas are now a lot smaller and
measurements with kernel-build and hackbench runs show about 54 bits used.

This ensures avg_vruntime() tracks the full weight range, reducing
numerical artifacts in reweight and the like.

Also, let's keep the paranoid debug code around this time.

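The "paranoid" accounting can be illustrated in user space. The sketch
below is a simplification under stated assumptions: `struct toy_rq`,
`struct toy_ent` and the `toy_*()` helpers are hypothetical, a plain array
replaces the rbtree, and the GCC/Clang `__builtin_*_overflow()` builtins
stand in for the kernel's `check_mul_overflow()` / `check_add_overflow()`.
Whenever the weighted sum would overflow, the weight shift is raised and
the sum is rebuilt from scratch, since shifting the old sum would not give
the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SHIFT 10	/* arbitrary cap, mirroring the limit in the patch */

struct toy_ent {
	int64_t key;		/* v_i - v0 */
	unsigned long weight;	/* full-resolution weight */
};

struct toy_rq {
	int64_t sum_w_key;	/* \Sum key_i * weight_i */
	uint64_t sum_weight;	/* \Sum weight_i */
	unsigned int shift;	/* how far weights are currently scaled down */
};

/* Scaled weight; once scaling is active it never drops below 2. */
static unsigned long toy_weight(const struct toy_rq *rq, unsigned long w)
{
	if (rq->shift) {
		w >>= rq->shift;
		if (w < 2)
			w = 2;
	}
	return w;
}

/* Add one entity; returns 0 on success, -1 if the sum would overflow. */
static int toy_add_one(struct toy_rq *rq, const struct toy_ent *e)
{
	int64_t w = (int64_t)toy_weight(rq, e->weight);
	int64_t prod, sum;

	if (__builtin_mul_overflow(e->key, w, &prod))
		return -1;
	if (__builtin_add_overflow(rq->sum_w_key, prod, &sum))
		return -1;

	rq->sum_w_key = sum;
	rq->sum_weight += (uint64_t)w;
	return 0;
}

/*
 * Rebuild both sums for all @n entities, raising the shift after every
 * overflow. Returns 0 on success, -1 once the cap is reached.
 */
static int toy_rebuild(struct toy_rq *rq, const struct toy_ent *ents, size_t n)
{
	for (;;) {
		size_t i;

		rq->sum_w_key = 0;
		rq->sum_weight = 0;

		for (i = 0; i < n; i++) {
			if (toy_add_one(rq, &ents[i]))
				break;
		}
		if (i == n)
			return 0;
		if (rq->shift >= TOY_MAX_SHIFT)
			return -1;
		rq->shift++;
	}
}
```

Unlike the patch, the sketch reports failure instead of calling BUG_ON(),
and it always rebuilds the whole set rather than retrying a single add.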
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/debug.c    | 14 ++++++++
 kernel/sched/fair.c     | 77 +++++++++++++++++++++++++++++++++++++++++-------
 kernel/sched/features.h |  2 +
 kernel/sched/sched.h    |  3 +
 4 files changed, 83 insertions(+), 13 deletions(-)

--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -8,6 +8,7 @@
  */
 #include
 #include
+#include
 #include "sched.h"
 
 /*
@@ -790,10 +791,13 @@ static void print_rq(struct seq_file *m,
 
 void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 {
-	s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+	s64 left_vruntime = -1, right_vruntime = -1, left_deadline = -1, spread;
+	s64 zero_vruntime = -1, sum_w_vruntime = -1;
 	struct sched_entity *last, *first, *root;
 	struct rq *rq = cpu_rq(cpu);
+	unsigned int sum_shift;
 	unsigned long flags;
+	u64 sum_weight;
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	SEQ_printf(m, "\n");
@@ -814,6 +818,9 @@ void print_cfs_rq(struct seq_file *m, in
 	if (last)
 		right_vruntime = last->vruntime;
 	zero_vruntime = cfs_rq->zero_vruntime;
+	sum_w_vruntime = cfs_rq->sum_w_vruntime;
+	sum_weight = cfs_rq->sum_weight;
+	sum_shift = cfs_rq->sum_shift;
 	raw_spin_rq_unlock_irqrestore(rq, flags);
 
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_deadline",
@@ -822,6 +829,11 @@ void print_cfs_rq(struct seq_file *m, in
 			SPLIT_NS(left_vruntime));
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "zero_vruntime",
 			SPLIT_NS(zero_vruntime));
+	SEQ_printf(m, " .%-30s: %Ld (%d bits)\n", "sum_w_vruntime",
+		   sum_w_vruntime, ilog2(abs(sum_w_vruntime)));
+	SEQ_printf(m, " .%-30s: %Lu\n", "sum_weight",
+		   sum_weight);
+	SEQ_printf(m, " .%-30s: %u\n", "sum_shift", sum_shift);
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime",
 			SPLIT_NS(avg_vruntime(cfs_rq)));
 	SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime",
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -650,15 +650,20 @@ static inline s64 entity_key(struct cfs_
  * Since zero_vruntime closely tracks the per-task service, these
  * deltas: (v_i - v0), will be in the order of the maximal (virtual) lag
  * induced in the system due to quantisation.
- *
- * Also, we use scale_load_down() to reduce the size.
- *
- * As measured, the max (key * weight) value was ~44 bits for a kernel build.
  */
-static void
-sum_w_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static inline unsigned long avg_vruntime_weight(struct cfs_rq *cfs_rq, unsigned long w)
+{
+#ifdef CONFIG_64BIT
+	if (cfs_rq->sum_shift)
+		w = max(2UL, w >> cfs_rq->sum_shift);
+#endif
+	return w;
+}
+
+static inline void
+__sum_w_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	unsigned long weight = scale_load_down(se->load.weight);
+	unsigned long weight = avg_vruntime_weight(cfs_rq, se->load.weight);
 	s64 key = entity_key(cfs_rq, se);
 
 	cfs_rq->sum_w_vruntime += key * weight;
@@ -666,9 +671,59 @@ sum_w_vruntime_add(struct cfs_rq *cfs_rq
 }
 
 static void
+sum_w_vruntime_add_paranoid(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	unsigned long weight;
+	s64 key, tmp;
+
+again:
+	weight = avg_vruntime_weight(cfs_rq, se->load.weight);
+	key = entity_key(cfs_rq, se);
+
+	if (check_mul_overflow(key, weight, &key))
+		goto overflow;
+
+	if (check_add_overflow(cfs_rq->sum_w_vruntime, key, &tmp))
+		goto overflow;
+
+	cfs_rq->sum_w_vruntime = tmp;
+	cfs_rq->sum_weight += weight;
+	return;
+
+overflow:
+	/*
+	 * There's gotta be a limit -- if we're still failing at this point
+	 * there's really nothing much to be done about things.
+	 */
+	BUG_ON(cfs_rq->sum_shift >= 10);
+	cfs_rq->sum_shift++;
+
+	/*
+	 * Note: \Sum (k_i * (w_i >> 1)) != (\Sum (k_i * w_i)) >> 1
+	 */
+	cfs_rq->sum_w_vruntime = 0;
+	cfs_rq->sum_weight = 0;
+
+	for (struct rb_node *node = cfs_rq->tasks_timeline.rb_leftmost;
+	     node; node = rb_next(node))
+		__sum_w_vruntime_add(cfs_rq, __node_2_se(node));
+
+	goto again;
+}
+
+static void
+sum_w_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	if (sched_feat(PARANOID_AVG))
+		return sum_w_vruntime_add_paranoid(cfs_rq, se);
+
+	__sum_w_vruntime_add(cfs_rq, se);
+}
+
+static void
 sum_w_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	unsigned long weight = scale_load_down(se->load.weight);
+	unsigned long weight = avg_vruntime_weight(cfs_rq, se->load.weight);
 	s64 key = entity_key(cfs_rq, se);
 
 	cfs_rq->sum_w_vruntime -= key * weight;
@@ -695,7 +750,7 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
 	long load = cfs_rq->sum_weight;
 
 	if (curr && curr->on_rq) {
-		unsigned long weight = scale_load_down(curr->load.weight);
+		unsigned long weight = avg_vruntime_weight(cfs_rq, curr->load.weight);
 
 		avg += entity_key(cfs_rq, curr) * weight;
 		load += weight;
@@ -5170,9 +5225,9 @@ place_entity(struct cfs_rq *cfs_rq, stru
 		 */
 		load = cfs_rq->sum_weight;
 		if (curr && curr->on_rq)
-			load += scale_load_down(curr->load.weight);
+			load += avg_vruntime_weight(cfs_rq, curr->load.weight);
 
-		lag *= load + scale_load_down(se->load.weight);
+		lag *= load + avg_vruntime_weight(cfs_rq, se->load.weight);
 		if (WARN_ON_ONCE(!load))
 			load = 1;
 		lag = div_s64(lag, load);
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -58,6 +58,8 @@ SCHED_FEAT(CACHE_HOT_BUDDY, true)
 SCHED_FEAT(DELAY_DEQUEUE, true)
 SCHED_FEAT(DELAY_ZERO, true)
 
+SCHED_FEAT(PARANOID_AVG, false)
+
 /*
  * Allow wakeup-time preemption of the current task:
  */
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -679,8 +679,9 @@ struct cfs_rq {
 
 	s64			sum_w_vruntime;
 	u64			sum_weight;
-
 	u64			zero_vruntime;
+	unsigned int		sum_shift;
+
 #ifdef CONFIG_SCHED_CORE
 	unsigned int		forceidle_seq;
 	u64			zero_vruntime_fi;

From nobody Sat Feb 7 09:30:18 2026
Message-ID: <20260130094608.304041157@infradead.org>
User-Agent: quilt/0.68
Date: Fri, 30 Jan 2026 10:34:43 +0100
From: Peter Zijlstra
To: mingo@kernel.org
Cc: peterz@infradead.org, juri.lelli@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
	mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org,
	wangtao554@huawei.com, quzicheng@huawei.com, kprateek.nayak@amd.com,
	wuyun.abel@bytedance.com, dsmythies@telus.net
Subject: [PATCH 4/4] sched/fair: Revert 6d71a9c61604 ("sched/fair: Fix EEVDF
	entity placement bug causing scheduling lag")
References: <20260130093439.803225718@infradead.org>

Zicheng Qu reported that, because avg_vruntime() always includes
cfs_rq->curr, when ->on_rq, place_entity() doesn't work right.

Specifically, the lag scaling in place_entity() relies on
avg_vruntime() being the state *before* placement of the new entity.
However in this case avg_vruntime() will actually already include the
entity, which breaks things.

Also, Zicheng Qu argues that avg_vruntime should be invariant under
reweight. IOW commit 6d71a9c61604 ("sched/fair: Fix EEVDF entity
placement bug causing scheduling lag") was wrong!

The issue reported in 6d71a9c61604 could possibly be explained by
rounding artifacts -- notably the extreme weight '2' is outside of the
range of avg_vruntime/sum_w_vruntime, since that uses
scale_load_down(). By scaling vruntime by the real weight, but
accounting it in vruntime with a factor 1024 more, the average moves
significantly.

Tested by reverting 66951e4860d3 ("sched/fair: Fix update_cfs_group()
vs DELAY_DEQUEUE") and tracing vruntime and vlag figures again.

Reported-by: Zicheng Qu
Signed-off-by: Peter Zijlstra (Intel)
---
 kernel/sched/fair.c | 154 +++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 129 insertions(+), 25 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -782,16 +782,21 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
  *
  * XXX could add max_slice to the augmented data to track this.
  */
-static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
+static s64 entity_lag(u64 avruntime, struct sched_entity *se)
 {
 	s64 vlag, limit;
 
-	WARN_ON_ONCE(!se->on_rq);
-
-	vlag = avg_vruntime(cfs_rq) - se->vruntime;
+	vlag = avruntime - se->vruntime;
 	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
 
-	se->vlag = clamp(vlag, -limit, limit);
+	return clamp(vlag, -limit, limit);
+}
+
+static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	WARN_ON_ONCE(!se->on_rq);
+
+	se->vlag = entity_lag(avg_vruntime(cfs_rq), se);
 }
 
 /*
@@ -3839,23 +3844,135 @@ dequeue_load_avg(struct cfs_rq *cfs_rq,
 			 se_weight(se) * -se->avg.load_sum);
 }
 
-static void place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags);
+static void rescale_entity(struct sched_entity *se, unsigned long weight,
+			   u64 avruntime, bool rel_vprot)
+{
+	unsigned long old_weight = se->load.weight;
+
+	/*
+	 * VRUNTIME
+	 * --------
+	 *
+	 * COROLLARY #1: The virtual runtime of the entity needs to be
+	 * adjusted if re-weight at !0-lag point.
+	 *
+	 * Proof: For contradiction assume this is not true, so we can
+	 * re-weight without changing vruntime at !0-lag point.
+	 *
+	 *             Weight	VRuntime   Avg-VRuntime
+	 *     before    w	    v		 V
+	 *      after    w'	    v'		 V'
+	 *
+	 * Since lag needs to be preserved through re-weight:
+	 *
+	 *	lag = (V - v)*w = (V'- v')*w', where v = v'
+	 *	==>	V' = (V - v)*w/w' + v		(1)
+	 *
+	 * Let W be the total weight of the entities before reweight,
+	 * since V' is the new weighted average of entities:
+	 *
+	 *	V' = (WV + w'v - wv) / (W + w' - w)	(2)
+	 *
+	 * by using (1) & (2) we obtain:
+	 *
+	 *	(WV + w'v - wv) / (W + w' - w) = (V - v)*w/w' + v
+	 *	==> (WV-Wv+Wv+w'v-wv)/(W+w'-w) = (V - v)*w/w' + v
+	 *	==> (WV - Wv)/(W + w' - w) + v = (V - v)*w/w' + v
+	 *	==>	(V - v)*W/(W + w' - w) = (V - v)*w/w' (3)
+	 *
+	 * Since we are doing at !0-lag point which means V != v, we
+	 * can simplify (3):
+	 *
+	 *	==>	W / (W + w' - w) = w / w'
+	 *	==>	Ww' = Ww + ww' - ww
+	 *	==>	W * (w' - w) = w * (w' - w)
+	 *	==>	W = w	(re-weight indicates w' != w)
+	 *
+	 * So the cfs_rq contains only one entity, hence vruntime of
+	 * the entity @v should always equal to the cfs_rq's weighted
+	 * average vruntime @V, which means we will always re-weight
+	 * at 0-lag point, thus breach assumption. Proof completed.
+	 *
+	 *
+	 * COROLLARY #2: Re-weight does NOT affect weighted average
+	 * vruntime of all the entities.
+	 *
+	 * Proof: According to corollary #1, Eq. (1) should be:
+	 *
+	 *	(V - v)*w = (V' - v')*w'
+	 *	==>	v' = V' - (V - v)*w/w'		(4)
+	 *
+	 * According to the weighted average formula, we have:
+	 *
+	 *	V' = (WV - wv + w'v') / (W - w + w')
+	 *	   = (WV - wv + w'(V' - (V - v)w/w')) / (W - w + w')
+	 *	   = (WV - wv + w'V' - Vw + wv) / (W - w + w')
+	 *	   = (WV + w'V' - Vw) / (W - w + w')
+	 *
+	 *	==>  V'*(W - w + w') = WV + w'V' - Vw
+	 *	==>	V' * (W - w) = (W - w) * V	(5)
+	 *
+	 * If the entity is the only one in the cfs_rq, then reweight
+	 * always occurs at 0-lag point, so V won't change. Or else
+	 * there are other entities, hence W != w, then Eq. (5) turns
+	 * into V' = V. So V won't change in either case, proof done.
+	 *
+	 *
+	 * So according to corollary #1 & #2, the effect of re-weight
+	 * on vruntime should be:
+	 *
+	 *	v' = V' - (V - v) * w / w'		(4)
+	 *	   = V  - (V - v) * w / w'
+	 *	   = V  - vl * w / w'
+	 *	   = V  - vl'
+	 */
+	se->vlag = div_s64(se->vlag * old_weight, weight);
+	if (avruntime)
+		se->vruntime = avruntime - se->vlag;
+
+	/*
+	 * DEADLINE
+	 * --------
+	 *
+	 * When the weight changes, the virtual time slope changes and
+	 * we should adjust the relative virtual deadline accordingly.
+	 *
+	 *	d' = v' + (d - v)*w/w'
+	 *	   = V' - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  - (V - v)*w/w' + (d - v)*w/w'
+	 *	   = V  + (d - V)*w/w'
+	 */
+	if (se->rel_deadline) {
+		se->deadline = div_s64(se->deadline * old_weight, weight);
+		if (avruntime) {
+			se->rel_deadline = 0;
+			se->deadline += avruntime;
+		}
+	}
+
+	if (rel_vprot) {
+		se->vprot = div_s64(se->vprot * old_weight, weight);
+		if (avruntime)
+			se->vprot += avruntime;
+	}
+}
 
 static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
 	bool curr = cfs_rq->curr == se;
 	bool rel_vprot = false;
-	u64 vprot;
+	u64 avruntime = 0;
 
 	if (se->on_rq) {
 		/* commit outstanding execution time */
 		update_curr(cfs_rq);
-		update_entity_lag(cfs_rq, se);
-		se->deadline -= se->vruntime;
+		avruntime = avg_vruntime(cfs_rq);
+		se->vlag = entity_lag(avruntime, se);
+		se->deadline -= avruntime;
 		se->rel_deadline = 1;
 		if (curr && protect_slice(se)) {
-			vprot = se->vprot - se->vruntime;
+			se->vprot -= avruntime;
 			rel_vprot = true;
 		}
 
@@ -3866,30 +3983,17 @@ static void reweight_entity(struct cfs_r
 	}
 	dequeue_load_avg(cfs_rq, se);
 
-	/*
-	 * Because we keep se->vlag = V - v_i, while: lag_i = w_i*(V - v_i),
-	 * we need to scale se->vlag when w_i changes.
-	 */
-	se->vlag = div_s64(se->vlag * se->load.weight, weight);
-	if (se->rel_deadline)
-		se->deadline = div_s64(se->deadline * se->load.weight, weight);
-
-	if (rel_vprot)
-		vprot = div_s64(vprot * se->load.weight, weight);
+	rescale_entity(se, weight, avruntime, rel_vprot);
 
 	update_load_set(&se->load, weight);
 
 	do {
 		u32 divider = get_pelt_divider(&se->avg);
-
 		se->avg.load_avg = div_u64(se_weight(se) * se->avg.load_sum, divider);
 	} while (0);
 
 	enqueue_load_avg(cfs_rq, se);
 	if (se->on_rq) {
-		place_entity(cfs_rq, se, 0);
-		if (rel_vprot)
-			se->vprot = se->vruntime + vprot;
 		update_load_add(&cfs_rq->load, se->load.weight);
 		if (!curr)
 			__enqueue_entity(cfs_rq, se);
@@ -5247,7 +5351,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
 
 	se->vruntime = vruntime - lag;
 
-	if (se->rel_deadline) {
+	if (sched_feat(PLACE_REL_DEADLINE) && se->rel_deadline) {
 		se->deadline += se->vruntime;
 		se->rel_deadline = 0;
 		return;
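
The invariance argued in this patch (corollary #2 in the rescale_entity()
comment) can be checked numerically with a small user-space sketch. It is
only an illustration: `struct toy_ent`, `toy_avg_vruntime()` and
`toy_rescale()` are hypothetical, plain truncating integer division
replaces div_s64(), and the lag clamp applied by entity_lag() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_ent {
	int64_t vruntime;	/* v */
	int64_t deadline;	/* d */
	int64_t weight;		/* w */
};

/* Weighted average vruntime: V = \Sum w_i*v_i / \Sum w_i. */
static int64_t toy_avg_vruntime(const struct toy_ent *e, size_t n)
{
	int64_t sum = 0, load = 0;

	for (size_t i = 0; i < n; i++) {
		sum += e[i].weight * e[i].vruntime;
		load += e[i].weight;
	}
	assert(load > 0);
	return sum / load;
}

/* Apply v' = V - (V - v)*w/w' and d' = V + (d - V)*w/w'. */
static void toy_rescale(struct toy_ent *e, int64_t V, int64_t new_weight)
{
	int64_t vlag, rel_deadline;

	assert(new_weight > 0);

	vlag = (V - e->vruntime) * e->weight / new_weight;
	rel_deadline = (e->deadline - V) * e->weight / new_weight;

	e->vruntime = V - vlag;
	e->deadline = V + rel_deadline;
	e->weight = new_weight;
}
```

For two entities (w=2, v=10, d=30) and (w=1, v=40), V is 20. Re-weighting
the first to w'=4 yields v'=15 and d'=25, and the weighted average stays at
20, as the corollary predicts. With values that do not divide evenly,
rounding can move V slightly.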