From nobody Wed Apr 1 22:00:29 2026
Message-ID: <20260401132355.081530332@infradead.org>
User-Agent: quilt/0.68
Date: Wed, 01 Apr 2026 15:20:20 +0200
From: Peter Zijlstra
To: jstultz@google.com, kprateek.nayak@amd.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org, mingo@kernel.org,
 juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com
Subject: [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix
References: <20260401132019.057895815@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").

The combination of yield and that commit was specific enough to
hypothesize the following scenario:

Suppose we have 2 runnable tasks, both doing yield. Then one will be
eligible and one will not be, because the average position must be in
between these two entities.
Therefore, the running task will be eligible and, on yield, be advanced
a full slice (all the tasks do is yield after all). This causes it to
jump over the other task; now the other task is eligible and current no
longer is. So we schedule.

Since both tasks stay runnable, there is no {de,en}queue. All we have is
the __{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
fingered commit, those two no longer move zero_vruntime. All that moves
zero_vruntime is the tick and a full {de,en}queue.

This means that if the two tasks playing leapfrog can reach the critical
speed to reach the overflow point inside one tick's worth of time, we're
up a creek.

Additionally, when multiple cgroups are involved, there is no guarantee
the tick will in fact hit every cgroup in a timely manner. Statistically
speaking it will, but those same statistics do not rule out the
possibility of one cgroup not getting a tick for a significant amount of
time -- however unlikely.

Therefore, just like with the yield() case, force an update at the end
of every slice. This ensures the update is never more than a single
slice behind and the whole thing stays within 2 lag bounds as per the
comment on entity_key().

Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
Reported-by: John Stultz
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: K Prateek Nayak
Tested-by: John Stultz
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -707,7 +707,7 @@ void update_zero_vruntime(struct cfs_rq
  * Called in:
  * - place_entity() -- before enqueue
  * - update_entity_lag() -- before dequeue
- * - entity_tick()
+ * - update_deadline() -- slice expiration
  *
  * This means it is one entry 'behind' but that puts it close enough to where
  * the bound on entity_key() is at most two lag bounds.
@@ -1131,6 +1131,7 @@ static bool update_deadline(struct cfs_r
	 * EEVDF: vd_i = ve_i + r_i / w_i
	 */
	se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+	avg_vruntime(cfs_rq);

	/*
	 * The task has consumed its request, reschedule.
@@ -5593,11 +5594,6 @@ entity_tick(struct cfs_rq *cfs_rq, struc
	update_load_avg(cfs_rq, curr, UPDATE_TG);
	update_cfs_group(curr);

-	/*
-	 * Pulls along cfs_rq::zero_vruntime.
-	 */
-	avg_vruntime(cfs_rq);
-
 #ifdef CONFIG_SCHED_HRTICK
	/*
	 * queued ticks are scheduled to match the slice, so don't bother
@@ -9128,7 +9124,7 @@ static void yield_task_fair(struct rq *r
	 */
	if (entity_eligible(cfs_rq, se)) {
		se->vruntime = se->deadline;
-		se->deadline += calc_delta_fair(se->slice, se);
+		update_deadline(cfs_rq, se);
	}
 }