Date: Tue, 07 Apr 2026 13:42:06 -0000
From: "tip-bot2 for K Prateek Nayak"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/fair: Avoid overflow in enqueue_entity()
Cc: K Prateek Nayak, "Peter Zijlstra (Intel)", x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260407120052.GG3738010@noisy.programming.kicks-ass.net>
References: <20260407120052.GG3738010@noisy.programming.kicks-ass.net>
Message-ID: <177556932623.226963.18250390207164382053.tip-bot2@tip-bot2>

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     556146ce5e9476db234134c46ddf0e154ca17028
Gitweb:        https://git.kernel.org/tip/556146ce5e9476db234134c46ddf0e154ca17028
Author:        K Prateek Nayak
AuthorDate:    Tue, 07 Apr 2026 13:36:17 +02:00
Committer:     Peter Zijlstra
CommitterDate: Tue, 07 Apr 2026 14:02:00 +02:00

sched/fair: Avoid overflow in enqueue_entity()

Here is one scenario which was triggered when running:

    stress-ng --yield=32 -t 10000000s&
    while true; do perf bench sched messaging -p -t -l 100000 -g 16; done

on a 256-CPU machine, about an hour into the run:

    __enqeue_entity: entity_key(-141245081754) weight(90891264) overflow_mul(5608800059305154560) vlag(57498) delayed?(0)
    cfs_rq: zero_vruntime(3809707759657809) sum_w_vruntime(0) sum_weight(0) nr_queued(1)
    cfs_rq->curr: entity_key(0) vruntime(3809707759657809) deadline(3809723966988476) weight(37)

The above comes from __enqueue_entity() after a place_entity(). Breaking
this down:

    vlag_initial = 57498
    vlag = (57498 * (37 + 90891264)) / 37 = 141,245,081,754

    vruntime = 3809707759657809 - 141245081754 = 3,809,566,514,576,055
    entity_key(se, cfs_rq) = -141,245,081,754

Now, multiplying the entity_key by its own weight results in
5,608,800,059,305,154,560 (same as what overflow_mul() suggests) but
in Python, without overflow, this would be: -12,837,944,014,404,397,056

Avoid the overflow (without doing the division for avg_vruntime()), by
moving zero_vruntime to the new entity when it is heavier.

Fixes: 4823725d9d1d ("sched/fair: Increase weight bits for avg_vruntime")
Signed-off-by: K Prateek Nayak
[peterz: suggested 'weight > load' condition]
Signed-off-by: Peter Zijlstra (Intel)
Link: https://patch.msgid.link/20260407120052.GG3738010@noisy.programming.kicks-ass.net
---
 kernel/sched/fair.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 597ce5b..12890ef 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5352,6 +5352,7 @@ static void
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
 	u64 vslice, vruntime = avg_vruntime(cfs_rq);
+	bool update_zero = false;
 	s64 lag = 0;
 
 	if (!se->custom_slice)
@@ -5368,7 +5369,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 */
 	if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag) {
 		struct sched_entity *curr = cfs_rq->curr;
-		long load;
+		long load, weight;
 
 		lag = se->vlag;
 
@@ -5428,14 +5429,41 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		if (curr && curr->on_rq)
 			load += avg_vruntime_weight(cfs_rq, curr->load.weight);
 
-		lag *= load + avg_vruntime_weight(cfs_rq, se->load.weight);
+		weight = avg_vruntime_weight(cfs_rq, se->load.weight);
+		lag *= load + weight;
 		if (WARN_ON_ONCE(!load))
 			load = 1;
 		lag = div64_long(lag, load);
+
+		/*
+		 * A heavy entity (relative to the tree) will pull the
+		 * avg_vruntime close to its vruntime position on enqueue. But
+		 * the zero_vruntime point is only updated at the next
+		 * update_deadline()/place_entity()/update_entity_lag().
+		 *
+		 * Specifically (see the comment near avg_vruntime_weight()):
+		 *
+		 *   sum_w_vruntime = \Sum (v_i - v0) * w_i
+		 *
+		 * Note that if v0 is near a light entity, both terms will be
+		 * small for the light entity, while in that case both terms
+		 * are large for the heavy entity, leading to risk of
+		 * overflow.
+		 *
+		 * OTOH if v0 is near the heavy entity, then the difference is
+		 * larger for the light entity, but the factor is small, while
+		 * for the heavy entity the difference is small but the factor
+		 * is large. Avoiding the multiplication overflow.
+		 */
+		if (weight > load)
+			update_zero = true;
 	}
 
 	se->vruntime = vruntime - lag;
 
+	if (update_zero)
+		update_zero_vruntime(cfs_rq, -lag);
+
 	if (sched_feat(PLACE_REL_DEADLINE) && se->rel_deadline) {
 		se->deadline += se->vruntime;
 		se->rel_deadline = 0;
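
The arithmetic from the changelog can be replayed in a few lines of standalone C. What follows is only an illustrative sketch, not kernel code: the helper names (scaled_lag, entity_key, mul_wrap64, mul_wide, fits_s64) are made up for this example, __int128 is a GCC/Clang extension, and the sketch assumes that update_zero_vruntime(cfs_rq, -lag) simply shifts the reference point v0 by -lag, so that it lands on the new entity's vruntime as the changelog describes.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the debug output quoted in the changelog. */
#define VLAG_INITIAL	57498LL
#define CURR_WEIGHT	37LL			/* cfs_rq->curr, the only load in the tree */
#define SE_WEIGHT	90891264LL		/* entity being enqueued */
#define ZERO_VRUNTIME	3809707759657809LL	/* equals curr's vruntime in this log */

/* Lag scaling as done in place_entity(): lag * (load + weight) / load. */
static int64_t scaled_lag(int64_t lag, int64_t load, int64_t weight)
{
	assert(load != 0);	/* the kernel warns and falls back to load = 1 */
	return lag * (load + weight) / load;
}

/* Hypothetical helper: distance of a vruntime from the reference point v0. */
static int64_t entity_key(int64_t vruntime, int64_t v0)
{
	return vruntime - v0;
}

/* key * weight truncated to 64 bits, i.e. what a wrapping s64 multiply yields. */
static int64_t mul_wrap64(int64_t key, int64_t weight)
{
	return (int64_t)((uint64_t)key * (uint64_t)weight);
}

/* The same product without truncation (__int128 is a compiler extension). */
static __int128 mul_wide(int64_t key, int64_t weight)
{
	return (__int128)key * weight;
}

/* Does the untruncated product fit in a signed 64-bit value? */
static int fits_s64(__int128 v)
{
	return v >= INT64_MIN && v <= INT64_MAX;
}
```

With the numbers from the log, mul_wrap64() on the new entity's key and weight gives 5608800059305154560, the value overflow_mul() printed, while the 128-bit product is -12837944014404397056 and does not fit in an s64. Once v0 is shifted by -lag, the new entity's key is 0 and the light curr entity contributes 141245081754 * 37 = 5226068024898, far below the s64 limit.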