Date: Wed, 27 May 2026 11:02:50 -0400
From: Rik van Riel
To: Vincent Guittot
Cc: Aaron Lu, Ingo Molnar, Peter Zijlstra, Juri Lelli, Jakub Kicinski,
 Dietmar Eggemann, Steven Rostedt, Valentin Schneider,
 linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH v3] sched/fair: use rq_clock() in update_tg_load_avg() rate-limit
Message-ID: <20260527110250.6a91718d@fangorn>
References: <20260527095930.18427953@fangorn>

From 82aea6d6dc076e2f774aadb6b0c1a98c9123af19 Mon Sep 17 00:00:00 2001
From: Rik van Riel
Date: Tue, 26 May 2026 12:43:29 -0700
Subject: [PATCH] sched/fair: use rq_clock() in update_tg_load_avg() rate-limit

update_tg_load_avg() is called once per leaf cfs_rq from the
__update_blocked_fair() walk that runs inside the NOHZ idle-balance
softirq, and again from update_load_avg() with UPDATE_TG.
Its first operation after the trivial early-outs is unconditionally:

    now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
    if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
        return;

Jakub ran into a system where nohz_idle_balance() was taking 75% of a
CPU (which is handling network traffic and doing many irq_exit_rcu()
calls), with 35% of that CPU spent in update_load_avg(), and 17% of
the CPU in sched_clock_cpu(), reading the TSC.

In a quick synthetic test, it looks like this patch reduces the CPU
use of sched_balance_update_blocked_averages() by about 20%.

Switch the rate-limit to read rq_clock(rq_of(cfs_rq)) instead. This
eliminates the rdtsc, and uses a fairly fresh timestamp, because all
callers of update_tg_load_avg() and clear_tg_load_avg() hold rq->lock
and have called update_rq_clock(rq) within microseconds:

  caller                                pre-state
  __update_blocked_fair                 encloser did update_rq_clock(rq)
  update_load_avg's three UPDATE_TG     under rq->lock after
    sites                               enqueue/dequeue/update_curr
  attach_/detach_entity_cfs_rq          preceded by update_load_avg(...)
  clear_tg_load_avg via offline path    rq_clock_start_loop_update(rq) upfront

so rq->clock is fresh at every call.

Since cfs_rqs are per-CPU per-task_group,
cfs_rq->last_update_tg_load_avg is always compared against the same
rq's clock; no cross-rq drift.

Signed-off-by: Rik van Riel
Assisted-by: Claude (Anthropic)
Reviewed-by: Vincent Guittot
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..37b39bdf5ef9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4393,7 +4393,7 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
 	 * For migration heavy workloads, access to tg->load_avg can be
 	 * unbound. Limit the update rate to at most once per ms.
 	 */
-	now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
+	now = rq_clock(rq_of(cfs_rq));
 	if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
 		return;
 
@@ -4416,7 +4416,7 @@ static inline void clear_tg_load_avg(struct cfs_rq *cfs_rq)
 	if (cfs_rq->tg == &root_task_group)
 		return;
 
-	now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
+	now = rq_clock(rq_of(cfs_rq));
 	delta = 0 - cfs_rq->tg_load_avg_contrib;
 	atomic_long_add(delta, &cfs_rq->tg->load_avg);
 	cfs_rq->tg_load_avg_contrib = 0;
-- 
2.54.0
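
Illustrative note, not part of the patch: the rate-limit pattern
described in the commit message can be sketched in userspace C. This
is a minimal sketch under stated assumptions, not the kernel code.
All toy_* names are hypothetical, and the "clock" is a plain counter
advanced by hand, standing in for a timestamp that the caller has
already refreshed (as update_rq_clock() does for rq->clock). The point
it tries to show is that the rate-limit check only needs a cached
value, so it can avoid a fresh hardware clock read on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical stand-in for struct rq: holds a cached timestamp in ns. */
struct toy_rq {
	uint64_t clock;
};

/* Hypothetical stand-in for struct cfs_rq. */
struct toy_cfs_rq {
	struct toy_rq *rq;
	uint64_t last_update;	/* plays the role of last_update_tg_load_avg */
	unsigned long updates;	/* how many calls passed the rate limit */
};

/* Assumption: the caller refreshes the cached clock before the update. */
static void toy_update_rq_clock(struct toy_rq *rq, uint64_t delta_ns)
{
	rq->clock += delta_ns;
}

/* Returns true if the update went through, false if it was rate-limited. */
static bool toy_update_tg_load_avg(struct toy_cfs_rq *cfs_rq)
{
	uint64_t now;

	assert(cfs_rq && cfs_rq->rq);

	/* Cached read: no hardware clock access on this path. */
	now = cfs_rq->rq->clock;
	if (now - cfs_rq->last_update < NSEC_PER_MSEC)
		return false;

	cfs_rq->last_update = now;
	cfs_rq->updates++;
	return true;
}

/* Advance the clock by tick_ns per step and count updates that pass. */
static unsigned long toy_count_updates(uint64_t tick_ns, unsigned int nticks)
{
	struct toy_rq rq = { .clock = 0 };
	struct toy_cfs_rq cfs_rq = { .rq = &rq, .last_update = 0, .updates = 0 };

	for (unsigned int i = 0; i < nticks; i++) {
		toy_update_rq_clock(&rq, tick_ns);
		toy_update_tg_load_avg(&cfs_rq);
	}
	return cfs_rq.updates;
}
```

In this sketch, toy_count_updates(250000, 40) simulates 10 ms in
250 us steps and lets 10 updates through. A second call to
toy_update_tg_load_avg() with no intervening clock advance always
returns false, which is why the freshness of the cached clock at each
call site matters.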