Date: Fri, 29 May 2026 10:45:37 -0000
From: "tip-bot2 for Rik van Riel"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] sched/fair: Use rq_clock() in update_tg_load_avg() rate-limit
Cc: Rik van Riel, "Peter Zijlstra (Intel)", Vincent Guittot, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260527110250.6a91718d@fangorn>
References: <20260527110250.6a91718d@fangorn>
Message-ID: <178005153770.1039918.6053015894167022043.tip-bot2@tip-bot2>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3b7be8e7fa698359616c3276e005f08c3b6070e4
Gitweb:        https://git.kernel.org/tip/3b7be8e7fa698359616c3276e005f08c3b6070e4
Author:        Rik van Riel
AuthorDate:    Tue, 26 May 2026 12:43:29 -07:00
Committer:     Peter Zijlstra
CommitterDate: Fri, 29 May 2026 12:43:16 +02:00

sched/fair: Use rq_clock() in update_tg_load_avg() rate-limit

update_tg_load_avg() is called once per leaf cfs_rq from the
__update_blocked_fair() walk that runs inside the NOHZ idle-balance
softirq, and again from update_load_avg() with UPDATE_TG. Its first
operation after the trivial early-outs is unconditionally:

	now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
	if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
		return;

Jakub ran into a system where nohz_idle_balance() was taking 75% of a
CPU (which is handling network traffic and doing many irq_exit_cpu
calls), with 35% of that CPU spent in update_load_avg, and 17% of the
CPU in sched_clock_cpu(), reading the TSC.

In a quick synthetic test, it looks like this patch reduces the CPU
use of sched_balance_update_blocked_averages by about 20%.

Switch the rate-limit to read rq_clock(rq_of(cfs_rq)) instead. This
eliminates the rdtsc, and uses a fairly fresh timestamp, because all
callers of update_tg_load_avg() and clear_tg_load_avg() hold rq->lock
and have called update_rq_clock(rq) within microseconds:

  caller                                   pre-state
  __update_blocked_fair                    encloser did update_rq_clock(rq)
  update_load_avg's three UPDATE_TG sites  under rq->lock after enqueue/dequeue/update_curr
  attach_/detach_entity_cfs_rq             preceded by update_load_avg(...)
  clear_tg_load_avg via offline path       rq_clock_start_loop_update(rq) upfront

so rq->clock is fresh at every call. Since cfs_rqs are per-CPU
per-task_group, cfs_rq->last_update_tg_load_avg is always compared
against the same rq's clock; no cross-rq drift.

Signed-off-by: Rik van Riel
Assisted-by: Claude (Anthropic)
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Vincent Guittot
Link: https://patch.msgid.link/20260527110250.6a91718d@fangorn
---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 62a2dcb..b5819c4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4962,7 +4962,7 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
 	 * For migration heavy workloads, access to tg->load_avg can be
 	 * unbound. Limit the update rate to at most once per ms.
 	 */
-	now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
+	now = rq_clock(rq_of(cfs_rq));
 	if (now - cfs_rq->last_update_tg_load_avg < NSEC_PER_MSEC)
 		return;
 
@@ -4985,7 +4985,7 @@ static inline void clear_tg_load_avg(struct cfs_rq *cfs_rq)
 	if (cfs_rq->tg == &root_task_group)
 		return;
 
-	now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
+	now = rq_clock(rq_of(cfs_rq));
 	delta = 0 - cfs_rq->tg_load_avg_contrib;
 	atomic_long_add(delta, &cfs_rq->tg->load_avg);
 	cfs_rq->tg_load_avg_contrib = 0;
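
For readers who want to experiment outside the kernel, the rate-limit
pattern in this patch can be approximated in plain userspace C. This is
only a minimal sketch, not the kernel code: every toy_* name is
hypothetical, the structs are stand-ins for struct rq and struct cfs_rq,
and the real update_tg_load_avg() does additional work that is not
modeled here. The point it illustrates is that the hot path compares
against a timestamp the caller has already cached, rather than reading
a hardware clock on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/* Hypothetical stand-in for struct rq: holds a timestamp cached by the caller. */
struct toy_rq {
	uint64_t clock;		/* cached time in ns */
};

/* Hypothetical stand-in for struct cfs_rq. */
struct toy_cfs_rq {
	struct toy_rq *rq;
	uint64_t last_update;	/* models last_update_tg_load_avg */
	unsigned long updates;	/* how often the non-rate-limited path ran */
};

/* Models update_rq_clock(): the caller refreshes the cached clock once. */
static void toy_update_rq_clock(struct toy_rq *rq, uint64_t now_ns)
{
	rq->clock = now_ns;
}

/* Models rq_clock(): a plain memory load, no hardware clock read. */
static uint64_t toy_rq_clock(const struct toy_rq *rq)
{
	return rq->clock;
}

/*
 * Rate-limited update: returns true if the update ran, false if it was
 * skipped because less than 1 ms passed on this rq's cached clock.
 */
static bool toy_update_tg_load_avg(struct toy_cfs_rq *cfs_rq)
{
	uint64_t now = toy_rq_clock(cfs_rq->rq);

	if (now - cfs_rq->last_update < NSEC_PER_MSEC)
		return false;

	cfs_rq->last_update = now;
	cfs_rq->updates++;
	return true;
}
```

In this sketch, a caller would invoke toy_update_rq_clock() once and then
call toy_update_tg_load_avg() for as many toy_cfs_rq instances as it
likes. Each of those calls costs one load and one compare, which mirrors
the reasoning in the changelog about callers having already called
update_rq_clock(rq).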