From nobody Thu Oct 2 10:57:10 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5B2182F83C5 for ; Thu, 18 Sep 2025 08:07:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182828; cv=none; b=TdwHmsfhOLE6upe6XYT8LXc3QQEuPyhW2ta0Zm2V4rxvDC7RKFNQOYC92iuVw1U8m87ZJojLVwGP3sKADKETaOf2XjH4V4/TgMP0qg7GermllLueY8sqrt8GxzzxK8SOprHhyuUQk3iORY4v8TxY+vNQWHvOZEaQyms+6fguOXw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182828; c=relaxed/simple; bh=IlN4XLKrso/cZZvtFPca5y3RkTjhYGy7kLLh3IzBo+A=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=Dlrf1SZO5wCuwKB2cw2gffv8EPb94vZSkKegb5FtMFqcJ0nRd2T/JaTyL+EAyqT6yc7tee35Sy2VEpZwvUAQ+PdYuiC/q1tjrutCEFDmuA7oXwxIFYiAEzFUPyV6H48u5yhVvDDghlUqQhSw54Vl4uxzldhjuX30qqEM5mGogxE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=kuhSRgWj; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="kuhSRgWj" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; 
bh=T6/Y9m+G84U4TaaIEmT8jGIPgMyNBhKn1kYlP4YLp50=; b=kuhSRgWjax0PJIYuDLIjvMz6Od FBVuVS23Rc+N+Huknh/Sqsj24+ks26AN8+HOGvSAvdZE0H4Xe4m6h4AseZs/4kdbuuc3CkYEJWqds ttH4OJ1D6Dt+XB9vYOwBaIdjcmxuKiRMG6jv44uvvzDh848e++LHwJ4yu0CIkuaqk8uZ7uckHYJ1e 67xFoz9hQEM4t7Nj60NnA+QVGEdWlk/6oSj7RsdV8WxLrq1HY2cHtjD2st4drOi/ISLGXK1wdGr93 gqeFcVjdcjO8o9wqHOBj6Of4+c9OyyUdEfQlJX61NMojNmeh9QE2M3IfpQYTZQM509HJZmLJO41v+ i8alh7oA==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fD-00000007Z6L-0R7F; Thu, 18 Sep 2025 08:06:47 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 15EFB3003C4; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080205.442967033@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:20 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com, jstultz@google.com Subject: [PATCH 1/8] sched: Fix hrtick() vs scheduling context References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The sched_class::task_tick() method is called on the donor sched_class, and sched_tick() hands it rq->donor as argument, which is consistent. However, while hrtick() uses the donor sched_class, it then passes rq->curr, which is inconsistent. Fix it. 
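The mismatch described above can be illustrated with a small user-space sketch. This is not kernel code: the struct layouts, `toy_class`, `toy_task_tick()`, `hrtick_before()` and `hrtick_after()` are hypothetical stand-ins, assuming only that rq->curr is the task executing on the CPU and rq->donor is the task providing the scheduling context.

```c
#include <assert.h>
#include <stddef.h>

struct rq;
struct task;

/* Toy stand-in for struct sched_class: only the tick hook is modelled. */
struct sched_class {
	void (*task_tick)(struct rq *rq, struct task *p, int queued);
};

struct task {
	const struct sched_class *sched_class;
	unsigned int ticks;	/* ticks accounted to this task */
};

struct rq {
	struct task *curr;	/* execution context: task running on the CPU */
	struct task *donor;	/* scheduling context: task that was selected */
};

/* Hypothetical tick handler: charge one tick to whichever task is passed in. */
static void toy_task_tick(struct rq *rq, struct task *p, int queued)
{
	(void)rq;
	(void)queued;
	assert(p != NULL);
	p->ticks++;
}

static const struct sched_class toy_class = { .task_tick = toy_task_tick };

/* Shape of the call before the fix: donor's class, but curr as the argument. */
static void hrtick_before(struct rq *rq)
{
	rq->donor->sched_class->task_tick(rq, rq->curr, 1);
}

/* Shape of the call after the fix: donor's class and donor as the argument. */
static void hrtick_after(struct rq *rq)
{
	rq->donor->sched_class->task_tick(rq, rq->donor, 1);
}
```

When curr and donor are the same task both variants behave identically; they only diverge when the two differ, in which case the pre-fix shape charges the tick to a task that the donor's class did not select.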
Signed-off-by: Peter Zijlstra (Intel) Acked-by: John Stultz Reviewed-by: K Prateek Nayak --- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -875,7 +875,7 @@ static enum hrtimer_restart hrtick(struc =20 rq_lock(rq, &rf); update_rq_clock(rq); - rq->donor->sched_class->task_tick(rq, rq->curr, 1); + rq->donor->sched_class->task_tick(rq, rq->donor, 1); rq_unlock(rq, &rf); =20 return HRTIMER_NORESTART; From nobody Thu Oct 2 10:57:10 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C1FEA2EFDBA for ; Thu, 18 Sep 2025 08:06:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182818; cv=none; b=cZb0jJT6BmZaphTYCbpysoh3bHR2BjDokkf6kA4QnH1CWJoQL06q0xhU/N6y3G/yr7xZVnIjbChj9ee+6zBdma+GEY1B/qeyDcQykAWuH3XU9qE1aR1wFMLUu2DbGGtcXMjX9f+tFmA1/fpff00+jtaQx/1707vYxqKiD0prRLw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182818; c=relaxed/simple; bh=OBJBNAFTWF4epETjD0VTBxin86MSefivKrtTrJzuBYA=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=cvoyqoEUcct2JGr1O05UVaVPPtYoU8dx4+5nh3hi4hGkL2YU+nJvnFfst/g0mssx35ipd8CS2ZReGh6rn5BnhbcQVXTrHA8hIJZ/eN/6Lit+FS2rJqwsW3WSI7m7c3nCGGW3ZiAP69eZKjjmtIpF1CKGLmuLmXcFu41xyhh8Hsc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=MuEoXUzT; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: 
smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="MuEoXUzT" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=Q0Z5JUpWWqXHKko/fT1GheaPfXqjih9iBoI4fmm5N7E=; b=MuEoXUzTJHiFcx6zIruFYn/3a3 sKg+5IgdA7f3iu/F+byXVXIXbuwJAUpYVJEZZ8T014FxnU5bTI4MIl4UuSbkiwFoH93qQ84Ung1HT M4KvgZfoa9M9l3H84yQpc3Z8RJcjk+wA9+b2hJVfIoHoLDCI0BCPcbfBV1x5QTnthQRz7z3JpG5nq 406aQcnIFU8hM8vIQaQ86aZWqNJPCd9wGBoou4JE6Bzr+3Qv/pSwm4HucBBjuoaO8Kee8lk864uqZ vlC5Es3GqP7bEymuhxB9YKneW+HYC2dm9oDQtVjOTJFtdOj9VIwXHwAhEPzWxDWurAbrQWBcT6RxA /XodDpgQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fC-0000000FbrO-2FJy; Thu, 18 Sep 2025 08:06:46 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 1AAE5300566; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080205.563385766@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:21 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com Subject: [PATCH 2/8] sched/fair: Limit hrtick work References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Content-Type: text/plain; charset="utf-8" The task_tick_fair() function does: - update the hierarchical runtimes - drive numa-balancing - update load-balance statistics - drive force-idle preemption All but the very first can be limited to the periodic tick. Let hrtick only update accounting and drive preemption, not load-balancing and other bits. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/fair.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -13119,6 +13119,12 @@ static void task_tick_fair(struct rq *rq entity_tick(cfs_rq, se, queued); } =20 + if (queued) { + if (!need_resched()) + hrtick_start_fair(rq, curr); + return; + } + if (static_branch_unlikely(&sched_numa_balancing)) task_tick_numa(rq, curr); From nobody Thu Oct 2 10:57:10 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58A382E9746 for ; Thu, 18 Sep 2025 08:06:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182817; cv=none; b=f5E2rKTWUjZ80PH288HqI/hJs40ghLol9Kj/Q0R3oCUzCqYNV9jhQ9nGINJCF4xrVHIrmMvUsDcDoNUbj6zhdw0JZLEo6hJjRzenlTjIBiCgSw6PSGwzoYtdLh+A+KwYmCTWHTMcgkc1vVQh/NQo8a8WbnWXa9FXGvGHFV2vams= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182817; c=relaxed/simple; bh=hqtAxj4Qr3+FMgaGa6Yi/VBeDgiyWZo+kDAH6v7C9SQ=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=RcsIuW3gbLbEmw/vPfxWzdDE0EPPKdg/IaOv/MuHeiJMS5BV6vGWLyN2fcH0/PgpBPjujaydg5LBExT7dRd96VwnOAWvpeLBOMMzqH5/CTl3m+8cxTjGRoYN4raWBbF3U4eKQ0C2yK0cWfXdWMuGdemg230tvqb/IGmSb1aNyoU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none 
smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=JK00hdel; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="JK00hdel" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=g7pmX4SB9lVXDbhwfM+ZH5mANed4a5N6NuoCNhx0Q44=; b=JK00hdelnc73SIJ3z9SK+jpKAg kbRRElBhZvUPaUC5e5Zexk9SfX5EImAnRgHEdu1RCl4Prwmi26G0bsQs6hDTSB1nrk3UIkL3ofICR ABgXFnNFyZjpUA4M1FCEnWGs2v3ovYv6rqoT7lch1GnTxknIuvm8/+XcgcQ74t42rvAzfWXZ3uK6J faYWyHMcviziYIutShSEiOnrVbNaksfqFh6I/VDQqpo72eOyfuwS5xO7MtTuq1woXSOCLje8l/ls9 hRFLmhSWhNxuD4DyhQv0BkT45lgIkCb0n73Ciq5TAhNNQXW+3dBwrGQHMe+C3S3i3l60tQ+UDl57V iBCgRI0g==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fC-0000000FbrP-2D2c; Thu, 18 Sep 2025 08:06:46 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 1F132302E34; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080205.716937764@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:22 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, 
oliver.sang@intel.com Subject: [PATCH 3/8] sched/eevdf: Fix HRTICK duration References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The nominal duration for an EEVDF task to run is until its deadline, at which point the deadline is moved ahead and a new task selection is done. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/fair.c | 40 +++++++++++++++++++++++++++++----------- 1 file changed, 29 insertions(+), 11 deletions(-) --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6775,21 +6775,39 @@ static inline void sched_fair_update_sto static void hrtick_start_fair(struct rq *rq, struct task_struct *p) { struct sched_entity *se =3D &p->se; + unsigned long scale =3D 1024; + unsigned long util =3D 0; + u64 vdelta; + u64 delta; =20 WARN_ON_ONCE(task_rq(p) !=3D rq); =20 - if (rq->cfs.h_nr_queued > 1) { - u64 ran =3D se->sum_exec_runtime - se->prev_sum_exec_runtime; - u64 slice =3D se->slice; - s64 delta =3D slice - ran; - - if (delta < 0) { - if (task_current_donor(rq, p)) - resched_curr(rq); - return; - } - hrtick_start(rq, delta); + if (rq->cfs.h_nr_queued <=3D 1) + return; + + /* + * Compute time until virtual deadline + */ + vdelta =3D se->deadline - se->vruntime; + if ((s64)vdelta < 0) { + if (task_current_donor(rq, p)) + resched_curr(rq); + return; + } + delta =3D (se->load.weight * vdelta) / NICE_0_LOAD; + + /* + * Correct for instantaneous load of other classes.
+ */ + util +=3D cpu_util_dl(rq); + util +=3D cpu_util_rt(rq); + util +=3D cpu_util_irq(rq); + if (util && util < 1024) { + scale *=3D 1024; + scale /=3D (1024 - util); } + + hrtick_start(rq, (scale * delta) / 1024); } =20 /* From nobody Thu Oct 2 10:57:11 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 58ABB2EB5C4 for ; Thu, 18 Sep 2025 08:06:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182818; cv=none; b=cZqW9V3xrBCAh7xRdg+38SYaNuP6sdOQVuG4/QzhJBmRp6dw2CnuXT2C8+0mjyBS3FOFHbMFXn3wqgR3YPlmqLJq/v9UrGKjo6nTXhkymeJu3mqR329GBXyNCB70w73eGR7tV5+wtSeudTX2fqHDuTXZM2ACNZpKSzeIYR2jOYk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182818; c=relaxed/simple; bh=AeCzUiyAR/I0Tvdv/NWhP4mvqDzurjDvXBES4hJ84S4=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=po6xCeB7YOVRd7KllvGSFb5tCOTi8XTxYjCsxYJoQb1gp/HL/C8zbFCtwcCvkdeqk/TmOSvBa2QZXACnke6oTzuiEhytJLZyTc5eCDX2ZuWvHkGQ+u9Jln6C3U0hxdYWNO3xkmHfMNNutY0ebuRNAc3b0buiSYINjD5SNTnVqM8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=OS3odTvI; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="OS3odTvI" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; 
d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=6CHdliF5fkO251zOWaCUfJIHKwCAxY5qzq5kGxF+XF4=; b=OS3odTvIBGetwlwqPSQkZb4nX2 pJWeuQ56oENcq8ZCLOQ1ultdGY0zTbBJjhHAIUOekIFruMF64BobXdpbpdIA9JfR4OF6mZA83mFv7 t7HPSbe97V97q9UGd9ERncykm/vS27CAYV98Odbo+9I0mMmKLAHW9QFHokq3nut6rEjJ7CntQIEHV z7sKlynQdNY1VXdZvGJfNRWlV0gB2LWsZ53Y0tTIQ7DxULBr6C8EmgSoby7l33jETZyYkf/CAwpdn gJpLWoUcif70ooWYGe/1jIF0u9lqgOm5Vrmu1cb8VPvgQV+suvTuAp/+zqZCZyZC69uTzCPH+r+HZ sMFMpjxg==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fC-0000000FbrQ-2B0q; Thu, 18 Sep 2025 08:06:46 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 23AF5302E5F; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080205.835307230@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:23 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com Subject: [PATCH 4/8] hrtimer: Optimize __hrtimer_start_range_ns() References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Much like hrtimer_reprogram(), skip programming if the cpu_base is running the hrtimer interrupt.
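The idea of deferring to the running interrupt can be sketched in user space as follows. This is only a toy model under stated assumptions: `struct toy_cpu_base`, `toy_hrtimer_start()` and `toy_hrtimer_irq_exit()` are hypothetical names, the queue is reduced to a single "earliest expiry" value, and only the `in_hrtirq` and `expires_next` field names are borrowed from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_cpu_base {
	bool in_hrtirq;			/* set while the timer interrupt handler runs */
	uint64_t queued_min;		/* earliest expiry among queued timers */
	uint64_t expires_next;		/* expiry the "hardware" is programmed for */
	unsigned int hw_programs;	/* how often the hardware was reprogrammed */
};

/* Enqueue a timer; returns true if the hardware was reprogrammed. */
static bool toy_hrtimer_start(struct toy_cpu_base *base, uint64_t expires)
{
	if (expires < base->queued_min)
		base->queued_min = expires;

	/* The interrupt handler re-evaluates the queue before it returns. */
	if (base->in_hrtirq)
		return false;

	/* A later expiry than the programmed one needs no hardware access. */
	if (expires >= base->expires_next)
		return false;

	base->expires_next = expires;
	base->hw_programs++;
	return true;
}

/* End of the interrupt handler: one reprogram covers everything queued meanwhile. */
static void toy_hrtimer_irq_exit(struct toy_cpu_base *base)
{
	assert(base->in_hrtirq);
	base->in_hrtirq = false;
	base->expires_next = base->queued_min;
	base->hw_programs++;
}
```

In this model, any number of timers started while `in_hrtirq` is set costs a single hardware access at interrupt exit, which is the saving the patch is after.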
Signed-off-by: Peter Zijlstra (Intel) --- kernel/time/hrtimer.c | 8 ++++++++ 1 file changed, 8 insertions(+) --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1261,6 +1261,14 @@ static int __hrtimer_start_range_ns(stru } =20 first =3D enqueue_hrtimer(timer, new_base, mode); + + /* + * If the hrtimer interrupt is running, then it will reevaluate the + * clock bases and reprogram the clock event device. + */ + if (new_base->cpu_base->in_hrtirq) + return 0; + if (!force_local) { /* * If the current CPU base is online, then the timer is From nobody Thu Oct 2 10:57:11 2025 Received: from desiato.infradead.org (desiato.infradead.org [90.155.92.199]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18C642F83AB for ; Thu, 18 Sep 2025 08:07:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.92.199 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182827; cv=none; b=j+/csvCJZj9TJVzcXAvDdwqWoti4KLAFSPp5aiqSP/u2k2DniR7/1kj8u1vZwS1zrgqRKC2+06sUDD4o9RaLr2wCjqfKlN7F4NzI8jCpNeGefBsjAgOwoC9EjGLD0FWX/Q1R5Mv6ZjqmFNi4T1OBD3ZV53Bf8oIfNa4gXOBBco0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182827; c=relaxed/simple; bh=Hocxsvqta7ng97jsUGtfkfY8kMmy2AwacDSRcdwgxUk=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=EbOtnqnCIAfesKm0Vfe73yb8Lm1SVbPfe46u7F2Cql4AmuLkP8rILja7rDMJqnyBRrTDRDfBHTZ/MFSEs1EzVz23IpngEjcBZzoZiSYfB24/ayD/NvtCHPMVQynH+i0Jl0wWQ7IOLBGjFZU4nEE6LY+kMH4VrMR/mFHSUDWH6Js= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=LlU7lQXP; arc=none smtp.client-ip=90.155.92.199 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none 
dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="LlU7lQXP" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=S+8w906lmYBpDgHqxUiliogi7/tD4aoFlTOCeUtEKj0=; b=LlU7lQXPnfgCmJtdbqkljh9AlN NTMPOp3pSGQ4jjl1c+1+a7nlpFnKBgNt8Be/Ob8tXsPV05oqK6o8MIC6V6g3yDOslpBS1Gk4bHQdW c1tc2WGjwm3YGfSRLf/V8hZaEGMhC/bItC6GWBG8Ne0K+0MM0df2sYseepwiDDkfTO/kAzSJFZ4O1 NfrmhLuf2hTuMuHaBeF9XVP7Loiy6SKHs0ZGVgBWbPGc2Z4X3NTaxEYVoxCiABuHk1M/eqfB8nhZ/ uyFxTO57YPdg343LqCeeOfUPvzRAhNzKthNpMsHCQKV9SxL3GOP2zzB/ckkQABeCxQxASGxg7rbiQ 2O/JJeGQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fD-00000007Z6M-0fx8; Thu, 18 Sep 2025 08:06:48 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 27958302F79; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080205.949243191@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:24 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com Subject: [PATCH 5/8] hrtimer,sched: Add fuzzy hrtimer mode for HRTICK References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: 
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Upon schedule() HRTICK will cancel the current timer, pick the next
task and reprogram the timer. When schedule() consistently triggers
due to blocking conditions instead of the timer, this leads to endless
reprogramming without ever firing.

Mitigate this with a new hrtimer mode: fuzzy (not really happy with
that name); this mode does two things:

 - skip reprogramming the hardware on timer remove;
 - skip reprogramming the hardware when the new timer
   is after cpu_base->expires_next

Both things are already possible;

 - removing a remote timer will leave the hardware programmed and
   cause a spurious interrupt.
 - this remote CPU adding a timer can skip the reprogramming
   when the timer's expiration is after the (spurious) expiration.

This new timer mode simply causes more of this 'fuzzy' behaviour; it
causes a few spurious interrupts, but similarly avoids endlessly
reprogramming the timer.

This makes the HRTICK match the NO_HRTICK hackbench runs.
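The trade-off between reprogramming and spurious interrupts can be approximated with a small user-space simulation. This is a rough sketch, not the kernel implementation: all `fuzzy_*` names and the `FUZZY_*` constants are made up, the start-side check is applied to both modes for simplicity, and the only modelled difference is whether removing the timer touches the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FUZZY_TICK	1000000u	/* expiry of a far-away periodic timer (made up) */
#define FUZZY_HORIZON	100u		/* hrtick is armed this far ahead (made up) */
#define FUZZY_STEP	10u		/* the task blocks after this long (made up) */

struct fuzzy_base {
	uint64_t expires_next;		/* expiry the "hardware" is programmed for */
	unsigned int hw_programs;	/* number of hardware reprograms */
};

static void fuzzy_program(struct fuzzy_base *b, uint64_t expires)
{
	b->expires_next = expires;
	b->hw_programs++;
}

/* Arm a timer: touch the hardware only if it expires before expires_next. */
static void fuzzy_start(struct fuzzy_base *b, uint64_t expires)
{
	if (expires < b->expires_next)
		fuzzy_program(b, expires);
}

/* Remove the first timer; next_remaining is the next queued expiry. */
static void fuzzy_remove(struct fuzzy_base *b, bool fuzzy, uint64_t next_remaining)
{
	if (!fuzzy)
		fuzzy_program(b, next_remaining);
	/* fuzzy: leave the stale expiry programmed; it may fire spuriously */
}

/* Run `cycles` rounds of arm, block early, cancel; return the reprogram count. */
static unsigned int fuzzy_run(bool fuzzy, unsigned int cycles)
{
	struct fuzzy_base b = { .expires_next = FUZZY_TICK, .hw_programs = 0 };
	uint64_t now = 0;

	/* The model assumes the periodic timer never expires during the run. */
	assert((uint64_t)cycles * FUZZY_STEP + FUZZY_HORIZON < FUZZY_TICK);

	for (unsigned int i = 0; i < cycles; i++) {
		if (now >= b.expires_next)	/* stale expiry fires: spurious interrupt */
			fuzzy_program(&b, FUZZY_TICK);
		fuzzy_start(&b, now + FUZZY_HORIZON);
		now += FUZZY_STEP;		/* task blocks before the timer fires */
		fuzzy_remove(&b, fuzzy, FUZZY_TICK);
	}
	return b.hw_programs;
}
```

Comparing `fuzzy_run(false, n)` with `fuzzy_run(true, n)` shows the effect in this model: the non-fuzzy variant touches the hardware twice per cycle, while the fuzzy variant only does so around the occasional spurious expiry.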
Signed-off-by: Peter Zijlstra (Intel) --- include/linux/hrtimer.h | 1 + include/linux/hrtimer_types.h | 1 + kernel/sched/core.c | 3 ++- kernel/time/hrtimer.c | 16 +++++++++++++++- 4 files changed, 19 insertions(+), 2 deletions(-) --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -38,6 +38,7 @@ enum hrtimer_mode { HRTIMER_MODE_PINNED =3D 0x02, HRTIMER_MODE_SOFT =3D 0x04, HRTIMER_MODE_HARD =3D 0x08, + HRTIMER_MODE_FUZZY =3D 0x10, =20 HRTIMER_MODE_ABS_PINNED =3D HRTIMER_MODE_ABS | HRTIMER_MODE_PINNED, HRTIMER_MODE_REL_PINNED =3D HRTIMER_MODE_REL | HRTIMER_MODE_PINNED, --- a/include/linux/hrtimer_types.h +++ b/include/linux/hrtimer_types.h @@ -45,6 +45,7 @@ struct hrtimer { u8 is_rel; u8 is_soft; u8 is_hard; + u8 is_fuzzy; }; =20 #endif /* _LINUX_HRTIMER_TYPES_H */ --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -928,7 +928,8 @@ void hrtick_start(struct rq *rq, u64 del static void hrtick_rq_init(struct rq *rq) { INIT_CSD(&rq->hrtick_csd, __hrtick_start, rq); - hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, HRTIMER_MODE_REL_HARD); + hrtimer_setup(&rq->hrtick_timer, hrtick, CLOCK_MONOTONIC, + HRTIMER_MODE_REL_HARD | HRTIMER_MODE_FUZZY); } #else /* !CONFIG_SCHED_HRTICK: */ static inline void hrtick_clear(struct rq *rq) --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1122,7 +1122,7 @@ static void __remove_hrtimer(struct hrti * an superfluous call to hrtimer_force_reprogram() on the * remote cpu later on if the same timer gets enqueued again. */ - if (reprogram && timer =3D=3D cpu_base->next_timer) + if (!timer->is_fuzzy && reprogram && timer =3D=3D cpu_base->next_timer) hrtimer_force_reprogram(cpu_base, 1); } =20 @@ -1269,6 +1269,19 @@ static int __hrtimer_start_range_ns(stru if (new_base->cpu_base->in_hrtirq) return 0; =20 + if (timer->is_fuzzy) { + /* + * XXX fuzzy implies pinned! not sure how to deal with + * retrigger_next_event() for the !local case.
+ */ + WARN_ON_ONCE(!(mode & HRTIMER_MODE_PINNED)); + /* + * Notably, by going into hrtimer_reprogram(), it will + * not reprogram if cpu_base->expires_next is earlier. + */ + return first; + } + if (!force_local) { /* * If the current CPU base is online, then the timer is @@ -1645,6 +1658,7 @@ static void __hrtimer_setup(struct hrtim base +=3D hrtimer_clockid_to_base(clock_id); timer->is_soft =3D softtimer; timer->is_hard =3D !!(mode & HRTIMER_MODE_HARD); + timer->is_fuzzy =3D !!(mode & HRTIMER_MODE_FUZZY); timer->base =3D &cpu_base->clock_base[base]; timerqueue_init(&timer->node); From nobody Thu Oct 2 10:57:11 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6652C2ED853 for ; Thu, 18 Sep 2025 08:06:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182819; cv=none; b=KtlqvAPnVar/lKXh/zJ8zUDZ4cau511j/SuY1aX6TLJKNhb2aqjL6DUfSuOAZsiO73B3qNLr5j/krdh0QnjB0ziAlkkLb72zyHz6molZE3tBHm7dLZfM0JGix11Zae/B5oNmxkbtrup3wk80YXKgSyeNte/tPaHM2+yiSsNF600= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182819; c=relaxed/simple; bh=nkRh0zoa1wb7KWILX2rkE+WfDaLMenhee0GDCugCODg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=EEWYfsO3QiQZONtSKM0GaSucvx5pdX1nQvZGp7sqmcMIJzfR/CrJOsiuOWznLmZJN3otXapZExbuRaTkatrs37JsH/552WLLxkAiV38f5eF7DVKQ0Rwcn4HXxvf8cNCAavvC0Fk30p7mRpwE/IoAmwkmozTaTZgN9dUq8xt3ggs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=kPAlMzXn; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: 
smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="kPAlMzXn" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=i/e6ipjEvjUpvQ1N8nr9Vxs1Rh/qX1Ik2QYKpUjY4A4=; b=kPAlMzXnrG8RA2n9kRWtJrjnqS 5u14EwjbnK2GbEgeN4RXuY9O6R4X3Jip0YaqpqNDfP5ihnFgrNG0zb/tylKCp4faX1CfVZKJPds4+ mz6ZR6b1/zzwcMk16xuPvcZMKhlpGHLcQS+O2WFOXEYIp1YrGrsiCAG/ntVeGZuXrL+24Q3EXdM2b zLIDwS7yNMlyUvC+mUbwbfz9MUhu3UiKPy2hsB1GpvZjudLc/tDtUBL7TABNiohXLobbng68Gd7de /O/omHegG77FnZrcfTBkUQDKs1PuDXY0BiOr5J+L9nAXlyWPOpC8Rv/WpBGzqcd8ckBJIWDCHQkD4 pKqMkHJQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fD-0000000Fbs2-0Ws3; Thu, 18 Sep 2025 08:06:47 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 2BE68302FFC; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080206.065140324@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:25 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com Subject: [PATCH 6/8] hrtimer: Re-arrange hrtimer_interrupt() References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: 
linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Rework hrtimer_interrupt() such that reprogramming is split out into an independent function at the end of the interrupt. This prepares for reprogramming getting delayed beyond the end of hrtimer_interrupt(). Notably, this changes the hang handling to always wait 100ms instead of trying to keep it proportional to the actual delay. This simplifies the state; also, this really shouldn't be happening. Signed-off-by: Peter Zijlstra (Intel) --- kernel/time/hrtimer.c | 87 ++++++++++++++++++++++---------------------------- 1 file changed, 39 insertions(+), 48 deletions(-) --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1889,6 +1889,29 @@ static __latent_entropy void hrtimer_run #ifdef CONFIG_HIGH_RES_TIMERS =20 /* + * Very similar to hrtimer_force_reprogram(), except it deals with + * in_hrtirq and hang_detected. + */ +static void __hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now) +{ + ktime_t expires_next =3D hrtimer_update_next_event(cpu_base); + + cpu_base->expires_next =3D expires_next; + cpu_base->in_hrtirq =3D 0; + + if (unlikely(cpu_base->hang_detected)) { + /* + * Give the system a chance to do something else than looping + * on hrtimer interrupts. + */ + expires_next =3D ktime_add_ns(now, 100 * NSEC_PER_MSEC); + cpu_base->hang_detected =3D 0; + } + + tick_program_event(expires_next, 1); +} + +/* * High resolution timer interrupt * Called with interrupts disabled */ @@ -1924,63 +1947,31 @@ void hrtimer_interrupt(struct clock_even =20 __hrtimer_run_queues(cpu_base, now, flags, HRTIMER_ACTIVE_HARD); =20 - /* Reevaluate the clock bases for the [soft] next expiry */ - expires_next =3D hrtimer_update_next_event(cpu_base); - /* - * Store the new expiry value so the migration code can verify - * against it.
- */ - cpu_base->in_hrtirq =3D 0; - raw_spin_unlock_irqrestore(&cpu_base->lock, flags); - - /* Reprogramming necessary ? */ - if (!tick_program_event(expires_next, 0)) { - cpu_base->hang_detected =3D 0; - return; - } - /* * The next timer was already expired due to: * - tracing * - long lasting callbacks * - being scheduled away when running in a VM * - * We need to prevent that we loop forever in the hrtimer - * interrupt routine. We give it 3 attempts to avoid - * overreacting on some spurious event. - * - * Acquire base lock for updating the offsets and retrieving - * the current time. + * We need to prevent that we loop forever in the hrtimer interrupt + * routine. We give it 3 attempts to avoid overreacting on some + * spurious event. */ - raw_spin_lock_irqsave(&cpu_base->lock, flags); + expires_next =3D hrtimer_update_next_event(cpu_base); now =3D hrtimer_update_base(cpu_base); - cpu_base->nr_retries++; - if (++retries < 3) - goto retry; - /* - * Give the system a chance to do something else than looping - * here. We stored the entry time, so we know exactly how long - * we spent here. We schedule the next event this amount of - * time away. - */ - cpu_base->nr_hangs++; - cpu_base->hang_detected =3D 1; - raw_spin_unlock_irqrestore(&cpu_base->lock, flags); + if (expires_next < now) { + if (++retries < 3) + goto retry; + + delta =3D ktime_sub(now, entry_time); + cpu_base->max_hang_time =3D max_t(unsigned int, + cpu_base->max_hang_time, delta); + cpu_base->nr_hangs++; + cpu_base->hang_detected =3D 1; + } =20 - delta =3D ktime_sub(now, entry_time); - if ((unsigned int)delta > cpu_base->max_hang_time) - cpu_base->max_hang_time =3D (unsigned int) delta; - /* - * Limit it to a sensible value as we enforce a longer - * delay. Give the CPU at least 100ms to catch up.
- */ - if (delta > 100 * NSEC_PER_MSEC) - expires_next =3D ktime_add_ns(now, 100 * NSEC_PER_MSEC); - else - expires_next =3D ktime_add(now, delta); - tick_program_event(expires_next, 1); - pr_warn_once("hrtimer: interrupt took %llu ns\n", ktime_to_ns(delta)); + __hrtimer_rearm(cpu_base, now); + raw_spin_unlock_irqrestore(&cpu_base->lock, flags); } #endif /* !CONFIG_HIGH_RES_TIMERS */ From nobody Thu Oct 2 10:57:11 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 665A62EF662 for ; Thu, 18 Sep 2025 08:06:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182818; cv=none; b=QMaSFEoJH6JikVpEFjuATfRH1S1DcnOsgYa+rteMtNvycyKAUy9k57SuooIUP8LAuWS17xTZkkC0Yccdxx14awt77iEwkDE/ubVHnTx2aBnPcIfpP/dVc9lvNh/LMoEROHm5Io1hcwQUUy/JSgRr7bIemRtNNoHBZNHerGywbeU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182818; c=relaxed/simple; bh=BiS576CCzFASsMDNnssFJqocZ8w2bVORDrMvHX51BNg=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=uxVI1AgPIhuO/AgWOZi0PH6tQ4ONPrk9kPPSIRBEZ7nkr0ExkDvEaTGBASxDvdmiXTZo8ghPqVXrSsLWZSsYf/PkowRlKRSMkUEeWJjSdOgbXQOYxo8ieBEipJc0H5pJLsW6TmjV4nofNk2YRMmJeNA4abiz4l5+drgUEf3OH/c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=t80yLKq1; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; 
dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="t80yLKq1" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=0IHJD1ve6otuEUBwQ7egAocxkTJS+OpX92f7OrXR+qQ=; b=t80yLKq17IDwK8WLOwUgzN7fea K0FPk6zHaVJUoxaLzR59s21cxHXPfwohwKISR1x7pP87pAecvfdoKOC8y+0+86XYD498CHux1SJBD SAvkG+WsXAtogOI6X2wmbVFbUL2sDrAvUuq5+QSjHsrnVCqGdq44Wq5GY5xeiHPlmhCCXQVmSQ1Jn 2a/8jTn/N9r6U+Ei8coPaSjFyEwKXkz9dedeYaiE96jpXXFNgPuDV7/J8qY4ehmcUxk4gREWCqMMB GZDeYgNdSrJo7OlDCdOD/l+qTiS+kFAasZoA8/6mDfUD6tJscPWgKU5JTsXtMX8aBDxu3Itu+Ugfc bXJNw7gg==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fD-0000000Fbs3-0i2L; Thu, 18 Sep 2025 08:06:47 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 302DA303002; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080206.180399724@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:26 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com Subject: [RFC][PATCH 7/8] entry,hrtimer: Push reprogramming timers into the interrupt return path References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Currently 
hrtimer_interrupt() runs expired timers, which can re-arm themselves, after which it computes the next expiration time and re-programs the hardware. However, a timer like HRTICK, a highres timer driving preemption, cannot re-arm itself at the point of running, since the next task has not been determined yet. The schedule() in the interrupt return path will switch to the next task, which then causes a new hrtimer to be programmed. This then results in reprogramming the hardware at least twice, once after running the timers, and once upon selecting the new task. Notably, *both* events happen in the interrupt. By pushing the hrtimer reprogram all the way into the interrupt return path, it runs after schedule() and this double reprogram can be avoided. XXX: 0-day is unhappy with this patch -- it is reporting lockups that look very much like a timer going missing. I am unable to reproduce them. Note that the lockups go away when the workloads are run without perf monitors. Signed-off-by: Peter Zijlstra (Intel) --- include/asm-generic/thread_info_tif.h | 5 ++++- include/linux/hrtimer.h | 17 +++++++++++++++++ kernel/entry/common.c | 7 +++++++ kernel/sched/core.c | 6 ++++++ kernel/time/hrtimer.c | 28 ++++++++++++++++++++++++---- 5 files changed, 58 insertions(+), 5 deletions(-) --- a/include/asm-generic/thread_info_tif.h +++ b/include/asm-generic/thread_info_tif.h @@ -41,8 +41,11 @@ #define _TIF_PATCH_PENDING BIT(TIF_PATCH_PENDING) #ifdef HAVE_TIF_RESTORE_SIGMASK -# define TIF_RESTORE_SIGMASK 10 // Restore signal mask in do_signal() */ +# define TIF_RESTORE_SIGMASK 10 // Restore signal mask in do_signal() # define _TIF_RESTORE_SIGMASK BIT(TIF_RESTORE_SIGMASK) #endif +#define TIF_HRTIMER_REARM 11 // re-arm the timer +#define _TIF_HRTIMER_REARM BIT(TIF_HRTIMER_REARM) + #endif /* _ASM_GENERIC_THREAD_INFO_TIF_H_ */ --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -175,10 +175,27 @@ extern void hrtimer_interrupt(struct clo extern unsigned int 
hrtimer_resolution; +#ifdef TIF_HRTIMER_REARM +extern void _hrtimer_rearm(void); +/* + * This is to be called on all irqentry_exit() paths; as well as in the context + * switch path before switch_to(). + */ +static inline void hrtimer_rearm(void) +{ + if (test_thread_flag(TIF_HRTIMER_REARM)) + _hrtimer_rearm(); +} +#else +static inline void hrtimer_rearm(void) { } +#endif /* TIF_HRTIMER_REARM */ + #else #define hrtimer_resolution (unsigned int)LOW_RES_NSEC +static inline void hrtimer_rearm(void) { } + #endif static inline ktime_t --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -7,6 +7,7 @@ #include #include #include +#include /* Workaround to allow gradual conversion of architecture code */ void __weak arch_do_signal_or_restart(struct pt_regs *regs) { } @@ -71,6 +72,7 @@ noinstr void irqentry_exit_to_user_mode( { instrumentation_begin(); exit_to_user_mode_prepare(regs); + hrtimer_rearm(); instrumentation_end(); exit_to_user_mode(); } @@ -183,6 +185,7 @@ noinstr void irqentry_exit(struct pt_reg */ if (state.exit_rcu) { instrumentation_begin(); + hrtimer_rearm(); /* Tell the tracer that IRET will enable interrupts */ trace_hardirqs_on_prepare(); lockdep_hardirqs_on_prepare(); @@ -196,10 +199,14 @@ noinstr void irqentry_exit(struct pt_reg if (IS_ENABLED(CONFIG_PREEMPTION)) irqentry_exit_cond_resched(); + hrtimer_rearm(); /* Covers both tracing and lockdep */ trace_hardirqs_on(); instrumentation_end(); } else { + instrumentation_begin(); + hrtimer_rearm(); + instrumentation_end(); /* * IRQ flags state is correct already. Just tell RCU if it * was not watching on entry. --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5161,6 +5161,12 @@ prepare_task_switch(struct rq *rq, struc fire_sched_out_preempt_notifiers(prev, next); kmap_local_sched_out(); prepare_task(next); + /* + * Notably, this must be called after pick_next_task() but before + * switch_to(), since the new task need not be on the return from + * interrupt path. 
+ */ + hrtimer_rearm(); prepare_arch_switch(next); } --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -1892,10 +1892,9 @@ static __latent_entropy void hrtimer_run * Very similar to hrtimer_force_reprogram(), except it deals with * in_hrirq and hang_detected. */ -static void __hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, ktime_t now) +static void __hrtimer_rearm(struct hrtimer_cpu_base *cpu_base, + ktime_t now, ktime_t expires_next) { - ktime_t expires_next = hrtimer_update_next_event(cpu_base); - cpu_base->expires_next = expires_next; cpu_base->in_hrtirq = 0; @@ -1970,9 +1969,30 @@ void hrtimer_interrupt(struct clock_even cpu_base->hang_detected = 1; } - __hrtimer_rearm(cpu_base, now); +#ifdef TIF_HRTIMER_REARM + set_thread_flag(TIF_HRTIMER_REARM); +#else + __hrtimer_rearm(cpu_base, now, expires_next); +#endif raw_spin_unlock_irqrestore(&cpu_base->lock, flags); } + +#ifdef TIF_HRTIMER_REARM +void _hrtimer_rearm(void) +{ + struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases); + ktime_t now, expires_next; + + lockdep_assert_irqs_disabled(); + + scoped_guard (raw_spinlock, &cpu_base->lock) { + now = hrtimer_update_base(cpu_base); + expires_next = hrtimer_update_next_event(cpu_base); + __hrtimer_rearm(cpu_base, now, expires_next); + clear_thread_flag(TIF_HRTIMER_REARM); + } +} +#endif /* TIF_HRTIMER_REARM */ #endif /* !CONFIG_HIGH_RES_TIMERS */ /* From nobody Thu Oct 2 10:57:11 2025 Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 15E562C0263 for ; Thu, 18 Sep 2025 08:06:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=90.155.50.34 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182817; cv=none; 
b=nBbabbzYOhb4zR4QzuE5JACiQjUznweuiMJiNAJcsJ3q8PsWxpEAiac+vZSEJTOvxGAsmXkGg8xpCcmYrWCuzDGROmUdWlVtt1pqBvxJ9SVrmpE6gXfrm/HNBcC+eg5/b8qWqWMtOB19iL27IiDZnvJ2FV2zQWrnJvMi92e0omU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1758182817; c=relaxed/simple; bh=1e2zakSUqeLToQoAPVml6fhi/XriNjG58yMRZpByB7A=; h=Message-ID:Date:From:To:Cc:Subject:References:MIME-Version: Content-Type; b=QC5Ap5qg8c8Q25W5IzKCiq9c73l3KARk4gadzZ1NYHMtK9KU3XS9ec1AYSAJIzU7yN4+ZnO5MY1Xh+cLxWrKcihBGODX1USpbx47xqdkldgXJvEJgFrBDFQtsHesfNKwn7qBDFwnbpUsJXu3DYkp7NfQ0OEfrKFkbIfpr8xPmpI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org; spf=none smtp.mailfrom=infradead.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b=HoKt1pLQ; arc=none smtp.client-ip=90.155.50.34 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=infradead.org Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=infradead.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=infradead.org header.i=@infradead.org header.b="HoKt1pLQ" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=Content-Type:MIME-Version:References: Subject:Cc:To:From:Date:Message-ID:Sender:Reply-To:Content-Transfer-Encoding: Content-ID:Content-Description:In-Reply-To; bh=NBKWbQzjQqnxgf5mh4ZofQjDAlswEiWQQO80U3lRI7I=; b=HoKt1pLQ/dEMvYt4waSSFdl3Hi RIp3I1XOVNHbT6Aa7ecgMsisljHGCRHOrWZC7Wnoz65vqDjFDoGRiaHwHDca5SualV6RWCiWflmeY To5l1Kd9Jk8GcelsVdicYKHpw/ljzxsOeTzni0xvfrnv8UOc7au8qlkZtnxC7KhSM/RLbFR8r0lQJ pA7dshUv5fenD1r21aFPMdCvtUaJbuS4C8kUueh49iCDDINaWJKpQNuKBdktz9WEKYnbjTS8/aSBL 15eDVzEztQzDGr726Ypk8VQcZolFLR39n6/27vun+7rpGOoi0bCb1m0lOxYZ1LcheRfJ4B95l02Ty I//8hUQQ==; Received: from 77-249-17-252.cable.dynamic.v4.ziggo.nl ([77.249.17.252] helo=noisy.programming.kicks-ass.net) by 
casper.infradead.org with esmtpsa (Exim 4.98.2 #2 (Red Hat Linux)) id 1uz9fD-0000000Fbs4-0nJG; Thu, 18 Sep 2025 08:06:47 +0000 Received: by noisy.programming.kicks-ass.net (Postfix, from userid 0) id 34615303003; Thu, 18 Sep 2025 10:06:46 +0200 (CEST) Message-ID: <20250918080206.295040075@infradead.org> User-Agent: quilt/0.68 Date: Thu, 18 Sep 2025 09:52:27 +0200 From: Peter Zijlstra To: tglx@linutronix.de Cc: arnd@arndb.de, anna-maria@linutronix.de, frederic@kernel.org, peterz@infradead.org, luto@kernel.org, mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, vschneid@redhat.com, linux-kernel@vger.kernel.org, oliver.sang@intel.com Subject: [RFC][PATCH 8/8] sched: Default enable HRTICK References: <20250918075219.091828500@infradead.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" For the robots.. let us find regressions. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/features.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -63,8 +63,8 @@ SCHED_FEAT(DELAY_ZERO, true) */ SCHED_FEAT(WAKEUP_PREEMPTION, true) -SCHED_FEAT(HRTICK, false) -SCHED_FEAT(HRTICK_DL, false) +SCHED_FEAT(HRTICK, true) +SCHED_FEAT(HRTICK_DL, true) /* * Decrement CPU capacity based on time not spent running tasks